Alex Spinov

DuckDB Has a Free In-Process Analytics Engine: Run SQL on CSV, Parquet, and JSON Without a Server

DuckDB Runs SQL on CSV and Parquet Without a Server

You have a 5GB CSV file. Pandas tries to load all of it into memory and can crash with an out-of-memory error. DuckDB queries the file with SQL, streaming it in chunks instead of loading it whole, so it is fast and needs far less RAM.

What Makes DuckDB Special

  • In-process: runs inside your Python/Node/R script
  • No server: nothing to set up or administer, and no external dependencies
  • Columnar engine: vectorized execution for fast analytics
  • Direct file queries: SQL on CSV, Parquet, and JSON out of the box, and Excel through an extension
  • PostgreSQL-like SQL: the dialect closely follows PostgreSQL, so most queries look familiar
  • Extensions: httpfs, spatial, iceberg, delta

Quick Start

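import duckdb

# Aggregate straight from the CSV file; DuckDB infers column names and types.
result = duckdb.sql("""
  SELECT city, COUNT(*) AS orders, SUM(amount) AS revenue
  FROM 'orders.csv'
  GROUP BY city
  ORDER BY revenue DESC
  LIMIT 10
""").fetchdf()  # pandas DataFrame; requires pandas to be installed

# duckdb.sql() is lazy; .show() runs the query and prints a preview.
# Query Parquet on S3 (uses the httpfs extension and needs S3 credentials)
duckdb.sql("SELECT * FROM read_parquet('s3://bucket/data/*.parquet')").show()

# Query JSON
duckdb.sql("SELECT * FROM read_json_auto('events.json')").show()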

DuckDB vs Pandas

| Task | DuckDB | Pandas |
| --- | --- | --- |
| 5GB CSV aggregation | 3 sec | OOM crash |
| Memory usage | Streaming | Full load |
| Syntax | SQL | Method chains |
| Joins | Fast hash | Slow merge |

📧 spinov001@gmail.com — Data engineering consulting

Follow for more data tool reviews.
