DEV Community

Alex Spinov

Posted on Mar 27

DuckDB Has a Free In-Process Analytics Engine: Run SQL on CSV, Parquet, and JSON Without a Server

#database #python #analytics #dataengineering

DuckDB Runs SQL on CSV and Parquet Without a Server

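You have a 5GB CSV file. Pandas tries to load all of it into memory and can crash when the machine runs out of RAM. DuckDB queries the file directly with SQL and streams it through in chunks, so the whole file never has to fit in memory at once.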

What Makes DuckDB Special

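In-process: runs inside your Python/Node/R script
No server: nothing to start or administer, and the Python package installs with no required dependencies
Columnar engine: vectorized execution for fast analytics
Direct file queries: SQL on CSV, Parquet, and JSON, plus Excel through an extension
PostgreSQL-style SQL: the dialect closely follows PostgreSQL, so most queries look familiar (it is not a drop-in replacement)
Extensions: httpfs, spatial, iceberg, delta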

Quick Start

import duckdb

result = duckdb.sql("""
  SELECT city, COUNT(*) as orders, SUM(amount) as revenue
  FROM 'orders.csv'
  GROUP BY city
  ORDER BY revenue DESC
  LIMIT 10
""").fetchdf()  # returns a pandas DataFrame, so pandas must be installed
print(result)

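# Query Parquet on S3 (needs the httpfs extension and S3 credentials configured)
# duckdb.sql() returns a lazy relation; .show() runs the query and prints a preview
duckdb.sql("SELECT * FROM read_parquet('s3://bucket/data/*.parquet')").show()

# Query JSON
duckdb.sql("SELECT * FROM read_json_auto('events.json')").show()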

DuckDB vs Pandas

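Aspect	DuckDB	Pandas
5GB CSV aggregation	About 3 sec (varies with hardware)	Out-of-memory crash when the file exceeds available RAM
Memory usage	Streams the file in chunks	Loads the full file
Syntax	SQL	Method chains
Joins	Multi-threaded hash join	Single-threaded in-memory merge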

📧 spinov001@gmail.com — Data engineering consulting

Follow for more data tool reviews.
