DEV Community

Alex Spinov

DuckDB Is a Free Analytical Database: Run SQL on CSV, Parquet, and JSON Without a Server

DuckDB is an in-process analytical database. It runs complex SQL queries directly on files (CSV, Parquet, JSON) inside your application, with no server to install or manage.

What You Get for Free

  • No server: an embedded database that runs inside your process
  • File queries: SQL directly on CSV, Parquet, and JSON files (Excel via an extension)
  • Fast analytics: a columnar, vectorized engine built for scans and aggregations
  • Python/R/Node: native bindings for data science languages
  • Standard SQL: window functions, CTEs, subqueries, joins
  • Arrow integration: efficient, often zero-copy, data exchange with Arrow-based tools such as Pandas and Polars
  • Extensions: PostgreSQL scanner, HTTP/S3 access (httpfs), spatial, and more
  • WASM: runs in the browser via WebAssembly
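
As a rough illustration of the "Standard SQL" bullet, the sketch below combines a CTE with a window function. The `signups` data is made up and built inline with `VALUES`, so the query should run in a fresh DuckDB session with no files at all.

```sql
-- Hypothetical sample data, defined inline so no file is needed.
WITH signups AS (
    SELECT *
    FROM (VALUES
        ('DE', DATE '2026-01-01', 10),
        ('DE', DATE '2026-01-02', 15),
        ('US', DATE '2026-01-01', 20),
        ('US', DATE '2026-01-02', 5)
    ) AS t(country, signup_day, users)
)
SELECT
    country,
    signup_day,
    users,
    -- Running total of users per country, ordered by day.
    SUM(users) OVER (
        PARTITION BY country
        ORDER BY signup_day
    ) AS running_users
FROM signups
ORDER BY country, signup_day;
```

The running totals come out as 10 then 25 for DE, and 20 then 25 for US.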

Quick Start

Install DuckDB (shown here with Homebrew) and start the shell:

brew install duckdb
duckdb

Then run SQL at the DuckDB prompt:

-- Query CSV directly (no import!)
SELECT country, COUNT(*) AS users, AVG(age) AS avg_age
FROM 'users.csv'
GROUP BY country
ORDER BY users DESC;

-- Query Parquet from S3 (uses the httpfs extension; S3 credentials must be configured)
SELECT * FROM 's3://my-bucket/events/*.parquet'
WHERE event_date > '2026-01-01';

-- Query JSON (assumes each record has a "data" field)
SELECT json_extract(data, '$.name') AS name
FROM read_json_auto('data.json');
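
To try the file-query pattern without having any data on hand, here is a small sketch that writes its own sample CSV, queries it, and converts it to Parquet. The file names `users_sample.csv` and `users_sample.parquet` and the sample rows are invented for illustration; both files are written to the current working directory.

```sql
-- Write a tiny sample CSV so the queries below have something to read.
-- File name and rows are hypothetical.
COPY (
    SELECT *
    FROM (VALUES
        ('DE', 34), ('DE', 28),
        ('US', 41), ('US', 25), ('US', 30)
    ) AS t(country, age)
) TO 'users_sample.csv' (HEADER, DELIMITER ',');

-- Query the CSV in place; column names and types are detected automatically.
SELECT country, COUNT(*) AS users, AVG(age) AS avg_age
FROM read_csv('users_sample.csv')
GROUP BY country
ORDER BY users DESC;

-- Convert the CSV to Parquet, then query the Parquet file the same way.
COPY (SELECT * FROM 'users_sample.csv')
TO 'users_sample.parquet' (FORMAT parquet);

SELECT * FROM 'users_sample.parquet';
```

The aggregation should return US first (3 users, average age 32), then DE (2 users, average age 31).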

Why Developers Switch from Pandas

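Pandas loads the whole dataset into memory and uses its own method-chaining API. DuckDB takes a different approach:

  • SQL: query with SQL instead of method chains
  • Larger-than-RAM: streaming execution, with spilling to disk, for data that does not fit in memory
  • Faster: often much faster than Pandas for aggregations (the author cites 10-100x; actual gains depend on the data and the query)
  • No import step: query files directly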
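
The "Larger-than-RAM" point depends on DuckDB being allowed to spill intermediate results to disk. Below is a minimal sketch of the settings involved. The limit, spill path, and thread count are arbitrary example values, and the data is generated with `range()` rather than read from a real file.

```sql
-- Cap DuckDB's memory use; '2GB' is an arbitrary example value.
SET memory_limit = '2GB';

-- Directory for spill files when a query exceeds the limit.
-- The path is a placeholder; pick a disk with free space.
SET temp_directory = '/tmp/duckdb_spill';

-- Optionally limit the number of worker threads.
SET threads = 4;

-- Confirm the settings that are now in effect.
SELECT
    current_setting('memory_limit') AS memory_limit,
    current_setting('threads')      AS threads;

-- Aggregate 10 million generated rows into 100 buckets.
SELECT i % 100 AS bucket, COUNT(*) AS n
FROM range(10000000) AS t(i)
GROUP BY bucket
ORDER BY bucket;
```

This particular query is far too small to spill; it only shows where the settings go. With a real multi-gigabyte file, the same `SET` statements would precede a query over `read_csv(...)` or a Parquet path.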

In one case, a data scientist's Pandas notebook took 15 minutes to load and process a 5GB CSV. After switching to DuckDB, the same query ran in 30 seconds, without loading the entire file into memory.
