DuckDB is an in-process analytical database — run complex SQL queries on files (CSV, Parquet, JSON) without any server.
## What You Get for Free
- No server — embedded database, runs in your process
- File queries — SQL directly on CSV, Parquet, JSON, Excel files
- Blazing fast — columnar engine optimized for analytics
- Python/R/Node — native bindings for data science languages
- Standard SQL — window functions, CTEs, subqueries, joins
- Arrow integration — zero-copy data exchange with Pandas/Polars
- Extensions — PostgreSQL scanner, HTTP, spatial, and more
- WASM — runs in the browser via WebAssembly
## Quick Start
```shell
brew install duckdb
duckdb
```
```sql
-- Query CSV directly (no import!)
SELECT country, COUNT(*) as users, AVG(age) as avg_age
FROM 'users.csv'
GROUP BY country
ORDER BY users DESC;

-- Query Parquet from S3
SELECT * FROM 's3://my-bucket/events/*.parquet'
WHERE event_date > '2026-01-01';

-- Query JSON
SELECT json_extract(data, '$.name') as name
FROM read_json_auto('data.json');
```
## Why Developers Switch from Pandas
Pandas loads everything into memory and makes you learn its own API. DuckDB gives you:
- SQL — query with SQL, not method chains
- Larger-than-RAM — streaming execution for big files
- Faster — 10-100x faster than Pandas for aggregations
- No import step — query files directly
A data scientist's Pandas notebook took 15 minutes to load and process a 5GB CSV. With DuckDB, the same query ran in 30 seconds, without ever loading the entire file into memory.
## Need Custom Data Solutions?
I build production-grade scrapers and data pipelines for startups, agencies, and research teams.
Browse 88+ ready-made scrapers on Apify — Reddit, HN, LinkedIn, Google, Amazon, and more.
Custom project? Email me: spinov001@gmail.com — fast turnaround, fair pricing.