DuckDB crossed 25,000 GitHub stars this month. For a database, that's insane. PostgreSQL's GitHub mirror has about 17K. SQLite isn't even developed on GitHub: its source lives in Fossil, with only a mirror on GitHub.
Why is an in-process analytical database this popular?
Because it solves a problem every data person has: analyzing data locally without setting up a server.
What DuckDB Does
DuckDB is an in-process OLAP database. Think SQLite, but for analytics instead of transactions.
import duckdb
# Query a CSV file directly — no import needed
result = duckdb.sql("""
SELECT category, SUM(revenue) as total
FROM 'sales_data.csv'
GROUP BY category
ORDER BY total DESC
LIMIT 10
""")
print(result)
That's it. No server. No Docker. No connection strings. Just pip install duckdb and query files.
Why Developers Love It
1. Query Any File Format
-- CSV
SELECT * FROM 'data.csv';
-- Parquet
SELECT * FROM 'data.parquet';
-- JSON
SELECT * FROM 'data.json';
-- Even remote files over HTTP(S), via the httpfs extension
SELECT * FROM 'https://example.com/data.csv';
2. Faster Than Pandas
On a 10M row dataset:
- Pandas groupby: 2.8 seconds
- DuckDB SQL: 0.15 seconds
DuckDB uses vectorized execution and automatic parallelism across CPU cores, while a Pandas groupby runs mostly on a single core. It's not even close.
3. Works Everywhere
- Python (pip install duckdb)
- Node.js, R, Java, Rust, Go
- CLI tool
- WebAssembly (runs in the browser!)
- Jupyter notebooks
4. Zero Dependencies
No server process. No config files. No Docker. The entire database is a single file (or in-memory).
Real-World Use Cases
- Data exploration — Query CSVs/Parquets without loading into Pandas
- ETL development — Test transformations locally before deploying
- Jupyter notebooks — SQL in notebooks without external databases
- CI/CD testing — Run SQL tests without a database server
- Edge computing — Embedded analytics in apps
The DuckDB + dbt + Evidence Stack
The modern lightweight data stack:
Raw files (CSV/Parquet)
-> DuckDB (query + transform)
-> dbt (model + test)
-> Evidence (visualize)
Total infrastructure: zero servers. Total cost: $0.
Getting Started
pip install duckdb
import duckdb
# Create a table from a CSV
duckdb.sql("CREATE TABLE sales AS SELECT * FROM 'sales.csv'")
# Query it
duckdb.sql("SELECT * FROM sales WHERE revenue > 1000 LIMIT 5").show()
Learn More
- Awesome Data Engineering 2026 — 150+ data tools
- Python Web Scraper Template — Scrape data, analyze with DuckDB
Are you using DuckDB? What did it replace: Pandas, SQLite, or a real database? Comments below.