Alex Spinov

Posted on Mar 25

DuckDB Just Hit 25K Stars — And It Deserves Every One

#database #dataengineering #python #opensource

DuckDB crossed 25,000 GitHub stars this month. For a database, that's insane. PostgreSQL's GitHub mirror has 17K. SQLite is developed in its own Fossil repository and only mirrored to GitHub.

Why is an in-process analytical database this popular?

Because it solves a problem every data person has: analyzing data locally without setting up a server.

What DuckDB Does

DuckDB is an in-process OLAP database. Think SQLite, but for analytics instead of transactions.

import duckdb

# Query a CSV file directly, with no separate loading step
result = duckdb.sql("""
    SELECT category, SUM(revenue) as total
    FROM 'sales_data.csv'
    GROUP BY category
    ORDER BY total DESC
    LIMIT 10
""")
print(result)

That's it. No server. No Docker. No connection strings. Just pip install duckdb and query files.

Why Developers Love It

1. Query Any File Format

-- CSV
SELECT * FROM 'data.csv';

-- Parquet
SELECT * FROM 'data.parquet';

-- JSON
SELECT * FROM 'data.json';

-- Even remote files (read over HTTP(S) via the httpfs extension)
SELECT * FROM 'https://example.com/data.csv';

2. Faster Than Pandas

On a 10M row dataset:

Pandas groupby: 2.8 seconds
DuckDB SQL: 0.15 seconds

DuckDB uses vectorized execution and automatic parallelism. It's not even close.

3. Works Everywhere

Python (pip install duckdb)
Node.js, R, Java, Rust, Go
CLI tool
WebAssembly (runs in the browser!)
Jupyter notebooks

4. Zero Dependencies

No server process. No config files. No Docker. The entire database is a single file (or in-memory).

Real-World Use Cases

Data exploration — Query CSVs/Parquets without loading into Pandas
ETL development — Test transformations locally before deploying
Jupyter notebooks — SQL in notebooks without external databases
CI/CD testing — Run SQL tests without a database server
Edge computing — Embedded analytics in apps

The DuckDB + dbt + Evidence Stack

The modern lightweight data stack:

Raw files (CSV/Parquet) 
  -> DuckDB (query + transform)
  -> dbt (model + test)
  -> Evidence (visualize)

Total infrastructure: zero servers. Total cost: $0.

Getting Started

pip install duckdb

import duckdb

# Create a table from a CSV
duckdb.sql("CREATE TABLE sales AS SELECT * FROM 'sales.csv'")

# Query it
duckdb.sql("SELECT * FROM sales WHERE revenue > 1000 LIMIT 5").show()

Learn More

Awesome Data Engineering 2026 — 150+ data tools
Python Web Scraper Template — Scrape data, analyze with DuckDB

Are you using DuckDB? What did it replace: Pandas, SQLite, or a real database? Let me know in the comments below.

I write about data engineering tools. Follow for weekly deep dives.
