DEV Community

Alex Spinov

DuckDB Just Hit 25K Stars — And It Deserves Every One

DuckDB crossed 25,000 GitHub stars this month. For a database, that's insane. The PostgreSQL mirror on GitHub has 17K. SQLite isn't even developed on GitHub; its source lives in a Fossil repository.

Why is an in-process analytical database this popular?

Because it solves a problem every data person has: analyzing data locally without setting up a server.


What DuckDB Does

DuckDB is an in-process OLAP database. Think SQLite, but for analytics instead of transactions.

import duckdb

# Query a CSV file directly — no import needed
result = duckdb.sql("""
    SELECT category, SUM(revenue) as total
    FROM 'sales_data.csv'
    GROUP BY category
    ORDER BY total DESC
    LIMIT 10
""")
print(result)

That's it. No server. No Docker. No connection strings. Just pip install duckdb and query files.

Why Developers Love It

1. Query Any File Format

-- CSV
SELECT * FROM 'data.csv';

-- Parquet
SELECT * FROM 'data.parquet';

-- JSON
SELECT * FROM 'data.json';

-- Even remote files
SELECT * FROM 'https://example.com/data.csv';

2. Faster Than Pandas

On a 10M-row dataset, running the same groupby aggregation:

  • Pandas groupby: 2.8 seconds
  • DuckDB SQL: 0.15 seconds

DuckDB uses vectorized execution and automatic parallelism. It's not even close.

3. Works Everywhere

  • Python (pip install duckdb)
  • Node.js, R, Java, Rust, Go
  • CLI tool
  • WebAssembly (runs in the browser!)
  • Jupyter notebooks

4. Zero Dependencies

No server process. No config files. No Docker. The entire database is a single file (or in-memory).

Real-World Use Cases

  1. Data exploration — Query CSV and Parquet files without loading them into Pandas
  2. ETL development — Test transformations locally before deploying
  3. Jupyter notebooks — SQL in notebooks without external databases
  4. CI/CD testing — Run SQL tests without a database server
  5. Edge computing — Embedded analytics in apps

The DuckDB + dbt + Evidence Stack

The modern lightweight data stack:

Raw files (CSV/Parquet) 
  -> DuckDB (query + transform)
  -> dbt (model + test)
  -> Evidence (visualize)

Total infrastructure: zero servers. Total cost: $0.


Getting Started

pip install duckdb

Then load a CSV into a table and query it:

import duckdb

# Create a table from a CSV
duckdb.sql("CREATE TABLE sales AS SELECT * FROM 'sales.csv'")

# Query it
duckdb.sql("SELECT * FROM sales WHERE revenue > 1000 LIMIT 5").show()


Are you using DuckDB? What did it replace for you: Pandas, SQLite, or a full database server? Let me know in the comments below.

I write about data engineering tools. Follow for weekly deep dives.

