DuckDB is an in-process analytical database. Think of it as SQLite for OLAP: you can query CSV, Parquet, and JSON files directly with SQL.
What You Get for Free
- In-process: no server to install or manage
- Columnar engine: fast analytical queries
- Query files directly: CSV, Parquet, JSON, Excel
- Standard SQL: with extensions
- Python/R/Node.js: native bindings
- Zero external dependencies: the database lives in a single file
- Parallel: multi-threaded query execution
Quick Start (Python)
```python
import duckdb
import pandas as pd

# Query a CSV file directly
result = duckdb.sql("SELECT * FROM 'data.csv' WHERE amount > 100")

# Query every Parquet file matching a glob pattern
result = duckdb.sql("SELECT count(*), avg(price) FROM 'sales/*.parquet'")

# Query a pandas DataFrame by its Python variable name
df = pd.read_csv('data.csv')
result = duckdb.sql("SELECT category, sum(amount) FROM df GROUP BY 1")
```
DuckDB vs SQLite
| Feature | DuckDB | SQLite |
|---|---|---|
| Workload | Analytics (OLAP) | Transactions (OLTP) |
| Storage | Columnar | Row-based |
| CSV/Parquet | Direct query | Import needed |
| Parallel | Yes | Limited |
Need analytics? Check my work on GitHub or email spinov001@gmail.com for consulting.