Why DuckDB
DuckDB is an in-process analytical database — think SQLite but for OLAP. Query Parquet files, CSV, JSON, even pandas DataFrames with SQL. No server, no setup, blazing fast.
Install
pip install duckdb
# or
brew install duckdb
Query Files Directly
-- Query Parquet files
SELECT * FROM read_parquet('data/*.parquet') WHERE year = 2024;
-- Query CSV
SELECT category, SUM(revenue) as total FROM read_csv('sales.csv') GROUP BY category;
-- Query JSON
SELECT * FROM read_json('events.jsonl') WHERE event_type = 'purchase';
-- Query remote files
SELECT * FROM read_parquet('s3://my-bucket/data/*.parquet');
Python Integration
import duckdb
import pandas as pd
# Query a DataFrame
df = pd.read_csv('sales.csv')
result = duckdb.sql("SELECT category, SUM(amount) FROM df GROUP BY category").df()
# Query Parquet files
result = duckdb.sql("""
SELECT date_trunc('month', created_at) as month,
COUNT(*) as orders,
SUM(total) as revenue
FROM read_parquet('orders/*.parquet')
GROUP BY 1
ORDER BY 1
""").df()
# Persistent database
con = duckdb.connect('my_analytics.db')
con.sql("CREATE TABLE sales AS SELECT * FROM read_csv('sales.csv')")
con.sql("SELECT * FROM sales WHERE revenue > 1000").show()
CLI
duckdb my_analytics.db
D SELECT COUNT(*) FROM read_parquet('logs/*.parquet');
D COPY (SELECT * FROM sales WHERE year=2024) TO 'export.csv';
Key Features
- In-process — no server, embedded like SQLite
- Columnar — optimized for analytics (OLAP)
- Direct file queries — Parquet, CSV, JSON, Arrow
- Python/R/Node — native bindings
- S3/GCS/Azure — query cloud storage directly
- Window functions — full analytical SQL support
- MIT license — truly free
DuckDB vs SQLite
| Feature | DuckDB | SQLite |
|---|---|---|
| Workload | Analytics (OLAP) | Transactions (OLTP) |
| Storage | Columnar | Row-based |
| Aggregations | Very fast | Slow on large data |
| File queries | Parquet, CSV, JSON | No |
| Concurrent writes | Limited | WAL mode |
Resources
Need to extract analytics data, process large datasets, or query cloud storage? Check out my Apify tools or email spinov001@gmail.com for custom solutions.
Top comments (0)