DEV Community

Alex Spinov
Alex Spinov

Posted on

DuckDB Has a Free API: The SQLite for Analytics

Why DuckDB

DuckDB is an in-process analytical database — think SQLite but for OLAP. Query Parquet files, CSV, JSON, even pandas DataFrames with SQL. No server, no setup, blazing fast.

Install

pip install duckdb
# or
brew install duckdb
Enter fullscreen mode Exit fullscreen mode

Query Files Directly

-- Query Parquet files
SELECT * FROM read_parquet('data/*.parquet') WHERE year = 2024;

-- Query CSV
SELECT category, SUM(revenue) as total FROM read_csv('sales.csv') GROUP BY category;

-- Query JSON
SELECT * FROM read_json('events.jsonl') WHERE event_type = 'purchase';

-- Query remote files
SELECT * FROM read_parquet('s3://my-bucket/data/*.parquet');
Enter fullscreen mode Exit fullscreen mode

Python Integration

import duckdb
import pandas as pd

# Query a DataFrame
df = pd.read_csv('sales.csv')
result = duckdb.sql("SELECT category, SUM(amount) FROM df GROUP BY category").df()

# Query Parquet files
result = duckdb.sql("""
    SELECT date_trunc('month', created_at) as month,
           COUNT(*) as orders,
           SUM(total) as revenue
    FROM read_parquet('orders/*.parquet')
    GROUP BY 1
    ORDER BY 1
""").df()

# Persistent database
con = duckdb.connect('my_analytics.db')
con.sql("CREATE TABLE sales AS SELECT * FROM read_csv('sales.csv')")
con.sql("SELECT * FROM sales WHERE revenue > 1000").show()
Enter fullscreen mode Exit fullscreen mode

CLI

duckdb my_analytics.db
D SELECT COUNT(*) FROM read_parquet('logs/*.parquet');
D COPY (SELECT * FROM sales WHERE year=2024) TO 'export.csv';
Enter fullscreen mode Exit fullscreen mode

Key Features

  • In-process — no server, embedded like SQLite
  • Columnar — optimized for analytics (OLAP)
  • Direct file queries — Parquet, CSV, JSON, Arrow
  • Python/R/Node — native bindings
  • S3/GCS/Azure — query cloud storage directly
  • Window functions — full analytical SQL support
  • MIT license — truly free

DuckDB vs SQLite

Feature DuckDB SQLite
Workload Analytics (OLAP) Transactions (OLTP)
Storage Columnar Row-based
Aggregations Very fast Slow on large data
File queries Parquet, CSV, JSON No
Concurrent writes Limited WAL mode

Resources


Need to extract analytics data, process large datasets, or query cloud storage? Check out my Apify tools or email spinov001@gmail.com for custom solutions.

Top comments (0)