DuckDB is an in-process OLAP database. Think SQLite for analytics: there is no server to run, you just import the library and query CSV, Parquet, and JSON files directly.
What Is DuckDB?
DuckDB runs inside your application process and uses columnar storage with vectorized execution.
Features:
- Zero-dependency, in-process
- Columnar vectorized engine
- Query Parquet, CSV, JSON directly
- PostgreSQL-compatible SQL dialect (it does not speak the PostgreSQL wire protocol)
- Free and open source (MIT)
Python Example
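import duckdb
# File paths are SQL string literals, so they must be quoted.
result = duckdb.sql("SELECT * FROM read_csv('data.csv') WHERE amount > 100")
print(result.fetchdf())  # fetchdf() returns a pandas DataFrame (requires pandas)
# Reading from S3 relies on the httpfs extension and configured credentials.
duckdb.sql("SELECT year, SUM(sales) FROM read_parquet('s3://bucket/*.parquet') GROUP BY year").show()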
Use Cases
- Data analysis — SQL on local files
- Jupyter notebooks — fast analytics
- Embedded analytics — inside your app
- ETL — transform with SQL
- Local BI — query without servers