I just finished curating 150+ data engineering tools and here's the uncomfortable truth:
You don't need 150 tools. You need 7.
Here's the stack I'd pick if I were starting a data team from scratch in 2026.
The 7-Tool Data Stack
1. Ingestion: dlt (data load tool)
Forget Airbyte's complexity. Forget Fivetran's pricing.
dlt is a Python library that loads data from any source to any destination in ~10 lines of code:
```python
import dlt

# A resource is just a generator; dlt infers the schema from the yielded rows.
# (Illustrative inline resource -- in practice you'd yield rows from an API.)
@dlt.resource(table_name="issues")
def github_issues():
    yield {"id": 1, "title": "example issue"}

pipeline = dlt.pipeline(
    pipeline_name="github_issues",
    destination="duckdb",
    dataset_name="github_data",
)
pipeline.run(github_issues)
```
It handles schema evolution, incremental loading, and data contracts. No infra to manage.
2. Storage: DuckDB (local) + ClickHouse (production)
DuckDB for development. In-process OLAP that runs anywhere — your laptop, CI/CD, Lambda. Absurdly fast on files up to ~100GB.
ClickHouse for production. Petabyte-scale analytics with sub-second queries.
The key insight: use DuckDB for everything until you can't. Most teams switch too early.
3. Transformation: dbt
dbt won the transformation layer. Write SQL models, test them, document them, version them.
Alternative: If dbt's Jinja templating annoys you, check out SQLMesh — dbt with better DX.
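For readers who haven't used dbt: a model is just a SELECT statement in a file. A minimal sketch, with hypothetical model and source names:

```sql
-- models/daily_revenue.sql (hypothetical model; assumes a raw.orders source)
select
    order_date,
    sum(amount) as revenue
from {{ source('raw', 'orders') }}
group by order_date
```

You'd pair this with a `schema.yml` declaring tests (e.g. `not_null`, `unique`) on the columns, and dbt handles materialization, dependencies, and docs.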
4. Orchestration: Dagster
Airflow is showing its age. Dagster is what Airflow should have been:
- Asset-based (think about data, not tasks)
- Type-safe (catch errors before runtime)
- Great local dev (run everything on your laptop)
- Built-in observability (data lineage without plugins)
If you're already on Airflow and it works — don't migrate. But for new projects, Dagster.
5. Quality: Great Expectations + Soda
Great Expectations for data validation. Soda for monitoring.
Run quality checks after every pipeline run. No exceptions.
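This is not Great Expectations' API, but a hand-rolled plain-Python sketch of the same idea: declare expectations as data, run them after every load, and fail loudly:

```python
rows = [
    {"id": 1, "amount": 9.99},
    {"id": 2, "amount": 12.50},
]

# Each check is a (name, predicate) pair that must hold for every row.
checks = [
    ("id is never null", lambda r: r["id"] is not None),
    ("amount is non-negative", lambda r: r["amount"] >= 0),
]

def run_checks(rows, checks):
    failures = [name for name, check in checks
                if not all(check(r) for r in rows)]
    if failures:
        raise ValueError(f"quality checks failed: {failures}")
    return True
```

Tools like Great Expectations add the declarative suite format, profiling, and reporting on top; Soda adds scheduled monitoring and alerting.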
6. Streaming: Redpanda + Bytewax
Redpanda is Kafka without the pain. Kafka-compatible, no JVM, no ZooKeeper, 10x easier to operate.
Bytewax for Python stream processing — Flink for people who don't want to learn Java.
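To show the kind of logic a stream processor expresses, here's a plain-Python sketch of a tumbling-window count (not Bytewax's API; a hypothetical event format of `(timestamp_seconds, key)` tuples):

```python
from collections import Counter

def tumbling_window_counts(events, window_seconds=60):
    """Count events per key per fixed-size window.

    Bytewax dataflows express this declaratively, plus the hard parts:
    out-of-order events, state recovery, and parallelism.
    """
    counts = Counter()
    for ts, key in events:
        window_start = ts - (ts % window_seconds)
        counts[(window_start, key)] += 1
    return counts
```

The point of a framework is everything this sketch ignores: late data, checkpointing, and scaling across workers.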
7. Visualization: Evidence
Evidence is BI-as-code. Write SQL in Markdown files, get beautiful dashboards. Version-controlled. Reviewable. No Tableau licenses.
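A sketch of what an Evidence page looks like (component and column names here are hypothetical): a Markdown file with a named SQL block, whose results feed a chart component.

````markdown
# Weekly Signups

```sql signups
select date_trunc('week', created_at) as week, count(*) as n
from users
group by 1
order by 1
```

<LineChart data={signups} x=week y=n />
````

Because the dashboard is a text file, it gets code review, git history, and CI like everything else in the stack.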
The Full Picture
```
Sources -> dlt -> DuckDB/ClickHouse -> dbt -> Evidence
            |                           |
  Dagster (orchestration)      Great Expectations (quality)
            |
  Redpanda + Bytewax (real-time)
```
Total cost for a startup: $0 (all open-source) to ~$500/mo (managed ClickHouse + Dagster Cloud).
The Full 150+ Tool List
If you want ALL the options (not just my picks):
Awesome Data Engineering 2026 — 150+ tools across 18 categories
Star it if it's useful. I update it weekly.
What's YOUR stack? Drop your tools in the comments — I'm curious what data teams are actually using in 2026.
I write about data engineering, web scraping, and developer tools. Follow for weekly deep dives.
Building a data pipeline? I can help: spinov001-art.github.io | Spinov001@gmail.com
More from me: 10 Dev Tools I Use Daily | 77 Scrapers on a Schedule | 150+ Free APIs
Need data from the web without writing scrapers? Check my Apify actors — ready-made scrapers for HN, Reddit, LinkedIn, and 75+ more sites. Or email: spinov001@gmail.com