apache druid vs pinot vs clickhouse is the comparison every data platform team eventually argues through — usually at 11pm on a Tuesday after a dashboard latency SLO breached and a streaming pipeline began coughing. The three projects converged on the same workload (sub-second analytics over append-heavy, billion-row, time-stamped event streams) from three different starting points: Druid grew out of Metamarkets ad-tech in 2011, Pinot out of LinkedIn's user-facing dashboards in 2014, and ClickHouse out of Yandex's metrics platform in 2016. They look alike on a benchmark page and behave very differently in production.
This guide is the comparison you wished existed the first time a "just pick the fastest OLAP engine" Slack message landed in your DMs. It walks the real-time OLAP problem, the architectural philosophies of each engine, ingestion personalities, query-pattern fit, and the cheat sheet of recipes that ship in every senior data-engineering interview answer. Each section pairs a teaching block with a Solution-Tail interview answer — code, a step-by-step trace, an output table, then a concept-by-concept breakdown of why it works.
When you want hands-on reps immediately after reading, drill the real-time analytics practice library →, rehearse the streaming problems →, and stack the aggregation muscles with aggregation drills →.
On this page
- The real-time OLAP problem
- The real-time OLAP landscape — where each engine sits
- Architecture trio — under the hood
- Ingestion models — streaming vs batch in each engine
- Query patterns and engine fit
- Cheat sheet — Druid vs Pinot vs ClickHouse recipes
- Frequently asked questions
- Practice on PipeCode
1. The real-time OLAP problem
Sub-second analytics over billion-row streams is a different problem than warehouse OLAP — and that gap is what spawned three different open-source engines
The one-sentence invariant: real-time OLAP means returning aggregated answers in well under a second over data that is both very large and very fresh — usually time-stamped, append-heavy, and arriving from Kafka or a similar stream. Once you accept that constraint, neither a row-oriented OLTP database (Postgres / MySQL) nor a batch warehouse (Snowflake / BigQuery) can serve the workload — and the gap between them is what Druid, Pinot, and ClickHouse fill.
The three forces that define the workload.
- Streaming ingest. Events arrive continuously, often at hundreds of thousands of rows per second per topic. The engine must accept those rows and immediately make them queryable, often within seconds. Batch warehouses load on a schedule (minutes to hours); OLTP databases choke on the write rate.
- Time-bucketed reads. Almost every query carries a time predicate: "last 5 minutes," "yesterday by hour," "this week." The engine that wins is the one that partitions and indexes on time first, then on dimensions. Druid bakes this into the storage model; Pinot and ClickHouse expose it via partition keys.
- Multi-dimensional filters. Dashboards do not ask "give me everything" — they ask "give me revenue by region by product by minute for the last hour." The engine must filter on multiple high-cardinality columns simultaneously, then aggregate. Bitmap indexes, inverted indexes, and star-tree pre-aggregation are how each engine solves this.
Why row-oriented OLTP cannot serve this.
-
Storage layout. Row stores write every column together. A
SELECT SUM(amount)scans every row's every column from disk. A 10-column table that needs only one column wastes 90% of the I/O on irrelevant data. - Indexes are point-lookup-shaped. B-tree indexes are optimised for "find a single row by key," not "scan a billion rows by a filter."
- Write amplification. OLTP databases use redo logs, checkpoints, and MVCC tombstones — features that protect transactional correctness at the cost of throughput. For pure-append event streams, those features are pure overhead.
Why batch warehouses cannot hit sub-second.
- Compute is spun up per query. Snowflake suspends warehouses to save money; BigQuery allocates slots from a shared pool. Cold-start cost is in the seconds. Real-time OLAP needs warm, always-on workers.
- Storage is decoupled but cold. Warehouse data lives in object storage with high read latency on the first byte. A "scan 100 MB" query takes seconds — fine for ad-hoc analytics, fatal for a dashboard refresh.
- Ingest is batch-first. A streaming insert lands in a write buffer and waits for the next micro-batch. Freshness is measured in minutes, not seconds.
Why three different engines, not one.
- Druid (2011, Metamarkets / now Apache). Born to power ad-tech dashboards. Optimised for time-series workloads with heavy pre-aggregation ("rollup"). Segments are immutable time-bucketed files written to deep storage (S3 / HDFS).
- Pinot (2014, LinkedIn / now Apache). Born to power user-facing analytics ("People you may know" dashboards). Optimised for ultra-low-latency lookups via per-segment indexes (star-tree, bitmap, range). Helix-coordinated.
- ClickHouse (2016, Yandex / now ClickHouse Inc). Born to power Yandex.Metrica (Russian web-analytics platform like Google Analytics). Optimised as a general-purpose columnar SQL engine with a flexible MergeTree storage family. Coordinator-less peer servers.
Who actually runs these in production.
- Druid powers analytics at Netflix, Confluent, Lyft, Airbnb, Walmart, Twitch, and Cisco — wherever the workload is time-series with heavy filtering and pre-computed rollups.
- Pinot powers user-facing analytics at LinkedIn, Uber, Stripe, Slack, and Razorpay — wherever the latency budget is in the tens of milliseconds.
- ClickHouse powers analytics at Cloudflare, Uber, eBay, Bloomberg, and Sentry — wherever the team wants real-time OLAP without a separate ingestion service and with the freedom to write arbitrary SQL.
What interviewers listen for.
- Do you say "real-time OLAP fills the gap between OLTP and batch warehouses" when asked why these engines exist? — senior signal.
- Do you mention pre-aggregation as the unifying trick (rollup, star-tree, AggregatingMergeTree)? — required answer.
- Do you recognise that Druid decouples storage, Pinot decouples coordination, and ClickHouse fuses both? — senior signal.
- Do you say "I'd pick the engine by query shape, not by benchmark" when asked which is fastest? — senior signal.
Worked example — why a row store cannot serve a "by-minute" dashboard
Detailed explanation. A product team wants a live dashboard of revenue by minute over the last 24 hours, refreshed every 5 seconds, against a 2-billion-row orders table. The obvious answer is "just query Postgres" — and the obvious answer fails immediately because of how row stores read disk.
Question. Estimate the I/O cost of a row-store scan for the last 24 hours of orders versus a columnar engine reading only the amount and minute columns. Assume 100 bytes per row, 2 billion rows, and 5% in the time window.
Input.
| Engine | Storage layout | Bytes per row read | Compression |
|---|---|---|---|
| Postgres (row store) | row-oriented heap | 100 bytes (whole row) | none / TOAST |
| Columnar OLAP | column-oriented | ~16 bytes (2 cols × 8) | 5x typical |
Code.
-- The same query, different engines, very different I/O
SELECT
date_trunc('minute', order_ts) AS minute,
SUM(amount) AS revenue
FROM orders
WHERE order_ts >= NOW() - INTERVAL '24 hours'
GROUP BY 1
ORDER BY 1;
Step-by-step explanation.
- Rows in the 24h window:
2_000_000_000 * 0.05 = 100_000_000rows. - Postgres reads the entire row (100 bytes each) because the heap is row-oriented. I/O =
100_000_000 * 100 = 10_000_000_000bytes = ~10 GB. - A columnar OLAP engine reads only
order_tsandamount— about 16 bytes/row uncompressed. I/O =100_000_000 * 16 = 1.6 GB. - After columnar compression (typical 5x for numeric / timestamp data), the actual disk read drops to ~320 MB.
- The columnar engine is roughly 30x faster on I/O alone, before considering vectorised execution and pre-aggregation.
Output.
| Engine | Disk read | Approx latency on warm cache |
|---|---|---|
| Postgres row scan | 10 GB | tens of seconds |
| Columnar OLAP scan | 320 MB | hundreds of ms |
| Columnar + pre-agg (Druid rollup) | < 10 MB | tens of ms |
Rule of thumb. Whenever the workload is "scan a wide column subset over a time window," start columnar. Whenever the queries themselves are repetitive enough that you can pre-aggregate at ingest, you can buy another 10–100x — that is the bet Druid and Pinot make on every real-time pipeline.
Worked example — why a batch warehouse cannot hit sub-second freshness
Detailed explanation. A growth team wants impressions-per-second analytics with under 5-second end-to-end latency from event to dashboard. The "just use BigQuery + Looker" answer breaks on two fronts: the streaming insert API has minutes of buffering, and the dashboard slot warm-up is seconds.
Question. Trace the path of a single event from Kafka to a BigQuery dashboard tile. Compare with a Druid path. Where does each engine spend its time budget?
Input.
| Stage | BigQuery path (typical) | Druid path (typical) |
|---|---|---|
| Kafka → engine | dataflow / firehose buffer 1–5 min | Kafka indexing service ~1–5 s |
| Engine → query-visible | streaming buffer 1–2 min | real-time segment, immediate |
| Query slot cold-start | 1–3 s | warm broker, ~0 s |
| Scan | 0.5–2 s (slot-bound) | 50–200 ms (vectorised) |
Code.
# Pseudocode — the time budget for "live analytics"
event_to_dashboard_ms = sum([
kafka_to_engine_ms, # ingestion
engine_to_visible_ms, # query-visible delay
query_slot_warmup_ms, # cold start
scan_ms, # actual query
])
Step-by-step explanation.
- The "scan" stage looks similar on both engines (BigQuery's vectorised scan is fast; Druid's bitmap-filtered scan is fast). The fight is elsewhere.
- BigQuery's
streaming insertbuffers for ~60 seconds; rows are only available to query after the buffer flushes. That alone breaks any "under 5 seconds" SLA. - BigQuery dashboard tiles pay slot-warmup cost on the first query of a new tile — measured in seconds for a cold slot pool.
- Druid's Kafka indexing service writes to a real-time segment that is immediately queryable. The end-to-end latency is dominated by the Kafka commit delay (1–3 seconds), not engine plumbing.
- The pre-aggregation in Druid's rollup step also shrinks the scan further — many "by-minute" queries hit only a tiny pre-aggregated row count instead of raw events.
Output.
| Path | Realistic end-to-end latency | Sub-second feasible? |
|---|---|---|
| Kafka → BigQuery → Looker | 60–120 s | no |
| Kafka → Druid → Superset | 1–5 s | yes |
| Kafka → Pinot → Druid-style dashboard | 1–3 s | yes |
| Kafka → ClickHouse → Grafana | 2–8 s | yes (with care) |
Rule of thumb. If your latency SLO is measured in seconds, batch warehouses simply do not fit. Move to a real-time OLAP engine — and choose between Druid, Pinot, and ClickHouse on the next axis (query shape, ops appetite, indexing needs).
Worked example — the three forces in a single query
Detailed explanation. "Top 10 advertisers by spend in the last 5 minutes, grouped by country" is the canonical real-time OLAP query. It packs streaming ingest, time-bucketed reads, and multi-dimensional filters into a single SELECT — and the engine choice determines whether it returns in 30 ms or 30 s.
Question. Walk through how Druid, Pinot, and ClickHouse each solve this query. Show what part of the pipeline carries the load.
Input — common assumption. A ad_events stream from Kafka with columns (event_ts, advertiser_id, country, spend), arriving at 200k events/sec.
Code.
-- The canonical real-time OLAP query
SELECT
advertiser_id,
country,
SUM(spend) AS spend_5m
FROM ad_events
WHERE event_ts >= NOW() - INTERVAL '5 minutes'
GROUP BY advertiser_id, country
ORDER BY spend_5m DESC
LIMIT 10;
Step-by-step explanation.
-
Druid path. Rollup is configured on
(minute, advertiser_id, country). The 5-minute window is already pre-aggregated; the query reads ~5 rows per (advertiser × country) pair instead of millions. Latency: tens of ms. -
Pinot path. A star-tree index is built on
(advertiser_id, country)withspendas the metric. The 5-minute slice is found via the time-partitioned segments; the star-tree returns the aggregated answer in a single lookup. Latency: tens of ms. -
ClickHouse path. An
AggregatingMergeTreematerialised view pre-aggregates(toStartOfMinute(event_ts), advertiser_id, country) -> sum(spend). The query merges 5 minute-buckets and sorts. Latency: tens of ms — provided the MV exists. - The fork. All three are fast when configured correctly. The differences show up in the operational story: Druid needs deep storage and the Kafka indexing service; Pinot needs Helix and a star-tree config; ClickHouse needs the MV definition and the Kafka engine table.
Output.
| Engine | Pre-agg structure | Query latency | Ops surface area |
|---|---|---|---|
| Druid | rollup on ingest | tens of ms | high — many services |
| Pinot | star-tree index | tens of ms | medium — Helix |
| ClickHouse | AggregatingMergeTree MV | tens of ms | low — peer servers |
Rule of thumb. Real-time OLAP queries are won at ingest time by pre-aggregating, not at query time by scanning faster. Pick the engine whose pre-aggregation model fits your workload — and accept the ops surface area as the price of that pre-aggregation.
Real-time OLAP system design interview question
A senior interviewer often opens with: "Design a real-time ads dashboard that shows top advertisers by spend over the last 5 minutes, refreshed every 2 seconds, against a 500k events/sec Kafka stream. Which OLAP engine would you choose and why?" It blends ingest, pre-aggregation, and the three-forces analysis into a single answer.
Solution Using a Druid pipeline with rollup pre-aggregation
[Kafka topic ad_events]
|
v
[Druid Kafka indexing service]
| (rollup on (minute, advertiser_id, country) with SUM(spend))
v
[Real-time segment] --(handoff)--> [Historical segment on S3]
| |
+------> [Broker] <--- [Coordinator] -+
|
v
[Superset dashboard, 2s refresh]
A worked-example SQL the dashboard tile would emit (Druid SQL):
SELECT
advertiser_id,
country,
SUM(spend_sum) AS spend_5m
FROM "ad_events_rollup"
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '5' MINUTE
GROUP BY advertiser_id, country
ORDER BY spend_5m DESC
LIMIT 10;
Step-by-step trace.
| Stage | Input rate | Output rate | Latency contribution |
|---|---|---|---|
| Kafka topic | 500k events/s | — | 0 (source) |
| Druid Kafka indexing | 500k events/s | ~5k pre-agg rows/s | 1–3 s ingest delay |
| Real-time segment | append-only | broker-visible | < 1 s |
| Broker scan (5 min) | ~1.5M pre-agg rows | 10 top rows | 20–80 ms |
| Superset render | — | dashboard tile | 50 ms |
The rollup at ingest is the load-bearing optimisation: 500k events/s collapses to ~5k pre-agg rows/s because most events share (minute, advertiser_id, country) tuples. The broker only scans pre-aggregated rows, not raw events.
Output:
| advertiser_id | country | spend_5m |
|---|---|---|
| adv_77 | US | 18,420.50 |
| adv_12 | DE | 14,003.10 |
| adv_88 | IN | 12,950.75 |
Why this works — concept by concept:
-
Rollup at ingest — Druid's
rollup: trueconfig groups input rows by the dimension tuple at ingest and stores only the aggregated metric. A 100x cardinality collapse is normal for ad-tech workloads. -
Time-partitioned segments — every segment carries a
__timerange; the broker prunes segments outside the WHERE window before scanning. The "last 5 minutes" query touches one or two segments. - Real-time + historical handoff — newly ingested data lives in a real-time segment served by the middle-manager. After a configurable period, it is handed off to a historical server backed by S3. Queries see both transparently.
- Broker fan-out — the broker scatters the query across all segment-owning servers and merges results. Pre-aggregation keeps the per-server scan tiny.
- Cost — ingest cost: O(events/s × indexing parallelism). Query cost: O(pre-agg rows in window × log fan-out). Storage cost: O(rollup ratio × event volume).
Data Engineering
Topic — real-time analytics
Real-time analytics problems
2. The real-time OLAP landscape — where each engine sits
Druid, Pinot, and ClickHouse converged on the same workload from three different starting points — and the gaps between them are what dictates the choice
The mental model in one line: Druid is the time-series specialist with mandatory pre-aggregation, Pinot is the low-latency dashboard specialist with index-heavy storage, and ClickHouse is the general-purpose columnar SQL engine with the most flexible schema model. Once you can place each engine on that spectrum, the "which one should we use" debate collapses into a sentence about query shape.
The spectrum from warehouse to search index.
- Far left — batch warehouses (Snowflake, BigQuery, Redshift). Wide, deep storage. Minutes of latency. Best for ad-hoc analytics and long history.
- Center-left — Druid. Time-bucketed, pre-aggregated, segment-storage. Best for time-series with heavy rollup.
- Center — Pinot. Index-heavy per-segment storage with star-tree pre-aggregation. Best for user-facing dashboards under 100 ms.
- Center-right — ClickHouse. Flexible MergeTree family with vectorised execution. Best for general analytics SQL with arbitrary filters and joins.
- Far right — search indexes (Elasticsearch, OpenSearch). Inverted-index document stores. Best for full-text and fuzzy lookup.
Druid's positioning.
-
Time-series first. Segments are partitioned by
__timeranges. The schema must include a timestamp column, and every query is implicitly time-filtered. - Pre-aggregation is mandatory in practice. "Raw" Druid (rollup off) works, but the value of Druid evaporates without rollup — you'd use ClickHouse instead.
- Decoupled storage. Segments are immutable files on S3 or HDFS ("deep storage"). Historical servers cache them locally; coordinators rebalance.
- Heavy service surface. Coordinator, overlord, broker, historical, middle-manager, router — six services in a healthy production deploy.
Pinot's positioning.
- User-facing first. Pinot was born to serve LinkedIn's member-facing dashboards under 100 ms. Latency budget is the dominant design constraint.
- Index-heavy. Bitmap, inverted, range, sorted, text, and the famous star-tree index. The star-tree is a pre-aggregated tree where every node is a partial aggregate — query latency is logarithmic.
- Helix-coordinated. Apache Helix (also from LinkedIn) handles state-machine-based cluster coordination; controllers, brokers, and servers are the three roles.
- Two server types. "Realtime" servers consume from Kafka and seal segments; "offline" servers serve completed segments. The split is the key insight.
ClickHouse's positioning.
-
General-purpose SQL. Real ANSI-ish SQL with JOINs, window functions, CTEs, dictionaries, and aggregate functions for every shape (
uniqCombined,quantileTDigest,topK). -
MergeTree family. A dozen storage engines:
MergeTree,ReplacingMergeTree,SummingMergeTree,AggregatingMergeTree,CollapsingMergeTree. Each tunes the merge step for a specific workload. - Coordinator-less. Peer servers, optionally coordinated by ZooKeeper or the newer ClickHouse Keeper. Sharding and replication are explicit.
- Lightest ops surface. A single binary; you can run a one-node ClickHouse on a laptop and query a billion rows.
How each one sits next to Kafka, S3, and BI tools.
- Kafka. All three have a first-class Kafka ingest path (Druid's Kafka indexing service, Pinot's realtime servers, ClickHouse's Kafka engine).
-
S3 / object storage. Druid uses it as deep storage by design. Pinot uses it for completed segments. ClickHouse uses it via the
S3table engine or as a backup target — not as primary storage. - BI tools. Druid + Pinot speak SQL via dedicated layers (Druid SQL, Pinot's SQL). ClickHouse speaks SQL natively. All three plug into Superset, Tableau, Looker, Grafana.
Comparison points — the dimensions that matter.
- Query latency. All three target sub-second. Pinot is fastest on indexed point queries; ClickHouse wins on arbitrary SQL; Druid wins on heavily rolled-up time-series.
- Data freshness. All three offer real-time ingest from Kafka with seconds-to-tens-of-seconds visibility.
- SQL maturity. ClickHouse > Pinot > Druid. ClickHouse has the broadest SQL surface; Druid is the most limited (no full JOIN until recently).
- Ops cost. ClickHouse cheapest, Pinot middle, Druid most expensive. The number of services to babysit follows the same ranking.
Worked example — placing a workload on the spectrum
Detailed explanation. A team runs three workloads in the same platform: ad-tech rollups, member-facing dashboards, and internal ad-hoc analytics. Putting all three on a single engine is a false economy — each one lives at a different spot on the spectrum.
Question. For each of the three workloads, name the engine that fits best and justify in one sentence.
Input.
| Workload | Latency budget | Query shape | Data freshness |
|---|---|---|---|
| Ad-tech rollups | < 200 ms | time-bucketed top-N | seconds |
| Member-facing dashboards | < 100 ms | indexed point + group | seconds |
| Internal ad-hoc analytics | < 5 s | arbitrary SQL + JOINs | minutes ok |
Code (engine selection logic).
if workload.shape == "time-bucketed + pre-aggregable":
pick(DRUID)
elif workload.shape == "user-facing + indexed":
pick(PINOT)
elif workload.shape == "general SQL + ad-hoc":
pick(CLICKHOUSE)
else:
fall_back_to(BATCH_WAREHOUSE)
Step-by-step explanation.
- Ad-tech rollups are the canonical Druid workload: pre-aggregate at ingest, time-partition segments, query the rolled-up table.
- Member-facing dashboards are the canonical Pinot workload: pre-build star-tree and bitmap indexes, query with strict latency SLO.
- Internal ad-hoc analytics are the canonical ClickHouse workload: write arbitrary SQL, use JOINs and window functions, accept a few hundred ms of latency.
- Mixing workloads on a single engine is possible but pays the worst-of-all-worlds cost — every engine has a sweet spot, and outside it is a tax.
Output.
| Workload | Best fit | One-sentence reason |
|---|---|---|
| Ad-tech rollups | Druid | rollup at ingest collapses cardinality 100x |
| Member-facing dashboards | Pinot | star-tree indexes deliver < 100 ms point queries |
| Internal ad-hoc analytics | ClickHouse | arbitrary SQL with JOINs and the broadest function library |
Rule of thumb. Place every new workload on the spectrum before picking an engine. The fastest engine is the one whose sweet spot overlaps your workload — not the one with the best public benchmark.
Worked example — when a single engine has to serve two workloads
Detailed explanation. Most teams start with one engine and slowly grow into two. The cleanest pattern is "Druid for time-series + ClickHouse for ad-hoc" or "Pinot for user-facing + ClickHouse for internal." Knowing which pairs combine well is a senior-level architecture call.
Question. A platform team has Pinot powering user-facing dashboards but is being asked to support ad-hoc analyst SQL with JOINs. Should they overload Pinot or add ClickHouse?
Input.
| Workload | Engine fit |
|---|---|
| User-facing dashboards | Pinot (excellent) |
| Ad-hoc analyst SQL with JOINs | Pinot (weak) / ClickHouse (excellent) |
Code (pseudocode for the architecture decision).
fit(Pinot, user_facing) = excellent
fit(Pinot, ad_hoc_join) = weak # Pinot's JOIN story is limited
fit(ClickHouse, ad_hoc_join) = excellent
decision = add(ClickHouse) # don't overload Pinot
Step-by-step explanation.
- Pinot's JOIN support is real but limited (multi-stage query engine is new; complex JOINs hurt latency). Overloading Pinot with JOIN-heavy SQL erodes the latency SLO that justified picking it in the first place.
- ClickHouse handles JOINs natively, including
JOINstrategies (hash, partial merge, parallel hash) tuned per query. - The combined stack: Kafka feeds both — Pinot consumes for the dashboard topic, ClickHouse consumes for analyst exploration. Both share the same source-of-truth but optimise differently.
- The alternative (add
JOINto Pinot) was tried by many teams and reverted; the lesson is "every engine has a sweet spot — respect it."
Output.
| Decision | Outcome |
|---|---|
| Overload Pinot | dashboard latency degrades |
| Add ClickHouse for ad-hoc | both workloads stay fast |
| Replace Pinot with ClickHouse | dashboards lose the < 100 ms SLO |
Rule of thumb. Stack engines additively rather than swap them. "Pinot + ClickHouse" is a common production combo at Uber, Cloudflare, and others — each engine does what it is best at.
Worked example — when to stay on the batch warehouse
Detailed explanation. Real-time OLAP is not free. Adding Druid, Pinot, or ClickHouse to the stack costs ops, monitoring, and on-call. Many workloads claim "real-time" but can comfortably live on a 5-minute micro-batch in Snowflake or BigQuery — saving the team a year of infrastructure work.
Question. A finance team wants "live" revenue analytics refreshed "every few minutes." Should they introduce a real-time OLAP engine?
Input.
| Requirement | Latency SLO | Volume | Ad-hoc SQL? |
|---|---|---|---|
| Live revenue dash | < 5 minutes | 10k events/s | yes |
Code (decision pseudocode).
if latency_slo_s >= 300 and volume_events_per_s <= 50_000:
keep(BATCH_WAREHOUSE_micro_batch)
elif latency_slo_s < 5:
add(REAL_TIME_OLAP)
else:
evaluate_both_with_cost(...)
Step-by-step explanation.
- A 5-minute SLO is a batch SLO, not a real-time one. Snowflake's
dynamic tablesor BigQuery'sMATERIALIZED VIEWrefresh on a schedule and serve sub-second SQL on the cached result. - The team would pay real-time OLAP cost (Druid / Pinot / ClickHouse cluster + ops) for a latency they do not need.
- The right call is "stay on the warehouse, add a refresh schedule." Save the real-time OLAP introduction for the workload that actually needs it.
- Senior interviewers love this answer because it shows the candidate respects cost as a constraint.
Output.
| Path | Cost | Latency delivered | Fit |
|---|---|---|---|
| Snowflake dynamic table | low | ~5 min | excellent |
| Druid pipeline | high | < 1 s | overkill |
Rule of thumb. Do not adopt a real-time OLAP engine until you have a workload whose SLO is measured in seconds and whose ingest rate exceeds what a 5-minute micro-batch can serve. "Real-time everywhere" is an anti-pattern.
Real-time OLAP positioning interview question
A senior interviewer often frames this as: "Walk me through how you'd decide between Druid, Pinot, ClickHouse, and just micro-batching to Snowflake for a new workload. What's your decision tree?" The probe blends engine positioning with cost discipline.
Solution Using a query-shape decision tree
workload = profile_workload(...)
if workload.latency_slo_s >= 300:
return BATCH_WAREHOUSE_MICRO_BATCH
if workload.shape == "time-bucketed metrics with rollups":
return DRUID
if workload.shape == "user-facing dashboard < 100 ms":
return PINOT
if workload.shape == "general SQL + JOINs + windows":
return CLICKHOUSE
return PROFILE_DEEPER
Step-by-step trace.
| Workload | latency SLO | shape | Decision |
|---|---|---|---|
| Ad-tech rollups | 200 ms | time-bucketed + pre-agg | DRUID |
| LinkedIn-style dashboards | 80 ms | user-facing indexed | PINOT |
| Cloudflare analytics | 500 ms | general SQL | CLICKHOUSE |
| Finance daily revenue | 5 min | tabular SQL | BATCH WAREHOUSE |
The tree is intentionally lean: latency SLO eliminates the warehouse first, then query shape picks among the three real-time engines. Adding extra branches (e.g. "data volume," "operator headcount") refines the answer but does not invalidate the tree.
Output:
| Workload | Recommended engine |
|---|---|
| Ad-tech rollups | Druid |
| User-facing dashboards | Pinot |
| Cloudflare-style analytics | ClickHouse |
| Daily revenue rollup | Snowflake (micro-batch) |
Why this works — concept by concept:
- Latency SLO as the first gate — eliminates the warehouse path early; spares the team from paying real-time ops cost for a batch SLO.
- Query shape over benchmark — engines look interchangeable on a single benchmark but diverge sharply on real workloads. Pick by shape.
- Pre-aggregation hint — workloads that pre-aggregate well lean Druid or Pinot; workloads that do not lean ClickHouse.
- SQL maturity hint — JOINs and windows tip the scale toward ClickHouse, away from Druid.
- Cost — adopting a real-time OLAP engine has a fixed ops cost (cluster + monitoring + on-call). Justify it with a workload whose SLO requires it.
Data Engineering
Topic — streaming
Streaming pipeline problems
3. Architecture trio — under the hood
Druid + Pinot decouple storage from compute; ClickHouse fuses them — and that single choice explains 80% of the operational differences
The mental model in one line: Druid and Pinot were built as distributed systems first (segment files on deep storage + many specialised services), while ClickHouse was built as a fast single-node engine that was later distributed (peer servers + sharding) — and every operational difference flows from that lineage. Once you can sketch the three architectures on a whiteboard, you can predict where each one shines and where each one breaks.
Druid's architecture — many services, one job each.
- Coordinator. Manages segment assignment across historicals; rebalances and replicates.
- Overlord. Manages ingestion tasks; assigns work to middle-managers.
- Broker. Receives client queries; scatters to historicals / middle-managers, gathers, merges.
- Historical servers. Hold completed segments locally (cached from deep storage); serve queries.
- Middle-managers. Run real-time tasks; hold real-time segments until handoff.
- Deep storage. S3 / HDFS / GCS — the durable source of truth for segments.
- Metadata store. Postgres / MySQL — stores segment metadata, task state, coordination data.
- Router (optional). Front-door load balancer.
Pinot's architecture — three roles + Helix.
- Controller. Manages cluster state; coordinates segment assignment; runs the segment-completion protocol for realtime tables.
- Broker. Routes queries; merges per-segment results.
- Server. Holds segments; two flavours — "offline servers" hold sealed segments; "realtime servers" consume from Kafka and own consuming segments.
- Helix (via ZooKeeper). State-machine-based cluster manager from LinkedIn; tracks server state, segment state, and assignment.
- Deep storage. S3 / HDFS — for sealed segments. (Realtime segments live in memory + local disk until completion.)
ClickHouse's architecture — peer servers + optional coordinator.
- No dedicated coordinator role. Every server is a peer. Queries can be sent to any server (typically a "distributed" table acts as the fan-out point).
- Shards × replicas. Data is sharded across nodes; each shard has N replicas for HA.
- ZooKeeper or ClickHouse Keeper. Used for replication coordination (Replicated*MergeTree engines). Smaller dependency than Druid/Pinot's stack.
-
Local storage. Parts (the ClickHouse equivalent of segments) live on local SSDs by default. Can be tiered to S3 via the
S3storage policy. - Distributed table. A virtual table that fans out queries to all shards; results are merged by the receiving server.
Segment / part files — the unit of storage.
- Druid segment. Immutable file, time-partitioned, columnar internal layout, contains dictionary-encoded dimensions, bitmap indexes, and aggregated metrics. Typical size: 300 MB–1 GB.
- Pinot segment. Immutable file, indexed columns (bitmap / inverted / star-tree / sorted), columnar layout. Realtime segments are mutable in memory then "sealed" into immutable offline segments. Typical size: hundreds of MB.
- ClickHouse part. Folder of column files on disk, immutable once written. Background merges combine parts into larger ones. Typical size: tens of MB to a few GB.
Deep-storage models.
- Druid. Mandatory deep storage. Historicals are caches. Loss of a historical means re-download from S3, not data loss.
- Pinot. Deep storage holds sealed segments; servers cache them locally. Same "cache + durable backing" model as Druid.
- ClickHouse. Local storage is primary; replication provides durability. Tiered storage to S3 is opt-in via storage policies.
Index types — the per-column accelerators.
- Bitmap indexes. All three support bitmap (or roaring-bitmap) indexes on dimension columns. Filter pushdown becomes a bitmap AND/OR.
- Inverted indexes. Pinot's signature; maps term → document list.
- Star-tree indexes. Pinot's killer feature — a multi-level pre-aggregation tree where every node holds partial aggregates. Trades disk for query latency.
- Skip indexes. ClickHouse's per-granule index (minmax, set, bloom_filter, tokenbf_v1). Skips entire granules during a scan.
- Sorted indexes. Pinot's "sorted index" — actual sort order on a column, enabling range pushdown.
Worked example — sketching the Druid service map
Detailed explanation. Drawing the six-service Druid diagram from memory is a senior-interview reflex. The trick is grouping: "control plane" (coordinator, overlord), "data plane" (broker, historicals, middle-managers), "storage plane" (deep storage, metadata DB).
Question. Draw the Druid architecture from memory. Group services into control / data / storage planes and explain the role of each.
Input. A blank whiteboard and an interviewer asking about Druid.
Code (text sketch).
[Router (optional)]
|
[Broker] <----+
/ \ \
v v v
[Historical] [Historical] [Middle-Manager]
\ / |
v v v
[Coordinator] <---> [Overlord]
| |
v v
[Metadata DB] [Deep Storage S3]
Step-by-step explanation.
- Router. Optional load balancer in front of brokers. Skipped in small deploys.
- Broker. Receives every client query, parses, plans, and scatters to historicals + middle-managers that own the relevant segments.
- Historicals. Hold sealed segments cached from deep storage; respond to broker fan-out.
- Middle-managers. Run ingest tasks; hold real-time segments until handoff to historicals.
- Coordinator. Decides which historical owns which segment; rebalances and replicates.
- Overlord. Decides which middle-manager runs which ingest task.
- Metadata DB (Postgres / MySQL). Source of truth for segment metadata, task state, and rule configs.
- Deep storage (S3 / HDFS). Source of truth for segment files; historicals cache locally.
Output.
| Plane | Services |
|---|---|
| Control | Coordinator, Overlord |
| Data | Broker, Historicals, Middle-Managers |
| Storage | Deep Storage, Metadata DB |
| Edge | Router (optional) |
Rule of thumb. When asked about Druid in an interview, lead with the three-plane grouping. Six services in a flat list looks intimidating; three planes feels organised.
Worked example — sketching the Pinot service map
Detailed explanation. Pinot's service map is simpler than Druid's — three roles plus Helix. The interesting part is the realtime / offline server split, which is unique to Pinot and the reason it wins low-latency dashboard workloads.
Question. Draw the Pinot architecture. Highlight the realtime / offline server split and explain Helix's role.
Input. A blank whiteboard.
Code (text sketch).
[Controller(s)] <---+
| \
v v
[Helix / ZK] <---> (cluster state)
^
|
[Broker]
/ \
v v
[Realtime Server] [Offline Server]
| |
v v
[Kafka topic] [Deep storage S3]
^
|
[Sealed segments]
Step-by-step explanation.
- Controller(s). Manage segment assignment, run the segment-completion protocol, expose the admin API.
- Brokers. Route queries; consult Helix for routing tables.
- Realtime servers. Consume from Kafka into "consuming segments." When a segment reaches a size or row-count threshold, it is sealed and offloaded.
- Offline servers. Hold sealed segments; serve queries against them.
- Helix (built on ZooKeeper). Cluster state machine — tracks every server's state (online/offline), every segment's state (consuming/sealed/error), and triggers transitions.
- Deep storage. Holds sealed segments; offline servers download them locally.
Output.
| Service | Job |
|---|---|
| Controller | cluster admin + segment-completion |
| Broker | query routing |
| Realtime server | Kafka consumer + consuming segment |
| Offline server | sealed segment serving |
| Helix | state machine over the above |
Rule of thumb. When asked about Pinot, lead with "realtime + offline servers, coordinated by Helix" — that captures the two unique design choices in one sentence.
Worked example — sketching the ClickHouse cluster topology
Detailed explanation. ClickHouse looks deceptively simple because there is no coordinator role. The real complexity is in the table-engine choice (MergeTree, ReplicatedMergeTree, Distributed) and the storage policy. A senior whiteboard answer names the engines.
Question. Draw a 2-shard, 2-replica ClickHouse cluster. Name the table engines on each node.
Input. A 4-node cluster spec.
Code (text sketch + schema).
[Distributed table on any node]
|
+-------------+-------------+
v v
[Shard 1] [Shard 2]
/ \ / \
v v v v
[Node 1 r1] [Node 1 r2] [Node 2 r1] [Node 2 r2]
| | | |
[ReplicatedMergeTree] [ReplicatedMergeTree]
\ /
v v
[Keeper / ZooKeeper cluster]
-- Local replicated table on each node
CREATE TABLE events_local ON CLUSTER my_cluster
(
event_ts DateTime,
user_id UInt64,
region LowCardinality(String),
amount Float64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
PARTITION BY toYYYYMM(event_ts)
ORDER BY (region, event_ts, user_id);
-- Distributed virtual table on every node (the entry point)
CREATE TABLE events ON CLUSTER my_cluster
AS events_local
ENGINE = Distributed(my_cluster, default, events_local, rand());
Step-by-step explanation.
- ReplicatedMergeTree. Each node has a local table replicated to its peer within the shard via Keeper. Inserts on one replica auto-propagate.
-
Distributed table. A virtual table on every node that fans out reads to all shards and combines them. Inserts to the Distributed table can be sharded by
rand()or by a key. -
Keeper / ZooKeeper. Coordinates replication metadata, distributed DDL (
ON CLUSTER), and replica election. No data flows through it. - No coordinator role. The system is symmetric — any node can serve any query.
Output.
| Concept | Detail |
|---|---|
| Shard | a horizontal data slice (2 shards × 2 replicas = 4 nodes) |
| Replica | a copy of a shard for HA |
| ReplicatedMergeTree | per-node table with replication |
| Distributed | virtual fan-out table |
| Keeper / ZooKeeper | metadata-only coordinator |
Rule of thumb. When asked about ClickHouse, name the table engines first (ReplicatedMergeTree + Distributed) and the keeper second. That distinguishes you from candidates who confuse it with a "just a fast Postgres."
Real-time OLAP architecture interview question
A senior interviewer often opens with: "Compare the architectures of Druid, Pinot, and ClickHouse on three axes — storage layout, coordination model, and ops surface area. Which architectural choice has the biggest impact on operability?" It blends the three sketches into a single judgement call.
Solution Using a three-axis architecture matrix
STORAGE COORDINATION OPS SURFACE
Druid deep storage + coordinator + 6 services + metadata DB
local cache overlord + Helix- + deep storage
like rules
Pinot deep storage + Helix + controller 3 roles + Helix + deep storage
local cache
ClickHouse local + opt S3 Keeper / ZK 1 binary + Keeper
tiering (metadata only)
Step-by-step trace.
| Engine | Coordination services | Storage tiers | Ops surface |
|---|---|---|---|
| Druid | 2 (Coord + Overlord) + Router | deep + local | highest |
| Pinot | 1 (Controller) + Helix | deep + local | medium |
| ClickHouse | 0 dedicated + Keeper | local + opt S3 | lowest |
The matrix shows that coordination service count tracks with ops surface area almost perfectly. Druid's six services and Pinot's Helix dependency are the source of their operational reputation; ClickHouse's "just a binary" model is why solo data engineers reach for it first.
Output:
| Engine | Stated ops complexity (1-5) | Big-O of "things to monitor" |
|---|---|---|
| Druid | 5 | O(7 services × replicas) |
| Pinot | 3 | O(3 roles × replicas + Helix) |
| ClickHouse | 2 | O(nodes + Keeper) |
Why this works — concept by concept:
- Decoupled storage vs fused storage — Druid + Pinot pay a deep-storage tax for the right to lose any node without losing data; ClickHouse pays a replication tax to keep data local for I/O latency.
- Coordination service count — every coordination service is a thing on-call has to watch. Fewer services = lower MTTR.
- Helix as a dependency — Pinot inherits LinkedIn's Helix project; another moving part to learn, but a battle-tested one.
- Keeper as the minimal coordinator — ClickHouse Keeper is a drop-in replacement for ZooKeeper, smaller binary, simpler ops. The newer the cluster, the more likely you should use Keeper.
- Cost — ops cost scales with service count, not data volume. A 100 TB ClickHouse cluster can be smaller (on-call hours) than a 10 TB Druid cluster.
Data Engineering
Topic — aggregation
OLAP aggregation problems
4. Ingestion models — streaming vs batch in each engine
All three engines accept Kafka input — but Druid wraps it in a managed indexing service, Pinot in a state-machine-coordinated server, and ClickHouse in a SQL-shaped Kafka engine
The mental model in one line: Druid's Kafka indexing service writes real-time segments that hand off to historicals; Pinot's realtime servers consume into consuming segments that are sealed via the segment-completion protocol; ClickHouse's Kafka engine table + materialised view drains messages into a MergeTree target — and the difference shows up in how you reason about ingest lag and replay. Once you can sketch the three lanes, ingestion troubleshooting becomes a deterministic exercise.
Druid's streaming ingest.
-
Kafka indexing service. A first-class subsystem (
KafkaIndexTask) that consumes a topic, builds real-time segments, and reports progress to the overlord. - Real-time segment. Lives on a middle-manager; immediately queryable as it grows. Queries against it are served alongside historical segments via the broker.
- Handoff. When a segment "completes" (size / row-count / time threshold), it is published to deep storage and a historical server downloads it.
- At-least-once semantics by default. Supervised via overlord checkpoints; exactly-once requires the use of partition-locked offsets and idempotent ingest configs.
-
Rollup configured at ingest time. The supervisor spec declares
rollup: trueand the metric aggregators (e.g.longSum,doubleSum,hyperUnique).
Pinot's streaming ingest.
- Realtime servers. Each realtime server owns a consuming segment per partition. Reads directly from Kafka via the low-level consumer API.
- Consuming segment → completing → sealed. When a threshold is hit, the segment "completes": one realtime server is elected to seal the segment, others discard their in-memory copy, and the controller assigns the sealed segment to an offline server.
- Segment-completion protocol. Helix-coordinated; ensures exactly one sealed copy per logical segment across the cluster.
- Star-tree pre-aggregation. Configured per table; the star-tree is built when the segment is sealed.
ClickHouse's streaming ingest.
- Kafka engine table. A virtual table that consumes from a Kafka topic. Reading from it consumes messages — you do not read from it directly.
-
Materialised view trigger. An MV with
TO target_tableruns on every batch of messages and INSERTs into a real MergeTree table. -
Target = MergeTree family. Typically
MergeTreefor raw events orAggregatingMergeTreefor pre-aggregated data. -
Exactly-once is not free. The Kafka engine commits offsets after a successful INSERT, but failure modes (crash between INSERT and offset commit) can lead to duplicates. Dedup is via
ReplicatingMergeTreekeys or external means. -
Pre-aggregation via the merge step.
SummingMergeTree/AggregatingMergeTreecollapse rows that share the ORDER BY key during background merges.
Batch ingest lanes.
- Druid. Native batch ingest (Hadoop / index_parallel) reads from S3/HDFS and writes segments directly. Used for backfills.
- Pinot. Offline tables ingested via segment generation jobs (often Spark) and uploaded via the controller's segment upload API. Used for backfills and historical loads.
-
ClickHouse. Plain
INSERT INTO ... SELECT FROM s3(...), thes3table function, orclickhouse-client --query "INSERT ...". Most natural batch experience of the three.
Pre-aggregation models.
-
Druid rollup. Group-by at ingest on
(timestamp_truncated, dimensions...)with aggregator functions (longSum,hyperUnique). Storage drops linearly with cardinality collapse. - Pinot star-tree. A pre-computed multi-level tree where leaf nodes are full rows and intermediate nodes are partial aggregates. Storage tax for query speed.
-
ClickHouse AggregatingMergeTree. Stores partial-state aggregates (e.g.
quantileTDigestState,uniqCombinedState) that are merged by ORDER BY key on every background merge. Finalised at query time with-Mergecombinators.
Schema evolution.
- Druid. Dimensions can be added at ingest spec change; old segments do not have the new dimension and return NULL. Removing a dimension is harder (segments hold it forever).
- Pinot. Schema versioning is per-table; new columns appear NULL in old segments. Renames are not supported directly.
-
ClickHouse.
ALTER TABLE ... ADD COLUMNworks in seconds (metadata change); the column is NULL or default in old parts until those parts are rewritten via mutations.
Operational cost ranking.
- Druid. Highest. Six services, deep storage, metadata DB, Kafka indexing supervision.
- Pinot. Medium. Three roles, Helix, controller config sprawl.
- ClickHouse. Lowest. Kafka engine table + MV + target table is a three-line schema.
Worked example — Druid Kafka indexing supervisor spec
Detailed explanation. Druid's ingestion is configured by a supervisor spec — a JSON document submitted to the overlord that describes the topic, schema, partitions, and rollup. Reading one fluently is a Druid-specific interview test.
Question. Write a minimal Druid supervisor spec for the ad_events topic with rollup on (minute, advertiser_id, country) summing spend.
Input. A Kafka topic ad_events with JSON messages {ts, advertiser_id, country, spend}.
Code.
{
"type": "kafka",
"spec": {
"dataSchema": {
"dataSource": "ad_events_rollup",
"timestampSpec": { "column": "ts", "format": "iso" },
"dimensionsSpec": {
"dimensions": ["advertiser_id", "country"]
},
"metricsSpec": [
{ "name": "count", "type": "count" },
{ "name": "spend_sum", "type": "doubleSum", "fieldName": "spend" }
],
"granularitySpec": {
"type": "uniform",
"segmentGranularity": "HOUR",
"queryGranularity": "MINUTE",
"rollup": true
}
},
"ioConfig": {
"topic": "ad_events",
"inputFormat": { "type": "json" },
"consumerProperties": {
"bootstrap.servers": "kafka-1:9092,kafka-2:9092"
},
"useEarliestOffset": false
},
"tuningConfig": {
"type": "kafka",
"maxRowsInMemory": 100000
}
}
}
Step-by-step explanation.
-
timestampSpectells Druid which JSON field is the event timestamp and how to parse it. -
dimensionsSpeclists the columns to keep as dimensions; metrics are not dimensions in Druid. -
metricsSpecdeclares the aggregators;doubleSumpre-aggregatesspendat ingest. -
segmentGranularity: HOURmeans each segment file holds one hour of data. -
queryGranularity: MINUTEis the rollup bucket — events are truncated to the minute before grouping. -
rollup: trueenables pre-aggregation. With these settings, 500k events/s often collapses to a few thousand pre-agg rows/s.
Output.
| Setting | Effect |
|---|---|
queryGranularity: MINUTE |
rollup bucket = 1 min |
segmentGranularity: HOUR |
file size scope |
rollup: true |
pre-aggregate at ingest |
metricsSpec |
which columns to sum |
Rule of thumb. Memorise the four sections of the supervisor spec — dataSchema, ioConfig, tuningConfig, and the top-level type — and you can read any Druid spec in production.
Worked example — Pinot realtime table config
Detailed explanation. Pinot configures realtime tables via two JSON files: a schema and a table config. The "low-level consumer" mode is the production default — it gives Pinot direct control over Kafka partitions and segment completion.
Question. Write a minimal Pinot realtime table config for the same ad_events topic with a star-tree index on (advertiser_id, country) summing spend.
Input. Same Kafka topic, JSON messages, daily segment cadence.
Code.
// schema
{
"schemaName": "ad_events",
"dimensionFieldSpecs": [
{"name": "advertiser_id", "dataType": "STRING"},
{"name": "country", "dataType": "STRING"}
],
"metricFieldSpecs": [
{"name": "spend", "dataType": "DOUBLE"}
],
"dateTimeFieldSpecs": [
{"name": "ts", "dataType": "TIMESTAMP", "format": "1:MILLISECONDS:EPOCH", "granularity": "1:MINUTES"}
]
}
// table config (excerpt)
{
"tableName": "ad_events_REALTIME",
"tableType": "REALTIME",
"segmentsConfig": {
"timeColumnName": "ts",
"schemaName": "ad_events",
"replication": "2"
},
"tableIndexConfig": {
"loadMode": "MMAP",
"starTreeIndexConfigs": [
{
"dimensionsSplitOrder": ["advertiser_id", "country"],
"functionColumnPairs": ["SUM__spend", "COUNT__*"]
}
],
"streamConfigs": {
"streamType": "kafka",
"stream.kafka.topic.name": "ad_events",
"stream.kafka.consumer.type": "lowlevel",
"stream.kafka.broker.list": "kafka-1:9092",
"realtime.segment.flush.threshold.rows": "5000000"
}
}
}
Step-by-step explanation.
- The schema names two dimensions, one metric, and a time column.
- The table config attaches a star-tree on
(advertiser_id, country)precomputingSUM(spend)andCOUNT(*). -
stream.kafka.consumer.type: lowlevelenables Pinot's preferred consumer — controllers assign Kafka partitions to realtime servers explicitly. -
realtime.segment.flush.threshold.rows: 5_000_000is the segment-completion trigger; when a consuming segment reaches 5M rows, Helix runs the completion protocol. - After completion, the sealed segment is uploaded to deep storage and assigned to an offline server.
Output.
| Mechanism | Effect |
|---|---|
Star-tree on (advertiser_id, country)
|
query latency O(log N) on aggregations |
| Low-level consumer | controller-managed partition assignment |
flush.threshold.rows |
controls segment size + cadence |
| Replication = 2 | two realtime + two offline copies |
Rule of thumb. Always configure star-tree indexes on the dimensions that drive your dashboard filters — that single setting is the difference between Pinot at 30 ms and Pinot at 300 ms.
Worked example — ClickHouse Kafka engine + MV
Detailed explanation. ClickHouse ingestion is a three-table pattern: a Kafka engine source, a MergeTree target, and a materialised view glue. Once you have written it once, every subsequent topic is copy-paste.
Question. Write the three CREATE TABLE statements that consume ad_events and write to a MergeTree partitioned by day.
Input. Same topic, JSON messages.
Code.
-- 1) Kafka source table
CREATE TABLE ad_events_kafka
(
ts DateTime64(3),
advertiser_id String,
country LowCardinality(String),
spend Float64
)
ENGINE = Kafka
SETTINGS
kafka_broker_list = 'kafka-1:9092',
kafka_topic_list = 'ad_events',
kafka_group_name = 'clickhouse_ad_events',
kafka_format = 'JSONEachRow',
kafka_num_consumers = 2;
-- 2) Target MergeTree
CREATE TABLE ad_events
(
ts DateTime64(3),
advertiser_id String,
country LowCardinality(String),
spend Float64
)
ENGINE = MergeTree()
PARTITION BY toDate(ts)
ORDER BY (country, advertiser_id, ts);
-- 3) Materialised view glue
CREATE MATERIALIZED VIEW ad_events_mv TO ad_events AS
SELECT ts, advertiser_id, country, spend
FROM ad_events_kafka;
Step-by-step explanation.
- The Kafka engine table is a stream — every row is consumed exactly once per consumer-group offset.
- SELECTing directly from
ad_events_kafkaconsumes messages; this is not what you want. - The materialised view triggers on every batch of incoming rows, runs the SELECT, and INSERTs into
ad_events. - The target table is a normal
MergeTree, partitioned by day and ordered by(country, advertiser_id, ts)— the ORDER BY is the primary key and the sort order on disk. - Pre-aggregation can be added by changing the target to
AggregatingMergeTreeand using-Statecombinators in the MV SELECT.
Output.
| Table | Engine | Role |
|---|---|---|
ad_events_kafka |
Kafka | source (consumes the topic) |
ad_events_mv |
MaterializedView | trigger (consumes + inserts) |
ad_events |
MergeTree | target (storage + queries) |
Rule of thumb. Always use the three-table pattern. SELECTing from the Kafka table directly works once and then mysteriously stops because offsets advanced — a classic ClickHouse-onboarding bug.
Real-time OLAP ingestion interview question
A senior interviewer often opens with: "Compare how Druid, Pinot, and ClickHouse handle exactly-once Kafka ingestion. Which one would you trust by default and why?" It blends the three ingest models with the realities of exactly-once delivery.
Solution Using a per-engine exactly-once matrix
Druid:
- Kafka indexing service tracks offsets in metadata DB
- At-least-once by default; exactly-once via supervisor "exactlyOnce" mode
- Handoff to historicals does not duplicate; segments are idempotent by ID
Pinot:
- Realtime servers track offsets in Helix property store
- Exactly-once-ish: sealed segments are deterministic by partition + offset range
- Crashed consuming segment is re-consumed; deduplication via segment ID
ClickHouse:
- Kafka engine commits offsets after MV INSERT succeeds
- Crash between INSERT and commit -> duplicates
- Dedup via ReplacingMergeTree on (event_id) or external idempotency key
Step-by-step trace.
| Engine | Offset tracking | Default semantics | Strict exactly-once path |
|---|---|---|---|
| Druid | metadata DB | at-least-once | supervisor exactlyOnce mode |
| Pinot | Helix property store | effectively exactly-once via segment IDs | enable transactional sealing |
| ClickHouse | Kafka group offsets | at-least-once |
ReplacingMergeTree on event_id |
The lesson: none of the three give pure exactly-once for free. Druid has the most explicit story (supervisor flag), Pinot's segment-ID determinism is the most elegant, and ClickHouse defers exactly-once to the schema layer (ReplacingMergeTree).
Output:
| Engine | Trust level for exactly-once Kafka |
|---|---|
| Druid (exactlyOnce supervisor) | high |
| Pinot (segment-ID dedup) | high |
| ClickHouse (ReplacingMergeTree) | high if schema is correct |
Why this works — concept by concept:
- Offset-tracking ownership — the engine that owns offsets reliably (Druid metadata DB, Pinot Helix) has an easier exactly-once story than one that delegates to Kafka (ClickHouse).
-
Segment ID determinism — Pinot's clever trick: a sealed segment ID is
(partition, start_offset, end_offset). Re-sealing the same range produces the same ID, which the controller dedups. - Schema-level dedup — ClickHouse asks you to design the dedup at the table layer, not the ingestion layer. Powerful but easy to get wrong.
- Replay safety — all three support replaying a Kafka topic from a known offset; only Druid and Pinot make replay idempotent without schema-side work.
- Cost — exactly-once costs throughput. Druid's exactlyOnce mode caps ingest parallelism; Pinot's deterministic sealing adds Helix coordination cost; ClickHouse's ReplacingMergeTree adds a merge-time dedup pass.
Data Engineering
Topic — streaming
Kafka ingestion problems
5. Query patterns and engine fit
The same dashboard query lands on three different execution shapes — pick the engine that matches your filter pattern, not the one with the best benchmark
The mental model in one line: Druid wins time-filtered group-bys over rolled-up data; Pinot wins low-latency point-and-index queries; ClickHouse wins arbitrary analytics SQL with JOINs and window functions — and the right answer to "which is fastest" is always "for what query shape?". Once you can sketch the same query in all three engines, the fit decision is a one-sentence check.
Druid's query shape.
-
Time-filtered group-by with pre-aggregated metrics. "Show me top advertisers by spend in the last 24h, by hour" — the broker prunes segments by
__time, scans the rolled-up rows, and merges. - No expensive JOINs by default. Druid added lookup joins and broadcast joins, but heavy multi-table JOINs are an anti-pattern; you denormalise at ingest.
-
TopN as a first-class query. Druid's
topNquery type is faster than a genericgroupByfor the common "top-K" pattern. -
HLL / sketches built in.
hyperUniquecolumns store HLL sketches at ingest; cardinality queries become a sketch read.
Pinot's query shape.
- Indexed point + group queries. "Give me the click-through-rate for advertiser X in the last hour" — the broker hits the star-tree index and returns in tens of ms.
- Predicate pushdown via bitmap. Multi-dim filters become bitmap intersections per segment.
- Single-table workhorse. Multi-stage query engine (MSQE) added in 2023 enables JOINs, but single-table queries are where Pinot shines.
- Lookup joins via dimension tables. Common pattern for user-facing dashboards.
ClickHouse's query shape.
- Arbitrary analytics SQL. Window functions, CTEs, complex JOINs, subqueries, array functions, dictionary lookups — all first-class.
-
JOINs work but require care.
JOINstrategies (hash, partial merge, parallel hash) tune per query; the right strategy is workload-dependent. -
Wide function library.
quantileTDigest,uniqCombined,topK,groupArray,arrayJoin— half the time a dedicated function replaces a complex SELECT. - Materialised views as pre-agg. Same pattern as ingestion MV but reused for query acceleration.
SQL maturity ranking.
- ClickHouse > Pinot > Druid. ClickHouse is closest to a "real" SQL engine. Pinot's SQL is targeted at dashboard patterns. Druid's SQL is a thin layer over its native JSON query language.
Worked example — the same query in all three.
Same query: "spend per country in the last hour from ad_events."
-- Druid SQL
SELECT country, SUM(spend_sum) AS spend
FROM ad_events_rollup
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
GROUP BY country
ORDER BY spend DESC;
-- Pinot SQL
SELECT country, SUM(spend) AS spend
FROM ad_events
WHERE ts >= ago('PT1H')
GROUP BY country
ORDER BY spend DESC;
-- ClickHouse SQL
SELECT country, sum(spend) AS spend
FROM ad_events
WHERE ts >= now() - INTERVAL 1 HOUR
GROUP BY country
ORDER BY spend DESC;
The text is nearly identical. The execution differs sharply: Druid hits a rolled-up table; Pinot hits a star-tree (or bitmap-indexed segments); ClickHouse vectorises a column scan over partitions in the time window.
Decision matrix — query shape vs engine.
| Query shape | Best fit |
|---|---|
| Time-bucketed metrics + rollups | Druid |
| User-facing dashboards < 100 ms with indexed dims | Pinot |
| Arbitrary SQL + JOINs + windows | ClickHouse |
| Long-tail ad-hoc analytics | ClickHouse |
| Real-time top-K by dimension | Druid or Pinot |
| Multi-table joins as a first-class need | ClickHouse |
Combining engines in the same stack.
- Pinot + ClickHouse. Pinot for user-facing dashboards; ClickHouse for internal ad-hoc. Common at Uber and Stripe.
- Druid + ClickHouse. Druid for pre-aggregated time-series; ClickHouse for analyst exploration. Common in ad-tech.
- All three? Rare. Sign of org sprawl or a misaligned mandate. Most teams settle on two.
When you should stop comparing and just pick one.
- If your workload is clearly time-series with heavy rollup potential — Druid.
- If your latency budget is < 100 ms and queries are dashboard-shaped — Pinot.
- If you want a single binary that does general analytics — ClickHouse.
- If you cannot decide between the latter two for a non-dashboard workload, default to ClickHouse — it has the lowest ops cost and the broadest SQL.
Worked example — Druid TopN vs GroupBy
Detailed explanation. Druid's topN query is a specialised, approximate (by default) version of groupBy LIMIT N. It is faster and uses less memory because it avoids materialising the full group-by hash. Senior Druid users know to reach for it.
Question. Rewrite a "top 10 advertisers by spend in the last hour" as both a Druid groupBy and a Druid topN. Explain the cost difference.
Input. The ad_events_rollup table on Druid.
Code.
-- GroupBy variant (Druid SQL)
SELECT advertiser_id, SUM(spend_sum) AS spend
FROM ad_events_rollup
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
GROUP BY advertiser_id
ORDER BY spend DESC
LIMIT 10;
// TopN variant (native JSON; Druid SQL auto-translates eligible queries)
{
"queryType": "topN",
"dataSource": "ad_events_rollup",
"intervals": ["2026-06-15T00:00:00Z/2026-06-15T01:00:00Z"],
"granularity": "all",
"dimension": "advertiser_id",
"metric": "spend",
"threshold": 10,
"aggregations": [
{"type": "doubleSum", "name": "spend", "fieldName": "spend_sum"}
]
}
Step-by-step explanation.
- The
groupByquery allocates a hash map sized by the cardinality ofadvertiser_id(could be millions); the broker then sorts and trims to 10. - The
topNquery maintains a per-segment top-N heap of sizethreshold * factorand merges heaps at the broker. Memory and CPU drop sharply. - By default,
topNis approximate beyond the top-K; results are deterministic for the actual top-K rows but the long tail may shuffle. - Druid SQL automatically rewrites eligible
GROUP BY ... ORDER BY ... LIMIT Nqueries to the nativetopNwhen safe.
Output.
| Variant | Memory per segment | Latency on hot cache |
|---|---|---|
groupBy |
O(cardinality) | tens to hundreds of ms |
topN |
O(threshold × small factor) | low tens of ms |
Rule of thumb. For "top K by dimension" dashboards, prefer topN. Druid's SQL planner will choose it automatically when the query shape is recognisably top-K — write the SQL idiomatically and trust the planner.
Worked example — Pinot star-tree lookup
Detailed explanation. A star-tree index pre-aggregates the table along chosen dimensions. A query that filters on those dimensions hits a single tree node instead of scanning rows. The result is logarithmic-in-N latency for grouped queries — Pinot's signature trick.
Question. Show how a star-tree on (advertiser_id, country) accelerates the canonical query. Sketch the tree.
Input. Star-tree configured per the Pinot table config in section 4.
Code (the query that benefits).
SELECT country, SUM(spend) AS spend
FROM ad_events
WHERE advertiser_id = 'adv_77'
AND ts >= ago('PT1H')
GROUP BY country
ORDER BY spend DESC;
Step-by-step explanation.
- The star-tree root holds the total
SUM(spend)across all dimensions. - The first split is on
advertiser_id; child nodes correspond to each advertiser plus a special "star" node (no filter on advertiser). - The second split is on
country; each (advertiser, country) leaf node holds theSUM(spend)for that pair. - The query "advertiser = adv_77, group by country" finds the
adv_77child, then enumerates its country leaves — returning the per-country sums directly. - No row scan; one tree walk per segment, merged at the broker.
Output.
| Without star-tree | With star-tree |
|---|---|
| scan all matching rows | walk tree to leaf nodes |
| O(rows) per segment | O(log dimensions) per segment |
| hundreds of ms | tens of ms |
Rule of thumb. Build star-tree indexes on the dimensions your dashboards filter and group by. Disk cost is real (a few extra GB per segment is typical); query latency drops by 5–20x.
Worked example — ClickHouse window function
Detailed explanation. ClickHouse's window functions (OVER (PARTITION BY ... ORDER BY ...)) bring it close to a "real" SQL engine. The canonical example is a rolling 5-minute spend per advertiser — a workload that Druid handles via rollup, Pinot avoids, and ClickHouse runs as plain SQL.
Question. Compute rolling 5-minute spend per advertiser using a ClickHouse window function.
Input. A ad_events table on ClickHouse.
Code.
SELECT
advertiser_id,
toStartOfMinute(ts) AS minute,
sum(spend) AS minute_spend,
sum(sum(spend)) OVER (
PARTITION BY advertiser_id
ORDER BY toStartOfMinute(ts)
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
) AS rolling_5m_spend
FROM ad_events
WHERE ts >= now() - INTERVAL 1 HOUR
GROUP BY advertiser_id, minute
ORDER BY advertiser_id, minute;
Step-by-step explanation.
- The inner
sum(spend)aggregates per(advertiser_id, minute). - The window function
sum(sum(spend)) OVER (...)sums the per-minute totals across a 5-row sliding frame (current row plus 4 preceding rows = 5 minutes). -
PARTITION BY advertiser_idkeeps each advertiser's sliding window independent. -
ROWS BETWEEN 4 PRECEDING AND CURRENT ROWis the frame spec — 5 minute-rows total. - Vectorised execution evaluates this in one streaming pass over the partition.
Output (excerpt).
| advertiser_id | minute | minute_spend | rolling_5m_spend |
|---|---|---|---|
| adv_77 | 10:00 | 200 | 200 |
| adv_77 | 10:01 | 150 | 350 |
| adv_77 | 10:02 | 300 | 650 |
| adv_77 | 10:03 | 100 | 750 |
| adv_77 | 10:04 | 250 | 1000 |
| adv_77 | 10:05 | 400 | 1200 |
Rule of thumb. When you need window functions on real-time OLAP data, ClickHouse is the default answer. Druid does not support windows natively; Pinot's MSQE engine is improving but not the first choice.
Real-time OLAP query-fit interview question
A senior interviewer often opens with: "Given three workloads — ad-tech rollups, member-facing < 100 ms dashboards, and analyst SQL with window functions — write the same 'top countries by spend' query in the engine that fits each and explain why." It blends fit + idiomatic SQL.
Solution Using engine-idiomatic SQL per workload
-- Workload 1: ad-tech rollups -> Druid SQL
SELECT country, SUM(spend_sum) AS spend
FROM ad_events_rollup
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
GROUP BY country
ORDER BY spend DESC
LIMIT 10;
-- Workload 2: member-facing < 100 ms -> Pinot SQL (star-tree backed)
SELECT country, SUM(spend) AS spend
FROM ad_events
WHERE ts >= ago('PT1H')
GROUP BY country
ORDER BY spend DESC
LIMIT 10;
-- Workload 3: analyst SQL with window -> ClickHouse SQL
SELECT
country,
sum(spend) AS spend,
sum(sum(spend)) OVER (ORDER BY country) AS running_total
FROM ad_events
WHERE ts >= now() - INTERVAL 1 HOUR
GROUP BY country
ORDER BY spend DESC
LIMIT 10;
Step-by-step trace.
| Workload | Engine | Execution shape | Approx latency |
|---|---|---|---|
| Ad-tech rollups | Druid | rolled-up table scan + topN | tens of ms |
| < 100 ms dashboard | Pinot | star-tree leaf walk + broker merge | tens of ms |
| Analyst with window | ClickHouse | vectorised scan + window pass | low hundreds of ms |
Each engine is given a query it is designed for. The latency numbers match each engine's strong suit; running the same workload on the wrong engine would degrade by 5–20x.
Output:
| Workload | Best query latency |
|---|---|
| Druid + rollup | ~25 ms |
| Pinot + star-tree | ~30 ms |
| ClickHouse + window | ~150 ms |
Why this works — concept by concept:
-
Engine-idiomatic SQL — each query uses functions and time-filter syntax native to that engine (
__time,ago('PT1H'),now() - INTERVAL). Avoid forcing one engine's SQL onto another. - Workload-engine alignment — Druid handles the rolled-up scan; Pinot handles the indexed dashboard; ClickHouse handles the window. None is "the best" — each is best for its workload.
- Latency by construction — pre-aggregation (Druid, Pinot) trades disk for latency; raw scans (ClickHouse) trade latency for flexibility. Choose your trade.
- Cost — total cost = engine ops cost + ingest cost + storage cost + query cost. Optimise the workload-engine fit before optimising any single line item.
- Decision discipline — pick by query shape, not by benchmark; revisit when the workload changes; do not let one bad fit poison the whole platform.
Data Engineering
Topic — real-time analytics
Real-time OLAP query patterns
Cheat sheet — Druid vs Pinot vs ClickHouse recipes
-
Time-bucketed rollup. Druid
granularitySpec: {queryGranularity: MINUTE, rollup: true}withmetricsSpecaggregators (doubleSum,hyperUnique). 100x cardinality collapse is normal for ad-tech. -
Star-tree pre-aggregation. Pinot
starTreeIndexConfigswithdimensionsSplitOrderlisting the filter dimensions andfunctionColumnPairslisting the aggregations. Build at segment-seal time. -
Real-time materialised view. ClickHouse
AggregatingMergeTreetarget +MaterializedView TO target AS SELECT ... groupArray*State(...). Finalise with-Mergecombinators at query time. -
Kafka ingestion — Druid. Submit a supervisor spec to overlord with
type: kafkaand the JSON dataSchema; the indexing service runs Kafka tasks on middle-managers. -
Kafka ingestion — Pinot. Configure
stream.kafka.consumer.type: lowlevelin the table config; controllers assign partitions to realtime servers; segments seal via Helix. -
Kafka ingestion — ClickHouse. Three-table pattern:
Kafkasource +MaterializedViewtrigger +MergeTreetarget. Always insert via the MV, never SELECT from the Kafka table. -
Sub-second filter — Druid. Bitmap indexes on dimensions are automatic; segment pruning by
__timeis automatic. Addsearched CASE WHENis not needed — Druid handles it natively. -
Sub-second filter — Pinot. Configure
invertedIndexColumns+bloomFilterColumnsper table; the broker pushes predicates down to bitmap intersections per segment. -
Sub-second filter — ClickHouse. Use
ORDER BY (primary_key_cols)to align disk order with filter predicates; addINDEX skip_idx col TYPE bloom_filter GRANULARITY 4for high-cardinality columns. -
Deduplication — Druid. Set
exactlyOncemode in the supervisor; segments are idempotent by ID. Re-ingestion of the same offset range overwrites cleanly. -
Deduplication — Pinot. Sealed segments are deterministic by
(partition, offset_range); controllers dedup at completion. Useupserttable type for primary-key dedup. -
Deduplication — ClickHouse. Use
ReplacingMergeTreeon(event_id)or external idempotency key; dedup happens at merge time, not insert time. -
Top-K — Druid. Use the
topNquery type (or let Druid SQL rewriteGROUP BY ... ORDER BY ... LIMIT N). Per-segment heap merging beats fullgroupBy. -
Top-K — Pinot. Plain
SELECT ... GROUP BY ... ORDER BY ... LIMIT N; the star-tree returns the answer in a tree walk. -
Top-K — ClickHouse.
SELECT ... ORDER BY ... LIMIT NwithLIMIT BYfor top-K-per-group; thetopK(N)(col)aggregate function for approximate top-K. -
HLL / cardinality — Druid. Configure a
hyperUniquemetric at ingest; query withAPPROX_COUNT_DISTINCT(metric). -
HLL / cardinality — Pinot. Configure
noDictionaryColumnsplusfieldConfigListwithH3orHLL_PLUSindexes; query withDISTINCTCOUNTHLL(col). -
HLL / cardinality — ClickHouse. Use
uniqCombined(col)aggregate (HLL-based) oruniqHLL12(col)for fixed-precision; both are first-class. -
JOINs. Druid avoids them — denormalise at ingest. Pinot supports them via the multi-stage query engine but with care. ClickHouse handles them natively; pick the JOIN algorithm with
SETTINGS join_algorithm. - Window functions. Druid does not support window functions directly. Pinot's MSQE has limited support. ClickHouse has full SQL window functions — preferred for windowed analytics.
-
Time-zone handling. Druid stores epochs; query-time conversion via
TIME_FORMAT. Pinot stores epochs;DATETRUNCfor conversion. ClickHouse storesDateTimewith optional TZ;toTimeZone(col, 'America/Los_Angeles')per column. - Choose the engine — one-line rules. Time-series + rollup → Druid. User-facing < 100 ms with indexes → Pinot. General SQL + JOINs + windows → ClickHouse. Latency SLO >= 5 minutes → stay on the batch warehouse.
Frequently asked questions
Druid vs Pinot — which is faster for user-facing analytics?
Pinot is generally faster for user-facing analytics where queries hit indexed dimensions and the latency budget is under 100 ms. Pinot's star-tree, bitmap, and sorted indexes are designed for the "filter + group by indexed dim" shape that powers LinkedIn-style member dashboards. Druid is competitive for time-bucketed pre-aggregated workloads but typically lands in the tens-to-hundreds-of-ms range rather than under 50 ms. If your workload is "show user X their personalised dashboard" Pinot wins; if your workload is "top advertisers by spend over rolled-up time," Druid is more economical.
Is ClickHouse a real-time OLAP database?
Yes — ClickHouse is a real-time OLAP engine, but with a different design philosophy than Druid or Pinot. It is a general-purpose columnar SQL engine with first-class Kafka ingest via the Kafka engine table + materialised view pattern, sub-second query latency on rolled-up AggregatingMergeTree data, and the broadest SQL surface of the three (JOINs, window functions, dictionary lookups, array functions). The trade-off is that ClickHouse asks you to design the pre-aggregation strategy explicitly (MV + AggregatingMergeTree) instead of declaring it once in a supervisor spec. For most teams without an ad-tech-grade time-series workload, ClickHouse is the lowest-ops real-time OLAP choice.
Can I use Druid without deep storage?
No — deep storage is mandatory in Druid. Historical servers cache segments locally for serving, but the durable copy lives in S3 (or HDFS / GCS / Azure Blob). The architecture is intentional: losing a historical server is a non-event because the missing segments are re-downloaded from deep storage. This is the inverse of ClickHouse, where local replicated storage is primary and S3 tiering is optional. If your storage budget rules out S3 (rare today), Druid is not the right engine — pick ClickHouse or stay on the batch warehouse.
What is Pinot's star-tree index?
A star-tree is a pre-aggregated multi-level tree where every node holds partial aggregates over a subset of dimensions. The leaves correspond to fully-grouped tuples; intermediate nodes correspond to "wildcard on this dimension" partial aggregates. A query that filters on the tree's split dimensions walks a few nodes and returns the answer without scanning rows. The cost is disk (a few extra GB per segment) and ingest time (build the tree when sealing a segment); the benefit is 5–20x faster grouped queries. Star-trees are configured per table via starTreeIndexConfigs listing dimensionsSplitOrder and functionColumnPairs.
Which engine has the best SQL support?
ClickHouse has the most complete SQL surface of the three. It supports full ANSI-ish SQL with window functions, CTEs, complex JOINs (with tunable algorithms), subqueries, array functions, and a vast aggregate-function library (quantileTDigest, uniqCombined, topK, argMax). Pinot's SQL is targeted at dashboard shapes and supports JOINs via the multi-stage query engine added in 2023, but with caveats around latency. Druid's SQL is a layer over its native JSON query language; it handles GROUP BY, topN, and filters well but lacks proper window functions and has limited JOIN support. If your workload requires general SQL flexibility, ClickHouse is the default answer.
Druid vs Pinot vs ClickHouse — when does each one win?
Druid wins time-bucketed metric workloads where heavy pre-aggregation (rollup) collapses billions of events into millions of rolled-up rows — ad-tech, observability, network telemetry. Pinot wins user-facing dashboards where the latency budget is under 100 ms and queries hit indexed dimensions — LinkedIn-style personalised analytics, payments dashboards, marketplace operations. ClickHouse wins everything else — general analytics SQL, ad-hoc analyst exploration, anything that needs JOINs or window functions, and any team that wants the lowest-ops real-time OLAP option. The three are not interchangeable; the right answer to "which one" is always "for what query shape and what latency SLO?"
Practice on PipeCode
- Drill the real-time analytics practice library → for the time-bucketed, sub-second query patterns that show up in every Druid / Pinot / ClickHouse interview.
- Rehearse on streaming pipeline problems → when the interviewer wants Kafka ingest + windowing + state-management answers.
- Sharpen aggregation drills → for the rollup, star-tree, and AggregatingMergeTree muscle that pre-aggregation engines reward.
- Layer the data aggregation library → for the broader GROUP BY + cardinality + HLL surface.
- Stack the conditional aggregation library → for the FILTER + CASE patterns inside engine-specific SQL.
- For the broader surface, read top data engineering interview questions →.
- Stack the prerequisites with the only 5 skills you need to become a data engineer →.
- Sharpen the system-design axis with the ETL system design for data engineering interviews course →.
- For long-form schema craft, work through data modelling for DE interviews →.
Pipecode.ai is Leetcode for Data Engineering — every Druid / Pinot / ClickHouse recipe above ships with hands-on practice rooms where you write the rollup supervisor, the star-tree config, and the AggregatingMergeTree materialised view against real graded inputs. PipeCode pairs every reading with 450+ DE-focused problems and a real-time scoring engine, so you never have to wonder whether your design for a sub-second top-K dashboard actually scales the way you claimed.





Top comments (0)