ETL Testing Interview Questions & Answers — A Complete Deep-Dive Guide
etl testing interview questions sit at the intersection of seven testable disciplines: metadata testing and schema parity, data completeness testing and row count parity, transformation logic testing with reference tables and rule-based assertions, performance testing (concurrency, volume, stress, endurance), data reconciliation and tolerance thresholds, regression testing with snapshot diffs, and the data quality frameworks (Great Expectations, Soda Core, dbt tests, Monte Carlo) that operationalise the whole stack. Together those seven stages form the deep-guide curriculum every modern etl tester interview questions loop circles back to — and the curriculum this guide walks through, end to end, in seven numbered sections.
This is the deep-dive companion to a tighter Q&A round-up: where a 5-section interview cheat sheet covers the eight etl testing types and the four validation rule families, this guide widens the surface into seven full teaching sections — metadata + schema testing (the cheapest tests that catch the loudest bugs), data completeness + row count testing (raw counts, GROUP BY counts, aggregate parity, tolerance thresholds), transformation logic testing (reference tables, business-rule assertions, snapshot diffs, dbt audit_helper, datafold data-diff), performance + reconciliation + regression (the senior tester rounds), DQ frameworks (4-tool decision tree), and the ETL tester career path (behavioural questions, the etl tester to data engineer transition, certification ROI, interview-day playbook). Each section ends as etl testing questions and answers: a question, a SQL or Python snippet, a traced execution, a sample output, and a concept-by-concept why this works breakdown — the exact shape etl testing interview questions for experienced rounds reward.
When you want hands-on reps immediately after reading, browse SQL practice library →, drill data-validation problems →, sharpen ETL drills →, rehearse aggregation reconciliation patterns →, reinforce database problems →, or widen coverage on the full Python practice library →.
On this page
- Why ETL testing is its own interview track
- Metadata + schema testing — the cheapest tests that catch the loudest bugs
- Data completeness + row-count testing — count parity, group-by counts, aggregate parity
- Transformation logic testing — reference tables, rule-based assertions, snapshots
- Performance, reconciliation, regression — the senior tester rounds
- DQ frameworks — Great Expectations, Soda Core, dbt tests, Monte Carlo
- ETL tester career path + interview-day playbook
- Choosing the right ETL test (cheat sheet)
- Frequently asked questions
- Practice on PipeCode
1. Why ETL testing is its own interview track
etl testing interview questions — a distinct discipline from QA and pipeline engineering
The one-sentence invariant: etl testing is a distinct discipline because the bugs it catches — schema drift, row drops, transform errors, reconciliation gaps, regression breaks — are silent until they hit production reports, and the assertions that catch them are data-shaped, not *code-shaped*. A senior data-engineering QA engineer is not a generalist tester with SQL skills; they are a specialist who thinks in row count parity, aggregate reconciliation, null-rate thresholds, and schema drift detection, and who automates those checks as first-class artefacts in CI / CD.
What interviewers actually score on etl tester interview questions.
- Test-taxonomy fluency — can you map the seven stages of this guide (metadata, completeness, transformation, performance, reconciliation, regression, DQ) onto a real pipeline?
-
SQL fluency on
data quality testing— can you write aCOUNT(*),GROUP BY,SUM, andFULL OUTER JOINcheck in 60 seconds, on paper, with no editor? -
Tooling familiarity on
data quality frameworks— can you name one strength and one weakness ofGreat Expectations,Soda Core,dbt tests, andMonte Carlo? -
reconciliation testingmental model — given a source ledger of $1.2M and a target ledger of $1.199M, what's your investigation playbook? -
regression testingdiscipline — when you change a transform, what tests do you re-run, and why? -
Production-safety patterns —
idempotency,dead-letter queues,late-arriving data,slowly changing dimensions— do you handle them in test logic or only in code?
The 7-stage map this guide walks through.
-
Stage 1 —
metadata + schema testing— column existence, type match, nullability, PK / FK preserved,information_schemaqueries, schema-drift detection. -
Stage 2 —
data completeness testing— raw row counts,GROUP BYrow counts, aggregate parity (MIN / MAX / SUM / AVG), tolerance thresholds,FULL OUTER JOINtechnique. -
Stage 3 —
transformation logic testing— reference-table comparison, rule-based assertions, snapshot diffs,dbt audit_helper,datafold data-diff. -
Stage 4 —
performance testing— concurrency, volume, stress, endurance, SLA gates. -
Stage 5 —
data reconciliation— source vs target ledger, tolerance bands, drilldown queries. -
Stage 6 —
regression testing— snapshot-based comparison, PR-triggered re-runs, break-on-fail gates. -
Stage 7 —
DQ frameworks—Great Expectations,Soda Core,dbt tests,Monte Carlo— pick-the-tool decision tree.
Why this is its own interview track and not a SQL round.
-
data quality testingis not feature testing — the system under test is data, not code; assertions are statistical and set-based, not procedural. -
The bugs are silent — a dropped row in a JOIN does not throw an exception; only a
COUNT(*)parity test surfaces it. -
schema driftpropagates — an upstream column rename breaks 50 downstream models if no schema parity test runs on every PR. -
reconciliationis the senior round — junior testers runCOUNT(*)parity; senior testers reconcileSUM(amount) GROUP BY regionand explain a $1k delta. -
regression testingis the discipline gate — promoting a transform without re-running the prior suite is the bug that gets you fired; the discipline of always re-running the suite is the senior signal interviewers chase.
Worked example — map a single ETL bug onto all seven stages
Detailed explanation. Real interviews probe whether you can think across stages on a single bug. Below is a canonical failure mode — a daily orders load drops every row where currency IS NULL — and how it surfaces (or escapes) at each of the seven stages.
Question. A daily load of orders from the OLTP source into the fact_orders warehouse table silently drops every row where currency IS NULL because of an INNER JOIN against a dim_currency lookup. Which of the seven test stages catches it, and how?
Input. Source orders has 1,234,567 rows today. After load, fact_orders has 1,231,402 rows — 3,165 missing. The bug only surfaces in next-day reconciliation when finance notices revenue is $48k below the source ledger.
Code.
-- Stage 2 — Completeness check that catches it
SELECT
(SELECT COUNT(*) FROM stg_orders) AS source_rows,
(SELECT COUNT(*) FROM fact_orders) AS target_rows,
(SELECT COUNT(*) FROM stg_orders)
- (SELECT COUNT(*) FROM fact_orders) AS missing_rows;
Step-by-step explanation.
- Stage 1 (metadata) — does not catch it; the schema is unchanged.
-
Stage 2 (completeness) — catches it the moment
source_rows != target_rows; the raw row-count parity test fires. -
Stage 3 (transformation logic) — catches it if a
LEFT JOINsnapshot diff is run against yesterday'sfact_orders; the 3,165 missing rows show up as removed. - Stage 4 (performance) — does not catch it; the load actually completes faster because it processes fewer rows.
-
Stage 5 (reconciliation) — catches it when source
SUM(amount) = $1.200Mand targetSUM(amount) = $1.152M; the $48k delta exceeds the 0.01% tolerance band. -
Stage 6 (regression) — catches it if a prior PR introduced the bad
INNER JOIN; snapshot diff against the priormainbranch fires on the row-count delta. -
Stage 7 (DQ framework) —
dbt test row_count_matchorgreat_expectations expect_table_row_count_to_equal_other_tablecodifies the assertion so it runs on every load.
Output (the completeness query's result).
| source_rows | target_rows | missing_rows |
|---|---|---|
| 1234567 | 1231402 | 3165 |
Rule of thumb: one bug touches multiple stages. Senior testers think across the seven stages on every failure; junior testers fix the one stage that fired and miss the others.
etl testing interview questions for experienced — the four senior signals
Signal 1 — opinionated tooling choices. Senior testers do not say "all four DQ tools are good"; they say "I use dbt tests for column-level invariants because they live next to the model, Great Expectations for distribution checks because the 40+ built-in expectations are unmatched, and Monte Carlo for cross-cutting freshness alerting on > 100 tables."
Signal 2 — tolerance thresholds, not exact equality. Junior testers assert source_total = target_total. Senior testers assert ABS(source_total - target_total) / source_total < 0.0001 because floating-point arithmetic, late-arriving data, and rounding all introduce noise.
Signal 3 — regression-aware change discipline. Senior testers do not ship a transformation change without re-running the prior test suite on the new output; the discipline is every PR re-runs the full prior suite plus the new tests, and the CI gate is break-on-fail.
Signal 4 — incident reasoning, not just incident detection. When a test fails, junior testers report "the test failed". Senior testers report "the test failed because upstream dim_currency was repointed to a new source; here's the rollback PR, here's the 15-minute fix, and here's the new test that would have caught it one stage earlier".
SQL
Topic — etl
ETL testing drills
SQL
Topic — data-validation
Data validation practice
Solution Using a 7-stage test-coverage matrix
Code.
-- One canonical coverage matrix — every row maps a stage to a test artefact.
CREATE TABLE etl_test_coverage AS
SELECT * FROM (VALUES
(1, 'metadata', 'schema_parity_orders', 'information_schema.columns', 'every PR'),
(2, 'completeness', 'row_count_orders', 'COUNT(*) source vs target', 'every load'),
(2, 'completeness', 'group_by_count_orders_region', 'COUNT(*) GROUP BY region', 'every load'),
(3, 'transformation', 'fact_orders_rule_total', 'order_total = unit_price * qty - discount', 'every load'),
(4, 'performance', 'fact_orders_sla', 'load_seconds < 1800', 'every load'),
(5, 'reconciliation', 'fact_orders_amount_recon', 'ABS(SUM(amount) src - tgt) / src < 0.0001', 'every load'),
(6, 'regression', 'fact_orders_snapshot_diff', 'datafold diff vs prior snapshot', 'every PR'),
(7, 'dq_framework', 'fact_orders_ge_suite', 'great_expectations expectation_suite','every load')
) AS t(stage_id, stage_name, test_name, assertion, cadence);
Step-by-step trace.
| stage_id | stage_name | test_name | assertion | cadence |
|---|---|---|---|---|
| 1 | metadata | schema_parity_orders | information_schema.columns | every PR |
| 2 | completeness | row_count_orders | COUNT(*) source vs target | every load |
| 2 | completeness | group_by_count_orders_region | COUNT(*) GROUP BY region | every load |
| 3 | transformation | fact_orders_rule_total | order_total = unit_price * qty - discount | every load |
| 4 | performance | fact_orders_sla | load_seconds < 1800 | every load |
| 5 | reconciliation | fact_orders_amount_recon | ABS(SUM(amount) src - tgt) / src < 0.0001 | every load |
| 6 | regression | fact_orders_snapshot_diff | datafold diff vs prior snapshot | every PR |
| 7 | dq_framework | fact_orders_ge_suite | great_expectations expectation_suite | every load |
- Row 1 —
metadataruns on every PR; it's the cheapest gate and the first one to fire when an upstream column is renamed. - Rows 2-3 —
completenessruns on every load; raw counts plusGROUP BYcounts together catch missing partitions and skewed transforms. - Row 4 —
transformationcodifies one business rule; the same model has 5-15 rule assertions in production. - Row 5 —
performancegates on a 30-minute SLA; volume-load failures show up here long before reconciliation does. - Row 6 —
reconciliationis the senior assertion; a0.0001tolerance band tolerates floating-point noise but flags real $100+ deltas. - Row 7 —
regressionruns on every PR viadatafold data-diff; it catches transformation drift that completeness alone misses. - Row 8 — the DQ framework wraps every assertion so the cadence and ownership are uniform.
Output.
| stage_id | stage_name | test_name | cadence |
|---|---|---|---|
| 1 | metadata | schema_parity_orders | every PR |
| 2 | completeness | row_count_orders | every load |
| 3 | transformation | fact_orders_rule_total | every load |
| 4 | performance | fact_orders_sla | every load |
| 5 | reconciliation | fact_orders_amount_recon | every load |
| 6 | regression | fact_orders_snapshot_diff | every PR |
| 7 | dq_framework | fact_orders_ge_suite | every load |
Why this works — concept by concept:
- Stage coverage matrix — turns the 7-stage map into an auditable artefact; every test is owned by exactly one stage, so you can talk to coverage gaps in one query.
-
Cadence binding — pairs each test with its run cadence (
every PRvsevery load); senior testers explicitly assign cadence per assertion, not "run everything always". - Assertion column — codifies the predicate in plain SQL / English; interviewers love a tester who can recite the predicate, not just the test name.
-
Tolerance bands — reconciliation row uses
ABS(...) / src < 0.0001, never raw equality; this single decision separates senior testers from junior ones. -
Cost —
O(1)to read the coverage matrix; the actual tests areO(N)over the underlying tables but parallelisable across the seven stages.
2. Metadata + schema testing — the cheapest tests that catch the loudest bugs
metadata testing and schema parity — four invariants that gate every PR
metadata testing is the cheapest test you can write and the loudest bug-catcher in the stack: a 10-line query against information_schema.columns runs in milliseconds, and a single missing column or changed type is the bug that breaks fifty downstream dashboards. Every ETL pipeline should gate every PR on four invariants — column existence, type match, nullability, and PK / FK preservation — and schema drift detection is the discipline of running those four checks on every load.
The four schema parity invariants.
-
column existence— every column in the source contract exists in the target with the same name; renames break downstream SQL silently. -
type match—INT → INT,VARCHAR(50) → VARCHAR(50),TIMESTAMP → TIMESTAMP; an implicitINT → VARCHARcast loses ordering, aggregations, and date arithmetic. -
nullability match— if the source contract sayscustomer_id NOT NULLand the target storescustomer_id NULL, downstream joins will silently drop rows. -
PK / FK preservation— primary keys must be unique and non-null in the target; foreign keys must reference an existing key in the parent table.
information_schema.columns — the standard-library inspection query.
-- Source vs target schema parity
WITH src AS (
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_schema = 'staging' AND table_name = 'orders'
), tgt AS (
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_schema = 'warehouse' AND table_name = 'fact_orders'
)
SELECT
COALESCE(s.column_name, t.column_name) AS column_name,
s.data_type AS src_type, t.data_type AS tgt_type,
s.is_nullable AS src_null, t.is_nullable AS tgt_null,
CASE
WHEN s.column_name IS NULL THEN 'missing in source'
WHEN t.column_name IS NULL THEN 'missing in target'
WHEN s.data_type != t.data_type THEN 'type mismatch'
WHEN s.is_nullable != t.is_nullable THEN 'nullability mismatch'
ELSE 'ok'
END AS verdict
FROM src s
FULL OUTER JOIN tgt t USING (column_name)
WHERE verdict != 'ok';
-
FULL OUTER JOIN— surfaces both missing-in-source and missing-in-target in one pass. -
USING (column_name)— equi-join on the column name only; PostgreSQL syntax. -
verdictCASE — single-column summary; one row per anomaly. - Empty result = pass — zero rows means schema parity holds.
schema drift detection in production.
-
Snapshot the schema — persist a
schemastable keyed by(table_name, snapshot_ts, column_name, data_type, is_nullable)on every load. -
Diff against yesterday —
LEFT JOINtoday's snapshot against yesterday's; non-matching rows are drift events. - Alert on drift — page on-call for any unexpected schema change; whitelist intentional changes via PR-tracked allowlist.
-
The 2 a.m. bug — an upstream OLTP team renames
cust_id → customer_idon a Friday; without drift detection, your Monday dashboards are wrong and finance is upset.
Worked example — write the 4-rule schema parity loop in one SQL block
Detailed explanation. Real interviews ask you to write the full schema parity check from scratch, not just the row-count one. Below is the canonical four-rule loop that fits on a whiteboard.
Question. Write a single SQL query that surfaces every column-level violation between staging.orders and warehouse.fact_orders for all four schema parity invariants (column existence, type match, nullability match, PK preservation).
Input. Two information_schema.columns row-sets plus two information_schema.table_constraints row-sets for PK introspection.
Code.
WITH src_cols AS (
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_schema = 'staging' AND table_name = 'orders'
), tgt_cols AS (
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_schema = 'warehouse' AND table_name = 'fact_orders'
), col_diff AS (
SELECT
COALESCE(s.column_name, t.column_name) AS column_name,
CASE
WHEN s.column_name IS NULL THEN 'missing_in_source'
WHEN t.column_name IS NULL THEN 'missing_in_target'
WHEN s.data_type != t.data_type THEN 'type_mismatch'
WHEN s.is_nullable != t.is_nullable THEN 'nullability_mismatch'
ELSE 'ok'
END AS verdict
FROM src_cols s
FULL OUTER JOIN tgt_cols t USING (column_name)
), pk_diff AS (
SELECT 'pk_count_mismatch' AS verdict
WHERE (
SELECT COUNT(*) FROM information_schema.key_column_usage
WHERE table_name = 'orders' AND constraint_name LIKE '%pkey%'
) != (
SELECT COUNT(*) FROM information_schema.key_column_usage
WHERE table_name = 'fact_orders' AND constraint_name LIKE '%pkey%'
)
)
SELECT column_name, verdict FROM col_diff WHERE verdict != 'ok'
UNION ALL
SELECT NULL, verdict FROM pk_diff;
Step-by-step explanation.
-
src_colsandtgt_colsmaterialise the source and target column metadata. -
col_diffFULL OUTER JOINs them oncolumn_nameand emits one of five verdicts per column. -
pk_diffseparately compares PK column counts viainformation_schema.key_column_usage. - The final
UNION ALLreturns one row per violation across all four invariants. - An empty result set = pass; any rows = block the PR until reconciled.
Output (when a single type mismatch exists).
| column_name | verdict |
|---|---|
| amount | type_mismatch |
Rule of thumb: schema parity is the first test that runs in CI and the cheapest one to write — never skip it. Junior testers forget to add it; senior testers refuse to merge without it.
schema drift detection in modern stacks — dbt, Great Expectations, datafold
-
dbt source freshness+dbt test—dbt source freshnessflags stale upstreams; column-level tests (not_null,unique,accepted_values,relationships) run on everydbt build. -
Great Expectations—expect_table_columns_to_match_ordered_list— codifies the column contract so any added or removed column fails the suite. -
datafoldschema diff — surfaces schema deltas as part of every PR review; visualises type and nullability changes in the GitHub UI. -
Monte Carloschema observability — auto-detects schema changes across hundreds of tables; alerts on unexpected drift without you writing the check.
SQL
Topic — database
Database / schema drills
SQL
Topic — data-validation
Schema validation practice
Solution Using information_schema.columns + a frozen contract table
Code.
-- Freeze the contract in a versioned table; diff every load against it.
CREATE TABLE schema_contract_fact_orders AS
SELECT * FROM (VALUES
('order_id', 'integer', 'NO'),
('customer_id', 'integer', 'NO'),
('region', 'character varying','NO'),
('amount', 'numeric', 'NO'),
('order_ts', 'timestamp with time zone','NO')
) AS t(column_name, data_type, is_nullable);
SELECT
COALESCE(c.column_name, a.column_name) AS column_name,
c.data_type AS contract_type, a.data_type AS actual_type,
c.is_nullable AS contract_null, a.is_nullable AS actual_null,
CASE
WHEN a.column_name IS NULL THEN 'missing_in_actual'
WHEN c.column_name IS NULL THEN 'unexpected_in_actual'
WHEN c.data_type != a.data_type THEN 'type_drift'
WHEN c.is_nullable != a.is_nullable THEN 'nullability_drift'
ELSE 'ok'
END AS verdict
FROM schema_contract_fact_orders c
FULL OUTER JOIN (
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_schema = 'warehouse' AND table_name = 'fact_orders'
) a USING (column_name)
WHERE verdict != 'ok';
Step-by-step trace.
| column_name | contract_type | actual_type | verdict |
|---|---|---|---|
| amount | numeric | character varying | type_drift |
-
schema_contract_fact_ordersis the frozen contract — checked into git, owned by the test suite, never auto-updated. - The
FULL OUTER JOINagainstinformation_schema.columnssurfaces every drift between contract and reality. - The
verdictcolumn emits exactly one drift type per row; empty result = pass. - Any drift row fails the CI gate and pages the owning team.
Output.
| column_name | contract_type | actual_type | contract_null | actual_null | verdict |
|---|---|---|---|---|---|
| amount | numeric | character varying | NO | NO | type_drift |
Why this works — concept by concept:
- Frozen contract — the contract is a git artefact, not a probe; this makes drift a binary, auditable event rather than a fuzzy comparison.
-
FULL OUTER JOIN — catches both missing-in-actual (column dropped) and unexpected-in-actual (column added) in one pass;
LEFT JOINalone would miss additions. -
Verdict enum — one of five strings, never a free-text reason; downstream alerting can
WHERE verdict = 'type_drift'and route accordingly. -
Nullability matters — flipping a column from
NOT NULLtoNULLis silent in most ORMs; the contract catches it before downstream joins drop rows. -
Cost —
O(C)whereC= column count (typically 10-100); runs in milliseconds and is the cheapest gate in the suite.
3. Data completeness + row-count testing — count parity, group-by counts, aggregate parity
data completeness testing — three test families, one tolerance band
data completeness testing is the bedrock of ETL QA: every source row that should land in the target must actually land, and every aggregate computed downstream must match the source. There are exactly three test families — raw row count parity, GROUP BY row counts, and aggregate parity (MIN, MAX, SUM, AVG) — and one universal tolerance pattern that wraps all three.
Family 1 — row count parity.
SELECT
(SELECT COUNT(*) FROM staging.orders) AS src_rows,
(SELECT COUNT(*) FROM warehouse.fact_orders) AS tgt_rows,
(SELECT COUNT(*) FROM staging.orders)
- (SELECT COUNT(*) FROM warehouse.fact_orders) AS delta;
- The cheapest test in the suite — runs in milliseconds even on billion-row tables (count metadata is cached).
-
Catches the loudest bug — a
LEFT JOINthat should beINNER JOINand vice versa, a missing partition, aWHEREpredicate that drops rows. -
Tolerance — for non-incremental loads,
delta = 0is the gate; for incremental loads,delta = ROWS_IN_INCREMENTis the gate.
Family 2 — GROUP BY row counts.
SELECT
s.region,
COUNT(*) AS src_rows,
COALESCE(t.cnt, 0) AS tgt_rows,
COUNT(*) - COALESCE(t.cnt, 0) AS delta
FROM staging.orders s
LEFT JOIN (
SELECT region, COUNT(*) AS cnt
FROM warehouse.fact_orders
GROUP BY region
) t USING (region)
GROUP BY s.region, t.cnt
HAVING COUNT(*) - COALESCE(t.cnt, 0) != 0;
-
Catches skewed transforms — a transform that drops only EU rows passes
COUNT(*)parity within tolerance if EU is small, but fails per-region. -
Run on every grouping dimension —
region,currency,status,partition_date; one query per dimension is fine. - Empty result = pass — only mismatched groups show up.
Family 3 — aggregate parity (MIN, MAX, SUM, AVG).
SELECT
'amount' AS metric,
(SELECT SUM(amount)::numeric(18,4) FROM staging.orders) AS src_sum,
(SELECT SUM(amount)::numeric(18,4) FROM warehouse.fact_orders) AS tgt_sum,
ABS(
(SELECT SUM(amount) FROM staging.orders)
- (SELECT SUM(amount) FROM warehouse.fact_orders)
) /
NULLIF((SELECT SUM(amount) FROM staging.orders), 0) AS rel_delta;
-
Catches arithmetic transform bugs — a
CASTfromnumeric(18,4)tonumeric(18,2)quietly truncates pennies;SUMparity catches it. -
Tolerance band —
rel_delta < 0.0001(0.01%) is the canonical pass band; tighter for finance, looser for clickstream. -
Run for every numeric column —
MIN,MAX,SUM,AVG, andCOUNT(DISTINCT)per column. -
NULLIF(...)guards — protects against divide-by-zero when the source is empty.
The FULL OUTER JOIN reconciliation technique.
SELECT
COALESCE(s.order_id, t.order_id) AS order_id,
CASE
WHEN t.order_id IS NULL THEN 'missing_in_target'
WHEN s.order_id IS NULL THEN 'unexpected_in_target'
WHEN s.amount != t.amount THEN 'amount_mismatch'
ELSE 'ok'
END AS verdict
FROM staging.orders s
FULL OUTER JOIN warehouse.fact_orders t USING (order_id)
WHERE verdict != 'ok';
-
Surfaces every diverging row — not just the count, but the actual
order_ids. - Catches all three failure modes in one query — missing rows, extra rows, and value mismatches.
-
COALESCE(s.order_id, t.order_id)— works whether the row exists in source, target, or both. -
The investigation query — when
row_countparity fails, this is the second query you run.
Worked example — assemble the three completeness families into one SQL block
Detailed explanation. A real interview asks you to write the full completeness loop in one go, not just one family. Below is the canonical block that combines raw count, group-by count, and aggregate parity into a single CI gate.
Question. Write a single CI-friendly SQL block that gates fact_orders on (a) raw row count parity, (b) per-region GROUP BY row counts, and (c) SUM(amount) parity within a 0.01% tolerance.
Input. staging.orders (1,234,567 rows; SUM(amount) = 12,500,000.00) and warehouse.fact_orders (1,234,567 rows; SUM(amount) = 12,500,000.00).
Code.
WITH counts AS (
SELECT
(SELECT COUNT(*) FROM staging.orders) AS src,
(SELECT COUNT(*) FROM warehouse.fact_orders) AS tgt
), grp AS (
SELECT region, COUNT(*) AS delta
FROM (
SELECT region, COUNT(*) AS cnt FROM staging.orders GROUP BY region
) s
FULL OUTER JOIN (
SELECT region, COUNT(*) AS cnt FROM warehouse.fact_orders GROUP BY region
) t USING (region)
WHERE s.cnt IS DISTINCT FROM t.cnt
GROUP BY region
), amt AS (
SELECT
ABS(
(SELECT SUM(amount) FROM staging.orders)
- (SELECT SUM(amount) FROM warehouse.fact_orders)
) /
NULLIF((SELECT SUM(amount) FROM staging.orders), 0) AS rel_delta
)
SELECT
CASE WHEN (SELECT src FROM counts) != (SELECT tgt FROM counts) THEN 'FAIL row_count'
WHEN (SELECT COUNT(*) FROM grp) > 0 THEN 'FAIL group_by_count'
WHEN (SELECT rel_delta FROM amt) > 0.0001 THEN 'FAIL sum_amount'
ELSE 'PASS'
END AS verdict;
Step-by-step explanation.
-
countsmaterialises source and target row counts side by side. -
grpmaterialises every region whose per-group count diverges. -
amtmaterialises the relative delta ofSUM(amount)against tolerance. - The final
SELECTcollapses the three checks into a singlePASS / FAILverdict. - The CI gate is
WHERE verdict = 'PASS'— non-pass blocks the PR.
Output (when all three pass).
| verdict |
|---|
| PASS |
Rule of thumb: run all three families on every load; raw count is fast, GROUP BY count is slightly slower, aggregate parity is the most expensive but the most informative.
data completeness testing — the four senior nuances
-
tolerance thresholdselection —0.0001(0.01%) for finance,0.001(0.1%) for marketing,0.01(1%) for clickstream; never raw equality on floating-point. -
late-arriving data— incremental loads where today's count includes yesterday's late rows; the gate isdelta IN (expected_set), notdelta = 0. -
NULL handling in COUNT—COUNT(*)counts all rows,COUNT(col)skipsNULLs; senior testers state which one they mean. -
incremental vs full-refresh— incremental loads gate onpartition_count = expected_partition_count, not on total rows.
SQL
Topic — aggregation
Aggregation parity drills
SQL
Topic — etl
ETL completeness practice
Solution Using a FULL OUTER JOIN + tolerance band
Code.
WITH src AS (
SELECT order_id, amount FROM staging.orders
), tgt AS (
SELECT order_id, amount FROM warehouse.fact_orders
), diff AS (
SELECT
COALESCE(s.order_id, t.order_id) AS order_id,
s.amount AS src_amount,
t.amount AS tgt_amount,
CASE
WHEN t.order_id IS NULL THEN 'missing_in_target'
WHEN s.order_id IS NULL THEN 'unexpected_in_target'
WHEN ABS(s.amount - t.amount) / NULLIF(ABS(s.amount), 0) > 0.0001
THEN 'amount_drift'
ELSE 'ok'
END AS verdict
FROM src s
FULL OUTER JOIN tgt t USING (order_id)
)
SELECT verdict, COUNT(*) AS n
FROM diff
WHERE verdict != 'ok'
GROUP BY verdict;
Step-by-step trace.
| order_id | src_amount | tgt_amount | verdict |
|---|---|---|---|
| 1001 | 99.50 | 99.50 | ok |
| 1002 | 250.00 | NULL | missing_in_target |
| 1003 | 100.00 | 100.01 | ok (within 0.01%) |
| 1004 | 75.25 | 80.00 | amount_drift |
- The
FULL OUTER JOINwalks everyorder_idthat appears in either side. -
t.order_id IS NULLflags rows present in source but absent in target — the classic "row drop". -
s.order_id IS NULLflags rows present in target but absent in source — the classic "duplicate insert" or "stale row". -
ABS(s.amount - t.amount) / NULLIF(ABS(s.amount), 0) > 0.0001flags value drift outside the 0.01% tolerance. - The final
GROUP BY verdictaggregates the failures by type — one row per failure class.
Output.
| verdict | n |
|---|---|
| missing_in_target | 1 |
| amount_drift | 1 |
Why this works — concept by concept:
- FULL OUTER JOIN — only operator that surfaces missing-in-source and missing-in-target and value-drift in one pass.
-
Tolerance band —
0.0001relative delta is the industry-standard finance tolerance; protects against floating-point noise without masking real drift. -
NULLIF guard —
/ NULLIF(ABS(s.amount), 0)prevents divide-by-zero when the source amount is zero; the row falls into "ok" if both sides are zero, "drift" otherwise. -
Verdict enum — discrete failure classes (
missing_in_target,unexpected_in_target,amount_drift) feed clean alerting routes. -
Cost —
O((N + M) log (N + M))for the hash-join-or-sort plan; on 10M-row tables this completes in seconds.
4. Transformation logic testing — reference tables, rule-based assertions, snapshots
transformation logic testing — three patterns, every transform tested by at least one
transformation logic testing is the discipline that catches arithmetic and business-rule bugs that completeness tests cannot see. There are exactly three patterns — reference-table comparison (for fixed-output transforms), rule-based assertion (for business logic), and snapshot diff (for silent drift) — and every transform should be covered by at least one.
Pattern 1 — reference-table comparison.
-- expected_output is a frozen, hand-curated table of (input, expected) pairs.
SELECT
e.input_id,
e.expected_amount,
a.actual_amount,
CASE
WHEN ABS(e.expected_amount - a.actual_amount) > 0.01
THEN 'fail' ELSE 'pass'
END AS verdict
FROM expected_output e
LEFT JOIN actual_output a USING (input_id)
WHERE verdict = 'fail';
- Use case — fixed-output transforms where every input maps to a known output (currency rounding, tax calculation, status normalisation).
-
The contract —
expected_outputis a hand-curated git-tracked table; updated only via PR. - Pass criterion — zero rows returned.
- Failure mode caught — any transform-logic bug that changes the mapping; the strongest possible assertion when feasible.
Pattern 2 — rule-based assertion.
-- order_total must equal unit_price * qty - discount.
SELECT order_id, unit_price, qty, discount, order_total
FROM fact_orders
WHERE ABS(order_total - (unit_price * qty - discount)) > 0.01;
- Use case — business rules that hold for every row (order math, percentage calculations, tier assignments).
- The assertion — a SQL predicate that every row must satisfy; any returned row is a violation.
- Pass criterion — zero rows returned.
- Failure mode caught — rule violations introduced by upstream changes, late-arriving columns, or transform refactors.
Pattern 3 — snapshot diff.
-- Compare today's fact_orders against yesterday's snapshot.
SELECT
COALESCE(t.order_id, y.order_id) AS order_id,
t.amount AS today_amount,
y.amount AS yesterday_amount,
CASE
WHEN y.order_id IS NULL THEN 'new'
WHEN t.order_id IS NULL THEN 'dropped'
WHEN t.amount != y.amount THEN 'changed'
ELSE 'unchanged'
END AS verdict
FROM fact_orders t
FULL OUTER JOIN fact_orders_snapshot_yesterday y USING (order_id)
WHERE verdict IN ('dropped', 'changed');
- Use case — catching silent drift in transforms where neither a reference table nor a rule fully captures the logic.
- The contract — yesterday's run is the baseline; today's run must differ only in expected ways.
-
Pass criterion —
droppedandchangedcounts are within expected bands (or zero for immutable rows). - Failure mode caught — silent drift, especially after upstream OLTP changes or transform-version bumps.
dbt audit_helper — the dbt-native pattern.
-- dbt model: tests/audit/fact_orders_audit.sql
{{ audit_helper.compare_relations(
a_relation=ref('fact_orders'),
b_relation=ref('fact_orders_old'),
primary_key='order_id'
) }}
-
compare_relations— compares two dbt models row-by-row, column-by-column; surfaces every diff in a single result set. -
compare_column_values— narrower comparison, single-column. -
compare_queries— compares two arbitrary SQL queries; useful for testing model refactors. -
Ships with dbt-labs — install via
dbt deps, configure inpackages.yml, run viadbt build.
datafold data-diff — the standalone diff tool.
-
CLI tool —
data-diff postgresql://... postgresql://... -k order_id -c amount. - PR-integrated — comments on every GitHub PR with the row-level diff.
- Cross-database — diffs PostgreSQL ↔ Snowflake ↔ BigQuery row-by-row.
- Reach for it — when reviewing transform refactors; the diff in the PR replaces eyeballing 50 rows.
Worked example — write the three patterns for the same fact_orders model
Detailed explanation. Real interviews want to see that you can write all three transformation tests on the same model. Below is the canonical block.
Question. Write three tests for fact_orders: (a) a reference-table comparison for the status normalisation, (b) a rule-based assertion for order_total = unit_price * qty - discount, (c) a snapshot diff against yesterday.
Input. fact_orders (today's), fact_orders_snapshot_yesterday (yesterday's), and status_reference (a frozen mapping of raw status → normalised status).
Code.
-- (a) reference-table comparison
SELECT f.order_id, f.status_normalised, r.expected
FROM fact_orders f
LEFT JOIN status_reference r ON r.raw = f.status_raw
WHERE f.status_normalised IS DISTINCT FROM r.expected;
-- (b) rule-based assertion
SELECT order_id, unit_price, qty, discount, order_total
FROM fact_orders
WHERE ABS(order_total - (unit_price * qty - discount)) > 0.01;
-- (c) snapshot diff
SELECT
COALESCE(t.order_id, y.order_id) AS order_id,
CASE
WHEN y.order_id IS NULL THEN 'new'
WHEN t.order_id IS NULL THEN 'dropped'
WHEN t.amount != y.amount THEN 'changed'
ELSE 'unchanged'
END AS verdict
FROM fact_orders t
FULL OUTER JOIN fact_orders_snapshot_yesterday y USING (order_id)
WHERE verdict IN ('dropped', 'changed');
Step-by-step explanation.
-
(a) — joins
fact_ordersto a frozen reference; any row where the normalised status disagrees fails. -
(b) — re-derives
order_totalfrom the source columns; any row where the stored value diverges by more than 1 cent fails. -
(c) —
FULL OUTER JOINagainst yesterday; surfaces every dropped, new, or changed row. - Each query is a CI gate; an empty result is a pass.
Output (when all three pass).
| query | rows_returned | verdict |
|---|---|---|
| (a) reference | 0 | PASS |
| (b) rule | 0 | PASS |
| (c) snapshot | 0 | PASS |
Rule of thumb: every transform deserves at least one of the three; the strongest models have all three.
SQL
Topic — data-validation
Transformation logic drills
SQL
Topic — sql
SQL rule-assertion practice
Solution Using a combined rule-assertion + snapshot-diff pattern
Code.
-- Combined transformation-logic gate in one CTE chain.
WITH rule_violations AS (
SELECT order_id, 'rule_total_mismatch' AS verdict
FROM fact_orders
WHERE ABS(order_total - (unit_price * qty - discount)) > 0.01
), snapshot_diffs AS (
SELECT
COALESCE(t.order_id, y.order_id) AS order_id,
CASE
WHEN y.order_id IS NULL THEN 'snapshot_new'
WHEN t.order_id IS NULL THEN 'snapshot_dropped'
WHEN t.amount != y.amount THEN 'snapshot_changed'
ELSE 'unchanged'
END AS verdict
FROM fact_orders t
FULL OUTER JOIN fact_orders_snapshot_yesterday y USING (order_id)
)
SELECT order_id, verdict FROM rule_violations
UNION ALL
SELECT order_id, verdict FROM snapshot_diffs
WHERE verdict != 'unchanged';
Step-by-step trace.
| order_id | verdict |
|---|---|
| 1004 | rule_total_mismatch |
| 1008 | snapshot_changed |
| 1010 | snapshot_dropped |
-
rule_violationsemits every row where the business rule fails — one row per violation. -
snapshot_diffsemits every row whose state diverged from yesterday — one row per drift event. - The
UNION ALLflattens both into a single CI artefact; downstream alerting canWHERE verdict LIKE 'snapshot_%'orWHERE verdict LIKE 'rule_%'. - An empty result is the pass condition; any rows block the PR.
Output.
| order_id | verdict |
|---|---|
| 1004 | rule_total_mismatch |
| 1008 | snapshot_changed |
| 1010 | snapshot_dropped |
Why this works — concept by concept:
-
Rule re-derivation —
order_total = unit_price * qty - discountis re-computed from source columns; this is the strongest assertion because it doesn't trust the stored value. - Snapshot baseline — yesterday's snapshot is the truth until proven otherwise; today's run must be a known-good superset.
-
Verdict namespacing —
rule_*andsnapshot_*prefixes make alert routing trivial. -
UNION ALL — preferred over
UNIONhere because we want every row, not deduplicated; faster and simpler. -
Cost —
O(N)for the rule scan,O((N + M) log (N + M))for the snapshot join; both index-friendly onorder_id.
5. Performance, reconciliation, regression — the senior tester rounds
etl performance testing + data reconciliation + regression testing — the three senior rounds
The senior tester rounds are the three test families that distinguish a junior tester from a senior one — performance testing (does the pipeline finish inside its SLA?), data reconciliation (do source and target ledgers agree within tolerance?), and regression testing (does today's run still produce yesterday's expected output?). Every modern ETL suite ships all three; the etl testing interview questions for experienced loop probes all three.
performance testing — four sub-families.
-
volume testing— load the pipeline with 10x, 100x production data; the SLA must still hold. -
stress testing— load until something breaks; characterise the failure mode. -
concurrency testing— run N parallel loads; surface lock contention, deadlocks, slot exhaustion. -
endurance testing— run the pipeline continuously for 24 hours; surface memory leaks and slow degradation.
Performance gate in SQL.
-- Persist load metrics every run; gate on the SLA.
INSERT INTO pipeline_metrics (run_id, started_at, ended_at, row_count, sla_seconds)
VALUES (...);
SELECT
run_id,
EXTRACT(EPOCH FROM (ended_at - started_at)) AS load_seconds,
sla_seconds,
CASE
WHEN EXTRACT(EPOCH FROM (ended_at - started_at)) > sla_seconds
THEN 'FAIL' ELSE 'PASS'
END AS verdict
FROM pipeline_metrics
WHERE run_id = :current_run;
-
load_seconds— wall-clock time from start to end of load. -
sla_seconds— the contract; typically 30 minutes for hourly loads, 4 hours for daily. -
pipeline_metrics— persisted table; trend analysis over weeks shows slow regression. -
Alert on trend — page on-call when
load_secondsexceeds0.8 * sla_secondsfor 3 consecutive runs.
data reconciliation — the source ledger / target ledger pattern.
WITH src AS (
SELECT region, SUM(amount) AS total FROM staging.orders GROUP BY region
), tgt AS (
SELECT region, SUM(amount) AS total FROM warehouse.fact_orders GROUP BY region
)
SELECT
COALESCE(s.region, t.region) AS region,
s.total AS src_total,
t.total AS tgt_total,
ABS(s.total - t.total) AS abs_delta,
ABS(s.total - t.total) / NULLIF(ABS(s.total), 0) AS rel_delta,
CASE
WHEN ABS(s.total - t.total) / NULLIF(ABS(s.total), 0) > 0.0001
THEN 'FAIL' ELSE 'PASS'
END AS verdict
FROM src s FULL OUTER JOIN tgt t USING (region);
-
Source ledger —
SUM(amount) GROUP BY regionon staging. -
Target ledger —
SUM(amount) GROUP BY regionon warehouse. -
abs_delta— raw dollar / unit difference; useful for the alerting payload. -
rel_delta— relative difference; the actual gate. -
Tolerance band —
0.0001(0.01%); tighter for finance, looser for analytics.
regression testing — snapshot-based comparison + PR triggers.
-- Snapshot the model on every PR merge; diff every new PR against the latest snapshot.
SELECT
COALESCE(c.order_id, b.order_id) AS order_id,
CASE
WHEN b.order_id IS NULL THEN 'regression_new'
WHEN c.order_id IS NULL THEN 'regression_dropped'
WHEN c.amount != b.amount THEN 'regression_amount_change'
ELSE 'unchanged'
END AS verdict
FROM fact_orders_candidate c
FULL OUTER JOIN fact_orders_baseline b USING (order_id)
WHERE verdict != 'unchanged';
-
fact_orders_candidate— the model output under the new PR. -
fact_orders_baseline— the model output undermain. - PR gate — non-empty result blocks merge; reviewer must approve the deltas explicitly.
-
break-on-fail— CI fails the build the moment regression rows appear; senior testers never override.
Worked example — assemble a performance + reconciliation + regression suite in Python
Detailed explanation. Real senior loops ask you to wire all three into a single Python harness that runs on every PR. Below is the canonical Python orchestration.
Question. Write a Python script that runs (a) the performance SLA gate, (b) the per-region reconciliation, and (c) the regression diff against main, and exits non-zero on any failure.
Input. A psycopg2 connection to the warehouse, a current run_id, and the baseline snapshot table name.
Code.
import psycopg2, sys, time
conn = psycopg2.connect(host="warehouse", database="prod")
cur = conn.cursor()
failures = []
# (a) performance SLA
cur.execute("""
SELECT EXTRACT(EPOCH FROM (ended_at - started_at)), sla_seconds
FROM pipeline_metrics WHERE run_id = %s
""", (run_id,))
load_s, sla_s = cur.fetchone()
if load_s > sla_s:
failures.append(f"perf: {load_s:.0f}s > sla {sla_s}s")
# (b) reconciliation
cur.execute("""
SELECT COUNT(*) FROM (
SELECT region,
ABS(SUM(s.amount) - SUM(t.amount))
/ NULLIF(ABS(SUM(s.amount)), 0) AS rel
FROM staging.orders s
FULL OUTER JOIN warehouse.fact_orders t USING (region)
GROUP BY region
HAVING ABS(SUM(s.amount) - SUM(t.amount))
/ NULLIF(ABS(SUM(s.amount)), 0) > 0.0001
) x
""")
recon_failures = cur.fetchone()[0]
if recon_failures > 0:
failures.append(f"recon: {recon_failures} regions outside tolerance")
# (c) regression
cur.execute("""
SELECT COUNT(*) FROM fact_orders_candidate c
FULL OUTER JOIN fact_orders_baseline b USING (order_id)
WHERE b.order_id IS NULL OR c.order_id IS NULL OR c.amount != b.amount
""")
reg_failures = cur.fetchone()[0]
if reg_failures > 0:
failures.append(f"regression: {reg_failures} rows diverge from baseline")
if failures:
for f in failures:
print(f"FAIL: {f}")
sys.exit(1)
print("ALL PASS")
Step-by-step explanation.
- The script collects
failuresinstead of short-circuiting; this surfaces all failing rounds per run, not just the first. - (a) compares wall-clock against SLA; one row per run.
- (b) counts regions outside the 0.01% reconciliation band; one row per region.
- (c) counts every regression row in the snapshot diff; one row per drift.
- The script exits non-zero only when at least one round fails; CI gates on the exit code.
Output (when one reconciliation region fails).
FAIL: recon: 1 regions outside tolerance
exit code: 1
Rule of thumb: always run all three rounds on every load; never short-circuit on the first failure — collect them all and report once.
SQL
Topic — aggregation
Reconciliation aggregation drills
Python
Topic — etl
Regression + perf Python practice
Solution Using a triple-gate harness + non-zero exit on any failure
Code.
def run_triple_gate(cur, run_id: int) -> int:
"""Returns 0 if all three rounds pass, 1 otherwise."""
failures: list[str] = []
# 1. Performance
cur.execute(
"SELECT EXTRACT(EPOCH FROM (ended_at - started_at)), sla_seconds "
"FROM pipeline_metrics WHERE run_id = %s",
(run_id,),
)
load_s, sla_s = cur.fetchone()
if load_s > sla_s:
failures.append(f"perf: {load_s:.0f}s > sla {sla_s}s")
# 2. Reconciliation
cur.execute("""
SELECT COUNT(*) FROM (
SELECT region
FROM staging.orders s FULL OUTER JOIN warehouse.fact_orders t USING (region)
GROUP BY region
HAVING ABS(SUM(s.amount) - SUM(t.amount))
/ NULLIF(ABS(SUM(s.amount)), 0) > 0.0001
) x
""")
if (recon := cur.fetchone()[0]) > 0:
failures.append(f"recon: {recon} regions out of tolerance")
# 3. Regression
cur.execute("""
SELECT COUNT(*) FROM fact_orders_candidate c
FULL OUTER JOIN fact_orders_baseline b USING (order_id)
WHERE b.order_id IS NULL OR c.order_id IS NULL OR c.amount != b.amount
""")
if (reg := cur.fetchone()[0]) > 0:
failures.append(f"regression: {reg} rows diverge from baseline")
for f in failures:
print(f"FAIL: {f}")
return 1 if failures else 0
Step-by-step trace.
| step | check | observed | gate | verdict |
|---|---|---|---|---|
| 1 | perf | load_s=1700, sla_s=1800 | load_s <= sla_s | PASS |
| 2 | recon | regions_failing=0 | == 0 | PASS |
| 3 | regression | rows_diverging=3 | == 0 | FAIL |
| return | non-zero | failures = [regression] | exit 1 |
- Step 1 reads
pipeline_metrics; load completed in 1700s < 1800s SLA — PASS. - Step 2 counts regions outside the 0.01% tolerance band; zero — PASS.
- Step 3 counts regression rows against baseline; three rows — FAIL.
- The function appends to
failures, prints each one, and returns1to signal CI failure. - The CI gate is
if run_triple_gate(...) != 0: sys.exit(1).
Output.
FAIL: regression: 3 rows diverge from baseline
return: 1
Why this works — concept by concept:
- Triple-gate collection — collects all failures per run, never short-circuits; this is the senior pattern because it surfaces correlated bugs (perf + recon together usually = a transform that runs longer because it's processing extra rows).
-
Tolerance bands everywhere —
0.0001on reconciliation,<= sla_secondson performance, exact equality on regression (transforms must be deterministic). -
Walrus operator (
:=) — captures the count into a variable and tests it in one line; idiomatic Python 3.8+. -
Non-zero exit — the CI contract;
0 = pass,non-zero = fail. The pipeline blocks until fixed. - Cost — each check is a single round-trip to the warehouse; total runtime ~1-3s on top of the load itself; negligible.
6. DQ frameworks — Great Expectations, Soda Core, dbt tests, Monte Carlo
data quality frameworks — four tools, one decision tree
The data quality frameworks market has converged on four production-grade tools: dbt tests (the SQL-native default), Great Expectations (the most expressive), Soda Core (the most analyst-friendly), and Monte Carlo (the leading paid observability platform). Every senior etl tester interview questions loop asks "which would you pick, and why?" The answer is never "all four are great" — it's a decision tree based on stack + team skillset + table count.
Tool 1 — dbt tests (built-in + dbt-expectations).
# models/marts/fact_orders.yml
version: 2
models:
- name: fact_orders
columns:
- name: order_id
tests: [not_null, unique]
- name: customer_id
tests:
- not_null
- relationships:
to: ref('dim_customer')
field: customer_id
- name: amount
tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
max_value: 1000000
-
Strengths — co-located with the model; runs as part of
dbt build; integrates withmanifest.jsonand CI; the strongest default if you already use dbt. -
Weaknesses — limited to column-level assertions; weak on distribution and statistical tests (use
dbt-expectationsto extend). - Pick it when — your transformation layer is dbt; you want tests as code, in the same repo as the model.
-
Built-in tests —
not_null,unique,accepted_values,relationships; four assertions cover ~70% of column invariants.
Tool 2 — Great Expectations.
import great_expectations as ge
context = ge.get_context()
suite = context.add_or_update_expectation_suite("fact_orders_suite")
batch = context.sources.add_postgres("warehouse")\
.add_table_asset("fact_orders")\
.build_batch_request()
validator = context.get_validator(batch_request=batch, expectation_suite_name="fact_orders_suite")
validator.expect_column_values_to_not_be_null("order_id")
validator.expect_column_values_to_be_unique("order_id")
validator.expect_column_values_to_be_between("amount", min_value=0, max_value=1_000_000)
validator.expect_column_distinct_values_to_be_in_set("region", ["US","EU","APAC","LATAM"])
validator.expect_column_mean_to_be_between("amount", min_value=50, max_value=500)
- Strengths — 50+ built-in expectations covering null, uniqueness, range, regex, distribution, statistical checks; auto-generates HTML Data Docs.
- Weaknesses — heavier setup; Python-first (not SQL-first); steeper learning curve than dbt tests.
- Pick it when — you need distribution / statistical assertions, or you're outside the dbt world.
-
Key concept —
ExpectationSuite— a versioned, JSON-serialised contract.
Tool 3 — Soda Core.
# soda/fact_orders.yml
checks for fact_orders:
- row_count > 0
- missing_count(order_id) = 0
- duplicate_count(order_id) = 0
- min(amount) >= 0
- max(amount) <= 1000000
- failed rows:
name: status_in_set
fail query: |
SELECT * FROM fact_orders
WHERE status NOT IN ('paid','pending','refunded')
- Strengths — YAML-first, English-like assertions; fast to onboard non-engineers; SQL fallback for custom checks.
- Weaknesses — smaller built-in library than Great Expectations; less mature ecosystem.
- Pick it when — your team includes analysts who'll author tests; you want tests outside the dbt repo.
- Cloud add-on — Soda Cloud adds incident triage + ownership routing on top of the open-source core.
Tool 4 — Monte Carlo (observability platform).
- Strengths — auto-detects schema changes, freshness anomalies, volume anomalies, and field-level health across 100s of tables with zero config; lineage-aware alerting.
- Weaknesses — paid SaaS only; per-table licensing; not a write-tests-in-code platform.
- Pick it when — you have > 100 tables, multiple teams, and need reactive alerting on unknown failure modes.
- The complement — Monte Carlo sits above dbt tests + Great Expectations, not instead of them; it observes outputs, others assert invariants.
The 4-tool decision tree.
-
You use dbt and want tests next to models →
dbt tests(+dbt-expectationsfor advanced). -
You need distribution / statistical checks →
Great Expectations. -
Analysts will author the tests →
Soda Core. -
You have > 100 tables and need observability on unknowns →
Monte Carlo(on top of one of the others). -
Modern stacks combine —
dbt testsfor column invariants +Great Expectationsfor distributions +Monte Carlofor observability.
dbt audit_helper and datafold data-diff — the diff-tool category
-
dbt audit_helper—compare_relations,compare_column_values,compare_queries; native to dbt. -
datafold data-diff— CLI + GitHub-bot; cross-database row-level diff; killer for PR review. -
Pick
dbt audit_helperfor in-dbt refactor testing. -
Pick
datafoldwhen reviewing PRs across heterogeneous warehouses or wanting GitHub-comment UX.
The build-vs-buy choice — open source vs paid
-
Build (OS) —
dbt tests+Great Expectations+Soda Corecost zero in licensing, all developer time; full control. -
Buy (SaaS) —
Monte Carlo,Bigeye,Anomalo,Lightup; pay per table or per ingestion volume; faster time-to-value at scale. - The hybrid (most common) — open-source tools for write-tests-in-code invariants + one SaaS observability tool for cross-cutting alerting.
- Interview signal — name your stack's actual mix; "we use dbt tests + Great Expectations + Monte Carlo" is a senior answer.
SQL
Topic — data-validation
DQ framework practice
SQL
Topic — etl
ETL DQ drills
Solution Using a tool-selection matrix
Code.
-- Materialise the decision tree as a query you can paste into any architecture doc.
CREATE TABLE dq_tool_choice AS
SELECT * FROM (VALUES
('dbt tests', 'open source', 'column invariants', 'use dbt', 'low'),
('dbt-expectations', 'open source', 'extended dbt assertions', 'use dbt + need range/regex/distribution', 'low'),
('Great Expectations','open source', 'distribution + statistical', 'need 50+ expectation types', 'medium'),
('Soda Core', 'open source', 'analyst-authored checks', 'analysts author tests', 'low'),
('Monte Carlo', 'SaaS', 'observability + lineage', '>100 tables, multi-team', 'high'),
('Datafold', 'SaaS', 'PR-gated data-diff', 'reviewing transform PRs', 'medium'),
('Bigeye / Anomalo', 'SaaS', 'anomaly detection', 'analytics team, low-code', 'high')
) AS t(tool, licensing, primary_strength, pick_when, setup_cost);
Step-by-step trace.
| tool | licensing | primary_strength | pick_when |
|---|---|---|---|
| dbt tests | open source | column invariants | use dbt |
| Great Expectations | open source | distribution + statistical | need 50+ expectation types |
| Soda Core | open source | analyst-authored checks | analysts author tests |
| Monte Carlo | SaaS | observability + lineage | >100 tables, multi-team |
| Datafold | SaaS | PR-gated data-diff | reviewing transform PRs |
- Row 1 —
dbt testsis always the cheapest path when you already run dbt. - Row 3 —
Great Expectationsshines for distribution / statistical checks the dbt core can't express. - Row 4 —
Soda Coreis the YAML-first, English-like option for analyst teams. - Row 5 —
Monte Carlois the paid layer on top of the others; pick it when unknowns dominate. - Row 6 —
Datafoldis the PR-review tool; complements but doesn't replace the others.
Output.
| tool | licensing | pick_when |
|---|---|---|
| dbt tests | open source | use dbt |
| Great Expectations | open source | need distribution / statistical |
| Soda Core | open source | analysts author tests |
| Monte Carlo | SaaS | >100 tables, multi-team |
| Datafold | SaaS | PR-gated diff |
Why this works — concept by concept:
- Decision matrix — turns a vague "which tool?" into a one-row lookup; interviewers love a candidate who has internalised the tradeoffs as data, not opinion.
- Primary strength column — each row codifies the job the tool is best at; avoids the "all four are good" answer.
-
Setup cost — surfaces the real onboarding cost; senior testers weigh
setup_costagainst table count and team size. - Open source vs SaaS — the licensing column is the budget gate; some orgs cannot adopt SaaS at all.
-
Cost —
O(1)to read the matrix; the actual tools have their own runtime costs but those don't belong in the selection step.
7. ETL tester career path + interview-day playbook
etl tester career path — three rungs and a transition
The etl tester to data engineer transition is the most-asked career question in this interview track. The market splits roles into three rungs — junior ETL tester (0-2 years; manual SQL queries, ad-hoc reconciliation), senior ETL tester (3-7 years; owns the test suite, picks tools, mentors juniors), and data engineer / analytics engineer (the transition target; owns the pipeline plus the tests). The transition path is well-trodden and the interview-day playbook below is the canonical preparation framework.
Rung 1 — junior ETL tester (0-2 years).
- Daily work — run pre-written SQL queries; reconcile source vs target; file Jira tickets on failures.
- Tools mastered — SQL, Excel, Jira, one BI tool (Tableau / Power BI).
- Comp band (US) — $60-90K base.
- Interview focus — the eight test types, basic SQL (joins, GROUP BY, window functions), 1-2 DQ tool names.
Rung 2 — senior ETL tester (3-7 years).
- Daily work — design the test suite for new pipelines; pick the DQ tool stack; mentor juniors; own CI gates.
- Tools mastered — SQL, Python (pytest + pandas + requests), dbt tests, Great Expectations or Soda Core, Airflow basics.
- Comp band (US) — $90-140K base.
- Interview focus — opinionated tooling answers, the seven stages, reconciliation tolerance bands, regression discipline.
Rung 3 — data engineer / analytics engineer (the transition).
- Daily work — own pipelines end-to-end (ingest + transform + test + deploy); run on-call.
- Tools mastered — SQL, Python, dbt (models + tests + macros), Airflow / Dagster (DAGs + sensors), one warehouse deeply (Snowflake / BigQuery / Redshift), one cloud (AWS / GCP / Azure).
- Comp band (US) — $130-220K base (much higher at FAANG / fintech).
- The transition criterion — you ship a production pipeline you own (not just a test suite); promote yourself by demonstrating it.
The etl tester to data engineer transition recipe.
-
Step 1 — deepen Python beyond
pytestintopandas,requests,SQLAlchemy; ship a 200-line Python script that does ingest + transform + load. - Step 2 — own a dbt project end-to-end; models + tests + docs + CI; check it into a public GitHub repo.
- Step 3 — add an orchestrator (Airflow or Dagster); ship a DAG with sensors, retries, and SLAs.
- Step 4 — learn one warehouse deeply; performance-tune one model, document the before/after, write it up.
- Step 5 — add a cloud certification (AWS Data Engineer Associate, GCP PDE, Azure DP-203); helps clear ATS filters at larger orgs.
-
Step 6 — apply for
analytics engineerroles first; the role bridges testing + transformation and is the natural next step.
Certification ROI — the honest answer.
- Highest ROI — a public portfolio (GitHub repo with dbt project, GE suite, Airflow DAG); demonstrably > any cert.
- Mid ROI — dbt Analytics Engineering Certification (free); AWS Certified Data Engineer Associate; GCP Professional Data Engineer; Azure DP-203.
- Lower ROI but adjacent — ISTQB Foundation / Advanced (signals test-design fluency; not data-specific).
- The bar — at smaller / mid-size orgs, your portfolio matters more; at FAANG / large enterprises, a cert clears the ATS filter before the portfolio matters.
- Rule of thumb — pick one cert that matches the cloud / warehouse stack you're targeting; spend the rest of the budget on shipping a portfolio.
etl testing interview questions — interview-day playbook
The day-of preparation checklist.
- The 90-second pitch — "I'm an ETL tester transitioning into data engineering. I've shipped (X dbt models / Y GE suites / one production Airflow DAG); the bug I'm proudest of catching was (Z)."
- The seven-stage map — be able to draw it on a whiteboard from memory; this is the structural answer to most behavioural prompts.
- One bug story — pick a real bug you caught; have the source data, the failing test query, the fix, and the new regression test ready as a 3-minute story.
- One opinionated tooling answer — "I prefer dbt tests + Great Expectations over Soda Core because (X)"; never say "all four are good".
-
Tolerance band answers — when asked "how do you compare two amounts?", say "
ABS(a - b) / NULLIF(ABS(a), 0) < 0.0001, never raw equality"; this is the senior signal.
Behavioural questions and the STAR-with-data spin.
- "Tell me about a bug you caught." — STAR + the actual SQL query that surfaced it; bring the predicate.
- "Tell me about a time you missed a bug." — credibility comes from owning misses; describe the test that would have caught it.
- "How do you keep your test suite from becoming flaky?" — discuss tolerance bands, idempotency, deterministic transforms, snapshot baselines.
- "How do you onboard a new pipeline?" — walk the seven-stage map; assert all seven gates on day one.
-
"How would you test late-arriving data?" —
delta IN (expected_set)rather thandelta = 0; partitioned snapshots; tolerance windows that widen for older partitions.
Coding-round prep.
- SQL — practice the four-rule schema parity loop, the three-family completeness loop, the rule-based assertion, and the snapshot diff until you can write each on paper in 60 seconds.
- Python — practice the triple-gate harness (perf + recon + regression) from section 5; this is the canonical senior coding question.
-
System design — practice "design the test suite for a new
fact_orderspipeline" — walk the seven stages, name tools per stage, draw the CI gates.
Final-round signals interviewers chase.
- You name tradeoffs, not just tools — every tool has a strength and a weakness, name both.
-
You quote tolerance bands, not exact equality — finance:
0.0001, marketing:0.001, clickstream:0.01. -
You re-derive in your assertions —
order_total = unit_price * qty - discount, not just "trust the stored value". - You have a regression story — every PR re-runs the prior suite; you never short-circuit.
- You can talk to ownership — who owns the test, who owns the alert, who owns the runbook; senior testers always know.
SQL
Topic — etl
ETL interview-day drills
SQL
Topic — sql
SQL practice library
Solution Using a six-step transition roadmap + day-of checklist
Code.
-- Materialise the career path as a roadmap table; review weekly.
CREATE TABLE etl_to_de_roadmap AS
SELECT * FROM (VALUES
(1, 'Python beyond pytest', 'pandas + requests + SQLAlchemy', 'ship a 200-line ingest+transform+load script'),
(2, 'Own a dbt project', 'models + tests + docs + CI', 'public GitHub repo, README + screenshots'),
(3, 'Add an orchestrator', 'Airflow or Dagster', 'DAG with sensors + retries + SLA'),
(4, 'Deep-dive one warehouse','Snowflake or BigQuery or Redshift','perf-tune one model, document before/after'),
(5, 'One cloud cert', 'AWS DEA / GCP PDE / Azure DP-203','clears ATS at larger orgs'),
(6, 'Apply analytics engineer first','bridges testing + transformation','natural next step from senior tester')
) AS t(step, milestone, tools, artefact);
Step-by-step trace.
| step | milestone | tools | artefact |
|---|---|---|---|
| 1 | Python beyond pytest | pandas + requests + SQLAlchemy | 200-line ingest+transform+load script |
| 2 | Own a dbt project | models + tests + docs + CI | public GitHub repo |
| 3 | Add an orchestrator | Airflow or Dagster | DAG with sensors + retries + SLA |
| 4 | Deep-dive one warehouse | Snowflake / BigQuery / Redshift | perf-tune one model, document |
| 5 | One cloud cert | AWS DEA / GCP PDE / Azure DP-203 | clears ATS |
| 6 | Apply analytics engineer first | bridges testing + transformation | natural next step |
- Step 1 — Python skills beyond
pytestare the floor; withoutpandas+requests, you're a tester, not an engineer. - Step 2 — a public dbt project is the strongest single artefact you can ship for the transition.
- Step 3 — an orchestrator DAG turns "I ran the script" into "I own the pipeline".
- Step 4 — depth on one warehouse beats breadth across five; pick the one your target companies use.
- Step 5 — one cert clears the ATS at larger orgs; more than one is diminishing returns.
- Step 6 —
analytics engineeris the bridging role; apply there first.
Output.
| step | milestone | artefact |
|---|---|---|
| 1 | Python beyond pytest | 200-line script |
| 2 | Own a dbt project | public repo |
| 3 | Add an orchestrator | DAG with SLAs |
| 4 | Deep-dive one warehouse | perf-tune writeup |
| 5 | One cloud cert | ATS clear |
| 6 | Apply analytics engineer | natural next step |
Why this works — concept by concept:
- Artefact-driven — every step produces a thing you can put in a portfolio; "I learned X" is weaker than "here is the GitHub link".
- Sequenced — Python before dbt before Airflow; the order matters because each step builds on the previous.
- Pick one warehouse + one cloud — depth over breadth; interviewers respect depth.
-
Analytics engineer first — the bridging role is easier to land than full
data engineer; once you're an AE, the DE jump is internal. - Cost — 6-12 months of nights-and-weekends to complete the roadmap; faster if your day job overlaps any step.
Choosing the right ETL test (cheat sheet)
A one-screen cheat sheet for etl testing interview questions — pick the test that matches the failure mode you're worried about.
| You want to catch … | Test family | Canonical query / tool | Cadence |
|---|---|---|---|
| Renamed / dropped column | metadata + schema parity |
information_schema.columns + frozen contract |
every PR |
Type drift (INT → VARCHAR) |
metadata + schema parity | information_schema.columns |
every PR |
| Missing rows / row drops | row count parity | COUNT(*) source vs target |
every load |
| Skewed transforms (one region drops) |
GROUP BY row count |
COUNT(*) GROUP BY region |
every load |
| Arithmetic transform bugs | aggregate parity |
SUM / MIN / MAX / AVG with tolerance |
every load |
| Status normalisation bugs | reference-table comparison | LEFT JOIN expected_output |
every load |
| Business-rule violations | rule-based assertion | WHERE order_total != unit_price * qty - discount |
every load |
| Silent drift (no rule, no reference) | snapshot diff | FULL OUTER JOIN yesterday's snapshot |
every load |
| SLA breach (load too slow) | performance gate | load_seconds < sla_seconds |
every load |
| Source vs target ledger drift | reconciliation |
SUM(amount) GROUP BY region with 0.0001 tol |
every load |
| Regression on PR merge | regression diff | FULL OUTER JOIN candidate vs baseline |
every PR |
| Distribution shift on a column | DQ framework (statistical) | Great Expectations expect_column_mean_to_be_between |
every load |
| Cross-table observability | DQ framework (observability) |
Monte Carlo, Bigeye, Anomalo
|
continuous |
| PR-gated row-level diff | data-diff tool |
dbt audit_helper or datafold data-diff
|
every PR |
| Late-arriving data tolerance | partitioned snapshot diff | delta IN (expected_set) |
every load |
Frequently asked questions
How is this deep-dive different from a quick ETL testing Q&A round-up?
A quick etl testing interview questions cheat sheet covers the eight etl testing types, the four validation rule families, and the canonical eight-table SQL loop in one sitting — perfect for last-minute review. This deep-dive walks the same territory at 7 numbered teaching stages (metadata + schema, completeness + row count, transformation logic, performance + reconciliation + regression, DQ frameworks, ETL tester career path), with full worked examples, tolerance-band derivations, and tool-by-tool tradeoffs. Pick the deep-dive when you have a week to prepare, want to teach the material in a senior loop, or need the seven-stage map memorised. Pick the cheat-sheet round-up the night before. The two formats are complements, not duplicates — same topic, different depth and structure.
What's the canonical seven-stage map of ETL testing?
The seven stages are: (1) metadata + schema testing — column existence, type match, nullability, PK / FK; (2) completeness + row-count testing — raw COUNT(*), GROUP BY counts, aggregate parity with tolerance; (3) transformation logic testing — reference-table comparison, rule-based assertion, snapshot diff; (4) performance testing — volume, stress, concurrency, endurance vs SLA; (5) data reconciliation — source vs target ledger with 0.0001 tolerance band; (6) regression testing — snapshot-based diff against main, run on every PR; (7) DQ frameworks — dbt tests, Great Expectations, Soda Core, Monte Carlo. Memorise the seven; every senior interview-day question maps to one of them, and most bugs touch three or four at once. The seven-stage map is the structural answer interviewers reward when they ask "walk me through your test suite".
What tolerance band should I use for ETL reconciliation testing?
The industry default is 0.0001 (0.01%) relative tolerance for financial reconciliation; that is, ABS(src - tgt) / NULLIF(ABS(src), 0) < 0.0001. Use 0.001 (0.1%) for marketing / engagement metrics where late-arriving data and minor rounding are acceptable. Use 0.01 (1%) for clickstream / event-stream data where high-cardinality late events are normal. Never use raw equality on floating-point aggregates — floating-point arithmetic, currency conversions, and timing windows all introduce tiny noise that masks no real bug. Tighten the band on financial models (some teams go to 0.00001 on tax + GAAP-reported numbers); loosen it on analytical pipelines. The canonical interview answer is "I use a 0.01% relative tolerance band on reconciliation, with NULLIF to guard against divide-by-zero, and I tighten or loosen by sector" — that single sentence is the senior signal.
How does etl tester differ from data engineer on the org chart?
ETL tester owns the assertions, the reconciliation queries, the CI gates, and the bug-triage runbook. Data engineer owns the pipeline itself — the ingestion, the transformation, the deployment, the on-call rotation — plus collaborates with the tester on the test suite. At smaller orgs the two roles are one person; at larger orgs they're separate. The day-to-day differences: testers spend most of their time in SQL + DQ tool YAML + Jira; engineers spend most of theirs in dbt models + airflow DAGs + terraform + python. Comp differences in 2026: testers ~$70-140K base, engineers ~$130-220K base (US, much higher at FAANG / fintech). The transition path covered in section 7 — Python, dbt, Airflow, warehouse depth, one cloud cert — is well-trodden; most senior testers can move into analytics engineering within 9-12 months of focused work.
Which DQ framework should I pick — Great Expectations, Soda Core, dbt tests, or Monte Carlo?
There is no single winner — pick the tool that matches your stack and team. dbt tests is the strongest default if you already use dbt; built-in tests (not_null, unique, accepted_values, relationships) cover ~70% of column invariants, and dbt-expectations extends to range / regex / distribution. Great Expectations wins on expressive power — 50+ built-in expectations covering null, uniqueness, range, regex, distribution, statistical checks — and auto-generates HTML Data Docs; reach for it when dbt's built-ins aren't enough. Soda Core is the most analyst-friendly — YAML-first English-like assertions; great for hybrid teams. Monte Carlo is the leading paid observability platform — anomaly detection + lineage + incident triage; reach for it when you have > 100 tables and need reactive alerting on unknown failure modes. Most modern stacks combine dbt tests for column invariants + Great Expectations for distributions + Monte Carlo for cross-cutting observability — the hybrid is the senior answer.
What's the fastest path from etl tester to data engineer in 2026?
Six steps, 9-12 months. (1) Deepen Python beyond pytest into pandas, requests, SQLAlchemy; ship a 200-line ingest + transform + load script in a public GitHub repo. (2) Own a dbt project end-to-end — models + tests + docs + CI; the repo + a README with screenshots is your strongest single portfolio artefact. (3) Add an orchestrator — Airflow or Dagster; ship a DAG with sensors, retries, and an SLA. (4) Deep-dive one warehouse — Snowflake, BigQuery, or Redshift; performance-tune one model, document before/after. (5) Pick one cloud cert — AWS Certified Data Engineer Associate, GCP Professional Data Engineer, or Azure DP-203; one cert clears ATS at larger orgs, more than one is diminishing returns. (6) Apply for analytics engineer roles first; the role bridges testing + transformation and is the natural next step from senior tester. The single highest-ROI investment is not more certs — it's the public portfolio.
Practice on PipeCode
PipeCode ships 450+ data-engineering interview problems — including SQL + Python drills keyed to the same etl testing interview questions skill set this guide teaches (schema parity, row count parity, aggregate reconciliation, regression snapshots, DQ framework wiring, performance gates). Whether you're drilling etl testing questions and answers the night before a screen or grinding the etl tester to data engineer transition over 12 months, the practice library mirrors the same seven-stage mental model — plus the dbt tests + Great Expectations + Soda Core + Monte Carlo tooling you'll wire into your production suite.
Kick off via Explore practice →; drill the SQL practice lane →; fan out into the data-validation lane →; rehearse ETL drills →; reinforce aggregation reconciliation patterns →; widen coverage on the full Python practice library →.





Top comments (0)