DEV Community

Cover image for Data Quality Frameworks: Great Expectations vs dbt Tests vs Soda Core
Gowtham Potureddi
Gowtham Potureddi

Posted on

Data Quality Frameworks: Great Expectations vs dbt Tests vs Soda Core

Picking a data quality framework in 2026 is no longer a "should we test our pipelines?" debate — every senior data engineer knows the answer. The real debate is which framework: Python-native Great Expectations with its expectations + suites + checkpoints, warehouse-native dbt tests living next to models, or YAML-first Soda Core with SodaCL checks and the optional Soda Cloud dashboard. Each one solves the same problem — catch the broken row before the dashboard ships it — but each carries a different team shape, a different runtime contract, and a different blast radius when a test fails.

This guide is the dialect matrix you wished existed the first time a stakeholder asked "why is the metric off by 12%?" and the answer was a NULL row that slipped through ingestion. It walks the data quality testing vocabulary across the three frameworks (an Expectation vs a generic test vs a check is the same idea wearing three name tags), the expectations suite structure and how it maps to dbt's YAML, the generic tests library and how to extend it, and the coverage strategy that prevents alert fatigue. Each section pairs a teaching block with a Solution-Tail interview answer — code, a step-by-step trace, an output table, and a concept-by-concept breakdown of why it works.

PipeCode blog header for a data quality framework comparison — bold white headline 'Data Quality Frameworks' with subtitle 'great expectations · dbt tests · soda core' and a three-orb constellation (purple, orange, green) joined by glowing lines on a dark gradient with a small pipecode.ai attribution.

When you want hands-on reps immediately after reading, drill the data validation practice library →, rehearse on NULL handling problems →, and stack the schema muscles with validation drills →.


On this page


1. Why pick a data quality framework at all

Silent data corruption is the most expensive class of pipeline bug — and BI dashboards are the worst place to catch it

The one-sentence invariant: the cost of catching a bad row at the source is one developer-hour; the cost of catching it in the executive dashboard is one quarter of trust. A data quality framework moves the assertion from "the analyst noticed the number looked weird" to "the pipeline failed at 03:14 and paged the on-call before the report ran" — and that shift is what separates a team that ships metrics from a team that explains metrics.

The four categories of data quality checks.

  • Row-level. "Every row of orders has a non-NULL order_id." Cheap, deterministic, the 80% of coverage.
  • Column-level. "The amount column is between 0 and 1,000,000 in 99.9% of rows; the status column belongs to the enum {placed, paid, shipped, cancelled}." Catches schema drift and bad enums.
  • Freshness. "The events table has a row in the last 60 minutes." Catches stalled ingestion before the report consumer notices.
  • Schema / contract. "The orders table has exactly these columns with these types in this order." Catches upstream renames, drops, and silent type changes — the hardest class to debug after the fact.

Tests live next to code, not in BI dashboards.

The anti-pattern is a Looker / Power BI / Tableau dashboard with a "data quality" tab that runs the same SQL on the same warehouse and turns red when something breaks. The dashboard is downstream — by the time it turns red, the bad data has already been read by other consumers, the metric snapshot has been computed, and the broken row has propagated. Push the assertion to the place where the data is produced (ingestion, transform, contract boundary) and the bad row never enters the warehouse in the first place.

Build vs buy vs adopt-an-OSS-framework.

  • Build — write your own checks as SQL UNION ALL queries that return failing rows. Cheap to start; expensive to maintain; no shared vocabulary across teams.
  • Buy — Monte Carlo, Bigeye, Anomalo. Strong on anomaly detection and lineage; sticker shock; lock-in.
  • Adopt OSS — Great Expectations, dbt tests, Soda Core. Zero vendor cost; ecosystem of community-contributed assertions; lives next to your code in version control. This guide is about the OSS lane.

Why "we already have dbt tests" is rarely enough alone.

dbt tests are excellent for the warehouse-native transform layer — every model already has YAML, so co-locating tests next to columns is a low-friction win. But dbt only knows what runs in dbt build. It cannot test the ingest layer (the Airflow task that lands the file), it cannot run pre-warehouse assertions, and its anomaly / distribution muscles are weaker than what GE or Soda ship out of the box. Most mature teams end up combining: dbt tests at transform, GE or Soda at the edges.

Worked example — three places NULL can break a dashboard and the four assertion families that catch each one

Detailed explanation. New analysts often assume a single dbt test on the mart table is enough. The reality is that the same NULL row can be introduced at ingest (the source API silently changed and started returning null), at transform (a LEFT JOIN with missing right-side rows), or at contract drift (a column renamed upstream). Each of those failures needs a different assertion family — and a single test at the end of the pipeline catches them too late to attribute the source.

Question. Given a daily_revenue mart table that reads from staging_orders and staging_payments, list the four data-quality test families and the failure mode each one catches first. Show which framework excels at each layer.

Input.

Layer Table Common failure mode
Ingest raw_orders Upstream API returns NULL customer_id
Transform staging_orders LEFT JOIN with missing right side leaks NULL
Mart daily_revenue SUM over NULL group yields 0 instead of NULL
Contract daily_revenue Downstream BI expects revenue column; it was renamed to gross_revenue

Code.

# dbt tests (mart layer) — runs inside `dbt build`
models:
  - name: daily_revenue
    columns:
      - name: revenue
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: ">= 0"
      - name: day
        tests:
          - unique
          - not_null
Enter fullscreen mode Exit fullscreen mode
# Great Expectations (ingest layer) — runs inside an Airflow PythonOperator
suite = context.add_expectation_suite("raw_orders_ingest")
batch.expect_column_values_to_not_be_null("customer_id")
batch.expect_column_values_to_be_in_set(
    "status", ["placed", "paid", "shipped", "cancelled"]
)
batch.expect_table_row_count_to_be_between(min_value=1, max_value=10_000_000)
Enter fullscreen mode Exit fullscreen mode
# Soda Core (contract / freshness) — runs as a CLI step
checks for daily_revenue:
  - row_count > 0
  - schema:
      fail:
        when required column missing: [day, revenue]
  - freshness(day) < 1d
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The ingest test (GE) catches the upstream API change before the row ever reaches the warehouse — the Airflow task fails at 03:14 instead of the daily report being wrong at 09:00.
  2. The transform test (dbt) catches a NULL leak inside staging_ordersdbt build halts the DAG and the mart never gets the bad day.
  3. The mart test (dbt + Soda) catches the SUM-of-NULL pathology — revenue >= 0 would have been satisfied silently if SUM(NULL) is NULL; the not_null test on the day makes the failure loud.
  4. The contract test (Soda schema check) catches the column rename — the daily_revenue table either has the revenue column or the run fails.
  5. Each layer needs the assertion to live at that layer. A single mart-level test sees only the end result and cannot say which upstream layer broke.

Output.

Layer Best-fit framework Why it wins here
Ingest Great Expectations Python-native, pluggable into Airflow / Dagster operators
Transform dbt tests YAML next to model, runs as part of dbt build
Mart dbt + Soda dbt for column-level, Soda for freshness + row-count
Contract Soda schema / dbt contracts Declarative schema assertion, no glue code

Rule of thumb. Map every dataset to the four assertion families before writing a single test. If a category is empty, that is the silent class of bug that will bite next quarter. Pick the framework per layer based on where it plugs in cleanly, not by which one your favourite blogger uses.

Worked example — the cost of not having a DQ framework

Detailed explanation. Teams often justify "we don't need a DQ framework — our analysts catch issues" until the first big silent outage. The math is unforgiving: a single bad metric that runs for three days on a board-level dashboard costs more in lost trust than three years of GE / Soda licences ever would (and both are free).

Question. Quantify the engineering cost of catching a bad row at four points: at ingest, in the warehouse, in the BI tool, and at the executive review. Show the time-to-detect and time-to-fix multiplier.

Input.

Catch point Avg time-to-detect Avg time-to-fix Trust cost
Ingest test fires 5 min 30 min none
Warehouse transform fails 30 min 60 min small
BI tool shows wrong number 4 hours 4 hours medium
Executive review flags it 3 days 1 day high

Code.

# A trivial Great Expectations check on the ingest side
import great_expectations as gx

context = gx.get_context()
asset = context.sources.add_pandas("ingest").add_dataframe_asset("raw_orders")
batch_request = asset.build_batch_request(dataframe=df)

suite = context.add_expectation_suite("raw_orders_smoke")
validator = context.get_validator(batch_request=batch_request, expectation_suite=suite)

validator.expect_column_values_to_not_be_null("customer_id")
validator.expect_column_values_to_be_between("amount", min_value=0, max_value=10_000_000)
validator.expect_table_row_count_to_be_between(min_value=100)

results = validator.validate()
if not results.success:
    raise RuntimeError("ingest DQ failed — quarantining batch")
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. Catching the bad row at ingest fires within 5 minutes — the Airflow task fails, the on-call sees a single page, the fix is "rerun after upstream is patched."
  2. Catching it at the transform layer (dbt test on staging) fires within 30 minutes — slightly more cost because some downstream models may have started building.
  3. Catching it at the BI layer is hours later because dashboards refresh on a schedule, and humans only look at certain dashboards on certain days.
  4. The worst case — exec review — is days later, and the cost includes a board-level apology, an "explainability" retro, and weeks of rebuilt trust in the affected metric.
  5. The order of magnitude difference between "ingest fail" and "exec review fail" is roughly 1,000x in eng-hours and "infinite" in trust.

Output.

Catch point Cost multiplier
Ingest 1x
Warehouse 4x
BI 50x
Executive 1,000x+

Rule of thumb. Every dataset deserves a minimum of not_null + unique + freshness at the ingest or staging boundary. Adding those three lines of YAML / Python costs minutes; missing them costs quarters of trust the first time a row slips through.

SQL interview question on detecting a silent NULL leak

A senior interviewer often opens with: "You shipped a daily_revenue table to the exec dashboard last week. This morning the CFO says the number is off by 12%. Walk me through the audit: where would you look first, and what assertions would have caught it earlier?"

Solution Using a layered audit + retroactive DQ assertion plan

-- 1) Audit the mart for NULL leaks per column
SELECT
    COUNT(*)                                   AS total_rows,
    COUNT(*) - COUNT(customer_id)              AS null_customer,
    COUNT(*) - COUNT(revenue)                  AS null_revenue,
    SUM(CASE WHEN revenue < 0 THEN 1 ELSE 0 END) AS negative_revenue
FROM daily_revenue
WHERE day = CURRENT_DATE - 1;

-- 2) Audit the staging table for upstream drift
SELECT
    COUNT(*) - COUNT(amount) AS null_amount_in_staging,
    COUNT(*) FILTER (WHERE status NOT IN
        ('placed','paid','shipped','cancelled')) AS bad_enum_status
FROM staging_orders
WHERE created_at >= CURRENT_DATE - 1;

-- 3) Compare against the contract: row_count delta day-over-day
WITH day_counts AS (
  SELECT day, COUNT(*) AS row_count
  FROM daily_revenue
  GROUP BY day
)
SELECT day, row_count,
       row_count - LAG(row_count) OVER (ORDER BY day) AS delta
FROM day_counts
ORDER BY day DESC
LIMIT 7;
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

Step Finding Implied source layer
Mart audit 12% rows have NULL customer_id Either ingest or transform
Staging audit null_amount_in_staging = 0, bad_enum_status = 4,800 Ingest layer — upstream API drift
Row-count delta delta = -12,000 (down 12% vs prior week) Confirms the silent drop

The audit narrows the source layer to ingest in three queries. The fix is twofold: a one-time backfill for the affected days, and a permanent GE expectation on the ingest table to prevent recurrence.

Output:

Layer Retroactive assertion
Ingest (GE) expect_column_values_to_be_in_set("status", ["placed","paid","shipped","cancelled"])
Staging (dbt) accepted_values test on status with the same enum
Mart (Soda) row_count anomaly check with a 5% deviation threshold

Why this works — concept by concept:

  • Layered audit, not single query — running the same NULL count at each layer reveals where the NULLs were introduced. A single mart-level query only confirms "something broke."
  • Enum drift detectionbad_enum_status = 4,800 proves the upstream API started returning an undocumented status. This is the kind of failure schema-only contracts miss.
  • Row-count delta — comparing today's count to the trailing window of comparable days exposes silent drops that pass column-level checks. A row that is "missing" never appears in NULL counts.
  • Retroactive assertions become the regression suite — every audit finding becomes a permanent test. The first time the bug is caught is at exec review; the second time, the pipeline catches it before any consumer sees the bad day.
  • Cost — three short SQL queries (O(rows) each). The retroactive tests add zero marginal cost to dbt build and seconds to GE / Soda runs.

SQL
Topic — data validation
Data validation problems (SQL)

Practice →

SQL Topic — null handling NULL handling problems (SQL)

Practice →


2. The DQ framework landscape in 2026

Five open-source frameworks dominate — pick by team shape and stack, not by feature count

The mental model in one line: Great Expectations is Python-native, dbt tests are warehouse-native, Soda Core is YAML-first and multi-source — every other contender is either a specialist (pandera, Deequ) or a thin wrapper around one of those three. Once you can pattern-match a team's stack to one of those archetypes, the framework choice is mostly a fait accompli.

Landscape map showing five data quality framework cards arranged in a hexagonal cluster — a central 'DQ frameworks' badge surrounded by Great Expectations, dbt tests, Soda Core, pandera, and Deequ cards, each with a tiny domain icon, on a light PipeCode card.

Great Expectations (GE) in one paragraph.

  • Vocabulary. An Expectation is a single assertion. A Suite bundles expectations for a dataset. A Checkpoint binds a Suite to a runtime execution and ships the results to Data Docs (the auto-generated HTML report).
  • Strengths. Python-native, integrates with Pandas, Spark, SQLAlchemy, and every warehouse. Largest community-contributed expectation library. Data Docs is the gold standard for human-readable test reports.
  • Weaknesses. Heaviest of the three on setup ceremony; "Data Context" objects can feel ceremonious; recent v3 migration shed weight but added migration burden for v2 users.
  • Team shape it assumes. A Python-heavy team running Airflow / Dagster / Prefect, comfortable with config-as-code, and willing to invest a sprint in onboarding.

dbt tests in one paragraph.

  • Vocabulary. A generic test is a reusable macro that returns failing rows (built-ins: not_null, unique, accepted_values, relationships). A singular test is a single SQL file that returns failing rows. A custom generic test is a macro you write that any model can reuse.
  • Strengths. Lives in YAML next to the model definition — zero context switch. Runs as part of dbt build, which means tests gate the next model. dbt_utils and dbt_expectations packages multiply the built-in library tenfold.
  • Weaknesses. Warehouse-only — cannot test before the data lands. Anomaly / distribution detection is weak without dbt_expectations. Severity and warn-vs-fail semantics are simpler than GE / Soda.
  • Team shape it assumes. A warehouse-first team already running dbt for transforms — adding tests is a no-op decision because the YAML is already there.

Soda Core in one paragraph.

  • Vocabulary. A check is a single SodaCL clause (row_count > 0, missing_count(col) = 0). A scan is one run over a dataset. User-defined metrics extend SodaCL with custom SQL.
  • Strengths. YAML-first, dialect-agnostic (SodaCL compiles to whatever SQL the source dialect speaks). CLI-friendly — drops into Airflow / GitHub Actions / cron without ceremony. Soda Cloud (optional, paid) adds a dashboard and Slack alerts on top of the OSS engine.
  • Weaknesses. Smaller community than GE; some advanced expectations (distribution comparisons, partial-suite reuse) require Soda Cloud or DIY SQL.
  • Team shape it assumes. A multi-source team that wants a single declarative file per dataset across Postgres, Snowflake, BigQuery, Spark, with optional managed dashboard.

The lightweight neighbours.

  • pandera — DataFrame validation library for pandas / Polars / PySpark. Strong typing, schema inference, in-memory only. Use when the assertion lives in a Python script, not in the warehouse.
  • Deequ — Spark/Scala library from AWS, ports of which exist in PyDeequ. Strong on big-data anomaly detection. Best when the dataset is already a Spark DataFrame.
  • Elementary — dbt-native observability layer on top of dbt tests. Adds anomaly detection, lineage, and a dashboard. Treat as a dbt augmentation, not a standalone framework.
  • Pydantic — application-layer schema validation. Not a DQ framework per se; pairs well at the API / ingest boundary.

The terminology cheat sheet.

Concept GE dbt Soda
One assertion Expectation Generic test (per column) Check
Bundle of assertions Suite YAML test block on a model Checks-file (per dataset)
One run Checkpoint dbt test / dbt build Scan
Custom assertion Custom expectation Custom generic test (macro) User-defined metric
Severity Severity (action level) Severity (warn / error) Warn / Fail clauses
Result store Data Docs + Validation Result store run_results.json + manifest Soda Cloud / file / table

Team-shape decision matrix.

  • Python team, Airflow / Dagster, multi-source. GE wins. Pluggable into the orchestrator, expectation library beats everything.
  • dbt-shop, warehouse-native, transform-heavy. dbt tests + dbt_expectations. Don't bring in a second framework until you hit anomaly or freshness needs that dbt can't cover.
  • Polyglot team, YAML-first, multi-source, want a managed dashboard. Soda Core (free) + Soda Cloud (paid for the dashboard). The CLI scales from one dataset to one hundred without re-architecture.
  • Data-science / DataFrame-heavy, Pandas / Polars. pandera. Treat schema as code at the boundary of every script.
  • Big-data Spark shop. Deequ. Native to the engine you're already running.

Worked example — the same assertion in three frameworks

Detailed explanation. The fastest way to internalise the three vocabularies is to write the same assertion in all three. The assertion: "orders.amount is not NULL and is greater than 0." This is the most common production check, and seeing it spelled out three ways highlights where each framework's syntax becomes ergonomic.

Question. Translate "orders.amount IS NOT NULL AND orders.amount > 0" into a Great Expectations Suite, a dbt YAML test block, and a Soda Core checks file. Which is shortest? Which is most readable?

Input.

order_id amount
1 50
2 0
3 NULL
4 25

Code.

# Great Expectations — Python suite
suite = context.add_expectation_suite("orders_amount")
validator = context.get_validator(
    batch_request=orders_batch_request,
    expectation_suite=suite,
)
validator.expect_column_values_to_not_be_null("amount")
validator.expect_column_values_to_be_between(
    "amount", min_value=0, strict_min=True
)
validator.save_expectation_suite()
Enter fullscreen mode Exit fullscreen mode
# dbt — schema.yml next to the model
models:
  - name: orders
    columns:
      - name: amount
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: "> 0"
Enter fullscreen mode Exit fullscreen mode
# Soda Core — checks file
checks for orders:
  - missing_count(amount) = 0
  - min(amount) > 0
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. GE's expression is the most verbose because it carries the Suite + Validator + Batch ceremony. The trade-off: every assertion ships with rich metadata and a Data Docs page automatically.
  2. The dbt version is the shortest if you accept the dbt_utils dependency for expression_is_true. The built-in not_null test handles half the assertion; the other half needs an expression macro.
  3. Soda Core sits between the two on verbosity. missing_count = 0 is the SodaCL idiom for "not null"; min > 0 is more declarative than a per-row predicate but means the same.
  4. All three produce the same failing row in the input table: order_id = 2 (amount = 0) and order_id = 3 (amount NULL).
  5. Readability is taste — but YAML wins for "non-engineer can read the assertion" by a wide margin.

Output.

Framework Lines of code Failing rows detected
Great Expectations 9 2 (order_id 2, 3)
dbt tests 6 2 (order_id 2, 3)
Soda Core 3 2 (order_id 2, 3)

Rule of thumb. When the assertion is purely column-level (not_null, range, enum), all three frameworks are roughly equivalent. The choice comes down to where the assertion runs and who maintains it.

Worked example — distribution / anomaly checks where the frameworks diverge

Detailed explanation. "Yesterday's row count was 10,000; today's is 8,000 — is that a 20% drop or normal weekend dip?" This is a distribution check, not a per-row assertion. Each framework's answer is materially different.

Question. Write a "row count must be within 10% of the trailing 7-day average" assertion in GE, dbt (with dbt_expectations), and Soda. Show which is most ergonomic.

Input.

day row_count
2026-06-08 10,000
2026-06-09 10,200
2026-06-10 9,800
2026-06-11 10,500
2026-06-12 9,900
2026-06-13 10,100
2026-06-14 10,000
2026-06-15 8,000

Code.

# Great Expectations — using a custom expectation
validator.expect_table_row_count_to_be_between(
    min_value=9000, max_value=11000,
    meta={"reason": "trailing 7-day average +/- 10%"}
)
# In practice teams wrap this in a custom expectation that
# reads the trailing window from a metadata table.
Enter fullscreen mode Exit fullscreen mode
# dbt — using dbt_expectations
models:
  - name: daily_revenue
    tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 9000
          max_value: 11000
Enter fullscreen mode Exit fullscreen mode
# Soda Core — using SodaCL anomaly clause
checks for daily_revenue:
  - anomaly score for row_count < 0.7
  - change for row_count > -10% and change for row_count < 10%
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. GE makes you compute the trailing average yourself (or in a custom expectation) — the built-in expect_table_row_count_to_be_between takes static bounds.
  2. dbt + dbt_expectations is the same — expect_table_row_count_to_be_between is the closest built-in. Anomaly detection requires either Elementary or a singular test that joins against the trailing window.
  3. Soda Core ships native anomaly score and change clauses that compute the trailing window for you. This is where Soda materially outperforms the others on declarative anomaly checks.
  4. For the input on 2026-06-15, the row count is 8,000, which is 20% below the 7-day average of 10,071. All three frameworks fire — but Soda fired without the engineer having to write the trailing-window logic.

Output.

Framework Lines of declarative config for trailing-window anomaly Failing day detected
GE needs custom expectation 2026-06-15
dbt + dbt_expectations static bounds only without singular test 2026-06-15
Soda Core 2 (built-in anomaly score + change) 2026-06-15

Rule of thumb. For static / deterministic checks, all three are similar. For trailing-window anomaly checks, Soda is the most ergonomic out of the box; GE catches up with custom expectations; dbt needs Elementary or a singular test.

SQL interview question on choosing a DQ framework

A senior interviewer might frame this as: "Your team runs Airflow + dbt + Snowflake. You already use dbt tests for the warehouse models. Where would you add Great Expectations or Soda Core, and what is the failure mode of each layer if you skip it?"

Solution Using a layered framework decision tree

# Layer 1 — Ingest (Airflow PythonOperator + Great Expectations)
- task_id: ingest_orders_dq
  python_callable: run_ge_checkpoint
  op_kwargs:
    checkpoint_name: raw_orders_smoke
  retries: 0  # fail fast on bad ingest

# Layer 2 — Transform (dbt build runs dbt tests)
- task_id: dbt_build
  bash_command: dbt build --select staging.+ marts.+

# Layer 3 — Mart freshness + row-count anomaly (Soda Core)
- task_id: soda_scan_mart
  bash_command: soda scan -d snowflake checks/marts.yml
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

Layer Framework Failure mode if skipped
Ingest Great Expectations Bad rows enter the warehouse; downstream metrics silently wrong
Transform dbt tests NULL leaks across JOIN; marts compute on incomplete data
Mart Soda Core Row-count anomalies / freshness lag go undetected until BI tab refreshes

Each framework owns the layer where it plugs in cleanest. Skip a layer and the failure mode pictured above is what bites you next quarter.

Output:

Layer Test count Run-time budget Blast radius if it fails
Ingest GE 12 expectations ~30s Single batch quarantined
Transform dbt 50 tests ~2 min Single model halts; downstream gated
Mart Soda 8 checks ~20s Single dataset flagged; downstream still runs but alerts fire

Why this works — concept by concept:

  • Layer-per-framework — each framework wins at the layer it was designed for. Forcing GE into the warehouse (or dbt into ingest) costs more than the integration debt of running all three.
  • Fail-fast at ingest — GE Checkpoint fails the Airflow task; downstream dbt run never starts; nothing bad lands in the warehouse.
  • Gating transform with dbt builddbt build runs tests between models, so a failed staging test halts the downstream marts in a single command.
  • Soda for mart freshness + anomaly — declarative freshness < 1d and anomaly score clauses outperform what dbt tests express natively, without writing a custom test.
  • Alert routing per layer — ingest fails go to the data-platform on-call; transform fails go to the analytics-engineer on-call; mart anomalies go to the analytics-engineer and the metric owner.
  • Cost — three framework runs add ~3 minutes to the DAG. The savings: catching the same bad row in BI takes hours, in exec review takes days. Net positive every quarter.

SQL
Topic — validation
Validation problems (SQL)

Practice →


3. Expectations vs tests vs checks — the same idea, three vocabularies

Once you align the three vocabularies on the same row of a matrix, every DQ migration is a syntax exercise — not a semantic one

The mental model in one line: an Expectation, a generic test, and a SodaCL check are isomorphic — same input, same output, same failure semantics, only the names differ. The hard part of DQ work was never the vocabulary; it is the coverage strategy and the alert routing. Once you internalise the synonym table, the migration cost between frameworks falls to a few hours per dataset.

Three vertical columns labelled Great Expectations, dbt tests, and Soda Core, each column showing a stack of three pill labels with tiny icons — expectation/suite/checkpoint for GE, generic/singular/custom for dbt, check/scan/metric for Soda — with a horizontal alignment band tying equivalent concepts together, on a light PipeCode card.

The three vocabularies aligned.

Concept Great Expectations dbt Soda
One assertion Expectation generic test invocation Check
Bundle for a dataset Expectation Suite YAML tests: block under a model Checks file (checks for <dataset>:)
One execution Checkpoint dbt test / dbt build Scan
Reusable assertion logic Custom Expectation class Custom generic test macro User-defined metric (SQL)
One-off assertion Custom Expectation Singular test (single SQL file) Inline SQL inside a check
Severity action_list per result severity: warn or error warn: / fail: clauses
Result artefact Data Docs + ValidationResults run_results.json + manifest Scan results (CLI / Cloud / warehouse table)

Common assertions that exist verbatim in all three.

  • not_null. GE: expect_column_values_to_not_be_null. dbt: not_null. Soda: missing_count(col) = 0.
  • unique. GE: expect_column_values_to_be_unique. dbt: unique. Soda: duplicate_count(col) = 0.
  • accepted_values. GE: expect_column_values_to_be_in_set. dbt: accepted_values. Soda: invalid_count(col) = 0 with valid values: clause.
  • foreign_key / relationships. GE: expect_column_values_to_be_in_set with a query. dbt: relationships. Soda: failed rows query with a join.
  • row_count. GE: expect_table_row_count_to_be_between. dbt: dbt_utils.expression_is_true or singular test. Soda: row_count > 0.
  • freshness. GE: expect_column_max_to_be_between on a timestamp. dbt: freshness block in source config. Soda: freshness(updated_at) < 1d.

Custom logic — three styles.

  • GE custom expectation. Subclass ColumnExpectation or BatchExpectation in Python. Ships with the package; reusable across suites.
  • dbt custom generic test. A Jinja macro that returns failing rows. Lives in tests/generic/ and is invoked from YAML by name.
  • SodaCL user-defined metric. Inline SQL inside a metric_name: block; reusable inside the same checks file or imported.

Severity, thresholds, warn-vs-fail.

  • GE. Each expectation result is success: bool. Severity is configured per Checkpoint via action_list, which can route a "warning" to one Slack channel and a "failure" to another.
  • dbt. Built-in severity: warn / severity: error. Per-test threshold via error_if / warn_if (e.g., error_if: ">= 10" means "fail if 10+ failing rows").
  • Soda. Inline warn: and fail: clauses per check. Multi-level threshold by writing two checks with different bounds.

Common interview probes on vocabulary.

  • "What is the difference between an Expectation and a Suite in GE?" — an Expectation is one assertion; a Suite bundles them for a dataset and is reusable across batches.
  • "What does dbt build do that dbt test does not?" — dbt build interleaves model materialisation and test execution per node, so a failed test halts downstream models. dbt test runs tests independently after all models.
  • "Is Soda Core the same as Soda Cloud?" — Soda Core is the OSS engine (CLI + Python library). Soda Cloud is the paid dashboard / Slack-alerts layer that consumes Soda Core results. You can run Soda Core forever without Soda Cloud.
  • "What is a singular test in dbt vs a generic test?" — singular is one SQL file in tests/; generic is a Jinja macro in tests/generic/ invoked by name from YAML. Singulars are bespoke; generics are reusable.

Worked example — orders.amount > 0 AND amount IS NOT NULL in all three frameworks

Detailed explanation. The "amount must be positive and non-NULL" check is the canonical first assertion teams write. Spelling it in all three frameworks side by side surfaces the syntax differences and makes the migration delta concrete.

Question. Write the assertion in (a) a GE Expectation Suite, (b) a dbt YAML schema block, and (c) a Soda Core checks file. For each, show how to add a severity (warn-vs-fail) so the on-call only pages on >10 failing rows.

Input.

order_id amount
1 50
2 -10
3 NULL
4 25
5 0

Code.

# GE — Expectation Suite with action-list severity
suite = context.add_expectation_suite("orders_amount")
validator = context.get_validator(
    batch_request=orders_batch_request, expectation_suite=suite
)
validator.expect_column_values_to_not_be_null("amount")
validator.expect_column_values_to_be_between(
    "amount", min_value=0, strict_min=True,
    mostly=0.99,  # 99% must pass — under that, fail
)
validator.save_expectation_suite()
# Checkpoint routes: <0.95 -> page; <0.99 -> Slack warn
Enter fullscreen mode Exit fullscreen mode
# dbt — schema.yml with severity and threshold
models:
  - name: orders
    columns:
      - name: amount
        tests:
          - not_null:
              severity: error
              error_if: ">= 10"
              warn_if: ">= 1"
          - dbt_utils.expression_is_true:
              expression: "> 0"
              severity: warn
Enter fullscreen mode Exit fullscreen mode
# Soda — checks file with warn / fail thresholds
checks for orders:
  - missing_count(amount) = 0:
      warn: when between 1 and 9
      fail: when > 9
  - min(amount) > 0:
      warn: when min(amount) <= 0
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. Each framework expresses "0% missing, all positive" with two assertions. The first asserts non-NULL; the second asserts positivity.
  2. GE's mostly=0.99 lets the check pass when 99%+ of rows satisfy — the closest GE primitive to "warn at 1 failing row, fail at 10."
  3. dbt's error_if / warn_if is the cleanest per-test threshold expression — semantically "fail if 10 or more failing rows, warn if 1 or more."
  4. Soda's warn: when between 1 and 9 / fail: when > 9 is the most declarative — the threshold reads like English.
  5. All three correctly classify the input: rows 2, 3, 5 are failing (negative, NULL, zero). With 3 failing rows out of 5, the dbt test would warn (≥1) but not error (<10); Soda would do the same; GE with mostly=0.99 would fail because 60% pass < 99%.

Output.

Framework Failing rows Severity classification
GE (mostly 0.99) 3 of 5 Fail (success rate 40% < 99%)
dbt (warn ≥1, error ≥10) 3 Warn
Soda (warn 1-9, fail >9) 3 Warn

Rule of thumb. When migrating between frameworks, the vocabulary is a clean translation but the severity semantics require careful matching. GE thinks in success-rate percentages; dbt and Soda think in failing-row counts. Match the threshold by computing the equivalent.

Worked example — a custom assertion (regex-validate email)

Detailed explanation. A frequent custom-logic check: "every value in users.email matches a basic email regex." None of the three frameworks ship this as a built-in (regex is fast-moving and dialect-specific), so each one's custom-assertion mechanism is exercised.

Question. Write a regex-validation assertion for users.email in (a) a GE custom expectation, (b) a dbt custom generic test, and (c) a SodaCL user-defined metric.

Input.

user_id email
1 alice@x.com
2 not-an-email
3 bob@y.org
4 NULL

Code.

# GE — using built-in expect_column_values_to_match_regex
validator.expect_column_values_to_match_regex(
    "email",
    r"^[^@\s]+@[^@\s]+\.[^@\s]+$",
    mostly=1.0,
)
Enter fullscreen mode Exit fullscreen mode
-- dbt — tests/generic/test_matches_regex.sql
{% test matches_regex(model, column_name, pattern) %}
SELECT *
FROM {{ model }}
WHERE {{ column_name }} IS NOT NULL
  AND NOT regexp_like({{ column_name }}, '{{ pattern }}')
{% endtest %}

-- Invoked in schema.yml
models:
  - name: users
    columns:
      - name: email
        tests:
          - matches_regex:
              pattern: "^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$"
Enter fullscreen mode Exit fullscreen mode
# Soda — user-defined metric with valid regex clause
checks for users:
  - invalid_count(email) = 0:
      valid regex: '^[^@\s]+@[^@\s]+\.[^@\s]+$'
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. GE has expect_column_values_to_match_regex built in — the simplest of the three. Pass the regex and mostly threshold; done.
  2. dbt does not ship regex tests, so a custom generic test wraps regexp_like in a Jinja macro. The macro returns failing rows (the dbt contract for any test); dbt counts them and applies severity.
  3. Soda Core ships valid regex as a clause on invalid_count — almost as concise as GE. The check fails when any non-NULL email fails the regex.
  4. All three skip NULLs by convention — regex validation is a content check, not a presence check. Pair with missing_count = 0 if NULLs should also fail.
  5. For the input, all three flag user_id=2 (not-an-email). NULL user_id=4 is skipped.

Output.

Framework Failing rows Notes
GE 1 Built-in expectation, smallest LOC
dbt 1 Custom generic test, reusable across columns
Soda 1 valid regex clause built-in

Rule of thumb. For common custom logic (regex, range with units, foreign-key with filter), check if the framework already ships it before writing a custom assertion. For bespoke custom logic (cross-table invariants, time-bounded distributions), the custom mechanism is unavoidable — write it once, reuse forever.

SQL interview question on vocabulary alignment

A senior interviewer might ask: "Your team is consolidating a Soda Core checks library into a dbt project for a warehouse-only audit. Walk me through how you'd migrate row_count > 0, missing_count(col) = 0, and freshness(col) < 1d from SodaCL to dbt YAML — and what semantic gaps to flag for the team."

Solution Using a SodaCL-to-dbt migration table + a dbt_expectations bridge

# Original SodaCL checks
checks for daily_revenue:
  - row_count > 0
  - missing_count(revenue) = 0
  - freshness(day) < 1d
  - duplicate_count(day) = 0
Enter fullscreen mode Exit fullscreen mode
# Migrated dbt schema.yml + dbt_utils + dbt_expectations
sources:
  - name: marts
    tables:
      - name: daily_revenue
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}
        loaded_at_field: day

models:
  - name: daily_revenue
    tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1
    columns:
      - name: revenue
        tests:
          - not_null
      - name: day
        tests:
          - unique
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

SodaCL clause dbt equivalent Semantic gap
row_count > 0 dbt_expectations.expect_table_row_count_to_be_between (min_value=1) None
missing_count(revenue) = 0 not_null test on column None
freshness(day) < 1d sources.freshness block, error_after: 24h dbt source freshness runs separately via dbt source freshness, not inline in dbt build
duplicate_count(day) = 0 unique test on column None

The migration is direct for three of four clauses. The freshness check moves from inline (Soda) to a separate dbt source freshness command (dbt) — that is the semantic gap to flag in the migration plan.

Output:

Aspect SodaCL (before) dbt (after)
Lines of YAML 4 14
Single-command run soda scan ... needs both dbt build and dbt source freshness
Severity granularity per-clause warn/fail per-test severity: warn/error
Anomaly checks (built-in) anomaly score clause needs Elementary or singular test

Why this works — concept by concept:

  • Vocabulary isomorphismmissing_count = 0 maps directly to not_null; duplicate_count = 0 maps to unique; the semantics are identical.
  • Freshness is split in dbt — source freshness lives in the source config (dbt source freshness), not in dbt build. The migration must add a second command to the orchestrator.
  • dbt_expectations bridges the row-count gap — the built-in dbt test library covers the column-level surface; dbt_expectations adds the table-level checks Soda has natively.
  • Severity translation is direct — Soda's warn: / fail: maps to dbt's severity: warn / severity: error with warn_if / error_if thresholds.
  • Anomaly is the residue — SodaCL's anomaly score and change clauses do not migrate cleanly to dbt. Layer Elementary on top, or accept that anomaly detection moves out of the DQ framework and into a separate observability tool.
  • Cost — same per-row cost (every framework compiles to similar SQL). The migration cost is a one-time YAML rewrite plus one extra command in the DAG for freshness.

SQL
Topic — data validation
Data validation problems (SQL)

Practice →


4. Where they plug into the pipeline

Placement (pre-ingest vs in-transform vs post-load) decides what you can quarantine — and that decision is bigger than the framework choice

The mental model in one line: GE plugs in via Python operators (Airflow / Dagster / Prefect), dbt tests run as part of dbt build in the warehouse, and Soda scans run as a CLI step anywhere in the DAG. Each placement carries a different blast radius when a check fails — fail-fast at ingest vs warn-only at the mart — and that contract is what the on-call cares about most.

Horizontal pipeline showing Source → Ingest → Warehouse → Transform → Mart → BI, with three DQ framework badges (GE, dbt tests, Soda) tagged at the stages where each plugs in, plus a bottom alert lane routing to Slack and PagerDuty icons, on a light PipeCode card.

Three placement archetypes.

  • Pre-ingest. Assertion runs before the data lands in the warehouse — usually inside an Airflow / Dagster task that pulls a file or API response. Fail-fast: the bad batch is quarantined and never enters the table.
  • In-transform. Assertion runs between transform steps — dbt build interleaves models and tests, so a failed test halts the downstream models in the same DAG.
  • Post-load. Assertion runs after the data is in the table, often on a schedule independent of the producing pipeline. Cannot fail-fast — the bad data is already there — but is the only option for cross-pipeline checks like freshness and anomaly.

Great Expectations — Checkpoints in Airflow / Dagster / Prefect.

# Airflow — GreatExpectationsOperator
from airflow_provider_great_expectations.operators.great_expectations import (
    GreatExpectationsOperator,
)

ingest_dq = GreatExpectationsOperator(
    task_id="ingest_orders_dq",
    data_context_root_dir="/opt/ge",
    checkpoint_name="raw_orders_smoke",
    fail_task_on_validation_failure=True,
)

land_to_warehouse = SnowflakeOperator(
    task_id="land_to_warehouse", sql="COPY INTO ...",
)

ingest_dq >> land_to_warehouse  # land only if DQ passes
Enter fullscreen mode Exit fullscreen mode

The Checkpoint binds a Suite to a batch, runs it, ships results to Data Docs, and (optionally) raises an exception when validation fails — which the operator translates into a failed task and a paged on-call.

dbt tests — dbt build interleaving.

# Single command runs every model + test in DAG order
dbt build --select staging.+ marts.+

# Equivalent to (conceptually):
#   for each model in topological order:
#     run model
#     run tests configured on this model
#     if any test fails: halt downstream
Enter fullscreen mode Exit fullscreen mode

The interleaving is dbt build's key feature. dbt run (models only) plus a later dbt test (tests only) does not halt downstream models on a test failure — the difference matters and shows up on senior interviews.

Soda Core — CLI step or Soda Cloud agent.

# CLI step inside Airflow BashOperator
soda scan -d snowflake \
  -c configuration.yml \
  checks/marts.yml

# Or, the Soda Cloud agent runs on a managed schedule
# without needing the DAG to call it explicitly.
Enter fullscreen mode Exit fullscreen mode

The CLI version drops into any orchestrator that can run a shell command. The Soda Cloud agent is the managed alternative — useful when you want a centralised dashboard and Slack routing without writing Airflow tasks per dataset.

Failure semantics — three modes.

  • Fail the run. The task / model / scan fails; the DAG halts; on-call is paged. Use at ingest (GE) and inside dbt build.
  • Warn-only. Log the failure to Data Docs / run_results.json / Soda Cloud; do not halt the DAG. Use for known-noisy checks (anomaly thresholds, soft enums) until the noise is calibrated.
  • Quarantine-and-continue. Bad rows are written to a side table; good rows continue to the main table; the DAG does not fail. Use when partial data is better than no data (e.g., reporting on a partial batch is acceptable; missing the batch entirely is not).

Result stores.

  • GE → Data Docs. Auto-generated HTML report site. Hosts on S3 / GCS / a local FS; rebuilt after every Checkpoint.
  • dbt → run_results.json + manifest. Machine-readable artefacts; consumed by Elementary, dbt Cloud, or custom dashboards.
  • Soda → file / warehouse table / Soda Cloud. Most flexible: CLI emits JSON / writes to a warehouse table you specify; Soda Cloud (paid) ingests into a managed dashboard.

Alerting payloads.

  • GE. Action list per Checkpoint; built-in Slack action, EmailAction, OpsgenieAction. Payload includes a link to the Data Docs page.
  • dbt. run_results.json is the source; downstream tools (Elementary, dbt Cloud, custom) translate to Slack / PagerDuty.
  • Soda. Native Slack / Webhook / Soda Cloud integration. Payload includes the failing check, failing row count, and a link to the Cloud dashboard.

Worked example — GE Checkpoint inside an Airflow DAG that fails fast on ingest

Detailed explanation. The textbook ingest pattern: pull → DQ → land. A GreatExpectationsOperator fires between the pull and the land; on failure, the batch is quarantined and the load is skipped. This is the only pattern that guarantees the bad row never reaches the warehouse.

Question. Write an Airflow DAG with three tasks: pull a daily orders dump, run a GE Checkpoint, and land it into Snowflake. The DQ task must halt the DAG on failure.

Input.

Task Purpose
pull_orders Download the API dump to S3
dq_orders Run raw_orders_smoke Checkpoint
land_to_warehouse COPY INTO Snowflake

Code.

from datetime import datetime
from airflow import DAG
from airflow.providers.amazon.aws.operators.s3 import S3CreateObjectOperator
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator
from airflow_provider_great_expectations.operators.great_expectations import (
    GreatExpectationsOperator,
)

with DAG(
    "orders_ingest",
    start_date=datetime(2026, 6, 15),
    schedule="@daily",
    catchup=False,
) as dag:
    pull_orders = S3CreateObjectOperator(
        task_id="pull_orders",
        s3_bucket="ingest",
        s3_key="orders/{{ ds }}.parquet",
        data="<<api response>>",
    )

    dq_orders = GreatExpectationsOperator(
        task_id="dq_orders",
        data_context_root_dir="/opt/ge",
        checkpoint_name="raw_orders_smoke",
        fail_task_on_validation_failure=True,
    )

    land_to_warehouse = SnowflakeOperator(
        task_id="land_to_warehouse",
        sql="""
            COPY INTO raw.orders
            FROM @ingest_stage/orders/
            PATTERN = '.*{{ ds }}\\.parquet';
        """,
    )

    pull_orders >> dq_orders >> land_to_warehouse
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The DAG runs daily. pull_orders writes the API response to S3.
  2. dq_orders reads the same Parquet file and runs the raw_orders_smoke Checkpoint. The Checkpoint is configured to assert not_null(customer_id), accepted_values(status), and row_count > 100.
  3. If the Checkpoint succeeds, land_to_warehouse runs and the bad row never had the chance to reach the warehouse.
  4. If the Checkpoint fails, fail_task_on_validation_failure=True raises an exception in the operator; Airflow marks the task as failed; land_to_warehouse is skipped; the on-call is paged via the standard Airflow on-failure callback.
  5. The batch sits in S3 untouched — replaying after a fix is a simple manual re-trigger of land_to_warehouse after the upstream issue is resolved.

Output.

Day dq_orders status land_to_warehouse status
2026-06-14 passed succeeded
2026-06-15 failed (bad enum) skipped
2026-06-16 passed (after upstream fix) succeeded

Rule of thumb. The "pull → DQ → land" pattern is the default for ingest DAGs. Add it once per source; every future bad batch is caught before it touches the warehouse.

Worked example — dbt build halts downstream models on a failed test

Detailed explanation. The reason senior DEs prefer dbt build over dbt run && dbt test is that build interleaves models and tests in DAG order — a failed test on the staging table halts the downstream marts in the same command. Pre-build, the same pipeline would build the mart on broken staging data and only fail at the final dbt test.

Question. Given stg_ordersint_orders_enrichedmart_daily_revenue, write a dbt build command and explain what happens when stg_orders.not_null(customer_id) fails.

Input.

Model Dependency
stg_orders raw.orders
int_orders_enriched stg_orders + dim_customers
mart_daily_revenue int_orders_enriched

Code.

# dbt build runs every node in DAG order, with tests interleaved
dbt build --select +mart_daily_revenue

# schema.yml beside stg_orders
models:
  - name: stg_orders
    columns:
      - name: customer_id
        tests:
          - not_null:
              severity: error
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. dbt build materialises stg_orders first.
  2. Immediately after materialisation, it runs the tests configured on stg_orders — including not_null(customer_id).
  3. If not_null(customer_id) fails (severity: error), dbt marks the test as failed and halts downstream — int_orders_enriched and mart_daily_revenue are skipped.
  4. The downstream models retain their previous values (the prior day's data), which is usually safer than overwriting with a half-broken build.
  5. The on-call receives a failure notification from dbt Cloud / Airflow / Elementary with the failing test name and the failing row count.

Output.

Node dbt build status
stg_orders materialised + test failed
int_orders_enriched skipped (upstream test failed)
mart_daily_revenue skipped (upstream test failed)

Rule of thumb. Always run dbt build, never dbt run followed by dbt test. The interleaved semantics is the entire reason build exists — and it is the only way to halt downstream materialisation on a failed test in a single command.

SQL interview question on alert routing

A senior interviewer might ask: "You have 200 DQ tests across GE, dbt, and Soda. The team is drowning in Slack alerts. Design the routing policy so the on-call only pages on actionable failures."

Solution Using a tiered severity + routing policy

# Tier 1 — critical (page on-call)
# Run: at ingest, in dbt build, on every Soda scan
# Severity: error / fail
# Route: PagerDuty + #data-oncall

# Tier 2 — important (Slack-only)
# Run: in dbt build, in Soda scan
# Severity: warn (single failing row tolerated)
# Route: #data-quality channel

# Tier 3 — informational (dashboard-only)
# Run: in Soda anomaly scan, in Elementary anomaly job
# Severity: anomaly_score < 0.5
# Route: Soda Cloud / Elementary dashboard, no Slack

# Example dbt config
models:
  - name: stg_orders
    columns:
      - name: customer_id
        tests:
          - not_null:           # tier 1
              severity: error
          - dbt_expectations.expect_column_value_lengths_to_be_between:  # tier 2
              min_value: 1
              max_value: 64
              severity: warn
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

Tier Trigger Route Expected daily volume
1 — critical not_null, unique, schema PagerDuty + Slack #data-oncall 0–1 per day
2 — important enum, range, FK Slack #data-quality 1–5 per day
3 — informational anomaly, freshness drift Dashboard dozens per day, no human triage

The policy is enforced at the framework level (severity: clauses) plus the routing config (PagerDuty integration / Slack webhooks).

Output:

Channel Daily volume On-call paged?
PagerDuty 0–1 yes (critical only)
#data-oncall 0–1 yes (mirrors PagerDuty)
#data-quality 1–5 no — async triage
Soda Cloud dashboard dozens no — review weekly

Why this works — concept by concept:

  • Tiered severity — every check has a tier baked in. Tier 1 is the contract; tier 2 is the smell; tier 3 is the signal. Each tier routes to a different channel with different expected response times.
  • Page only on contract violations — the on-call sleeps unless a column the business depends on is broken. Everything else waits for business hours.
  • Dashboard for anomalies — anomaly scores are noisy by nature; they belong on a dashboard reviewed weekly, not in a Slack channel that paged everyone last Tuesday.
  • Severity at the framework levelseverity: error in dbt, fail: clauses in Soda, action_list in GE — the framework decides whether to escalate, not a custom Python receiver.
  • Audit the routing quarterly — every quarter, count alerts per channel. If #data-quality fired 200 times last quarter, the threshold is wrong or the tests are stale.
  • Cost — zero additional pipeline cost; pure config / routing change. The cost was already paid the day the tests were written.

SQL
Topic — data validation
Data validation problems (SQL)

Practice →


5. Coverage strategy — what to test where, without drowning

80% of outages are caught by not_null + unique + freshness on contract columns — the rest is incremental insurance

The mental model in one line: tier datasets by blast radius (gold = consumed by exec dashboards, silver = consumed by analysts, bronze = raw ingestion), and increase test density per tier. A flat "test everything equally" strategy either bankrupts the warehouse budget or drowns the team in noise — neither failure mode survives a quarterly review.

Three-tier pyramid showing data quality coverage strategy — wide bronze tier at the bottom with light tests, narrower silver tier in the middle with more tests, narrow gold tier at the top with the densest tests; a small cost gauge on the right and a small alert-fatigue meter on the left, on a light PipeCode card.

The 80/20 rule of DQ.

  • not_null on every contract column. The column the business named in the data contract gets a not_null test, full stop.
  • unique on every primary key. Including surrogate keys, natural keys, and grain-defining composite keys.
  • freshness on every table consumed by anything that runs on a schedule. A dashboard that refreshes daily needs a freshness check at ingest and at the mart.
  • accepted_values on every enum. Status, type, category, country code — any column with a fixed domain.
  • relationships (FK) on every join key. Cheap to write; catches half of "the join lost 12% of the rows" tickets.

Tiered datasets by blast radius.

Tier Examples Test density Run frequency
Gold exec dashboards, regulatory reports, revenue metrics 100% contract coverage + anomaly + distribution every run
Silver analyst-facing marts, business-team reports not_null + unique + accepted_values + freshness every run
Bronze raw ingest, change-data-capture tables row_count + freshness every run

The pyramid logic: gold tier is narrow (fewer tables) but dense (many tests); bronze is wide (many tables) but sparse (few tests). The aggregate cost stays manageable while the contract layer gets the strictest scrutiny.

Schema contracts — three flavours.

  • dbt contracts. Built-in to dbt 1.5+. Declare the contract in YAML; dbt enforces column names + types at build time. Native to the dbt build flow.
  • GE schema expectations. expect_table_columns_to_match_ordered_list + expect_column_values_to_be_of_type. Asserts at runtime; integrates with Data Docs.
  • SodaCL schema checks. schema: clause with fail: when wrong column type / fail: when forbidden column present. Most ergonomic in YAML.

Anomaly checks layered on top of deterministic rules.

The deterministic layer (not_null, unique, FK) catches new breakage. The anomaly layer (row-count drift, distribution shift, freshness lag) catches gradual breakage. Together they cover both the cliff-edge failures and the slow-burn failures. Layer order matters: deterministic first, anomaly second — never the other way around.

Cost of running checks.

  • Warehouse credits. Every dbt test compiles to a SELECT; every Soda check is a SELECT; GE checks on Spark / SQL also compile to SELECT. Tests on huge tables can dominate the build budget. Mitigate with sampled tests on bronze and full tests on gold.
  • Run-time budget. A 200-test dbt run that doubles the build time blows the SLA. Profile tests with dbt --debug and prune the slowest.
  • False-positive fatigue. A test that fires once a week without being actionable trains the team to ignore it — and then a real failure goes unseen. Audit fired-but-ignored tests quarterly.

When to combine frameworks.

  • dbt tests for the transform layer. Lowest friction; YAML is already there.
  • GE Checkpoints at ingest + critical anomaly. Python-native, custom expectations for cross-table invariants.
  • Soda scans for cross-source freshness + multi-warehouse audits. SodaCL is the only declarative answer for "test the same metric across Snowflake and Postgres."

Migration patterns.

  • Start on dbt tests. Lowest cost of entry; covers warehouse layer end-to-end.
  • Layer Soda for freshness + anomaly. Once dbt covers the contract, Soda adds the cross-cutting checks dbt does not express natively.
  • Add GE at ingest. When the team has a bad-batch story to tell (e.g., "API silently changed an enum"), GE pays for itself in one quarter.
  • Optional: Elementary on top of dbt. Adds anomaly detection and a dashboard without leaving the dbt project.

Worked example — coverage matrix for a 12-table mart

Detailed explanation. A team has 12 mart tables: 2 gold, 4 silver, 6 bronze. They want a coverage matrix that fits inside a 5-minute test budget. Building the matrix surfaces the trade-offs concretely.

Question. Design a coverage matrix for the 12 tables. Specify which tests run on which tier and estimate the test count.

Input.

Table Tier Row count
mart_executive_revenue gold 10M
mart_executive_users gold 50M
mart_finance_invoices silver 5M
mart_finance_payments silver 8M
mart_marketing_attribution silver 100M
mart_marketing_funnel silver 200M
stg_orders bronze 500M
stg_payments bronze 1B
stg_users bronze 50M
raw_orders bronze 500M
raw_payments bronze 1B
raw_users bronze 50M

Code.

# Gold tier — dense
models:
  - name: mart_executive_revenue
    tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1
    columns:
      - name: day      { tests: [unique, not_null] }
      - name: revenue  { tests: [not_null] }
      - name: region   { tests: [not_null, accepted_values: { values: ['EU','US','APAC'] }] }

# Silver tier — moderate
  - name: mart_finance_invoices
    columns:
      - name: invoice_id  { tests: [unique, not_null] }
      - name: amount      { tests: [not_null] }

# Bronze tier — minimal (run as Soda freshness only)
Enter fullscreen mode Exit fullscreen mode
# Soda — bronze tier freshness
checks for stg_orders:
  - freshness(created_at) < 6h
checks for raw_orders:
  - freshness(loaded_at) < 6h
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. Gold tables get full coverage: row_count, not_null on every contract column, unique on PKs, accepted_values on enums. Estimated 8 tests per gold table → 16 tests.
  2. Silver tables get the essentials: not_null on contract columns, unique on PKs. Estimated 4 tests per silver table → 16 tests.
  3. Bronze tables get a single freshness check each (via Soda, not dbt). Estimated 1 test per bronze table → 6 tests.
  4. Total: 38 tests across the 12 tables. Fits comfortably in 5 minutes of warehouse time on a Snowflake medium warehouse.
  5. If the test budget grows, add anomaly checks to the gold tier first (highest blast radius). Add range / FK tests to silver next. Bronze tier stays minimal.

Output.

Tier Tables Tests per table Total tests Approx run time
Gold 2 8 16 ~2 min
Silver 4 4 16 ~2 min
Bronze 6 1 6 ~30s
Total 12 38 ~4.5 min

Rule of thumb. Estimate test budget before writing tests. If the matrix exceeds the SLA, tier down (gold loses anomaly, silver loses FK). If the matrix fits with room, tier up (silver gains anomaly, bronze gains row_count).

Worked example — quarantine pattern with Soda + dbt

Detailed explanation. A team needs partial-data continuity: if 1% of rows fail the DQ check, the report should still ship on the 99% of good rows, with the bad rows quarantined to a side table for investigation. This is the "quarantine-and-continue" pattern, and it requires the assertion to live inside the transform, not as a separate test step.

Question. Write a dbt model that materialises only the rows that pass amount > 0 AND status IN (...), and writes the failing rows to a _quarantine side table. Show the Soda check that monitors the quarantine table size.

Input.

order_id amount status
1 50 placed
2 -10 placed
3 25 unknown
4 0 paid
5 75 paid

Code.

-- models/marts/orders_clean.sql
{{ config(materialized='incremental') }}

WITH passed AS (
  SELECT *
  FROM {{ ref('stg_orders') }}
  WHERE amount > 0
    AND status IN ('placed','paid','shipped','cancelled')
)

SELECT * FROM passed
Enter fullscreen mode Exit fullscreen mode
-- models/marts/orders_quarantine.sql
{{ config(materialized='incremental') }}

SELECT *,
       CURRENT_TIMESTAMP AS quarantined_at,
       CASE
         WHEN amount IS NULL OR amount <= 0 THEN 'bad_amount'
         WHEN status NOT IN ('placed','paid','shipped','cancelled') THEN 'bad_status'
       END AS quarantine_reason
FROM {{ ref('stg_orders') }}
WHERE amount IS NULL
   OR amount <= 0
   OR status NOT IN ('placed','paid','shipped','cancelled')
Enter fullscreen mode Exit fullscreen mode
# Soda — monitor quarantine table size
checks for orders_quarantine:
  - row_count < 1000:
      warn: when between 100 and 999
      fail: when >= 1000
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. orders_clean filters the passing rows. The downstream marts read from this clean table — the bad rows never propagate.
  2. orders_quarantine writes the failing rows to a side table with a quarantine_reason column. The investigation team can query this table directly to triage upstream bugs.
  3. The Soda check on orders_quarantine monitors the size of the quarantine. Below 100: silent. 100–999: warn. ≥1000: fail (something systemic is happening upstream).
  4. The DAG does not fail unless the quarantine size crosses the fail threshold — partial data continues to flow, the reporting layer is happy, the on-call only wakes when the failure rate is genuinely abnormal.
  5. The trade-off: downstream metrics computed on orders_clean exclude the bad rows. Be explicit about this in the metric's definition ("excluding orders with invalid amount or status").

Output.

Table Rows Downstream consumers
orders_clean 3 (ids 1, 5, plus reclassifications) marts, BI
orders_quarantine 3 (ids 2, 3, 4 with reasons) DQ on-call investigation

Rule of thumb. Use quarantine-and-continue when partial data is materially better than no data — most reporting use cases qualify. Use fail-the-run only when partial data is worse than no data, e.g., financial reconciliation where a missing transaction is worse than a delayed report.

SQL interview question on combining frameworks

A senior interviewer might frame this as: "You have a Postgres source, a Snowflake warehouse, and a dbt project. You need DQ coverage from raw ingestion to the executive dashboard. How would you split the work across GE, dbt tests, and Soda Core — and why?"

Solution Using a three-framework layered architecture

# Layer 1 — Pre-ingest (Great Expectations)
# Where: Airflow Python task before Snowflake COPY
# What: not_null on PKs, accepted_values on enums, row_count
# Why: fail-fast quarantine; bad batch never reaches Snowflake

# Layer 2 — In-transform (dbt tests + dbt_expectations)
# Where: dbt build, schema.yml beside each model
# What: not_null, unique, accepted_values, relationships, FK
# Why: gates downstream model materialisation; YAML lives next to the model

# Layer 3 — Post-load freshness + anomaly (Soda Core)
# Where: Soda scan after dbt build completes
# What: freshness, row_count anomaly, distribution drift, cross-source
# Why: declarative anomaly clauses; cross-source checks dbt cannot express
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

Layer Framework Test count (illustrative) When it runs Blast radius if it fires
1 GE 12 Airflow ingest task Quarantine batch, skip Snowflake load
2 dbt 60 dbt build Halt downstream models in DAG
3 Soda 20 Post-build scan Alert mart owner; no downstream halt

Output:

Concern Owned by
"Did the right data land?" GE (ingest)
"Did the transform preserve the contract?" dbt (in-build)
"Is the mart fresh and within bounds?" Soda (post-load)
"Is the dashboard right?" All three combined

Why this works — concept by concept:

  • Layered framework choice — each framework owns the layer where it plugs in cleanest. No single framework spans all three layers without significant friction.
  • Fail-fast at ingest, gate at transform, alert at mart — the failure mode scales down by layer. Ingest = block; transform = halt downstream; mart = alert and inspect.
  • YAML proliferation is okay — both dbt and Soda are YAML-first; engineers learn one syntax (declarative DQ) once and apply it twice. The duplication cost is lower than the integration cost of forcing one framework to do all three jobs.
  • **Soda Cloud is optional__ — the Soda layer works on Soda Core alone (free OSS). Adding Soda Cloud later gives a managed dashboard without rewriting any check.
  • Severity per layer — ingest = fail, transform = fail (severity:error) or warn (severity:warn), mart = warn / dashboard. Routing matches the response time the team can sustain.
  • Cost — three frameworks add maybe 5 minutes to the daily DAG. Compared to the cost of a silent week-long outage, it is the cheapest insurance the team will ever buy.

SQL
Topic — validation
Validation problems (SQL)

Practice →

SQL
Topic — null handling
NULL handling problems (SQL)

Practice →


Cheat sheet — DQ framework recipes

  • "Test inside the warehouse, beside the model." dbt generic + singular tests. YAML lives next to schema.yml; dbt build interleaves models and tests; failed test halts downstream.
  • "Python-heavy stack with Airflow / Dagster / Prefect." Great Expectations Checkpoints inside an operator. Pull → DQ → land is the textbook pattern; fail_task_on_validation_failure=True quarantines bad batches.
  • "YAML-first, multi-source, want a cloud dashboard." Soda Core (free) + optional Soda Cloud (paid). SodaCL compiles to source-dialect SQL; CLI drops into any DAG.
  • "DataFrame validation in code." pandera for pandas / Polars / PySpark schemas; Deequ / PyDeequ for Spark. Apply at the boundary of every script.
  • "Schema contract at the edge." dbt contracts (in-warehouse, declared in YAML) or GE schema expectations (expect_table_columns_to_match_ordered_list) or SodaCL schema: clause.
  • "Anomaly + freshness without writing rules." Layer Soda Core (native anomaly score + change clauses) or Elementary (on top of dbt) or a vendor (Monte Carlo, Bigeye) on top of the deterministic test layer.
  • "Quarantine bad rows, continue with good ones." Split the model into _clean (WHERE pass) and _quarantine (WHERE fail) materialisations inside dbt. Monitor quarantine size with a Soda check.
  • "Warn on 1 failing row, page on 10+." dbt: severity: error + error_if: ">= 10" + warn_if: ">= 1". Soda: warn: when between 1 and 9 / fail: when > 9. GE: mostly threshold + Checkpoint action_list routing.
  • "Cross-source DQ (compare Postgres to Snowflake)." SodaCL — only one of the three frameworks that expresses cross-source declaratively. GE / dbt require glue code.
  • "Backfill DQ over the last 30 days after fixing the upstream bug." GE: re-run the Checkpoint with a date-range BatchRequest. dbt: dbt test --select stg_orders --vars '{"start_date":"2026-05-15"}'. Soda: parametrise the checks file with a date variable.
  • "DQ in a notebook for ad hoc inspection." GE — the Python API loads a Suite, runs against a DataFrame, returns a result object you can render inline. The only one of the three that is notebook-ergonomic.

Frequently asked questions

Should I use Great Expectations or dbt tests if I already use dbt?

Use dbt tests for the warehouse-native transform layer — the YAML is already there, dbt build interleaves models and tests, and failed tests halt downstream in a single command. Reach for Great Expectations when you need to assert before the data lands in the warehouse (ingest checks inside Airflow / Dagster), when you need rich custom expectations that go beyond what dbt_expectations covers, or when Data Docs is the human-readable report your stakeholders want. Most mature teams run both: dbt tests own the transform; GE owns ingest + critical anomaly. They do not compete — they cover different layers.

What does Soda Core add that dbt tests don't?

Three things. First, native cross-source checks — SodaCL can run the same check against Postgres and Snowflake without rewriting the SQL, which dbt cannot do declaratively. Second, native anomaly detection — anomaly score for row_count and change for row_count > -10% are built-in clauses, whereas dbt needs Elementary or a singular test. Third, ergonomic freshness — freshness(updated_at) < 1d is a one-line check, whereas dbt source freshness lives in a separate config and runs via a separate command. Adopt Soda Core when your stack is multi-source or when anomaly / freshness coverage is the gap.

Are Great Expectations Checkpoints overkill for small teams?

For a team of one to three engineers running a single dbt project, yes — GE adds setup ceremony (Data Context, Suite, Checkpoint, action_list) that pays off only when you have multiple datasets, multiple environments, or stakeholders who consume Data Docs. Start with dbt tests; add Soda Core when you need freshness or anomaly; bring in GE when you have a bad-batch story (e.g., the API changed an enum and broke the warehouse) and you need fail-fast ingest assertions. The migration cost from "no GE" to "GE at ingest" is a sprint — small teams should defer that sprint until the value is proven.

Can I run Soda Core without Soda Cloud?

Yes — Soda Core is the OSS engine and runs entirely from the CLI or as a Python library. You write a checks.yml, run soda scan -d <warehouse> checks.yml, and the engine prints results to stdout / writes JSON to a file / inserts into a warehouse result table. Soda Cloud (paid) is the managed dashboard + Slack-alert layer that consumes Soda Core scan results, but it is strictly optional. Many teams run Soda Core forever and route results to their own dashboard (Grafana, Metabase, a custom dbt-rendered HTML page) — no licence fee, no vendor dependency.

How do I avoid alert fatigue from data quality tests?

Tier every check by severity before writing it. Tier 1 (contract: not_null on PKs, unique on grain keys, accepted_values on enums) pages the on-call via PagerDuty. Tier 2 (smell: range checks, FK relationships, length bounds) goes to a Slack channel with no paging. Tier 3 (anomaly: row-count drift, freshness lag) goes to a dashboard reviewed weekly. Audit the alert volumes quarterly — if Tier 2 fired 200 times last quarter and nothing was actionable, either the threshold is wrong or the tests are stale. The most expensive class of alert is the one that fires repeatedly without ever being acted on — it trains the team to ignore the next real failure.

Where do dbt contracts fit relative to all three frameworks?

dbt contracts (introduced in dbt 1.5) declare the expected column names + types of a model in YAML, and dbt enforces them at build time — a column rename or type change breaks dbt build before the model ships. Contracts complement but do not replace the assertion frameworks. Use contracts for structural guarantees (the schema is what it claims to be); use dbt tests for content guarantees (the values are not NULL, are unique, are in the enum). GE and Soda offer their own schema-assertion mechanisms (expect_table_columns_to_match_ordered_list, SodaCL schema: clause) which serve the same role for non-dbt or pre-warehouse layers. Adopt contracts inside dbt; mirror the same schema assertion at the ingest layer with GE or Soda for end-to-end structural coverage.

Practice on PipeCode

Pipecode.ai is Leetcode for Data Engineering — every DQ recipe above ships with hands-on practice rooms where you write the GE Suite, the dbt YAML, and the SodaCL check against real graded inputs. PipeCode pairs every reading with 450+ DE-focused problems and a real-time scoring engine, so you never have to wonder whether your `not_null + unique + freshness` coverage actually behaves the same on Snowflake as on Postgres.

Practice data validation now →
Validation drills →

Top comments (0)