Gowtham Potureddi

Posted on Jun 15

Data Quality Frameworks: Great Expectations vs dbt Tests vs Soda Core

#python #sql #interview #dataengineering

Picking a data quality framework in 2026 is no longer a "should we test our pipelines?" debate — every senior data engineer knows the answer. The real debate is which framework: Python-native Great Expectations with its expectations + suites + checkpoints, warehouse-native dbt tests living next to models, or YAML-first Soda Core with SodaCL checks and the optional Soda Cloud dashboard. Each one solves the same problem — catch the broken row before the dashboard ships it — but each carries a different team shape, a different runtime contract, and a different blast radius when a test fails.

This guide is the dialect matrix you wished existed the first time a stakeholder asked "why is the metric off by 12%?" and the answer was a NULL row that slipped through ingestion. It walks the data quality testing vocabulary across the three frameworks (an Expectation vs a generic test vs a check is the same idea wearing three name tags), the expectations suite structure and how it maps to dbt's YAML, the generic tests library and how to extend it, and the coverage strategy that prevents alert fatigue. Each section pairs a teaching block with a Solution-Tail interview answer — code, a step-by-step trace, an output table, and a concept-by-concept breakdown of why it works.

When you want hands-on reps immediately after reading, drill the data validation practice library →, rehearse on NULL handling problems →, and stack the schema muscles with validation drills →.

On this page

Why pick a data quality framework at all
The DQ framework landscape in 2026
Expectations vs tests vs checks — the same idea, three vocabularies
Where they plug into the pipeline
Coverage strategy — what to test where, without drowning
Cheat sheet — DQ framework recipes
Frequently asked questions
Practice on PipeCode

1. Why pick a data quality framework at all

Silent data corruption is the most expensive class of pipeline bug — and BI dashboards are the worst place to catch it

The one-sentence invariant: the cost of catching a bad row at the source is one developer-hour; the cost of catching it in the executive dashboard is one quarter of trust. A data quality framework moves the assertion from "the analyst noticed the number looked weird" to "the pipeline failed at 03:14 and paged the on-call before the report ran" — and that shift is what separates a team that ships metrics from a team that explains metrics.

The four categories of data quality checks.

Row-level. "Every row of orders has a non-NULL order_id." Cheap, deterministic, the 80% of coverage.
Column-level. "The amount column is between 0 and 1,000,000 in 99.9% of rows; the status column belongs to the enum {placed, paid, shipped, cancelled}." Catches schema drift and bad enums.
Freshness. "The events table has a row in the last 60 minutes." Catches stalled ingestion before the report consumer notices.
Schema / contract. "The orders table has exactly these columns with these types in this order." Catches upstream renames, drops, and silent type changes — the hardest class to debug after the fact.

Tests live next to code, not in BI dashboards.

The anti-pattern is a Looker / Power BI / Tableau dashboard with a "data quality" tab that runs the same SQL on the same warehouse and turns red when something breaks. The dashboard is downstream — by the time it turns red, the bad data has already been read by other consumers, the metric snapshot has been computed, and the broken row has propagated. Push the assertion to the place where the data is produced (ingestion, transform, contract boundary) and the bad row never enters the warehouse in the first place.

Build vs buy vs adopt-an-OSS-framework.

Build — write your own checks as SQL UNION ALL queries that return failing rows. Cheap to start; expensive to maintain; no shared vocabulary across teams.
Buy — Monte Carlo, Bigeye, Anomalo. Strong on anomaly detection and lineage; sticker shock; lock-in.
Adopt OSS — Great Expectations, dbt tests, Soda Core. Zero vendor cost; ecosystem of community-contributed assertions; lives next to your code in version control. This guide is about the OSS lane.

Why "we already have dbt tests" is rarely enough alone.

dbt tests are excellent for the warehouse-native transform layer — every model already has YAML, so co-locating tests next to columns is a low-friction win. But dbt only knows what runs in dbt build. It cannot test the ingest layer (the Airflow task that lands the file), it cannot run pre-warehouse assertions, and its anomaly / distribution muscles are weaker than what GE or Soda ship out of the box. Most mature teams end up combining: dbt tests at transform, GE or Soda at the edges.

Worked example — three places NULL can break a dashboard and the four assertion families that catch each one

Detailed explanation. New analysts often assume a single dbt test on the mart table is enough. The reality is that the same NULL row can be introduced at ingest (the source API silently changed and started returning null), at transform (a LEFT JOIN with missing right-side rows), or at contract drift (a column renamed upstream). Each of those failures needs a different assertion family — and a single test at the end of the pipeline catches them too late to attribute the source.

Question. Given a daily_revenue mart table that reads from staging_orders and staging_payments, list the four data-quality test families and the failure mode each one catches first. Show which framework excels at each layer.

Input.

Layer	Table	Common failure mode
Ingest	`raw_orders`	Upstream API returns NULL `customer_id`
Transform	`staging_orders`	LEFT JOIN with missing right side leaks NULL
Mart	`daily_revenue`	SUM over NULL group yields 0 instead of NULL
Contract	`daily_revenue`	Downstream BI expects `revenue` column; it was renamed to `gross_revenue`

Code.

# dbt tests (mart layer) — runs inside `dbt build`
models:
  - name: daily_revenue
    columns:
      - name: revenue
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: ">= 0"
      - name: day
        tests:
          - unique
          - not_null

# Great Expectations (ingest layer) — runs inside an Airflow PythonOperator
suite = context.add_expectation_suite("raw_orders_ingest")
batch.expect_column_values_to_not_be_null("customer_id")
batch.expect_column_values_to_be_in_set(
    "status", ["placed", "paid", "shipped", "cancelled"]
)
batch.expect_table_row_count_to_be_between(min_value=1, max_value=10_000_000)

# Soda Core (contract / freshness) — runs as a CLI step
checks for daily_revenue:
  - row_count > 0
  - schema:
      fail:
        when required column missing: [day, revenue]
  - freshness(day) < 1d

Step-by-step explanation.

The ingest test (GE) catches the upstream API change before the row ever reaches the warehouse — the Airflow task fails at 03:14 instead of the daily report being wrong at 09:00.
The transform test (dbt) catches a NULL leak inside staging_orders — dbt build halts the DAG and the mart never gets the bad day.
The mart test (dbt + Soda) catches the SUM-of-NULL pathology — revenue >= 0 would have been satisfied silently if SUM(NULL) is NULL; the not_null test on the day makes the failure loud.
The contract test (Soda schema check) catches the column rename — the daily_revenue table either has the revenue column or the run fails.
Each layer needs the assertion to live at that layer. A single mart-level test sees only the end result and cannot say which upstream layer broke.

Output.

Layer	Best-fit framework	Why it wins here
Ingest	Great Expectations	Python-native, pluggable into Airflow / Dagster operators
Transform	dbt tests	YAML next to model, runs as part of `dbt build`
Mart	dbt + Soda	dbt for column-level, Soda for freshness + row-count
Contract	Soda schema / dbt contracts	Declarative schema assertion, no glue code

Rule of thumb. Map every dataset to the four assertion families before writing a single test. If a category is empty, that is the silent class of bug that will bite next quarter. Pick the framework per layer based on where it plugs in cleanly, not by which one your favourite blogger uses.

Worked example — the cost of not having a DQ framework

Detailed explanation. Teams often justify "we don't need a DQ framework — our analysts catch issues" until the first big silent outage. The math is unforgiving: a single bad metric that runs for three days on a board-level dashboard costs more in lost trust than three years of GE / Soda licences ever would (and both are free).

Question. Quantify the engineering cost of catching a bad row at four points: at ingest, in the warehouse, in the BI tool, and at the executive review. Show the time-to-detect and time-to-fix multiplier.

Input.

Catch point	Avg time-to-detect	Avg time-to-fix	Trust cost
Ingest test fires	5 min	30 min	none
Warehouse transform fails	30 min	60 min	small
BI tool shows wrong number	4 hours	4 hours	medium
Executive review flags it	3 days	1 day	high

Code.

# A trivial Great Expectations check on the ingest side
import great_expectations as gx

context = gx.get_context()
asset = context.sources.add_pandas("ingest").add_dataframe_asset("raw_orders")
batch_request = asset.build_batch_request(dataframe=df)

suite = context.add_expectation_suite("raw_orders_smoke")
validator = context.get_validator(batch_request=batch_request, expectation_suite=suite)

validator.expect_column_values_to_not_be_null("customer_id")
validator.expect_column_values_to_be_between("amount", min_value=0, max_value=10_000_000)
validator.expect_table_row_count_to_be_between(min_value=100)

results = validator.validate()
if not results.success:
    raise RuntimeError("ingest DQ failed — quarantining batch")

Step-by-step explanation.

Catching the bad row at ingest fires within 5 minutes — the Airflow task fails, the on-call sees a single page, the fix is "rerun after upstream is patched."
Catching it at the transform layer (dbt test on staging) fires within 30 minutes — slightly more cost because some downstream models may have started building.
Catching it at the BI layer is hours later because dashboards refresh on a schedule, and humans only look at certain dashboards on certain days.
The worst case — exec review — is days later, and the cost includes a board-level apology, an "explainability" retro, and weeks of rebuilt trust in the affected metric.
The order of magnitude difference between "ingest fail" and "exec review fail" is roughly 1,000x in eng-hours and "infinite" in trust.

Output.

Catch point	Cost multiplier
Ingest	1x
Warehouse	4x
BI	50x
Executive	1,000x+

Rule of thumb. Every dataset deserves a minimum of not_null + unique + freshness at the ingest or staging boundary. Adding those three lines of YAML / Python costs minutes; missing them costs quarters of trust the first time a row slips through.

SQL interview question on detecting a silent NULL leak

A senior interviewer often opens with: "You shipped a daily_revenue table to the exec dashboard last week. This morning the CFO says the number is off by 12%. Walk me through the audit: where would you look first, and what assertions would have caught it earlier?"

Solution Using a layered audit + retroactive DQ assertion plan

-- 1) Audit the mart for NULL leaks per column
SELECT
    COUNT(*)                                   AS total_rows,
    COUNT(*) - COUNT(customer_id)              AS null_customer,
    COUNT(*) - COUNT(revenue)                  AS null_revenue,
    SUM(CASE WHEN revenue < 0 THEN 1 ELSE 0 END) AS negative_revenue
FROM daily_revenue
WHERE day = CURRENT_DATE - 1;

-- 2) Audit the staging table for upstream drift
SELECT
    COUNT(*) - COUNT(amount) AS null_amount_in_staging,
    COUNT(*) FILTER (WHERE status NOT IN
        ('placed','paid','shipped','cancelled')) AS bad_enum_status
FROM staging_orders
WHERE created_at >= CURRENT_DATE - 1;

-- 3) Compare against the contract: row_count delta day-over-day
WITH day_counts AS (
  SELECT day, COUNT(*) AS row_count
  FROM daily_revenue
  GROUP BY day
)
SELECT day, row_count,
       row_count - LAG(row_count) OVER (ORDER BY day) AS delta
FROM day_counts
ORDER BY day DESC
LIMIT 7;

Step-by-step trace.

Step	Finding	Implied source layer
Mart audit	12% rows have NULL `customer_id`	Either ingest or transform
Staging audit	`null_amount_in_staging = 0`, `bad_enum_status = 4,800`	Ingest layer — upstream API drift
Row-count delta	`delta = -12,000` (down 12% vs prior week)	Confirms the silent drop

The audit narrows the source layer to ingest in three queries. The fix is twofold: a one-time backfill for the affected days, and a permanent GE expectation on the ingest table to prevent recurrence.

Output:

Layer	Retroactive assertion
Ingest (GE)	`expect_column_values_to_be_in_set("status", ["placed","paid","shipped","cancelled"])`
Staging (dbt)	`accepted_values` test on `status` with the same enum
Mart (Soda)	`row_count anomaly` check with a 5% deviation threshold

Why this works — concept by concept:

Layered audit, not single query — running the same NULL count at each layer reveals where the NULLs were introduced. A single mart-level query only confirms "something broke."
Enum drift detection — bad_enum_status = 4,800 proves the upstream API started returning an undocumented status. This is the kind of failure schema-only contracts miss.
Row-count delta — comparing today's count to the trailing window of comparable days exposes silent drops that pass column-level checks. A row that is "missing" never appears in NULL counts.
Retroactive assertions become the regression suite — every audit finding becomes a permanent test. The first time the bug is caught is at exec review; the second time, the pipeline catches it before any consumer sees the bad day.
Cost — three short SQL queries (O(rows) each). The retroactive tests add zero marginal cost to dbt build and seconds to GE / Soda runs.

SQL
Topic — data validation
Data validation problems (SQL)

Practice →

SQL Topic — null handling NULL handling problems (SQL)

Practice →

2. The DQ framework landscape in 2026

Five open-source frameworks dominate — pick by team shape and stack, not by feature count

The mental model in one line: Great Expectations is Python-native, dbt tests are warehouse-native, Soda Core is YAML-first and multi-source — every other contender is either a specialist (pandera, Deequ) or a thin wrapper around one of those three. Once you can pattern-match a team's stack to one of those archetypes, the framework choice is mostly a fait accompli.

Great Expectations (GE) in one paragraph.

Vocabulary. An Expectation is a single assertion. A Suite bundles expectations for a dataset. A Checkpoint binds a Suite to a runtime execution and ships the results to Data Docs (the auto-generated HTML report).
Strengths. Python-native, integrates with Pandas, Spark, SQLAlchemy, and every warehouse. Largest community-contributed expectation library. Data Docs is the gold standard for human-readable test reports.
Weaknesses. Heaviest of the three on setup ceremony; "Data Context" objects can feel ceremonious; recent v3 migration shed weight but added migration burden for v2 users.
Team shape it assumes. A Python-heavy team running Airflow / Dagster / Prefect, comfortable with config-as-code, and willing to invest a sprint in onboarding.

dbt tests in one paragraph.

Vocabulary. A generic test is a reusable macro that returns failing rows (built-ins: not_null, unique, accepted_values, relationships). A singular test is a single SQL file that returns failing rows. A custom generic test is a macro you write that any model can reuse.
Strengths. Lives in YAML next to the model definition — zero context switch. Runs as part of dbt build, which means tests gate the next model. dbt_utils and dbt_expectations packages multiply the built-in library tenfold.
Weaknesses. Warehouse-only — cannot test before the data lands. Anomaly / distribution detection is weak without dbt_expectations. Severity and warn-vs-fail semantics are simpler than GE / Soda.
Team shape it assumes. A warehouse-first team already running dbt for transforms — adding tests is a no-op decision because the YAML is already there.

Soda Core in one paragraph.

Vocabulary. A check is a single SodaCL clause (row_count > 0, missing_count(col) = 0). A scan is one run over a dataset. User-defined metrics extend SodaCL with custom SQL.
Strengths. YAML-first, dialect-agnostic (SodaCL compiles to whatever SQL the source dialect speaks). CLI-friendly — drops into Airflow / GitHub Actions / cron without ceremony. Soda Cloud (optional, paid) adds a dashboard and Slack alerts on top of the OSS engine.
Weaknesses. Smaller community than GE; some advanced expectations (distribution comparisons, partial-suite reuse) require Soda Cloud or DIY SQL.
Team shape it assumes. A multi-source team that wants a single declarative file per dataset across Postgres, Snowflake, BigQuery, Spark, with optional managed dashboard.

The lightweight neighbours.

pandera — DataFrame validation library for pandas / Polars / PySpark. Strong typing, schema inference, in-memory only. Use when the assertion lives in a Python script, not in the warehouse.
Deequ — Spark/Scala library from AWS, ports of which exist in PyDeequ. Strong on big-data anomaly detection. Best when the dataset is already a Spark DataFrame.
Elementary — dbt-native observability layer on top of dbt tests. Adds anomaly detection, lineage, and a dashboard. Treat as a dbt augmentation, not a standalone framework.
Pydantic — application-layer schema validation. Not a DQ framework per se; pairs well at the API / ingest boundary.

The terminology cheat sheet.

Concept	GE	dbt	Soda
One assertion	Expectation	Generic test (per column)	Check
Bundle of assertions	Suite	YAML test block on a model	Checks-file (per dataset)
One run	Checkpoint	`dbt test` / `dbt build`	Scan
Custom assertion	Custom expectation	Custom generic test (macro)	User-defined metric
Severity	Severity (action level)	Severity (warn / error)	Warn / Fail clauses
Result store	Data Docs + Validation Result store	`run_results.json` + manifest	Soda Cloud / file / table

Team-shape decision matrix.

Python team, Airflow / Dagster, multi-source. GE wins. Pluggable into the orchestrator, expectation library beats everything.
dbt-shop, warehouse-native, transform-heavy. dbt tests + dbt_expectations. Don't bring in a second framework until you hit anomaly or freshness needs that dbt can't cover.
Polyglot team, YAML-first, multi-source, want a managed dashboard. Soda Core (free) + Soda Cloud (paid for the dashboard). The CLI scales from one dataset to one hundred without re-architecture.
Data-science / DataFrame-heavy, Pandas / Polars. pandera. Treat schema as code at the boundary of every script.
Big-data Spark shop. Deequ. Native to the engine you're already running.

Worked example — the same assertion in three frameworks

Detailed explanation. The fastest way to internalise the three vocabularies is to write the same assertion in all three. The assertion: "orders.amount is not NULL and is greater than 0." This is the most common production check, and seeing it spelled out three ways highlights where each framework's syntax becomes ergonomic.

Question. Translate "orders.amount IS NOT NULL AND orders.amount > 0" into a Great Expectations Suite, a dbt YAML test block, and a Soda Core checks file. Which is shortest? Which is most readable?

Input.

order_id	amount
1	50
2	0
3	NULL
4	25

Code.

# Great Expectations — Python suite
suite = context.add_expectation_suite("orders_amount")
validator = context.get_validator(
    batch_request=orders_batch_request,
    expectation_suite=suite,
)
validator.expect_column_values_to_not_be_null("amount")
validator.expect_column_values_to_be_between(
    "amount", min_value=0, strict_min=True
)
validator.save_expectation_suite()

# dbt — schema.yml next to the model
models:
  - name: orders
    columns:
      - name: amount
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: "> 0"

# Soda Core — checks file
checks for orders:
  - missing_count(amount) = 0
  - min(amount) > 0

Step-by-step explanation.

GE's expression is the most verbose because it carries the Suite + Validator + Batch ceremony. The trade-off: every assertion ships with rich metadata and a Data Docs page automatically.
The dbt version is the shortest if you accept the dbt_utils dependency for expression_is_true. The built-in not_null test handles half the assertion; the other half needs an expression macro.
Soda Core sits between the two on verbosity. missing_count = 0 is the SodaCL idiom for "not null"; min > 0 is more declarative than a per-row predicate but means the same.
All three produce the same failing row in the input table: order_id = 2 (amount = 0) and order_id = 3 (amount NULL).
Readability is taste — but YAML wins for "non-engineer can read the assertion" by a wide margin.

Output.

Framework	Lines of code	Failing rows detected
Great Expectations	9	2 (order_id 2, 3)
dbt tests	6	2 (order_id 2, 3)
Soda Core	3	2 (order_id 2, 3)

Rule of thumb. When the assertion is purely column-level (not_null, range, enum), all three frameworks are roughly equivalent. The choice comes down to where the assertion runs and who maintains it.

Worked example — distribution / anomaly checks where the frameworks diverge

Detailed explanation. "Yesterday's row count was 10,000; today's is 8,000 — is that a 20% drop or normal weekend dip?" This is a distribution check, not a per-row assertion. Each framework's answer is materially different.

Question. Write a "row count must be within 10% of the trailing 7-day average" assertion in GE, dbt (with dbt_expectations), and Soda. Show which is most ergonomic.

Input.

day	row_count
2026-06-08	10,000
2026-06-09	10,200
2026-06-10	9,800
2026-06-11	10,500
2026-06-12	9,900
2026-06-13	10,100
2026-06-14	10,000
2026-06-15	8,000

Code.

# Great Expectations — using a custom expectation
validator.expect_table_row_count_to_be_between(
    min_value=9000, max_value=11000,
    meta={"reason": "trailing 7-day average +/- 10%"}
)
# In practice teams wrap this in a custom expectation that
# reads the trailing window from a metadata table.

# dbt — using dbt_expectations
models:
  - name: daily_revenue
    tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 9000
          max_value: 11000

# Soda Core — using SodaCL anomaly clause
checks for daily_revenue:
  - anomaly score for row_count < 0.7
  - change for row_count > -10% and change for row_count < 10%

Step-by-step explanation.

GE makes you compute the trailing average yourself (or in a custom expectation) — the built-in expect_table_row_count_to_be_between takes static bounds.
dbt + dbt_expectations is the same — expect_table_row_count_to_be_between is the closest built-in. Anomaly detection requires either Elementary or a singular test that joins against the trailing window.
Soda Core ships native anomaly score and change clauses that compute the trailing window for you. This is where Soda materially outperforms the others on declarative anomaly checks.
For the input on 2026-06-15, the row count is 8,000, which is 20% below the 7-day average of 10,071. All three frameworks fire — but Soda fired without the engineer having to write the trailing-window logic.

Output.

Framework	Lines of declarative config for trailing-window anomaly	Failing day detected
GE	needs custom expectation	2026-06-15
dbt + dbt_expectations	static bounds only without singular test	2026-06-15
Soda Core	2 (built-in `anomaly score` + `change`)	2026-06-15

Rule of thumb. For static / deterministic checks, all three are similar. For trailing-window anomaly checks, Soda is the most ergonomic out of the box; GE catches up with custom expectations; dbt needs Elementary or a singular test.

SQL interview question on choosing a DQ framework

A senior interviewer might frame this as: "Your team runs Airflow + dbt + Snowflake. You already use dbt tests for the warehouse models. Where would you add Great Expectations or Soda Core, and what is the failure mode of each layer if you skip it?"

Solution Using a layered framework decision tree

# Layer 1 — Ingest (Airflow PythonOperator + Great Expectations)
- task_id: ingest_orders_dq
  python_callable: run_ge_checkpoint
  op_kwargs:
    checkpoint_name: raw_orders_smoke
  retries: 0  # fail fast on bad ingest

# Layer 2 — Transform (dbt build runs dbt tests)
- task_id: dbt_build
  bash_command: dbt build --select staging.+ marts.+

# Layer 3 — Mart freshness + row-count anomaly (Soda Core)
- task_id: soda_scan_mart
  bash_command: soda scan -d snowflake checks/marts.yml

Step-by-step trace.

Layer	Framework	Failure mode if skipped
Ingest	Great Expectations	Bad rows enter the warehouse; downstream metrics silently wrong
Transform	dbt tests	NULL leaks across JOIN; marts compute on incomplete data
Mart	Soda Core	Row-count anomalies / freshness lag go undetected until BI tab refreshes

Each framework owns the layer where it plugs in cleanest. Skip a layer and the failure mode pictured above is what bites you next quarter.

Output:

Layer	Test count	Run-time budget	Blast radius if it fails
Ingest GE	12 expectations	~30s	Single batch quarantined
Transform dbt	50 tests	~2 min	Single model halts; downstream gated
Mart Soda	8 checks	~20s	Single dataset flagged; downstream still runs but alerts fire

Why this works — concept by concept:

Layer-per-framework — each framework wins at the layer it was designed for. Forcing GE into the warehouse (or dbt into ingest) costs more than the integration debt of running all three.
Fail-fast at ingest — GE Checkpoint fails the Airflow task; downstream dbt run never starts; nothing bad lands in the warehouse.
Gating transform with dbt build — dbt build runs tests between models, so a failed staging test halts the downstream marts in a single command.
Soda for mart freshness + anomaly — declarative freshness < 1d and anomaly score clauses outperform what dbt tests express natively, without writing a custom test.
Alert routing per layer — ingest fails go to the data-platform on-call; transform fails go to the analytics-engineer on-call; mart anomalies go to the analytics-engineer and the metric owner.
Cost — three framework runs add ~3 minutes to the DAG. The savings: catching the same bad row in BI takes hours, in exec review takes days. Net positive every quarter.

SQL
Topic — validation
Validation problems (SQL)

Practice →

3. Expectations vs tests vs checks — the same idea, three vocabularies

Once you align the three vocabularies on the same row of a matrix, every DQ migration is a syntax exercise — not a semantic one

The mental model in one line: an Expectation, a generic test, and a SodaCL check are isomorphic — same input, same output, same failure semantics, only the names differ. The hard part of DQ work was never the vocabulary; it is the coverage strategy and the alert routing. Once you internalise the synonym table, the migration cost between frameworks falls to a few hours per dataset.

The three vocabularies aligned.

Concept	Great Expectations	dbt	Soda
One assertion	Expectation	generic test invocation	Check
Bundle for a dataset	Expectation Suite	YAML `tests:` block under a model	Checks file (`checks for <dataset>:`)
One execution	Checkpoint	`dbt test` / `dbt build`	Scan
Reusable assertion logic	Custom Expectation class	Custom generic test macro	User-defined metric (SQL)
One-off assertion	Custom Expectation	Singular test (single SQL file)	Inline SQL inside a check
Severity	`action_list` per result	`severity: warn` or `error`	`warn:` / `fail:` clauses
Result artefact	Data Docs + ValidationResults	`run_results.json` + manifest	Scan results (CLI / Cloud / warehouse table)

Common assertions that exist verbatim in all three.

not_null. GE: expect_column_values_to_not_be_null. dbt: not_null. Soda: missing_count(col) = 0.
unique. GE: expect_column_values_to_be_unique. dbt: unique. Soda: duplicate_count(col) = 0.
accepted_values. GE: expect_column_values_to_be_in_set. dbt: accepted_values. Soda: invalid_count(col) = 0 with valid values: clause.
foreign_key / relationships. GE: expect_column_values_to_be_in_set with a query. dbt: relationships. Soda: failed rows query with a join.
row_count. GE: expect_table_row_count_to_be_between. dbt: dbt_utils.expression_is_true or singular test. Soda: row_count > 0.
freshness. GE: expect_column_max_to_be_between on a timestamp. dbt: freshness block in source config. Soda: freshness(updated_at) < 1d.

Custom logic — three styles.

GE custom expectation. Subclass ColumnExpectation or BatchExpectation in Python. Ships with the package; reusable across suites.
dbt custom generic test. A Jinja macro that returns failing rows. Lives in tests/generic/ and is invoked from YAML by name.
SodaCL user-defined metric. Inline SQL inside a metric_name: block; reusable inside the same checks file or imported.

Severity, thresholds, warn-vs-fail.

GE. Each expectation result is success: bool. Severity is configured per Checkpoint via action_list, which can route a "warning" to one Slack channel and a "failure" to another.
dbt. Built-in severity: warn / severity: error. Per-test threshold via error_if / warn_if (e.g., error_if: ">= 10" means "fail if 10+ failing rows").
Soda. Inline warn: and fail: clauses per check. Multi-level threshold by writing two checks with different bounds.

Common interview probes on vocabulary.

"What is the difference between an Expectation and a Suite in GE?" — an Expectation is one assertion; a Suite bundles them for a dataset and is reusable across batches.
"What does dbt build do that dbt test does not?" — dbt build interleaves model materialisation and test execution per node, so a failed test halts downstream models. dbt test runs tests independently after all models.
"Is Soda Core the same as Soda Cloud?" — Soda Core is the OSS engine (CLI + Python library). Soda Cloud is the paid dashboard / Slack-alerts layer that consumes Soda Core results. You can run Soda Core forever without Soda Cloud.
"What is a singular test in dbt vs a generic test?" — singular is one SQL file in tests/; generic is a Jinja macro in tests/generic/ invoked by name from YAML. Singulars are bespoke; generics are reusable.

Worked example — `orders.amount > 0 AND amount IS NOT NULL` in all three frameworks

Detailed explanation. The "amount must be positive and non-NULL" check is the canonical first assertion teams write. Spelling it in all three frameworks side by side surfaces the syntax differences and makes the migration delta concrete.

Question. Write the assertion in (a) a GE Expectation Suite, (b) a dbt YAML schema block, and (c) a Soda Core checks file. For each, show how to add a severity (warn-vs-fail) so the on-call only pages on >10 failing rows.

Input.

order_id	amount
1	50
2	-10
3	NULL
4	25
5	0

Code.

# GE — Expectation Suite with action-list severity
suite = context.add_expectation_suite("orders_amount")
validator = context.get_validator(
    batch_request=orders_batch_request, expectation_suite=suite
)
validator.expect_column_values_to_not_be_null("amount")
validator.expect_column_values_to_be_between(
    "amount", min_value=0, strict_min=True,
    mostly=0.99,  # 99% must pass — under that, fail
)
validator.save_expectation_suite()
# Checkpoint routes: <0.95 -> page; <0.99 -> Slack warn

# dbt — schema.yml with severity and threshold
models:
  - name: orders
    columns:
      - name: amount
        tests:
          - not_null:
              severity: error
              error_if: ">= 10"
              warn_if: ">= 1"
          - dbt_utils.expression_is_true:
              expression: "> 0"
              severity: warn

# Soda — checks file with warn / fail thresholds
checks for orders:
  - missing_count(amount) = 0:
      warn: when between 1 and 9
      fail: when > 9
  - min(amount) > 0:
      warn: when min(amount) <= 0

Step-by-step explanation.

Each framework expresses "0% missing, all positive" with two assertions. The first asserts non-NULL; the second asserts positivity.
GE's mostly=0.99 lets the check pass when 99%+ of rows satisfy — the closest GE primitive to "warn at 1 failing row, fail at 10."
dbt's error_if / warn_if is the cleanest per-test threshold expression — semantically "fail if 10 or more failing rows, warn if 1 or more."
Soda's warn: when between 1 and 9 / fail: when > 9 is the most declarative — the threshold reads like English.
All three correctly classify the input: rows 2, 3, 5 are failing (negative, NULL, zero). With 3 failing rows out of 5, the dbt test would warn (≥1) but not error (<10); Soda would do the same; GE with mostly=0.99 would fail because 60% pass < 99%.

Output.

Framework	Failing rows	Severity classification
GE (mostly 0.99)	3 of 5	Fail (success rate 40% < 99%)
dbt (warn ≥1, error ≥10)	3	Warn
Soda (warn 1-9, fail >9)	3	Warn

Rule of thumb. When migrating between frameworks, the vocabulary is a clean translation but the severity semantics require careful matching. GE thinks in success-rate percentages; dbt and Soda think in failing-row counts. Match the threshold by computing the equivalent.

Worked example — a custom assertion (regex-validate email)

Detailed explanation. A frequent custom-logic check: "every value in users.email matches a basic email regex." None of the three frameworks ship this as a built-in (regex is fast-moving and dialect-specific), so each one's custom-assertion mechanism is exercised.

Question. Write a regex-validation assertion for users.email in (a) a GE custom expectation, (b) a dbt custom generic test, and (c) a SodaCL user-defined metric.

Input.

user_id	email
1	alice@x.com
2	not-an-email
3	bob@y.org
4	NULL

Code.

# GE — using built-in expect_column_values_to_match_regex
validator.expect_column_values_to_match_regex(
    "email",
    r"^[^@\s]+@[^@\s]+\.[^@\s]+$",
    mostly=1.0,
)

-- dbt — tests/generic/test_matches_regex.sql
{% test matches_regex(model, column_name, pattern) %}
SELECT *
FROM {{ model }}
WHERE {{ column_name }} IS NOT NULL
  AND NOT regexp_like({{ column_name }}, '{{ pattern }}')
{% endtest %}

-- Invoked in schema.yml
models:
  - name: users
    columns:
      - name: email
        tests:
          - matches_regex:
              pattern: "^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$"

# Soda — user-defined metric with valid regex clause
checks for users:
  - invalid_count(email) = 0:
      valid regex: '^[^@\s]+@[^@\s]+\.[^@\s]+$'

Step-by-step explanation.

GE has expect_column_values_to_match_regex built in — the simplest of the three. Pass the regex and mostly threshold; done.
dbt does not ship regex tests, so a custom generic test wraps regexp_like in a Jinja macro. The macro returns failing rows (the dbt contract for any test); dbt counts them and applies severity.
Soda Core ships valid regex as a clause on invalid_count — almost as concise as GE. The check fails when any non-NULL email fails the regex.
All three skip NULLs by convention — regex validation is a content check, not a presence check. Pair with missing_count = 0 if NULLs should also fail.
For the input, all three flag user_id=2 (not-an-email). NULL user_id=4 is skipped.

Output.

Framework	Failing rows	Notes
GE	1	Built-in expectation, smallest LOC
dbt	1	Custom generic test, reusable across columns
Soda	1	`valid regex` clause built-in

Rule of thumb. For common custom logic (regex, range with units, foreign-key with filter), check if the framework already ships it before writing a custom assertion. For bespoke custom logic (cross-table invariants, time-bounded distributions), the custom mechanism is unavoidable — write it once, reuse forever.

SQL interview question on vocabulary alignment

A senior interviewer might ask: "Your team is consolidating a Soda Core checks library into a dbt project for a warehouse-only audit. Walk me through how you'd migrate row_count > 0, missing_count(col) = 0, and freshness(col) < 1d from SodaCL to dbt YAML — and what semantic gaps to flag for the team."

Solution Using a SodaCL-to-dbt migration table + a dbt_expectations bridge

# Original SodaCL checks
checks for daily_revenue:
  - row_count > 0
  - missing_count(revenue) = 0
  - freshness(day) < 1d
  - duplicate_count(day) = 0

# Migrated dbt schema.yml + dbt_utils + dbt_expectations
sources:
  - name: marts
    tables:
      - name: daily_revenue
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}
        loaded_at_field: day

models:
  - name: daily_revenue
    tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1
    columns:
      - name: revenue
        tests:
          - not_null
      - name: day
        tests:
          - unique

Step-by-step trace.

SodaCL clause	dbt equivalent	Semantic gap
`row_count > 0`	`dbt_expectations.expect_table_row_count_to_be_between` (min_value=1)	None
`missing_count(revenue) = 0`	`not_null` test on column	None
`freshness(day) < 1d`	`sources.freshness` block, `error_after: 24h`	dbt source freshness runs separately via `dbt source freshness`, not inline in `dbt build`
`duplicate_count(day) = 0`	`unique` test on column	None

The migration is direct for three of four clauses. The freshness check moves from inline (Soda) to a separate dbt source freshness command (dbt) — that is the semantic gap to flag in the migration plan.

Output:

Aspect	SodaCL (before)	dbt (after)
Lines of YAML	4	14
Single-command run	`soda scan ...`	needs both `dbt build` and `dbt source freshness`
Severity granularity	per-clause `warn`/`fail`	per-test `severity: warn`/`error`
Anomaly checks (built-in)	`anomaly score` clause	needs Elementary or singular test

Why this works — concept by concept:

Vocabulary isomorphism — missing_count = 0 maps directly to not_null; duplicate_count = 0 maps to unique; the semantics are identical.
Freshness is split in dbt — source freshness lives in the source config (dbt source freshness), not in dbt build. The migration must add a second command to the orchestrator.
dbt_expectations bridges the row-count gap — the built-in dbt test library covers the column-level surface; dbt_expectations adds the table-level checks Soda has natively.
Severity translation is direct — Soda's warn: / fail: maps to dbt's severity: warn / severity: error with warn_if / error_if thresholds.
Anomaly is the residue — SodaCL's anomaly score and change clauses do not migrate cleanly to dbt. Layer Elementary on top, or accept that anomaly detection moves out of the DQ framework and into a separate observability tool.
Cost — same per-row cost (every framework compiles to similar SQL). The migration cost is a one-time YAML rewrite plus one extra command in the DAG for freshness.

SQL
Topic — data validation
Data validation problems (SQL)

Practice →

4. Where they plug into the pipeline

Placement (pre-ingest vs in-transform vs post-load) decides what you can quarantine — and that decision is bigger than the framework choice

The mental model in one line: GE plugs in via Python operators (Airflow / Dagster / Prefect), dbt tests run as part of dbt build in the warehouse, and Soda scans run as a CLI step anywhere in the DAG. Each placement carries a different blast radius when a check fails — fail-fast at ingest vs warn-only at the mart — and that contract is what the on-call cares about most.

Three placement archetypes.

Pre-ingest. Assertion runs before the data lands in the warehouse — usually inside an Airflow / Dagster task that pulls a file or API response. Fail-fast: the bad batch is quarantined and never enters the table.
In-transform. Assertion runs between transform steps — dbt build interleaves models and tests, so a failed test halts the downstream models in the same DAG.
Post-load. Assertion runs after the data is in the table, often on a schedule independent of the producing pipeline. Cannot fail-fast — the bad data is already there — but is the only option for cross-pipeline checks like freshness and anomaly.

Great Expectations — Checkpoints in Airflow / Dagster / Prefect.

# Airflow — GreatExpectationsOperator
from airflow_provider_great_expectations.operators.great_expectations import (
    GreatExpectationsOperator,
)

ingest_dq = GreatExpectationsOperator(
    task_id="ingest_orders_dq",
    data_context_root_dir="/opt/ge",
    checkpoint_name="raw_orders_smoke",
    fail_task_on_validation_failure=True,
)

land_to_warehouse = SnowflakeOperator(
    task_id="land_to_warehouse", sql="COPY INTO ...",
)

ingest_dq >> land_to_warehouse  # land only if DQ passes

The Checkpoint binds a Suite to a batch, runs it, ships results to Data Docs, and (optionally) raises an exception when validation fails — which the operator translates into a failed task and a paged on-call.

dbt tests — dbt build interleaving.

# Single command runs every model + test in DAG order
dbt build --select staging.+ marts.+

# Equivalent to (conceptually):
#   for each model in topological order:
#     run model
#     run tests configured on this model
#     if any test fails: halt downstream

The interleaving is dbt build's key feature. dbt run (models only) plus a later dbt test (tests only) does not halt downstream models on a test failure — the difference matters and shows up on senior interviews.

Soda Core — CLI step or Soda Cloud agent.

# CLI step inside Airflow BashOperator
soda scan -d snowflake \
  -c configuration.yml \
  checks/marts.yml

# Or, the Soda Cloud agent runs on a managed schedule
# without needing the DAG to call it explicitly.

The CLI version drops into any orchestrator that can run a shell command. The Soda Cloud agent is the managed alternative — useful when you want a centralised dashboard and Slack routing without writing Airflow tasks per dataset.

Failure semantics — three modes.

Fail the run. The task / model / scan fails; the DAG halts; on-call is paged. Use at ingest (GE) and inside dbt build.
Warn-only. Log the failure to Data Docs / run_results.json / Soda Cloud; do not halt the DAG. Use for known-noisy checks (anomaly thresholds, soft enums) until the noise is calibrated.
Quarantine-and-continue. Bad rows are written to a side table; good rows continue to the main table; the DAG does not fail. Use when partial data is better than no data (e.g., reporting on a partial batch is acceptable; missing the batch entirely is not).

Result stores.

GE → Data Docs. Auto-generated HTML report site. Hosts on S3 / GCS / a local FS; rebuilt after every Checkpoint.
dbt → run_results.json + manifest. Machine-readable artefacts; consumed by Elementary, dbt Cloud, or custom dashboards.
Soda → file / warehouse table / Soda Cloud. Most flexible: CLI emits JSON / writes to a warehouse table you specify; Soda Cloud (paid) ingests into a managed dashboard.

Alerting payloads.

GE. Action list per Checkpoint; built-in Slack action, EmailAction, OpsgenieAction. Payload includes a link to the Data Docs page.
dbt. run_results.json is the source; downstream tools (Elementary, dbt Cloud, custom) translate to Slack / PagerDuty.
Soda. Native Slack / Webhook / Soda Cloud integration. Payload includes the failing check, failing row count, and a link to the Cloud dashboard.

Worked example — GE Checkpoint inside an Airflow DAG that fails fast on ingest

Detailed explanation. The textbook ingest pattern: pull → DQ → land. A GreatExpectationsOperator fires between the pull and the land; on failure, the batch is quarantined and the load is skipped. This is the only pattern that guarantees the bad row never reaches the warehouse.

Question. Write an Airflow DAG with three tasks: pull a daily orders dump, run a GE Checkpoint, and land it into Snowflake. The DQ task must halt the DAG on failure.

Input.

Task	Purpose
pull_orders	Download the API dump to S3
dq_orders	Run `raw_orders_smoke` Checkpoint
land_to_warehouse	COPY INTO Snowflake

Code.

from datetime import datetime
from airflow import DAG
from airflow.providers.amazon.aws.operators.s3 import S3CreateObjectOperator
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator
from airflow_provider_great_expectations.operators.great_expectations import (
    GreatExpectationsOperator,
)

with DAG(
    "orders_ingest",
    start_date=datetime(2026, 6, 15),
    schedule="@daily",
    catchup=False,
) as dag:
    pull_orders = S3CreateObjectOperator(
        task_id="pull_orders",
        s3_bucket="ingest",
        s3_key="orders/{{ ds }}.parquet",
        data="<<api response>>",
    )

    dq_orders = GreatExpectationsOperator(
        task_id="dq_orders",
        data_context_root_dir="/opt/ge",
        checkpoint_name="raw_orders_smoke",
        fail_task_on_validation_failure=True,
    )

    land_to_warehouse = SnowflakeOperator(
        task_id="land_to_warehouse",
        sql="""
            COPY INTO raw.orders
            FROM @ingest_stage/orders/
            PATTERN = '.*{{ ds }}\\.parquet';
        """,
    )

    pull_orders >> dq_orders >> land_to_warehouse

Step-by-step explanation.

The DAG runs daily. pull_orders writes the API response to S3.
dq_orders reads the same Parquet file and runs the raw_orders_smoke Checkpoint. The Checkpoint is configured to assert not_null(customer_id), accepted_values(status), and row_count > 100.
If the Checkpoint succeeds, land_to_warehouse runs and the bad row never had the chance to reach the warehouse.
If the Checkpoint fails, fail_task_on_validation_failure=True raises an exception in the operator; Airflow marks the task as failed; land_to_warehouse is skipped; the on-call is paged via the standard Airflow on-failure callback.
The batch sits in S3 untouched — replaying after a fix is a simple manual re-trigger of land_to_warehouse after the upstream issue is resolved.

Output.

Day	dq_orders status	land_to_warehouse status
2026-06-14	passed	succeeded
2026-06-15	failed (bad enum)	skipped
2026-06-16	passed (after upstream fix)	succeeded

Rule of thumb. The "pull → DQ → land" pattern is the default for ingest DAGs. Add it once per source; every future bad batch is caught before it touches the warehouse.

Worked example — `dbt build` halts downstream models on a failed test

Detailed explanation. The reason senior DEs prefer dbt build over dbt run && dbt test is that build interleaves models and tests in DAG order — a failed test on the staging table halts the downstream marts in the same command. Pre-build, the same pipeline would build the mart on broken staging data and only fail at the final dbt test.

Question. Given stg_orders → int_orders_enriched → mart_daily_revenue, write a dbt build command and explain what happens when stg_orders.not_null(customer_id) fails.

Input.

Model	Dependency
stg_orders	raw.orders
int_orders_enriched	stg_orders + dim_customers
mart_daily_revenue	int_orders_enriched

Code.

# dbt build runs every node in DAG order, with tests interleaved
dbt build --select +mart_daily_revenue

# schema.yml beside stg_orders
models:
  - name: stg_orders
    columns:
      - name: customer_id
        tests:
          - not_null:
              severity: error

Step-by-step explanation.

dbt build materialises stg_orders first.
Immediately after materialisation, it runs the tests configured on stg_orders — including not_null(customer_id).
If not_null(customer_id) fails (severity: error), dbt marks the test as failed and halts downstream — int_orders_enriched and mart_daily_revenue are skipped.
The downstream models retain their previous values (the prior day's data), which is usually safer than overwriting with a half-broken build.
The on-call receives a failure notification from dbt Cloud / Airflow / Elementary with the failing test name and the failing row count.

Output.

Node	dbt build status
stg_orders	materialised + test failed
int_orders_enriched	skipped (upstream test failed)
mart_daily_revenue	skipped (upstream test failed)

Rule of thumb. Always run dbt build, never dbt run followed by dbt test. The interleaved semantics is the entire reason build exists — and it is the only way to halt downstream materialisation on a failed test in a single command.

SQL interview question on alert routing

A senior interviewer might ask: "You have 200 DQ tests across GE, dbt, and Soda. The team is drowning in Slack alerts. Design the routing policy so the on-call only pages on actionable failures."

Solution Using a tiered severity + routing policy

# Tier 1 — critical (page on-call)
# Run: at ingest, in dbt build, on every Soda scan
# Severity: error / fail
# Route: PagerDuty + #data-oncall

# Tier 2 — important (Slack-only)
# Run: in dbt build, in Soda scan
# Severity: warn (single failing row tolerated)
# Route: #data-quality channel

# Tier 3 — informational (dashboard-only)
# Run: in Soda anomaly scan, in Elementary anomaly job
# Severity: anomaly_score < 0.5
# Route: Soda Cloud / Elementary dashboard, no Slack

# Example dbt config
models:
  - name: stg_orders
    columns:
      - name: customer_id
        tests:
          - not_null:           # tier 1
              severity: error
          - dbt_expectations.expect_column_value_lengths_to_be_between:  # tier 2
              min_value: 1
              max_value: 64
              severity: warn

Step-by-step trace.

Tier	Trigger	Route	Expected daily volume
1 — critical	`not_null`, `unique`, schema	PagerDuty + Slack #data-oncall	0–1 per day
2 — important	enum, range, FK	Slack #data-quality	1–5 per day
3 — informational	anomaly, freshness drift	Dashboard	dozens per day, no human triage

The policy is enforced at the framework level (severity: clauses) plus the routing config (PagerDuty integration / Slack webhooks).

Output:

Channel	Daily volume	On-call paged?
PagerDuty	0–1	yes (critical only)
#data-oncall	0–1	yes (mirrors PagerDuty)
#data-quality	1–5	no — async triage
Soda Cloud dashboard	dozens	no — review weekly

Why this works — concept by concept:

Tiered severity — every check has a tier baked in. Tier 1 is the contract; tier 2 is the smell; tier 3 is the signal. Each tier routes to a different channel with different expected response times.
Page only on contract violations — the on-call sleeps unless a column the business depends on is broken. Everything else waits for business hours.
Dashboard for anomalies — anomaly scores are noisy by nature; they belong on a dashboard reviewed weekly, not in a Slack channel that paged everyone last Tuesday.
Severity at the framework level — severity: error in dbt, fail: clauses in Soda, action_list in GE — the framework decides whether to escalate, not a custom Python receiver.
Audit the routing quarterly — every quarter, count alerts per channel. If #data-quality fired 200 times last quarter, the threshold is wrong or the tests are stale.
Cost — zero additional pipeline cost; pure config / routing change. The cost was already paid the day the tests were written.

SQL
Topic — data validation
Data validation problems (SQL)

Practice →

5. Coverage strategy — what to test where, without drowning

80% of outages are caught by `not_null + unique + freshness` on contract columns — the rest is incremental insurance

The mental model in one line: tier datasets by blast radius (gold = consumed by exec dashboards, silver = consumed by analysts, bronze = raw ingestion), and increase test density per tier. A flat "test everything equally" strategy either bankrupts the warehouse budget or drowns the team in noise — neither failure mode survives a quarterly review.

The 80/20 rule of DQ.

not_null on every contract column. The column the business named in the data contract gets a not_null test, full stop.
unique on every primary key. Including surrogate keys, natural keys, and grain-defining composite keys.
freshness on every table consumed by anything that runs on a schedule. A dashboard that refreshes daily needs a freshness check at ingest and at the mart.
accepted_values on every enum. Status, type, category, country code — any column with a fixed domain.
relationships (FK) on every join key. Cheap to write; catches half of "the join lost 12% of the rows" tickets.

Tiered datasets by blast radius.

Tier	Examples	Test density	Run frequency
Gold	exec dashboards, regulatory reports, revenue metrics	100% contract coverage + anomaly + distribution	every run
Silver	analyst-facing marts, business-team reports	not_null + unique + accepted_values + freshness	every run
Bronze	raw ingest, change-data-capture tables	row_count + freshness	every run

The pyramid logic: gold tier is narrow (fewer tables) but dense (many tests); bronze is wide (many tables) but sparse (few tests). The aggregate cost stays manageable while the contract layer gets the strictest scrutiny.

Schema contracts — three flavours.

dbt contracts. Built-in to dbt 1.5+. Declare the contract in YAML; dbt enforces column names + types at build time. Native to the dbt build flow.
GE schema expectations. expect_table_columns_to_match_ordered_list + expect_column_values_to_be_of_type. Asserts at runtime; integrates with Data Docs.
SodaCL schema checks. schema: clause with fail: when wrong column type / fail: when forbidden column present. Most ergonomic in YAML.

Anomaly checks layered on top of deterministic rules.

The deterministic layer (not_null, unique, FK) catches new breakage. The anomaly layer (row-count drift, distribution shift, freshness lag) catches gradual breakage. Together they cover both the cliff-edge failures and the slow-burn failures. Layer order matters: deterministic first, anomaly second — never the other way around.

Cost of running checks.

Warehouse credits. Every dbt test compiles to a SELECT; every Soda check is a SELECT; GE checks on Spark / SQL also compile to SELECT. Tests on huge tables can dominate the build budget. Mitigate with sampled tests on bronze and full tests on gold.
Run-time budget. A 200-test dbt run that doubles the build time blows the SLA. Profile tests with dbt --debug and prune the slowest.
False-positive fatigue. A test that fires once a week without being actionable trains the team to ignore it — and then a real failure goes unseen. Audit fired-but-ignored tests quarterly.

When to combine frameworks.

dbt tests for the transform layer. Lowest friction; YAML is already there.
GE Checkpoints at ingest + critical anomaly. Python-native, custom expectations for cross-table invariants.
Soda scans for cross-source freshness + multi-warehouse audits. SodaCL is the only declarative answer for "test the same metric across Snowflake and Postgres."

Migration patterns.

Start on dbt tests. Lowest cost of entry; covers warehouse layer end-to-end.
Layer Soda for freshness + anomaly. Once dbt covers the contract, Soda adds the cross-cutting checks dbt does not express natively.
Add GE at ingest. When the team has a bad-batch story to tell (e.g., "API silently changed an enum"), GE pays for itself in one quarter.
Optional: Elementary on top of dbt. Adds anomaly detection and a dashboard without leaving the dbt project.

Worked example — coverage matrix for a 12-table mart

Detailed explanation. A team has 12 mart tables: 2 gold, 4 silver, 6 bronze. They want a coverage matrix that fits inside a 5-minute test budget. Building the matrix surfaces the trade-offs concretely.

Question. Design a coverage matrix for the 12 tables. Specify which tests run on which tier and estimate the test count.

Input.

Table	Tier	Row count
mart_executive_revenue	gold	10M
mart_executive_users	gold	50M
mart_finance_invoices	silver	5M
mart_finance_payments	silver	8M
mart_marketing_attribution	silver	100M
mart_marketing_funnel	silver	200M
stg_orders	bronze	500M
stg_payments	bronze	1B
stg_users	bronze	50M
raw_orders	bronze	500M
raw_payments	bronze	1B
raw_users	bronze	50M

Code.

# Gold tier — dense
models:
  - name: mart_executive_revenue
    tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1
    columns:
      - name: day      { tests: [unique, not_null] }
      - name: revenue  { tests: [not_null] }
      - name: region   { tests: [not_null, accepted_values: { values: ['EU','US','APAC'] }] }

# Silver tier — moderate
  - name: mart_finance_invoices
    columns:
      - name: invoice_id  { tests: [unique, not_null] }
      - name: amount      { tests: [not_null] }

# Bronze tier — minimal (run as Soda freshness only)

# Soda — bronze tier freshness
checks for stg_orders:
  - freshness(created_at) < 6h
checks for raw_orders:
  - freshness(loaded_at) < 6h

Step-by-step explanation.

Gold tables get full coverage: row_count, not_null on every contract column, unique on PKs, accepted_values on enums. Estimated 8 tests per gold table → 16 tests.
Silver tables get the essentials: not_null on contract columns, unique on PKs. Estimated 4 tests per silver table → 16 tests.
Bronze tables get a single freshness check each (via Soda, not dbt). Estimated 1 test per bronze table → 6 tests.
Total: 38 tests across the 12 tables. Fits comfortably in 5 minutes of warehouse time on a Snowflake medium warehouse.
If the test budget grows, add anomaly checks to the gold tier first (highest blast radius). Add range / FK tests to silver next. Bronze tier stays minimal.

Output.

Tier	Tables	Tests per table	Total tests	Approx run time
Gold	2	8	16	~2 min
Silver	4	4	16	~2 min
Bronze	6	1	6	~30s
Total	12	—	38	~4.5 min

Rule of thumb. Estimate test budget before writing tests. If the matrix exceeds the SLA, tier down (gold loses anomaly, silver loses FK). If the matrix fits with room, tier up (silver gains anomaly, bronze gains row_count).

Worked example — quarantine pattern with Soda + dbt

Detailed explanation. A team needs partial-data continuity: if 1% of rows fail the DQ check, the report should still ship on the 99% of good rows, with the bad rows quarantined to a side table for investigation. This is the "quarantine-and-continue" pattern, and it requires the assertion to live inside the transform, not as a separate test step.

Question. Write a dbt model that materialises only the rows that pass amount > 0 AND status IN (...), and writes the failing rows to a _quarantine side table. Show the Soda check that monitors the quarantine table size.

Input.

order_id	amount	status
1	50	placed
2	-10	placed
3	25	unknown
4	0	paid
5	75	paid

Code.

-- models/marts/orders_clean.sql
{{ config(materialized='incremental') }}

WITH passed AS (
  SELECT *
  FROM {{ ref('stg_orders') }}
  WHERE amount > 0
    AND status IN ('placed','paid','shipped','cancelled')
)

SELECT * FROM passed

-- models/marts/orders_quarantine.sql
{{ config(materialized='incremental') }}

SELECT *,
       CURRENT_TIMESTAMP AS quarantined_at,
       CASE
         WHEN amount IS NULL OR amount <= 0 THEN 'bad_amount'
         WHEN status NOT IN ('placed','paid','shipped','cancelled') THEN 'bad_status'
       END AS quarantine_reason
FROM {{ ref('stg_orders') }}
WHERE amount IS NULL
   OR amount <= 0
   OR status NOT IN ('placed','paid','shipped','cancelled')

# Soda — monitor quarantine table size
checks for orders_quarantine:
  - row_count < 1000:
      warn: when between 100 and 999
      fail: when >= 1000

Step-by-step explanation.

orders_clean filters the passing rows. The downstream marts read from this clean table — the bad rows never propagate.
orders_quarantine writes the failing rows to a side table with a quarantine_reason column. The investigation team can query this table directly to triage upstream bugs.
The Soda check on orders_quarantine monitors the size of the quarantine. Below 100: silent. 100–999: warn. ≥1000: fail (something systemic is happening upstream).
The DAG does not fail unless the quarantine size crosses the fail threshold — partial data continues to flow, the reporting layer is happy, the on-call only wakes when the failure rate is genuinely abnormal.
The trade-off: downstream metrics computed on orders_clean exclude the bad rows. Be explicit about this in the metric's definition ("excluding orders with invalid amount or status").

Output.

Table	Rows	Downstream consumers
orders_clean	3 (ids 1, 5, plus reclassifications)	marts, BI
orders_quarantine	3 (ids 2, 3, 4 with reasons)	DQ on-call investigation

Rule of thumb. Use quarantine-and-continue when partial data is materially better than no data — most reporting use cases qualify. Use fail-the-run only when partial data is worse than no data, e.g., financial reconciliation where a missing transaction is worse than a delayed report.

SQL interview question on combining frameworks

A senior interviewer might frame this as: "You have a Postgres source, a Snowflake warehouse, and a dbt project. You need DQ coverage from raw ingestion to the executive dashboard. How would you split the work across GE, dbt tests, and Soda Core — and why?"

Solution Using a three-framework layered architecture

# Layer 1 — Pre-ingest (Great Expectations)
# Where: Airflow Python task before Snowflake COPY
# What: not_null on PKs, accepted_values on enums, row_count
# Why: fail-fast quarantine; bad batch never reaches Snowflake

# Layer 2 — In-transform (dbt tests + dbt_expectations)
# Where: dbt build, schema.yml beside each model
# What: not_null, unique, accepted_values, relationships, FK
# Why: gates downstream model materialisation; YAML lives next to the model

# Layer 3 — Post-load freshness + anomaly (Soda Core)
# Where: Soda scan after dbt build completes
# What: freshness, row_count anomaly, distribution drift, cross-source
# Why: declarative anomaly clauses; cross-source checks dbt cannot express

Step-by-step trace.

Layer	Framework	Test count (illustrative)	When it runs	Blast radius if it fires
1	GE	12	Airflow ingest task	Quarantine batch, skip Snowflake load
2	dbt	60	`dbt build`	Halt downstream models in DAG
3	Soda	20	Post-build scan	Alert mart owner; no downstream halt

Output:

Concern	Owned by
"Did the right data land?"	GE (ingest)
"Did the transform preserve the contract?"	dbt (in-build)
"Is the mart fresh and within bounds?"	Soda (post-load)
"Is the dashboard right?"	All three combined

Why this works — concept by concept:

Layered framework choice — each framework owns the layer where it plugs in cleanest. No single framework spans all three layers without significant friction.
Fail-fast at ingest, gate at transform, alert at mart — the failure mode scales down by layer. Ingest = block; transform = halt downstream; mart = alert and inspect.
YAML proliferation is okay — both dbt and Soda are YAML-first; engineers learn one syntax (declarative DQ) once and apply it twice. The duplication cost is lower than the integration cost of forcing one framework to do all three jobs.
**Soda Cloud is optional__ — the Soda layer works on Soda Core alone (free OSS). Adding Soda Cloud later gives a managed dashboard without rewriting any check.
Severity per layer — ingest = fail, transform = fail (severity:error) or warn (severity:warn), mart = warn / dashboard. Routing matches the response time the team can sustain.
Cost — three frameworks add maybe 5 minutes to the daily DAG. Compared to the cost of a silent week-long outage, it is the cheapest insurance the team will ever buy.

SQL
Topic — validation
Validation problems (SQL)

Practice →

SQL
Topic — null handling
NULL handling problems (SQL)

Practice →

Cheat sheet — DQ framework recipes

"Test inside the warehouse, beside the model." dbt generic + singular tests. YAML lives next to schema.yml; dbt build interleaves models and tests; failed test halts downstream.
"Python-heavy stack with Airflow / Dagster / Prefect." Great Expectations Checkpoints inside an operator. Pull → DQ → land is the textbook pattern; fail_task_on_validation_failure=True quarantines bad batches.
"YAML-first, multi-source, want a cloud dashboard." Soda Core (free) + optional Soda Cloud (paid). SodaCL compiles to source-dialect SQL; CLI drops into any DAG.
"DataFrame validation in code." pandera for pandas / Polars / PySpark schemas; Deequ / PyDeequ for Spark. Apply at the boundary of every script.
"Schema contract at the edge." dbt contracts (in-warehouse, declared in YAML) or GE schema expectations (expect_table_columns_to_match_ordered_list) or SodaCL schema: clause.
"Anomaly + freshness without writing rules." Layer Soda Core (native anomaly score + change clauses) or Elementary (on top of dbt) or a vendor (Monte Carlo, Bigeye) on top of the deterministic test layer.
"Quarantine bad rows, continue with good ones." Split the model into _clean (WHERE pass) and _quarantine (WHERE fail) materialisations inside dbt. Monitor quarantine size with a Soda check.
"Warn on 1 failing row, page on 10+." dbt: severity: error + error_if: ">= 10" + warn_if: ">= 1". Soda: warn: when between 1 and 9 / fail: when > 9. GE: mostly threshold + Checkpoint action_list routing.
"Cross-source DQ (compare Postgres to Snowflake)." SodaCL — only one of the three frameworks that expresses cross-source declaratively. GE / dbt require glue code.
"Backfill DQ over the last 30 days after fixing the upstream bug." GE: re-run the Checkpoint with a date-range BatchRequest. dbt: dbt test --select stg_orders --vars '{"start_date":"2026-05-15"}'. Soda: parametrise the checks file with a date variable.
"DQ in a notebook for ad hoc inspection." GE — the Python API loads a Suite, runs against a DataFrame, returns a result object you can render inline. The only one of the three that is notebook-ergonomic.

Frequently asked questions

Should I use Great Expectations or dbt tests if I already use dbt?

Use dbt tests for the warehouse-native transform layer — the YAML is already there, dbt build interleaves models and tests, and failed tests halt downstream in a single command. Reach for Great Expectations when you need to assert before the data lands in the warehouse (ingest checks inside Airflow / Dagster), when you need rich custom expectations that go beyond what dbt_expectations covers, or when Data Docs is the human-readable report your stakeholders want. Most mature teams run both: dbt tests own the transform; GE owns ingest + critical anomaly. They do not compete — they cover different layers.

What does Soda Core add that dbt tests don't?

Three things. First, native cross-source checks — SodaCL can run the same check against Postgres and Snowflake without rewriting the SQL, which dbt cannot do declaratively. Second, native anomaly detection — anomaly score for row_count and change for row_count > -10% are built-in clauses, whereas dbt needs Elementary or a singular test. Third, ergonomic freshness — freshness(updated_at) < 1d is a one-line check, whereas dbt source freshness lives in a separate config and runs via a separate command. Adopt Soda Core when your stack is multi-source or when anomaly / freshness coverage is the gap.

Are Great Expectations Checkpoints overkill for small teams?

For a team of one to three engineers running a single dbt project, yes — GE adds setup ceremony (Data Context, Suite, Checkpoint, action_list) that pays off only when you have multiple datasets, multiple environments, or stakeholders who consume Data Docs. Start with dbt tests; add Soda Core when you need freshness or anomaly; bring in GE when you have a bad-batch story (e.g., the API changed an enum and broke the warehouse) and you need fail-fast ingest assertions. The migration cost from "no GE" to "GE at ingest" is a sprint — small teams should defer that sprint until the value is proven.

Can I run Soda Core without Soda Cloud?

Yes — Soda Core is the OSS engine and runs entirely from the CLI or as a Python library. You write a checks.yml, run soda scan -d <warehouse> checks.yml, and the engine prints results to stdout / writes JSON to a file / inserts into a warehouse result table. Soda Cloud (paid) is the managed dashboard + Slack-alert layer that consumes Soda Core scan results, but it is strictly optional. Many teams run Soda Core forever and route results to their own dashboard (Grafana, Metabase, a custom dbt-rendered HTML page) — no licence fee, no vendor dependency.

How do I avoid alert fatigue from data quality tests?

Tier every check by severity before writing it. Tier 1 (contract: not_null on PKs, unique on grain keys, accepted_values on enums) pages the on-call via PagerDuty. Tier 2 (smell: range checks, FK relationships, length bounds) goes to a Slack channel with no paging. Tier 3 (anomaly: row-count drift, freshness lag) goes to a dashboard reviewed weekly. Audit the alert volumes quarterly — if Tier 2 fired 200 times last quarter and nothing was actionable, either the threshold is wrong or the tests are stale. The most expensive class of alert is the one that fires repeatedly without ever being acted on — it trains the team to ignore the next real failure.

Where do dbt contracts fit relative to all three frameworks?

dbt contracts (introduced in dbt 1.5) declare the expected column names + types of a model in YAML, and dbt enforces them at build time — a column rename or type change breaks dbt build before the model ships. Contracts complement but do not replace the assertion frameworks. Use contracts for structural guarantees (the schema is what it claims to be); use dbt tests for content guarantees (the values are not NULL, are unique, are in the enum). GE and Soda offer their own schema-assertion mechanisms (expect_table_columns_to_match_ordered_list, SodaCL schema: clause) which serve the same role for non-dbt or pre-warehouse layers. Adopt contracts inside dbt; mirror the same schema assertion at the ingest layer with GE or Soda for end-to-end structural coverage.

Practice on PipeCode

Drill the data validation practice library → for the not_null / unique / accepted_values / FK family of probes.
Rehearse on validation problems → for schema and range assertions across SQL dialects.
Stack the NULL handling drills → for the IS NULL / NOT IN / DISTINCT FROM family that every DQ engineer needs cold.
Layer the aggregation library → for the COUNT(*) vs COUNT(col) family that audits use.
Stack the joins practice library → for the LEFT JOIN + COALESCE patterns the quarantine-and-continue model needs.
For the broader surface, read top data engineering interview questions →.
Stack the prerequisites with the only 5 skills you need to become a data engineer →.
Sharpen the warehouse axis with the SQL for data engineering interviews course →.
For long-form pipeline craft, work through ETL system design for DE interviews →.

Pipecode.ai is Leetcode for Data Engineering — every DQ recipe above ships with hands-on practice rooms where you write the GE Suite, the dbt YAML, and the SodaCL check against real graded inputs. PipeCode pairs every reading with 450+ DE-focused problems and a real-time scoring engine, so you never have to wonder whether your `not_null + unique + freshness` coverage actually behaves the same on Snowflake as on Postgres.

Practice data validation now →
Validation drills →

1. Why pick a data quality framework at all

Silent data corruption is the most expensive class of pipeline bug — and BI dashboards are the worst place to catch it

Worked example — three places NULL can break a dashboard and the four assertion families that catch each one

Worked example — the cost of not having a DQ framework

SQL interview question on detecting a silent NULL leak

Solution Using a layered audit + retroactive DQ assertion plan

2. The DQ framework landscape in 2026

Five open-source frameworks dominate — pick by team shape and stack, not by feature count

Worked example — the same assertion in three frameworks

Worked example — distribution / anomaly checks where the frameworks diverge

SQL interview question on choosing a DQ framework

Solution Using a layered framework decision tree

3. Expectations vs tests vs checks — the same idea, three vocabularies

Once you align the three vocabularies on the same row of a matrix, every DQ migration is a syntax exercise — not a semantic one

Worked example — orders.amount > 0 AND amount IS NOT NULL in all three frameworks

Worked example — a custom assertion (regex-validate email)

SQL interview question on vocabulary alignment

Solution Using a SodaCL-to-dbt migration table + a dbt_expectations bridge

4. Where they plug into the pipeline

Placement (pre-ingest vs in-transform vs post-load) decides what you can quarantine — and that decision is bigger than the framework choice

Worked example — GE Checkpoint inside an Airflow DAG that fails fast on ingest

Worked example — dbt build halts downstream models on a failed test

SQL interview question on alert routing

Solution Using a tiered severity + routing policy

5. Coverage strategy — what to test where, without drowning

80% of outages are caught by not_null + unique + freshness on contract columns — the rest is incremental insurance

Worked example — coverage matrix for a 12-table mart

Worked example — quarantine pattern with Soda + dbt

SQL interview question on combining frameworks

Solution Using a three-framework layered architecture

Cheat sheet — DQ framework recipes

Frequently asked questions

Should I use Great Expectations or dbt tests if I already use dbt?

What does Soda Core add that dbt tests don't?

Are Great Expectations Checkpoints overkill for small teams?

Can I run Soda Core without Soda Cloud?

How do I avoid alert fatigue from data quality tests?

Where do dbt contracts fit relative to all three frameworks?

Practice on PipeCode

Worked example — `orders.amount > 0 AND amount IS NOT NULL` in all three frameworks

Worked example — `dbt build` halts downstream models on a failed test

80% of outages are caught by `not_null + unique + freshness` on contract columns — the rest is incremental insurance