DEV Community

Cover image for dbt Model Contracts, Constraints & Versioning: Production Patterns
Gowtham Potureddi
Gowtham Potureddi

Posted on

dbt Model Contracts, Constraints & Versioning: Production Patterns

dbt model contracts are the single biggest reason teams stopped breaking dashboards on Mondays. Before dbt 1.5 the only thing standing between a renamed column and a Tuesday-morning incident was a tribal Slack ping; after 1.5 a contract.enforced block fails the PR in CI before the rename ever lands. The shape of your warehouse — the column names, the data types, the not-null promises — is now a first-class artefact your repo owns.

This guide walks the dbt contracts + dbt constraints + dbt model versions triple end to end: where each one fits, how the dbt-Core 1.5+ feature timeline lined them up, and the dbt production patterns that make contract enforcement, schema evolution, and dbt versioning survive contact with a multi-team analytics org. Each section ships a worked example with code, a step-by-step trace, an output, and a concept-by-concept Why-this-works.

PipeCode blog header for a dbt model contracts tutorial — bold white headline 'dbt Model Contracts' with subtitle 'constraints · versions · production patterns' and a stylised contract-scroll diagram with version badges on a dark gradient and a small pipecode.ai attribution.

When you want hands-on reps alongside the reading, drill the data modelling practice library →, rehearse on dimensional modelling problems →, and tighten the schema-evolution muscles with slowly-changing-data drills →.


On this page


1. Why dbt models need contracts in production

Contracts catch the kind of bug dbt tests cannot — the interface bug, not the value bug

The one-sentence invariant: dbt tests guarantee that the rows in a model are correct; dbt model contracts guarantee that the shape of the model itself is correct — the columns it exposes, the types of those columns, and the nullability promises downstream consumers depend on. Once you internalise that "tests are about values, contracts are about interfaces," the whole production-hardening surface starts to make sense.

The three places interface bugs hide.

  • Silent column renames. Someone renames customer_email to email_address in stg_customers.sql. Every test still passes (the new column has the same values), every dashboard breaks at midnight when it tries to read the old name. No PR reviewer caught it because the column was added and the old one was removed in the same commit — the diff just looked like "edited a SELECT clause."
  • Data type drift. A staging model exposed order_total as numeric(18,2). Someone refactors and the new SQL emits numeric(38,18). The dashboard still works in dev (Postgres is loose about precision), then a Tableau live connection on Redshift fails on the first row because the consumer expected the old precision.
  • Nullability flips. dim_customer.signup_at was always non-null because the upstream model filtered out incomplete rows. A refactor removes the filter for performance. Now signup_at is sometimes NULL — downstream reverse-ETL crashes on the first NULL it sees.

The dbt-Core 1.5+ feature timeline.

  • dbt 1.5 (April 2023) shipped model contracts (contract.enforced: true) and constraints (the four kinds: not_null, unique, primary_key, foreign_key, plus check). This is the moment dbt projects gained a way to declare the public shape of a model and have the build fail if the shape drifts.
  • dbt 1.5 also shipped model versions — the versions: block, latest_version, deprecation_date, and ref('model', v=1) cross-version references. Together with contracts these three features form the "stable interface" toolkit.
  • dbt 1.6+ (July 2023 onwards) added access: modifiers (private, protected, public) and groups — so a model can be marked private to a single group of authors and ref() from outside that group fails to compile.
  • dbt 1.7+ (Q4 2023 onwards) added the unit testing framework — orthogonal to contracts but synergistic, because unit tests assert the rows that a contracted model produces.

Where contracts fit between tests, constraints, observability.

  • dbt tests. Run after the model materialises; they re-query the table and assert row-level facts (unique, not_null, accepted_values, custom singular tests). They are row-shaped assertions.
  • dbt contracts. Run before the model materialises; they assert that the SELECT's projected columns match the declared columns: block in YAML — names, types, and constraints. They are interface-shaped assertions that fail fast in PR CI.
  • dbt constraints. Translate the YAML declaration into DDL where the warehouse supports it; otherwise they remain informational metadata. They are contract reinforcement — when paired with a warehouse that enforces them, they fail the load instead of poisoning a downstream join.
  • Data observability platforms (Monte Carlo, Bigeye, Lightup). Detect drift in production after the fact — useful, but reactive. Contracts make the same drift a PR-time failure, which is two orders of magnitude cheaper to fix.

The 2026 reality.

  • Contracts are now table-stakes for public models. Any model ref()-ed from outside its owning group, exported to reverse-ETL, or surfaced in BI should have contract.enforced: true.
  • Constraints are warehouse-dependent. Postgres and Redshift (mostly) enforce them; Snowflake and BigQuery treat most as informational. dbt translates declarations to DDL in both cases, but the runtime behaviour differs.
  • Versions are how dbt does SemVer. Breaking changes get a version bump (v2, v3); non-breaking additions stay on the same version. deprecation_date and latest_version give you a 30–90 day overlap window to migrate consumers.

Worked example — the silent column rename that broke Monday

Detailed explanation. A weekend refactor of dim_customer renames signup_at to signed_up_at. Every dbt test passes (the values are unchanged). On Monday, three Looker tiles, a HubSpot reverse-ETL sync, and a Snowflake share to a partner all fail. Total time-to-detect: 14 hours. Total cost: 11 stakeholder threads and one apology email.

Question. Show the dbt YAML diff for adding contract.enforced: true to dim_customer and demonstrate how the same rename would fail in CI instead.

Input — current models/marts/customer/dim_customer.yml.

field value
name dim_customer
materialized table
columns customer_id, signup_at, email
tests unique on customer_id, not_null on signup_at

Code.

# models/marts/customer/dim_customer.yml
version: 2

models:
  - name: dim_customer
    config:
      materialized: table
      contract:
        enforced: true        # <- the upgrade
    columns:
      - name: customer_id
        data_type: bigint
        constraints:
          - type: not_null
          - type: primary_key
      - name: signup_at        # <- the contract anchor
        data_type: timestamp
        constraints:
          - type: not_null
      - name: email
        data_type: varchar
Enter fullscreen mode Exit fullscreen mode
-- models/marts/customer/dim_customer.sql AFTER the rename
SELECT
    customer_id,
    signed_up_at,             -- renamed from signup_at
    email
FROM {{ ref('stg_customer') }};
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. dbt 1.5+ compiles dim_customer.sql against the YAML contract. It runs the SELECT once in a transaction (or as a dry-run on warehouses that support it) and inspects the returned column metadata.
  2. The contract declares signup_at as a column. The SELECT returns signed_up_at instead. dbt diffs the two sets and emits a contract violation.
  3. The CI job — dbt build --select state:modified+ — fails. The PR cannot be merged. The "Monday morning incident" became a "Friday afternoon code-review comment."
  4. The author either rolls back the rename (cheap) or coordinates a versioning bump (dim_customer_v2) so consumers can migrate on their own schedule.

Output.

Compilation Error in model dim_customer
  This model has an enforced contract that failed.
  Please ensure the name, data_type, and number of columns in your contract
  match the columns in your model's definition.

  | column_name      | definition_type | contract_type | mismatch_reason     |
  | ---------------- | --------------- | ------------- | ------------------- |
  | signed_up_at     | TIMESTAMP       |               | missing in contract |
  | signup_at        |                 | TIMESTAMP     | missing in definition|
Enter fullscreen mode Exit fullscreen mode

Rule of thumb. Every model that is ref()-ed from outside its group, or that has any non-dbt consumer (BI, reverse-ETL, share), should carry contract.enforced: true. The cost is a one-time YAML block; the saving is every "why did the dashboard explode?" incident you never have to write a postmortem for.

Worked example — tests catch a value bug, contracts catch an interface bug

Detailed explanation. A common confusion: "I already have a not_null test on this column — why do I also need a contract?" Tests run after the model loads and re-query the warehouse. They catch the column being NULL today. Contracts encode the promise that the column exists, has a name, has a type, and may have a not-null constraint — and they fail the build before the model materialises.

Question. A staging model stg_orders accidentally drops the order_id column in a refactor. Compare what happens with only dbt tests vs with a contract.

Input — the broken refactor.

-- BEFORE refactor (correct)
SELECT
    order_id,
    customer_id,
    amount
FROM {{ source('raw', 'orders') }};

-- AFTER refactor (accidentally drops order_id)
SELECT
    customer_id,
    amount
FROM {{ source('raw', 'orders') }};
Enter fullscreen mode Exit fullscreen mode

Code — tests-only YAML.

version: 2
models:
  - name: stg_orders
    columns:
      - name: order_id
        tests: [unique, not_null]
      - name: customer_id
        tests: [not_null]
Enter fullscreen mode Exit fullscreen mode

Code — tests + contract YAML.

version: 2
models:
  - name: stg_orders
    config:
      contract:
        enforced: true
    columns:
      - name: order_id
        data_type: bigint
        constraints: [{ type: not_null }, { type: unique }]
      - name: customer_id
        data_type: bigint
        constraints: [{ type: not_null }]
      - name: amount
        data_type: numeric
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. Tests only. dbt build runs the broken SELECT. The model materialises successfully (it just has two columns now). Then dbt tries to test order_id — and gets a "column does not exist" error from the warehouse. The test "fails" but with a runtime database error, not a contract-style error. Worse: the table is already broken in the dev schema by the time the test runs.
  2. Tests + contract. dbt build compiles the model against the contract before running it. The contract declares three columns; the SELECT only projects two. The compile fails with a clear contract-violation message naming the missing column. Nothing materialises; nothing breaks.
  3. The contract catches the bug two phases earlier in the dbt graph (compile, not test) and emits a domain-specific error ("contract violation: missing column order_id") instead of a warehouse error.

Output.

Strategy Detected at Error type Side effects
Tests only After build, during test warehouse "column not found" broken table left in dev schema
Tests + contract At compile, before build dbt "contract violation" nothing materialised

Rule of thumb. Tests are still essential — they catch value drift (NULLs creeping in, a unique key suddenly duplicating). Contracts catch interface drift (columns disappearing, types changing). You want both. Think "belt + braces," not "either/or."

Worked example — contracts on incremental models

Detailed explanation. A common worry: "does contract.enforced work with incremental materialisation?" Yes, with one caveat: dbt enforces the contract on every full-refresh build and on the schema check at the start of every incremental run. The incremental delta INSERT must produce the contracted column set, or the run fails.

Question. Show the YAML for a contracted incremental fact model fct_orders and explain when contract enforcement runs.

Code.

version: 2
models:
  - name: fct_orders
    config:
      materialized: incremental
      unique_key: order_id
      on_schema_change: fail   # belt-and-braces: explicit schema-change policy
      contract:
        enforced: true
    columns:
      - name: order_id
        data_type: bigint
        constraints: [{ type: not_null }, { type: primary_key }]
      - name: customer_id
        data_type: bigint
        constraints:
          - type: not_null
          - type: foreign_key
            expression: "{{ ref('dim_customer') }} (customer_id)"
      - name: order_ts
        data_type: timestamp
        constraints: [{ type: not_null }]
      - name: amount
        data_type: numeric
        constraints: [{ type: not_null }]
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. Full refresh. dbt build --full-refresh --select fct_orders runs the SELECT, validates the projected columns against the contract, then drops-and-recreates the table with the declared DDL (including constraints, where supported). The contract is checked once, decisively.
  2. Incremental run. dbt build --select fct_orders (no --full-refresh) inspects the existing target table and compares its column set to the contract. If they match, dbt runs the incremental delta SELECT, validates its projection against the contract, then INSERTs / MERGEs into the target.
  3. on_schema_change: fail is critical when contracts are on. Without it, dbt's default incremental behaviour might append a new column silently — which would still pass the contract check (the new column is in both the SELECT and the table) but would drift the contract's declared shape over time. Fail-on-change keeps the table strictly in sync with the YAML.

Output. A contracted incremental model behaves like a frozen interface from the consumer's perspective. The table at version N exposes exactly the columns in the contract, with exactly the declared types, on every load — and any drift in the SQL that would change that shape is caught before INSERT.

Rule of thumb. Set on_schema_change: fail whenever contract.enforced: true is on for an incremental model. The two flags compose to give you "the table never changes shape without a YAML edit" — which is exactly what your downstream consumers want.

dbt interview question on the contracts vs tests axis

A senior interviewer often probes: "Walk me through the difference between a dbt test and a dbt contract. Give me one scenario where a contract catches a bug that tests cannot, and one where a test catches a bug that contracts cannot."

Solution Using the contracts-tests matrix

# models/marts/customer/dim_customer.yml
version: 2
models:
  - name: dim_customer
    config:
      materialized: table
      contract:
        enforced: true
    columns:
      - name: customer_id
        data_type: bigint
        constraints: [{ type: not_null }, { type: primary_key }]
        tests: [unique, not_null]
      - name: email
        data_type: varchar
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: "email like '%@%'"
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

Bug scenario Tests-only outcome Contract-only outcome Both outcome
customer_id renamed to cust_id in SQL runtime warehouse error during test PR fails at compile PR fails at compile
customer_id type changed bigintstring tests still pass (values unique, non-null) PR fails at compile PR fails at compile
email column suddenly contains NULLs not_null test fails post-build contract still passes (column exists) not_null test fails post-build
email column missing the @ expression test fails post-build contract still passes expression test fails post-build

The matrix surfaces the orthogonality crisply: contracts catch shape changes (rename, type drift, missing column); tests catch value changes (NULL appearing where it shouldn't, a unique key duplicating, a format violation). Neither subsumes the other.

Output:

Bug class Catch with Catches before Detection cost
Renamed column contract model materialises low (compile-time)
Type drift contract model materialises low (compile-time)
NULL creeping in tests downstream consumer medium (post-build)
Format violation tests downstream consumer medium (post-build)

Why this works — concept by concept:

  • contract.enforced as a compile-time gate — runs before any DDL is issued. dbt compiles the SELECT, inspects the projected columns via the warehouse's metadata (or a dry-run plan), and diffs them against the YAML. Mismatches abort the build.
  • dbt tests as a post-build sentinel — run after the model materialises. They re-query the table and assert row-level facts. Cheap to write, but they catch issues after the broken table exists in dev / CI.
  • The two are orthogonal axes — contracts cover the columns-types-nullability axis, tests cover the values-and-relationships axis. Mature projects use both, with the contract as the first line of defence and the tests as the audit layer.
  • on_schema_change as the third leg — for incremental models, the contract pins the current shape; on_schema_change: fail ensures the shape cannot drift silently between contract edits. Without it, the table can grow extra columns invisibly.
  • Cost — contracts add O(columns) compile-time work per build (negligible); tests add one SELECT per test per build. Both are dominated by the actual model build time on any realistic dataset.

Data modeling
Topic — design
Design problems (data modeling)

Practice →


2. Anatomy of a dbt model contract

contract.enforced: true plus a filled-out columns block is the entire vocabulary — but every field has a precise job

The mental model in one line: a dbt contract is a YAML declaration that names every column, its data type, its constraints, and its description — and contract.enforced: true makes dbt verify the SELECT matches that declaration before the model is allowed to materialise. The block is small; the semantics are precise; the failure mode is "build aborts," not "warning printed and continues."

Exploded-view diagram of a dbt contract card — a parent rounded card labelled 'contract.enforced' with four child sub-cards floating around it labelled columns, data_type, constraints, description — each child has a tiny illustrative icon, on a light PipeCode card.

The five mandatory pieces.

  • config.contract.enforced: true — the master switch. Without it, the rest of the YAML is documentation. With it, dbt diffs the SELECT against the columns block at compile time and aborts on mismatch.
  • columns: - name: ... — every column the model projects must appear in the columns block, by name, in any order. Extra YAML columns not in the SELECT, or extra SELECT columns not in YAML, both fail the contract.
  • data_type: — the warehouse-canonical type (bigint, varchar, numeric(18,2), timestamp, boolean). dbt normalises common synonyms (int8bigint on Postgres) but it pays to use the exact word the warehouse echoes.
  • constraints: — a list of constraint declarations. Each has a type: (not_null, unique, primary_key, foreign_key, check) and optional fields (name:, expression:, columns: for composite, warn_unenforced: / warn_unsupported:).
  • description: — free-form prose; surfaced in dbt docs and the catalog. Not strictly enforced but is the single best place to document the semantic intent of the column for downstream consumers.

Compile-time vs run-time enforcement.

  • Compile-time (the default). dbt asks the warehouse to plan the SELECT without running it, inspects the projected columns from the plan metadata, and diffs them against the contract. Cheap and fast — milliseconds per model. Fails the PR in CI before any data moves.
  • Run-time. On warehouses that enforce constraints (Postgres, Redshift for some, Databricks Unity Catalog), the CREATE TABLE statement carries the constraints as actual DDL. Inserting a NULL into a not_null column raises a database error at write time. This is in addition to the compile-time contract check, not instead of it.
  • The handshake. Contracts give you the compile-time interface guarantee; constraints (on enforcing warehouses) give you the run-time value guarantee. They overlap on names like not_null but cover different failure modes.

A "shape" assertion, not a "value" assertion.

  • The contract checks that order_id is declared as bigint and the SELECT produces a bigint column called order_id. It does not check that any particular row's order_id is non-null.
  • Adding constraints: [{ type: not_null }] to the contract is the bridge — it asks dbt to also attempt warehouse-level enforcement of "no NULL values in this column." On Postgres that becomes a NOT NULL DDL clause. On Snowflake it becomes informational metadata (the warehouse does not enforce).
  • For the value-level audit you still want tests: [not_null] — that runs a SELECT COUNT(*) FROM t WHERE col IS NULL after the build and asserts zero.

Interactions with materialisation.

  • materialized: table — full DDL re-created on each build. Constraints are emitted as part of the CREATE TABLE. Contracts checked at compile.
  • materialized: view — view definition checked at compile. Constraints in the YAML are documentation only because most warehouses do not attach constraints to views.
  • materialized: incremental — full DDL on --full-refresh; incremental INSERT / MERGE on normal runs. Contracts checked on every run (compile-time). Combine with on_schema_change: fail.
  • materialized: ephemeral — no DDL; the model is inlined as a CTE in consumers. Contracts cannot apply (no projected table). dbt warns if you try.

Common interview probes on contract anatomy.

  • "What happens if the SELECT projects an extra column not in the contract?" — contract violation, build aborts.
  • "What happens if YAML declares an extra column not in the SELECT?" — same — contract violation.
  • "Is column order part of the contract?" — no. dbt diffs the set of columns, not the ordering.
  • "Does the contract validate types end-to-end?" — yes, but the matching is dialect-aware (int and bigint are not interchangeable; numeric and numeric(18,2) can differ depending on warehouse).

Worked example — turning an unmodelled dim_customer into a public contracted model

Detailed explanation. The team has been treating dim_customer as "internal" for a year. As of this quarter, the marketing-ops team wants to ref() it from a new mart, and reverse-ETL is going to sync it to HubSpot. That makes it public by every definition that matters. Time to ship a contract.

Question. Promote dim_customer.sql from an unmodelled table to a contracted, constrained, public-ready model. Show the YAML diff and explain each line.

Input — current YAML.

version: 2
models:
  - name: dim_customer
Enter fullscreen mode Exit fullscreen mode

Code — promoted YAML.

version: 2
models:
  - name: dim_customer
    description: >
      One row per customer. Source of truth for downstream marts, BI tiles,
      and reverse-ETL syncs to HubSpot. Schema is public — bump the version
      for any breaking change.
    config:
      materialized: table
      contract:
        enforced: true
      access: public
      group: customer
    columns:
      - name: customer_id
        data_type: bigint
        description: "Surrogate key; stable across all loads."
        constraints:
          - type: not_null
          - type: primary_key
        tests: [unique, not_null]
      - name: email
        data_type: varchar
        description: "Primary contact email; lowercased, trimmed."
        constraints:
          - type: not_null
          - type: unique
          - type: check
            expression: "email like '%@%.%'"
        tests:
          - not_null
          - unique
      - name: signup_at
        data_type: timestamp
        description: "First successful account creation (UTC)."
        constraints: [{ type: not_null }]
        tests: [not_null]
      - name: tier
        data_type: varchar
        description: "Loyalty tier  one of {bronze, silver, gold}."
        constraints:
          - type: check
            expression: "tier in ('bronze','silver','gold')"
        tests:
          - accepted_values:
              values: [bronze, silver, gold]
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. description: is now mandatory in spirit — it is the first thing a consumer reads in dbt docs. Keep it short, concrete, and oriented toward consumers, not authors.
  2. materialized: table makes the constraint DDL meaningful. On Postgres the table will be created with customer_id bigint PRIMARY KEY NOT NULL, email varchar UNIQUE NOT NULL, ....
  3. contract.enforced: true is the master switch. The first time you dbt build this model, the SELECT must already project exactly {customer_id, email, signup_at, tier} with matching types — otherwise the build fails.
  4. access: public and group: customer declare the visibility of the model. Combined with the contract, this is dbt's full "public API" pattern: a ref('dim_customer') from any other group will be allowed; from within the same group it is free. A private model can ignore most of this YAML.
  5. Each column has both contract constraints: and dbt tests:. The constraints are compile-time + DDL-time guards (the warehouse may enforce them); the tests are post-build value audits. The redundancy is the point — belt and braces.

Output. First build on Postgres emits:

CREATE TABLE analytics.dim_customer (
    customer_id bigint NOT NULL,
    email       varchar NOT NULL,
    signup_at   timestamp NOT NULL,
    tier        varchar,
    PRIMARY KEY (customer_id),
    UNIQUE (email),
    CHECK (email like '%@%.%'),
    CHECK (tier in ('bronze','silver','gold'))
);
Enter fullscreen mode Exit fullscreen mode

On Snowflake the same DDL is emitted but most constraints land as informational metadata (visible in INFORMATION_SCHEMA but not enforced on INSERT). dbt then runs the tests post-build and asserts the value-level facts.

Rule of thumb. When you promote a model to public, ship the whole anatomy in one PR: description, contract.enforced: true, access: public, group:, full columns: block with types + constraints + descriptions, and matching tests. Splitting it across multiple PRs is how teams end up with partially-contracted models that look safe in the catalog but skip half the enforcement.

Worked example — what dbt does to the warehouse on first build

Detailed explanation. Knowing the exact CREATE TABLE that dbt emits per warehouse is the difference between "I trust the contract" and "I checked what landed." Each warehouse translates the YAML differently, and the gaps are the source of most "I thought my FK was enforced" surprises.

Question. Given the contracted dim_customer from above, write out the literal CREATE TABLE statements dbt emits on Postgres, Snowflake, BigQuery, and Redshift.

Code — Postgres (full enforcement).

CREATE TABLE analytics.dim_customer (
    customer_id bigint NOT NULL,
    email       varchar NOT NULL,
    signup_at   timestamp NOT NULL,
    tier        varchar,
    CONSTRAINT dim_customer_pk PRIMARY KEY (customer_id),
    CONSTRAINT dim_customer_email_uk UNIQUE (email),
    CONSTRAINT dim_customer_email_chk CHECK (email like '%@%.%'),
    CONSTRAINT dim_customer_tier_chk  CHECK (tier in ('bronze','silver','gold'))
);
Enter fullscreen mode Exit fullscreen mode

Code — Snowflake (mostly informational).

CREATE OR REPLACE TABLE analytics.dim_customer (
    customer_id bigint NOT NULL,
    email       varchar NOT NULL,
    signup_at   timestamp_ntz NOT NULL,
    tier        varchar,
    CONSTRAINT dim_customer_pk PRIMARY KEY (customer_id) NOT ENFORCED,
    CONSTRAINT dim_customer_email_uk UNIQUE (email) NOT ENFORCED
);
-- CHECK and FK in Snowflake are not supported / informational; dbt logs a warning
Enter fullscreen mode Exit fullscreen mode

Code — BigQuery (only NOT NULL + primary-key metadata).

CREATE OR REPLACE TABLE `proj.analytics.dim_customer` (
    customer_id INT64 NOT NULL,
    email       STRING NOT NULL,
    signup_at   TIMESTAMP NOT NULL,
    tier        STRING,
    PRIMARY KEY (customer_id) NOT ENFORCED
);
-- UNIQUE / CHECK not supported as DDL; dbt logs a warning
Enter fullscreen mode Exit fullscreen mode

Code — Redshift (NOT NULL enforced, others informational).

CREATE TABLE analytics.dim_customer (
    customer_id bigint NOT NULL,
    email       varchar NOT NULL,
    signup_at   timestamp NOT NULL,
    tier        varchar,
    PRIMARY KEY (customer_id),   -- informational only
    UNIQUE (email)               -- informational only
);
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. Postgres is the only of the four to enforce every declared constraint at the database level. Inserting a NULL into signup_at, a duplicate email, or an invalid tier value all raise an error at write time.
  2. Snowflake enforces NOT NULL and that is it. PRIMARY KEY and UNIQUE are declared as NOT ENFORCED for documentation / catalog / query-planner-hint purposes. CHECK and FOREIGN KEY are not supported as DDL at all — dbt logs a warning and drops them.
  3. BigQuery enforces NOT NULL. As of recent versions it supports PRIMARY KEY ... NOT ENFORCED and FOREIGN KEY ... NOT ENFORCED for query-planner hints only. UNIQUE and CHECK are not supported.
  4. Redshift enforces NOT NULL. PRIMARY KEY, UNIQUE, and FOREIGN KEY are accepted syntactically but are informational only (the optimizer may use them as hints; insertions are not blocked).
  5. The contract itself is a compile-time guarantee on all four — dbt diffs the SELECT against the YAML regardless of warehouse. The runtime enforcement is what differs.

Output.

Warehouse NOT NULL UNIQUE PRIMARY KEY FOREIGN KEY CHECK
Postgres enforced enforced enforced enforced enforced
Snowflake enforced informational informational not supported not supported
BigQuery enforced not supported informational informational not supported
Redshift enforced informational informational informational not supported

Rule of thumb. Always pair contracted constraints with matching dbt tests. The constraint is the warehouse-side aspiration; the test is the actual audit. On enforcing warehouses (Postgres) you may consider the test redundant — but the moment your project becomes multi-warehouse, the tests are the only thing that keeps the behaviour identical.

Worked example — a contracted view (and why most teams use tables)

Detailed explanation. Contracts on materialized: view are compile-time only — the column projection of the view's SELECT is diffed against the YAML, but no DDL constraints are attached (views in most warehouses cannot carry constraints). This is sometimes a deal-breaker; more often it is the correct choice for cheap, derived models.

Question. Show a contracted view for vw_active_customers (filters dim_customer to non-deleted rows) and explain what the contract does and does not guarantee.

Code.

version: 2
models:
  - name: vw_active_customers
    config:
      materialized: view
      contract:
        enforced: true
    columns:
      - name: customer_id
        data_type: bigint
        constraints: [{ type: not_null }]
      - name: email
        data_type: varchar
        constraints: [{ type: not_null }]
      - name: tier
        data_type: varchar
Enter fullscreen mode Exit fullscreen mode
-- models/marts/customer/vw_active_customers.sql
SELECT
    customer_id,
    email,
    tier
FROM {{ ref('dim_customer') }}
WHERE deleted_at IS NULL;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. dbt compiles the view, inspects the SELECT's projection, and diffs against the YAML — same as for a table.
  2. dbt then issues CREATE OR REPLACE VIEW analytics.vw_active_customers AS SELECT customer_id, email, tier FROM analytics.dim_customer WHERE deleted_at IS NULL;. No constraints attach.
  3. The contract guarantees: at compile time, the SELECT projects exactly {customer_id, email, tier} with matching types. After deploy, queries against the view always see those three columns with those types.
  4. The contract does not guarantee NULL-safety at the warehouse level. If dim_customer.email happens to contain a NULL row that passes the deleted_at IS NULL filter, the view will return it. The contract only documents the intent; you still need a test (tests: [not_null]) to audit value-level facts.

Output. The view materialises as a stable interface. Downstream consumers can rely on the column set; they cannot rely on the constraints being enforced at write time (because the view does not write — it reads from a base table). All value-level promises must come from tests on the base table or on the view itself.

Rule of thumb. Contracted views are great for "cheap, stable façades" — filter-only or projection-only models that wrap a public table. The moment you need actual constraint enforcement, switch to materialized: table. The cost is one storage copy; the benefit is real DDL guarantees.

dbt interview question on contract anatomy

A senior interviewer often probes: "I gave you a model that is ref()-ed by five downstream marts and a reverse-ETL sync. Walk me through the minimum YAML I should ship to make it contract-safe, and explain what each field defends against."

Solution Using the full public-model pattern

version: 2
models:
  - name: dim_product
    description: >
      One row per product. Public — every breaking schema change ships
      as a new version (v2, v3) with a 60-day deprecation window.
    config:
      materialized: table
      contract:
        enforced: true
      access: public
      group: product
    columns:
      - name: product_id
        data_type: bigint
        description: "Stable surrogate key."
        constraints: [{ type: not_null }, { type: primary_key }]
        tests: [unique, not_null]
      - name: sku
        data_type: varchar
        description: "Vendor SKU  uppercase, no whitespace."
        constraints: [{ type: not_null }, { type: unique }]
        tests: [unique, not_null]
      - name: category_id
        data_type: bigint
        description: "FK to dim_category.category_id."
        constraints:
          - type: not_null
          - type: foreign_key
            expression: "{{ ref('dim_category') }} (category_id)"
        tests:
          - not_null
          - relationships:
              to: ref('dim_category')
              field: category_id
      - name: price
        data_type: numeric(18,2)
        description: "List price in USD."
        constraints:
          - type: not_null
          - type: check
            expression: "price >= 0"
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: "price >= 0"
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

YAML field What it defends against Catches at
contract.enforced: true renamed / removed / retyped columns in SQL compile
access: public + group: accidental ref() from outside the owning group on private models compile
data_type: on every column type drift (bigintint, numeric(18,2)numeric(38,18)) compile
not_null constraint NULL insertion (Postgres / Redshift / Snowflake / BigQuery) run-time
primary_key constraint duplicate keys (Postgres only); query-plan hint elsewhere run-time / planner
foreign_key constraint orphan rows (Postgres only); query-plan hint elsewhere run-time / planner
check constraint invalid values (Postgres only); informational elsewhere run-time
tests: block actual value drift in production after build post-build

Output:

Layer Guarantees Cost
Contract Column set, names, types Compile-time (ms per model)
Constraints (Postgres) NULL-safety, uniqueness, referential integrity, check DDL + insertion overhead
Constraints (Snowflake / BigQuery / Redshift) NULL-safety only; rest are catalog metadata Negligible
Tests Value-level audits One SELECT per test per build

Why this works — concept by concept:

  • contract.enforced as the interface lock — the YAML becomes the source of truth for "what columns does this model expose," and dbt fails any build that drifts from it. Consumers can refactor with confidence.
  • access: public + group — visibility metadata. Private models can be refactored freely within their group; public models are the ones that need versions when the shape changes. This is dbt's analog to "public API vs internal helper."
  • Constraints as the warehouse-side aspiration — the YAML declares the constraint; the warehouse may or may not enforce it. Either way, the declaration shows up in dbt docs and the catalog, making the intent discoverable.
  • Tests as the audit — every constraint should have a matching test, because the test runs identically on all warehouses. Tests are the dialect-independent way to guarantee value semantics.
  • Description as the consumer doc — surfaced in dbt docs and in IDE tooltips. Costs five seconds; saves the consumer from a Slack ping every time they want to know "is signup_at UTC or local?"
  • Cost — compile-time overhead is negligible (milliseconds per model). The biggest "cost" is the discipline to keep the YAML in sync with the SQL — which is exactly the discipline you want.

Data modeling
Topic — dimensional-modeling
Dimensional modeling problems (data modeling)

Practice →


3. Constraints — primary key, foreign key, not null, check

Five constraint kinds, five very different stories about whether the warehouse actually enforces them

The mental model in one line: dbt declares five constraint kinds in YAML; each warehouse picks a different subset to actually enforce at write time, and the rest live as informational metadata for the catalog and the query planner. Once you can name which constraints land as real DDL on your warehouse, the rest of the constraint conversation is about choosing where tests fill the gap.

Four-column comparison matrix listing the constraint kinds (not_null, unique, primary_key, foreign_key, check) along the rows and four warehouses (Postgres, Snowflake, BigQuery, Redshift) along the columns, with tick / informational / cross icons in each cell, on a light PipeCode card.

The five constraint kinds.

  • not_null — "no row of this column may be NULL." Every major warehouse enforces this at INSERT time. The cheapest, most universal, and most useful constraint.
  • unique — "no two rows share this value." Postgres enforces; Snowflake / BigQuery / Redshift declare informationally.
  • primary_key — "this column (or set) is the row identity." Implies not_null + unique. Postgres enforces both halves; the others treat as informational metadata that the query planner may consult.
  • foreign_key — "this column references a column in another table." Postgres enforces (subject to indexes); Snowflake / BigQuery declare informationally; Redshift declares informationally.
  • check — "this column satisfies a boolean expression." Postgres enforces. Snowflake / BigQuery / Redshift do not support CHECK as DDL — dbt logs a warning and skips.

Single-column vs composite constraints.

  • Single-column. Declare inside the column's constraints: list. Most natural for not_null / unique / primary_key.
  • Composite. Declare at the model level under model-level constraints:. Example: a composite primary key on (order_id, line_no). Each constraint declaration includes a columns: list naming the participating columns.

Informational vs enforced — the practical impact.

  • Enforced. The warehouse refuses INSERTs / MERGEs that would violate the constraint. Bugs surface at write time, often immediately, with a clear error from the database.
  • Informational. The constraint is recorded in the warehouse catalog but not checked at write time. The query planner may use it to rewrite joins (e.g. eliminate a DISTINCT when joining on a primary key). Bugs surface downstream, often hours or days later.
  • The practical rule. On informational warehouses (Snowflake / BigQuery / Redshift), the constraint is documentation + query-planner hint. You still need a matching dbt test to actually audit the values.

Constraint + test — belt and braces, not duplicated work.

  • The constraint tells the warehouse what the model promises. On enforcing warehouses, it is real. On informational ones, it is a hint.
  • The test tells dbt (and the CI / scheduler) to run a value-level audit after every build. It works identically on every warehouse and surfaces silent drift.
  • For mature projects, ship both. The constraint is the declaration of intent; the test is the verification.

Foreign-key gotchas in warehouses with no FK enforcement.

  • Snowflake allows FOREIGN KEY ... NOT ENFORCED syntactically (some versions). dbt emits the DDL where possible; otherwise warns and drops.
  • BigQuery supports FOREIGN KEY ... NOT ENFORCED for query-planner hints (since late 2023). The constraint is metadata only.
  • Redshift accepts FOREIGN KEY syntactically; the optimiser uses it as a join hint but does not enforce.
  • Postgres is the outlier — FKs are real, but they require an index on the referenced column (otherwise INSERT performance suffers).
  • The pragmatic FK pattern on non-Postgres warehouses. Declare the FK in YAML for documentation and catalog clarity, then add a dbt relationships test for the actual audit:
- name: customer_id
  constraints:
    - type: foreign_key
      expression: "{{ ref('dim_customer') }} (customer_id)"
  tests:
    - relationships:
        to: ref('dim_customer')
        field: customer_id
Enter fullscreen mode Exit fullscreen mode

Common interview probes on constraints.

  • "On Snowflake, does declaring a PRIMARY KEY actually prevent duplicates?" — no; it is informational. Add a unique test.
  • "What is the difference between primary_key and unique + not_null?" — semantically identical (PK = unique + not null); syntactically PK is one declaration, the catalog distinguishes them, and the query planner treats PK as "the canonical row identity."
  • "When would you skip declaring an FK?" — when the referenced table is enormous and the FK overhead would matter (rare in analytics warehouses; common in OLTP). In analytics, declare the FK informationally on every column that joins to a dimension.
  • "Why do constraints not duplicate tests?" — they cover different failure modes. Constraints prevent bad writes (where supported); tests audit existing data post-build. You need both.

Worked example — a contracted star-schema fact with FKs to dims

Detailed explanation. A fct_orders fact table joins to three dimensions: dim_customer, dim_product, dim_date. The contract declares an FK to each, plus a composite PK on (order_id, line_no) for the order-line grain, plus a check on quantity.

Question. Write the YAML for a contracted, constrained fct_orders model with three FKs and a composite PK.

Input — model SQL.

-- models/marts/sales/fct_orders.sql
SELECT
    order_id,
    line_no,
    customer_id,
    product_id,
    date_id,
    quantity,
    unit_price,
    quantity * unit_price AS line_amount
FROM {{ ref('int_orders') }};
Enter fullscreen mode Exit fullscreen mode

Code — YAML with composite PK and three FKs.

version: 2
models:
  - name: fct_orders
    description: "Fact table at the order-line grain."
    config:
      materialized: table
      contract:
        enforced: true
      access: public
      group: sales

    constraints:
      - type: primary_key
        columns: [order_id, line_no]
      - type: foreign_key
        columns: [customer_id]
        expression: "{{ ref('dim_customer') }} (customer_id)"
      - type: foreign_key
        columns: [product_id]
        expression: "{{ ref('dim_product') }} (product_id)"
      - type: foreign_key
        columns: [date_id]
        expression: "{{ ref('dim_date') }} (date_id)"

    columns:
      - name: order_id
        data_type: bigint
        constraints: [{ type: not_null }]
      - name: line_no
        data_type: integer
        constraints:
          - type: not_null
          - type: check
            expression: "line_no >= 1"
      - name: customer_id
        data_type: bigint
        constraints: [{ type: not_null }]
        tests:
          - relationships:
              to: ref('dim_customer')
              field: customer_id
      - name: product_id
        data_type: bigint
        constraints: [{ type: not_null }]
        tests:
          - relationships:
              to: ref('dim_product')
              field: product_id
      - name: date_id
        data_type: integer
        constraints: [{ type: not_null }]
      - name: quantity
        data_type: integer
        constraints:
          - type: not_null
          - type: check
            expression: "quantity > 0"
      - name: unit_price
        data_type: numeric(18,2)
        constraints:
          - type: not_null
          - type: check
            expression: "unit_price >= 0"
      - name: line_amount
        data_type: numeric(18,2)
        constraints: [{ type: not_null }]
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The composite primary_key is declared at the model levelconstraints: directly under the model, with a columns: list naming the two participating columns. Composite PKs cannot be declared inside a single column's block because no single column owns the constraint.
  2. The three foreign_key constraints are also declared at the model level (one per FK). Each names the local columns: and the expression: referencing the target table and column.
  3. Column-level constraints: carry not_null and check for each column. Note the check (quantity > 0) and check (unit_price >= 0) — these are quality guarantees that surface as DDL on Postgres and as informational hints elsewhere.
  4. The relationships tests on customer_id and product_id are the dbt-side audit that catches orphans on any warehouse, regardless of FK enforcement.
  5. On Postgres the DDL is fully enforced: any INSERT with a missing FK target, duplicate (order_id, line_no), or quantity <= 0 raises an error. On Snowflake / BigQuery / Redshift the constraints are informational; the not_null portion still enforces, but PK / FK / CHECK do not.

Output. A fct_orders table whose interface is locked: composite PK, three FKs, quantity-must-be-positive, unit-price-must-be-non-negative. Any drift in the SELECT fails the build at compile time; any orphan in the data fails the relationships test post-build.

Rule of thumb. Composite keys always live at the model level. Single-column constraints live inside the column block. Every FK should be paired with a relationships test (or it is a hint, not a guarantee, on most warehouses).

Worked example — a check constraint that catches a tier typo

Detailed explanation. The product team wants to lock the allowed values of tier to {bronze, silver, gold}. On Postgres a CHECK (tier in (...)) constraint will refuse the offending INSERT. On Snowflake the check is unsupported as DDL but the dbt accepted_values test still audits the same property.

Question. Show the YAML for the tier column with both a check constraint and an accepted_values test, and explain when each fires.

Code.

- name: tier
  data_type: varchar
  description: "Loyalty tier  one of {bronze, silver, gold}."
  constraints:
    - type: not_null
    - type: check
      expression: "tier in ('bronze','silver','gold')"
  tests:
    - not_null
    - accepted_values:
        values: [bronze, silver, gold]
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. On Postgres. The CREATE TABLE includes tier varchar NOT NULL, CONSTRAINT dim_customer_tier_chk CHECK (tier in ('bronze','silver','gold')). Any INSERT with tier = 'platinum' raises a database error and aborts the transaction.
  2. On Snowflake. The CHECK is not supported as DDL; dbt emits a warning ("CHECK constraint is not supported on Snowflake — skipping") and the constraint becomes documentation only.
  3. The accepted_values test. After build, dbt runs SELECT COUNT(*) FROM dim_customer WHERE tier NOT IN ('bronze','silver','gold') and asserts the count is zero. This works identically on every warehouse.
  4. The combined effect: Postgres catches the bad row at write time; Snowflake catches it post-build. Either way, the bad row never makes it to production — but the latency-to-detection differs by minutes (Postgres) vs the test phase (Snowflake).

Output. A tier column whose semantics are documented in YAML, enforced at write time on Postgres, audited post-build on every warehouse. The constraint and the test together cover every workflow.

Rule of thumb. Always pair check constraints with matching accepted_values or expression_is_true tests. The constraint is the warehouse-side aspiration; the test is the cross-warehouse guarantee.

Worked example — composite unique on a deduplicated staging model

Detailed explanation. A staging model stg_orders should have one row per (source_system, source_order_id). The single-column order_id is not unique on its own — different source systems can collide. A composite unique constraint expresses the actual identity.

Question. Write the YAML composite unique for (source_system, source_order_id) on stg_orders.

Code.

version: 2
models:
  - name: stg_orders
    config:
      materialized: table
      contract:
        enforced: true

    constraints:
      - type: unique
        columns: [source_system, source_order_id]

    columns:
      - name: source_system
        data_type: varchar
        constraints: [{ type: not_null }]
      - name: source_order_id
        data_type: varchar
        constraints: [{ type: not_null }]
      - name: order_id
        data_type: bigint
        description: "Surrogate key, not unique alone  see composite unique."
      - name: order_ts
        data_type: timestamp
        constraints: [{ type: not_null }]

    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns: [source_system, source_order_id]
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The composite unique constraint is declared at model level with columns: [source_system, source_order_id]. On Postgres it becomes UNIQUE (source_system, source_order_id) — enforced.
  2. On Snowflake / BigQuery / Redshift the constraint is informational. The dbt_utils.unique_combination_of_columns test fills the audit gap — it runs a post-build SELECT that GROUPs by the combination and asserts every group has size 1.
  3. The order_id column carries a description that explains it is not unique alone — important for downstream consumers who might be tempted to JOIN on order_id alone.

Output. A staging model whose composite identity is declared, enforced where supported, and audited on every warehouse via the dbt-utils test. New consumers reading the YAML immediately see "the natural key is (source_system, source_order_id)."

Rule of thumb. When the natural key is composite, declare it as a composite unique (model-level) and always add a matching dbt_utils.unique_combination_of_columns test. The combination handles both "informational warehouse" and "value drift" failure modes.

dbt interview question on constraint enforcement

A senior interviewer often probes: "You are on Snowflake. Why does it matter that your contract declares primary_key and foreign_key constraints if Snowflake doesn't enforce them? Walk me through what value you get and what you still need."

Solution Using the constraint + test split

version: 2
models:
  - name: fct_orders
    config:
      materialized: incremental
      unique_key: [order_id, line_no]
      on_schema_change: fail
      contract:
        enforced: true
      access: public
      group: sales

    constraints:
      - type: primary_key
        columns: [order_id, line_no]
      - type: foreign_key
        columns: [customer_id]
        expression: "{{ ref('dim_customer') }} (customer_id)"

    columns:
      - name: order_id
        data_type: bigint
        constraints: [{ type: not_null }]
      - name: line_no
        data_type: integer
        constraints: [{ type: not_null }]
      - name: customer_id
        data_type: bigint
        constraints: [{ type: not_null }]
        tests:
          - relationships:
              to: ref('dim_customer')
              field: customer_id
      - name: amount
        data_type: numeric(18,2)
        constraints: [{ type: not_null }]

    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns: [order_id, line_no]
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

Bug class Snowflake DDL guards? dbt test guards? What you would lose without each
NULL order_id yes (NOT NULL is enforced) yes (column-level not_null is implied by PK declaration; explicit test optional) nothing extra needed
Duplicate (order_id, line_no) no — PK is informational yes (unique_combination_of_columns) the only line of defence on Snowflake
Orphan customer_id no — FK is informational yes (relationships) the only line of defence on Snowflake
Renamed customer_id column in SQL n/a — contract catches at compile n/a compile-time guarantee from contract.enforced
Type drift numericvarchar n/a — contract catches at compile n/a compile-time guarantee from contract.enforced

The declaration of primary_key and foreign_key in the YAML still buys you four things on Snowflake: (1) catalog metadata (visible in dbt docs, useful for downstream consumers); (2) query-planner hints (Snowflake's optimiser uses informational PKs / FKs to rewrite joins and skip DISTINCT operations); (3) contract-level type enforcement (the column types are pinned even if the constraint is informational); and (4) documentation of intent (the next engineer reading the YAML knows the model's identity story).

Output:

Warehouse NOT NULL UNIQUE / PK FK CHECK Tests audit gap
Postgres enforced enforced enforced enforced optional belt-and-braces
Snowflake enforced informational informational not supported mandatory
BigQuery enforced not supported informational not supported mandatory
Redshift enforced informational informational not supported mandatory

Why this works — concept by concept:

  • Constraints as catalog metadata — even when not enforced, the declarations appear in INFORMATION_SCHEMA, dbt docs, and the catalog. This is how lineage tools (Atlan, Castor, Stemma) discover the relationships and render the right diagrams.
  • Query-planner hints — Snowflake's optimiser will, for example, skip a DISTINCT pass when joining on a column declared unique/PK. Same on BigQuery for FK-driven join elimination. The constraint is "advisory" but has real performance impact.
  • Contract.enforced type pinning — independent of constraint enforcement. The contract diff at compile catches renames and type drift on every warehouse — that part is rock-solid regardless of constraint reality.
  • dbt tests as the cross-warehouse auditunique, not_null, relationships, accepted_values, and the dbt_utils.* family run identically on every warehouse. They are the portable enforcement layer.
  • The pairing as the actual production pattern — declare the constraint (for catalog + planner + contract), add the matching test (for audit). The intentional redundancy is what makes the project survive a warehouse migration.
  • Cost — constraints add zero DDL cost on informational warehouses; minimal DDL cost on Postgres (one index per UNIQUE/PK). Tests cost one SELECT per test per build — already part of any mature dbt CI.

Data modeling
Topic — cardinality
Cardinality problems (data modeling)

Practice →


4. Versioning strategy for public models

Versions are how dbt does SemVer — breaking changes get a new number, non-breaking stay on the same one

The mental model in one line: a dbt model version is a sibling model with the same logical name but a different shape, identified by a v= suffix; you publish v2 alongside v1, give v1 a deprecation_date, and let consumers migrate on their own schedule. Versions are the cleanest way to ship breaking changes without a war-room rollout.

Horizontal timeline showing version evolution of a dbt model with v1, v2, and v3 rounded badges, each tagged with major/minor/patch labels and small change icons (add column, rename column, doc-only), plus a deprecation_date marker on v1, on a light PipeCode card.

The versions: block.

  • Top-level declaration. Inside the model YAML, add a versions: list. Each entry declares a v: number and optional overrides (description, columns, contract, defined_in).
  • latest_version: — names the version that ref('model') (without a v= argument) resolves to. Consumers without a v= get the latest by default.
  • defined_in: — the SQL filename for that version. If absent, defaults to model_vN.sql. Useful when versions live in separate files for clarity.
  • deprecation_date: — a date after which the version should not be used. dbt emits warnings during compile if any consumer still references a deprecated version after the date.

SemVer for data — the three rules of thumb.

  • MAJOR (v2, v3). Breaking change — column removed, renamed, retyped to an incompatible type, semantics changed (e.g. "amount in USD" → "amount in customer's local currency"). Consumers must migrate.
  • MINOR. Non-breaking addition — new column added at the end, new constraint added (where consumers are not relying on its absence), new test. Stays on the same version number.
  • PATCH. Doc-only or comment change. Stays on the same version.

The breaking-vs-non-breaking heuristic.

  • Breaking. Anything a downstream SELECT * would notice as a removal or rename. Anything a downstream WHERE or JOIN predicate would silently drop rows over (e.g. nullability flip on a join key). Anything a downstream type cast would fail on (e.g. bigintstring).
  • Non-breaking. Adding a new column at the end (downstream SELECT * gets one extra column; downstream named-column queries are unaffected). Adding a new test. Adding a new constraint that the data already satisfies (it just becomes formally checked).
  • The grey zone. Tightening a constraint (e.g. relaxing not_null to nullable, or vice versa). Treat tightening as non-breaking if consumers are not relying on the relaxed state; treat relaxing as breaking because a previously non-null column becoming nullable can crash downstream type-narrowed code.

Cross-version refs.

  • ref('model') — resolves to latest_version.
  • ref('model', v=1) — resolves to the v1 incarnation. Lets consumers stay on the old version explicitly.
  • ref('model', v=2) — resolves to the v2 incarnation.
  • Naming. Physical tables get the suffix: dim_customer_v1, dim_customer_v2. dbt manages the suffixing automatically; consumers only see logical names.

Common interview probes on versioning.

  • "When would you bump a version vs just edit the model?" — bump when the change is breaking for a public consumer. Edit when it is private, or when the change is non-breaking (additive column, doc, test).
  • "What is deprecation_date for?" — to advertise the sunset of an older version. dbt warns on compile if consumers still reference it after that date.
  • "Can two versions of the same model run in the same dbt project?" — yes; they materialise to separate physical tables (suffixed _v1, _v2). Each can have its own contract, columns, and constraints.
  • "How do I roll back a version?" — keep v1 alive (do not remove it) until you've confirmed v2 has zero issues. Roll back by re-pointing latest_version to v1.

Worked example — shipping v2 of fct_orders with a renamed column

Detailed explanation. The team needs to rename order_amount (USD) to order_amount_usd for clarity, in preparation for adding order_amount_eur later. This is a breaking change for every consumer that already references order_amount. Time to ship v2.

Question. Show the YAML and SQL diff for promoting fct_orders from v1 to v2 with the renamed column. Set a 60-day deprecation_date on v1.

Input — current single-version YAML.

version: 2
models:
  - name: fct_orders
    config:
      contract: { enforced: true }
      access: public
    columns:
      - name: order_id
        data_type: bigint
      - name: customer_id
        data_type: bigint
      - name: order_amount
        data_type: numeric(18,2)
Enter fullscreen mode Exit fullscreen mode

Code — versioned YAML.

version: 2
models:
  - name: fct_orders
    config:
      access: public
    latest_version: 2
    versions:
      - v: 1
        config:
          contract: { enforced: true }
        deprecation_date: 2026-08-15  # 60 days from today
        columns:
          - name: order_id
            data_type: bigint
          - name: customer_id
            data_type: bigint
          - name: order_amount
            data_type: numeric(18,2)
            description: "Order total, USD. Renamed to order_amount_usd in v2."
      - v: 2
        config:
          contract: { enforced: true }
        columns:
          - name: order_id
            data_type: bigint
          - name: customer_id
            data_type: bigint
          - name: order_amount_usd
            data_type: numeric(18,2)
            description: "Order total in USD."
Enter fullscreen mode Exit fullscreen mode

Code — SQL files.

-- models/marts/sales/fct_orders_v1.sql (unchanged, kept alive)
SELECT
    order_id,
    customer_id,
    order_amount
FROM {{ ref('int_orders') }};
Enter fullscreen mode Exit fullscreen mode
-- models/marts/sales/fct_orders_v2.sql (new, the renamed column)
SELECT
    order_id,
    customer_id,
    order_amount AS order_amount_usd
FROM {{ ref('int_orders') }};
Enter fullscreen mode Exit fullscreen mode
-- A downstream consumer that wants to stay on v1 explicitly
SELECT customer_id, sum(order_amount) AS total
FROM {{ ref('fct_orders', v=1) }}
GROUP BY 1;

-- A downstream consumer on the latest version (v2)
SELECT customer_id, sum(order_amount_usd) AS total_usd
FROM {{ ref('fct_orders') }}    -- latest_version = 2
GROUP BY 1;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. latest_version: 2 makes ref('fct_orders') resolve to v2 — every new consumer gets the new shape by default. Existing consumers using ref('fct_orders', v=1) stay on v1.
  2. The versions: list declares both versions side-by-side. Each version has its own columns: block — v1 keeps order_amount, v2 has order_amount_usd.
  3. deprecation_date: 2026-08-15 on v1 tells dbt to start warning consumers 60 days from now. After the deprecation date, any compile that still references v1 emits a "this version is deprecated" warning (and can be configured to error).
  4. Two SQL files (fct_orders_v1.sql, fct_orders_v2.sql) materialise to two physical tables (fct_orders_v1, fct_orders_v2). Both load on every dbt run; storage cost is the only overhead.
  5. Consumers migrate at their own pace by changing ref('fct_orders', v=1)ref('fct_orders') (or v=2) and updating their column references.

Output. Two physical tables alongside each other. Consumers see the rename as a publish event (v2 is now available) rather than a break event (the column disappeared from under them). The 60-day window gives every team enough runway to plan the migration without a war room.

Rule of thumb. Every breaking change to a public model gets a version bump. Every rename is breaking. Every type narrowing is breaking. Every dropped column is breaking. If you are not sure, default to "ship a v2" — the storage cost of an overlap window is trivial compared to the social cost of a Monday incident.

Worked example — adding a column without a version bump

Detailed explanation. Adding a new column at the end of a model is non-breaking for every consumer that uses named columns. SELECT customer_id, amount FROM fct_orders continues to return the same two columns. SELECT * consumers get one extra column, but the existing ones are unchanged. No version bump needed.

Question. Show the YAML for adding currency to fct_orders without bumping the version.

Code — model-level edit.

- v: 2
  config:
    contract: { enforced: true }
  columns:
    - name: order_id
      data_type: bigint
    - name: customer_id
      data_type: bigint
    - name: order_amount_usd
      data_type: numeric(18,2)
    - name: currency                 # <- new column appended at end
      data_type: varchar
      constraints: [{ type: not_null }]
      description: "ISO 4217 currency code."
Enter fullscreen mode Exit fullscreen mode
-- SQL update
SELECT
    order_id,
    customer_id,
    order_amount AS order_amount_usd,
    coalesce(currency, 'USD') AS currency
FROM {{ ref('int_orders') }};
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The new currency column is appended at the end of the columns: block. The contract diff at compile sees one extra column in the SELECT — but it is also in the YAML, so the diff matches. The build succeeds.
  2. Existing consumers that wrote SELECT order_id, customer_id, order_amount_usd FROM fct_orders continue to work unchanged — they never named currency, so the new column does not affect them.
  3. New consumers can opt-in to currency simply by adding it to their SELECT.
  4. No version bump needed because nothing breaks for existing consumers. The semantic versioning rule is "minor change → same version" — this is the canonical minor change.
  5. The coalesce(currency, 'USD') AS currency backfills a default for any historical rows where currency was NULL — important because we declared not_null on the new column and the contract would fail otherwise.

Output. A table with one extra column. Existing dashboards, marts, and reverse-ETL syncs are unaffected. New consumers can immediately use the new column. The cost is one YAML edit + one SQL edit + one PR.

Rule of thumb. Append new columns; never insert them. Add columns; never rename them. Loosen constraints with care; tighten them freely (after verifying the data already satisfies the tighter form). These three rules turn 80% of schema evolutions into non-breaking changes that ship in a single PR with zero coordination.

Worked example — a doc-only patch with no contract change

Detailed explanation. A column's description is wrong. Updating it is a pure documentation change — no schema impact, no contract impact, no version impact.

Question. Show a patch that fixes a column description and explain why no version bump is needed.

Code.

- name: signup_at
  data_type: timestamp
  description: >
    Timestamp of first successful account creation, in UTC.
    Was previously documented as "local time" — that was wrong
    on every load. Corrected 2026-06-15.
  constraints: [{ type: not_null }]
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The patch only edits the description: field. No column rename, no type change, no constraint change.
  2. The contract diff at compile is unchanged — same column name, same type, same constraints.
  3. No consumer was reading description from the YAML at runtime, so no consumer breaks.
  4. The catalog (dbt docs) refreshes with the new description on next build. The lineage tools (Atlan, Castor) refresh on their next pull.
  5. No version bump because nothing about the interface changed. The semantic versioning rule is "patch → same version" — this is the canonical patch.

Output. Updated documentation, zero downstream impact. The cost is one PR with one YAML hunk.

Rule of thumb. Use description: for everything you wish you could write on the column. Future-you (and every consumer) will thank you. Treat description edits as a free PR — they need no version bump, no rollout coordination, no migration window.

Worked example — cross-version ref() from a downstream mart

Detailed explanation. A downstream mart agg_revenue_by_customer aggregates fct_orders. The mart owner wants to stay on v1 (with the old order_amount name) for one more quarter while their team plans the migration to v2.

Question. Show the SQL diff for the downstream mart to pin itself to fct_orders v1.

Code.

-- models/marts/sales/agg_revenue_by_customer.sql
SELECT
    customer_id,
    sum(order_amount) AS total_revenue
FROM {{ ref('fct_orders', v=1) }}
GROUP BY customer_id;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The ref('fct_orders', v=1) macro resolves to the physical table fct_orders_v1 — the v1 incarnation, with the old column name order_amount.
  2. The mart's SELECT uses order_amount (the v1 name). It compiles and runs against v1's contract, which still declares order_amount.
  3. When the mart team is ready, they change ref('fct_orders', v=1)ref('fct_orders') (or v=2) and rename order_amountorder_amount_usd in their SELECT. One PR per consumer.
  4. The producer team can drop v1 once every consumer has migrated and the deprecation_date has passed.

Output. The mart stays on v1 indefinitely (or until v1 is removed). The producer ships v2 in parallel. Consumers migrate at their pace.

Rule of thumb. Cross-version ref() is the migration safety net. It lets every team plan its own migration without coordinating on the producer's calendar. The cost is one extra argument in the macro; the benefit is "every team owns its own schedule."

dbt interview question on versioning a public model

A senior interviewer often probes: "Walk me through publishing v2 of a public model that renames a column. What's in the YAML, what's in the SQL, how do consumers stay on v1, and when can you remove v1?"

Solution Using the publish-overlap-deprecate pattern

version: 2
models:
  - name: dim_customer
    config:
      access: public
      group: customer
    latest_version: 2
    versions:
      - v: 1
        config:
          contract: { enforced: true }
        deprecation_date: 2026-09-15  # 90 days from publish
        columns:
          - { name: customer_id, data_type: bigint }
          - { name: signup_at,   data_type: timestamp }   # renamed in v2
          - { name: email,       data_type: varchar }
      - v: 2
        config:
          contract: { enforced: true }
        columns:
          - { name: customer_id,   data_type: bigint }
          - { name: signed_up_at,  data_type: timestamp }  # the renamed column
          - { name: email,         data_type: varchar }
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

Date Producer action Consumer action Effect
t=0 Publish v2 alongside v1; set deprecation_date 90 days out Consumers continue on v1 by default until they migrate both physical tables alive
t=0..30 Comms to consumers: "v2 published, 90-day window" Forward-looking consumers migrate first dbt warns on ref('model', v=1) after deprecation_date
t=30..75 Track v1 consumers via dbt selectors + query logs Most consumers migrate; laggards get reminders v1 traffic shrinks
t=75..90 Final reminder; sunset PR drafted Last consumers migrate v1 traffic approaches zero
t=90 Merge sunset PR — remove v1 from YAML and SQL Any straggler ref('model', v=1) now fails to compile clean state, only v2 alive

Output:

Stage What exists Who is affected Cost
Publish v2 v1 + v2 both alive nobody (consumers still on v1) one PR for producer
Overlap window v1 + v2 both alive consumers migrate at own pace storage cost of v1
Deprecation warnings dbt compile warns on v=1 laggard consumers see warnings none
Sunset v1 only v2 alive nobody (everyone migrated) one PR removing v1

Why this works — concept by concept:

  • Publish overlap as the migration safety net — v1 and v2 coexist for the deprecation window. Consumers migrate when they are ready, not when the producer demands. Zero coordination meetings required.
  • SemVer for data — the bump-or-not decision is a type decision (breaking → bump; non-breaking → same version). Once the team internalises the rule, every PR self-classifies and no one argues.
  • deprecation_date as the social contract — the date is the producer's promise to keep v1 alive that long. It is the consumer's deadline to migrate. dbt's warning at compile is the gentle nag that prevents the deadline from slipping unnoticed.
  • Cross-version ref() — the migration mechanism. Consumers explicitly pin to v1 with v=1; new consumers default to latest_version. The mechanism is the minimum coupling: one argument per ref.
  • Sunset PR as the final cut — removing v1 is one YAML edit + one SQL file delete. Any straggler consumer gets a clean compile error pointing at the removed version, not a silent break.
  • Cost — storage cost of the duplicate table during the overlap window. On most warehouses this is negligible for analytics-scale dims and facts. Compute cost is also low: v1 only loads what was already loading before; v2 loads in parallel.

Data modeling
Topic — slowly-changing-data
Slowly-changing-data problems (data modeling)

Practice →


5. Rollout and deprecation playbook

Coordinating dbt + BI + reverse-ETL on a single timeline — the four-phase rollout that retires v1 without drama

The mental model in one line: the rollout playbook has four phases — Publish, Overlap, Migrate, Sunset — and every stakeholder (producer, consumer, platform) has a defined role inside each phase. Tie the phases to dates in the YAML (deprecation_date) and in your comms calendar, and the social cost of a breaking change drops to near zero.

Swimlane diagram of the rollout playbook — lanes labelled Producer, Consumer, and Platform; phases labelled Publish v2, Overlap window, Migrate, Sunset v1; tiny PR, Slack, and ticket icons marking each milestone, on a light PipeCode card.

The four-phase playbook.

  • Phase 1 — Publish. Producer ships v2 in a single PR. v1 stays alive. deprecation_date is set on v1 (typically 30–90 days out). Comms go out: announcement, migration guide, FAQ, office hours.
  • Phase 2 — Overlap. Both versions run on every dbt build. Consumers migrate on their own schedule. Producer tracks adoption via dbt selectors and query logs. Comms cadence: weekly reminder, fortnightly tracker.
  • Phase 3 — Migrate. As deprecation_date approaches, producer surfaces remaining v1 consumers, opens tickets per team, runs office hours for stragglers. dbt compile warnings start firing.
  • Phase 4 — Sunset. After deprecation_date passes (with confirmation that v1 traffic is zero), producer ships a PR removing v1's YAML, SQL, and (eventually) the physical table.

The overlap window — how long?

  • 30 days. Minimum for any non-trivial public model. Fine for internal teams with tight dbt slack channels.
  • 60 days. A reasonable default for most production analytics orgs. Covers a typical sprint cadence and a vacation overlap.
  • 90 days. For models used by many teams, by BI dashboards owned by non-engineers, or by external (partner-facing) consumers.
  • The pragmatic rule. Default to 60; bump to 90 if any consumer is non-technical or external; bump to 120 for regulated reporting where audit signoff is required.

Stakeholder comms template.

  • The announcement (day 0). Short Slack message + email: "We've published dim_customer_v2. v1 is deprecated as of today; sunset is YYYY-MM-DD (60 days). Migration guide: . Office hours: ."
  • The weekly reminder (day 7, 14, 21, ...). "v2 adoption: X/Y consumers migrated. Stragglers: . Office hours: ."
  • The pre-sunset warning (day -7). "Sunset in 7 days. Outstanding v1 consumers: . Please migrate or open a ticket for an extension."
  • The sunset PR (day 0 + window). "v1 removed. v2 is now the only version. Postmortem doc: ."

Tracking consumer migration.

  • dbt list --select +dim_customer_v1 — every model that downstream-references v1. The list shrinks as consumers migrate.
  • Query logs — warehouse query history filtered to dim_customer_v1 table name. Surfaces BI tools, reverse-ETL syncs, and ad-hoc consumers that dbt cannot see.
  • dbt exposures — declarative exposure: YAML blocks let you register BI dashboards, ML jobs, and external consumers as first-class graph nodes. dbt list --select +exposure:dim_customer_v1 then shows everything that depends on v1, including non-dbt artefacts.
  • The catalog / lineage tool — Atlan / Castor / Stemma surface upstream-downstream relationships including BI tiles. Often the most complete view.

Tying it all to CI.

  • PR CI. Run dbt build --defer --select state:modified+ on every PR — builds only the modified models (and downstream) against a baseline. Contracts and constraints catch interface changes at compile.
  • Slim CI. Use --defer against the prod state so the PR build doesn't need to rebuild every upstream model. Faster, cheaper, identical contract enforcement.
  • Block on contract violations. The contract failure is a build failure — make the PR check required for merge. No exceptions.
  • Deprecation warnings. Configure CI to fail (not just warn) when consumers reference a model past its deprecation_date. dbt 1.6+ has a --warn-error flag for this.

Coordinating with downstream BI and reverse-ETL.

  • Looker. Materialised LookML views referencing the dbt table by name need updating. Use a LookML view rename PR in the Looker repo when v2 is published.
  • Tableau. Live connections reference the table directly. Schedule a "Tableau update day" within the overlap window — extract → swap source → re-publish.
  • Hightouch / Census (reverse-ETL). Source models reference the dbt table by name. Update the source mapping when v2 is published.
  • Snowflake Share / BigQuery Authorised Views. External consumers see a view, not the underlying table. Re-create the share / authorised view against v2 during the overlap window so external consumers can migrate on their own schedule.

Postmortems for "contract broke prod" — what to add to the checklist.

  • Was the model marked contract.enforced: true? If not, why not.
  • Was the model marked access: public or group:? If not, why was it reachable from outside.
  • Was the change behind a version bump? If a breaking change shipped without a version, that is the primary root cause.
  • Did dbt CI catch it? If not, why — was state:modified+ not configured, was contract enforcement off in CI, was the test missing?
  • Did the comms go out? If not, why — and add to the playbook.
  • Was the rollback path documented? If not, add a "rollback PR" template.

Worked example — publishing v2 with a 60-day deprecation window

Detailed explanation. Walk through the producer's PR sequence for shipping v2 of dim_customer with a renamed column. Each PR is small and reviewable; the rollout is the sequence of PRs, not one giant change.

Question. Sketch the four PRs the producer ships during the rollout of dim_customer_v2.

Code — PR 1: Publish v2.

# models/marts/customer/dim_customer.yml — PR 1
version: 2
models:
  - name: dim_customer
    config: { access: public, group: customer }
    latest_version: 2
    versions:
      - v: 1
        deprecation_date: 2026-08-15
        config: { contract: { enforced: true } }
        columns:
          - { name: customer_id, data_type: bigint }
          - { name: signup_at,   data_type: timestamp }
          - { name: email,       data_type: varchar }
      - v: 2
        config: { contract: { enforced: true } }
        columns:
          - { name: customer_id,  data_type: bigint }
          - { name: signed_up_at, data_type: timestamp }
          - { name: email,        data_type: varchar }
Enter fullscreen mode Exit fullscreen mode

Code — PR 2: Update one consumer (agg_revenue_by_customer).

-- agg_revenue_by_customer.sql — PR 2
SELECT
    c.customer_id,
    c.signed_up_at,                 -- was: c.signup_at
    sum(o.amount) AS total
FROM {{ ref('dim_customer') }} c    -- now resolves to v2
LEFT JOIN {{ ref('fct_orders') }} o
       ON o.customer_id = c.customer_id
GROUP BY 1, 2;
Enter fullscreen mode Exit fullscreen mode

Code — PR 3: Track remaining v1 consumers.

# tracking script — PR 3 (CI cron job)
dbt list --select +dim_customer_v1 \
  --output name \
  > reports/v1_consumers.txt

# Plus warehouse query log scrape for BI tools
echo "Consumers still on dim_customer_v1:"
cat reports/v1_consumers.txt
Enter fullscreen mode Exit fullscreen mode

Code — PR 4: Sunset v1 after the deprecation date.

# models/marts/customer/dim_customer.yml — PR 4 (after 2026-08-15)
version: 2
models:
  - name: dim_customer
    config: { access: public, group: customer }
    latest_version: 2
    versions:
      - v: 2
        config: { contract: { enforced: true } }
        columns:
          - { name: customer_id,  data_type: bigint }
          - { name: signed_up_at, data_type: timestamp }
          - { name: email,        data_type: varchar }
# v1 block removed; dim_customer_v1.sql file deleted
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. PR 1 (day 0). Add v2, mark v1 deprecated. The PR is tiny: new YAML version block + new SQL file. CI verifies both versions contract-pass. Merged → both versions materialise on next build.
  2. PR 2 (days 1–60). Each consumer team migrates in its own PR. The mart that owns agg_revenue_by_customer updates its SELECT and re-points ref('dim_customer') to the latest version (which is now v2). No coordination with other teams.
  3. PR 3 (continuous). A CI job runs dbt list --select +dim_customer_v1 weekly and posts the shrinking list of remaining consumers to a Slack channel. Producer pings stragglers around day 45.
  4. PR 4 (day 60+). Once dim_customer_v1 has zero remaining consumers, the producer removes the v1 block from YAML, deletes dim_customer_v1.sql, and (eventually, after one more clean build) drops the physical table.

Output.

Phase Day Producer Consumer Platform
Publish 0 PR 1 merged both tables alive
Overlap 1–60 comms, tracking migrate at own pace shrinking v1 traffic
Sunset 60+ PR 4 merged only v2 alive

Rule of thumb. Every rollout is a sequence of small PRs, not one big PR. The producer ships PR 1 and PR 4; consumer teams ship PR 2 themselves; PR 3 is the visibility layer. The sequence is reproducible across every breaking change you ever ship.

Worked example — exposures as the BI/reverse-ETL visibility layer

Detailed explanation. dbt exposures: are declarative YAML blocks that register downstream consumers (BI dashboards, reverse-ETL syncs, ML jobs) as first-class nodes in the dbt graph. They are the bridge between dbt's compile-time visibility and the real world of "who actually uses this model."

Question. Show the YAML for an exposure: registering a Looker dashboard that depends on dim_customer, and explain how it surfaces during the rollout.

Code.

version: 2
exposures:
  - name: customer_360_dashboard
    type: dashboard
    maturity: high
    url: https://looker.internal/dashboards/customer-360
    description: "Marketing-ops Customer 360 dashboard."
    depends_on:
      - ref('dim_customer')           # latest_version
      - ref('fct_orders')
    owner:
      name: Marketing Analytics
      email: marketing-analytics@example.com
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The exposure: registers the dashboard as a downstream node. dbt list --select +dim_customer now includes exposure:customer_360_dashboard in the output.
  2. During the rollout, the producer runs dbt list --select +dim_customer_v1 and immediately sees if the dashboard is still on v1. The exposure makes the BI tile visible to dbt for the first time.
  3. The owner: block tells the producer who to message — automated comms can ping marketing-analytics@example.com directly.
  4. When the dashboard migrates to v2, the owner updates the exposure to ref('dim_customer', v=2) (or leaves it at ref('dim_customer') to follow latest_version). dbt re-runs the list and the dashboard drops off the v1 consumer roster.

Output. A dbt graph that includes BI dashboards as real nodes, with full ownership metadata. Rollouts can be coordinated end-to-end inside the dbt project — no separate spreadsheet of "what depends on what."

Rule of thumb. Register every important BI dashboard, reverse-ETL sync, and ML job as an exposure:. The five-minute cost per consumer pays back the first time you need to know "who am I about to break?" during a rollout.

Worked example — the contract-broke-prod postmortem template

Detailed explanation. When a contract breaks prod (rare but never zero), the postmortem is the artefact that drives the next playbook iteration. A reusable template keeps every postmortem comparable.

Question. Sketch the template structure for a "contract broke prod" postmortem.

Code — markdown template.

# Postmortem — dim_customer v1→v2 rollout incident

## Summary
[1-2 sentences: what broke, when, who noticed]

## Timeline
- t-7d  Publish v2 + 60-day deprecation_date on v1
- t-2d  Reminder ping in #data-platform
- t=0  v1 dropped (sunset PR merged)
- t+1h Looker tile X errors out; marketing-ops opens ticket
- t+2h Rollback PR re-introduces v1
- t+4h Resolution: tile migrated to v2 by hand; v1 dropped again

## Root cause
[Exact reason: e.g., exposure not registered for Looker tile X;
 weekly tracking script missed it; sunset PR proceeded with one
 unmigrated consumer]

## What worked
- contract.enforced caught two unrelated drift PRs during the overlap window
- Slack pings during weeks 4 and 6 surfaced 3 of 4 stragglers
- Rollback PR (re-add v1 block) restored service in ~30 minutes

## What didn't
- Looker tile X was not registered as an exposure
- Query-log scrape missed it because tile X uses an extract refreshed weekly

## Action items
- [ ] Register every Looker tile that reads marts/* as an exposure
- [ ] Extend rollout playbook with a "scrape extract schedules" step
- [ ] CI: fail (not warn) on compile when an unmigrated consumer references a deprecated version
- [ ] Update on-call runbook with "rollback PR" recipe
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. Summary is the one-paragraph version a busy executive reads.
  2. Timeline documents the events with t-relative times — easy to copy into other tooling.
  3. Root cause names the specific gap (in this case: exposure not registered, weekly query-log scrape missed an extract).
  4. What worked is the positive section — never skip it. Every postmortem needs to celebrate what the system did catch.
  5. What didn't is the gap analysis. Be specific. "Comms were unclear" is not actionable; "Looker tile X was not registered as an exposure" is.
  6. Action items are the playbook updates. Each one feeds back into the rollout checklist for the next release.

Output. A postmortem that teaches the next engineer. The playbook gets one new step. The CI gets one new check. The incident never happens the same way twice.

Rule of thumb. Every "contract broke prod" incident, no matter how small, gets a postmortem with at least one action item. The action item updates the playbook. The playbook updates everyone's defaults. This is how the rollout discipline compounds over years.

dbt interview question on the rollout playbook

A senior interviewer often probes: "Walk me through a 60-day rollout for replacing dim_customer with a breaking-change v2. What happens on day 0, day 30, day 60. Who pings whom. When does CI start failing instead of warning."

Solution Using the four-phase rollout

# 60-day rollout — dim_customer v2

## Day 0 — Publish
- PR 1 merges: v2 alongside v1, contract.enforced on both, deprecation_date = day 60
- Comms: Slack announcement + email to data-platform-consumers@
- Migration guide: pinned in #data-platform
- Office hours: open every Friday for the next 8 weeks
- CI: contract enforcement on, deprecation warnings on (compile warning, no fail yet)

## Days 1-30 — Overlap (warning phase)
- Producer publishes weekly "v1 consumer count" Slack post
- Consumer teams migrate; each ships their own PR re-pointing `ref('dim_customer')`
- CI: continues to warn on `ref('dim_customer', v=1)` references
- Tracking: dbt list + warehouse query log + exposure metadata

## Days 31-60 — Migrate (escalation phase)
- Day 30: producer opens a JIRA ticket for each remaining v1 consumer team
- Day 45: producer pings each ticket owner directly
- Day 55: pre-sunset reminder Slack post + email
- Day 58: CI flip — `--warn-error` enabled for deprecation warnings; PRs that still reference v1 fail
- Day 60: deprecation_date reached

## Day 60+ — Sunset
- Confirm zero v1 traffic via query log for past 48 hours
- PR 4 merges: v1 YAML block removed; SQL file deleted
- After one clean dbt run, drop the physical v1 table
- Post-rollout note in #data-platform: "v1 sunset complete; v2 is now the only version"
- Postmortem only if anything went wrong; otherwise a brief retro

## Rollback paths
- During overlap: revert PR 1 (re-add v1 block if it was removed prematurely)
- After sunset: re-create v1 from the v2 SQL with a one-PR add-back if a critical consumer surfaces late
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

Day Producer action Consumer state CI behaviour Risk if skipped
0 publish v2; deprecation_date set all on v1 contract pass; v=1 ref compiles cleanly rollout has no anchor date
7 weekly tracker post early adopters migrating unchanged no visibility into adoption pace
30 open per-team tickets ~50% migrated warn on v=1 ref stragglers never feel urgency
45 direct pings to laggards ~80% migrated warn last 20% slip past deadline
58 flip CI to fail on v=1 ref ~95% migrated fail on v=1 ref sunset breaks last consumers
60 sunset PR; remove v1 100% migrated only v2 references compile hard break if any consumer remains
60+ drop physical v1 table done done storage cost only

Output:

Day What exists in warehouse What CI does Risk profile
0 v1 + v2 alive warn on v=1 low — overlap covers everyone
30 v1 + v2 alive warn on v=1 low — half migrated
58 v1 + v2 alive fail on v=1 medium — forces last migrations
60 v1 + v2 alive fail on v=1 resolved — final cut
60+ only v2 alive normal clean steady state

Why this works — concept by concept:

  • Publish-overlap-migrate-sunset as four discrete phases — each phase has a clear start, a clear end, and a clear set of stakeholder actions. The producer is never "trying to figure out what to do next."
  • deprecation_date as the social contract — the date is fixed at publish time and visible in YAML. Everyone — producer, consumer, BI owner — sees the same deadline.
  • CI escalation from warn to fail — the gradual ratchet (warn for 58 days, fail for 2 days, sunset) gives consumers maximum runway with a final forcing function.
  • Per-team JIRA tickets at day 30 — turns the comms from "broadcast" to "directed." Each laggard team has an owner and a deadline.
  • Exposures as the BI visibility layer — without them, the query-log scrape is your only signal for non-dbt consumers. With them, every dashboard and reverse-ETL sync is a first-class graph node.
  • Postmortem only on incident — most rollouts are uneventful. Reserve the postmortem ritual for the times when something genuinely went wrong; otherwise a brief retro is enough.
  • Cost — producer time: ~4 hours over 60 days. Consumer time: ~30 min per team per migration. Storage cost: one duplicate table for 60 days. Compared to the cost of one broken-Monday incident, this is rounding error.

Data modeling
Topic — event-modeling
Event modeling problems (data modeling)

Practice →


Cheat sheet — dbt contract recipes

  • Mark a model public. Add config.contract.enforced: true, fill out the columns: block with name + data_type + constraints + description for every column, add config.access: public and config.group:. Ship as v: 1 in a versions: block from day one — saves a YAML refactor when you ship v: 2 later.
  • Add a column without breaking anyone. Append the new column at the end of the columns: block, ship the YAML + SQL in one PR, and do not bump the version. The change is non-breaking because no existing consumer named the new column.
  • Rename a column. Ship v: 2 alongside v: 1. Give v1 a 30–90 day deprecation_date. Update one consumer per PR. Drop v1 after the deprecation date and zero remaining traffic. Never edit v1 to rename in place.
  • Tighten a constraint (loose → strict). Verify the data already satisfies the strict form (run the test once against prod data). Edit the YAML to add not_null / check / unique. Ship as a non-breaking change if the data already satisfies it; otherwise bump the version because the change can fail consumers who insert NULLs.
  • Loosen a constraint (strict → loose). Treat as breaking. Removing not_null means downstream consumers that rely on the non-null contract may now crash. Ship as v: 2.
  • FK to a dim on Snowflake / BigQuery / Redshift. Declare the FK in the YAML (informational metadata + catalog + query-planner hint) and add a tests: relationships: test for the value-level audit. Belt and braces.
  • FK to a dim on Postgres. Declare the FK in YAML; index the referenced column for INSERT performance; add a tests: relationships: test as an audit layer. The DDL enforcement is real; the test is the cross-warehouse guarantee.
  • Composite primary key. Declare at the model level under constraints: with columns: [a, b]. Add a matching dbt_utils.unique_combination_of_columns test.
  • Check constraint. Add a check constraint with an expression: (e.g. "price >= 0"). Pair with a dbt_utils.expression_is_true or accepted_values test. The constraint enforces on Postgres; the test enforces on every warehouse.
  • Enforce at PR time. Configure CI to run dbt build --defer --select state:modified+ against the prod state. Make the contract-failure check required for merge. Use --warn-error to escalate deprecation warnings into failures.
  • Track who still consumes v1. Run dbt list --select +dim_customer_v1 --output name in a weekly CI cron. Scrape warehouse query logs for non-dbt consumers. Register every BI tile and reverse-ETL sync as a exposures: block.
  • Sunset v1 cleanly. Confirm zero traffic in the 48 hours before the cut. Ship a single PR that removes the v1 YAML block + deletes the v1 SQL file. Drop the physical table only after one clean build verifies nothing references it.
  • Roll back a breaking change. During the overlap window: revert the PR that removed v1 (re-add the YAML block and SQL file). After sunset: open a fresh PR that re-introduces v1 with the same shape. Both paths are quick because the v1 SQL is in git history.
  • Document intent in every column. Add description: to every column. Future-you (and every consumer) will thank you. Description edits are doc-only patches with no version bump.

Frequently asked questions

Are dbt constraints enforced by the warehouse?

It depends on the warehouse and the constraint. NOT NULL is enforced everywhere. PRIMARY KEY and UNIQUE are enforced on Postgres; informational on Snowflake, BigQuery, and Redshift. FOREIGN KEY is enforced on Postgres; informational or unsupported elsewhere. CHECK is enforced on Postgres; unsupported on Snowflake, BigQuery, and Redshift. The contract itself (contract.enforced: true) is enforced at compile time on every warehouse — it's a dbt-side check that the SQL projection matches the YAML declaration, independent of warehouse capabilities. Always pair informational constraints with matching dbt tests (unique, not_null, relationships, accepted_values) — the test is the cross-warehouse audit layer that catches the bugs the warehouse cannot.

Do I need contracts if I already have dbt tests?

Yes — they catch different bug classes. Tests catch value drift after the model materialises: a NULL appearing where it shouldn't, a unique key duplicating, a format violation. They run after the build and require the broken table to already exist in dev / CI. Contracts catch interface drift at compile time: a column renamed, removed, or retyped in the SQL. They run before anything materialises and abort the build immediately, with a domain-specific error message. The two are orthogonal axes — contracts on the columns/types/nullability axis, tests on the values/relationships axis. Mature projects use both: the contract is the first line of defence at PR time, the tests are the post-build audit layer.

When should I bump a model version?

Use SemVer-for-data as the rule. MAJOR (v2, v3): bump for any breaking change — column removed, renamed, retyped to an incompatible type, semantics changed (e.g. "amount in USD" → "amount in local currency"), nullability flipped from non-null to nullable on a column consumers JOIN on. MINOR: do not bump for non-breaking additions — a new column appended at the end, a new constraint that the data already satisfies, a new test. PATCH: do not bump for doc-only edits (descriptions, comments). The pragmatic heuristic: if any consumer's existing SELECT, WHERE, or JOIN could behave differently, bump the version. If consumers are unaffected, edit in place.

Can I have contracts on incremental models?

Yes — contract.enforced: true works with materialized: incremental. dbt validates the contract on every run: at compile (the SELECT must project the contracted columns) and at the schema check that starts every incremental run (the existing target table must match). Combine with on_schema_change: fail so dbt aborts instead of silently appending new columns on schema drift. On a --full-refresh build, dbt drops and recreates the table with the full DDL (including constraints, where supported). On a normal incremental run, dbt validates the schema check, runs the delta SELECT, validates its projection against the contract, then INSERTs / MERGEs. The contract is enforced at exactly the points where drift could leak in.

How do I get foreign keys in Snowflake or BigQuery?

You can declare them in the contract YAML (type: foreign_key with an expression: referencing the target table and column), but the warehouse will not enforce them at write time — Snowflake records them as informational metadata (visible in INFORMATION_SCHEMA, useful to the query planner), and BigQuery supports FOREIGN KEY ... NOT ENFORCED as a query-planner hint only. For actual value-level FK enforcement on those warehouses, pair the declared constraint with a tests: relationships: test. The test runs SELECT count(*) FROM child WHERE child.fk NOT IN (SELECT pk FROM parent) and asserts zero — exactly what an enforcing FK would block, but at audit time instead of write time. This is the standard "belt and braces" pattern: the constraint declares intent, the test verifies the data.

What's the difference between contracts and dbt-expectations?

dbt contracts are part of dbt-Core (since 1.5). They validate the shape of a model — column names, data types, constraint declarations — at compile time, and translate constraints to warehouse DDL where supported. They are the interface-locking layer. dbt-expectations is a community package (modelled on Python's great_expectations library) that ships a large catalog of value-level tests — distribution tests, statistical tests, regex tests, percent-NULL tests, etc. They run post-build like any dbt test and audit values. The two are complementary: contracts lock the shape; dbt-expectations enriches the value-level audit beyond the built-in unique / not_null / accepted_values / relationships. Mature projects use contracts on every public model and dbt-expectations on top of dbt tests wherever statistical or distribution checks add signal.

Practice on PipeCode

Pipecode.ai is Leetcode for Data Engineering — every contract recipe, constraint pattern, and rollout phase above ships with hands-on practice rooms where you design the YAML block, defend the version bump, and walk the four-phase deprecation playbook against real graded interview-style scenarios. PipeCode pairs every reading with 450+ DE-focused problems and a real-time scoring engine, so you never have to wonder whether your `dim_customer` v2 plan actually survives contact with a Looker dashboard, a HubSpot sync, and a Snowflake share at the same time.

Practice data modeling now →
Dimensional modelling drills →

Top comments (0)