
Borys Generalov

Posted on • Originally published at blog.bgener.nl

SQLMesh for dbt users: the migration path

How to move from dbt to SQLMesh, understand the plan/apply cycle, leverage native column-level lineage, and run dev environments without schema suffixing.

What you get: an introduction to SQLMesh's core differences, the exact commands to run, how dbt tests map to SQLMesh audits, how dev environments work, and screenshots from a running demo.

What is dbt

Data Build Tool (dbt) is an open-source framework that lets analysts and engineers transform data in their warehouses by writing SQL SELECT statements enriched with Jinja templating. This templating adds dynamic logic (loops, variables, and dependency referencing) directly into your SQL. Instead of manually writing CREATE TABLE statements, dbt takes your templated queries and materializes them in the database for you. But as a data platform scales, nested Jinja templates become hard to debug, and copying production data to build staging environments gets expensive fast.

Standard dbt documentation

What is SQLMesh

This is where SQLMesh comes in. As a modern framework challenging the status quo for the transformation (the 'T') phase of ETL/ELT pipelines, SQLMesh is designed specifically to address these growing pains. It provides a more robust, scalable way to manage data transformations by bringing strict software engineering practices—like stateful dry runs and zero-copy environments—directly into the data warehouse. Because it solves the exact operational bottlenecks that slow down mature data teams, it has rapidly grown in popularity, ultimately leading to its contribution to the Linux Foundation in March 2026.


Under the Hood: Text Templating vs. AST Parsing

The fundamental difference between dbt and SQLMesh comes down to how they read your code.

dbt is a text templater. It uses Jinja to stitch strings of SQL together and sends the final block to the data warehouse to execute. It doesn't semantically understand the SQL it generates.

SQLMesh is a semantic parser. It reads your code and builds an Abstract Syntax Tree (AST) using its underlying engine, SQLGlot. Because SQLMesh understands your SQL before it hits the warehouse, it knows exactly which columns are being selected, aliased, or joined at compile time.

Because SQLMesh actually reads your code, it does three things that dbt cannot do natively:

  1. A Terraform-like plan/apply cycle that shows the exact impact of a change before any warehouse compute is spent.
  2. Native column-level lineage, without expensive third-party data catalogs.
  3. Virtual Data Environments that let developers test changes in isolation without physically copying production data.
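A toy illustration of the templating-versus-parsing difference, in plain Python (a crude regex stands in for a real AST parser such as SQLGlot; none of this is dbt or SQLMesh internals):

```python
import re

# 1. Templating: produce SQL by string substitution. The tool only ever
#    sees a string; it never learns what the query selects.
template = "SELECT {cols} FROM {table}"
rendered = template.format(cols="customer_id, email AS contact_email",
                           table="customers")

# 2. Parsing: inspect the rendered SQL and recover the output columns,
#    including the alias. A parser-based tool can reason about these
#    names at compile time; a templater cannot.
select_list = re.match(r"SELECT (.+) FROM", rendered).group(1)
columns = [item.strip().split(" AS ")[-1].strip()
           for item in select_list.split(",")]

print(columns)  # the output column names, alias resolved
```

The regex here falls over on anything non-trivial, which is precisely why SQLMesh uses a full SQL parser rather than pattern matching.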

The plan/apply cycle: seeing the impact

In dbt, you run dbt run and wait to see what breaks. In SQLMesh, you use a stateful workflow inspired by Terraform: plan and apply.

sqlmesh plan

This command evaluates your local SQL files against the current state of the database and generates an execution plan. It outputs exactly what will be built, whether it's a full refresh or incremental, and asks for confirmation:

======================================================================
Plan: prod
======================================================================
New environment `prod` will be created from `empty`

Added Models:
├── jaffle_shop.customers    (Full Refresh)
└── jaffle_shop.orders (Full Refresh)

Apply - Create prod environment and backfill models [y/n]:

When you add a column to stg_payments.sql, the next sqlmesh plan highlights the real dependency chain before any warehouse compute is consumed. In this demo, orders reads from stg_payments, and customers reads from orders, so both appear as indirectly modified:

You review the impact, then type y to apply. You stop paying for blind warehouse queries just to see if your code works.
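The change detection behind this can be sketched as a fingerprint comparison plus a walk of the dependency graph (an assumed, simplified model; SQLMesh's real fingerprinting and state tracking are far more sophisticated):

```python
import hashlib

# Fingerprint each model's definition; a changed hash means a changed model.
def fingerprint(sql: str) -> str:
    return hashlib.sha256(sql.encode()).hexdigest()[:12]

recorded_state = {  # what was applied last time
    "stg_payments": fingerprint("SELECT id, amount FROM raw_payments"),
    "orders": fingerprint("SELECT * FROM stg_payments"),
}
local_models = {  # current working copy: stg_payments gained a column
    "stg_payments": "SELECT id, amount, method FROM raw_payments",
    "orders": "SELECT * FROM stg_payments",
}
downstream = {"stg_payments": ["orders"], "orders": ["customers"]}

directly_modified = [
    name for name, sql in local_models.items()
    if fingerprint(sql) != recorded_state.get(name)
]

# Anything downstream of a direct change is indirectly modified.
indirectly_modified, stack = set(), list(directly_modified)
while stack:
    for child in downstream.get(stack.pop(), []):
        if child not in indirectly_modified:
            indirectly_modified.add(child)
            stack.append(child)

print(directly_modified, sorted(indirectly_modified))
```

This mirrors the demo: `stg_payments` is directly modified, and `orders` and `customers` surface as indirectly modified before anything runs in the warehouse.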

The plan and apply execution from the running demo looks like this:
SQLMesh showing the plan evaluation

SQLMesh showing the plan application


Column-level lineage out of the box

Because SQLMesh uses SQLGlot to parse your SQL, it natively understands how data flows from source to destination at the column level. It traces dependencies through complex aliases, window functions, and joins without you ever writing a YAML definition.

SQLMesh does have a dbt-docs-style DAG view, but it is exposed through the data catalog and lineage graph instead of a separate static docs site. The useful difference is that the graph is tied to the selected SQLMesh environment and can show model fields, not only model boxes.

SQLMesh data catalog

The lineage view for the demo model looks like this. The point is not that SQLMesh has a DAG. dbt already has that. The useful part is column visibility: you can inspect which upstream columns feed a downstream model, which is what makes impact analysis and PII tracing practical.

In this small demo, the graph shows seed_model feeding incremental_model, then full_model, with the columns visible on each node.

SQLMesh UI lineage graph

Use Case 1: Safe Deprecation of Columns

Say you need to drop a deprecated user_phone column from the production database.

In a dbt setup, understanding the full impact is problematic. You would have to do a global text search for user_phone across hundreds of SQL files. Even then, if the column was aliased (SELECT user_phone AS phone_number), a simple text search might miss downstream models that rely on phone_number.

SQLMesh solves this instantly. By parsing the AST, it tracks the column through every alias and CTE. You simply open the built-in UI editor and click the column name.

It visually traces and highlights exactly which downstream reporting tables or BI dashboards will break if you drop the source column. You can confidently deprecate columns without causing massive data outages.
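A hand-built miniature of that alias-aware tracing (the `edges` index and model names are hypothetical; SQLMesh derives the real index from the parsed AST):

```python
# Each edge maps (model, output column) -> the source column it reads.
edges = {
    ("stg_users", "phone_number"): ("raw_users", "user_phone"),  # aliased!
    ("stg_users", "email"): ("raw_users", "email"),
    ("crm_report", "phone_number"): ("stg_users", "phone_number"),
    ("crm_report", "email"): ("stg_users", "email"),
}

def consumers(model: str, column: str) -> set:
    """Every downstream (model, column) that ultimately reads `column`."""
    hit = {(m, c) for (m, c), src in edges.items() if src == (model, column)}
    for m, c in list(hit):
        hit |= consumers(m, c)
    return hit

# Dropping raw_users.user_phone breaks both hops, despite the rename,
# which a plain text search for "user_phone" would miss.
impact = consumers("raw_users", "user_phone")
print(sorted(impact))
```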

Use Case 2: Tracing PII Data for Compliance

Data privacy regulations (like GDPR or CCPA) require strict tracking of Personally Identifiable Information (PII). If you mistakenly join a table containing an email_address into a public-facing metrics aggregate, it could result in a massive compliance violation.

With SQLMesh, you can visually trace the flow of sensitive data using the built-in UI:

sqlmesh ui

This spins up a local web server displaying an interactive lineage graph. For a sensitive field such as email, the lineage should make the path easy to inspect:

`raw_customers.email` (PII) -> `stg_customers.email` -> `customers.email`

Because the lineage is native, data governance teams can instantly verify that PII is masked or excluded before it reaches downstream aggregates.


Dev environments without schema suffixing (Virtual Environments)

This is how SQLMesh directly cuts warehouse compute costs.

In dbt, a development environment is physical. It's a target schema (e.g., analytics_dev_yourname). When you build your models to test a change, you physically copy or rebuild the data into your dev schema.

In SQLMesh, a dev environment is a Virtual Data Environment.

When you run sqlmesh plan dev, SQLMesh doesn't copy data. Instead, it creates lightweight, virtualized database views that simply point to the existing physical tables from prod.

The SQLMesh UI keeps those environments visible in the same toolbar used by the editor, plan view, and data catalog. In this demo, prod and dev both exist, and prod is marked as the production environment.

SQLMesh UI environment

The model is simple:

  • prod.customers can point to customers_v1, while prod.orders points to orders_v1.
  • dev.customers can point to a new customers_v2, while dev.orders still points to the existing orders_v1.

Only the changed model needs new physical storage. Everything unchanged can keep pointing at the already-built production tables.
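The pointer model above can be sketched in a few lines (conceptual only; real SQLMesh manages versioned physical snapshots and tracks them in its state store):

```python
# An environment is just a mapping of view names to physical tables.
physical_tables = {"customers_v1", "orders_v1"}

prod = {"customers": "customers_v1", "orders": "orders_v1"}

# Creating `dev` copies pointers, not data:
dev = dict(prod)

# Changing the customers model in dev builds exactly one new table...
physical_tables.add("customers_v2")
dev["customers"] = "customers_v2"

# ...while dev's orders view still reads prod's physical table.
assert dev["orders"] == prod["orders"] == "orders_v1"
print(dev)
```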

Use Case 1: Multi-Developer Collaboration (Dev)

In a fast-moving data team, multiple developers are working on different features simultaneously. Alice is updating the customers model, and Bob is updating the orders model.

In traditional setups, Alice and Bob either step on each other's toes in a shared staging schema, or they both have to spend 20 minutes copying gigabytes of data into alice_dev and bob_dev schemas before they can start working.

With SQLMesh, Alice simply creates her own virtual environment (sqlmesh plan alice_feature). SQLMesh builds only her modified customers table physically, while her orders view points to the production physical table. Bob does the same for his feature. They get perfect isolation instantly, with zero duplicated data and zero wasted compute cost.

Use Case 2: Instant Rollbacks and Blue/Green Deployments (Prod)

Deploying pipeline changes to production is risky. A flawed query can corrupt downstream tables and break executive BI dashboards.

SQLMesh handles production releases natively as Blue/Green deployments, built on the same virtual environments.

When you merge your code to main and deploy, SQLMesh builds the new physical tables in the background (the "Green" state). Your prod environment views still point to the old tables (the "Blue" state).

Once the new tables are fully built, populated, and automatically audited for quality, SQLMesh simply updates the prod views to point to the new physical tables. This pointer swap takes milliseconds. If a bug is discovered after deployment, you can instantly roll back by swapping the view pointers back to the previous physical tables. No data needs to be rebuilt, which gives data teams a massive safety net.
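The promote-and-rollback flow reduces to the same pointer bookkeeping (again a conceptual sketch with hypothetical table names; real SQLMesh tracks this state for you):

```python
# Blue: currently live view pointers.
prod_views = {"customers": "customers_v1"}
history = []

# Green: customers_v2 has been built and audited in the background.
# Promote by recording the old pointers and swapping (milliseconds):
history.append(dict(prod_views))
prod_views["customers"] = "customers_v2"

# A bug surfaces: roll back by restoring the previous pointers.
# No data is rebuilt.
prod_views = history.pop()
print(prod_views)  # back on customers_v1
```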


Upgrading a Model: dbt vs SQLMesh

Here is what this looks like in practice. We need to build a customers model that cleans up user data from jaffle_shop.stg_customers and joins it with jaffle_shop.orders to calculate lifetime value.

In dbt, you first write the SQL template (models/customers.sql):

{{ config(
    materialized='table'
) }}

WITH customer_orders AS (
  SELECT
    customer_id,
    MIN(order_date) AS first_order_date,
    MAX(order_date) AS most_recent_order_date,
    COUNT(*) AS number_of_orders,
    SUM(amount) AS customer_lifetime_value
  FROM {{ ref('orders') }}
  WHERE status <> 'returned'
  GROUP BY customer_id
)

SELECT
  c.customer_id,
  c.customer_name,
  c.email,
  c.signup_date,
  o.first_order_date,
  o.most_recent_order_date,
  COALESCE(o.number_of_orders, 0) AS number_of_orders,
  COALESCE(o.customer_lifetime_value, 0) AS customer_lifetime_value
FROM {{ ref('stg_customers') }} AS c
LEFT JOIN customer_orders AS o
  ON c.customer_id = o.customer_id

But you're not done. In dbt, your SQL file only contains the logic. To validate that the output data is actually correct—for example, to test that your customer_id is unique and never null—you have to leave your SQL file, open a completely separate YAML configuration file (models/schema.yml), and write your validation tests there:

version: 2
models:
  - name: customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null

The dbt docs graph for this looks like this:

dbt docs lineage graph

When you migrate this same model to SQLMesh, the scattered configuration disappears. Everything—metadata, dependencies, materialization logic, and testing—consolidates into the SQL file itself via the MODEL block.

SQLMesh UI Editor

MODEL (
  name jaffle_shop.customers,
  kind FULL,
  grain customer_id
);

WITH customer_orders AS (
  SELECT
    customer_id,
    MIN(order_date) AS first_order_date,
    MAX(order_date) AS most_recent_order_date,
    COUNT(*) AS number_of_orders,
    SUM(amount) AS customer_lifetime_value
  FROM jaffle_shop.orders
  WHERE status <> 'returned'
  GROUP BY customer_id
)

SELECT
  c.customer_id::INT AS customer_id,
  c.customer_name::TEXT AS customer_name,
  c.email::TEXT AS email,
  c.signup_date::DATE AS signup_date,
  o.first_order_date::DATE AS first_order_date,
  o.most_recent_order_date::DATE AS most_recent_order_date,
  COALESCE(o.number_of_orders, 0)::INT AS number_of_orders,
  COALESCE(o.customer_lifetime_value, 0)::DOUBLE AS customer_lifetime_value
FROM jaffle_shop.stg_customers AS c
LEFT JOIN customer_orders AS o
  ON c.customer_id = o.customer_id

Notice what happened here:

  1. No Jinja Refs: You simply query jaffle_shop.stg_customers and jaffle_shop.orders using standard SQL. Because SQLMesh parses the AST, it automatically detects the dependencies for both tables.
  2. No YAML Tests: The grain customer_id property inside the MODEL block automatically declares the natural key. You do not need to write explicit tests. When you want to validate the data, you simply run sqlmesh audit in the terminal, and SQLMesh will automatically generate and execute the uniqueness and not-null validation checks for you based on that grain.
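To see what such grain-derived audits amount to, here is the equivalent uniqueness and not-null check run by hand against an in-memory SQLite table (illustrative SQL, not SQLMesh's generated query text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, customer_name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (2, "Bob (dup)"), (None, "Ghost")])

# not_null audit: rows where the grain column is NULL
null_rows = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE customer_id IS NULL").fetchone()[0]

# unique audit: grain values appearing more than once
dup_keys = conn.execute(
    "SELECT COUNT(*) FROM (SELECT customer_id FROM customers "
    "WHERE customer_id IS NOT NULL GROUP BY customer_id HAVING COUNT(*) > 1)"
).fetchone()[0]

print(null_rows, dup_keys)  # 1 NULL row, 1 duplicated key -> audits fail
```

A non-zero count in either query is what an audit failure reports: the grain promise (`customer_id` unique and never null) is violated.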

The generated DAG for this model in SQLMesh looks like this:

SQLMesh static DAG generated from the migrated project

This is the closest match to the familiar dbt docs graph: a model dependency DAG generated from parsed SQL. The browser UI goes one step further by letting you inspect the same dependency chain with environment context and column metadata.

If you want to try this yourself locally (using DuckDB, so no warehouse account needed), installation is simple:

pip install "sqlmesh[duckdb]"
mkdir sqlmesh-demo && cd sqlmesh-demo
sqlmesh init -t empty

Automating this with CI/CD (GitHub Actions)

The plan/apply cycle and virtual environments are impressive locally, but they become a superpower when integrated into your CI/CD pipeline.

SQLMesh provides a native GitHub Action (TobikoData/sqlmesh-action) that automates this entire workflow:

  1. The PR Plan: When a developer opens a Pull Request, the GitHub Action automatically creates a temporary Virtual Environment (e.g., pr_123) and runs a plan.
  2. Automated Review: The bot posts the exact execution plan as a comment directly on the PR. The reviewer instantly sees which models are modified and which downstream models will be impacted, without having to run anything locally.
  3. Zero-Copy Deployment: Upon merging to main, the deployment job doesn't need to rebuild the data. It simply swaps the pointers in the prod environment to reference the physical tables that were already built and tested during the PR.

This guarantees that main is always in a deployable state and developers never accidentally merge breaking schema changes into production.


Final thoughts

Moving from text-based templating to semantic AST parsing is the structural shift required to solve data environment bloat and lineage blindness.

To evaluate this transition yourself:

  • Audit your current warehouse spend strictly related to developer sandbox schemas.
  • Run sqlmesh init --template dbt on a local branch of your existing dbt project.
  • Migrate 5 connected models to the MODEL block syntax.
  • Execute sqlmesh plan dev to observe the virtual pointer creation in your warehouse.

When the tool understands the code, the infrastructure manages itself.

Q: Do I need to rewrite all my dbt SQL immediately?
No, SQLMesh can parse and run your existing Jinja-based dbt models directly using its dbt adapter.

Q: Does SQLMesh require DuckDB?
No, DuckDB is just used here for a fast local demo; SQLMesh natively supports Snowflake, BigQuery, Databricks, and Redshift.

Q: How does SQLMesh handle dbt macros?
It can execute your existing dbt macros during the transition, though rewriting them as native Python macros eventually unlocks deeper AST-level validation.

Q: Is this relevant if my team only has three data models?
The immediate value for small teams is the automated testing and plan visibility, but the cost savings of virtual environments truly compound at scale.
