Ship consistent, DRY models by generating column lists with Jinja, macros, and metadata.
Who this is for
Analytics engineers and data engineers who want to avoid copy‑pasting column lists across models and instead generate them from templates with dbt + Jinja.
What you’ll build
You’ll create a small dbt project that:
- Centralizes column definitions in macros (with optional aliases/prefixes)
- Generates SELECT lists from YAML metadata and adapter‑discovered schemas
- Applies policy‑style transforms (e.g., lowercase emails) consistently
- Supports environment‑specific or source‑specific column sets
- Automatically tests and documents the templated columns
You can copy/paste each step. By the end, you’ll have a reusable pattern you can drop into any dbt project.
Prerequisites
- dbt (Core or Cloud) installed
- A warehouse connection configured (
profiles.yml
) - Basic awareness of Jinja syntax
Project structure
.
├─ dbt_project.yml
├─ models/
│ ├─ marts/
│ │ └─ customers.sql
│ ├─ staging/
│ │ └─ stg_customers.sql
│ └─ schema.yml
└─ macros/
├─ columns/
│ ├─ select_common_columns.sql
│ ├─ policy_columns.sql
│ ├─ yaml_columns.sql
│ └─ discover_columns.sql
└─ utilities.sql
Create the macros/columns
folder; we’ll populate it as we go.
Step 1 — The simplest win: variables → columns
Put repeatable column names in dbt_project.yml
as vars.
dbt_project.yml
name: my_project
version: 1.0.0
config-version: 2
vars:
common_columns:
- id
- created_at
- updated_at
models/staging/stg_customers.sql
select
{{ ', '.join(var('common_columns')) }},
email,
first_name,
last_name
from {{ source('app','customers') }}
When to use: lightweight reuse without custom logic.
Step 2 — Macro‑driven SELECT lists (with prefixes/aliases)
Centralize the column list in a macro so you can pass a table alias and keep your SQL tidy.
macros/columns/select_common_columns.sql
{% macro select_common_columns(alias='') %}
{{ alias }}id,
{{ alias }}created_at,
{{ alias }}updated_at
{% endmacro %}
models/marts/customers.sql
with base as (
select * from {{ ref('stg_customers') }}
)
select
{{ select_common_columns('b.') }},
b.email,
b.first_name,
b.last_name
from base as b
Why it’s great: consistent, alias‑safe, easy to extend.
Step 3 — Policy columns: apply consistent transforms
Wrap transforms (PII handling, normalisation) in a macro and reuse them everywhere.
macros/columns/policy_columns.sql
{% macro customer_policy_columns(alias='') %}
{{ alias }}id,
lower({{ alias }}email) as email,
{{ alias }}first_name,
{{ alias }}last_name,
{{ alias }}created_at,
{{ alias }}updated_at
{% endmacro %}
models/marts/customers.sql
select
{{ customer_policy_columns('b.') }}
from {{ ref('stg_customers') }} as b
Pattern: treat a column template like a “policy” you can apply repeatedly.
Step 4 — YAML‑driven column generation
Keep column specs as metadata in schema.yml
and generate the SELECT list from it.
models/schema.yml
version: 2
models:
- name: stg_customers
columns:
- name: id
- name: email
tests: [not_null]
meta:
transform: "lower({col})"
- name: first_name
- name: last_name
- name: created_at
- name: updated_at
macros/columns/yaml_columns.sql
{#
Generate a SELECT list from model YAML metadata.
- Reads columns defined under the *upstream* model name provided.
- Supports a per-column meta.transform, where `{col}` is replaced by the column name.
#}
{% macro yaml_columns(model_name, alias='') %}
{% set model = graph.nodes[model_name] %}
{% if not model %}
{# Fail early if the name is wrong #}
{% do exceptions.raise_compiler_error('yaml_columns: model ' ~ model_name ~ ' not found in graph') %}
{% endif %}
{% set rendered = [] %}
{% for c in model.columns.values() %}
{% set transform = c.meta.get('transform') if c.meta else none %}
{% if transform %}
{% set expr = transform.replace('{col}', alias ~ c.name) ~ ' as ' ~ c.name %}
{% else %}
{% set expr = alias ~ c.name %}
{% endif %}
{% do rendered.append(expr) %}
{% endfor %}
{{ rendered | join(',\n ') }}
{% endmacro %}
Usage
select
{{ yaml_columns('model.stg_customers', 's.') }}
from {{ ref('stg_customers') }} as s
Notes
- The
graph
object is available at compile time and includes YAML column metadata. - Use the fully qualified name:
model.<node_name>
for models andsource.<source_name>.<table_name>
for sources.
Step 5 — Discover columns dynamically from the warehouse
Sometimes you need to reflect the actual schema at compile time.
macros/columns/discover_columns.sql
{#
Gets column names from a relation at compile time and optionally maps/filters them.
#}
{% macro discovered_columns_from(relation, include=[], exclude=[], alias='') %}
{% set cols = adapter.get_columns_in_relation(relation) %}
{% set names = cols | map(attribute='name') | list %}
{# Apply include/exclude if provided #}
{% if include and include | length > 0 %}
{% set names = names | select('in', include) | list %}
{% endif %}
{% if exclude and exclude | length > 0 %}
{% set names = names | reject('in', exclude) | list %}
{% endif %}
{{ names | map('lower') | map('string') | map('trim') | map('regex_replace', '^', alias) | join(',\n ') }}
{% endmacro %}
Usage
{% set rel = ref('stg_customers') %}
select
{{ discovered_columns_from(rel, exclude=['_load_dt'], alias='c.') }}
from {{ rel }} as c
When to use: quickly mirror upstream schema, or build pass‑through layers.
⚠️ Caveat:
adapter.get_columns_in_relation
runs at compile time; if the relation doesn’t exist yet (e.g., first run), create it once or fall back to a known list.
Step 6 — Column mapping & renaming at scale
Create a mapping to rename raw columns to canonical names, with optional transforms.
macros/utilities.sql
{% macro render_mapping(mapping, alias='') %}
{# mapping: list of dicts: {from: "EMAIL_ADDRESS", to: "email", transform: "lower({col})"} #}
{% set parts = [] %}
{% for m in mapping %}
{% set expr = (m.get('transform') or '{col}').replace('{col}', alias ~ m['from']) %}
{% do parts.append(expr ~ ' as ' ~ m['to']) %}
{% endfor %}
{{ parts | join(',\n ') }}
{% endmacro %}
models/staging/stg_customers.sql
{% set mapping = [
{"from": "CUSTOMER_ID", "to": "id"},
{"from": "EMAIL_ADDRESS", "to": "email", "transform": "lower({col})"},
{"from": "FIRST_NAME", "to": "first_name"},
{"from": "LAST_NAME", "to": "last_name"},
{"from": "CREATED_TS", "to": "created_at"},
{"from": "UPDATED_TS", "to": "updated_at"}
] %}
select
{{ render_mapping(mapping, 'r.') }}
from {{ source('app','raw_customers') }} as r
Benefit: auditably documents how raw columns map to curated names.
Step 7 — Environment / source specific templates
Use vars or target.name
to toggle column sets across environments.
macros/columns/select_common_columns.sql (extended)
{% macro select_common_columns(alias='') %}
{% if target.name in ['prod', 'staging'] %}
{{ alias }}id,
{{ alias }}created_at,
{{ alias }}updated_at,
{{ alias }}ingested_at
{% else %}
{{ alias }}id,
{{ alias }}created_at,
{{ alias }}updated_at
{% endif %}
{% endmacro %}
Tip: For source‑specific logic, branch on this.name
/ this.schema
or pass a flag into the macro.
Step 8 — Tests & docs that follow your templates
When you template columns, also template the tests and descriptions.
models/schema.yml
version: 2
models:
- name: customers
description: Curated customers with standardized email and timestamps.
columns:
- name: id
tests: [unique, not_null]
- name: email
description: Lower‑cased email address.
tests:
- not_null
- accepted_values:
values: ['@' ] # example; replace with proper regex test via package
- name: created_at
tests: [not_null]
Docs blocks
{% docs email_col %}
Email stored in lowercase for deduplication and matching.
{% enddocs %}
Reference the doc block in YAML:
- name: email
description: "{{ doc('email_col') }}"
Result: dbt docs generate
mirrors your templated columns with accurate docs.
Step 9 — Guardrails, debugging, and CI
-
Render‑only checks:
dbt compile
to verify generated SQL before running. -
Preview macro output: use
{% do log(your_macro(), info=True) %}
temporarily. - Unit‑like tests: For complex macros, add models that snapshot the macro output and assert against expected fixtures.
-
CI: run
dbt deps && dbt compile && dbt run --select state:modified+ && dbt test
on PRs.
Step 10 — Production proofing & performance
- Prefer static lists for high‑critical marts; use discovery only in staging.
- Keep macros pure (deterministic) where possible; avoid warehouse calls in hot paths.
- Use CTEs to keep the final SELECT flat and easy to debug.
- Centralize transforms in a few macros to minimize surface area for change.
Reusable snippets (copy/paste)
A. Policy column template
{% macro user_policy_columns(alias='') %}
{{ alias }}user_id,
lower({{ alias }}email) as email,
{{ alias }}created_at,
{{ alias }}updated_at
{% endmacro %}
B. YAML‑driven generator
{% macro yaml_columns(model_name, alias='') %}
{% set node = graph.nodes[model_name] %}
{% if not node %}
{% do exceptions.raise_compiler_error('unknown model: ' ~ model_name) %}
{% endif %}
{% set items = [] %}
{% for c in node.columns.values() %}
{% set t = c.meta.get('transform') if c.meta else none %}
{% set expr = (t or '{col}').replace('{col}', alias ~ c.name) %}
{% do items.append(expr ~ ' as ' ~ c.name) %}
{% endfor %}
{{ items | join(',\n ') }}
{% endmacro %}
C. Discovery‑based generator
{% macro discovered_columns_from(relation, include=[], exclude=[], alias='') %}
{% set cols = adapter.get_columns_in_relation(relation) %}
{% set names = cols | map(attribute='name') | list %}
{% if include %}{% set names = names | select('in', include) | list %}{% endif %}
{% if exclude %}{% set names = names | reject('in', exclude) | list %}{% endif %}
{{ names | map('lower') | map('string') | map('trim') | map('regex_replace', '^', alias) | join(',\n ') }}
{% endmacro %}
Putting it all together
models/marts/customers.sql
with base as (
select * from {{ ref('stg_customers') }}
),
final as (
select
{{ user_policy_columns('b.') }}
from base as b
)
select * from final
This compiles to a clean, consistent SELECT with all policy rules applied.
Troubleshooting
-
Macro not found: ensure the file is in
macros/
and the macro name matches. -
KeyError on
graph.nodes[...]
: use the fully qualified node name (model.<name>
orsource.<src>.<table>
), and confirm the YAML exists. - Relation not found (discovery): run once to create the relation, or guard with a fallback list:
{% set rel = ref('possibly_missing') %}
{% if rel is not none %}
{{ discovered_columns_from(rel) }}
{% else %}
id, created_at
{% endif %}
Checklist for your PR
- [ ] Columns are defined via macro(s) or YAML metadata
- [ ] Transforms are centralized (policy macros)
- [ ] Tests & docs reference the canonical column names
- [ ]
dbt compile
output looks correct and readable - [ ] CI runs
deps
,compile
,run
, andtest
Where to go next
- Extend the YAML driver to support type casting, default values, or PII tags via
meta
. - Build package‑style macros so multiple projects can share your policies.
- Use exposures to tie templated columns to downstream assets.
Happy templating! 🧩
Top comments (0)