Chetan Gupta

Posted on Oct 18

Templating Columns in dbt: Helpful path to forward

#dbt #datatransformation #datamodelling #sql

Ship consistent, DRY models by generating column lists with Jinja, macros, and metadata.

Who this is for

Analytics engineers and data engineers who want to avoid copy‑pasting column lists across models and instead generate them from templates with dbt + Jinja.

What you’ll build

You’ll create a small dbt project that:

Centralizes column definitions in macros (with optional aliases/prefixes)
Generates SELECT lists from YAML metadata and adapter‑discovered schemas
Applies policy‑style transforms (e.g., lowercase emails) consistently
Supports environment‑specific or source‑specific column sets
Automatically tests and documents the templated columns

You can copy/paste each step. By the end, you’ll have a reusable pattern you can drop into any dbt project.

Prerequisites

dbt (Core or Cloud) installed
A warehouse connection configured (profiles.yml)
Basic awareness of Jinja syntax

Project structure

.
├─ dbt_project.yml
├─ models/
│  ├─ marts/
│  │  └─ customers.sql
│  ├─ staging/
│  │  └─ stg_customers.sql
│  └─ schema.yml
└─ macros/
   ├─ columns/
   │  ├─ select_common_columns.sql
   │  ├─ policy_columns.sql
   │  ├─ yaml_columns.sql
   │  └─ discover_columns.sql
   └─ utilities.sql

Create the macros/columns folder; we’ll populate it as we go.

Step 1 — The simplest win: variables → columns

Put repeatable column names in dbt_project.yml as vars.

dbt_project.yml

name: my_project
version: 1.0.0
config-version: 2

vars:
  common_columns:
    - id
    - created_at
    - updated_at

models/staging/stg_customers.sql

select
  {{ ', '.join(var('common_columns')) }},
  email,
  first_name,
  last_name
from {{ source('app','customers') }}

When to use: lightweight reuse without custom logic.

Step 2 — Macro‑driven SELECT lists (with prefixes/aliases)

Centralize the column list in a macro so you can pass a table alias and keep your SQL tidy.

macros/columns/select_common_columns.sql

{% macro select_common_columns(alias='') %}
    {{ alias }}id,
    {{ alias }}created_at,
    {{ alias }}updated_at
{% endmacro %}

models/marts/customers.sql

with base as (
  select * from {{ ref('stg_customers') }}
)
select
  {{ select_common_columns('b.') }},
  b.email,
  b.first_name,
  b.last_name
from base as b

Why it’s great: consistent, alias‑safe, easy to extend.

Step 3 — Policy columns: apply consistent transforms

Wrap transforms (PII handling, normalisation) in a macro and reuse them everywhere.

macros/columns/policy_columns.sql

{% macro customer_policy_columns(alias='') %}
    {{ alias }}id,
    lower({{ alias }}email) as email,
    {{ alias }}first_name,
    {{ alias }}last_name,
    {{ alias }}created_at,
    {{ alias }}updated_at
{% endmacro %}

models/marts/customers.sql

select
  {{ customer_policy_columns('b.') }}
from {{ ref('stg_customers') }} as b

Pattern: treat a column template like a “policy” you can apply repeatedly.

Step 4 — YAML‑driven column generation

Keep column specs as metadata in schema.yml and generate the SELECT list from it.

models/schema.yml

version: 2
models:
  - name: stg_customers
    columns:
      - name: id
      - name: email
        tests: [not_null]
        meta:
          transform: "lower({col})"
      - name: first_name
      - name: last_name
      - name: created_at
      - name: updated_at

macros/columns/yaml_columns.sql

{#
  Generate a SELECT list from model YAML metadata.
  - Reads columns defined under the *upstream* model name provided.
  - Supports a per-column meta.transform, where `{col}` is replaced by the column name.
#}
{% macro yaml_columns(model_name, alias='') %}
  {% set model = graph.nodes[model_name] %}
  {% if not model %}
    {# Fail early if the name is wrong #}
    {% do exceptions.raise_compiler_error('yaml_columns: model ' ~ model_name ~ ' not found in graph') %}
  {% endif %}

  {% set rendered = [] %}
  {% for c in model.columns.values() %}
    {% set transform = c.meta.get('transform') if c.meta else none %}
    {% if transform %}
      {% set expr = transform.replace('{col}', alias ~ c.name) ~ ' as ' ~ c.name %}
    {% else %}
      {% set expr = alias ~ c.name %}
    {% endif %}
    {% do rendered.append(expr) %}
  {% endfor %}
  {{ rendered | join(',\n  ') }}
{% endmacro %}

Usage

select
  {{ yaml_columns('model.stg_customers', 's.') }}
from {{ ref('stg_customers') }} as s

Notes

The graph object is available at compile time and includes YAML column metadata.
Use the fully qualified name: model.<node_name> for models and source.<source_name>.<table_name> for sources.

Step 5 — Discover columns dynamically from the warehouse

Sometimes you need to reflect the actual schema at compile time.

macros/columns/discover_columns.sql

{#
  Gets column names from a relation at compile time and optionally maps/filters them.
#}
{% macro discovered_columns_from(relation, include=[], exclude=[], alias='') %}
  {% set cols = adapter.get_columns_in_relation(relation) %}
  {% set names = cols | map(attribute='name') | list %}

  {# Apply include/exclude if provided #}
  {% if include and include | length > 0 %}
    {% set names = names | select('in', include) | list %}
  {% endif %}
  {% if exclude and exclude | length > 0 %}
    {% set names = names | reject('in', exclude) | list %}
  {% endif %}

  {{ names | map('lower') | map('string') | map('trim') | map('regex_replace', '^', alias) | join(',\n  ') }}
{% endmacro %}

Usage

{% set rel = ref('stg_customers') %}
select
  {{ discovered_columns_from(rel, exclude=['_load_dt'], alias='c.') }}
from {{ rel }} as c

When to use: quickly mirror upstream schema, or build pass‑through layers.

⚠️ Caveat: adapter.get_columns_in_relation runs at compile time; if the relation doesn’t exist yet (e.g., first run), create it once or fall back to a known list.

Step 6 — Column mapping & renaming at scale

Create a mapping to rename raw columns to canonical names, with optional transforms.

macros/utilities.sql

{% macro render_mapping(mapping, alias='') %}
  {# mapping: list of dicts: {from: "EMAIL_ADDRESS", to: "email", transform: "lower({col})"} #}
  {% set parts = [] %}
  {% for m in mapping %}
    {% set expr = (m.get('transform') or '{col}').replace('{col}', alias ~ m['from']) %}
    {% do parts.append(expr ~ ' as ' ~ m['to']) %}
  {% endfor %}
  {{ parts | join(',\n  ') }}
{% endmacro %}

models/staging/stg_customers.sql

{% set mapping = [
  {"from": "CUSTOMER_ID", "to": "id"},
  {"from": "EMAIL_ADDRESS", "to": "email", "transform": "lower({col})"},
  {"from": "FIRST_NAME", "to": "first_name"},
  {"from": "LAST_NAME", "to": "last_name"},
  {"from": "CREATED_TS", "to": "created_at"},
  {"from": "UPDATED_TS", "to": "updated_at"}
] %}

select
  {{ render_mapping(mapping, 'r.') }}
from {{ source('app','raw_customers') }} as r

Benefit: auditably documents how raw columns map to curated names.

Step 7 — Environment / source specific templates

Use vars or target.name to toggle column sets across environments.

macros/columns/select_common_columns.sql (extended)

{% macro select_common_columns(alias='') %}
  {% if target.name in ['prod', 'staging'] %}
    {{ alias }}id,
    {{ alias }}created_at,
    {{ alias }}updated_at,
    {{ alias }}ingested_at
  {% else %}
    {{ alias }}id,
    {{ alias }}created_at,
    {{ alias }}updated_at
  {% endif %}
{% endmacro %}

Tip: For source‑specific logic, branch on this.name / this.schema or pass a flag into the macro.

Step 8 — Tests & docs that follow your templates

When you template columns, also template the tests and descriptions.

models/schema.yml

version: 2
models:
  - name: customers
    description: Curated customers with standardized email and timestamps.
    columns:
      - name: id
        tests: [unique, not_null]
      - name: email
        description: Lower‑cased email address.
        tests:
          - not_null
          - accepted_values:
              values: ['@' ] # example; replace with proper regex test via package
      - name: created_at
        tests: [not_null]

Docs blocks

{% docs email_col %}
Email stored in lowercase for deduplication and matching.
{% enddocs %}

Reference the doc block in YAML:

- name: email
  description: "{{ doc('email_col') }}"

Result: dbt docs generate mirrors your templated columns with accurate docs.

Step 9 — Guardrails, debugging, and CI

Render‑only checks: dbt compile to verify generated SQL before running.
Preview macro output: use {% do log(your_macro(), info=True) %} temporarily.
Unit‑like tests: For complex macros, add models that snapshot the macro output and assert against expected fixtures.
CI: run dbt deps && dbt compile && dbt run --select state:modified+ && dbt test on PRs.

Step 10 — Production proofing & performance

Prefer static lists for high‑critical marts; use discovery only in staging.
Keep macros pure (deterministic) where possible; avoid warehouse calls in hot paths.
Use CTEs to keep the final SELECT flat and easy to debug.
Centralize transforms in a few macros to minimize surface area for change.

Reusable snippets (copy/paste)

A. Policy column template

{% macro user_policy_columns(alias='') %}
    {{ alias }}user_id,
    lower({{ alias }}email) as email,
    {{ alias }}created_at,
    {{ alias }}updated_at
{% endmacro %}

B. YAML‑driven generator

{% macro yaml_columns(model_name, alias='') %}
  {% set node = graph.nodes[model_name] %}
  {% if not node %}
    {% do exceptions.raise_compiler_error('unknown model: ' ~ model_name) %}
  {% endif %}
  {% set items = [] %}
  {% for c in node.columns.values() %}
    {% set t = c.meta.get('transform') if c.meta else none %}
    {% set expr = (t or '{col}').replace('{col}', alias ~ c.name) %}
    {% do items.append(expr ~ ' as ' ~ c.name) %}
  {% endfor %}
  {{ items | join(',\n  ') }}
{% endmacro %}

C. Discovery‑based generator

{% macro discovered_columns_from(relation, include=[], exclude=[], alias='') %}
  {% set cols = adapter.get_columns_in_relation(relation) %}
  {% set names = cols | map(attribute='name') | list %}
  {% if include %}{% set names = names | select('in', include) | list %}{% endif %}
  {% if exclude %}{% set names = names | reject('in', exclude) | list %}{% endif %}
  {{ names | map('lower') | map('string') | map('trim') | map('regex_replace', '^', alias) | join(',\n  ') }}
{% endmacro %}

Putting it all together

models/marts/customers.sql

with base as (
  select * from {{ ref('stg_customers') }}
),
final as (
  select
    {{ user_policy_columns('b.') }}
  from base as b
)
select * from final

This compiles to a clean, consistent SELECT with all policy rules applied.

Troubleshooting

Macro not found: ensure the file is in macros/ and the macro name matches.
KeyError on graph.nodes[...]: use the fully qualified node name (model.<name> or source.<src>.<table>), and confirm the YAML exists.
Relation not found (discovery): run once to create the relation, or guard with a fallback list:

  {% set rel = ref('possibly_missing') %}
  {% if rel is not none %}
    {{ discovered_columns_from(rel) }}
  {% else %}
    id, created_at
  {% endif %}

Checklist for your PR

[ ] Columns are defined via macro(s) or YAML metadata
[ ] Transforms are centralized (policy macros)
[ ] Tests & docs reference the canonical column names
[ ] dbt compile output looks correct and readable
[ ] CI runs deps, compile, run, and test

Where to go next

Extend the YAML driver to support type casting, default values, or PII tags via meta.
Build package‑style macros so multiple projects can share your policies.
Use exposures to tie templated columns to downstream assets.

Happy templating! 🧩

DEV Community

Templating Columns in dbt: Helpful path to forward

Who this is for

What you’ll build

Prerequisites

Project structure

Step 1 — The simplest win: variables → columns

Step 2 — Macro‑driven SELECT lists (with prefixes/aliases)

Step 3 — Policy columns: apply consistent transforms

Step 4 — YAML‑driven column generation

Step 5 — Discover columns dynamically from the warehouse

Step 6 — Column mapping & renaming at scale

Step 7 — Environment / source specific templates

Step 8 — Tests & docs that follow your templates

Step 9 — Guardrails, debugging, and CI

Step 10 — Production proofing & performance

Reusable snippets (copy/paste)

A. Policy column template

B. YAML‑driven generator

C. Discovery‑based generator

Putting it all together

Troubleshooting

Checklist for your PR

Where to go next

Top comments (0)