DEV Community

Chetan Gupta
Chetan Gupta

Posted on

Templating Columns in dbt: Helpful path to forward

Ship consistent, DRY models by generating column lists with Jinja, macros, and metadata.


Who this is for

Analytics engineers and data engineers who want to avoid copy‑pasting column lists across models and instead generate them from templates with dbt + Jinja.

What you’ll build

You’ll create a small dbt project that:

  • Centralizes column definitions in macros (with optional aliases/prefixes)
  • Generates SELECT lists from YAML metadata and adapter‑discovered schemas
  • Applies policy‑style transforms (e.g., lowercase emails) consistently
  • Supports environment‑specific or source‑specific column sets
  • Automatically tests and documents the templated columns

You can copy/paste each step. By the end, you’ll have a reusable pattern you can drop into any dbt project.


Prerequisites

  • dbt (Core or Cloud) installed
  • A warehouse connection configured (profiles.yml)
  • Basic awareness of Jinja syntax

Project structure

.
├─ dbt_project.yml
├─ models/
│  ├─ marts/
│  │  └─ customers.sql
│  ├─ staging/
│  │  └─ stg_customers.sql
│  └─ schema.yml
└─ macros/
   ├─ columns/
   │  ├─ select_common_columns.sql
   │  ├─ policy_columns.sql
   │  ├─ yaml_columns.sql
   │  └─ discover_columns.sql
   └─ utilities.sql
Enter fullscreen mode Exit fullscreen mode

Create the macros/columns folder; we’ll populate it as we go.


Step 1 — The simplest win: variables → columns

Put repeatable column names in dbt_project.yml as vars.

dbt_project.yml

name: my_project
version: 1.0.0
config-version: 2

vars:
  common_columns:
    - id
    - created_at
    - updated_at
Enter fullscreen mode Exit fullscreen mode

models/staging/stg_customers.sql

select
  {{ ', '.join(var('common_columns')) }},
  email,
  first_name,
  last_name
from {{ source('app','customers') }}
Enter fullscreen mode Exit fullscreen mode

When to use: lightweight reuse without custom logic.


Step 2 — Macro‑driven SELECT lists (with prefixes/aliases)

Centralize the column list in a macro so you can pass a table alias and keep your SQL tidy.

macros/columns/select_common_columns.sql

{% macro select_common_columns(alias='') %}
    {{ alias }}id,
    {{ alias }}created_at,
    {{ alias }}updated_at
{% endmacro %}
Enter fullscreen mode Exit fullscreen mode

models/marts/customers.sql

with base as (
  select * from {{ ref('stg_customers') }}
)
select
  {{ select_common_columns('b.') }},
  b.email,
  b.first_name,
  b.last_name
from base as b
Enter fullscreen mode Exit fullscreen mode

Why it’s great: consistent, alias‑safe, easy to extend.


Step 3 — Policy columns: apply consistent transforms

Wrap transforms (PII handling, normalisation) in a macro and reuse them everywhere.

macros/columns/policy_columns.sql

{% macro customer_policy_columns(alias='') %}
    {{ alias }}id,
    lower({{ alias }}email) as email,
    {{ alias }}first_name,
    {{ alias }}last_name,
    {{ alias }}created_at,
    {{ alias }}updated_at
{% endmacro %}
Enter fullscreen mode Exit fullscreen mode

models/marts/customers.sql

select
  {{ customer_policy_columns('b.') }}
from {{ ref('stg_customers') }} as b
Enter fullscreen mode Exit fullscreen mode

Pattern: treat a column template like a “policy” you can apply repeatedly.


Step 4 — YAML‑driven column generation

Keep column specs as metadata in schema.yml and generate the SELECT list from it.

models/schema.yml

version: 2
models:
  - name: stg_customers
    columns:
      - name: id
      - name: email
        tests: [not_null]
        meta:
          transform: "lower({col})"
      - name: first_name
      - name: last_name
      - name: created_at
      - name: updated_at
Enter fullscreen mode Exit fullscreen mode

macros/columns/yaml_columns.sql

{#
  Generate a SELECT list from model YAML metadata.
  - Reads columns defined under the *upstream* model name provided.
  - Supports a per-column meta.transform, where `{col}` is replaced by the column name.
#}
{% macro yaml_columns(model_name, alias='') %}
  {% set model = graph.nodes[model_name] %}
  {% if not model %}
    {# Fail early if the name is wrong #}
    {% do exceptions.raise_compiler_error('yaml_columns: model ' ~ model_name ~ ' not found in graph') %}
  {% endif %}

  {% set rendered = [] %}
  {% for c in model.columns.values() %}
    {% set transform = c.meta.get('transform') if c.meta else none %}
    {% if transform %}
      {% set expr = transform.replace('{col}', alias ~ c.name) ~ ' as ' ~ c.name %}
    {% else %}
      {% set expr = alias ~ c.name %}
    {% endif %}
    {% do rendered.append(expr) %}
  {% endfor %}
  {{ rendered | join(',\n  ') }}
{% endmacro %}
Enter fullscreen mode Exit fullscreen mode

Usage

select
  {{ yaml_columns('model.stg_customers', 's.') }}
from {{ ref('stg_customers') }} as s
Enter fullscreen mode Exit fullscreen mode

Notes

  • The graph object is available at compile time and includes YAML column metadata.
  • Use the fully qualified name: model.<node_name> for models and source.<source_name>.<table_name> for sources.

Step 5 — Discover columns dynamically from the warehouse

Sometimes you need to reflect the actual schema at compile time.

macros/columns/discover_columns.sql

{#
  Gets column names from a relation at compile time and optionally maps/filters them.
#}
{% macro discovered_columns_from(relation, include=[], exclude=[], alias='') %}
  {% set cols = adapter.get_columns_in_relation(relation) %}
  {% set names = cols | map(attribute='name') | list %}

  {# Apply include/exclude if provided #}
  {% if include and include | length > 0 %}
    {% set names = names | select('in', include) | list %}
  {% endif %}
  {% if exclude and exclude | length > 0 %}
    {% set names = names | reject('in', exclude) | list %}
  {% endif %}

  {{ names | map('lower') | map('string') | map('trim') | map('regex_replace', '^', alias) | join(',\n  ') }}
{% endmacro %}
Enter fullscreen mode Exit fullscreen mode

Usage

{% set rel = ref('stg_customers') %}
select
  {{ discovered_columns_from(rel, exclude=['_load_dt'], alias='c.') }}
from {{ rel }} as c
Enter fullscreen mode Exit fullscreen mode

When to use: quickly mirror upstream schema, or build pass‑through layers.

⚠️ Caveat: adapter.get_columns_in_relation runs at compile time; if the relation doesn’t exist yet (e.g., first run), create it once or fall back to a known list.


Step 6 — Column mapping & renaming at scale

Create a mapping to rename raw columns to canonical names, with optional transforms.

macros/utilities.sql

{% macro render_mapping(mapping, alias='') %}
  {# mapping: list of dicts: {from: "EMAIL_ADDRESS", to: "email", transform: "lower({col})"} #}
  {% set parts = [] %}
  {% for m in mapping %}
    {% set expr = (m.get('transform') or '{col}').replace('{col}', alias ~ m['from']) %}
    {% do parts.append(expr ~ ' as ' ~ m['to']) %}
  {% endfor %}
  {{ parts | join(',\n  ') }}
{% endmacro %}
Enter fullscreen mode Exit fullscreen mode

models/staging/stg_customers.sql

{% set mapping = [
  {"from": "CUSTOMER_ID", "to": "id"},
  {"from": "EMAIL_ADDRESS", "to": "email", "transform": "lower({col})"},
  {"from": "FIRST_NAME", "to": "first_name"},
  {"from": "LAST_NAME", "to": "last_name"},
  {"from": "CREATED_TS", "to": "created_at"},
  {"from": "UPDATED_TS", "to": "updated_at"}
] %}

select
  {{ render_mapping(mapping, 'r.') }}
from {{ source('app','raw_customers') }} as r
Enter fullscreen mode Exit fullscreen mode

Benefit: auditably documents how raw columns map to curated names.


Step 7 — Environment / source specific templates

Use vars or target.name to toggle column sets across environments.

macros/columns/select_common_columns.sql (extended)

{% macro select_common_columns(alias='') %}
  {% if target.name in ['prod', 'staging'] %}
    {{ alias }}id,
    {{ alias }}created_at,
    {{ alias }}updated_at,
    {{ alias }}ingested_at
  {% else %}
    {{ alias }}id,
    {{ alias }}created_at,
    {{ alias }}updated_at
  {% endif %}
{% endmacro %}
Enter fullscreen mode Exit fullscreen mode

Tip: For source‑specific logic, branch on this.name / this.schema or pass a flag into the macro.


Step 8 — Tests & docs that follow your templates

When you template columns, also template the tests and descriptions.

models/schema.yml

version: 2
models:
  - name: customers
    description: Curated customers with standardized email and timestamps.
    columns:
      - name: id
        tests: [unique, not_null]
      - name: email
        description: Lower‑cased email address.
        tests:
          - not_null
          - accepted_values:
              values: ['@' ] # example; replace with proper regex test via package
      - name: created_at
        tests: [not_null]
Enter fullscreen mode Exit fullscreen mode

Docs blocks

{% docs email_col %}
Email stored in lowercase for deduplication and matching.
{% enddocs %}
Enter fullscreen mode Exit fullscreen mode

Reference the doc block in YAML:

- name: email
  description: "{{ doc('email_col') }}"
Enter fullscreen mode Exit fullscreen mode

Result: dbt docs generate mirrors your templated columns with accurate docs.


Step 9 — Guardrails, debugging, and CI

  • Render‑only checks: dbt compile to verify generated SQL before running.
  • Preview macro output: use {% do log(your_macro(), info=True) %} temporarily.
  • Unit‑like tests: For complex macros, add models that snapshot the macro output and assert against expected fixtures.
  • CI: run dbt deps && dbt compile && dbt run --select state:modified+ && dbt test on PRs.

Step 10 — Production proofing & performance

  • Prefer static lists for high‑critical marts; use discovery only in staging.
  • Keep macros pure (deterministic) where possible; avoid warehouse calls in hot paths.
  • Use CTEs to keep the final SELECT flat and easy to debug.
  • Centralize transforms in a few macros to minimize surface area for change.

Reusable snippets (copy/paste)

A. Policy column template

{% macro user_policy_columns(alias='') %}
    {{ alias }}user_id,
    lower({{ alias }}email) as email,
    {{ alias }}created_at,
    {{ alias }}updated_at
{% endmacro %}
Enter fullscreen mode Exit fullscreen mode

B. YAML‑driven generator

{% macro yaml_columns(model_name, alias='') %}
  {% set node = graph.nodes[model_name] %}
  {% if not node %}
    {% do exceptions.raise_compiler_error('unknown model: ' ~ model_name) %}
  {% endif %}
  {% set items = [] %}
  {% for c in node.columns.values() %}
    {% set t = c.meta.get('transform') if c.meta else none %}
    {% set expr = (t or '{col}').replace('{col}', alias ~ c.name) %}
    {% do items.append(expr ~ ' as ' ~ c.name) %}
  {% endfor %}
  {{ items | join(',\n  ') }}
{% endmacro %}
Enter fullscreen mode Exit fullscreen mode

C. Discovery‑based generator

{% macro discovered_columns_from(relation, include=[], exclude=[], alias='') %}
  {% set cols = adapter.get_columns_in_relation(relation) %}
  {% set names = cols | map(attribute='name') | list %}
  {% if include %}{% set names = names | select('in', include) | list %}{% endif %}
  {% if exclude %}{% set names = names | reject('in', exclude) | list %}{% endif %}
  {{ names | map('lower') | map('string') | map('trim') | map('regex_replace', '^', alias) | join(',\n  ') }}
{% endmacro %}
Enter fullscreen mode Exit fullscreen mode

Putting it all together

models/marts/customers.sql

with base as (
  select * from {{ ref('stg_customers') }}
),
final as (
  select
    {{ user_policy_columns('b.') }}
  from base as b
)
select * from final
Enter fullscreen mode Exit fullscreen mode

This compiles to a clean, consistent SELECT with all policy rules applied.


Troubleshooting

  • Macro not found: ensure the file is in macros/ and the macro name matches.
  • KeyError on graph.nodes[...]: use the fully qualified node name (model.<name> or source.<src>.<table>), and confirm the YAML exists.
  • Relation not found (discovery): run once to create the relation, or guard with a fallback list:
  {% set rel = ref('possibly_missing') %}
  {% if rel is not none %}
    {{ discovered_columns_from(rel) }}
  {% else %}
    id, created_at
  {% endif %}
Enter fullscreen mode Exit fullscreen mode

Checklist for your PR

  • [ ] Columns are defined via macro(s) or YAML metadata
  • [ ] Transforms are centralized (policy macros)
  • [ ] Tests & docs reference the canonical column names
  • [ ] dbt compile output looks correct and readable
  • [ ] CI runs deps, compile, run, and test

Where to go next

  • Extend the YAML driver to support type casting, default values, or PII tags via meta.
  • Build package‑style macros so multiple projects can share your policies.
  • Use exposures to tie templated columns to downstream assets.

Happy templating! 🧩

Top comments (0)