Philip Hern

Posted on May 29 • Edited on Jul 27 • Originally published at philliant.com

dynamic tables, where have you been all my life?

#snowflake #dynamictables #dataengineering #database

quick answer

snowflake dynamic tables are a declarative, automated data engineering solution that materialize query results without manual pipeline orchestration. unlike standard materialized views, which are highly restricted and do not support joins, dynamic tables manage the dependency graph, calculate target lag, and automatically refresh multi-table query joins behind the scenes.

who this is for

audience: data engineers and analytics engineers managing data platforms on snowflake
prerequisites: familiarity with standard database tables, views, and data warehousing concepts
when to use this guide: when you are designing transactional-to-analytical data flows and want to avoid complex, manual scheduler configurations

why this matters

managing historical records and updating analytical reporting tables typically requires a significant amount of task orchestration. in the past, i spent hours configuring and debugging sequenced tasks, managing change data capture streams, and writing custom merge statements. this manual approach is fragile, expensive to maintain, and prone to scheduling failures.

when you are looking for an automated caching mechanism, the natural instinct is to search for materialized views. however, snowflake materialized views are not supported. this is where dynamic tables step in to solve the exact problem i had been trying to solve for years.

step-by-step

1) define the starting point

before i implemented dynamic tables, our data platform relied on a series of scheduled tasks. we ingested raw transactional records from our upstream application databases into the raw data layer. to populate our reporting tables, we had to coordinate several sequential operations:

standard views queried the raw tables
scheduled tasks ran custom merge statements to update downstream tables on a fixed timeline
we had to build and maintain manual parent-child relationships between tasks to ensure proper execution order

this system was rigid. if an upstream task failed, downstream tasks either ran on stale data or did not execute at all. calculating dependencies and scheduling intervals manually was a continuous headache.

2) apply the change

i recently replaced this complicated orchestration with snowflake dynamic tables. instead of writing imperative tasks and merge scripts, i defined the destination schema declaratively.

a dynamic table is defined by a standard sql query. snowflake automatically manages the execution, determining what has changed and applying those changes to the target table on our behalf. for example, the definition of a basic dynamic table looks like this:

create or replace dynamic table my_reporting_dynamic_table
  target_lag = '1 hour'
  warehouse = my_standard_warehouse
  as
    select
      p.id,
      p.name,
      c.classification,
      p.created_at
    from {{ ref('raw_person_v') }} p
    left join {{ ref('raw_classification_v') }} c
      on p.classification_id = c.id;

the key components of this declarative approach are target lag and dynamic boundaries.

the role of target lag

target lag defines how fresh you need the data to be. instead of telling snowflake when to run (such as a specific cron schedule), you tell snowflake what maximum data latency is acceptable. if you set the target lag to 1 hour, snowflake handles the scheduling automatically to ensure the data in the dynamic table is no more than one hour behind the source tables. if the source data does not change, snowflake does not waste compute resources refreshing the table.

automatic dependency management

snowflake dynamic tables automatically handle dependencies. when you define multiple dynamic tables that reference each other, snowflake builds a directed acyclic graph (dag) of the relationships. snowflake is aware of these dynamic boundaries, so it orchestrates the refresh order across all connected tables to ensure data consistency. you do not need to configure task chains or parent-child dependencies manually.

3) validate the result

once i deployed the dynamic tables, we validated the results by monitoring the snowflake dynamic table account history and query execution logs. i observed several immediate improvements:

declarative simplicity: our pipeline code shrunk significantly because we deleted numerous task definitions, task schedules, and custom merge scripts
automatic healing: when upstream data ingestion paused and resumed, the dynamic tables automatically recalculated and caught up to the target freshness without human intervention
optimized resource utilization: snowflake only consumed compute resources during the active refresh cycles, reducing unnecessary warehouse uptime compared to our old, rigid scheduling intervals

better late than never

discovering this solution felt like finding the missing piece of a puzzle. it is an incredible upgrade to our development workflow, regardless of when it happened.

still, i cannot help but think about how much time and effort would have been saved if i had discovered this earlier. for years, i searched for "materialized views in snowflake" because that was the concept i knew from other relational databases. because snowflake does not have the concept of materialized views, i assumed snowflake did not have a declarative caching mechanism that supported joins, which led me down the path of manual task management.

if you are currently in that position, searching for a way to materialize complex joins without writing brittle scheduled tasks, this is the sign you need:

looking for materialized views in snowflake? try dynamic tables.

this realization would have set me on the right path years ago. hopefully, sharing this experience will help another engineer bypass the struggle of manual task orchestration and jump straight to declarative, automated pipelines.

DEV Community