DEV Community

Pradip Sodha


I Just Wanted dbt State Without Another Account

Background

dbt opened up a whole new world for how data transformation happens by bringing software engineering practices to it. Databricks, Snowflake, and Fabric now support dbt natively. I will write another blog post on how dbt won the data-transformation race even though the major clouds were running in it too. But one more player emerged, SQLMesh, and I fell in love with it.

One of SQLMesh's features was state management, while dbt was stateless. Then Fivetran came into the picture, acquiring SQLMesh and then dbt, and introducing dbt-state.

Introduction

dbt-state is a way to make dbt runs more mature. I could go on and on about its features, and I'm sure dbt-state will slowly become a standard.

To quickly list the features:

  • auto defer
  • NO-OP
  • auto incremental shallow clone

There is one problem I faced with dbt-state: it needs a login to either dbt Cloud or app.dbt.state, and I really didn't want to be locked in. My reasons were architecture, third-party lock-in, and lastly the price. I already have Azure Blob or S3 for storage.

So I created dbt-state-oss. Instead of the hosted dbt-state service, it uses existing cloud storage such as Azure Blob or S3, and the server runs locally (you can host it, but it works without that).

Mmmmm

One thing I noticed was that dbt-state wasn't really storing a manifest file or any other major files. It was simply storing a single JSON file, which looks like this:

{"target_table":"test.smoke.table","fingerprint":"abc123","execution_type":"FULL","last_modified_epoch":1780508102671,"table_type":null,"created_at":1780508102.6715958}

Most of the features are actually computed on the fly, such as the catalog of prod and the state-modified detection from git. That was a shocker for me and, at the same time, good news.
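To make the shape of that record concrete, here is a small Python sketch that parses the JSON shown above. The field names come straight from the record. The units are my reading of the values and not something documented: `last_modified_epoch` looks like milliseconds and `created_at` looks like seconds.

```python
import json
from datetime import datetime, timezone

# The exact record shown above, as a JSON string.
raw = (
    '{"target_table":"test.smoke.table","fingerprint":"abc123",'
    '"execution_type":"FULL","last_modified_epoch":1780508102671,'
    '"table_type":null,"created_at":1780508102.6715958}'
)

record = json.loads(raw)

# Assumption: last_modified_epoch is in milliseconds, created_at in seconds.
last_modified = datetime.fromtimestamp(
    record["last_modified_epoch"] / 1000, tz=timezone.utc
)
created_at = datetime.fromtimestamp(record["created_at"], tz=timezone.utc)

print(record["target_table"], record["fingerprint"], record["execution_type"])
print(last_modified.isoformat())
print(created_at.isoformat())
```

Under that assumption both timestamps land on the same instant, which suggests the record is written right when the table is built.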

So the question is: why is a server needed at all? It is there to answer this:

Has this exact model already been built, and is it still fresh relative to its inputs?
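That question can be sketched as a tiny decision function. This is only my mental model and not the actual dbt-state or dbt-state-oss logic: `StateRecord`, `can_skip`, and the comparison rules are hypothetical names and assumptions, built on the `fingerprint` and `last_modified_epoch` fields from the stored record.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class StateRecord:
    """Hypothetical in-memory view of the stored JSON record."""
    target_table: str
    fingerprint: str
    last_modified_epoch: int  # assumed to be milliseconds


def can_skip(
    stored: Optional[StateRecord],
    current_fingerprint: str,
    input_last_modified_epochs: Iterable[int],
) -> bool:
    """Return True when a rebuild looks unnecessary (assumed rules)."""
    # Never built before: nothing to reuse.
    if stored is None:
        return False
    # The model changed since the last build: rebuild.
    if stored.fingerprint != current_fingerprint:
        return False
    # Fresh only if no input was modified after the last build.
    # With no inputs at all, all() is True and the model is skipped.
    return all(
        stored.last_modified_epoch >= epoch
        for epoch in input_last_modified_epochs
    )


stored = StateRecord("test.smoke.table", "abc123", 1780508102671)
print(can_skip(stored, "abc123", [1780508000000]))  # input older than build
print(can_skip(stored, "abc123", [1780509000000]))  # input newer than build
print(can_skip(stored, "changed", [1780508000000]))  # model changed
```

If the check passes, the run can be treated as a NO-OP for that model. Otherwise it is built and a new record is written.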

Quick Start

On dev/prod/CI, install and start the server:

pip install dbt-state-oss          # add [s3] or [azure] for those backends
dbt-state-oss --store local --port 50051

Then export the environment variables so that dbt connects to our server instead of the dbt-state app:

export RUN_CACHE_API_URL=localhost:50051 RUN_CACHE_API_SECURE=false RUN_CACHE_OAUTH_CLIENT_SECRET=dev
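Before running a job, it can help to confirm that something is actually listening at that address. The following is a minimal sketch using only the Python standard library. The helper names `parse_endpoint` and `is_listening` are hypothetical, and it assumes `RUN_CACHE_API_URL` has the plain `host:port` form used above.

```python
import os
import socket
from typing import Tuple


def parse_endpoint(url: str) -> Tuple[str, int]:
    """Split a 'host:port' string (the form used in this post)."""
    host, _, port = url.rpartition(":")
    return host, int(port)


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Falls back to the address from the quick start if the variable is unset.
    url = os.environ.get("RUN_CACHE_API_URL", "localhost:50051")
    host, port = parse_endpoint(url)
    status = "reachable" if is_listening(host, port) else "not reachable"
    print(f"{host}:{port} is {status}")
```

This only checks that the TCP port is open. It does not verify that the process behind it is dbt-state-oss.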

That's all, you are good to go. Now run your existing job as it is, with no changes:

dbt build

Conclusion

SQLMesh has a philosophy of keeping things open source, and dbt-state is a good feature. I really thank Tobias (Toby) Mao and his team for bringing such a wonderful feature, but dbt-state should be able to connect to existing cloud storage.
