Data Lab Tech TV

Data Lakehouse with dbt and DuckLake

With DuckLake, you can now use DuckDB to power your data lakehouse. Snapshots save space by referencing parts of Parquet files, so you can keep every old version with little impact on storage. Small change operations also get a performance boost from data inlining, which stores changed rows directly in the metadata database (SQLite, PostgreSQL, etc.).
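As a minimal sketch of what this looks like in practice (the catalog name `lake`, the paths, and the `events` table are illustrative; the `ducklake_snapshots` and `AT (VERSION => ...)` syntax follows the DuckLake documentation at release):

```sql
-- Attach a DuckLake catalog backed by a local DuckDB metadata file;
-- Parquet data files are written under data/.
INSTALL ducklake;
LOAD ducklake;
ATTACH 'ducklake:metadata.ducklake' AS lake (DATA_PATH 'data/');
USE lake;

-- Each committed change creates a snapshot that only references the
-- Parquet files (or parts of files) it needs, keeping old versions cheap.
CREATE TABLE events (id INTEGER, payload VARCHAR);
INSERT INTO events VALUES (1, 'created');
INSERT INTO events VALUES (2, 'updated');

-- List snapshots and time travel to an earlier one; version numbers
-- come from the snapshot listing.
SELECT * FROM ducklake_snapshots('lake');
SELECT * FROM events AT (VERSION => 2);
```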

With a little help from dbt and an unreleased branch of the dbt-duckdb adapter, I designed a data lakehouse strategy covering data ingestion, transformation, and export, almost exclusively in SQL, on top of DuckDB and the newly released DuckLake!
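To illustrate the SQL-first flavor of that strategy, a transformation model running through dbt-duckdb reads like any other dbt model; the model and source names below are hypothetical sketches, not taken from the datalab repo:

```sql
-- models/marts/daily_events.sql (hypothetical): dbt renders the Jinja,
-- and dbt-duckdb executes the result against the attached DuckLake catalog.
{{ config(materialized='table') }}

select
    date_trunc('day', event_ts) as event_date,
    count(*) as event_count
from {{ source('raw', 'events') }}
group by 1
```

Ingestion and export follow the same pattern, with dbt handling the dependencies between models.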

The code is available as open source on GitHub, at DataLabTechTV/datalab, and the README covers most of the details you need to understand it and get it running. I also wrote a blog post covering the most interesting components and issues, with a few comments on the overall process. You can see the data lab in action, and learn more about it, in the video I uploaded this week!
