<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Siraj Syed</title>
    <description>The latest articles on DEV Community by Siraj Syed (@siraj_syed_a122e4986ce967).</description>
    <link>https://dev.to/siraj_syed_a122e4986ce967</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2822318%2F7f921fdb-6760-4cae-8d6e-1a115086b994.jpg</url>
      <title>DEV Community: Siraj Syed</title>
      <link>https://dev.to/siraj_syed_a122e4986ce967</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/siraj_syed_a122e4986ce967"/>
    <language>en</language>
    <item>
      <title>Part 2: Syncing Normalized PostgreSQL Data to Denormalized ClickHouse Using Airbyte + DBT</title>
      <dc:creator>Siraj Syed</dc:creator>
      <pubDate>Sat, 10 May 2025 15:02:37 +0000</pubDate>
      <link>https://dev.to/siraj_syed_a122e4986ce967/part-2-syncing-normalized-postgresql-data-to-denormalized-clickhouse-using-airbyte-dbt-2hic</link>
      <guid>https://dev.to/siraj_syed_a122e4986ce967/part-2-syncing-normalized-postgresql-data-to-denormalized-clickhouse-using-airbyte-dbt-2hic</guid>
      <description>&lt;h2&gt;
  
  
  From Transactional Trenches to Analytical Ascent: PostgreSQL to ClickHouse with Airbyte and DBT
&lt;/h2&gt;

&lt;p&gt;In Part 1, we delved into the fundamental reasons why shoehorning your PostgreSQL data model directly into ClickHouse is a recipe for analytical sluggishness. We highlighted the contrasting strengths of row-oriented OLTP databases like PostgreSQL and column-oriented OLAP powerhouses like ClickHouse.&lt;/p&gt;

&lt;p&gt;Now, let's roll up our sleeves and translate that theory into a tangible, real-world solution. In this article, we'll embark on a journey to build a robust data pipeline that seamlessly syncs your normalized Online Transaction Processing (OLTP) data residing in PostgreSQL into a highly performant, denormalized schema optimized for Online Analytical Processing (OLAP) within ClickHouse.&lt;/p&gt;

&lt;p&gt;Our trusty companions on this expedition will be Airbyte for Change Data Capture (CDC) based ingestion and dbt (data build tool) for elegant transformations and nimble schema evolution.&lt;/p&gt;

&lt;h3&gt;
  
  
  Our Architectural Blueprint
&lt;/h3&gt;

&lt;p&gt;Here's a visual representation of the data flow we'll be constructing:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpl8eqtjnko8og6i6w8oq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpl8eqtjnko8og6i6w8oq.png" alt="Image description" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Laying the Foundation - Defining Your Source Schema in PostgreSQL
&lt;/h3&gt;

&lt;p&gt;Let's consider a common scenario: a basic e-commerce application. Our transactional data in PostgreSQL is structured in a normalized fashion, ensuring data integrity and minimizing redundancy for efficient writes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- users&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- orders&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;total_amount&lt;/span&gt; &lt;span class="nb"&gt;NUMERIC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This normalized structure, with separate &lt;code&gt;users&lt;/code&gt; and &lt;code&gt;orders&lt;/code&gt; tables linked by foreign keys, is ideal for handling transactional operations.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 2: Setting Sail with Airbyte - Ingesting Data from Postgres&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Airbyte steps in as our reliable vessel for data ingestion. Its robust support for CDC (Change Data Capture) via the PostgreSQL Write-Ahead Log (WAL) allows us to stream changes in near real-time into ClickHouse. This approach ensures low latency and captures every modification made to our source data.&lt;/p&gt;

&lt;p&gt;To get this working, you'll need to configure Airbyte with the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Source:&lt;/strong&gt; Connect to your PostgreSQL instance. Ensure you've enabled logical replication and created a replication slot, as these are prerequisites for CDC.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Destination:&lt;/strong&gt; Configure your ClickHouse instance as the destination. Leverage the HTTP destination with compression for efficient data transfer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sync Mode:&lt;/strong&gt; Choose a sync mode that supports incremental updates with change tracking. "Incremental + Append" or a dedicated "CDC" mode (if available for the Postgres connector) are suitable options.&lt;/li&gt;
&lt;/ul&gt;
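&lt;p&gt;On the PostgreSQL side, those CDC prerequisites look roughly like this (the slot and publication names are placeholders, not anything Airbyte mandates):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- enable logical decoding; requires a server restart
ALTER SYSTEM SET wal_level = logical;

-- create a replication slot using the built-in pgoutput plugin
SELECT pg_create_logical_replication_slot('airbyte_slot', 'pgoutput');

-- publish the tables you want Airbyte to track
CREATE PUBLICATION airbyte_publication FOR TABLE users, orders;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;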

&lt;p&gt;Upon successful configuration, Airbyte will land the raw data in ClickHouse within tables named something like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;_airbyte_raw_users&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;_airbyte_raw_orders&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The data within these tables will typically be structured as raw JSON blobs, with each row containing metadata and the actual data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"_ab_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"a unique identifier for the Airbyte record"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"_ab_emitted_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"timestamp of when Airbyte processed the record"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Step 3: Crafting Insights with DBT - Transformation and Denormalization&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Now comes the crucial step of shaping this raw data into an analytical powerhouse. This is where dbt shines. By connecting dbt to your ClickHouse instance (using adapters like &lt;code&gt;dbt-clickhouse&lt;/code&gt;), you can write SQL-based models to extract, transform, and load the data into your desired denormalized schema.&lt;/p&gt;
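&lt;p&gt;As a sketch, a minimal &lt;code&gt;profiles.yml&lt;/code&gt; for the &lt;code&gt;dbt-clickhouse&lt;/code&gt; adapter might look like this (host, credentials, and schema are placeholder values; check the adapter docs for the full option list):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;clickhouse_analytics:
  target: dev
  outputs:
    dev:
      type: clickhouse
      host: localhost
      port: 8123          # ClickHouse HTTP interface
      user: default
      password: ""
      schema: analytics   # target database for dbt models
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;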

&lt;p&gt;Let's look at an example dbt model, &lt;code&gt;orders_flat.sql&lt;/code&gt;, that denormalizes the &lt;code&gt;orders&lt;/code&gt; data by joining it with relevant information from the &lt;code&gt;users&lt;/code&gt; table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;raw_orders&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;JSONExtractString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_airbyte_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'id'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;JSONExtractString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_airbyte_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'user_id'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;toDecimal128OrZero&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;JSONExtractString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_airbyte_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'total_amount'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;parseDateTimeBestEffort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;JSONExtractString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_airbyte_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'created_at'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;_airbyte_raw_orders&lt;/span&gt;
&lt;span class="p"&gt;),&lt;/span&gt;

&lt;span class="n"&gt;enriched_orders&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;user_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;user_email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;raw_orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;
  &lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt;
      &lt;span class="n"&gt;JSONExtractString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_airbyte_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'id'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;JSONExtractString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_airbyte_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'name'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;JSONExtractString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_airbyte_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'email'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;_airbyte_raw_users&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;enriched_orders&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We first extract the relevant fields from the raw JSON data ingested by Airbyte using ClickHouse's JSON functions like &lt;code&gt;JSONExtractString&lt;/code&gt;. We also perform basic type casting.&lt;/li&gt;
&lt;li&gt;Then, we join the extracted &lt;code&gt;orders&lt;/code&gt; data with the relevant fields from the &lt;code&gt;users&lt;/code&gt; data based on the &lt;code&gt;user_id&lt;/code&gt;. This denormalizes the data, bringing related information into a single table.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 4: Optimizing for Speed in ClickHouse - Partitioning and Materialization&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To truly unlock ClickHouse's analytical prowess, we need to structure our tables for optimal query performance. Partitioning and ordering are key techniques. Let's materialize our &lt;code&gt;enriched_orders&lt;/code&gt; model into a ClickHouse table with these optimizations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders_flat&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;
&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;toYYYYMM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;enriched_orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's why this is important:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ENGINE = MergeTree&lt;/code&gt;:&lt;/strong&gt; This is a family of powerful table engines in ClickHouse designed for high-performance data processing and analytics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;PARTITION BY toYYYYMM(created_at)&lt;/code&gt;:&lt;/strong&gt; Partitioning the data by year and month of the &lt;code&gt;created_at&lt;/code&gt; column allows ClickHouse to efficiently skip irrelevant data during queries that filter by date ranges.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ORDER BY (user_id, created_at)&lt;/code&gt;:&lt;/strong&gt; Specifying an order key helps ClickHouse organize the data within each partition, enabling faster data retrieval for queries that filter or sort by these columns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;AS SELECT * FROM enriched_orders&lt;/code&gt;:&lt;/strong&gt; This creates the table and populates it with the results of our dbt transformation.&lt;/li&gt;
&lt;/ul&gt;
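&lt;p&gt;If you manage this table through dbt rather than hand-written DDL, the &lt;code&gt;dbt-clickhouse&lt;/code&gt; adapter exposes the same settings as model config. A sketch of the header of &lt;code&gt;orders_flat.sql&lt;/code&gt; (option names as commonly used by the adapter; verify against your version):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;{{ config(
    materialized = 'table',
    engine       = 'MergeTree()',
    partition_by = 'toYYYYMM(created_at)',
    order_by     = '(user_id, created_at)'
) }}

-- ...the CTEs from Step 3 follow here...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;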

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 5: Unleashing Analytical Power - Querying Your Denormalized Data&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;With our denormalized and optimized data now residing in ClickHouse, we can execute analytical queries that would be prohibitively slow on our normalized PostgreSQL database, especially on large datasets.&lt;/p&gt;

&lt;p&gt;For example, to count daily orders and calculate total revenue over the last 30 days:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;toDate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_revenue&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders_flat&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'30 day'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;order_date&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;order_date&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This query, leveraging ClickHouse's columnar storage and indexing capabilities, will execute almost instantly, providing valuable business insights.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Automating Your Data Pipeline with CI/CD&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To ensure a smooth and reliable data flow, consider automating your dbt transformations. You can achieve this by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Using DBT Cloud:&lt;/strong&gt; This managed service provides a web-based interface for developing, scheduling, and monitoring your dbt projects.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implementing CI/CD Pipelines:&lt;/strong&gt; Integrate your dbt runs into your Continuous Integration/Continuous Deployment (CI/CD) pipelines (e.g., using GitLab CI, GitHub Actions) to automatically trigger transformations whenever new code is merged or on a scheduled basis.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Contracts and Schema Registry (Optional):&lt;/strong&gt; For more complex environments, consider implementing data contracts or using a schema registry to track and manage schema changes across your PostgreSQL and ClickHouse systems, preventing breaking changes.&lt;/li&gt;
&lt;/ul&gt;
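&lt;p&gt;For the CI/CD option, a minimal GitHub Actions workflow that runs dbt on merge and on a schedule might look like this (the workflow name, secret name, and profile location are illustrative assumptions):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;name: dbt-transformations
on:
  push:
    branches: [main]
  schedule:
    - cron: "0 * * * *"   # hourly
jobs:
  dbt-run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dbt-clickhouse
      - run: dbt run --profiles-dir .
        env:
          CLICKHOUSE_PASSWORD: ${{ secrets.CLICKHOUSE_PASSWORD }}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;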

&lt;h3&gt;
  
  
  &lt;strong&gt;Gotchas to Navigate&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;While this pattern is powerful, here are some common pitfalls and their solutions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Issue&lt;/th&gt;
&lt;th&gt;Solution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ClickHouse schema drift&lt;/td&gt;
&lt;td&gt;Use DBT to re-materialize views and avoid dynamic columns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Timestamp mismatches&lt;/td&gt;
&lt;td&gt;Normalize to UTC early in the pipeline (ideally within dbt)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NULL vs empty string in JSON&lt;/td&gt;
&lt;td&gt;Use &lt;code&gt;ifNull()&lt;/code&gt; / &lt;code&gt;coalesce()&lt;/code&gt; or &lt;code&gt;assumeNotNull()&lt;/code&gt; carefully in ClickHouse queries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data bloat in Airbyte raw tables&lt;/td&gt;
&lt;td&gt;Apply retention policies or configure auto-dropping of raw staging tables&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
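&lt;p&gt;For the NULL-vs-empty-string and timestamp rows in particular, the fix can live directly in the extraction layer of your dbt model; an illustrative snippet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;SELECT
  -- JSONExtractString returns '' for missing keys; convert that back to NULL
  nullIf(JSONExtractString(_airbyte_data, 'email'), '') AS email,
  -- normalize timestamps to UTC as early as possible
  toTimeZone(
    parseDateTimeBestEffort(JSONExtractString(_airbyte_data, 'created_at')),
    'UTC'
  ) AS created_at_utc
FROM _airbyte_raw_users;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;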

&lt;h3&gt;
  
  
  Final Thoughts: A Powerful Paradigm
&lt;/h3&gt;

&lt;p&gt;This architecture, leveraging the strengths of PostgreSQL for transactional integrity and ClickHouse for analytical speed, orchestrated by Airbyte for seamless ingestion and dbt for elegant transformation, offers a compelling solution for modern data pipelines.&lt;/p&gt;

&lt;p&gt;It allows you to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Safeguard your OLTP workloads&lt;/strong&gt; in PostgreSQL without compromising analytical performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Offload demanding analytics&lt;/strong&gt; to the lightning-fast columnar engine of ClickHouse.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintain a clear separation of concerns&lt;/strong&gt;, avoiding the complexities of trying to fit an analytical workload onto a transactional database.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Future-proof your data infrastructure&lt;/strong&gt; by adopting decoupled and specialized tools.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With Airbyte and dbt as your allies, your data becomes a fluid asset, readily transformed and analyzed to drive meaningful insights.&lt;/p&gt;

&lt;h3&gt;
  
  
  Next Up: Sharing Your Analytical Treasures
&lt;/h3&gt;

&lt;p&gt;In Part 3, we'll explore how to expose your meticulously crafted ClickHouse OLAP layer as an API or embeddable dashboard for your customers, all while implementing robust access controls and cost attribution strategies. Stay tuned!&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>dbt</category>
      <category>airbyte</category>
      <category>clickhouse</category>
    </item>
    <item>
      <title>Designing Data Models That Work for Both PostgreSQL and ClickHouse: A Developer’s Guide</title>
      <dc:creator>Siraj Syed</dc:creator>
      <pubDate>Fri, 09 May 2025 02:27:58 +0000</pubDate>
      <link>https://dev.to/siraj_syed_a122e4986ce967/designing-data-models-that-work-for-both-postgresql-and-clickhouse-a-developers-guide-3e4d</link>
      <guid>https://dev.to/siraj_syed_a122e4986ce967/designing-data-models-that-work-for-both-postgresql-and-clickhouse-a-developers-guide-3e4d</guid>
      <description>&lt;p&gt;Modern applications are increasingly architected with &lt;strong&gt;PostgreSQL&lt;/strong&gt; for OLTP (transactions) and &lt;strong&gt;ClickHouse&lt;/strong&gt; for OLAP (analytics). This hybrid design gives you the best of both worlds: reliable writes and blazing-fast reads.&lt;/p&gt;

&lt;p&gt;But here’s the catch—you can’t model data the same way in both. A normalized model that’s perfect for Postgres could kill performance in ClickHouse. And a denormalized, flattened schema for ClickHouse might break constraints and business logic in Postgres.&lt;/p&gt;

&lt;p&gt;So how do you design a model that works well enough across both?&lt;/p&gt;

&lt;p&gt;Let’s walk through the &lt;strong&gt;key principles, trade-offs, and best practices&lt;/strong&gt; for dual-target data modeling that won’t leave you regretting schema decisions six months later.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Use PostgreSQL + ClickHouse?
&lt;/h2&gt;

&lt;p&gt;Before diving into data modeling, here’s the high-level architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL:&lt;/strong&gt; Primary source of truth. Handles transactions, constraints, and app-level logic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ClickHouse:&lt;/strong&gt; Secondary analytical store. Optimized for fast aggregates, filtering, time-series analysis, and dashboards.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Common patterns for data syncing:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Sync via Debezium / Kafka&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Periodic ETL using Airflow or DBT&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Event-based architecture using CDC (Change Data Capture)&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  1. Normalize in PostgreSQL, Denormalize in ClickHouse
&lt;/h2&gt;

&lt;p&gt;PostgreSQL loves third normal form. ClickHouse doesn’t.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;PostgreSQL&lt;/th&gt;
&lt;th&gt;ClickHouse&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Join performance&lt;/td&gt;
&lt;td&gt;Efficient with indexes&lt;/td&gt;
&lt;td&gt;Costly, especially over large tables&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Normalization&lt;/td&gt;
&lt;td&gt;Encouraged (FKs, constraints)&lt;/td&gt;
&lt;td&gt;Discouraged (flatten your data)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write latency&lt;/td&gt;
&lt;td&gt;ACID-compliant, slower but reliable&lt;/td&gt;
&lt;td&gt;Fast inserts, optimized for batches&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Analytics&lt;/td&gt;
&lt;td&gt;Slow on large joins&lt;/td&gt;
&lt;td&gt;Optimized for OLAP queries&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Best Practice:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Model your source data in normalized form in Postgres. When syncing to ClickHouse, flatten your facts and materialize your dimensions.&lt;/p&gt;


&lt;h2&gt;
  
  
  2. Watch for Type Compatibility (JSON, UUID, Timestamps)
&lt;/h2&gt;

&lt;p&gt;Some Postgres types don’t map cleanly to ClickHouse. Here are common gotchas:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;PostgreSQL Type&lt;/th&gt;
&lt;th&gt;ClickHouse Equivalent&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;UUID&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;UUID&lt;/code&gt; or &lt;code&gt;String&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;ClickHouse has a native &lt;code&gt;UUID&lt;/code&gt; type; some connectors still land it as String&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;JSONB&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;String&lt;/code&gt; or &lt;code&gt;Nested&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Consider flattening or casting to string&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;TIMESTAMPTZ&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;DateTime64&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Ensure timezones are handled correctly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;NUMERIC&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Decimal(18,4)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Match precision explicitly&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
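&lt;p&gt;In practice these mappings become explicit casts in the sync layer; an illustrative ClickHouse snippet (the column names are hypothetical):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;SELECT
  toUUID(id_string)                    AS id,          -- or keep as String
  toDecimal64OrZero(amount_string, 4)  AS amount,      -- precision chosen explicitly
  parseDateTimeBestEffort(ts_string)   AS created_at   -- cast to DateTime64 if sub-second precision matters
FROM raw_rows;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;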

&lt;p&gt;&lt;strong&gt;Pro Tip:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Use a schema registry or intermediate layer (like &lt;strong&gt;DBT&lt;/strong&gt; or &lt;strong&gt;protobuf&lt;/strong&gt;) to enforce compatibility across both systems.&lt;/p&gt;


&lt;h2&gt;
  
  
  3. Optimize Time-Based Partitioning for ClickHouse
&lt;/h2&gt;

&lt;p&gt;ClickHouse thrives when data is &lt;strong&gt;partitioned&lt;/strong&gt; and &lt;strong&gt;sorted&lt;/strong&gt; effectively.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;PostgreSQL:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;code&gt;created_at&lt;/code&gt; or &lt;code&gt;updated_at&lt;/code&gt; for tracking changes.&lt;/li&gt;
&lt;li&gt;Use indexes on frequently filtered fields.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;ClickHouse:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Partition by &lt;code&gt;toYYYYMM(created_at)&lt;/code&gt; or &lt;code&gt;toYYYYMMDD()&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Sort key: &lt;code&gt;(user_id, created_at)&lt;/code&gt; so the primary index can &lt;strong&gt;skip irrelevant granules&lt;/strong&gt; (faster filtering).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
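&lt;p&gt;Putting both halves together for a hypothetical &lt;code&gt;events&lt;/code&gt; stream, the ClickHouse table might be declared like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;CREATE TABLE events_flat (
  user_id    UUID,
  event_type String,
  created_at DateTime64(3, 'UTC')
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(created_at)   -- monthly partitions for date-range pruning
ORDER BY (user_id, created_at);     -- sort key matching common filters
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;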


&lt;h2&gt;
  
  
  4. Avoid Foreign Keys in ClickHouse
&lt;/h2&gt;

&lt;p&gt;ClickHouse does not support foreign key constraints. This means you need to &lt;strong&gt;flatten joins ahead of time&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Example:
&lt;/h3&gt;

&lt;p&gt;PostgreSQL Schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ClickHouse Flattened Table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;orders_flat&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;order_id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;user_email&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;user_name&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;total_amount&lt;/span&gt; &lt;span class="nb"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="n"&gt;DateTime64&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;ETL pipelines&lt;/strong&gt; should enrich the data before writing to ClickHouse.&lt;/p&gt;
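&lt;p&gt;As a hedged illustration of that enrichment step (the row dicts and helper name are hypothetical, not Airbyte or DBT APIs), the join from &lt;code&gt;users&lt;/code&gt; into &lt;code&gt;orders&lt;/code&gt; can be flattened ahead of time like this:&lt;/p&gt;

```python
def flatten_orders(orders, users):
    """Join user attributes into each order, producing orders_flat rows.

    orders/users are plain dicts standing in for rows pulled from PostgreSQL;
    a real pipeline would do this in a DBT model or an ETL job before the
    ClickHouse insert.
    """
    users_by_id = {u["id"]: u for u in users}
    flat = []
    for o in orders:
        u = users_by_id[o["user_id"]]
        flat.append({
            "order_id": o["id"],
            "user_id": u["id"],
            "user_email": u["email"],
            "user_name": u["name"],
            "total_amount": o["total_amount"],
            "created_at": o["created_at"],
        })
    return flat

users = [{"id": "u1", "name": "Ada", "email": "ada@example.com"}]
orders = [{"id": "o1", "user_id": "u1", "total_amount": "19.99",
           "created_at": "2025-05-10T15:02:37"}]
print(flatten_orders(orders, users)[0]["user_email"])  # ada@example.com
```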




&lt;h2&gt;
  
  
  5. Model for Read Patterns, Not Write Patterns
&lt;/h2&gt;

&lt;p&gt;ClickHouse thrives on &lt;strong&gt;append-only, query-optimized data&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'abc123'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pre-aggregate it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;MATERIALIZED&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;daily_event_counts&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;toDate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;event_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;daily_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event_date&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then query from &lt;code&gt;daily_event_counts&lt;/code&gt; instead.&lt;/p&gt;
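&lt;p&gt;The same pre-aggregation can be sketched in Python (an in-memory stand-in for the materialized view, not ClickHouse itself) to show why the per-user lookup becomes cheap:&lt;/p&gt;

```python
from collections import Counter
from datetime import datetime

def daily_event_counts(events):
    """Mirror of the materialized view: count events per (user_id, event_date)."""
    counts = Counter()
    for e in events:
        counts[(e["user_id"], e["event_time"].date())] += 1
    return counts

events = [
    {"user_id": "abc123", "event_time": datetime(2025, 5, 10, 9, 0)},
    {"user_id": "abc123", "event_time": datetime(2025, 5, 10, 17, 30)},
    {"user_id": "xyz789", "event_time": datetime(2025, 5, 11, 8, 15)},
]
agg = daily_event_counts(events)
# A per-user count is now a lookup into the small aggregate, not a full scan.
print(agg[("abc123", datetime(2025, 5, 10).date())])  # 2
```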




&lt;h2&gt;
  
  
  6. Be Cautious with Schema Evolution
&lt;/h2&gt;

&lt;p&gt;PostgreSQL handles schema changes gracefully. ClickHouse… doesn’t (yet).&lt;/p&gt;

&lt;h3&gt;
  
  
  Tips:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Avoid adding columns frequently in ClickHouse.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Use wide-table design (predefine a large schema if possible).&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Prefer additive changes (append-only, soft deletes).&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To manage &lt;strong&gt;schema drift&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use tools like &lt;strong&gt;DBT, Airbyte, or LakeSoul&lt;/strong&gt; with &lt;strong&gt;versioned schemas&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Log schema changes and sync them across systems in CI/CD.&lt;/li&gt;
&lt;/ul&gt;
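&lt;p&gt;A minimal drift check, assuming you can list column names on both sides (the helper below is hypothetical, not part of DBT or Airbyte), might look like:&lt;/p&gt;

```python
def schema_drift(source_cols, target_cols):
    """Report columns present in the source (PostgreSQL) but missing from the
    target (ClickHouse), and vice versa. Column names are illustrative."""
    src, tgt = set(source_cols), set(target_cols)
    return {
        "missing_in_target": sorted(src.difference(tgt)),
        "extra_in_target": sorted(tgt.difference(src)),
    }

postgres_cols = ["id", "user_id", "total_amount", "created_at", "coupon_code"]
clickhouse_cols = ["id", "user_id", "total_amount", "created_at"]
drift = schema_drift(postgres_cols, clickhouse_cols)
print(drift["missing_in_target"])  # ['coupon_code']
```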




&lt;h2&gt;
  
  
  7. Use Separate ETL Pipelines for OLTP and OLAP
&lt;/h2&gt;

&lt;p&gt;Instead of writing the same data model into both systems directly, maintain &lt;strong&gt;two pipelines&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;OLTP write → PostgreSQL&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OLTP sync → Enriched + transformed → ClickHouse&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This decouples constraints, allows async processing, and optimizes each layer for its strength.&lt;/p&gt;
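&lt;p&gt;A toy sketch of that decoupling (in-memory lists and a &lt;code&gt;Queue&lt;/code&gt; standing in for PostgreSQL, the sync channel, and ClickHouse) shows the transactional write staying fast while enrichment happens off the write path:&lt;/p&gt;

```python
from queue import Queue

oltp_store = []          # stands in for PostgreSQL
olap_queue = Queue()     # stands in for the CDC/sync channel (e.g. Airbyte)
olap_store = []          # stands in for ClickHouse

def oltp_write(order):
    """OLTP path: commit synchronously, then hand off for async processing."""
    oltp_store.append(order)
    olap_queue.put(order)

def olap_drain(enrich):
    """OLAP path: consume the queue, enrich, and batch into the OLAP store."""
    while not olap_queue.empty():
        olap_store.append(enrich(olap_queue.get()))

oltp_write({"id": "o1", "user_id": "u1", "total_amount": "19.99"})
olap_drain(lambda o: {**o, "user_name": "Ada"})  # enrichment off the write path
print(len(oltp_store), len(olap_store))  # 1 1
```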




&lt;h2&gt;
  
  
  TL;DR: Unified Modeling Principles
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Principle&lt;/th&gt;
&lt;th&gt;PostgreSQL (OLTP)&lt;/th&gt;
&lt;th&gt;ClickHouse (OLAP)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Normalization&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Joins&lt;/td&gt;
&lt;td&gt;✅ Fast&lt;/td&gt;
&lt;td&gt;❌ Avoid&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Constraints&lt;/td&gt;
&lt;td&gt;✅ Enforced&lt;/td&gt;
&lt;td&gt;❌ Manual&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Types&lt;/td&gt;
&lt;td&gt;Strong, varied&lt;/td&gt;
&lt;td&gt;Simpler&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Writes&lt;/td&gt;
&lt;td&gt;Row-based&lt;/td&gt;
&lt;td&gt;Batch-based&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Queries&lt;/td&gt;
&lt;td&gt;Indexed row access&lt;/td&gt;
&lt;td&gt;Columnar, vectorized&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Evolution&lt;/td&gt;
&lt;td&gt;✅ Flexible&lt;/td&gt;
&lt;td&gt;⚠️ Careful planning needed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Designing data models across PostgreSQL and ClickHouse isn’t about picking one approach—it’s about understanding what each engine excels at and designing &lt;strong&gt;your sync + transformations accordingly&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Get this right, and you’ll enjoy the &lt;strong&gt;flexibility of Postgres&lt;/strong&gt; with the &lt;strong&gt;speed and scalability of ClickHouse&lt;/strong&gt;—without constantly fixing pipelines or rewriting queries.&lt;/p&gt;




&lt;h3&gt;
  
  
  What We Covered
&lt;/h3&gt;

&lt;p&gt;💡 &lt;strong&gt;This Part 2 put Part 1’s theory into practice: syncing a normalized Postgres schema into a denormalized ClickHouse table using Airbyte + DBT. Stay tuned for more!&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>clickhouse</category>
      <category>data</category>
    </item>
  </channel>
</rss>
