<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Shahab Ranjbary</title>
    <description>The latest articles on DEV Community by Shahab Ranjbary (@ranjbaryshahab).</description>
    <link>https://dev.to/ranjbaryshahab</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1155586%2F4422fdb3-3fee-44cd-8529-28bc0a423cfb.jpeg</url>
      <title>DEV Community: Shahab Ranjbary</title>
      <link>https://dev.to/ranjbaryshahab</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ranjbaryshahab"/>
    <language>en</language>
    <item>
      <title>🧠 ClickHouse LEFT JOINs: Why join_use_nulls Matters</title>
      <dc:creator>Shahab Ranjbary</dc:creator>
      <pubDate>Thu, 30 Oct 2025 10:10:28 +0000</pubDate>
      <link>https://dev.to/ranjbaryshahab/clickhouse-left-joins-why-joinusenulls-matters-1h9f</link>
      <guid>https://dev.to/ranjbaryshahab/clickhouse-left-joins-why-joinusenulls-matters-1h9f</guid>
      <description>&lt;h2&gt;
  
  
  🧠 Understanding &lt;code&gt;join_use_nulls&lt;/code&gt; in ClickHouse
&lt;/h2&gt;

&lt;p&gt;ClickHouse is famous for being blazing fast — but sometimes its SQL semantics can surprise you, especially around &lt;code&gt;JOIN&lt;/code&gt;s.&lt;/p&gt;

&lt;p&gt;Here’s a simple example that shows how the &lt;code&gt;join_use_nulls&lt;/code&gt; setting can completely change your results.&lt;/p&gt;

&lt;h3&gt;
  
  
  🧩 The Setup
&lt;/h3&gt;

&lt;p&gt;Let’s create two tiny tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id_val&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;`id`&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;`val`&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TinyLog&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id_val&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="p"&gt;),(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;),(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id_val2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;`id`&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;`val`&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TinyLog&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id_val2&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;21&lt;/span&gt;&lt;span class="p"&gt;),(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;22&lt;/span&gt;&lt;span class="p"&gt;),(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;23&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We’ve got:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;id_val&lt;/code&gt;: three rows with IDs 1, 2, and 3&lt;/li&gt;
&lt;li&gt;&lt;code&gt;id_val2&lt;/code&gt;: three rows, but ID 2 is missing and ID 1 appears twice&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  🧮 The Default JOIN
&lt;/h3&gt;

&lt;p&gt;Let’s run a LEFT JOIN:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id_val&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id_val2&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;┌─id─┬─val─┬─val_1─┐
│  1 │  11 │   21  │
│  1 │  11 │   22  │
│  2 │  12 │    0  │
│  3 │  13 │   23  │
└────┴─────┴───────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice something interesting?&lt;br&gt;
When there’s no matching row for id=2, ClickHouse fills it with a default value — in this case, 0.&lt;/p&gt;

&lt;p&gt;For numeric columns, the default is 0.&lt;br&gt;
For strings, it would be an empty string ''.&lt;/p&gt;
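&lt;p&gt;The same defaulting applies to other types. A quick sketch with a &lt;code&gt;String&lt;/code&gt; column (the &lt;code&gt;id_name&lt;/code&gt; table here is illustrative, not part of the setup above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;CREATE TABLE test.id_name(`id` UInt32, `name` String) ENGINE = TinyLog;
INSERT INTO test.id_name VALUES (1, 'a'), (3, 'c');

SELECT *
FROM test.id_val
LEFT JOIN test.id_name USING (id);
-- the non-matching row for id = 2 gets name = '' (empty string), not NULL
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;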
&lt;h3&gt;
  
  
  🧩 Enter join_use_nulls
&lt;/h3&gt;

&lt;p&gt;If you want standard SQL behavior, where missing values become NULL, use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id_val&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id_val2&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;SETTINGS&lt;/span&gt; &lt;span class="n"&gt;join_use_nulls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the output looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;┌─id─┬─val─┬─val_1─┐
│  1 │  11 │   21  │
│  1 │  11 │   22  │
│  2 │  12 │  NULL │
│  3 │  13 │   23  │
└────┴─────┴───────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s closer to what most SQL engines (like PostgreSQL or MySQL) would do.&lt;/p&gt;

&lt;h3&gt;
  
  
  ⚙️ Why This Matters
&lt;/h3&gt;

&lt;p&gt;This setting affects how ClickHouse represents non-matching rows in joins — which can have downstream effects on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Aggregations&lt;br&gt;
&lt;code&gt;count(val_1)&lt;/code&gt; skips NULLs but counts filler 0s as real values, so the two modes can return different results.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Exports &amp;amp; BI tools&lt;br&gt;
Some tools treat 0 as a valid number, but NULL as missing data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data semantics&lt;br&gt;
A missing match isn’t always “zero” — it might mean “unknown”.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
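&lt;p&gt;To make the aggregation point concrete, here is what &lt;code&gt;count()&lt;/code&gt; returns on the example tables above (&lt;code&gt;count(col)&lt;/code&gt; skips NULLs but happily counts filler zeros):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;SELECT count(val_1)
FROM test.id_val
LEFT JOIN test.id_val2 USING (id);
-- 4: the filler 0 for id = 2 is counted as a value

SELECT count(val_1)
FROM test.id_val
LEFT JOIN test.id_val2 USING (id)
SETTINGS join_use_nulls = 1;
-- 3: the NULL for id = 2 is skipped by count()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;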

&lt;h3&gt;
  
  
  🚀 Performance Note
&lt;/h3&gt;

&lt;p&gt;The default behavior (without join_use_nulls) is slightly faster and more memory-efficient because ClickHouse doesn’t have to track NULL bitmaps.&lt;/p&gt;

&lt;p&gt;If you’re doing high-volume analytical joins and don’t care about SQL-standard nulls, keep it off.&lt;br&gt;
If you need correctness and consistency with other systems, turn it on.&lt;/p&gt;
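&lt;p&gt;You can observe the extra bookkeeping directly: with the setting enabled, the right-hand columns become &lt;code&gt;Nullable&lt;/code&gt;, which stores an additional null mask alongside the data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;SELECT toTypeName(val_1)
FROM test.id_val
LEFT JOIN test.id_val2 USING (id)
LIMIT 1
SETTINGS join_use_nulls = 1;
-- Nullable(UInt32) instead of plain UInt32
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;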
&lt;h3&gt;
  
  
  🔍 Check Your Settings
&lt;/h3&gt;

&lt;p&gt;You can check the current value of this setting in your ClickHouse session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'%join_use_nulls%'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or enable it for the rest of your session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SET join_use_nulls = 1;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
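&lt;p&gt;To make it the default for every session of a user, you can also set it in a settings profile on the server (a sketch; the file location and profile name depend on your setup):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;!-- users.xml (or a file under users.d/) --&amp;gt;
&amp;lt;clickhouse&amp;gt;
    &amp;lt;profiles&amp;gt;
        &amp;lt;default&amp;gt;
            &amp;lt;join_use_nulls&amp;gt;1&amp;lt;/join_use_nulls&amp;gt;
        &amp;lt;/default&amp;gt;
    &amp;lt;/profiles&amp;gt;
&amp;lt;/clickhouse&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;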



&lt;h3&gt;
  
  
  📚 Reference
&lt;/h3&gt;

&lt;p&gt;For more detail on ClickHouse’s Join engine and its settings (including join_use_nulls), see: &lt;a href="https://clickhouse.com/docs/engines/table-engines/special/join" rel="noopener noreferrer"&gt;ClickHouse docs — Join table engine&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  🗨️ Discussion
&lt;/h3&gt;

&lt;p&gt;Have you ever been surprised by ClickHouse JOIN behavior?&lt;br&gt;
Do you leave join_use_nulls off for performance, or always turn it on for clarity?&lt;/p&gt;

</description>
      <category>clickhouse</category>
      <category>sql</category>
      <category>dataengineering</category>
      <category>join</category>
    </item>
    <item>
      <title>🛠️ Copying and Moving Data Between Tables in ClickHouse (with Different Engines)</title>
      <dc:creator>Shahab Ranjbary</dc:creator>
      <pubDate>Thu, 17 Apr 2025 00:27:58 +0000</pubDate>
      <link>https://dev.to/ranjbaryshahab/copying-and-moving-data-between-tables-in-clickhouse-with-different-engines-3kj0</link>
      <guid>https://dev.to/ranjbaryshahab/copying-and-moving-data-between-tables-in-clickhouse-with-different-engines-3kj0</guid>
      <description>&lt;p&gt;Sometimes in ClickHouse, you need to &lt;strong&gt;move or copy data between tables&lt;/strong&gt; — especially when you're changing the &lt;strong&gt;table engine&lt;/strong&gt;, for example from &lt;code&gt;MergeTree&lt;/code&gt; to &lt;code&gt;ReplicatedMergeTree&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;At first glance, it may seem like &lt;a href="https://clickhouse.com/docs/sql-reference/statements/alter/partition#move-partition-to-table" rel="noopener noreferrer"&gt;&lt;code&gt;MOVE PARTITION TO TABLE&lt;/code&gt;&lt;/a&gt; should do the trick — but in many cases, &lt;strong&gt;it doesn't work&lt;/strong&gt; due to strict requirements. Let’s go over why, and a better approach we used.&lt;/p&gt;

&lt;h2&gt;
  
  
  ❌ Why &lt;code&gt;MOVE PARTITION TO TABLE&lt;/code&gt; Didn’t Work for Us
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;ALTER TABLE source MOVE PARTITION TO TABLE dest&lt;/code&gt; command physically moves partitions &lt;strong&gt;and deletes them from the source table&lt;/strong&gt;. But it comes with strict requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Same &lt;strong&gt;structure&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;✅ Same &lt;strong&gt;partition key&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;✅ Same &lt;strong&gt;primary key&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;✅ Same &lt;strong&gt;order by&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;✅ Same &lt;strong&gt;storage policy&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;✅ Same &lt;strong&gt;engine family&lt;/strong&gt; (e.g., both must be &lt;code&gt;MergeTree&lt;/code&gt; or both &lt;code&gt;ReplicatedMergeTree&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;✅ Same or superset of &lt;strong&gt;indices and projections&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So when you’re trying to &lt;strong&gt;move data from a &lt;code&gt;MergeTree&lt;/code&gt; table to a &lt;code&gt;ReplicatedMergeTree&lt;/code&gt; table&lt;/strong&gt;, this query &lt;strong&gt;will fail&lt;/strong&gt; because the engines are different.&lt;/p&gt;

&lt;h2&gt;
  
  
  ✅ Alternative: &lt;code&gt;ATTACH PARTITION FROM&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Instead, we used this command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;new_table&lt;/span&gt; &lt;span class="n"&gt;ATTACH&lt;/span&gt; &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;partition_expr&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;old_table&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This &lt;strong&gt;copies the data&lt;/strong&gt; (without deleting it from the source) and &lt;strong&gt;supports different engine types&lt;/strong&gt;, which makes it ideal for our use case.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✅ Requirements:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Same &lt;strong&gt;structure&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Same &lt;strong&gt;partition key&lt;/strong&gt;, &lt;strong&gt;order by&lt;/strong&gt;, and &lt;strong&gt;primary key&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Same &lt;strong&gt;storage policy&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Indices/projections must &lt;strong&gt;match&lt;/strong&gt;
(or be a &lt;strong&gt;superset&lt;/strong&gt; if &lt;code&gt;enforce_index_structure_match_on_partition_manipulation&lt;/code&gt; is disabled)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s a &lt;strong&gt;safe and clean&lt;/strong&gt; way to transfer data between tables — even when they use different engines.&lt;/p&gt;
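&lt;p&gt;A minimal end-to-end sketch of the workflow (table names, schema, and the ZooKeeper path are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Source: plain MergeTree, partitioned by month
CREATE TABLE events_old (dt Date, user_id UInt64, value Float64)
ENGINE = MergeTree
PARTITION BY toYYYYMM(dt)
ORDER BY (user_id, dt);

-- Destination: same schema, keys, and partitioning, but replicated
CREATE TABLE events_new (dt Date, user_id UInt64, value Float64)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events_new', '{replica}')
PARTITION BY toYYYYMM(dt)
ORDER BY (user_id, dt);

-- Copy one partition; the source table keeps its data
ALTER TABLE events_new ATTACH PARTITION 202504 FROM events_old;

-- Validate before dropping anything from the source
SELECT count() FROM events_old WHERE toYYYYMM(dt) = 202504;
SELECT count() FROM events_new WHERE toYYYYMM(dt) = 202504;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;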

&lt;h3&gt;
  
  
  🧹 Final Step: Drop the Old Partition
&lt;/h3&gt;

&lt;p&gt;Once the data is successfully attached and validated, we clean up the original table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;old_table&lt;/span&gt; &lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;partition_expr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This &lt;strong&gt;removes the partition&lt;/strong&gt; from the old table (and on all replicas, if any).&lt;br&gt;&lt;br&gt;
Data is &lt;strong&gt;physically deleted after ~10 minutes&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  🧾 What About &lt;code&gt;ATTACH TABLE AS REPLICATED&lt;/code&gt;?
&lt;/h3&gt;

&lt;p&gt;ClickHouse also supports converting a table like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;DETACH&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;my_table&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;ATTACH&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;my_table&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;REPLICATED&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SYSTEM&lt;/span&gt; &lt;span class="n"&gt;RESTORE&lt;/span&gt; &lt;span class="n"&gt;REPLICA&lt;/span&gt; &lt;span class="n"&gt;my_table&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But &lt;strong&gt;we chose not to use this option&lt;/strong&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Why?
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Our goal wasn’t to convert the existing table but to &lt;strong&gt;control the replication and migration process manually&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So while &lt;code&gt;ATTACH TABLE AS REPLICATED&lt;/code&gt; is &lt;strong&gt;valid and useful&lt;/strong&gt; in some scenarios, it &lt;strong&gt;wasn’t the right fit for us&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  🧪 Summary: Best Way to Copy Between Tables with Different Engines
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;🔧 Create a &lt;strong&gt;new table&lt;/strong&gt; with the same schema, keys, and partitioning.&lt;/li&gt;
&lt;li&gt;📥 Use &lt;code&gt;ATTACH PARTITION FROM&lt;/code&gt; to &lt;strong&gt;copy data&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Validate&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;🗑️ &lt;strong&gt;Drop&lt;/strong&gt; the original partition if needed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This approach is &lt;strong&gt;clean&lt;/strong&gt;, &lt;strong&gt;safe&lt;/strong&gt;, and works even when the engines are different.&lt;/p&gt;

&lt;h3&gt;
  
  
  📚 Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://clickhouse.com/docs/sql-reference/statements/alter/partition" rel="noopener noreferrer"&gt;Manipulating Partitions and Parts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://clickhouse.com/docs/sql-reference/statements/attach#attach-mergetree-table-as-replicatedmergetree" rel="noopener noreferrer"&gt;ClickHouse Attach MergeTree table as ReplicatedMergeTree&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;🧡 Thanks for reading!&lt;br&gt;
If you’ve used other approaches or faced gotchas while doing &lt;strong&gt;engine migrations in ClickHouse&lt;/strong&gt;, let’s discuss in the comments!&lt;/p&gt;

</description>
      <category>clickhouse</category>
      <category>dataengineering</category>
      <category>sql</category>
      <category>database</category>
    </item>
    <item>
      <title>Choosing the Right Real-Time Stream Processing Framework</title>
      <dc:creator>Shahab Ranjbary</dc:creator>
      <pubDate>Sun, 08 Sep 2024 23:08:53 +0000</pubDate>
      <link>https://dev.to/ranjbaryshahab/comparing-top-real-time-stream-processing-frameworks-ek8</link>
      <guid>https://dev.to/ranjbaryshahab/comparing-top-real-time-stream-processing-frameworks-ek8</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In the dynamic realm of real-time stream processing, tools like ksqlDB, Apache Spark, Apache Flink, Apache Storm, Timeplus Proton, and RisingWave each offer distinct advantages tailored to diverse use cases and requirements. This article delves into these leading stream processing frameworks' unique capabilities, use cases, and pros and cons.&lt;/p&gt;

&lt;h3&gt;
  
  
  ksqlDB
&lt;/h3&gt;

&lt;p&gt;ksqlDB simplifies real-time stream processing by providing a SQL-based layer over Apache Kafka Streams, ensuring seamless integration within Kafka environments.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqz0upe3hsnxaawn8pb3i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqz0upe3hsnxaawn8pb3i.png" alt="ksql" width="800" height="453"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Use Cases
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Materialized Cache: Build and serve incrementally updated stateful views. Useful for applications requiring up-to-date information without continuous database queries.&lt;/li&gt;
&lt;li&gt;Streaming ETL Pipeline: Manipulate in-flight data to connect arbitrary sources and sinks. Facilitates seamless real-time data integration across systems.&lt;/li&gt;
&lt;li&gt;Event-Driven Microservices: Trigger changes based on observed event patterns in a stream. Ideal for real-time notifications, monitoring, and automated decision-making systems.&lt;/li&gt;
&lt;li&gt;Stream Processing Cookbook: Use predefined recipes to quickly implement common streaming data tasks, which benefits beginners and experienced users.&lt;/li&gt;
&lt;/ul&gt;
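&lt;p&gt;For example, a materialized cache is just a continuously updated table defined over a stream (stream and column names here are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Incrementally maintained per-customer totals, queryable at any time
CREATE TABLE order_totals AS
  SELECT customer_id, SUM(amount) AS total
  FROM orders_stream
  GROUP BY customer_id
  EMIT CHANGES;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;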

&lt;h4&gt;
  
  
  Pros
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Strong Kafka Integration: Built-in integration with Apache Kafka for streamlined event-streaming applications.&lt;/li&gt;
&lt;li&gt;Low Latency: Delivers latency as low as ten milliseconds, ideal for real-time analytics.&lt;/li&gt;
&lt;li&gt;SQL Interface: Familiar SQL interface for easy adoption.&lt;/li&gt;
&lt;li&gt;Fault Tolerance: Built-in fault tolerance mechanisms.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Cons
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Limited Analytics Capability: Lags behind competitors like Flink and Spark in advanced analytics tools.&lt;/li&gt;
&lt;li&gt;Learning Curve: Slightly steeper learning curve compared to simpler frameworks.&lt;/li&gt;
&lt;li&gt;ETL Limitations: Not ideal for complex ETL operations.&lt;/li&gt;
&lt;li&gt;Checkpointing Challenges: Advanced checkpointing mechanisms are lacking, impacting exactly-once processing in complex scenarios.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Apache Spark (Structured Streaming)
&lt;/h3&gt;

&lt;p&gt;Apache Spark offers scalability and flexibility through its Structured Streaming component, supporting batch and stream processing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7d7yuq57mi6fcvi06oa9.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7d7yuq57mi6fcvi06oa9.jpg" alt="Spark Structured Streaming" width="800" height="573"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Use Cases
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Streaming ETL: Continuously clean and aggregate data before pushing it into data repositories, simplifying conventional ETL processes.&lt;/li&gt;
&lt;li&gt;Data Enrichment: Combine live and historical data for real-time analysis, enhancing data quality for personalized marketing efforts.&lt;/li&gt;
&lt;li&gt;Trigger Event Detection: Detect and respond to unusual patterns or events in real-time, such as fraudulent transactions or critical health alerts.&lt;/li&gt;
&lt;li&gt;Complex Session Analysis: Analyze user activities in real-time, continuously updating machine learning models. Used by companies like Netflix for real-time show recommendations.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Pros
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Language Support: Supports Scala, Java, Python, R, C#, F#.&lt;/li&gt;
&lt;li&gt;Community Support: Extensive community and ecosystem support.&lt;/li&gt;
&lt;li&gt;Exactly-Once Processing: Ensures exactly-once processing semantics.&lt;/li&gt;
&lt;li&gt;Versatility: Integrates well with diverse data sources beyond Kafka.&lt;/li&gt;
&lt;li&gt;SQL Capabilities: Supports SQL for querying and processing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Cons
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Micro-Batch Processing: Operates on micro-batch processing, potentially limiting ultra-low latency applications.&lt;/li&gt;
&lt;li&gt;High Memory Consumption: In-memory processing architecture can lead to high memory usage.&lt;/li&gt;
&lt;li&gt;Complexity: Deployment and management complexity, particularly in large-scale environments.&lt;/li&gt;
&lt;li&gt;Integration Challenges: Direct SQL querying on Kafka topics is not straightforward.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Apache Flink
&lt;/h3&gt;

&lt;p&gt;Apache Flink excels in low-latency, high-throughput stream processing with unified APIs for both batch and stream processing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F20b7tlm8hdij5oyszp2h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F20b7tlm8hdij5oyszp2h.png" alt="Apache Flink" width="800" height="318"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Use Cases
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Event-Driven Applications: Real-time fraud detection, anomaly detection, rule-based alerting, business process monitoring, and handling real-time user interactions in social networks.&lt;/li&gt;
&lt;li&gt;Data Analytics Applications: Quality monitoring of telco networks, product updates analysis, ad-hoc analysis of live data, and large-scale graph analysis.&lt;/li&gt;
&lt;li&gt;Data Pipeline Applications: Real-time search index building and continuous ETL in e-commerce environments.&lt;/li&gt;
&lt;/ul&gt;
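&lt;p&gt;As a taste of the unified SQL API, a tumbling-window aggregation in Flink SQL looks like this (table and column names are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- One-minute tumbling windows over a stream of clicks
SELECT window_start, user_id, COUNT(*) AS clicks
FROM TABLE(
  TUMBLE(TABLE clicks, DESCRIPTOR(event_time), INTERVAL '1' MINUTES))
GROUP BY window_start, window_end, user_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;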

&lt;h4&gt;
  
  
  Pros
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Low Latency: Minimal latency and high throughput suitable for demanding real-time applications.&lt;/li&gt;
&lt;li&gt;Unified API: Simplifies development with a unified API for batch and stream processing.&lt;/li&gt;
&lt;li&gt;Advanced Analytics: Offers advanced analytical functions and robust state management.&lt;/li&gt;
&lt;li&gt;Exactly-Once Processing: Supports exactly-once processing semantics.&lt;/li&gt;
&lt;li&gt;Windowing Support: Flexible windowing capabilities based on both time and count.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Cons
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Integration Complexity: Integration with other big data ecosystems like Hadoop/Hive can be challenging.&lt;/li&gt;
&lt;li&gt;Limited Language Support: Supports Java, Scala, Python, and SQL, limiting language flexibility.&lt;/li&gt;
&lt;li&gt;Community Size: Smaller community compared to Apache Spark, potentially affecting support availability.&lt;/li&gt;
&lt;li&gt;Learning Curve: Requires some learning effort, especially for complex use cases.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Timeplus Proton
&lt;/h3&gt;

&lt;p&gt;Timeplus Proton leverages ClickHouse for efficient, high-speed streaming SQL processing with an Apache 2.0 open-source license.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxey7gqj7zaqrg5tulzuf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxey7gqj7zaqrg5tulzuf.png" alt="Timeplus Proton" width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Use Cases
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;FinTech: Real-time post-trade analytics and real-time pricing for financial instruments.&lt;/li&gt;
&lt;li&gt;DevOps: Real-time observability, metering for usage-based pricing, and real-time GitHub insights.&lt;/li&gt;
&lt;li&gt;Security Compliance: SOC2 compliance dashboards and alerts, container vulnerability monitoring, monitoring Superblocks user activities, and protecting sensitive information in Slack.&lt;/li&gt;
&lt;li&gt;IoT: Real-time fleet monitoring and real-time sensor data from smartphones.&lt;/li&gt;
&lt;li&gt;Video Streaming: Analyzing Livepeer video engagement metrics.&lt;/li&gt;
&lt;li&gt;Customer 360: Auth0 notifications for new signups, HubSpot custom dashboards/alerts, Jitsu clickstream analysis, and real-time Twitter marketing.&lt;/li&gt;
&lt;li&gt;Misc: Wildfire monitoring and alerting, and data-driven parenting tasks.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Pros
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;High Performance: Utilizes ClickHouse for efficient data processing and storage.&lt;/li&gt;
&lt;li&gt;Low Latency: Designed for minimal latency and efficient resource usage.&lt;/li&gt;
&lt;li&gt;Flexibility: Supports various data sources beyond Kafka.&lt;/li&gt;
&lt;li&gt;Open Source: Apache 2.0 licensed, offering flexibility and community support.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Cons
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Limited Kafka Integration: Not deeply integrated with Kafka-specific features compared to other solutions.&lt;/li&gt;
&lt;li&gt;Use Case Specific: More suited for specific use cases rather than general-purpose stream processing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  RisingWave
&lt;/h3&gt;

&lt;p&gt;RisingWave, built in Rust, emphasizes efficient I/O handling and lightweight scaling with a custom storage engine for SQL-based stream processing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Furyl9yj9mw5dpvc0geqs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Furyl9yj9mw5dpvc0geqs.png" alt="RisingWave" width="800" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Use Cases
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Streaming ETL: Continuous data transformation and integration.&lt;/li&gt;
&lt;li&gt;Real-Time Analytics: On-the-fly data analysis and reporting.&lt;/li&gt;
&lt;li&gt;Event-Driven Applications: Real-time event processing and decision-making.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Pros
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Efficient I/O Operations: Optimized for high-performance I/O handling.&lt;/li&gt;
&lt;li&gt;LSM-Tree Based Engine: Efficient data management with an LSM-Tree-based storage engine.&lt;/li&gt;
&lt;li&gt;SQL Support: Provides SQL-based stream processing with materialized views.&lt;/li&gt;
&lt;li&gt;Global Checkpoints: Ensures consistent results with frequent global checkpoints.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Cons
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Limited Ecosystem and Community Support: As a newer framework, it may have limited ecosystem integrations and community support.&lt;/li&gt;
&lt;li&gt;Scalability Considerations: Scalability challenges may arise as it grows and scales out.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Apache Storm
&lt;/h3&gt;

&lt;p&gt;Apache Storm is an open-source distributed stream processing engine. Written mainly in Clojure and Java, it gained popularity after Twitter acquired its creator, BackType, in 2011 and open-sourced the project. Storm can be considered a pioneer in real-time distributed stream processing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2aupev3behc3q2fp1eyx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2aupev3behc3q2fp1eyx.png" alt="Apache Storm" width="460" height="348"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Use Cases
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Real-Time Analytics: Used by companies like Spotify for real-time monitoring and analytics.&lt;/li&gt;
&lt;li&gt;Online Machine Learning: Suitable for continuous computation and distributed machine learning tasks.&lt;/li&gt;
&lt;li&gt;ETL Processes: Handles complex ETL tasks efficiently.&lt;/li&gt;
&lt;li&gt;Distributed RPC: Enables distributed remote procedure calls for real-time applications.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Pros
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Real-Time Processing: Designed for low-latency, high-throughput applications.&lt;/li&gt;
&lt;li&gt;Flexibility: Can be used with any programming language and integrates well with various data sources.&lt;/li&gt;
&lt;li&gt;Scalability: Capable of processing over a million tuples per second per node.&lt;/li&gt;
&lt;li&gt;Fault Tolerance: Ensures reliable processing with built-in fault tolerance mechanisms.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Cons
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Complexity: Requires careful setup and configuration; can be tricky to install and integrate with other technologies.&lt;/li&gt;
&lt;li&gt;Limited Windowing Support: Core Storm lacks robust support for event-time processing and windowing.&lt;/li&gt;
&lt;li&gt;API Limitations: Does not provide high-level APIs like SQL natively, making it harder for new developers to adopt.&lt;/li&gt;
&lt;li&gt;State Management: Stateless by default, which can complicate fault tolerance and recovery mechanisms.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Choosing the right stream processing framework depends on specific requirements such as latency, scalability, integration needs, and operational complexity. Each tool offers unique strengths and trade-offs, making it crucial to align them with project goals and constraints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ksqlDB&lt;/strong&gt; excels in Kafka-centric environments but may not meet advanced analytics needs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apache Spark&lt;/strong&gt; (Structured Streaming) provides extensive community support and exactly-once processing but relies on micro-batch execution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apache Flink&lt;/strong&gt; is optimal for low-latency, high-analytics environments with a unified API but requires careful integration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Timeplus Proton&lt;/strong&gt; offers high performance and flexibility with minimal latency but lacks deep Kafka integration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RisingWave&lt;/strong&gt; focuses on efficient I/O handling and SQL-based processing, suitable for specialized applications but with scalability considerations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apache Storm&lt;/strong&gt; provides robust real-time processing capabilities but comes with a steeper learning curve and limited high-level API support.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Additional Insights from Streaming Databases
&lt;/h3&gt;

&lt;p&gt;In today’s data-driven landscape, streaming databases like RisingWave and ksqlDB stand out for their ability to handle high-speed, real-time data ingestion and processing. Unlike stream processing engines focused solely on real-time data manipulation, streaming databases integrate storage capabilities, enabling seamless real-time querying and analytics. This unique combination positions them as pivotal tools for organizations aiming to harness real-time insights and maintain a competitive advantage in a fast-paced digital environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://medium.com/@RisingWave_Engineering/streaming-databases-everything-you-wanted-to-know-5229625708ce" rel="noopener noreferrer"&gt;Streaming Databases: Everything You Wanted to Know&lt;br&gt;
&lt;/a&gt;&lt;a href="https://www.timeplus.com/timeplus-vs-ksqldb" rel="noopener noreferrer"&gt;Timeplus vs. ksqlDB&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.timeplus.com/post/proton-ksqldb-alternative" rel="noopener noreferrer"&gt;Proton: An Open-Source Alternative to ksqlDB for Streaming Processing&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.confluent.io/blog/ksql-streaming-sql-for-apache-kafka" rel="noopener noreferrer"&gt;Introducing KSQL: Streaming SQL for Apache Kafka&lt;/a&gt;&lt;br&gt;
&lt;a href="https://medium.com/@RisingWave_Engineering/choosing-a-stream-processing-system-this-article-has-you-covered-df046e71d862" rel="noopener noreferrer"&gt;Choosing a Stream Processing System?&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.confluent.io/blog/kafka-streams-vs-ksqldb-compared/" rel="noopener noreferrer"&gt;Kafka Streams and ksqlDB Compared – How to Choose&lt;br&gt;
&lt;/a&gt;&lt;a href="https://medium.com/@byanalytixlabs/flink-vs-kafka-a-quick-guide-to-stream-processing-engines-b09dd0e6b8af" rel="noopener noreferrer"&gt;Flink vs. Kafka: A Quick Guide to Stream Processing Engines&lt;/a&gt;&lt;br&gt;
&lt;a href="https://redpanda-data.medium.com/comparing-ksqldb-spark-sql-and-flink-sql-b4e495dc34fd" rel="noopener noreferrer"&gt;Comparing ksqlDB, Spark SQL, and Flink SQL&lt;/a&gt;&lt;br&gt;
&lt;a href="https://risingwave.com/risingwave-vs-ksqldb/" rel="noopener noreferrer"&gt;RisingWave vs ksqlDB&lt;/a&gt;&lt;br&gt;
&lt;a href="https://aws.amazon.com/compare/the-difference-between-kafka-and-spark/" rel="noopener noreferrer"&gt;What’s the Difference Between Kafka and Spark?&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.jesse-anderson.com/2019/10/why-i-recommend-my-clients-not-use-ksql-and-kafka-streams/" rel="noopener noreferrer"&gt;Why I Recommend My Clients NOT Use KSQL and Kafka Streams&lt;/a&gt;&lt;a href="https://www.projectpro.io/compare/apache-flink-vs-apache-storm" rel="noopener noreferrer"&gt;Apache Flink vs Apache Storm: Which Tool is Better for Your Next Project?&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.designgurus.io/blog/kafka-streams-%20apache-flink-apache-storm" rel="noopener noreferrer"&gt;Kafka Streams vs. Apache Flink vs. Apache Storm: Stream Processing Showdown&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.confluent.io/blog/choosing-a-data-streaming-platform-and-stream-processing-engine/#:~:text=Ideally%2C%20when%20choosing%20a%20data,SQL%2C%20Python%2C%20and%20Java." rel="noopener noreferrer"&gt;Life Happens in Real Time, Not in Batches: Choosing a Data Streaming Platform and Stream Processing Engine&lt;/a&gt;&lt;/p&gt;

</description>
      <category>flink</category>
      <category>spark</category>
      <category>ksqldb</category>
      <category>timeplus</category>
    </item>
    <item>
      <title>Tracking User Account Changes in Real-Time: A Debezium and ClickHouse Implementation</title>
      <dc:creator>Shahab Ranjbary</dc:creator>
      <pubDate>Mon, 15 Jul 2024 18:00:07 +0000</pubDate>
      <link>https://dev.to/ranjbaryshahab/tracking-user-account-changes-in-real-time-a-debezium-and-clickhouse-implementation-350</link>
      <guid>https://dev.to/ranjbaryshahab/tracking-user-account-changes-in-real-time-a-debezium-and-clickhouse-implementation-350</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In modern e-commerce applications, tracking changes to critical data like user account types in real time is essential for business intelligence and regulatory compliance. Traditional databases often overwrite old values, making historical analysis challenging. Enter Change Data Capture (CDC), a technique that captures and stores every change to your data, enabling a comprehensive audit trail and real-time analytics.&lt;/p&gt;
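&lt;p&gt;The idea behind CDC can be shown in a few lines of Python: instead of overwriting a value, every change is appended to a log. This is a conceptual toy only; Debezium actually tails the database's write-ahead log rather than wrapping application writes like this:&lt;/p&gt;

```python
# Toy illustration of the CDC idea: keep an append-only change log
# alongside the current state, instead of overwriting old values.
# (Concept sketch only; Debezium reads the database's write-ahead
# log rather than intercepting writes in application code.)
state = {}
change_log = []

def apply_change(user_id, account_type):
    before = state.get(user_id)
    state[user_id] = account_type
    change_log.append({"user_id": user_id,
                       "before": before,
                       "after": account_type})

apply_change(1, "Bronze")
apply_change(1, "Gold")

# The current state only remembers "Gold", but the log preserves
# the full history, including the earlier "Bronze" value.
print(state[1], len(change_log))  # prints "Gold 2"
```

&lt;p&gt;The current-state dictionary answers "what is it now?", while the change log answers "how did it get there?" — exactly the question CDC pipelines exist to answer.&lt;/p&gt;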

&lt;h2&gt;
  
  
  Scenario Overview
&lt;/h2&gt;

&lt;p&gt;Imagine you manage a PostgreSQL database for an e-commerce platform where user account types (Bronze, Silver, Gold) frequently change. However, PostgreSQL only retains the latest values, making it difficult to analyze past states of user accounts. To address this, we'll use Debezium and ClickHouse to track and store all account-type changes with timestamps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up Your Environment
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Getting the project from Git
&lt;/h3&gt;

&lt;p&gt;To get started, clone the &lt;a href="https://github.com/ranjbaryshahab/postgres-cdc-clickhouse" rel="noopener noreferrer"&gt;postgres-cdc-clickhouse GitHub repository&lt;/a&gt; and launch the environment using Docker Compose:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/ranjbaryshahab/postgres-cdc-clickhouse
&lt;span class="nb"&gt;cd &lt;/span&gt;postgres-cdc-clickhouse
docker-compose up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Creating the PostgreSQL Table
&lt;/h3&gt;

&lt;p&gt;First, define the &lt;code&gt;users&lt;/code&gt; table in PostgreSQL to store user data and track changes. The &lt;code&gt;REPLICA IDENTITY FULL&lt;/code&gt; setting makes PostgreSQL write the complete previous row into its logical replication stream, which Debezium needs to populate the &lt;code&gt;before&lt;/code&gt; field of each change event:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="nb"&gt;SERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;account_type&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;updated_at&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'UTC'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'UTC'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="n"&gt;REPLICA&lt;/span&gt; &lt;span class="k"&gt;IDENTITY&lt;/span&gt; &lt;span class="k"&gt;FULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then insert some sample data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;account_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'user1'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Bronze'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'user2'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Silver'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'user3'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Gold'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgbpd0fqt9pzyzr82i4ie.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgbpd0fqt9pzyzr82i4ie.png" alt="Postgres Result" width="799" height="105"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting Up Debezium Connector
&lt;/h3&gt;

&lt;p&gt;Configure the Debezium PostgreSQL connector in Kafka Connect to capture changes from the &lt;code&gt;users&lt;/code&gt; table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;--location&lt;/span&gt; &lt;span class="nt"&gt;--request&lt;/span&gt; POST &lt;span class="s1"&gt;'http://localhost:8083/connectors'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s1"&gt;'Content-Type: application/json'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--data-raw&lt;/span&gt; &lt;span class="s1"&gt;'{
    "name": "raw_data.shop-connector",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.dbname": "postgres",
        "database.history.kafka.bootstrap.servers": "kafka:9092",
        "database.history.kafka.topic": "schema-changes.shop",
        "database.hostname": "postgres",
        "database.password": "postgres",
        "database.port": "5432",
        "database.server.name": "shop",
        "database.user": "postgres",
        "name": "raw_data.shop-connector",
        "plugin.name": "pgoutput",
        "table.include.list": "public.users",
        "tasks.max": "1",
        "topic.creation.default.cleanup.policy": "delete",
        "topic.creation.default.partitions": "1",
        "topic.creation.default.replication.factor": "1",
        "topic.creation.default.retention.ms": "604800000",
        "topic.creation.enable": "true",
        "topic.prefix": "raw_data.shop",
        "database.history.skip.unparseable.ddl": "true",
        "key.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "key.converter.schemas.enable": "false",
        "value.converter.schemas.enable": "false",
        "snapshot.mode": "initial"
    }
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See the result in the Redpanda Console:&lt;br&gt;
Go to &lt;a href="http://localhost:9080/topics" rel="noopener noreferrer"&gt;http://localhost:9080/topics&lt;/a&gt; to see the topics created by the Debezium connector.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6zou053xr2ywlf9rajrf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6zou053xr2ywlf9rajrf.png" alt="Redpanda Console Data" width="800" height="382"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Tracking Changes in Real-Time
&lt;/h3&gt;

&lt;p&gt;Now, let's simulate an update in the &lt;code&gt;users&lt;/code&gt; table to see the change reflected in our Kafka topics and ClickHouse:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Update user with ID 1 to change the account type to 'Gold'&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;account_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Gold'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;updated_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'UTC'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When this update runs, Debezium captures the change and produces an event on the Kafka topic. The event contains both the previous and new values of the &lt;code&gt;account_type&lt;/code&gt; field, along with the operation type (&lt;code&gt;u&lt;/code&gt; for update).&lt;/p&gt;
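&lt;p&gt;To make the event structure concrete, here is a minimal Python sketch that parses such a message and applies the same filter the materialized view uses. The payload shown is a hypothetical, simplified version of a real Debezium event, which carries additional metadata (&lt;code&gt;source&lt;/code&gt;, &lt;code&gt;ts_ms&lt;/code&gt;, transaction info, and so on):&lt;/p&gt;

```python
import json

# Hypothetical, simplified Debezium update event for the change above;
# real events include extra metadata (source, ts_ms, transaction info).
raw = '''{
  "before": {"user_id": 1, "username": "user1", "account_type": "Bronze"},
  "after":  {"user_id": 1, "username": "user1", "account_type": "Gold"},
  "op": "u"
}'''

event = json.loads(raw)

# Keep only genuine account-type switches, mirroring the WHERE clause
# of the materialized view: op = 'u' and before differs from after.
is_switch = (event["op"] == "u"
             and event["before"]["account_type"] != event["after"]["account_type"])

if is_switch:
    print(event["before"]["account_type"], "became",
          event["after"]["account_type"])  # prints "Bronze became Gold"
```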

&lt;p&gt;See the result in the Redpanda Console:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvjgy6frm9tvfgsrt296v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvjgy6frm9tvfgsrt296v.png" alt="Redpanda Console Tracking Data" width="800" height="380"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Storing Changes in ClickHouse
&lt;/h3&gt;

&lt;p&gt;Use ClickHouse to store every change in a dedicated table, fed by a Kafka engine table and a materialized view:&lt;/p&gt;

&lt;h4&gt;
  
  
  Create ClickHouse Database and Table
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;shop&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;shop&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;account_type_switch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;before_account_type&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;after_account_type&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;updated_at&lt;/span&gt; &lt;span class="nb"&gt;DateTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="nb"&gt;DateTime&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ReplacingMergeTree&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;updated_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;SETTINGS&lt;/span&gt; &lt;span class="n"&gt;index_granularity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8192&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Set Up Kafka Engine Table in ClickHouse
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;kafka_shop&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;kafka_shop&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kafka__account_type_switch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;`jsonString`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Kafka&lt;/span&gt;
&lt;span class="n"&gt;SETTINGS&lt;/span&gt; &lt;span class="n"&gt;kafka_broker_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'broker:29092'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;kafka_topic_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'raw_data.shop.public.users'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;kafka_group_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'raw_date_clickhouse'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;kafka_format&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'JSONAsString'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Create Materialized View
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;MATERIALIZED&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;kafka_shop&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;consumer__account_type_switch&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;shop&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;account_type_switch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;before_account_type&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;after_account_type&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;updated_at&lt;/span&gt; &lt;span class="nb"&gt;DateTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="nb"&gt;DateTime&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;JSONExtract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;JSONExtract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jsonString&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'after'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Nullable(String)'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'user_id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Nullable(UInt32)'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;JSONExtract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;JSONExtract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jsonString&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'after'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Nullable(String)'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'username'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Nullable(String)'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;JSONExtract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;JSONExtract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jsonString&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'before'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Nullable(String)'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'account_type'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Nullable(String)'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;before_account_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;JSONExtract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;JSONExtract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jsonString&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'after'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Nullable(String)'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'account_type'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Nullable(String)'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;after_account_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;toDateTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;JSONExtract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;JSONExtract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jsonString&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'after'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Nullable(String)'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'updated_at'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Nullable(UInt64)'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;updated_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;toDateTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;JSONExtract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;JSONExtract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jsonString&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'after'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Nullable(String)'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'created_at'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Nullable(UInt64)'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;kafka_shop&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kafka__account_type_switch&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;before_account_type&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;after_account_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;JSONExtract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jsonString&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'op'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Nullable(String)'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'u'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
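&lt;p&gt;The division by 1,000,000 in the view is worth a note: Debezium encodes PostgreSQL &lt;code&gt;TIMESTAMP&lt;/code&gt; columns as microseconds since the Unix epoch (its &lt;code&gt;MicroTimestamp&lt;/code&gt; type), while &lt;code&gt;toDateTime&lt;/code&gt; expects seconds. The conversion can be checked in Python; the sample value below is assumed for illustration:&lt;/p&gt;

```python
from datetime import datetime, timezone

# Debezium MicroTimestamp: microseconds since the Unix epoch.
# Assumed sample value for illustration.
micro_ts = 1721066407000000

# Same conversion the materialized view performs: divide by 1,000,000
# to get seconds, which toDateTime (and fromtimestamp) expect.
seconds = micro_ts // 1000000
dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
print(dt.isoformat())  # prints "2024-07-15T18:00:07+00:00"
```

&lt;p&gt;Forgetting this division is a common pitfall: passing raw microseconds to &lt;code&gt;toDateTime&lt;/code&gt; silently produces out-of-range dates.&lt;/p&gt;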



&lt;h4&gt;
  
  
  Verify the data in ClickHouse
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9sfiql9vdsib7v3i3r14.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9sfiql9vdsib7v3i3r14.png" alt="ClickHouse Result" width="800" height="48"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Implementing CDC with Debezium and ClickHouse enables your e-commerce platform to maintain a complete user account type change history. This setup empowers detailed auditing, regulatory compliance, and advanced analytics, providing valuable insights into user behavior and business trends.&lt;/p&gt;

&lt;p&gt;By following these steps, you can enhance your data management practices and leverage real-time change tracking to drive informed decision-making in your business.&lt;/p&gt;

</description>
      <category>debezium</category>
      <category>clickhouse</category>
      <category>kafka</category>
      <category>cdc</category>
    </item>
    <item>
      <title>Storing and Handling Confidential Data in ClickHouse</title>
      <dc:creator>Shahab Ranjbary</dc:creator>
      <pubDate>Sat, 01 Jun 2024 03:42:02 +0000</pubDate>
      <link>https://dev.to/ranjbaryshahab/storing-and-handling-confidential-data-in-clickhouse-2p04</link>
      <guid>https://dev.to/ranjbaryshahab/storing-and-handling-confidential-data-in-clickhouse-2p04</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Handling sensitive data securely is critical, especially when migrating from one database to another. In this article, we'll explore how to manage and store confidential data in ClickHouse, focusing on a scenario where we migrate data from MySQL daily. We will use encryption to ensure data security and implement views to control access.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scenario
&lt;/h2&gt;

&lt;p&gt;We have a MySQL table containing users' general information, including confidential fields like phone numbers and national codes. Our goal is to migrate this data to ClickHouse every day while ensuring the sensitive fields are encrypted. &lt;/p&gt;

&lt;h2&gt;
  
  
  MySQL Table Setup
&lt;/h2&gt;

&lt;p&gt;First, let's set up our MySQL database and table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;general_info&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;first_name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;last_name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;phone_number&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;national_code&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;general_info&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Shahab'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Ranjbary'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'0912345678'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'123456'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Sepehr'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Ranjbary'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'0935999999'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'982345'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Mostafa'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Fekri'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'09192222222'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'3244442'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Saber'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Farshbaf'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'0912333435'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'454554'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  ClickHouse Table Setup
&lt;/h2&gt;

&lt;p&gt;Next, we create a similar table in ClickHouse with an additional column for the initialization vector (IV) used in encryption:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;general_info&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;first_name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;last_name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;phone_number&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;national_code&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;iv&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;
&lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="n"&gt;SETTINGS&lt;/span&gt; &lt;span class="n"&gt;index_granularity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8192&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Data Migration and Encryption
&lt;/h2&gt;

&lt;p&gt;We'll use the &lt;code&gt;encrypt&lt;/code&gt; function to encrypt the confidential fields with AES-256-GCM as the data is inserted into ClickHouse. You can automate this process with Airflow or another scheduling tool.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;general_info&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;encrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'aes-256-gcm'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'keykeykeykeykeykeykeykeykeykey01'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'iv1'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;encrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'aes-256-gcm'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'keykeykeykeykeykeykeykeykeykey01'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'iv1'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;encrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'aes-256-gcm'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'keykeykeykeykeykeykeykeykeykey01'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'iv1'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;encrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'aes-256-gcm'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;national_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'keykeykeykeykeykeykeykeykeykey01'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'iv1'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;national_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'iv1'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;iv&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;mysql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'mysql:3306'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'users'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'general_info'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'root'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'123456'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
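&lt;p&gt;Note: reusing a single fixed IV such as &lt;code&gt;'iv1'&lt;/code&gt; with AES-GCM is risky — GCM loses its security guarantees when the same IV is used twice under one key. A sketch of a safer variant (assuming ClickHouse's &lt;code&gt;randomString&lt;/code&gt; function is available) generates a random 12-byte IV per row and stores it alongside the ciphertext, so it can be used later for decryption:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Hypothetical variant: one random IV per row instead of a shared 'iv1'
INSERT INTO users.general_info
SELECT
    id,
    encrypt('aes-256-gcm', first_name, 'keykeykeykeykeykeykeykeykeykey01', iv) AS first_name,
    encrypt('aes-256-gcm', last_name, 'keykeykeykeykeykeykeykeykeykey01', iv) AS last_name,
    encrypt('aes-256-gcm', phone_number, 'keykeykeykeykeykeykeykeykeykey01', iv) AS phone_number,
    encrypt('aes-256-gcm', national_code, 'keykeykeykeykeykeykeykeykeykey01', iv) AS national_code,
    iv
FROM
(
    -- 12 bytes is the recommended IV size for GCM
    SELECT *, randomString(12) AS iv
    FROM mysql('mysql:3306', 'users', 'general_info', 'root', '123456')
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;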



&lt;h2&gt;
  
  
  Real-Time Data Handling
&lt;/h2&gt;

&lt;p&gt;If you ingest through Kafka, NATS, or another streaming service, you can encrypt data on arrival with a materialized view. ClickHouse consumes streams through table engines rather than a table function, so the view below assumes a source table &lt;code&gt;users.kafka_queue&lt;/code&gt; built on the Kafka engine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;MATERIALIZED&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;general_info_mv&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;general_info&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;encrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'aes-256-gcm'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'keykeykeykeykeykeykeykeykeykey01'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'iv1'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;encrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'aes-256-gcm'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'keykeykeykeykeykeykeykeykeykey01'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'iv1'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;encrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'aes-256-gcm'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'keykeykeykeykeykeykeykeykeykey01'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'iv1'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;encrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'aes-256-gcm'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;national_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'keykeykeykeykeykeykeykeykeykey01'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'iv1'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;national_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'iv1'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;iv&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kafka_queue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Decryption and Secure Access
&lt;/h2&gt;

&lt;p&gt;For users who need access to decrypted data, we create a view that decrypts the confidential fields on the fly using the &lt;code&gt;tryDecrypt&lt;/code&gt; function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;general_info_decryption&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tryDecrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'aes-256-gcm'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'keykeykeykeykeykeykeykeykeykey01'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;iv&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tryDecrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'aes-256-gcm'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'keykeykeykeykeykeykeykeykeykey01'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;iv&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tryDecrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'aes-256-gcm'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'keykeykeykeykeykeykeykeykeykey01'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;iv&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tryDecrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'aes-256-gcm'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;national_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'keykeykeykeykeykeykeykeykeykey01'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;iv&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;national_code&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;general_info&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
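&lt;p&gt;Unlike &lt;code&gt;decrypt&lt;/code&gt;, which throws an exception on failure, &lt;code&gt;tryDecrypt&lt;/code&gt; returns &lt;code&gt;NULL&lt;/code&gt; when a value cannot be decrypted (for example, with the wrong key or a corrupted ciphertext). A quick way to inspect the view (a sketch — the output depends on your data):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;SELECT id, first_name, phone_number
FROM users.general_info_decryption
WHERE phone_number IS NOT NULL  -- keep only rows that decrypted successfully
LIMIT 10;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;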



&lt;h2&gt;
  
  
  Managing Permissions
&lt;/h2&gt;

&lt;p&gt;We can control data access by granting permissions selectively to users:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Grant access to the main table for all users&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;general_info&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Grant access to the decrypted view only to specific users&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;general_info_decryption&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="k"&gt;admin&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
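&lt;p&gt;Note that &lt;code&gt;public&lt;/code&gt; and &lt;code&gt;admin&lt;/code&gt; are not built into ClickHouse — they stand in for whatever roles or users exist in your deployment. A minimal sketch of creating them with ClickHouse's RBAC (all names and passwords are placeholders):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Placeholder roles matching the GRANT statements above
CREATE ROLE public;
CREATE ROLE admin;

-- Example users; use real credentials management in production
CREATE USER analyst IDENTIFIED WITH sha256_password BY 'analyst_password';
CREATE USER dpo IDENTIFIED WITH sha256_password BY 'dpo_password';

GRANT public TO analyst;
GRANT admin TO dpo;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;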



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;By leveraging ClickHouse's encryption functions and managing user access carefully, we can handle confidential data securely. This approach keeps sensitive information protected at rest while leaving it accessible to authorized users.&lt;/p&gt;

&lt;h2&gt;
  
  
  Additional Tips
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Key Management: Use a secure key management system (KMS) to store and rotate encryption keys. Avoid hardcoding keys in your scripts or configurations.&lt;/li&gt;
&lt;li&gt;Auditing: Implement logging and monitoring to audit access to encrypted and decrypted data.&lt;/li&gt;
&lt;li&gt;Performance: Be mindful of the performance implications of encryption and decryption. Test your setup to ensure it meets your performance requirements.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By following these best practices, you can enhance the security and compliance of your data handling processes in ClickHouse.&lt;/p&gt;

&lt;p&gt;For more information on ClickHouse's encryption functions and best practices, refer to the &lt;a href="https://clickhouse.com/docs/en/sql-reference/functions/encryption-functions" rel="noopener noreferrer"&gt;ClickHouse documentation&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>clickhouse</category>
      <category>encryption</category>
      <category>kafka</category>
      <category>decryption</category>
    </item>
    <item>
      <title>Universal Data Migration: Using Slingdata to Transfer Data Between Databases</title>
      <dc:creator>Shahab Ranjbary</dc:creator>
      <pubDate>Fri, 24 May 2024 12:35:51 +0000</pubDate>
      <link>https://dev.to/ranjbaryshahab/universal-data-migration-using-slingdata-to-transfer-data-between-databases-161l</link>
      <guid>https://dev.to/ranjbaryshahab/universal-data-migration-using-slingdata-to-transfer-data-between-databases-161l</guid>
      <description>&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;Data management can be complex, especially when dealing with high-volume data pipelines. Whether you're moving data between databases, from file systems to databases, or vice versa, the Extract &amp;amp; Load (EL) approach can streamline these processes. Enter &lt;strong&gt;&lt;a href="https://docs.slingdata.io" rel="noopener noreferrer"&gt;Slingdata&lt;/a&gt;&lt;/strong&gt;, a powerful, free CLI tool built with Go that offers a straightforward solution for creating and maintaining robust data pipelines.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Slingdata?
&lt;/h2&gt;

&lt;p&gt;Slingdata is a passion project turned into a practical tool that simplifies data movement across various systems. Its key focus is on the Extract &amp;amp; Load (EL) process, enabling efficient data transfer between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Database to Database&lt;/li&gt;
&lt;li&gt;File System to Database&lt;/li&gt;
&lt;li&gt;Database to File System&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key Features of Slingdata
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Single Binary deployment (built with Go). See the installation page.&lt;/li&gt;
&lt;li&gt;Use Custom SQL as a stream.&lt;/li&gt;
&lt;li&gt;Manage / View / Test / Discover your connections with the &lt;code&gt;sling conns&lt;/code&gt; sub-command.&lt;/li&gt;
&lt;li&gt;Use Environment Variables for connections.&lt;/li&gt;
&lt;li&gt;Provide YAML or JSON configurations (perfect for git version control).&lt;/li&gt;
&lt;li&gt;Powerful Replication logic, to replicate many tables with a wildcard (my_schema.*).&lt;/li&gt;
&lt;li&gt;Reads your existing DBT connections.&lt;/li&gt;
&lt;li&gt;Use your environment variable in your YAML / JSON config (&lt;code&gt;SELECT * from my_table where date = '{date}'&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Convenient Transformations, such as the flatten option, which auto-creates columns from your nested fields.&lt;/li&gt;
&lt;li&gt;Run Pre &amp;amp; Post SQL commands.&lt;/li&gt;
&lt;li&gt;And many more!&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Sample EL Process: ClickHouse to PostgreSQL
&lt;/h2&gt;

&lt;p&gt;To showcase Slingdata’s capabilities, let's walk through a sample Extract &amp;amp; Load process moving data from ClickHouse to PostgreSQL.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;Ensure you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://clickhouse.com" rel="noopener noreferrer"&gt;ClickHouse&lt;/a&gt; installed and running.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.postgresql.org" rel="noopener noreferrer"&gt;PostgreSQL&lt;/a&gt; installed and running.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.slingdata.io/sling-cli/getting-started" rel="noopener noreferrer"&gt;Slingdata&lt;/a&gt; installed.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 1: Configure the Connections
&lt;/h3&gt;

&lt;p&gt;Our source database is ClickHouse and our target is PostgreSQL. For each, we can export an environment variable or add the connection to Slingdata's &lt;code&gt;env.yaml&lt;/code&gt; file. First, the PostgreSQL target:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;MARKETING_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"postgresql://postgres:postgres@localhost:5432/postgres?sslmode=disable"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To test this connection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;test &lt;/span&gt;MARKETING_URL
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see the result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INF success!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We will do the same for ClickHouse:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DATAWAREHOUSE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"http://sling_user:sling_pass@clickhouse:8123/marketing"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To test this connection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;test &lt;/span&gt;DATAWAREHOUSE_URL
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see the result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INF success!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, let's create a database and a table in ClickHouse:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;database&lt;/span&gt; &lt;span class="n"&gt;marketing&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="nv"&gt;`marketing`&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;UInt64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;family&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;age&lt;/span&gt; &lt;span class="n"&gt;UInt8&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And in PostgreSQL, create a table with the same schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
   &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;serial&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;family&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;age&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Insert data into ClickHouse:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;marketing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'Shahab'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Ranjbary'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;marketing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'Sepher'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Ranjbary'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;marketing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'Mostafa'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Fekri'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;38&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, to move data to PostgreSQL, you can either create a replication file or run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling run &lt;span class="nt"&gt;--src-conn&lt;/span&gt; DATAWAREHOUSE_URL &lt;span class="nt"&gt;--src-stream&lt;/span&gt; marketing.users &lt;span class="nt"&gt;--tgt-conn&lt;/span&gt; MARKETING_URL &lt;span class="nt"&gt;--tgt-object&lt;/span&gt; public.users
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
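&lt;p&gt;For repeatable runs, the same transfer can be expressed as a replication file checked into git (a sketch of Slingdata's YAML format; the connection names match the environment variables exported above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# replication.yaml
source: DATAWAREHOUSE_URL
target: MARKETING_URL

defaults:
  mode: full-refresh

streams:
  marketing.users:
    object: public.users
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Then execute it with &lt;code&gt;sling run -r replication.yaml&lt;/code&gt;.&lt;/p&gt;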



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Slingdata offers a powerful and flexible way to manage data pipelines with ease. Its extensive features, combined with simple configurations and robust performance, make it an excellent tool for anyone dealing with high-volume data transfers.&lt;/p&gt;

</description>
      <category>slingdata</category>
      <category>clickhouse</category>
      <category>postgres</category>
      <category>datamigration</category>
    </item>
    <item>
      <title>Simplified API Creation and Management: ClickHouse to APISIX Integration Without Code</title>
      <dc:creator>Shahab Ranjbary</dc:creator>
      <pubDate>Wed, 22 May 2024 19:23:51 +0000</pubDate>
      <link>https://dev.to/ranjbaryshahab/simplified-api-creation-and-management-clickhouse-to-apisix-integration-without-code-jli</link>
      <guid>https://dev.to/ranjbaryshahab/simplified-api-creation-and-management-clickhouse-to-apisix-integration-without-code-jli</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In the world of data management and web services, creating and managing APIs can be complex and time-consuming. With the right tools, however, the process becomes much simpler. In this article, we will create APIs for fetching data from &lt;a href="https://clickhouse.com" rel="noopener noreferrer"&gt;ClickHouse&lt;/a&gt; tables without writing any code, and manage them with &lt;a href="https://apisix.apache.org" rel="noopener noreferrer"&gt;APISIX&lt;/a&gt;. ClickHouse, a fast, open-source columnar database management system, exposes an HTTP interface by default, making its data easy to access. Routing that interface through APISIX, an open-source API gateway, lets us manage and log our APIs while gaining the gateway's many other features.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why ClickHouse and APISIX?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;ClickHouse&lt;/strong&gt; is renowned for its performance and efficiency in handling large volumes of data. Its built-in HTTP interface allows for straightforward data retrieval from tables, making it an excellent choice for building APIs. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;APISIX&lt;/strong&gt;, on the other hand, is a powerful API gateway that offers extensive features such as traffic management, logging, monitoring, and security, among others. Integrating ClickHouse with APISIX allows us to manage our APIs effectively without delving into the complexities of writing custom code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating APIs in ClickHouse
&lt;/h2&gt;

&lt;p&gt;ClickHouse’s &lt;a href="https://clickhouse.com/docs/en/interfaces/http" rel="noopener noreferrer"&gt;HTTP interface&lt;/a&gt; enables us to interact with the database using simple HTTP requests. This interface supports various operations such as querying data, inserting records, and managing tables. To fetch data from a ClickHouse table, you can use a basic HTTP GET request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;GET http://&amp;lt;clickhouse-server&amp;gt;:8123/?query&lt;span class="o"&gt;=&lt;/span&gt;SELECT+&lt;span class="k"&gt;*&lt;/span&gt;+FROM+&amp;lt;table_name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This simplicity makes ClickHouse an excellent candidate for API creation, as it eliminates the need for complex server-side logic. However, managing these APIs, especially in a production environment, requires additional capabilities such as authentication, rate limiting, and detailed logging.&lt;/p&gt;
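&lt;p&gt;As a quick sketch, the same request can be made with &lt;code&gt;curl&lt;/code&gt;, URL-encoding the query and asking ClickHouse for JSON output. This is a hypothetical example: it assumes a ClickHouse server listening on &lt;code&gt;localhost:8123&lt;/code&gt; and a table named &lt;code&gt;default.users&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Fetch all rows from default.users as JSON (spaces URL-encoded as %20)
curl 'http://localhost:8123/?query=SELECT%20*%20FROM%20default.users%20FORMAT%20JSON'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
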

&lt;h2&gt;
  
  
  Enhancing API Management with APISIX
&lt;/h2&gt;

&lt;p&gt;APISIX comes into play by providing a robust platform for managing our ClickHouse APIs. By routing ClickHouse's HTTP requests through APISIX, we gain access to a plethora of features that enhance API management:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Traffic Management&lt;/strong&gt;: Control and optimize API traffic using load balancing and rate limiting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt;: Implement authentication and authorization mechanisms to secure your APIs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logging and Monitoring&lt;/strong&gt;: Gain insights into API usage patterns and performance through comprehensive logging and monitoring tools.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plugin System&lt;/strong&gt;: Extend APISIX’s functionality using its rich plugin system, enabling custom behavior and integrations.&lt;/li&gt;
&lt;/ul&gt;
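
&lt;p&gt;To illustrate the traffic-management point, rate limiting can be added to a route with APISIX's &lt;code&gt;limit-count&lt;/code&gt; plugin via the Admin API. The values below (100 requests per 60 seconds, keyed by client IP) are illustrative, not part of the setup in this article.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:9180/apisix/admin/routes/1 \
-H 'X-API-KEY: edd1c9f034335f136f87ad84b625c8f1' -X PATCH -d '
{
  "plugins": {
    "limit-count": {
      "count": 100,
      "time_window": 60,
      "rejected_code": 429,
      "key": "remote_addr"
    }
  }
}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
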

&lt;h2&gt;
  
  
  Setting Up the Integration
&lt;/h2&gt;

&lt;p&gt;To set up the integration, follow these steps:&lt;/p&gt;

&lt;h3&gt;
  
  
  Clone the Repository
&lt;/h3&gt;

&lt;p&gt;First, clone the repository to your local machine. This repository contains the necessary configurations for setting up ClickHouse and APISIX.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/ranjbaryshahab/clickhouse-apisix.git
&lt;span class="nb"&gt;cd &lt;/span&gt;clickhouse-apisix
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Run Docker Compose
&lt;/h3&gt;

&lt;p&gt;Start the services using Docker Compose. This command will set up ClickHouse and APISIX using predefined configurations.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker-compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Connect to ClickHouse and Run the Following Commands
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Create table
&lt;/h3&gt;

&lt;p&gt;Create a table named &lt;code&gt;users&lt;/code&gt; in the &lt;code&gt;default&lt;/code&gt; database of ClickHouse. This table will store user data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="nv"&gt;`default`&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;UInt64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;family&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;age&lt;/span&gt; &lt;span class="n"&gt;UInt8&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Insert data into the table
&lt;/h3&gt;

&lt;p&gt;Insert sample data into the &lt;code&gt;users&lt;/code&gt; table. This data will be used for testing our API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Shahab'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Ranjbary'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Sepehr'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Ranjbary'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'John'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Doe'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Create a Log Table for the API Calls
&lt;/h3&gt;

&lt;p&gt;Create a log table in the &lt;code&gt;apisix_log&lt;/code&gt; database. This table will store logs of API calls for monitoring and analysis.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;database&lt;/span&gt; &lt;span class="n"&gt;apisix_log&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;apisix_log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;`@timestamp`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`host`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`client_ip`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`consumer_name`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`route_id`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`route_name`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`request`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`upstream_status`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`status`&lt;/span&gt; &lt;span class="n"&gt;UInt16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`upstream_response_time`&lt;/span&gt; &lt;span class="n"&gt;Float64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`connection_time`&lt;/span&gt; &lt;span class="n"&gt;Float64&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;
&lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="nv"&gt;`@timestamp`&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nv"&gt;`@timestamp`&lt;/span&gt;
&lt;span class="n"&gt;SETTINGS&lt;/span&gt; &lt;span class="n"&gt;index_granularity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8192&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Create Users and Grant Access
&lt;/h3&gt;

&lt;p&gt;Create users for APISIX and grant the permissions needed to read the &lt;code&gt;default.users&lt;/code&gt; table and to insert logs into the &lt;code&gt;apisix_log.users&lt;/code&gt; table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;USER&lt;/span&gt; &lt;span class="n"&gt;apisix&lt;/span&gt; &lt;span class="n"&gt;IDENTIFIED&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;plaintext_password&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="s1"&gt;'apisix'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SHOW&lt;/span&gt; &lt;span class="n"&gt;COLUMNS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;apisix&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;USER&lt;/span&gt; &lt;span class="n"&gt;apisix_logger&lt;/span&gt; &lt;span class="n"&gt;IDENTIFIED&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;plaintext_password&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="s1"&gt;'apisix_logger'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SHOW&lt;/span&gt; &lt;span class="n"&gt;COLUMNS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;apisix_log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;apisix_logger&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Create a Consumer in APISIX
&lt;/h2&gt;

&lt;p&gt;Create a consumer in APISIX with basic authentication. This consumer will be used to authenticate API requests.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:9180/apisix/admin/consumers &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s1"&gt;'X-API-KEY: edd1c9f034335f136f87ad84b625c8f1'&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; PUT &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'
{
  "username": "ds_team",
  "plugins": {
    "basic-auth": {
      "_meta": {
        "disable": true
      },
      "password": "12345",
      "username": "ds_team"
    }
  }
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Create a Route in APISIX
&lt;/h2&gt;

&lt;p&gt;Create a route in APISIX to handle requests to the ClickHouse API. This route includes configurations for authentication, logging, and proxying requests to ClickHouse.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:9180/apisix/admin/routes/1 &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s1"&gt;'X-API-KEY: edd1c9f034335f136f87ad84b625c8f1'&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; PUT &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'
{
  "uri": "/clickhouse-api*",
  "name": "clickhouse",
  "methods": [
    "GET"
  ],
  "plugins": {
    "basic-auth": {
      "disable": false
    },
    "clickhouse-logger": {
      "database": "apisix_log",
      "disable": false,
      "endpoint_addr": "http://clickhouse:8123",
      "log_format": {
        "@timestamp": "$time_iso8601",
        "client_ip": "$remote_addr",
        "connection_time": "$connection_time",
        "consumer_name": "$consumer_name",
        "host": "$host",
        "request": "$request",
        "route_name": "$route_name",
        "status": "$status",
        "upstream_response_time": "$upstream_response_time",
        "upstream_status": "$upstream_status"
      },
      "logtable": "users",
      "password": "apisix_logger",
      "user": "apisix_logger"
    },
    "proxy-rewrite": {
      "headers": {
        "Authorization": "",
        "X-ClickHouse-Format": "JSON",
        "X-ClickHouse-Key": "apisix",
        "X-ClickHouse-User": "apisix"
      },
      "uri": "/?query=select%20name,family,age%20from%20default.users%20where%20age%20=%20%7Bage:UInt8%7D"
    }
  },
  "upstream": {
    "nodes": [
      {
        "host": "clickhouse",
        "port": 8123,
        "weight": 1
      }
    ],
    "timeout": {
      "connect": 6,
      "send": 6,
      "read": 6
    },
    "type": "roundrobin",
    "scheme": "http",
    "pass_host": "pass",
    "keepalive_pool": {
      "idle_timeout": 60,
      "requests": 1000,
      "size": 320
    }
  },
  "status": 1
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Now We Can Test It
&lt;/h2&gt;

&lt;p&gt;Test the API by making a GET request to the endpoint. ClickHouse's HTTP interface binds the &lt;code&gt;param_age=18&lt;/code&gt; query parameter to the &lt;code&gt;{age:UInt8}&lt;/code&gt; placeholder in the rewritten query, so this request fetches users aged 18.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;--location&lt;/span&gt; &lt;span class="nt"&gt;--request&lt;/span&gt; GET &lt;span class="s1"&gt;'http://localhost:9080/clickhouse-api/users?param_age=18'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s1"&gt;'Authorization: Basic ZHNfdGVhbToxMjM0NQ=='&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This API returns the users with the specified age. The result is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"meta"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"String"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"family"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"String"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"age"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"UInt8"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sepehr"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"family"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Ranjbary"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"age"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"rows"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"statistics"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"elapsed"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.001827077&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"rows_read"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"bytes_read"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;35&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also check the log table for verification:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;apisix_log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Integrating ClickHouse with APISIX allows you to create and manage APIs efficiently without writing any code. ClickHouse’s HTTP interface simplifies data retrieval, while APISIX enhances API management with features like traffic control, security, and logging. This setup is highly beneficial for production environments, where managing APIs with minimal code and maximum efficiency is crucial. By following the steps outlined in this guide, you can quickly set up and test APIs for your ClickHouse tables, ensuring robust and scalable API management.&lt;/p&gt;

</description>
      <category>clickhouse</category>
      <category>apisix</category>
      <category>apigateway</category>
      <category>datamanagement</category>
    </item>
    <item>
      <title>Improving Data Quality in ClickHouse Databases with Soda</title>
      <dc:creator>Shahab Ranjbary</dc:creator>
      <pubDate>Fri, 19 Jan 2024 23:22:59 +0000</pubDate>
      <link>https://dev.to/ranjbaryshahab/improving-data-quality-in-clickhouse-databases-with-soda-4kp4</link>
      <guid>https://dev.to/ranjbaryshahab/improving-data-quality-in-clickhouse-databases-with-soda-4kp4</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Data quality is the bedrock of any data-driven project, ensuring the integrity and reliability of information. &lt;a href="https://docs.soda.io" rel="noopener noreferrer"&gt;Soda&lt;/a&gt;, a versatile tool catering to Data Engineers, Data Scientists, and Data Analysts, empowers users to conduct data quality tests precisely when and where they need to. This guide merges theoretical insights with practical examples from the &lt;code&gt;soda-clickhouse&lt;/code&gt; project, showcasing how Soda can significantly enhance data quality in &lt;a href="https://clickhouse.com" rel="noopener noreferrer"&gt;ClickHouse&lt;/a&gt; databases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Soda Use Case Guides
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Testing Data in a Pipeline
&lt;/h3&gt;

&lt;p&gt;Integrate Soda into your data pipeline to perform continuous data quality checks. Define checks using SodaCL, run scans at key pipeline stages, and monitor the results for early issue detection, ensuring data quality at every step.&lt;/p&gt;
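
&lt;p&gt;For instance, a minimal SodaCL checks file for a pipeline stage might look like the following sketch; the dataset and column names here are illustrative, not taken from the project.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# checks.yml — example SodaCL checks for a hypothetical users dataset
checks for users:
  - row_count &gt; 0                  # dataset must not be empty
  - missing_count(id) = 0          # primary key must be present
  - duplicate_count(id) = 0        # primary key must be unique
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You would then run &lt;code&gt;soda scan -d your_datasource -c configuration.yml checks.yml&lt;/code&gt; at the relevant pipeline stage.&lt;/p&gt;
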

&lt;h3&gt;
  
  
  2. Testing Data Before Migration
&lt;/h3&gt;

&lt;p&gt;Prioritize data integrity before migration by configuring Soda for both source and target databases. Leverage Soda's reconciliation checks to verify data consistency and resolve any issues identified in pre-migration scans, ensuring a smooth transition.&lt;/p&gt;
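
&lt;p&gt;A reconciliation check comparing a source and target dataset might be sketched roughly as follows; the datasource and dataset names are placeholders, and the exact syntax should be confirmed against the Soda documentation for your version.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Example reconciliation checks between a MySQL source and a ClickHouse target
reconciliation users:
  datasets:
    source:
      dataset: users
      datasource: mysql_source
    target:
      dataset: users
      datasource: clickhouse_target
  checks:
    - row_count diff = 0   # both sides must contain the same number of rows
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
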

&lt;h3&gt;
  
  
  3. Testing Data in CI/CD
&lt;/h3&gt;

&lt;p&gt;Incorporate Soda into your CI/CD pipeline for automated data quality checks. Configure alerts for failures, seamlessly integrate with GitHub Actions, and maintain reliable data throughout the development lifecycle, fostering a culture of quality in your data-driven processes.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Self-Serve Soda
&lt;/h3&gt;

&lt;p&gt;Empower your teams with self-serve capabilities using Soda. Facilitate easy access, encourage collaborative definition of quality checks, enable browser interface usage, and integrate with data catalogs for a holistic overview of dataset health, allowing teams to take ownership of their data quality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Soda Cloud: Advanced Analytics and Collaboration
&lt;/h2&gt;

&lt;p&gt;When it comes to reviewing scan results and investigating issues, Soda Cloud takes your data quality management to the next level. Here's what you can do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Review Scan Results:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Access visualized scan results not only in the command-line interface but also through the intuitive dashboard on Soda Cloud.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Set Alert Notifications:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Configure custom alert notifications to stay informed about any deviations or anomalies detected during scans. Proactively address potential data quality issues.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Track Trends Over Time:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gain insights into the trends of your data quality metrics over time. Track improvements or deviations and make informed decisions about your data pipeline.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Integration with External Tools:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Seamlessly integrate Soda Cloud with your existing messaging, ticketing, and data cataloging tools. Collaborate more effectively by connecting with platforms like Slack, Jira, and Atlan.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Soda Cloud not only enhances the visibility of your data quality but also facilitates collaboration and advanced analytics, making it a powerful companion for managing and improving your data quality standards.&lt;/p&gt;

&lt;h2&gt;
  
  
  Project Samples: &lt;code&gt;soda-clickhouse&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Explore the &lt;a href="https://github.com/ranjbaryshahab/soda-clickhouse" rel="noopener noreferrer"&gt;soda-clickhouse&lt;/a&gt; project on GitHub, featuring two insightful samples:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Sample_1: Data Quality in &lt;code&gt;dim_customer&lt;/code&gt; Table&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dive into practical examples of performing quality checks on the &lt;code&gt;dim_customer&lt;/code&gt; table within ClickHouse using Soda.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Sample_2: Data Migration Validation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ensure data integrity during migration from a &lt;a href="https://www.mysql.com" rel="noopener noreferrer"&gt;MySQL&lt;/a&gt; to ClickHouse data source using reconciliation checks in Soda.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Implementing Soda for data quality assurance provides a comprehensive and effective approach. Whether you are testing data in a pipeline, before migration, in CI/CD, or enabling self-serve capabilities, Soda enhances the reliability and trustworthiness of your datasets. Explore the &lt;code&gt;soda-clickhouse&lt;/code&gt; project, experiment with the provided samples, and leverage Soda's capabilities to raise your data quality standards and make data reliability an integral part of your data journey.&lt;/p&gt;

</description>
      <category>clickhouse</category>
      <category>soda</category>
      <category>dataquality</category>
      <category>database</category>
    </item>
    <item>
      <title>Using Benthos: A Practical Guide for Kafka and PostgreSQL Integration</title>
      <dc:creator>Shahab Ranjbary</dc:creator>
      <pubDate>Fri, 08 Dec 2023 23:12:09 +0000</pubDate>
      <link>https://dev.to/ranjbaryshahab/using-benthos-a-practical-guide-for-kafka-and-postgresql-integration-2dln</link>
      <guid>https://dev.to/ranjbaryshahab/using-benthos-a-practical-guide-for-kafka-and-postgresql-integration-2dln</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://benthos.dev/" rel="noopener noreferrer"&gt;Benthos&lt;/a&gt; is a versatile and powerful stream processor that simplifies the task of handling real-time data streams. In this article, we'll delve into Benthos, its key features, and how it can be seamlessly integrated with Kafka and PostgreSQL to build robust data pipelines. To illustrate these concepts, we'll use a practical example: BenthosPipelineDB.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BenthosPipelineDB&lt;/strong&gt; is a sample project showcasing the seamless integration of Benthos with Kafka and PostgreSQL. This project is designed to produce random data, publish it to a Kafka topic, and consume and insert it into a PostgreSQL database using Benthos. The project structure and configurations provide a clear understanding of how these technologies can be orchestrated to create a reliable and scalable data processing pipeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  Benthos Overview
&lt;/h3&gt;

&lt;p&gt;Benthos is a stream processor designed for real-time data handling. Its features include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Stream Processing&lt;/strong&gt;: Benthos allows you to process data streams with ease, making it a valuable tool for scenarios that require real-time data transformations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Connectivity&lt;/strong&gt;: With native support for various data sources and sinks, including Kafka and PostgreSQL, Benthos simplifies the integration process.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Extensibility&lt;/strong&gt;: Benthos supports a wide range of processors and plugins, providing flexibility in designing your data processing pipeline.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Kafka Overview
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://kafka.apache.org/" rel="noopener noreferrer"&gt;Kafka&lt;/a&gt; is a distributed streaming platform that enables the building of real-time data pipelines. Key Kafka concepts include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Topics&lt;/strong&gt;: Kafka organizes data streams into topics, allowing for efficient data segregation and distribution.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Producers and Consumers&lt;/strong&gt;: Producers publish data to Kafka topics, while consumers subscribe to these topics, creating a scalable and fault-tolerant system.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  PostgreSQL Overview
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.postgresql.org/" rel="noopener noreferrer"&gt;PostgreSQL&lt;/a&gt; is a powerful, open-source relational database system. It offers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Relational Model&lt;/strong&gt;: PostgreSQL follows a robust relational database model, supporting the creation of structured and organized datasets.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ACID Compliance&lt;/strong&gt;: ACID (Atomicity, Consistency, Isolation, Durability) compliance ensures data integrity, making PostgreSQL suitable for critical applications.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Project Structure
&lt;/h2&gt;

&lt;p&gt;Here's an overview of the project structure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Benthos Pipeline&lt;/strong&gt;: The core Benthos configuration is stored in &lt;code&gt;pipeline/benthos.yml&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;PostgreSQL Setup&lt;/strong&gt;: SQL scripts for creating tables are located in &lt;code&gt;postgres/sql/create_table.sql&lt;/code&gt;, and data is stored in &lt;code&gt;postgres/data&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Producer&lt;/strong&gt;: The data generator, responsible for producing random data, is in the &lt;code&gt;data-producer&lt;/code&gt; directory.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Docker Setup&lt;/strong&gt;: The &lt;code&gt;docker-compose.yaml&lt;/code&gt; file orchestrates the Docker containers for PostgreSQL, Kafka, Benthos, and the data producer.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Configuration&lt;/strong&gt;: Environment variables are stored in &lt;code&gt;config.env&lt;/code&gt;, facilitating easy configuration management.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Generation:&lt;/strong&gt; The data producer (&lt;code&gt;message-producer.py&lt;/code&gt;) generates random messages with the format &lt;code&gt;{"name": "hRnWJsIf", "age": 82}&lt;/code&gt; and publishes them to the Kafka topic (&lt;code&gt;users&lt;/code&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Benthos Processing:&lt;/strong&gt; Benthos reads messages from the Kafka topic, processes them according to the defined pipeline in &lt;code&gt;pipeline/benthos.yml&lt;/code&gt;, and sends them to the specified output.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;PostgreSQL Storage:&lt;/strong&gt; Processed data is inserted into the PostgreSQL table (&lt;code&gt;users&lt;/code&gt;) as defined in &lt;code&gt;postgres/sql/create_table.sql&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
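
&lt;p&gt;Step 1 above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch (the real implementation lives in the &lt;code&gt;data-producer&lt;/code&gt; directory of the repo); the publishing call is indicated only as a comment, since it depends on the Kafka client you use:&lt;/p&gt;

```python
import json
import random
import string

def make_message():
    """Build a random payload matching the article's example format,
    e.g. {"name": "hRnWJsIf", "age": 82}."""
    return {
        "name": "".join(random.choices(string.ascii_letters, k=8)),
        "age": random.randint(18, 90),
    }

if __name__ == "__main__":
    # Publishing to the `users` topic would go through a Kafka client here,
    # e.g. producer.send("users", json.dumps(make_message()).encode())
    print(json.dumps(make_message()))
```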

&lt;h3&gt;
  
  
  Running the Project
&lt;/h3&gt;

&lt;p&gt;1- Navigate to the project directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; /path/to/BenthosPipelineDB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;2- Start the project using Docker Compose:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker-compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;3- Monitor logs to ensure everything is running smoothly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker-compose logs &lt;span class="nt"&gt;-f&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;4- Kafka Console Consumer&lt;br&gt;
If you want to observe the data flowing through the users topic in real time, you can use the Kafka console consumer. Open your terminal and run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker-compose &lt;span class="nb"&gt;exec &lt;/span&gt;kafka kafka-console-consumer.sh &lt;span class="nt"&gt;--topic&lt;/span&gt; &lt;span class="nb"&gt;users&lt;/span&gt; &lt;span class="nt"&gt;--from-beginning&lt;/span&gt; &lt;span class="nt"&gt;--bootstrap-server&lt;/span&gt; kafka:9092
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;5- Connecting to PostgreSQL&lt;br&gt;
To inspect the data in the PostgreSQL database, you can use a PostgreSQL client. Assuming you have the &lt;code&gt;psql&lt;/code&gt; client installed locally, connect with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;psql &lt;span class="nt"&gt;-h&lt;/span&gt; localhost &lt;span class="nt"&gt;-p&lt;/span&gt; 5432 &lt;span class="nt"&gt;-U&lt;/span&gt; postgres &lt;span class="nt"&gt;-d&lt;/span&gt; postgres
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;6- Now, let's run a simple query to fetch the first 10 records from the &lt;code&gt;users&lt;/code&gt; table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;SELECT &lt;span class="k"&gt;*&lt;/span&gt; FROM &lt;span class="nb"&gt;users &lt;/span&gt;LIMIT 10&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, you have a robust data processing pipeline using Benthos, Kafka, and PostgreSQL!&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;BenthosPipelineDB demonstrates the power and flexibility of Benthos in combination with Kafka and PostgreSQL. With a clear project structure and straightforward configuration, it provides a foundation for building scalable and reliable data processing systems.&lt;/p&gt;

&lt;p&gt;Explore the &lt;a href="https://github.com/ranjbaryshahab/BenthosPipelineDB" rel="noopener noreferrer"&gt;BenthosPipelineDB repository&lt;/a&gt; for hands-on experience and customization.&lt;/p&gt;

&lt;p&gt;Happy streaming!&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>postgres</category>
      <category>benthos</category>
      <category>ingestion</category>
    </item>
    <item>
      <title>Connecting Multiple Kafka Clusters in ClickHouse Using Named Collections</title>
      <dc:creator>Shahab Ranjbary</dc:creator>
      <pubDate>Mon, 25 Sep 2023 00:47:53 +0000</pubDate>
      <link>https://dev.to/ranjbaryshahab/connecting-multiple-kafka-clusters-in-clickhouse-using-named-collections-4i79</link>
      <guid>https://dev.to/ranjbaryshahab/connecting-multiple-kafka-clusters-in-clickhouse-using-named-collections-4i79</guid>
      <description>&lt;h3&gt;
  
  
  Introduction:
&lt;/h3&gt;

&lt;p&gt;ClickHouse is a powerful columnar database renowned for its speed and efficiency. One of its pivotal strengths is seamless integration with external data sources like Kafka. With the rising need for multi-cluster setups in modern data architectures, ClickHouse's Named Collections feature is an invaluable asset. In this guide, we'll use it to set up connections to two distinct Kafka clusters.&lt;/p&gt;




&lt;h3&gt;
  
  
  Why Use Named Collections?
&lt;/h3&gt;

&lt;p&gt;Before diving into the configuration, it's worth understanding what Named Collections give us. They allow us to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reduce Repetition:&lt;/strong&gt; Eliminate the need to redundantly specify configurations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Centralized Management:&lt;/strong&gt; Maintain all configurations in a single, easily manageable location.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Improved Security:&lt;/strong&gt; Safeguard sensitive credentials, keeping them out of the reach of non-administrative users.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Configuring Named Collections for Kafka:
&lt;/h3&gt;

&lt;p&gt;With the prominence of Named Collections established, let's gear up to connect to two distinct Kafka clusters – primary and secondary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;XML Configuration&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;clickhouse&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;named_collections&amp;gt;&lt;/span&gt;
        &lt;span class="c"&gt;&amp;lt;!-- Primary Kafka Cluster Configuration --&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;primary_kafka_cluster&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;broker_list&amp;gt;&lt;/span&gt;primary-kafka-cluster:9094&lt;span class="nt"&gt;&amp;lt;/broker_list&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;kafka_settings&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;client_id&amp;gt;&lt;/span&gt;primary_kafka_client&lt;span class="nt"&gt;&amp;lt;/client_id&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;security_protocol&amp;gt;&lt;/span&gt;SASL_PLAINTEXT&lt;span class="nt"&gt;&amp;lt;/security_protocol&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;sasl_mechanisms&amp;gt;&lt;/span&gt;SCRAM-SHA-512&lt;span class="nt"&gt;&amp;lt;/sasl_mechanisms&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;sasl_username&amp;gt;&lt;/span&gt;clickhouse_primary&lt;span class="nt"&gt;&amp;lt;/sasl_username&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;sasl_password&amp;gt;&lt;/span&gt;primary_secret_password&lt;span class="nt"&gt;&amp;lt;/sasl_password&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;/kafka_settings&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/primary_kafka_cluster&amp;gt;&lt;/span&gt;
        &lt;span class="c"&gt;&amp;lt;!-- Secondary Kafka Cluster Configuration --&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;secondary_kafka_cluster&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;broker_list&amp;gt;&lt;/span&gt;backup-kafka-cluster:9095&lt;span class="nt"&gt;&amp;lt;/broker_list&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;kafka_settings&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;client_id&amp;gt;&lt;/span&gt;secondary_kafka_client&lt;span class="nt"&gt;&amp;lt;/client_id&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;security_protocol&amp;gt;&lt;/span&gt;SASL_PLAINTEXT&lt;span class="nt"&gt;&amp;lt;/security_protocol&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;sasl_mechanism&amp;gt;&lt;/span&gt;SCRAM-SHA-512&lt;span class="nt"&gt;&amp;lt;/sasl_mechanism&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;sasl_username&amp;gt;&lt;/span&gt;clickhouse_secondary&lt;span class="nt"&gt;&amp;lt;/sasl_username&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;sasl_password&amp;gt;&lt;/span&gt;secondary_secret_password&lt;span class="nt"&gt;&amp;lt;/sasl_password&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;/kafka_settings&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/secondary_kafka_cluster&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/named_collections&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/clickhouse&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;For a more detailed configuration setup, refer to &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/31691" rel="noopener noreferrer"&gt;Pull Request #31691&lt;/a&gt;, available starting from ClickHouse v21.12, which provides a more streamlined approach to using named collections.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  Setting Up Permanent Storage: MergeTree Table
&lt;/h3&gt;

&lt;p&gt;After configuring our Kafka connections, the focus shifts to ClickHouse itself. We'll define the tables that serve as our permanent data store.&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Kafka Engine Table:
&lt;/h4&gt;

&lt;p&gt;To tap directly into our Kafka topics, we'll define tables in ClickHouse using the Kafka engine. Here's how:&lt;/p&gt;

&lt;p&gt;For the primary Kafka cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;kafka_cluster_a&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;`id`&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`first_name`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`last_name`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Kafka&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;primary_kafka_cluster&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;SETTINGS&lt;/span&gt; &lt;span class="n"&gt;kafka_topic_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'your_topic_name_for_primary'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="n"&gt;kafka_group_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'your_consumer_group_for_primary'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="n"&gt;kafka_format&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'JSONEachRow'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the secondary Kafka cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;kafka_cluster_b&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;`id`&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`first_name`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`last_name`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Kafka&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;secondary_kafka_cluster&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;SETTINGS&lt;/span&gt; &lt;span class="n"&gt;kafka_topic_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'your_topic_name_for_secondary'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="n"&gt;kafka_group_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'your_consumer_group_for_secondary'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="n"&gt;kafka_format&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'JSONEachRow'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  2. MergeTree Table:
&lt;/h4&gt;

&lt;p&gt;We'll use MergeTree tables to persistently store the data streamed from Kafka:&lt;/p&gt;

&lt;p&gt;For &lt;code&gt;kafka_cluster_a&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;cluster_a_storage&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;`id`&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`first_name`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`last_name`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For &lt;code&gt;kafka_cluster_b&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;cluster_b_storage&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;`id`&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`first_name`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`last_name`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  3. Materialized View:
&lt;/h4&gt;

&lt;p&gt;The Materialized View serves as the Kafka table's consumer, directing data flow:&lt;/p&gt;

&lt;p&gt;For &lt;code&gt;kafka_cluster_a&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;MATERIALIZED&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;cluster_a_mv&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;cluster_a_storage&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;last_name&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;kafka&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cluster_a&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For &lt;code&gt;kafka_cluster_b&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;MATERIALIZED&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;cluster_b_mv&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;cluster_b_storage&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;last_name&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;kafka&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cluster_b&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Practical Applications:
&lt;/h3&gt;

&lt;p&gt;With the above groundwork, ClickHouse is ready to continuously ingest and store data from both Kafka clusters. Any data sent to the configured Kafka topics becomes available for querying in near real time, which is particularly valuable for businesses conducting instantaneous analytics or data-driven decision-making.&lt;/p&gt;
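
&lt;p&gt;As a quick sanity check (a minimal sketch; the table names match the ones created above), you can count the rows landing in each storage table. The counts should grow as messages arrive on the corresponding topics:&lt;/p&gt;

```sql
-- Row counts should grow as new messages arrive on each topic
SELECT count() FROM cluster_a_storage;
SELECT count() FROM cluster_b_storage;
```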

&lt;h3&gt;
  
  
  Conclusion:
&lt;/h3&gt;

&lt;p&gt;With ClickHouse's Named Collections, connecting to multiple Kafka clusters becomes not just possible but efficient and organized. This setup makes incoming data immediately available for querying, simplifying real-time analytics.&lt;/p&gt;

&lt;h3&gt;
  
  
  Further Reading:
&lt;/h3&gt;

&lt;p&gt;For a profound understanding of ClickHouse's named_collections, explore the &lt;a href="https://clickhouse.com/docs/en/operations/named-collections" rel="noopener noreferrer"&gt;official ClickHouse documentation&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>clickhouse</category>
      <category>kafka</category>
      <category>dataintegration</category>
      <category>bigdata</category>
    </item>
    <item>
      <title>Safeguarding Secrets in Spring Boot with Vault</title>
      <dc:creator>Shahab Ranjbary</dc:creator>
      <pubDate>Thu, 14 Sep 2023 15:44:16 +0000</pubDate>
      <link>https://dev.to/ranjbaryshahab/safeguarding-secrets-in-spring-boot-with-vault-2iki</link>
      <guid>https://dev.to/ranjbaryshahab/safeguarding-secrets-in-spring-boot-with-vault-2iki</guid>
      <description>&lt;p&gt;When developing Spring Boot applications, we often encounter the need to manage sensitive information like database passwords, application keys, and other secrets. Typically, these secrets might reside in &lt;code&gt;application.properties&lt;/code&gt; or &lt;code&gt;application.yml&lt;/code&gt;. Pushing such sensitive information to repositories is risky. Thankfully, there's &lt;strong&gt;HashiCorp's Vault&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Vault
&lt;/h2&gt;

&lt;p&gt;Vault is a tool for securely accessing secrets. A secret is anything that you want to tightly control access to, such as API keys, passwords, or certificates. Vault provides a unified interface to any secret while providing tight access control and recording a detailed audit log.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up Vault with Docker Compose
&lt;/h2&gt;

&lt;p&gt;For rapid deployment, especially during development, Docker Compose offers a convenient method to get Vault up and running.&lt;/p&gt;

&lt;p&gt;1. Create a file named &lt;code&gt;docker-compose.yml&lt;/code&gt; and add the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;3'&lt;/span&gt;

&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;vault&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vault:1.13.3&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vault_dev&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8200:8200"&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;VAULT_DEV_ROOT_TOKEN_ID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;myroot&lt;/span&gt;
      &lt;span class="na"&gt;VAULT_DEV_LISTEN_ADDRESS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.0.0.0:8200"&lt;/span&gt;
    &lt;span class="na"&gt;cap_add&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;IPC_LOCK&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;server -dev&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;2. Run &lt;code&gt;docker-compose up&lt;/code&gt; to start the Vault server.&lt;/p&gt;

&lt;p&gt;3. Access the Vault UI at &lt;a href="http://localhost:8200/ui" rel="noopener noreferrer"&gt;http://localhost:8200/ui&lt;/a&gt; and log in with the token &lt;code&gt;myroot&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;After logging in, you will see this page:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzxqfemfvvi7sbltzn3yj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzxqfemfvvi7sbltzn3yj.png" alt="Image description" width="800" height="378"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Enable the KV secrets engine:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2crl7iapx04wc15rqyae.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2crl7iapx04wc15rqyae.png" alt="Image description" width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Add the secrets at the path you want, e.g., PostgreSQL settings:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcqv4k8u0s8qsdz03ns7o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcqv4k8u0s8qsdz03ns7o.png" alt="Image description" width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Integrating Vault with Spring Boot
&lt;/h2&gt;

&lt;p&gt;Once Vault is up and running, the next step is integrating it with your Spring Boot application.&lt;/p&gt;

&lt;p&gt;1. Set environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;VAULT_URI&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:8200
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;VAULT_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;myroot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;2. Spring Boot initial setup:&lt;br&gt;
Set up a basic Spring Boot application. If you're new to this, Spring Initializr provides a quick way to bootstrap a new Spring Boot project.&lt;/p&gt;

&lt;p&gt;3. Maven dependency:&lt;br&gt;
To integrate Spring Vault, add the following dependency to your Maven &lt;code&gt;pom.xml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;&amp;lt;groupId&amp;gt;org.springframework.vault&amp;lt;/groupId&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;&amp;lt;artifactId&amp;gt;spring-vault-core&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;&amp;lt;version&amp;gt;3.0.4&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class="s"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;4. Integration code:&lt;br&gt;
First, define your Vault configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;VaultConfiguration&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getenv&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"VAULT_TOKEN"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;uri&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getenv&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"VAULT_URI"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;getToken&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;getUri&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, use a utility to read secrets from Vault:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;VaultConfigReaderUtil&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="nc"&gt;Properties&lt;/span&gt; &lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="no"&gt;MAIN_VAULT_PATH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"kv/data/myapplication"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
        &lt;span class="nc"&gt;VaultEndpoint&lt;/span&gt; &lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
        &lt;span class="nc"&gt;Properties&lt;/span&gt; &lt;span class="n"&gt;properties&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Properties&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;endpoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VaultEndpoint&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="no"&gt;URI&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;VaultConfiguration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getUri&lt;/span&gt;&lt;span class="o"&gt;()));&lt;/span&gt;
            &lt;span class="nc"&gt;VaultTemplate&lt;/span&gt; &lt;span class="n"&gt;vaultTemplate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;VaultTemplate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;TokenAuthentication&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;VaultConfiguration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getToken&lt;/span&gt;&lt;span class="o"&gt;()));&lt;/span&gt;
            &lt;span class="nc"&gt;VaultResponse&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vaultTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;read&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%s/%s"&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;formatted&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;MAIN_VAULT_PATH&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
            &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Object&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;mapData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Object&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;)&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getRequiredData&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"data"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
            &lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;putAll&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mapData&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;URISyntaxException&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;printStackTrace&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lastly, in your main application, use the utility to fetch secrets and feed them to Spring Boot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@SpringBootApplication&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MyApplication&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;Properties&lt;/span&gt; &lt;span class="n"&gt;props&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Properties&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="nc"&gt;Properties&lt;/span&gt; &lt;span class="n"&gt;postgresProp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VaultConfigReaderUtil&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;read&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"postgres"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;props&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;putAll&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;postgresProp&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

        &lt;span class="nc"&gt;SpringApplicationBuilder&lt;/span&gt; &lt;span class="n"&gt;springApplication&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SpringApplicationBuilder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;MyApplication&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;springApplication&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;props&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In the ever-evolving landscape of software development, security remains paramount. Secrets management is a fundamental aspect of ensuring that our applications are robust against breaches and unforeseen vulnerabilities. As we've explored, Vault by HashiCorp offers a comprehensive solution to this challenge, seamlessly integrating with frameworks like Spring Boot.&lt;/p&gt;

&lt;p&gt;Utilizing Docker Compose, we've streamlined the deployment of Vault, making it more accessible, especially during the development phase. However, it's essential to understand that while development configurations simplify processes, they aren't suitable for production environments. Always ensure a secure, production-ready configuration for Vault when deploying in live scenarios.&lt;/p&gt;

&lt;p&gt;Spring Boot's flexibility and vast ecosystem, paired with Vault's secure secrets management capabilities, provide developers with powerful tools to build secure, scalable, and efficient applications. As we continue to innovate and build, let's prioritize the safety and integrity of our applications, ensuring that our secrets remain just that – secret.&lt;/p&gt;

</description>
      <category>springboot</category>
      <category>vault</category>
      <category>security</category>
      <category>devops</category>
    </item>
    <item>
      <title>ClickHouse Custom Function: Convert Georgian Dates to Jalali Dates</title>
      <dc:creator>Shahab Ranjbary</dc:creator>
      <pubDate>Thu, 07 Sep 2023 10:50:13 +0000</pubDate>
      <link>https://dev.to/ranjbaryshahab/clickhouse-custom-function-convert-georgian-dates-to-jalali-dates-2ip8</link>
      <guid>https://dev.to/ranjbaryshahab/clickhouse-custom-function-convert-georgian-dates-to-jalali-dates-2ip8</guid>
<description>&lt;p&gt;Dates are a fundamental part of data analysis, and there is often a need to convert between different date systems. While the Gregorian calendar (spelled "Georgian" in this article's function names) is internationally the most widely used civil calendar, the Jalali, or Persian, calendar is used predominantly in regions such as Iran and has its own significance. Especially when dealing with data from regions using the Jalali calendar, converting between these date systems becomes essential.&lt;/p&gt;

&lt;p&gt;In this guide, we'll walk through setting up a simple integration in ClickHouse that lets you convert Gregorian dates to Jalali dates. ClickHouse, a fast and efficient column-oriented database management system, supports custom executable functions, and we'll leverage this capability to use a Python script for the date conversion.&lt;/p&gt;

&lt;p&gt;Follow the steps below to set up this conversion functionality:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Install the Necessary Python Library:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;jdatetime
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Scripting the Conversion:&lt;/strong&gt;&lt;br&gt;
Start by creating a Python script named georgian_to_jalali.py in ClickHouse's user_scripts directory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;nano /var/lib/clickhouse/user_scripts/georgian_to_jalali.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Populate the script with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#!/usr/bin/python3
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;jdatetime&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;georgian_to_jalali&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date_str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;georgian_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jdatetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strptime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date_str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;jalali_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jdatetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromgregorian&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;georgian_date&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;jalali_date&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stdin&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;date_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rstrip&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;georgian_to_jalali&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date_str&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ensure the script is executable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo chmod &lt;/span&gt;777 /var/lib/clickhouse/user_scripts/georgian_to_jalali.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Integrate with ClickHouse:&lt;/strong&gt;&lt;br&gt;
Define the ClickHouse function by creating an XML file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;nano /etc/clickhouse-server/georgian_to_jalali_function.xml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Input the following XML:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;functions&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;function&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;type&amp;gt;&lt;/span&gt;executable&lt;span class="nt"&gt;&amp;lt;/type&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;name&amp;gt;&lt;/span&gt;georgian_to_jalali&lt;span class="nt"&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;return_type&amp;gt;&lt;/span&gt;String&lt;span class="nt"&gt;&amp;lt;/return_type&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;argument&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;type&amp;gt;&lt;/span&gt;String&lt;span class="nt"&gt;&amp;lt;/type&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;name&amp;gt;&lt;/span&gt;value&lt;span class="nt"&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/argument&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;format&amp;gt;&lt;/span&gt;TabSeparated&lt;span class="nt"&gt;&amp;lt;/format&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;command&amp;gt;&lt;/span&gt;georgian_to_jalali.py&lt;span class="nt"&gt;&amp;lt;/command&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/function&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/functions&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
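&lt;p&gt;Note that ClickHouse only loads this file if its name matches the server's executable-UDF configuration pattern. By default the server config contains a setting along these lines (verify the exact value in your own config.xml):&lt;br&gt;
&lt;/p&gt;

```
&lt;!-- in /etc/clickhouse-server/config.xml --&gt;
&lt;user_defined_executable_functions_config&gt;*_function.*ml&lt;/user_defined_executable_functions_config&gt;
```

&lt;p&gt;The file name used above, georgian_to_jalali_function.xml, matches this default glob, so no extra configuration should be needed.&lt;/p&gt;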



&lt;p&gt;&lt;strong&gt;Usage:&lt;/strong&gt;&lt;br&gt;
With everything set, you can use the function in ClickHouse queries. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;georgian_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;georgian_to_jalali&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;jalali_date&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After executing the query, you should see results like those in the image below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmqcbj3y3b7eqprd8yvgw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmqcbj3y3b7eqprd8yvgw.png" alt="Image description" width="800" height="174"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>clickhouse</category>
      <category>function</category>
      <category>jalalidate</category>
    </item>
  </channel>
</rss>
