<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kanishga Subramani</title>
    <description>The latest articles on DEV Community by Kanishga Subramani (@kanishga_subramani_49ad73).</description>
    <link>https://dev.to/kanishga_subramani_49ad73</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3951880%2F08e2b1d3-1c3e-4280-91fc-99fd18e39198.jpg</url>
      <title>DEV Community: Kanishga Subramani</title>
      <link>https://dev.to/kanishga_subramani_49ad73</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kanishga_subramani_49ad73"/>
    <language>en</language>
    <item>
      <title>Day 30 of 100 Days of ClickHouse® - Using ClickHouse® Arrays for Complex Data Structures</title>
      <dc:creator>Kanishga Subramani</dc:creator>
      <pubDate>Mon, 22 Jun 2026 17:12:04 +0000</pubDate>
      <link>https://dev.to/kanishga_subramani_49ad73/day-30-of-100-days-of-clickhouser-using-clickhouser-arrays-for-complex-data-structures-3ma0</link>
      <guid>https://dev.to/kanishga_subramani_49ad73/day-30-of-100-days-of-clickhouser-using-clickhouser-arrays-for-complex-data-structures-3ma0</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Modern analytical workloads often involve working with multi-valued and semi-structured data. Traditional relational database designs typically represent these relationships using separate tables connected through joins. While this approach works well, it can increase schema complexity and slow down analytical queries as datasets grow.&lt;/p&gt;

&lt;p&gt;ClickHouse® provides a powerful &lt;strong&gt;Array&lt;/strong&gt; data type that allows multiple values to be stored within a single column. Combined with a rich collection of built-in array functions, arrays make it possible to efficiently store, query, and transform complex datasets while maintaining the high-performance analytics that ClickHouse® is known for.&lt;/p&gt;

&lt;p&gt;In this article, you'll learn how to create tables with array columns, insert and query array data, explore commonly used array functions, and understand real-world scenarios where arrays simplify data modeling and improve performance.&lt;/p&gt;




&lt;h1&gt;
  
  
  Why Use Arrays in ClickHouse®?
&lt;/h1&gt;

&lt;p&gt;Arrays allow you to store multiple related values inside a single column instead of spreading them across multiple rows or tables.&lt;/p&gt;

&lt;p&gt;This approach offers several advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduces the need for joins&lt;/li&gt;
&lt;li&gt;Simplifies database schema design&lt;/li&gt;
&lt;li&gt;Improves query performance&lt;/li&gt;
&lt;li&gt;Efficiently stores lists, tags, events, and user activities&lt;/li&gt;
&lt;li&gt;Supports a wide range of optimized built-in array functions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Arrays are widely used in applications such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Website analytics&lt;/li&gt;
&lt;li&gt;User behavior tracking&lt;/li&gt;
&lt;li&gt;IoT monitoring&lt;/li&gt;
&lt;li&gt;Recommendation systems&lt;/li&gt;
&lt;li&gt;Product catalogs&lt;/li&gt;
&lt;li&gt;Event logging&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  Understanding the Array Data Type
&lt;/h1&gt;

&lt;p&gt;An Array in ClickHouse® is a collection of values of the same data type stored together in a single column.&lt;/p&gt;

&lt;p&gt;The general syntax is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DataType&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Examples include:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Float64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Arrays are commonly used for storing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Product lists&lt;/li&gt;
&lt;li&gt;User interests&lt;/li&gt;
&lt;li&gt;Website page visits&lt;/li&gt;
&lt;li&gt;Sensor readings&lt;/li&gt;
&lt;li&gt;Event sequences&lt;/li&gt;
&lt;li&gt;Tags and categories&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of storing product names across multiple rows, they can be stored together in a single array:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;['Laptop', 'Mouse', 'Keyboard']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This keeps related information together and often makes analytical queries much simpler.&lt;/p&gt;




&lt;h1&gt;
  
  
  Creating a Table with Array Columns
&lt;/h1&gt;

&lt;p&gt;Let's create a table that stores each user's purchase history.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;user_purchases&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;purchased_items&lt;/span&gt; &lt;span class="n"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Table Structure
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Column&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;user_id&lt;/td&gt;
&lt;td&gt;Unique user identifier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;purchased_items&lt;/td&gt;
&lt;td&gt;Array containing purchased products&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This design works well because all purchased items naturally belong to the same user.&lt;/p&gt;




&lt;h1&gt;
  
  
  Inserting Array Data
&lt;/h1&gt;

&lt;p&gt;Insert a single record:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;user_purchases&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'Laptop'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Mouse'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Keyboard'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Insert multiple records:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;user_purchases&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'Monitor'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Webcam'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'SSD'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'RAM'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'CPU'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  Querying Array Data
&lt;/h1&gt;

&lt;p&gt;Retrieve all records:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;user_purchases&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example output:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;user_id&lt;/th&gt;
&lt;th&gt;purchased_items&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;['Laptop','Mouse','Keyboard']&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;['Monitor','Webcam']&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;['SSD','RAM','CPU']&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h1&gt;
  
  
  Accessing Individual Array Elements
&lt;/h1&gt;

&lt;p&gt;ClickHouse® arrays use &lt;strong&gt;1-based indexing&lt;/strong&gt;, meaning the first element starts at index &lt;strong&gt;1&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Retrieve the first purchased item:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;purchased_items&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;first_item&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;user_purchases&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Laptop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can access any position using the corresponding index.&lt;/p&gt;




&lt;h1&gt;
  
  
  Common Array Functions
&lt;/h1&gt;

&lt;p&gt;ClickHouse® includes many optimized functions for working with arrays.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. arrayJoin()
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;arrayJoin()&lt;/code&gt; function expands every array element into a separate row.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;arrayJoin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;purchased_items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;user_purchases&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;user_id&lt;/th&gt;
&lt;th&gt;item&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Laptop&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Mouse&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Keyboard&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Common Use Cases
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Product analytics&lt;/li&gt;
&lt;li&gt;Event tracking&lt;/li&gt;
&lt;li&gt;Aggregations&lt;/li&gt;
&lt;li&gt;Reporting&lt;/li&gt;
&lt;li&gt;Visualization&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  2. arrayMap()
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;arrayMap()&lt;/code&gt; applies a transformation to every element in an array.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;arrayMap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;upperUTF8&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;purchased_items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;user_purchases&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;['LAPTOP','MOUSE','KEYBOARD']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Common Use Cases
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Data transformation&lt;/li&gt;
&lt;li&gt;String formatting&lt;/li&gt;
&lt;li&gt;Mathematical operations&lt;/li&gt;
&lt;li&gt;Data standardization&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  3. arrayFilter()
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;arrayFilter()&lt;/code&gt; returns only the elements that satisfy a condition.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;arrayFilter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'%o%'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;purchased_items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;user_purchases&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;['Laptop','Mouse']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Common Use Cases
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Product filtering&lt;/li&gt;
&lt;li&gt;Event selection&lt;/li&gt;
&lt;li&gt;Preference analysis&lt;/li&gt;
&lt;li&gt;Data cleansing&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  4. arrayExists()
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;arrayExists()&lt;/code&gt; checks whether at least one element satisfies a condition.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;arrayExists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Mouse'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;purchased_items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;user_purchases&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;1&lt;/strong&gt; = True&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0&lt;/strong&gt; = False&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Common Use Cases
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Membership testing&lt;/li&gt;
&lt;li&gt;Product existence checks&lt;/li&gt;
&lt;li&gt;User activity validation&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  5. arrayDistinct()
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;arrayDistinct()&lt;/code&gt; removes duplicate values.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;arrayDistinct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'Laptop'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'Mouse'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'Laptop'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'Keyboard'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;['Laptop','Mouse','Keyboard']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Common Use Cases
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Removing duplicates&lt;/li&gt;
&lt;li&gt;Data cleaning&lt;/li&gt;
&lt;li&gt;Extracting unique values&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  6. arrayCount()
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;arrayCount()&lt;/code&gt; counts elements matching a condition.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;arrayCount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Only &lt;strong&gt;15&lt;/strong&gt; and &lt;strong&gt;20&lt;/strong&gt; satisfy the condition.&lt;/p&gt;

&lt;h3&gt;
  
  
  Common Use Cases
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Threshold monitoring&lt;/li&gt;
&lt;li&gt;User activity analysis&lt;/li&gt;
&lt;li&gt;Counting qualifying values&lt;/li&gt;
&lt;li&gt;Analytical calculations&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  Working with Related Arrays
&lt;/h1&gt;

&lt;p&gt;Arrays become even more useful when multiple arrays represent related information.&lt;/p&gt;

&lt;p&gt;Consider website analytics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;website_sessions&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="n"&gt;UInt64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;pages&lt;/span&gt; &lt;span class="n"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;durations&lt;/span&gt; &lt;span class="n"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Insert sample data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;website_sessions&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'/home'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'/products'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'/checkout'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This represents:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Page&lt;/th&gt;
&lt;th&gt;Duration (Seconds)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;/home&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;/products&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;/checkout&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Keeping these arrays together preserves both sequence and context, making session analysis much easier.&lt;/p&gt;

&lt;p&gt;Common applications include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Website navigation tracking&lt;/li&gt;
&lt;li&gt;User journey analysis&lt;/li&gt;
&lt;li&gt;IoT sensor monitoring&lt;/li&gt;
&lt;li&gt;Event stream processing&lt;/li&gt;
&lt;li&gt;Application monitoring&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  Performance Considerations
&lt;/h1&gt;

&lt;p&gt;To get the best performance when using arrays:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keep arrays reasonably sized.&lt;/li&gt;
&lt;li&gt;Avoid storing thousands of elements in a single row unless necessary.&lt;/li&gt;
&lt;li&gt;Use optimized functions such as &lt;code&gt;arrayMap()&lt;/code&gt;, &lt;code&gt;arrayFilter()&lt;/code&gt;, and &lt;code&gt;arrayExists()&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Minimize the use of &lt;code&gt;arrayJoin()&lt;/code&gt; since it expands rows and can significantly increase query processing.&lt;/li&gt;
&lt;li&gt;Store related values together when they are frequently queried.&lt;/li&gt;
&lt;li&gt;Benchmark queries on production-like datasets before deployment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Following these recommendations helps maintain ClickHouse®'s excellent analytical performance.&lt;/p&gt;




&lt;h1&gt;
  
  
  Real-World Use Cases
&lt;/h1&gt;

&lt;h2&gt;
  
  
  E-Commerce Analytics
&lt;/h2&gt;

&lt;p&gt;Store purchased products in a transaction.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;['Laptop','Mouse','Keyboard']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simplified schema&lt;/li&gt;
&lt;li&gt;Fewer joins&lt;/li&gt;
&lt;li&gt;Faster analysis&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  User Activity Tracking
&lt;/h2&gt;

&lt;p&gt;Track user actions during a session.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;['Login','Search','AddToCart','Checkout']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Useful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customer journey analysis&lt;/li&gt;
&lt;li&gt;Behavioral analytics&lt;/li&gt;
&lt;li&gt;Funnel analysis&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Website Navigation Analysis
&lt;/h2&gt;

&lt;p&gt;Store page visit sequences.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;['Home','Products','Pricing','Contact']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Useful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clickstream analysis&lt;/li&gt;
&lt;li&gt;Conversion optimization&lt;/li&gt;
&lt;li&gt;Navigation insights&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  IoT Sensor Monitoring
&lt;/h2&gt;

&lt;p&gt;Store multiple sensor readings.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[23.1, 23.4, 22.9, 24.0]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Useful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Time-series analytics&lt;/li&gt;
&lt;li&gt;Monitoring systems&lt;/li&gt;
&lt;li&gt;Device telemetry&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Recommendation Systems
&lt;/h2&gt;

&lt;p&gt;Store user interests.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;['Technology','Programming','AI']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Useful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Personalized recommendations&lt;/li&gt;
&lt;li&gt;Content suggestions&lt;/li&gt;
&lt;li&gt;Product recommendations&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  Best Practices
&lt;/h1&gt;

&lt;p&gt;When working with arrays:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use arrays only for naturally grouped data.&lt;/li&gt;
&lt;li&gt;Avoid extremely large arrays.&lt;/li&gt;
&lt;li&gt;Prefer built-in array functions over custom processing.&lt;/li&gt;
&lt;li&gt;Avoid excessive nesting of arrays.&lt;/li&gt;
&lt;li&gt;Monitor query performance using ClickHouse® system tables and query logs.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Document array structures, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Element ordering&lt;/li&gt;
&lt;li&gt;Data types&lt;/li&gt;
&lt;li&gt;Business meaning&lt;/li&gt;
&lt;li&gt;Expected size limits&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Good documentation helps maintain consistency as applications evolve.&lt;/p&gt;




&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Arrays are one of the most powerful data types available in ClickHouse®. They provide an efficient way to store and analyze multi-valued data without relying heavily on additional tables or complex joins.&lt;/p&gt;

&lt;p&gt;Functions such as &lt;code&gt;arrayJoin()&lt;/code&gt;, &lt;code&gt;arrayMap()&lt;/code&gt;, &lt;code&gt;arrayFilter()&lt;/code&gt;, &lt;code&gt;arrayExists()&lt;/code&gt;, &lt;code&gt;arrayDistinct()&lt;/code&gt;, and &lt;code&gt;arrayCount()&lt;/code&gt; make it easy to manipulate and analyze array data while taking advantage of ClickHouse®'s optimized execution engine.&lt;/p&gt;

&lt;p&gt;Whether you're building analytics for e-commerce platforms, monitoring IoT devices, tracking website sessions, or powering recommendation systems, arrays help create cleaner schemas, simplify queries, and deliver fast analytical performance.&lt;/p&gt;

&lt;p&gt;Understanding when and how to use arrays effectively will allow you to model complex datasets more naturally while getting the most out of ClickHouse®'s high-performance architecture.&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>clickhouse</category>
      <category>database</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>Day 29 of 100 Days of ClickHouse® - Understanding Distributed Tables</title>
      <dc:creator>Kanishga Subramani</dc:creator>
      <pubDate>Sun, 21 Jun 2026 16:09:42 +0000</pubDate>
      <link>https://dev.to/kanishga_subramani_49ad73/day-29-of-100-days-of-clickhouser-understanding-distributed-tables-5jb</link>
      <guid>https://dev.to/kanishga_subramani_49ad73/day-29-of-100-days-of-clickhouser-understanding-distributed-tables-5jb</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;As datasets grow, a single ClickHouse® server may eventually become insufficient for handling storage, ingestion, or query workloads.&lt;/p&gt;

&lt;p&gt;At that point, scaling horizontally becomes necessary.&lt;/p&gt;

&lt;p&gt;This is where &lt;strong&gt;Distributed tables&lt;/strong&gt; come into play.&lt;/p&gt;

&lt;p&gt;Distributed tables provide a unified way to query data spread across multiple servers while maintaining a familiar SQL experience. Instead of worrying about which node contains specific data, users interact with a single logical table while ClickHouse® handles query routing and result aggregation behind the scenes.&lt;/p&gt;

&lt;p&gt;In this article, we'll explore what Distributed tables are, how they work, when to use them, and the architectural concepts every ClickHouse® user should understand.&lt;/p&gt;




&lt;h1&gt;
  
  
  Why Distributed Tables Exist
&lt;/h1&gt;

&lt;p&gt;A single ClickHouse® server is capable of processing billions of rows efficiently. However, as applications grow, organizations often encounter requirements such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More storage capacity&lt;/li&gt;
&lt;li&gt;Higher ingestion throughput&lt;/li&gt;
&lt;li&gt;Increased query processing power&lt;/li&gt;
&lt;li&gt;High availability&lt;/li&gt;
&lt;li&gt;Geographic distribution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Rather than continually upgrading a single machine, it's often more practical to distribute data across multiple servers.&lt;/p&gt;

&lt;p&gt;The challenge then becomes simple:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How can users query data spread across several servers without manually connecting to each one?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Distributed tables solve this problem by providing a single logical entry point for the entire cluster.&lt;/p&gt;




&lt;h1&gt;
  
  
  Understanding the Distributed Engine
&lt;/h1&gt;

&lt;p&gt;The &lt;strong&gt;Distributed&lt;/strong&gt; engine acts as a logical layer over existing local tables.&lt;/p&gt;

&lt;p&gt;Unlike MergeTree engines, a Distributed table &lt;strong&gt;does not store any data&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead, it routes queries and inserts to the appropriate servers.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;events_distributed&lt;/span&gt;
&lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;events_local&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Distributed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;analytics_cluster&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;analytics&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;events_local&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cityHash64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;analytics_cluster&lt;/code&gt; is the cluster configuration defined in ClickHouse.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;analytics&lt;/code&gt; is the database name.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;events_local&lt;/code&gt; is the underlying local table.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cityHash64(user_id)&lt;/code&gt; is the sharding key.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Distributed table itself contains no rows. All data remains inside the local MergeTree tables on each server.&lt;/p&gt;




&lt;h1&gt;
  
  
  Local Tables vs Distributed Tables
&lt;/h1&gt;

&lt;p&gt;This distinction is one of the most important concepts to understand.&lt;/p&gt;

&lt;p&gt;Suppose you have three shards.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Shard 1
└── events_local

Shard 2
└── events_local

Shard 3
└── events_local
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each server stores its own portion of the data inside its local table.&lt;/p&gt;

&lt;p&gt;Above them sits a Distributed table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;events_distributed
        │
 ┌──────┼──────┐
 │      │      │
Shard1 Shard2 Shard3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Applications query the Distributed table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events_distributed&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ClickHouse automatically sends the query to the relevant shards, collects the partial results, merges them, and returns a single response.&lt;/p&gt;

&lt;p&gt;From the application's perspective, it behaves like querying a normal table.&lt;/p&gt;




&lt;h1&gt;
  
  
  Understanding Sharding
&lt;/h1&gt;

&lt;p&gt;Distributed tables are typically used together with &lt;strong&gt;sharding&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Sharding means splitting data across multiple servers.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User 1 → Shard 1
User 2 → Shard 2
User 3 → Shard 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A &lt;strong&gt;sharding key&lt;/strong&gt; determines where each row is stored.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;cityHash64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A good sharding key helps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Balance storage across servers&lt;/li&gt;
&lt;li&gt;Distribute query workloads&lt;/li&gt;
&lt;li&gt;Improve scalability&lt;/li&gt;
&lt;li&gt;Prevent bottlenecks&lt;/li&gt;
&lt;li&gt;Keep related data together&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Choosing the right sharding strategy is one of the most important architectural decisions in a distributed ClickHouse deployment.&lt;/p&gt;




&lt;h1&gt;
  
  
  How Distributed Queries Work
&lt;/h1&gt;

&lt;p&gt;Suppose a user runs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events_distributed&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ClickHouse performs the following steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Sends the query to each participating shard.&lt;/li&gt;
&lt;li&gt;Executes the query locally on every shard.&lt;/li&gt;
&lt;li&gt;Collects the partial aggregation results.&lt;/li&gt;
&lt;li&gt;Merges the results.&lt;/li&gt;
&lt;li&gt;Returns the final output.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Because multiple servers process the query simultaneously, distributed queries can scale efficiently as data grows.&lt;/p&gt;




&lt;h1&gt;
  
  
  Distributed Inserts
&lt;/h1&gt;

&lt;p&gt;Distributed tables can also receive inserts.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;events_distributed&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(...);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Distributed engine evaluates the sharding key and automatically forwards each row to the appropriate shard.&lt;/p&gt;

&lt;p&gt;Applications don't need to know where data belongs, making distributed ingestion much simpler.&lt;/p&gt;




&lt;h1&gt;
  
  
  Choosing the Right Sharding Key
&lt;/h1&gt;

&lt;p&gt;Although a Distributed table can be created without a meaningful sharding strategy, doing so often leads to problems such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uneven data distribution&lt;/li&gt;
&lt;li&gt;Hot shards&lt;/li&gt;
&lt;li&gt;Storage imbalance&lt;/li&gt;
&lt;li&gt;Poor query performance&lt;/li&gt;
&lt;li&gt;Limited scalability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Common sharding keys include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;user_id&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;customer_id&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tenant_id&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;organization_id&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The best choice depends on how your application queries data and how evenly values are distributed.&lt;/p&gt;




&lt;h1&gt;
  
  
  Distributed Tables vs Replication
&lt;/h1&gt;

&lt;p&gt;These concepts are frequently confused, but they solve different problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sharding
&lt;/h3&gt;

&lt;p&gt;Sharding distributes data across multiple servers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Scale horizontally.&lt;/p&gt;

&lt;h3&gt;
  
  
  Replication
&lt;/h3&gt;

&lt;p&gt;Replication creates multiple copies of the same data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; High availability and fault tolerance.&lt;/p&gt;

&lt;p&gt;A production cluster commonly combines both.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Shard 1
├── Replica A
└── Replica B

Shard 2
├── Replica A
└── Replica B
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Distributed tables provide a single interface that allows applications to query the entire cluster regardless of where the data is stored.&lt;/p&gt;




&lt;h1&gt;
  
  
  Query Optimization Tips
&lt;/h1&gt;

&lt;p&gt;Distributed queries involve network communication between nodes.&lt;/p&gt;

&lt;p&gt;To achieve the best performance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Filter data as early as possible.&lt;/li&gt;
&lt;li&gt;Choose efficient sorting keys.&lt;/li&gt;
&lt;li&gt;Minimize unnecessary data movement.&lt;/li&gt;
&lt;li&gt;Design effective sharding keys.&lt;/li&gt;
&lt;li&gt;Avoid excessive cross-shard operations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The less data transferred across the network, the faster distributed queries become.&lt;/p&gt;




&lt;h1&gt;
  
  
  Common Mistakes
&lt;/h1&gt;

&lt;h3&gt;
  
  
  Assuming Distributed Tables Store Data
&lt;/h3&gt;

&lt;p&gt;They don't.&lt;/p&gt;

&lt;p&gt;The data always resides in local MergeTree tables.&lt;/p&gt;

&lt;h3&gt;
  
  
  Choosing a Poor Sharding Key
&lt;/h3&gt;

&lt;p&gt;An uneven sharding strategy can overload specific nodes while others remain underutilized.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ignoring Query Patterns
&lt;/h3&gt;

&lt;p&gt;Your sharding strategy should align with how the application queries data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Confusing Replication with Distribution
&lt;/h3&gt;

&lt;p&gt;Replication improves availability.&lt;/p&gt;

&lt;p&gt;Distribution improves scalability.&lt;/p&gt;

&lt;p&gt;Both are important, but they serve different purposes.&lt;/p&gt;




&lt;h1&gt;
  
  
  Example Architecture
&lt;/h1&gt;

&lt;p&gt;A typical ClickHouse deployment may look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;analytics_cluster

Shard 1
├── events_local
└── Replica

Shard 2
├── events_local
└── Replica

events_distributed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Applications interact only with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;events_distributed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ClickHouse automatically handles query routing, distributed execution, and result aggregation.&lt;/p&gt;




&lt;h1&gt;
  
  
  When Should You Use Distributed Tables?
&lt;/h1&gt;

&lt;p&gt;Distributed tables are useful when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your data no longer fits comfortably on a single server.&lt;/li&gt;
&lt;li&gt;You need horizontal scalability.&lt;/li&gt;
&lt;li&gt;Your cluster contains multiple shards.&lt;/li&gt;
&lt;li&gt;Applications require a unified SQL interface.&lt;/li&gt;
&lt;li&gt;Large analytical workloads need to be distributed across several nodes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For single-node deployments, Distributed tables usually aren't necessary.&lt;/p&gt;




&lt;h1&gt;
  
  
  Best Practices
&lt;/h1&gt;

&lt;p&gt;When designing distributed ClickHouse clusters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Choose sharding keys carefully.&lt;/li&gt;
&lt;li&gt;Understand your query patterns before distributing data.&lt;/li&gt;
&lt;li&gt;Keep local table schemas identical across all nodes.&lt;/li&gt;
&lt;li&gt;Minimize cross-shard communication whenever possible.&lt;/li&gt;
&lt;li&gt;Combine sharding with replication for both scalability and availability.&lt;/li&gt;
&lt;li&gt;Monitor cluster health regularly.&lt;/li&gt;
&lt;li&gt;Validate data distribution before moving to production.&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  Final Thoughts
&lt;/h1&gt;

&lt;p&gt;Distributed tables are one of the core building blocks that enable ClickHouse® to scale beyond a single server.&lt;/p&gt;

&lt;p&gt;Rather than storing data themselves, they act as a routing layer that allows applications to query and insert data across multiple shards using a single SQL interface.&lt;/p&gt;

&lt;p&gt;Understanding the relationship between local tables, Distributed tables, sharding, and replication is essential for designing efficient ClickHouse clusters.&lt;/p&gt;

&lt;p&gt;With a well-planned sharding strategy and thoughtful cluster design, Distributed tables make it possible to scale analytical workloads across multiple servers while keeping the SQL experience simple and familiar.&lt;/p&gt;

&lt;p&gt;Read more... &lt;a href="https://www.quantrail-data.com/clickhouse-distributed-tables-explained" rel="noopener noreferrer"&gt;https://www.quantrail-data.com/clickhouse-distributed-tables-explained&lt;/a&gt;&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>clickhouse</category>
      <category>database</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>Day 28 of 100 Days of ClickHouse® – Understanding TTL (Time To Live) for Automated Data Lifecycle Management</title>
      <dc:creator>Kanishga Subramani</dc:creator>
      <pubDate>Sat, 20 Jun 2026 14:51:59 +0000</pubDate>
      <link>https://dev.to/kanishga_subramani_49ad73/day-28-of-100-days-of-clickhouser-understanding-ttl-time-to-live-for-automated-data-lifecycle-249d</link>
      <guid>https://dev.to/kanishga_subramani_49ad73/day-28-of-100-days-of-clickhouser-understanding-ttl-time-to-live-for-automated-data-lifecycle-249d</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;As data grows over time, storing every row forever becomes increasingly expensive and often unnecessary. Application logs, IoT events, monitoring metrics, audit records, and clickstream data typically have different retention requirements depending on their age.&lt;/p&gt;

&lt;p&gt;ClickHouse® provides &lt;strong&gt;TTL (Time To Live)&lt;/strong&gt;, a powerful built-in feature that automates data lifecycle management. Instead of relying on scheduled cleanup scripts or manual &lt;code&gt;DELETE&lt;/code&gt; operations, TTL rules can be defined directly within a table schema.&lt;/p&gt;

&lt;p&gt;With TTL, ClickHouse can automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Delete expired data&lt;/li&gt;
&lt;li&gt;Move older data to lower-cost storage&lt;/li&gt;
&lt;li&gt;Recompress aging data using stronger codecs&lt;/li&gt;
&lt;li&gt;Aggregate historical records into summarized data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These capabilities make TTL one of the most effective storage optimization features available in ClickHouse.&lt;/p&gt;




&lt;h1&gt;
  
  
  What is TTL?
&lt;/h1&gt;

&lt;p&gt;TTL (Time To Live) defines what should happen to data after a specified period of time.&lt;/p&gt;

&lt;p&gt;A TTL expression is usually based on a &lt;code&gt;Date&lt;/code&gt; or &lt;code&gt;DateTime&lt;/code&gt; column.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;TTL&lt;/span&gt; &lt;span class="n"&gt;event_time&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After the specified interval expires, ClickHouse performs the configured action during background merge operations.&lt;/p&gt;

&lt;p&gt;Unlike scheduled jobs, TTL is completely integrated into the MergeTree engine, making lifecycle management automatic and maintenance-free.&lt;/p&gt;




&lt;h1&gt;
  
  
  How TTL Works Internally
&lt;/h1&gt;

&lt;p&gt;When new data is inserted into a MergeTree table, the following process occurs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Data is written into immutable parts.&lt;/li&gt;
&lt;li&gt;Background merge operations combine smaller parts into larger ones.&lt;/li&gt;
&lt;li&gt;During each merge, ClickHouse evaluates TTL expressions.&lt;/li&gt;
&lt;li&gt;Expired rows are deleted, moved, recompressed, or aggregated depending on the configured rule.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Because TTL works during merges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Expired data may remain visible for a short period.&lt;/li&gt;
&lt;li&gt;Cleanup is asynchronous.&lt;/li&gt;
&lt;li&gt;Execution depends on the merge schedule.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For testing purposes, you can force a merge using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;OPTIMIZE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="k"&gt;table_name&lt;/span&gt; &lt;span class="k"&gt;FINAL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  1. Automatically Deleting Expired Data
&lt;/h1&gt;

&lt;p&gt;The most common TTL use case is automatic data deletion.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;logs&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="nb"&gt;DateTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;
&lt;span class="n"&gt;TTL&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What happens?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Newly inserted data remains available.&lt;/li&gt;
&lt;li&gt;After 30 days, rows become eligible for removal.&lt;/li&gt;
&lt;li&gt;During future background merges, ClickHouse automatically deletes expired rows.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This eliminates the need for cron jobs or periodic cleanup scripts while reducing storage usage automatically.&lt;/p&gt;




&lt;h1&gt;
  
  
  2. Column-Level TTL
&lt;/h1&gt;

&lt;p&gt;TTL isn't limited to entire rows. It can also be applied to individual columns.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;user_events&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;event_time&lt;/span&gt; &lt;span class="nb"&gt;DateTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="n"&gt;UInt64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_data&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt; &lt;span class="n"&gt;TTL&lt;/span&gt; &lt;span class="n"&gt;event_time&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;event_time&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Result
&lt;/h3&gt;

&lt;p&gt;After seven days:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;session_data&lt;/code&gt; is removed.&lt;/li&gt;
&lt;li&gt;The remaining columns stay intact.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is particularly useful when temporary or sensitive information only needs to be retained for a limited time.&lt;/p&gt;




&lt;h1&gt;
  
  
  3. Moving Data to Cold Storage
&lt;/h1&gt;

&lt;p&gt;Many organizations use multiple storage tiers.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Storage Tier&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SSD&lt;/td&gt;
&lt;td&gt;Frequently accessed recent data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HDD / Object Storage&lt;/td&gt;
&lt;td&gt;Historical or infrequently accessed data&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;TTL can automatically move older data between storage volumes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;logs&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="nb"&gt;DateTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;
&lt;span class="n"&gt;TTL&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;
&lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;VOLUME&lt;/span&gt; &lt;span class="s1"&gt;'cold_storage'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Result
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Recent data remains on fast storage.&lt;/li&gt;
&lt;li&gt;Older data is moved to lower-cost storage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This helps reduce infrastructure costs while keeping historical data available when needed.&lt;/p&gt;




&lt;h1&gt;
  
  
  4. Recompressing Older Data
&lt;/h1&gt;

&lt;p&gt;Recent data is queried frequently and benefits from fast compression codecs such as &lt;strong&gt;LZ4&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Historical data is accessed less often, making stronger compression more efficient.&lt;/p&gt;

&lt;p&gt;TTL supports automatic recompression.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;event_time&lt;/span&gt; &lt;span class="nb"&gt;DateTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="n"&gt;UInt64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="n"&gt;Float64&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;event_time&lt;/span&gt;
&lt;span class="n"&gt;TTL&lt;/span&gt; &lt;span class="n"&gt;event_time&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;
&lt;span class="n"&gt;RECOMPRESS&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;17&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Compression lifecycle
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Data Age&lt;/th&gt;
&lt;th&gt;Compression&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0–30 Days&lt;/td&gt;
&lt;td&gt;Default (LZ4)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;After 30 Days&lt;/td&gt;
&lt;td&gt;ZSTD(17)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Benefits
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Lower storage consumption&lt;/li&gt;
&lt;li&gt;Better disk utilization&lt;/li&gt;
&lt;li&gt;Fully automated optimization&lt;/li&gt;
&lt;li&gt;No manual recompression required&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  5. Aggregating Historical Data with GROUP BY TTL
&lt;/h1&gt;

&lt;p&gt;One of the most powerful TTL capabilities is &lt;strong&gt;automatic rollup aggregation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of deleting old data, ClickHouse can summarize historical records into aggregated values.&lt;/p&gt;

&lt;p&gt;This is particularly useful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Monitoring systems&lt;/li&gt;
&lt;li&gt;IoT platforms&lt;/li&gt;
&lt;li&gt;Clickstream analytics&lt;/li&gt;
&lt;li&gt;Time-series databases&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;hits&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="nb"&gt;DateTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;hits&lt;/span&gt; &lt;span class="n"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_hits&lt;/span&gt; &lt;span class="n"&gt;Int32&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;hits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sum_hits&lt;/span&gt; &lt;span class="n"&gt;Int64&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;hits&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;
&lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;toStartOfDay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;TTL&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;toStartOfDay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt;
    &lt;span class="n"&gt;max_hits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_hits&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;sum_hits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sum_hits&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Before Rollup
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Timestamp&lt;/th&gt;
&lt;th&gt;ID&lt;/th&gt;
&lt;th&gt;Hits&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;10:00&lt;/td&gt;
&lt;td&gt;A&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11:00&lt;/td&gt;
&lt;td&gt;A&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12:00&lt;/td&gt;
&lt;td&gt;A&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;After one day, the TTL rule executes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;TTL&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;toStartOfDay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt;
    &lt;span class="n"&gt;max_hits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_hits&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;sum_hits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sum_hits&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  After Rollup
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Day&lt;/th&gt;
&lt;th&gt;ID&lt;/th&gt;
&lt;th&gt;Max Hits&lt;/th&gt;
&lt;th&gt;Sum Hits&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2026-06-20&lt;/td&gt;
&lt;td&gt;A&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Instead of storing three separate rows, ClickHouse stores one summarized row.&lt;/p&gt;

&lt;p&gt;This significantly reduces storage while preserving meaningful historical insights.&lt;/p&gt;




&lt;h1&gt;
  
  
  Best Practices for Using TTL
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Use TTL for predictable retention policies instead of scheduled DELETE statements.&lt;/li&gt;
&lt;li&gt;Remember that TTL actions occur during background merges, not immediately after expiration.&lt;/li&gt;
&lt;li&gt;Apply column-level TTL when only specific fields need limited retention.&lt;/li&gt;
&lt;li&gt;Use storage tiering to reduce infrastructure costs without losing historical data.&lt;/li&gt;
&lt;li&gt;Combine TTL with recompression to maximize disk efficiency.&lt;/li&gt;
&lt;li&gt;Consider &lt;code&gt;GROUP BY TTL&lt;/code&gt; for time-series workloads where summarized historical data is sufficient.&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;OPTIMIZE TABLE ... FINAL&lt;/code&gt; only for testing, not as a regular maintenance task.&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;TTL is one of ClickHouse's most valuable automation features for managing data throughout its lifecycle. With a single declarative rule, you can automatically delete expired data, move historical records to lower-cost storage, apply stronger compression to older data, or aggregate detailed records into summarized insights.&lt;/p&gt;

&lt;p&gt;By leveraging TTL effectively, organizations can reduce storage costs, simplify maintenance, improve long-term performance, and build scalable analytical systems without relying on external cleanup jobs or manual intervention.&lt;/p&gt;

&lt;p&gt;Whether you're managing log data, monitoring metrics, IoT telemetry, or large-scale analytical workloads, TTL provides a clean and efficient approach to implementing automated data retention policies in ClickHouse.&lt;/p&gt;

</description>
      <category>clickhouse</category>
      <category>database</category>
      <category>dataengineering</category>
      <category>analytics</category>
    </item>
    <item>
      <title>Day 27 of 100 Days of ClickHouse® - Optimizing ClickHouse® Queries for Faster Execution</title>
      <dc:creator>Kanishga Subramani</dc:creator>
      <pubDate>Sat, 20 Jun 2026 11:19:10 +0000</pubDate>
      <link>https://dev.to/kanishga_subramani_49ad73/day-27-of-100-days-of-clickhouser-optimizing-clickhouser-queries-for-faster-execution-5356</link>
      <guid>https://dev.to/kanishga_subramani_49ad73/day-27-of-100-days-of-clickhouser-optimizing-clickhouser-queries-for-faster-execution-5356</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;As datasets grow larger and analytical workloads become more demanding, query performance becomes increasingly important. While ClickHouse® is built for lightning-fast analytical processing, poorly optimized schemas and inefficient queries can still waste CPU cycles, consume excessive memory, and increase execution times.&lt;/p&gt;

&lt;p&gt;The good news is that ClickHouse® provides numerous features to help you optimize both your data model and query patterns. From choosing the right &lt;code&gt;ORDER BY&lt;/code&gt; key to using skip indexes, projections, materialized views, and efficient filtering techniques, a few design decisions can dramatically improve performance.&lt;/p&gt;

&lt;p&gt;In this article, we'll explore practical techniques for optimizing ClickHouse® queries so your analytical workloads remain fast and scalable—even when working with billions of rows.&lt;/p&gt;




&lt;h1&gt;
  
  
  Why Query Optimization Matters
&lt;/h1&gt;

&lt;p&gt;ClickHouse® is a column-oriented analytical database that processes data differently from traditional row-based databases.&lt;/p&gt;

&lt;p&gt;It achieves exceptional performance through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Columnar storage&lt;/li&gt;
&lt;li&gt;Vectorized query execution&lt;/li&gt;
&lt;li&gt;Parallel processing&lt;/li&gt;
&lt;li&gt;Data compression&lt;/li&gt;
&lt;li&gt;Sparse primary indexes&lt;/li&gt;
&lt;li&gt;Intelligent data skipping&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, these optimizations are only effective when tables are designed correctly and queries take advantage of them.&lt;/p&gt;

&lt;p&gt;Poorly optimized queries typically result in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Increased query latency&lt;/li&gt;
&lt;li&gt;Higher CPU utilization&lt;/li&gt;
&lt;li&gt;Excessive memory consumption&lt;/li&gt;
&lt;li&gt;Longer dashboard loading times&lt;/li&gt;
&lt;li&gt;Reduced throughput&lt;/li&gt;
&lt;li&gt;Higher infrastructure costs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Understanding how ClickHouse® reads and skips data is the foundation of writing efficient queries.&lt;/p&gt;




&lt;h1&gt;
  
  
  1. Choose a Good ORDER BY Key
&lt;/h1&gt;

&lt;p&gt;The &lt;code&gt;ORDER BY&lt;/code&gt; clause in MergeTree tables defines the physical sort order of data on disk.&lt;/p&gt;

&lt;p&gt;Unlike many databases, this is far more than just a sorting preference—it determines how efficiently ClickHouse® can locate data using its sparse primary index.&lt;/p&gt;

&lt;p&gt;A well-designed ORDER BY key allows ClickHouse® to skip large portions of data during query execution.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="nb"&gt;DateTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="n"&gt;Float64&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now queries filtering on these columns become extremely efficient.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1001&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'purchase'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Best Practices
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Put frequently filtered columns first.&lt;/li&gt;
&lt;li&gt;Place high-selectivity columns earlier.&lt;/li&gt;
&lt;li&gt;Avoid random or highly unique first columns unless appropriate.&lt;/li&gt;
&lt;li&gt;Design ORDER BY around your most common queries.&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  2. Partition Large Tables
&lt;/h1&gt;

&lt;p&gt;Partitioning divides data into separate physical parts.&lt;/p&gt;

&lt;p&gt;When a query filters on the partition key, ClickHouse® skips entire partitions without reading them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="nb"&gt;DateTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="n"&gt;Float64&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;toYYYYMM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of scanning years of data, ClickHouse® only reads recent monthly partitions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Common Partition Keys
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;toYYYYMM(timestamp)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;toYYYYMMDD(timestamp)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Region&lt;/li&gt;
&lt;li&gt;Tenant ID (for multi-tenant systems)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Avoid over-partitioning, as too many small partitions can negatively impact performance.&lt;/p&gt;




&lt;h1&gt;
  
  
  3. Avoid SELECT *
&lt;/h1&gt;

&lt;p&gt;ClickHouse® only reads the columns referenced in your query.&lt;/p&gt;

&lt;p&gt;Using &lt;code&gt;SELECT *&lt;/code&gt; forces the database to read and decompress every column, increasing I/O and CPU usage.&lt;/p&gt;

&lt;p&gt;Slow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1001&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Better:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;value&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1001&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Always retrieve only the columns your application actually needs.&lt;/p&gt;




&lt;h1&gt;
  
  
  4. Use LowCardinality for String Columns
&lt;/h1&gt;

&lt;p&gt;Columns with relatively few unique values should use &lt;code&gt;LowCardinality&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;status&lt;/li&gt;
&lt;li&gt;country&lt;/li&gt;
&lt;li&gt;region&lt;/li&gt;
&lt;li&gt;event_type&lt;/li&gt;
&lt;li&gt;device_type&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Benefits include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Smaller storage&lt;/li&gt;
&lt;li&gt;Better compression&lt;/li&gt;
&lt;li&gt;Faster filtering&lt;/li&gt;
&lt;li&gt;Faster GROUP BY&lt;/li&gt;
&lt;li&gt;Reduced memory usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As a rule of thumb, use &lt;code&gt;LowCardinality&lt;/code&gt; for string columns with fewer than roughly 10,000 distinct values.&lt;/p&gt;




&lt;h1&gt;
  
  
  5. Filter on ORDER BY Columns First
&lt;/h1&gt;

&lt;p&gt;ClickHouse® can only use its sparse primary index efficiently when filtering on the leading columns of the ORDER BY key.&lt;/p&gt;

&lt;p&gt;Given:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fast:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1001&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'purchase'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Slow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you frequently filter on non-indexed columns, consider adding a skip index.&lt;/p&gt;




&lt;h1&gt;
  
  
  6. Use Skip Indexes
&lt;/h1&gt;

&lt;p&gt;Skip indexes help ClickHouse® skip granules when filtering on columns outside the primary key.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_value&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="k"&gt;TYPE&lt;/span&gt; &lt;span class="n"&gt;minmax&lt;/span&gt; &lt;span class="n"&gt;GRANULARITY&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Materialize it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="n"&gt;MATERIALIZE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ClickHouse® skips granules whose minimum and maximum values fall outside the requested range.&lt;/p&gt;

&lt;p&gt;Common skip index types include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;minmax&lt;/li&gt;
&lt;li&gt;set&lt;/li&gt;
&lt;li&gt;bloom_filter&lt;/li&gt;
&lt;li&gt;tokenbf_v1&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  7. Avoid Functions on Filtered Columns
&lt;/h1&gt;

&lt;p&gt;Applying functions to indexed columns prevents ClickHouse® from using the primary index efficiently.&lt;/p&gt;

&lt;p&gt;Avoid:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;toYear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2024&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead use range filters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2024-01-01'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="s1"&gt;'2025-01-01'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Range conditions allow ClickHouse® to leverage index skipping effectively.&lt;/p&gt;




&lt;h1&gt;
  
  
  8. Use PREWHERE
&lt;/h1&gt;

&lt;p&gt;&lt;code&gt;PREWHERE&lt;/code&gt; is a ClickHouse® optimization that filters rows before loading all requested columns.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;value&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="n"&gt;PREWHERE&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1001&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reads fewer columns initially&lt;/li&gt;
&lt;li&gt;Reduces disk I/O&lt;/li&gt;
&lt;li&gt;Lowers memory consumption&lt;/li&gt;
&lt;li&gt;Speeds up wide-table queries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Although ClickHouse® automatically applies PREWHERE in many cases, specifying it manually gives you greater control.&lt;/p&gt;




&lt;h1&gt;
  
  
  9. Use SAMPLE for Exploratory Queries
&lt;/h1&gt;

&lt;p&gt;For approximate analytics, SAMPLE dramatically reduces execution time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="n"&gt;SAMPLE&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'purchase'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This reads only 10% of the data while producing statistically representative results.&lt;/p&gt;

&lt;p&gt;Remember that SAMPLE requires a table created with a &lt;code&gt;SAMPLE BY&lt;/code&gt; clause.&lt;/p&gt;




&lt;h1&gt;
  
  
  10. Insert Data in Batches
&lt;/h1&gt;

&lt;p&gt;Large batch inserts are far more efficient than numerous single-row inserts.&lt;/p&gt;

&lt;p&gt;Slow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1001&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'purchase'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;&lt;span class="mi"&gt;99&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;99&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1002&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'view'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Better:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1001&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'purchase'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;&lt;span class="mi"&gt;99&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;99&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1002&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'view'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1003&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'click'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Batch inserts reduce merge overhead and improve ingestion throughput.&lt;/p&gt;




&lt;h1&gt;
  
  
  11. Avoid Expensive JOINs
&lt;/h1&gt;

&lt;p&gt;Although JOIN performance has improved significantly, joins remain expensive compared to scanning a single table.&lt;/p&gt;

&lt;p&gt;Whenever possible:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Denormalize frequently queried data.&lt;/li&gt;
&lt;li&gt;Join only after filtering.&lt;/li&gt;
&lt;li&gt;Join smaller tables to larger tables.&lt;/li&gt;
&lt;li&gt;Consider Dictionaries for dimension lookups.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Filter first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;order_date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Reducing rows before the JOIN often leads to substantial performance improvements.&lt;/p&gt;




&lt;h1&gt;
  
  
  12. Choose the Right MergeTree Engine
&lt;/h1&gt;

&lt;p&gt;Different workloads benefit from different MergeTree engines.&lt;/p&gt;

&lt;p&gt;Examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MergeTree → General-purpose analytics&lt;/li&gt;
&lt;li&gt;ReplacingMergeTree → Deduplication&lt;/li&gt;
&lt;li&gt;SummingMergeTree → Incremental aggregation&lt;/li&gt;
&lt;li&gt;AggregatingMergeTree → Precomputed aggregates&lt;/li&gt;
&lt;li&gt;CollapsingMergeTree → Event state transitions&lt;/li&gt;
&lt;li&gt;VersionedCollapsingMergeTree → Slowly changing records&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Choosing the right engine can significantly reduce query complexity and execution time.&lt;/p&gt;




&lt;h1&gt;
  
  
  13. Monitor Query Performance
&lt;/h1&gt;

&lt;p&gt;Use system tables to identify slow queries.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;query_duration_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;read_rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;read_bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memory_usage&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query_log&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;event_time&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Monitor:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Query duration&lt;/li&gt;
&lt;li&gt;Rows read&lt;/li&gt;
&lt;li&gt;Bytes read&lt;/li&gt;
&lt;li&gt;Memory usage&lt;/li&gt;
&lt;li&gt;Scan efficiency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Regular monitoring helps detect regressions before they affect production workloads.&lt;/p&gt;




&lt;h1&gt;
  
  
  14. Use LIMIT Efficiently
&lt;/h1&gt;

&lt;p&gt;If users only require a subset of rows, don't scan the entire dataset.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Combined with a suitable ORDER BY key, ClickHouse® can stop reading data earlier.&lt;/p&gt;




&lt;h1&gt;
  
  
  15. Use Appropriate Data Types
&lt;/h1&gt;

&lt;p&gt;Smaller data types improve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compression&lt;/li&gt;
&lt;li&gt;Cache efficiency&lt;/li&gt;
&lt;li&gt;Memory utilization&lt;/li&gt;
&lt;li&gt;Query speed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="n"&gt;UInt64&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;if appropriate.&lt;/p&gt;

&lt;p&gt;Other recommendations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;UInt8 instead of UInt32 where applicable&lt;/li&gt;
&lt;li&gt;Date instead of DateTime if time isn't needed&lt;/li&gt;
&lt;li&gt;Enum or LowCardinality for categorical values&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  16. Use Compression Codecs
&lt;/h1&gt;

&lt;p&gt;Column-specific codecs can further reduce storage while improving read performance.&lt;/p&gt;

&lt;p&gt;Examples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;temperature&lt;/span&gt; &lt;span class="n"&gt;Float32&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Gorilla&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="nb"&gt;DateTime&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DoubleDelta&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Choosing the right codec depends on the data distribution and access patterns.&lt;/p&gt;




&lt;h1&gt;
  
  
  17. Avoid Large OFFSET Pagination
&lt;/h1&gt;

&lt;p&gt;OFFSET pagination forces ClickHouse® to scan and discard rows.&lt;/p&gt;

&lt;p&gt;Avoid:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="k"&gt;OFFSET&lt;/span&gt; &lt;span class="mi"&gt;500000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Prefer keyset pagination:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'2026-01-01 12:00:00'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Keyset pagination scales much better for large datasets.&lt;/p&gt;




&lt;h1&gt;
  
  
  18. Use Projections
&lt;/h1&gt;

&lt;p&gt;Projections store alternative sorted layouts inside the same table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="n"&gt;PROJECTION&lt;/span&gt; &lt;span class="n"&gt;by_event_type&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
    &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Queries filtering by &lt;code&gt;event_type&lt;/code&gt; can automatically use the projection, reducing scan time without changing application queries.&lt;/p&gt;




&lt;h1&gt;
  
  
  19. Use Materialized Views
&lt;/h1&gt;

&lt;p&gt;Repeatedly calculating the same aggregations is expensive.&lt;/p&gt;

&lt;p&gt;Materialized Views compute aggregates during data insertion.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;MATERIALIZED&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;events_mv&lt;/span&gt;
&lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;events_hourly_summary&lt;/span&gt;
&lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;toStartOfHour&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;hour&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_value&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;hour&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now dashboards can query the summary table instead of billions of raw rows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;hour&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;total_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;total_value&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events_hourly_summary&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;hour&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="n"&gt;HOUR&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;hour&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  20. Use EXPLAIN to Understand Query Execution
&lt;/h1&gt;

&lt;p&gt;The &lt;code&gt;EXPLAIN&lt;/code&gt; statement helps visualize how ClickHouse® executes queries.&lt;/p&gt;

&lt;p&gt;Basic example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1001&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Useful options include:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="n"&gt;PIPELINE&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="n"&gt;indexes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1001&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These commands reveal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Query pipeline&lt;/li&gt;
&lt;li&gt;Index usage&lt;/li&gt;
&lt;li&gt;Estimated rows&lt;/li&gt;
&lt;li&gt;Execution stages&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They are invaluable when diagnosing slow queries.&lt;/p&gt;




&lt;h1&gt;
  
  
  Final Thoughts
&lt;/h1&gt;

&lt;p&gt;ClickHouse® is already one of the fastest analytical databases available, but achieving consistent sub-second performance requires thoughtful schema design and efficient query patterns.&lt;/p&gt;

&lt;p&gt;Start with the fundamentals: choose an effective &lt;code&gt;ORDER BY&lt;/code&gt; key, partition wisely, avoid unnecessary column reads, and write filters that leverage the primary index. As your datasets grow, take advantage of advanced features such as skip indexes, PREWHERE, projections, compression codecs, materialized views, and the EXPLAIN statement to fine-tune performance.&lt;/p&gt;

&lt;p&gt;By combining these best practices, ClickHouse® can efficiently process billions of rows with minimal latency, making it an exceptional platform for modern analytical workloads.&lt;/p&gt;

&lt;p&gt;Small optimizations compound over time. The more closely your table design aligns with your query patterns, the greater the performance gains you'll achieve&lt;/p&gt;

</description>
      <category>clickhouse</category>
      <category>devops</category>
      <category>analytics</category>
      <category>database</category>
    </item>
    <item>
      <title>Day 26 of 100 Days of ClickHouse: Understanding Distributed Tables</title>
      <dc:creator>Kanishga Subramani</dc:creator>
      <pubDate>Fri, 19 Jun 2026 10:26:37 +0000</pubDate>
      <link>https://dev.to/kanishga_subramani_49ad73/day-26-of-100-days-of-clickhouse-understanding-distributed-tables-4628</link>
      <guid>https://dev.to/kanishga_subramani_49ad73/day-26-of-100-days-of-clickhouse-understanding-distributed-tables-4628</guid>
      <description>&lt;p&gt;As your data grows, a single ClickHouse server may eventually reach its limits. Whether it's storage, ingestion speed, or query performance, there comes a point where scaling vertically is no longer enough.&lt;/p&gt;

&lt;p&gt;That's where &lt;strong&gt;Distributed Tables&lt;/strong&gt; come in.&lt;/p&gt;

&lt;p&gt;Distributed tables allow ClickHouse to scale horizontally across multiple servers while providing a single logical interface for querying your data. Instead of worrying about which server stores which records, you query one table, and ClickHouse takes care of the rest.&lt;/p&gt;

&lt;p&gt;In this article, we'll explore what Distributed tables are, how they work, and why they're a key component of large-scale ClickHouse deployments.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Do We Need Distributed Tables?
&lt;/h2&gt;

&lt;p&gt;ClickHouse is incredibly fast, and a single server can handle billions of rows. But as applications grow, organizations often need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Store more data than a single machine can hold&lt;/li&gt;
&lt;li&gt;Increase data ingestion throughput&lt;/li&gt;
&lt;li&gt;Speed up query execution&lt;/li&gt;
&lt;li&gt;Improve fault tolerance&lt;/li&gt;
&lt;li&gt;Scale across multiple servers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once data is spread across several nodes, querying each server individually becomes impractical.&lt;/p&gt;

&lt;p&gt;Distributed tables solve this problem by providing a single entry point for the entire cluster.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is a Distributed Table?
&lt;/h2&gt;

&lt;p&gt;A Distributed table is a &lt;strong&gt;logical table&lt;/strong&gt; that sits on top of existing local tables.&lt;/p&gt;

&lt;p&gt;Unlike MergeTree engines, it &lt;strong&gt;does not store any data&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead, it forwards queries to the appropriate servers, collects the results, merges them, and returns a unified response.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;events_distributed&lt;/span&gt;
&lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;events_local&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Distributed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;analytics_cluster&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;analytics&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;events_local&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cityHash64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;analytics_cluster&lt;/code&gt; is the cluster configuration.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;analytics&lt;/code&gt; is the database name.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;events_local&lt;/code&gt; is the table that actually stores the data.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cityHash64(user_id)&lt;/code&gt; determines how rows are distributed across shards.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The important thing to remember is that the Distributed table contains &lt;strong&gt;no data of its own&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Local Tables vs Distributed Tables
&lt;/h2&gt;

&lt;p&gt;Every server in the cluster stores data inside its own local table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Shard 1
└── events_local

Shard 2
└── events_local

Shard 3
└── events_local
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Distributed table simply sits above them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;events_distributed
        │
 ┌──────┼──────┐
 │      │      │
Shard1 Shard2 Shard3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events_distributed&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ClickHouse automatically sends the query to the required shards, gathers the partial results, merges them, and returns the final output.&lt;/p&gt;

&lt;p&gt;From the application's perspective, it feels like querying a single table.&lt;/p&gt;




&lt;h2&gt;
  
  
  Understanding Sharding
&lt;/h2&gt;

&lt;p&gt;Distributed tables are commonly used with &lt;strong&gt;sharding&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Sharding means splitting data across multiple servers.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User 1 → Shard 1
User 2 → Shard 2
User 3 → Shard 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The server that stores each row is determined by a &lt;strong&gt;sharding key&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A common choice is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;cityHash64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A good sharding key helps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Balance storage across servers&lt;/li&gt;
&lt;li&gt;Distribute query workload&lt;/li&gt;
&lt;li&gt;Prevent bottlenecks&lt;/li&gt;
&lt;li&gt;Improve scalability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Choosing the right sharding key is one of the most important design decisions in a distributed ClickHouse cluster.&lt;/p&gt;




&lt;h2&gt;
  
  
  How Distributed Queries Work
&lt;/h2&gt;

&lt;p&gt;Suppose you execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events_distributed&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ClickHouse performs several steps automatically:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Sends the query to each participating shard.&lt;/li&gt;
&lt;li&gt;Executes the query locally on every server.&lt;/li&gt;
&lt;li&gt;Collects partial results.&lt;/li&gt;
&lt;li&gt;Merges those results.&lt;/li&gt;
&lt;li&gt;Returns the final aggregated response.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Because every shard processes its own data simultaneously, queries can scale efficiently as your cluster grows.&lt;/p&gt;




&lt;h2&gt;
  
  
  Distributed Inserts
&lt;/h2&gt;

&lt;p&gt;Distributed tables can also be used for inserting data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;events_distributed&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(...);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of the client deciding which server should receive each row, ClickHouse evaluates the sharding key and forwards the data to the appropriate shard automatically.&lt;/p&gt;

&lt;p&gt;This greatly simplifies application development.&lt;/p&gt;




&lt;h2&gt;
  
  
  Sharding vs Replication
&lt;/h2&gt;

&lt;p&gt;Many beginners confuse sharding with replication, but they solve different problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sharding
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Splits data across multiple servers&lt;/li&gt;
&lt;li&gt;Improves scalability&lt;/li&gt;
&lt;li&gt;Increases storage capacity&lt;/li&gt;
&lt;li&gt;Distributes workload&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Replication
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Creates copies of data&lt;/li&gt;
&lt;li&gt;Provides high availability&lt;/li&gt;
&lt;li&gt;Protects against server failures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A production ClickHouse cluster usually combines both techniques.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Shard 1
├── Replica A
└── Replica B

Shard 2
├── Replica A
└── Replica B
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Distributed table then acts as the single entry point for the cluster.&lt;/p&gt;




&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;p&gt;When working with Distributed tables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Design your sharding key carefully.&lt;/li&gt;
&lt;li&gt;Keep local table schemas identical across all nodes.&lt;/li&gt;
&lt;li&gt;Filter data early to reduce network traffic.&lt;/li&gt;
&lt;li&gt;Minimize unnecessary cross-shard queries.&lt;/li&gt;
&lt;li&gt;Monitor shard balance and cluster health regularly.&lt;/li&gt;
&lt;li&gt;Test your data distribution before moving to production.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Common Mistakes
&lt;/h2&gt;

&lt;p&gt;Here are a few common misconceptions:&lt;/p&gt;

&lt;h3&gt;
  
  
  Distributed tables store data
&lt;/h3&gt;

&lt;p&gt;They don't. Data is stored only in the local tables.&lt;/p&gt;

&lt;h3&gt;
  
  
  Replication and distribution are the same
&lt;/h3&gt;

&lt;p&gt;Replication improves availability.&lt;/p&gt;

&lt;p&gt;Distribution improves scalability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Any sharding key will work
&lt;/h3&gt;

&lt;p&gt;A poor sharding key can create uneven data distribution, overloaded servers, and slower queries.&lt;/p&gt;

&lt;p&gt;Always choose a key that aligns with your application's query patterns.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Distributed tables are one of the core building blocks that make ClickHouse capable of handling massive analytical workloads.&lt;/p&gt;

&lt;p&gt;They provide a simple abstraction that lets applications interact with a single logical table while ClickHouse manages query routing, distributed execution, and result aggregation behind the scenes.&lt;/p&gt;

&lt;p&gt;If you're planning to build large-scale analytics platforms with ClickHouse, understanding the relationship between &lt;strong&gt;local tables&lt;/strong&gt;, &lt;strong&gt;Distributed tables&lt;/strong&gt;, &lt;strong&gt;sharding&lt;/strong&gt;, and &lt;strong&gt;replication&lt;/strong&gt; is essential.&lt;/p&gt;

&lt;p&gt;Once these concepts click, scaling ClickHouse across multiple servers becomes much easier to understand and implement.&lt;/p&gt;

&lt;p&gt;Thanks for reading! If you're following along with my &lt;strong&gt;100 Days of ClickHouse&lt;/strong&gt; series, stay tuned for Day 27, where we'll continue exploring another key ClickHouse concept.&lt;/p&gt;

</description>
      <category>clickhouse</category>
      <category>devops</category>
      <category>database</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>Day 25 of 100 Days of ClickHouse: Mastering the ClickHouse HTTP API</title>
      <dc:creator>Kanishga Subramani</dc:creator>
      <pubDate>Fri, 19 Jun 2026 06:19:52 +0000</pubDate>
      <link>https://dev.to/kanishga_subramani_49ad73/day-25-of-100-days-of-clickhouse-mastering-the-clickhouse-http-api-2cpc</link>
      <guid>https://dev.to/kanishga_subramani_49ad73/day-25-of-100-days-of-clickhouse-mastering-the-clickhouse-http-api-2cpc</guid>
      <description>&lt;h1&gt;
  
  
  ClickHouse HTTP API: A Complete Beginner's Guide
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;When most people think about interacting with a database, they usually imagine connecting through a database client or application. However, ClickHouse also provides a simple and powerful HTTP API that allows you to query and manage your database using standard HTTP requests.&lt;/p&gt;

&lt;p&gt;The ClickHouse HTTP API provides a universal interface for communicating with your ClickHouse server. Since almost every programming language and automation tool supports HTTP, it becomes an excellent choice for integrations, monitoring, scripting, and lightweight applications.&lt;/p&gt;

&lt;p&gt;In this guide, you'll learn what the ClickHouse HTTP API is, why it's useful, and how to perform common database operations using simple HTTP requests.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is the ClickHouse HTTP API?
&lt;/h2&gt;

&lt;p&gt;The ClickHouse HTTP API is a built-in interface that enables clients to communicate with a ClickHouse server using the HTTP protocol.&lt;/p&gt;

&lt;p&gt;Instead of connecting through the native TCP protocol, you simply send HTTP requests and receive responses in formats such as JSON, CSV, TSV, XML, or plain text.&lt;/p&gt;

&lt;p&gt;The HTTP interface is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Language agnostic&lt;/li&gt;
&lt;li&gt;Easy to integrate with web applications&lt;/li&gt;
&lt;li&gt;Firewall friendly&lt;/li&gt;
&lt;li&gt;Simple to test using tools like cURL, Postman, or a web browser&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because of its simplicity, the HTTP API is widely used for automation, dashboards, data pipelines, and monitoring systems.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Use the HTTP API?
&lt;/h2&gt;

&lt;p&gt;The ClickHouse HTTP API offers several advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No dedicated database driver is required.&lt;/li&gt;
&lt;li&gt;Works with virtually every programming language.&lt;/li&gt;
&lt;li&gt;Easy integration with REST-based applications.&lt;/li&gt;
&lt;li&gt;Supports multiple output formats.&lt;/li&gt;
&lt;li&gt;Ideal for automation and scripting.&lt;/li&gt;
&lt;li&gt;Perfect for cloud-native applications and microservices.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Common Operations
&lt;/h2&gt;

&lt;p&gt;Using the HTTP API, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Execute SQL queries&lt;/li&gt;
&lt;li&gt;Insert data&lt;/li&gt;
&lt;li&gt;Create and modify tables&lt;/li&gt;
&lt;li&gt;Retrieve query results&lt;/li&gt;
&lt;li&gt;Export data in different formats&lt;/li&gt;
&lt;li&gt;Automate database operations&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Authentication Options
&lt;/h2&gt;

&lt;p&gt;ClickHouse supports multiple authentication methods when using the HTTP API.&lt;/p&gt;

&lt;p&gt;You can authenticate using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;URL parameters&lt;/li&gt;
&lt;li&gt;HTTP Basic Authentication&lt;/li&gt;
&lt;li&gt;HTTP request headers (recommended)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Using request headers is generally considered the most secure and maintainable approach for production environments.&lt;/p&gt;




&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;p&gt;To get the most out of the ClickHouse HTTP API:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Always authenticate your requests.&lt;/li&gt;
&lt;li&gt;Use HTTPS in production environments.&lt;/li&gt;
&lt;li&gt;Validate user input before executing queries.&lt;/li&gt;
&lt;li&gt;Limit permissions using appropriate user roles.&lt;/li&gt;
&lt;li&gt;Monitor API usage and query performance.&lt;/li&gt;
&lt;li&gt;Use JSON output for applications that consume API responses.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  When Should You Use the HTTP API?
&lt;/h2&gt;

&lt;p&gt;The HTTP API is an excellent choice for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Web applications&lt;/li&gt;
&lt;li&gt;REST services&lt;/li&gt;
&lt;li&gt;Automation scripts&lt;/li&gt;
&lt;li&gt;Monitoring tools&lt;/li&gt;
&lt;li&gt;ETL pipelines&lt;/li&gt;
&lt;li&gt;Serverless applications&lt;/li&gt;
&lt;li&gt;Data integrations&lt;/li&gt;
&lt;li&gt;Analytics dashboards&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your application already communicates over HTTP, integrating with ClickHouse becomes straightforward.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The ClickHouse HTTP API provides a simple, flexible, and powerful way to interact with your database using standard HTTP requests. Whether you're querying data, inserting records, or automating workflows, it eliminates the need for specialized database drivers while remaining easy to use across different platforms and programming languages.&lt;/p&gt;

&lt;p&gt;For developers building modern analytics applications, automation workflows, or cloud-native services, the HTTP API is one of the easiest ways to integrate ClickHouse into existing systems.&lt;/p&gt;

&lt;p&gt;Learning how to use the HTTP API effectively is an essential step toward becoming proficient with ClickHouse and building scalable, high-performance data solutions.&lt;/p&gt;

</description>
      <category>clickhouse</category>
      <category>devops</category>
      <category>database</category>
      <category>analytics</category>
    </item>
    <item>
      <title>Day 24 of 100 Days of ClickHouse: Working with the ClickHouse HTTP API</title>
      <dc:creator>Kanishga Subramani</dc:creator>
      <pubDate>Fri, 19 Jun 2026 05:48:21 +0000</pubDate>
      <link>https://dev.to/kanishga_subramani_49ad73/day-24-of-100-days-of-clickhouse-working-with-the-clickhouse-http-api-45bi</link>
      <guid>https://dev.to/kanishga_subramani_49ad73/day-24-of-100-days-of-clickhouse-working-with-the-clickhouse-http-api-45bi</guid>
      <description>&lt;p&gt;When most people think about interacting with a database, they imagine using a client library, a graphical interface, or a command-line tool. While these are common approaches, ClickHouse also provides a built-in HTTP API that enables you to communicate with the database using standard HTTP requests.&lt;/p&gt;

&lt;p&gt;This means that if your application, script, or service can send an HTTP request, it can interact with ClickHouse. Whether you're using Python, JavaScript, Go, Java, cURL, Postman, or even a simple shell script, the HTTP API provides a universal and lightweight way to execute queries, insert data, and manage your database.&lt;/p&gt;

&lt;p&gt;In this article, we'll explore how the ClickHouse HTTP API works, how to use it effectively, and why it's such an important feature for modern applications.&lt;/p&gt;




&lt;h1&gt;
  
  
  What Is the ClickHouse HTTP API?
&lt;/h1&gt;

&lt;p&gt;The ClickHouse HTTP API is a built-in interface that allows applications to communicate with ClickHouse over the HTTP protocol.&lt;/p&gt;

&lt;p&gt;Instead of connecting through the native ClickHouse protocol, your application simply sends an HTTP request containing an SQL query, and ClickHouse executes it before returning the result.&lt;/p&gt;

&lt;p&gt;By default, the HTTP API runs on &lt;strong&gt;port 8123&lt;/strong&gt; and supports both &lt;strong&gt;GET&lt;/strong&gt; and &lt;strong&gt;POST&lt;/strong&gt; requests.&lt;/p&gt;

&lt;p&gt;Unlike the native protocol, the HTTP interface is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Language independent&lt;/li&gt;
&lt;li&gt;Easy to integrate with any application&lt;/li&gt;
&lt;li&gt;Firewall friendly&lt;/li&gt;
&lt;li&gt;Ideal for automation&lt;/li&gt;
&lt;li&gt;Simple to test using browsers, cURL, or Postman&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This flexibility makes it popular for scripting, automation, lightweight services, and REST-based applications.&lt;/p&gt;




&lt;h1&gt;
  
  
  How the HTTP API Works
&lt;/h1&gt;

&lt;p&gt;The communication flow is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Application
     │
     ▼
HTTP Request
     │
     ▼
ClickHouse Server
     │
     ▼
SQL Execution
     │
     ▼
HTTP Response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response can be returned in many formats including JSON, CSV, TSV, Parquet, Arrow, and more.&lt;/p&gt;




&lt;h1&gt;
  
  
  Default Ports
&lt;/h1&gt;

&lt;p&gt;ClickHouse exposes multiple interfaces.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Interface&lt;/th&gt;
&lt;th&gt;Default Port&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Native TCP&lt;/td&gt;
&lt;td&gt;9000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HTTP API&lt;/td&gt;
&lt;td&gt;8123&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For HTTP-based communication, you'll primarily work with port &lt;strong&gt;8123&lt;/strong&gt;.&lt;/p&gt;




&lt;h1&gt;
  
  
  Prerequisites
&lt;/h1&gt;

&lt;p&gt;Before following along, ensure you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A running ClickHouse server&lt;/li&gt;
&lt;li&gt;cURL installed&lt;/li&gt;
&lt;li&gt;Basic SQL knowledge&lt;/li&gt;
&lt;li&gt;Access to port 8123&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  Step 1: Verify the HTTP API
&lt;/h1&gt;

&lt;p&gt;The simplest way to verify that the HTTP API is available is by checking the built-in ping endpoint.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:8123/ping
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Ok.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you receive this response, your HTTP interface is working correctly.&lt;/p&gt;




&lt;h1&gt;
  
  
  Step 2: Execute Your First Query
&lt;/h1&gt;

&lt;p&gt;One of the easiest ways to query ClickHouse is by passing SQL as a URL parameter.&lt;/p&gt;

&lt;p&gt;Using GET:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="s2"&gt;"http://localhost:8123/?query=SELECT+version()"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;25.3.1.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Although GET works well for simple queries, POST requests are generally preferred for larger SQL statements.&lt;/p&gt;

&lt;p&gt;Using POST:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8123/ &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--data-binary&lt;/span&gt; &lt;span class="s2"&gt;"SELECT version()"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;25.3.1.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;POST requests are cleaner and avoid URL length limitations.&lt;/p&gt;




&lt;h1&gt;
  
  
  Step 3: Authentication
&lt;/h1&gt;

&lt;p&gt;If authentication is enabled, ClickHouse supports several methods.&lt;/p&gt;

&lt;h3&gt;
  
  
  URL Parameters
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="s2"&gt;"http://localhost:8123/?user=default&amp;amp;password=yourpassword&amp;amp;query=SELECT+1"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  HTTP Basic Authentication
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-u&lt;/span&gt; default:yourpassword &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="s2"&gt;"http://localhost:8123/?query=SELECT+1"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  HTTP Headers (Recommended)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:8123/ &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"X-ClickHouse-User: default"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"X-ClickHouse-Key: yourpassword"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--data-binary&lt;/span&gt; &lt;span class="s2"&gt;"SELECT 1"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using request headers is recommended because credentials aren't exposed in URLs or logs.&lt;/p&gt;




&lt;h1&gt;
  
  
  Step 4: Selecting a Database
&lt;/h1&gt;

&lt;p&gt;Queries execute against the &lt;strong&gt;default&lt;/strong&gt; database unless another database is specified.&lt;/p&gt;

&lt;p&gt;Specify a database using the request parameter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-u&lt;/span&gt; default:yourpassword &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="s2"&gt;"http://localhost:8123/?database=my_database"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--data-binary&lt;/span&gt; &lt;span class="s2"&gt;"SELECT count() FROM my_table"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or reference the database directly inside the SQL statement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;SELECT count&lt;span class="o"&gt;()&lt;/span&gt;
FROM my_database.my_table&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  Step 5: Querying Data
&lt;/h1&gt;

&lt;p&gt;Executing SELECT statements is straightforward.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-u&lt;/span&gt; default:yourpassword &lt;span class="se"&gt;\&lt;/span&gt;
http://localhost:8123/ &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--data-binary&lt;/span&gt; &lt;span class="s2"&gt;"SELECT * FROM default.products LIMIT 5"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By default, ClickHouse returns data in &lt;strong&gt;TabSeparated&lt;/strong&gt; format.&lt;/p&gt;




&lt;h1&gt;
  
  
  Output Formats
&lt;/h1&gt;

&lt;p&gt;One of the strengths of the HTTP API is its support for multiple output formats.&lt;/p&gt;

&lt;h2&gt;
  
  
  JSON
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="n"&gt;FORMAT&lt;/span&gt; &lt;span class="n"&gt;JSON&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Useful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;REST APIs&lt;/li&gt;
&lt;li&gt;Web applications&lt;/li&gt;
&lt;li&gt;JavaScript clients&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  CSV
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="n"&gt;FORMAT&lt;/span&gt; &lt;span class="n"&gt;CSV&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ideal for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Excel&lt;/li&gt;
&lt;li&gt;Data exports&lt;/li&gt;
&lt;li&gt;ETL pipelines&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  JSONEachRow
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="n"&gt;FORMAT&lt;/span&gt; &lt;span class="n"&gt;JSONEachRow&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each row is returned as an individual JSON document, making it perfect for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Streaming&lt;/li&gt;
&lt;li&gt;Kafka&lt;/li&gt;
&lt;li&gt;Log processing&lt;/li&gt;
&lt;li&gt;Large datasets&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Other Supported Formats
&lt;/h2&gt;

&lt;p&gt;ClickHouse supports numerous formats including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JSON&lt;/li&gt;
&lt;li&gt;JSONCompact&lt;/li&gt;
&lt;li&gt;JSONEachRow&lt;/li&gt;
&lt;li&gt;CSV&lt;/li&gt;
&lt;li&gt;TSV&lt;/li&gt;
&lt;li&gt;Pretty&lt;/li&gt;
&lt;li&gt;PrettyCompact&lt;/li&gt;
&lt;li&gt;Parquet&lt;/li&gt;
&lt;li&gt;Apache Arrow&lt;/li&gt;
&lt;li&gt;TabSeparated&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Choosing the appropriate format depends on how downstream systems consume the data.&lt;/p&gt;




&lt;h1&gt;
  
  
  Step 6: Inserting Data
&lt;/h1&gt;

&lt;p&gt;The HTTP API also supports data ingestion.&lt;/p&gt;

&lt;h3&gt;
  
  
  Insert TabSeparated Data
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-u&lt;/span&gt; default:yourpassword &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="s2"&gt;"http://localhost:8123/?query=INSERT+INTO+default.products+FORMAT+TabSeparated"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--data-binary&lt;/span&gt; &lt;span class="s1"&gt;$'5&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s1"&gt;Mechanical Keyboard&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s1"&gt;Accessories&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s1"&gt;6&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s1"&gt;Monitor 4K&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s1"&gt;Electronics&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Insert CSV
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-u&lt;/span&gt; default:yourpassword &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="s2"&gt;"http://localhost:8123/?query=INSERT+INTO+default.products+FORMAT+CSV"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--data-binary&lt;/span&gt; @products.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Insert JSON
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-u&lt;/span&gt; default:yourpassword &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="s2"&gt;"http://localhost:8123/?query=INSERT+INTO+default.products+FORMAT+JSONEachRow"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--data-binary&lt;/span&gt; &lt;span class="s1"&gt;'
{"product_id":7,"name":"Webcam HD","category":"Accessories"}
{"product_id":8,"name":"Desk Lamp","category":"Furniture"}
'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This flexibility allows applications to ingest data using the format most convenient for them.&lt;/p&gt;




&lt;h1&gt;
  
  
  Step 7: Creating and Managing Tables
&lt;/h1&gt;

&lt;p&gt;The HTTP API supports all standard Data Definition Language (DDL) operations.&lt;/p&gt;

&lt;p&gt;Create a table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;logs&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;server_id&lt;/span&gt; &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;level&lt;/span&gt; &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="nb"&gt;DateTime&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Drop a table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;logs&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Similarly, ALTER, TRUNCATE, OPTIMIZE, and other administrative statements are fully supported.&lt;/p&gt;




&lt;h1&gt;
  
  
  Useful HTTP Parameters
&lt;/h1&gt;

&lt;p&gt;Several query parameters can simplify common tasks.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;query&lt;/td&gt;
&lt;td&gt;SQL query&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;user&lt;/td&gt;
&lt;td&gt;Username&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;password&lt;/td&gt;
&lt;td&gt;Password&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;database&lt;/td&gt;
&lt;td&gt;Database name&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;default_format&lt;/td&gt;
&lt;td&gt;Default response format&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;max_rows_to_read&lt;/td&gt;
&lt;td&gt;Scan limit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;max_execution_time&lt;/td&gt;
&lt;td&gt;Query timeout&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;compress&lt;/td&gt;
&lt;td&gt;Enable response compression&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These parameters help control execution behavior without modifying SQL itself.&lt;/p&gt;




&lt;h1&gt;
  
  
  Best Practices
&lt;/h1&gt;

&lt;p&gt;To get the most out of the ClickHouse HTTP API:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use POST requests for longer SQL statements.&lt;/li&gt;
&lt;li&gt;Send credentials using HTTP headers instead of URLs.&lt;/li&gt;
&lt;li&gt;Use JSONEachRow for streaming large datasets.&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;max_execution_time&lt;/code&gt; to prevent runaway queries.&lt;/li&gt;
&lt;li&gt;Enable compression for large responses.&lt;/li&gt;
&lt;li&gt;Always specify the database explicitly.&lt;/li&gt;
&lt;li&gt;Use official ClickHouse client libraries in production when advanced features like connection pooling and automatic retries are required.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Following these practices results in more secure, reliable, and efficient integrations.&lt;/p&gt;




&lt;h1&gt;
  
  
  HTTP API vs Native Protocol
&lt;/h1&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;HTTP API&lt;/th&gt;
&lt;th&gt;Native Protocol&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Ease of Use&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Performance&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Driver Required&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Universal Support&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Driver Dependent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Automation&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bulk Inserts&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The HTTP API is ideal for integrations, scripting, and lightweight services, while the native protocol remains the best option for performance-critical applications.&lt;/p&gt;




&lt;h1&gt;
  
  
  Real-World Use Cases
&lt;/h1&gt;

&lt;p&gt;The ClickHouse HTTP API is commonly used for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Building REST APIs&lt;/li&gt;
&lt;li&gt;Internal dashboards&lt;/li&gt;
&lt;li&gt;Automation scripts&lt;/li&gt;
&lt;li&gt;Monitoring systems&lt;/li&gt;
&lt;li&gt;CI/CD pipelines&lt;/li&gt;
&lt;li&gt;Serverless applications&lt;/li&gt;
&lt;li&gt;ETL workflows&lt;/li&gt;
&lt;li&gt;Data export services&lt;/li&gt;
&lt;li&gt;Microservices&lt;/li&gt;
&lt;li&gt;Third-party integrations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because HTTP is universally supported, integrating ClickHouse into existing systems becomes remarkably simple.&lt;/p&gt;




&lt;h1&gt;
  
  
  Final Thoughts
&lt;/h1&gt;

&lt;p&gt;The ClickHouse HTTP API demonstrates how powerful database interactions can also be simple. By exposing SQL operations over standard HTTP, ClickHouse removes the dependency on specialized drivers and enables seamless integration with virtually any programming language or platform.&lt;/p&gt;

&lt;p&gt;Whether you're executing queries, loading data, automating administrative tasks, or building cloud-native applications, the HTTP API offers a flexible and reliable interface that scales from quick experiments to production workloads.&lt;/p&gt;

&lt;p&gt;As your ClickHouse deployments grow, understanding the HTTP API becomes increasingly valuable. It's a feature that bridges the gap between databases and modern software architectures, making automation, integration, and analytics significantly easier.&lt;/p&gt;

&lt;p&gt;In the next article of this &lt;strong&gt;100 Days of ClickHouse&lt;/strong&gt; series, we'll continue exploring another powerful capability that helps you build faster, more scalable analytical systems.&lt;/p&gt;

</description>
      <category>clickhouse</category>
      <category>devops</category>
      <category>database</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>Day 23 of 100 Days of ClickHouse: Backups Explained – A Complete Guide to BACKUP and RESTORE</title>
      <dc:creator>Kanishga Subramani</dc:creator>
      <pubDate>Thu, 18 Jun 2026 08:33:59 +0000</pubDate>
      <link>https://dev.to/kanishga_subramani_49ad73/day-23-of-100-days-of-clickhouse-backups-explained-a-complete-guide-to-backup-and-restore-1en4</link>
      <guid>https://dev.to/kanishga_subramani_49ad73/day-23-of-100-days-of-clickhouse-backups-explained-a-complete-guide-to-backup-and-restore-1en4</guid>
      <description>&lt;h1&gt;
  
  
  ClickHouse Backups Explained: How to Protect Your Data with BACKUP and RESTORE
&lt;/h1&gt;

&lt;p&gt;Data is one of the most valuable assets for any organization. Whether you're using ClickHouse for observability, log analytics, real-time dashboards, or business intelligence, losing data can lead to downtime, inaccurate reporting, and difficult recovery efforts.&lt;/p&gt;

&lt;p&gt;Although ClickHouse is designed for high availability and performance, no database is immune to accidental deletions, hardware failures, software bugs, or human error. That's why every production deployment should include a reliable backup strategy.&lt;/p&gt;

&lt;p&gt;In this article, we'll explore why backups matter, the different backup types available, and how to use ClickHouse's built-in &lt;code&gt;BACKUP&lt;/code&gt; and &lt;code&gt;RESTORE&lt;/code&gt; commands to safeguard your data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Backups Are Essential
&lt;/h2&gt;

&lt;p&gt;A backup is a recoverable copy of your database's data and metadata. It allows you to restore your environment to a known good state after data loss or corruption.&lt;/p&gt;

&lt;p&gt;One common misconception is that &lt;strong&gt;replication replaces backups&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It doesn't.&lt;/p&gt;

&lt;p&gt;Replication protects against server or node failures by maintaining copies of data across multiple replicas. However, if someone accidentally drops a table, deletes important records, or inserts incorrect data, those changes are replicated as well.&lt;/p&gt;

&lt;p&gt;Typical scenarios where backups become essential include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accidental table deletion&lt;/li&gt;
&lt;li&gt;Incorrect UPDATE or DELETE operations&lt;/li&gt;
&lt;li&gt;Hardware or storage failures&lt;/li&gt;
&lt;li&gt;Software bugs&lt;/li&gt;
&lt;li&gt;Disaster recovery&lt;/li&gt;
&lt;li&gt;Compliance and long-term retention&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A backup provides an independent recovery point that replication alone cannot offer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Backup Types
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Full Backup
&lt;/h3&gt;

&lt;p&gt;A full backup contains a complete copy of the selected database, including both metadata and data.&lt;/p&gt;

&lt;p&gt;It's commonly used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Before major upgrades&lt;/li&gt;
&lt;li&gt;Before migrations&lt;/li&gt;
&lt;li&gt;As the baseline for future backups&lt;/li&gt;
&lt;li&gt;For long-term archival&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Although full backups require more storage, they provide the fastest recovery.&lt;/p&gt;

&lt;h3&gt;
  
  
  Incremental Backup
&lt;/h3&gt;

&lt;p&gt;Incremental backups store only the data that has changed since the previous backup.&lt;/p&gt;

&lt;p&gt;Benefits include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Faster backup creation&lt;/li&gt;
&lt;li&gt;Lower storage consumption&lt;/li&gt;
&lt;li&gt;Ideal for large production datasets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many organizations schedule weekly full backups with daily incremental backups.&lt;/p&gt;

&lt;h3&gt;
  
  
  Differential Backup
&lt;/h3&gt;

&lt;p&gt;A differential backup stores all changes made since the last full backup.&lt;/p&gt;

&lt;p&gt;Compared to incremental backups, they require more storage but simplify the restore process because only the latest differential backup and the original full backup are required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Backup Methods in ClickHouse
&lt;/h2&gt;

&lt;p&gt;ClickHouse supports several approaches to protecting data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Native BACKUP and RESTORE
&lt;/h3&gt;

&lt;p&gt;Modern ClickHouse versions include built-in backup functionality, making it easy to create and restore backups without external tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  File System Backups
&lt;/h3&gt;

&lt;p&gt;Administrators can also back up the underlying data directories directly.&lt;/p&gt;

&lt;p&gt;While simple, this method requires additional care to ensure consistency during backup operations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Storage Snapshots
&lt;/h3&gt;

&lt;p&gt;Infrastructure providers often support volume snapshots, allowing administrators to capture the state of storage at a specific point in time.&lt;/p&gt;

&lt;p&gt;These are commonly used in cloud deployments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Third-Party Tools
&lt;/h3&gt;

&lt;p&gt;Utilities such as &lt;code&gt;clickhouse-backup&lt;/code&gt; provide advanced capabilities including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scheduled backups&lt;/li&gt;
&lt;li&gt;Cloud storage integration&lt;/li&gt;
&lt;li&gt;Retention management&lt;/li&gt;
&lt;li&gt;Backup automation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These tools are popular in larger production environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating Backups Using Native Commands
&lt;/h2&gt;

&lt;p&gt;Before using the backup feature, configure a backup disk in the ClickHouse configuration.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;storage_configuration&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;disks&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;backups&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;type&amp;gt;&lt;/span&gt;local&lt;span class="nt"&gt;&amp;lt;/type&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;path&amp;gt;&lt;/span&gt;/backups/&lt;span class="nt"&gt;&amp;lt;/path&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/backups&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/disks&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/storage_configuration&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Backup an Entire Database
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;BACKUP&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;analytics&lt;/span&gt;
&lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;Disk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'backups'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'analytics_backup.zip'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Backup a Single Table
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;BACKUP&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;analytics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;Disk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'backups'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'events_backup.zip'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Create an Incremental Backup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;BACKUP&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;analytics&lt;/span&gt;
&lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;Disk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'backups'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'analytics_incremental.zip'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;SETTINGS&lt;/span&gt; &lt;span class="n"&gt;base_backup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Disk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'backups'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'analytics_full_backup.zip'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Only the changes since the base backup are stored.&lt;/p&gt;

&lt;h2&gt;
  
  
  Restoring Data
&lt;/h2&gt;

&lt;p&gt;Restoring an entire database is equally simple.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;RESTORE&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;analytics&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;Disk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'backups'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'analytics_backup.zip'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To restore a single table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;RESTORE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;analytics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;Disk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'backups'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'events_backup.zip'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ClickHouse recreates the metadata and restores the associated data from the backup archive.&lt;/p&gt;

&lt;h2&gt;
  
  
  Verify Your Backups
&lt;/h2&gt;

&lt;p&gt;Creating backups is only part of the process.&lt;/p&gt;

&lt;p&gt;A backup that has never been tested shouldn't be assumed to work.&lt;/p&gt;

&lt;p&gt;After every backup, consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Verifying backup files exist&lt;/li&gt;
&lt;li&gt;Reviewing ClickHouse logs&lt;/li&gt;
&lt;li&gt;Performing periodic test restores&lt;/li&gt;
&lt;li&gt;Validating restored data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Regular testing ensures your recovery process works when it matters most.&lt;/p&gt;

&lt;h2&gt;
  
  
  Backup Best Practices
&lt;/h2&gt;

&lt;p&gt;A few practices can significantly improve backup reliability:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automate backup schedules instead of relying on manual execution.&lt;/li&gt;
&lt;li&gt;Store backups in multiple locations, including remote or cloud storage.&lt;/li&gt;
&lt;li&gt;Test restore procedures regularly.&lt;/li&gt;
&lt;li&gt;Define retention policies based on business and compliance requirements.&lt;/li&gt;
&lt;li&gt;Monitor backup jobs and storage utilization.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These practices reduce operational risk and improve recovery readiness.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;High availability and backups solve different problems.&lt;/p&gt;

&lt;p&gt;Replication keeps your ClickHouse cluster running during infrastructure failures, while backups protect against data loss, corruption, and human error.&lt;/p&gt;

&lt;p&gt;By combining native &lt;code&gt;BACKUP&lt;/code&gt; and &lt;code&gt;RESTORE&lt;/code&gt; commands with automated scheduling, retention policies, and regular restore testing, you can build a recovery strategy that keeps your analytical platform resilient.&lt;/p&gt;

&lt;p&gt;No backup strategy is complete until you've successfully restored from it. The ability to recover quickly is what ultimately determines the value of every backup.&lt;/p&gt;

&lt;p&gt;Read more... &lt;a href="https://quantrail-data.com/how-to-back-up-your-clickhouse-database/" rel="noopener noreferrer"&gt;https://quantrail-data.com/how-to-back-up-your-clickhouse-database/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>clickhouse</category>
      <category>devops</category>
      <category>database</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>Why ClickHouse Merges and Mutations Are Difficult to Track in Production</title>
      <dc:creator>Kanishga Subramani</dc:creator>
      <pubDate>Thu, 18 Jun 2026 07:25:40 +0000</pubDate>
      <link>https://dev.to/kanishga_subramani_49ad73/why-clickhouse-merges-and-mutations-are-difficult-to-track-in-production-5aj3</link>
      <guid>https://dev.to/kanishga_subramani_49ad73/why-clickhouse-merges-and-mutations-are-difficult-to-track-in-production-5aj3</guid>
      <description>&lt;p&gt;One of the reasons ClickHouse delivers exceptional analytical performance is its ability to optimize data in the background. While users focus on writing fast SQL queries, ClickHouse is continuously performing maintenance tasks such as merges and mutations to keep storage efficient and queries fast.&lt;/p&gt;

&lt;p&gt;These background operations are essential, but they're also one of the least visible aspects of running ClickHouse in production. Without proper monitoring, they can silently become bottlenecks, leading to slower queries, delayed data processing, and even production errors.&lt;/p&gt;

&lt;p&gt;In this article, we'll explore how merges and mutations work, why they're difficult to monitor, and what teams can do to improve observability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Merges
&lt;/h2&gt;

&lt;p&gt;ClickHouse stores data in immutable parts. Every &lt;code&gt;INSERT&lt;/code&gt; creates a new data part instead of modifying existing files.&lt;/p&gt;

&lt;p&gt;As more data is ingested, the number of parts grows. To prevent excessive fragmentation, ClickHouse automatically merges smaller parts into larger ones in the background.&lt;/p&gt;

&lt;p&gt;This process helps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduce the total number of parts&lt;/li&gt;
&lt;li&gt;Improve query performance&lt;/li&gt;
&lt;li&gt;Lower metadata overhead&lt;/li&gt;
&lt;li&gt;Optimize disk usage&lt;/li&gt;
&lt;li&gt;Keep MergeTree tables healthy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without regular merges, thousands of small parts can accumulate, making both queries and inserts less efficient.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Mutations
&lt;/h2&gt;

&lt;p&gt;Operations such as &lt;code&gt;UPDATE&lt;/code&gt; and &lt;code&gt;DELETE&lt;/code&gt; work differently in ClickHouse than they do in traditional transactional databases.&lt;/p&gt;

&lt;p&gt;Instead of modifying rows immediately, ClickHouse schedules these operations as &lt;strong&gt;mutations&lt;/strong&gt;, which are processed asynchronously in the background.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;event_date&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="s1"&gt;'2025-01-01'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'inactive'&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;last_login&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="s1"&gt;'2024-01-01'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This architecture keeps write performance high but means data modifications may take time to complete, especially on large tables.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Monitoring Is Challenging
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Limited Historical Visibility
&lt;/h3&gt;

&lt;p&gt;ClickHouse provides system tables such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;system.merges&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;system.mutations&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;system.parts&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These tables are extremely useful for checking the current state of background operations.&lt;/p&gt;

&lt;p&gt;The limitation is that they primarily provide a snapshot of what's happening now. Once a merge or mutation completes, much of that operational history disappears unless you've collected it yourself.&lt;/p&gt;

&lt;p&gt;This makes post-incident analysis significantly more difficult.&lt;/p&gt;

&lt;h2&gt;
  
  
  The "Too Many Parts" Problem
&lt;/h2&gt;

&lt;p&gt;One of the most common production issues is the &lt;strong&gt;"Too many parts"&lt;/strong&gt; error.&lt;/p&gt;

&lt;p&gt;It usually indicates that new parts are being created faster than background merges can combine them.&lt;/p&gt;

&lt;p&gt;When this happens, organizations may experience:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Slower inserts&lt;/li&gt;
&lt;li&gt;Higher query latency&lt;/li&gt;
&lt;li&gt;Increased storage overhead&lt;/li&gt;
&lt;li&gt;Overloaded merge queues&lt;/li&gt;
&lt;li&gt;Reduced cluster stability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unfortunately, by the time this error appears, the underlying problem has often been developing for hours or days.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mutation Backlogs
&lt;/h2&gt;

&lt;p&gt;Mutations are executed sequentially.&lt;/p&gt;

&lt;p&gt;A large &lt;code&gt;DELETE&lt;/code&gt;, &lt;code&gt;UPDATE&lt;/code&gt;, or schema-related operation can remain active for a long time, preventing subsequent mutations from being processed.&lt;/p&gt;

&lt;p&gt;As the backlog grows, teams may notice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Delayed data cleanup&lt;/li&gt;
&lt;li&gt;Growing storage consumption&lt;/li&gt;
&lt;li&gt;Slower maintenance tasks&lt;/li&gt;
&lt;li&gt;Longer processing times&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without continuous monitoring, these queues often remain unnoticed until they begin affecting production workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reactive Troubleshooting
&lt;/h2&gt;

&lt;p&gt;Many administrators investigate issues by manually querying:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;merges&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mutations&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Although these queries are useful, they don't provide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Historical trends&lt;/li&gt;
&lt;li&gt;Long-term metrics&lt;/li&gt;
&lt;li&gt;Automatic alerts&lt;/li&gt;
&lt;li&gt;Centralized dashboards&lt;/li&gt;
&lt;li&gt;Anomaly detection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As a result, troubleshooting often becomes reactive rather than proactive.&lt;/p&gt;

&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;p&gt;To maintain a healthy ClickHouse cluster, consider monitoring background operations alongside traditional infrastructure metrics.&lt;/p&gt;

&lt;p&gt;Useful metrics include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Active merge count&lt;/li&gt;
&lt;li&gt;Merge duration&lt;/li&gt;
&lt;li&gt;Mutation queue size&lt;/li&gt;
&lt;li&gt;Mutation progress&lt;/li&gt;
&lt;li&gt;Number of table parts&lt;/li&gt;
&lt;li&gt;Background thread utilization&lt;/li&gt;
&lt;li&gt;Resource consumption during merges&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Storing these metrics over time enables trend analysis, capacity planning, and faster root-cause analysis.&lt;/p&gt;

&lt;p&gt;Creating dashboards and alerts for merge delays, increasing part counts, or mutation backlogs can help identify issues before they impact users.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Merges and mutations are fundamental to ClickHouse's performance and storage efficiency, but they often receive far less attention than query optimization.&lt;/p&gt;

&lt;p&gt;While ClickHouse provides excellent visibility into current background activity, long-term observability requires additional monitoring and historical metrics.&lt;/p&gt;

&lt;p&gt;By treating merges and mutations as first-class operational metrics, teams can reduce downtime, improve cluster health, and avoid many of the production issues that arise from unseen background processes.&lt;/p&gt;

&lt;p&gt;A well-monitored ClickHouse cluster isn't just one that answers queries quickly—it's one where the background maintenance processes are just as visible as the queries themselves.&lt;/p&gt;

&lt;p&gt;Link -&amp;gt; &lt;a href="https://quantrail-data.com/clickhouse-merges-and-mutations-the-hidden-performance-monitoring-challenge/" rel="noopener noreferrer"&gt;https://quantrail-data.com/clickhouse-merges-and-mutations-the-hidden-performance-monitoring-challenge/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>clickhouse</category>
      <category>database</category>
      <category>dataengineering</category>
      <category>observability</category>
    </item>
    <item>
      <title>Day 22 of 100 Days of ClickHouse: Exploring High-Speed Analytics</title>
      <dc:creator>Kanishga Subramani</dc:creator>
      <pubDate>Wed, 17 Jun 2026 09:10:28 +0000</pubDate>
      <link>https://dev.to/kanishga_subramani_49ad73/working-with-json-in-clickhouser-choosing-the-right-approach-32g6</link>
      <guid>https://dev.to/kanishga_subramani_49ad73/working-with-json-in-clickhouser-choosing-the-right-approach-32g6</guid>
      <description>&lt;p&gt;One of the most common questions developers ask when working with ClickHouse® is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"How should I store JSON data?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The answer isn't as simple as choosing between a &lt;code&gt;String&lt;/code&gt; column and the native &lt;code&gt;JSON&lt;/code&gt; data type. It depends on how your application ingests, stores, and queries data.&lt;/p&gt;

&lt;p&gt;JSON has become the standard format for modern applications. Every API request, application log, event stream, telemetry record, and user interaction is typically represented as JSON before it reaches a database. Its schema flexibility makes it ideal for evolving systems where new attributes can appear without requiring immediate database migrations.&lt;/p&gt;

&lt;p&gt;However, analytical databases are built for speed, and speed comes from structure.&lt;/p&gt;

&lt;p&gt;Historically, the most common approach in ClickHouse® was storing JSON as a &lt;code&gt;String&lt;/code&gt; and extracting fields during query execution using functions like &lt;code&gt;JSONExtractString()&lt;/code&gt;, &lt;code&gt;JSONExtractUInt()&lt;/code&gt;, and &lt;code&gt;JSONExtractBool()&lt;/code&gt;. This approach is incredibly flexible because the original document is preserved exactly as it was received.&lt;/p&gt;

&lt;p&gt;But there's a trade-off.&lt;/p&gt;

&lt;p&gt;Every query that accesses a field has to parse the JSON document again. On small datasets, this overhead is negligible. On billions of rows, repeatedly parsing JSON becomes expensive and increases CPU utilization.&lt;/p&gt;

&lt;p&gt;This is one of the reasons why ClickHouse introduced the native &lt;code&gt;JSON&lt;/code&gt; data type.&lt;/p&gt;

&lt;p&gt;Instead of treating JSON as plain text, ClickHouse understands the document's structure internally. More importantly, it uses &lt;strong&gt;lazy parsing&lt;/strong&gt;, meaning only the fields referenced in a query are processed. If your query only needs &lt;code&gt;user_id&lt;/code&gt;, ClickHouse doesn't waste time parsing every nested attribute in the document.&lt;/p&gt;

&lt;p&gt;This significantly improves efficiency for many semi-structured workloads while preserving the flexibility developers expect from JSON.&lt;/p&gt;

&lt;p&gt;That said, native JSON isn't a silver bullet.&lt;/p&gt;

&lt;p&gt;One of the biggest misconceptions is believing that once native JSON is available, every attribute should remain inside a JSON object.&lt;/p&gt;

&lt;p&gt;In reality, query patterns should drive schema design.&lt;/p&gt;

&lt;p&gt;If a field is frequently used in &lt;code&gt;WHERE&lt;/code&gt; clauses, &lt;code&gt;GROUP BY&lt;/code&gt; operations, joins, dashboards, or reports, it usually deserves its own dedicated column. Structured columns allow ClickHouse to optimize storage, indexing, and query execution far better than repeatedly navigating JSON paths.&lt;/p&gt;

&lt;p&gt;This leads to what many production systems adopt: a &lt;strong&gt;hybrid approach&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Core business attributes—such as &lt;code&gt;user_id&lt;/code&gt;, &lt;code&gt;event_type&lt;/code&gt;, or &lt;code&gt;timestamp&lt;/code&gt;—are stored as dedicated columns because they're queried constantly. Additional metadata that changes frequently or isn't accessed often remains inside a JSON column.&lt;/p&gt;

&lt;p&gt;This provides the best of both worlds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fast analytical queries&lt;/li&gt;
&lt;li&gt;Flexible schemas&lt;/li&gt;
&lt;li&gt;Simpler ingestion pipelines&lt;/li&gt;
&lt;li&gt;Lower maintenance as applications evolve&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Another important lesson is that ingestion patterns and query patterns are rarely the same.&lt;/p&gt;

&lt;p&gt;Just because data arrives as JSON doesn't mean it should be stored exactly that way forever. Designing your schema around how analysts and applications actually consume data often leads to much better long-term performance.&lt;/p&gt;

&lt;p&gt;As developers, it's easy to focus on making ingestion simple. But in analytical systems, query performance is usually what determines the overall user experience.&lt;/p&gt;

&lt;p&gt;My biggest takeaway from today's learning is that &lt;strong&gt;JSON is a tool—not a schema design strategy&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;ClickHouse gives us multiple options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Store raw JSON as &lt;code&gt;String&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Use the native &lt;code&gt;JSON&lt;/code&gt; data type for evolving schemas&lt;/li&gt;
&lt;li&gt;Model frequently accessed attributes as dedicated columns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Choosing the right combination depends on your workload, your data, and your query patterns.&lt;/p&gt;

&lt;p&gt;Understanding these trade-offs is what separates a database that simply works from one that scales efficiently as your data grows from millions to billions of records.&lt;/p&gt;

&lt;p&gt;How are you handling JSON in your analytics stack? Are you using native JSON, traditional extraction functions, or a hybrid schema?&lt;/p&gt;

&lt;p&gt;I'd love to hear about your experience.&lt;/p&gt;

&lt;p&gt;Read more... &lt;a href="https://quantrail-data.com/working-with-json-in-clickhouse-guide/" rel="noopener noreferrer"&gt;https://quantrail-data.com/working-with-json-in-clickhouse-guide/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>clickhouse</category>
      <category>devops</category>
      <category>database</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>Day 21 : Time-Series Data in ClickHouse®</title>
      <dc:creator>Kanishga Subramani</dc:creator>
      <pubDate>Wed, 17 Jun 2026 06:49:57 +0000</pubDate>
      <link>https://dev.to/kanishga_subramani_49ad73/day-21-summary-time-series-data-in-clickhouser-24i9</link>
      <guid>https://dev.to/kanishga_subramani_49ad73/day-21-summary-time-series-data-in-clickhouser-24i9</guid>
      <description>&lt;p&gt;Time-series data is one of the most common types of data generated by modern applications. Every log entry, API request, metric, transaction, sensor reading, or user interaction is recorded with a timestamp, making time the primary dimension for analysis. As organizations collect billions of these records, efficiently storing and querying them becomes increasingly challenging.&lt;/p&gt;

&lt;p&gt;This is where &lt;strong&gt;ClickHouse®&lt;/strong&gt; excels.&lt;/p&gt;

&lt;p&gt;Although ClickHouse is not a dedicated time-series database, its columnar storage architecture, vectorized query execution, high compression ratios, and massively parallel processing make it an excellent choice for time-series analytics at scale. It is capable of ingesting large volumes of data while delivering analytical queries in milliseconds.&lt;/p&gt;

&lt;p&gt;The article begins by explaining the fundamentals of time-series data and highlighting common real-world use cases such as application monitoring, IoT sensor data, financial market analysis, server metrics, user activity tracking, and business analytics. These workloads typically involve continuous data ingestion, time-based filtering, aggregations, and trend analysis.&lt;/p&gt;

&lt;p&gt;One of ClickHouse's biggest strengths is its optimization for analytical workloads. Since data is stored column-wise rather than row-wise, only the required columns are read during query execution. Combined with compression and vectorized processing, this significantly reduces I/O and improves query performance over massive datasets.&lt;/p&gt;

&lt;p&gt;The article also demonstrates how to create an optimized table for time-series workloads using the &lt;code&gt;MergeTree&lt;/code&gt; engine. Proper partitioning by month and ordering data by dimensions and timestamps help ClickHouse prune unnecessary partitions and efficiently locate relevant data during queries.&lt;/p&gt;

&lt;p&gt;Several practical SQL examples are covered, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Filtering records within a specific time range&lt;/li&gt;
&lt;li&gt;Aggregating metrics by hour, day, week, or month&lt;/li&gt;
&lt;li&gt;Calculating averages, sums, minimums, and maximums&lt;/li&gt;
&lt;li&gt;Grouping events over time&lt;/li&gt;
&lt;li&gt;Working with ClickHouse date and time functions&lt;/li&gt;
&lt;li&gt;Performing efficient trend and time-window analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A dedicated section introduces essential ClickHouse date and time functions such as &lt;code&gt;toStartOfHour()&lt;/code&gt;, &lt;code&gt;toStartOfDay()&lt;/code&gt;, &lt;code&gt;toStartOfMonth()&lt;/code&gt;, &lt;code&gt;toYYYYMM()&lt;/code&gt;, &lt;code&gt;dateDiff()&lt;/code&gt;, &lt;code&gt;toUnixTimestamp()&lt;/code&gt;, and other utilities that simplify time-based transformations and reporting.&lt;/p&gt;

&lt;p&gt;The article concludes with several best practices for building scalable time-series solutions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Include the timestamp column in the &lt;code&gt;ORDER BY&lt;/code&gt; key for efficient range scans.&lt;/li&gt;
&lt;li&gt;Partition tables by month or day to improve partition pruning.&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;LowCardinality&lt;/code&gt; for string columns with limited distinct values.&lt;/li&gt;
&lt;li&gt;Prefer &lt;code&gt;DateTime64&lt;/code&gt; when millisecond precision is required.&lt;/li&gt;
&lt;li&gt;Choose appropriate partitioning and sorting keys based on query patterns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Overall, this article demonstrates why ClickHouse has become a popular choice for large-scale time-series analytics. Its combination of SQL compatibility, high ingestion throughput, efficient storage, and exceptional query performance allows developers and data engineers to analyze massive volumes of time-stamped data without the complexity of learning a proprietary time-series database.&lt;/p&gt;

&lt;p&gt;Read more - &lt;a href="https://quantrail-data.com/clickhouse-for-time-series-data-a-quick-introduction/" rel="noopener noreferrer"&gt;https://quantrail-data.com/clickhouse-for-time-series-data-a-quick-introduction/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>database</category>
      <category>dataengineering</category>
      <category>devops</category>
      <category>clickhouse</category>
    </item>
    <item>
      <title>Day 20 of 100 Days of ClickHouse® - Introduction to Materialized Views</title>
      <dc:creator>Kanishga Subramani</dc:creator>
      <pubDate>Wed, 17 Jun 2026 05:37:53 +0000</pubDate>
      <link>https://dev.to/kanishga_subramani_49ad73/day-20-of-100-days-of-clickhouser-introduction-to-materialized-views-4f6p</link>
      <guid>https://dev.to/kanishga_subramani_49ad73/day-20-of-100-days-of-clickhouser-introduction-to-materialized-views-4f6p</guid>
      <description>&lt;p&gt;One of the most common performance bottlenecks in analytics is running the same expensive aggregations over and over again.&lt;/p&gt;

&lt;p&gt;Imagine querying billions of rows every time a dashboard loads just to calculate daily events, page views, or user activity. As your data grows, those queries become slower and consume more CPU.&lt;/p&gt;

&lt;p&gt;This is exactly the problem &lt;strong&gt;Materialized Views&lt;/strong&gt; solve in ClickHouse®.&lt;/p&gt;

&lt;p&gt;Instead of performing aggregations during query execution, a Materialized View processes new data as it's inserted and writes the transformed or aggregated results into a separate target table.&lt;/p&gt;

&lt;p&gt;A few important concepts every ClickHouse developer should know:&lt;/p&gt;

&lt;p&gt;• Materialized Views process &lt;strong&gt;only newly inserted data blocks&lt;/strong&gt;—they don't rescan the entire source table.&lt;br&gt;
• Existing data isn't included automatically. While the &lt;code&gt;POPULATE&lt;/code&gt; option exists, it's generally not recommended for large production datasets. A manual backfill is usually the safer approach.&lt;br&gt;
• The target table engine matters:&lt;br&gt;
• &lt;code&gt;MergeTree&lt;/code&gt; for transformed data&lt;br&gt;
• &lt;code&gt;SummingMergeTree&lt;/code&gt; for simple numeric aggregations&lt;br&gt;
• &lt;code&gt;AggregatingMergeTree&lt;/code&gt; for aggregate states and advanced analytics&lt;br&gt;
• Materialized Views aren't limited to aggregations—they can also clean, enrich, and transform incoming data before it's stored.&lt;/p&gt;

&lt;p&gt;Like every optimization, there's a trade-off.&lt;/p&gt;

&lt;p&gt;You're shifting work from &lt;strong&gt;query time&lt;/strong&gt; to &lt;strong&gt;insert time&lt;/strong&gt;, which means inserts do a little more work, but reads become dramatically faster. For reporting systems, dashboards, and analytical workloads, that's often a worthwhile trade.&lt;/p&gt;

&lt;p&gt;In today's article, I cover the architecture of Materialized Views, how they work internally, common use cases, engine selection, limitations, and production best practices.&lt;/p&gt;

&lt;p&gt;Have you used Materialized Views in production? What kind of workloads have they helped you optimize?&lt;/p&gt;

&lt;p&gt;Read more... &lt;a href="https://quantrail-data.com/introduction-to-clickhouse-materialized-views/" rel="noopener noreferrer"&gt;https://quantrail-data.com/introduction-to-clickhouse-materialized-views/&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  100DaysOfClickHouse #ClickHouse #DataEngineering #Database #SQL #OLAP #BackendEngineering #Performance #DistributedSystems #BigData
&lt;/h1&gt;

</description>
      <category>clickhouse</category>
      <category>devops</category>
      <category>database</category>
      <category>dataengineering</category>
    </item>
  </channel>
</rss>
