In the previous parts, we got an introduction to ClickHouse and its features and learned about its table engine families and their most commonly used members. In this part, I will walk through the special keys and indexes in ClickHouse, which can significantly reduce query latency and database load.
Note that these concepts apply only to the default table engine family: MergeTree.
Primary Key
ClickHouse indexing is based on sparse indexing, an alternative to the B-Tree index used by traditional DBMSs. In a B-Tree, every row is indexed, which is suitable for locating and updating a single row, also known as point queries, common in OLTP workloads. That comes at the cost of poor performance on high-volume inserts and high memory and storage consumption. A sparse index, on the contrary, splits the data into multiple parts and groups rows into fixed-size blocks called granules. ClickHouse keeps one index entry per granule (a group of rows) instead of one per row, which is where the term sparse index comes from. Given a query that filters on the primary key, ClickHouse looks up the matching granules and loads them into memory in parallel, which brings notable performance gains on the range queries common in OLAP workloads. Additionally, since data is stored column by column in separate files, it can be compressed well, resulting in much less storage consumption.
The sparse index builds on the LSM-tree idea, allowing you to insert high volumes of data per second. All of this comes at the cost of being unsuitable for point queries, which is not what ClickHouse is designed for anyway.
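As a rough illustration (the table and column names here are hypothetical), the granule size is controlled per table by the index_granularity setting, which defaults to 8192 rows; a smaller value means more granules and a denser, more precise sparse index at the cost of a larger primary index:
-- Hypothetical table: every 1024 rows form one granule, so one primary index entry per 1024 rows
CREATE TABLE default.events_sketch
(
    `event_id` UInt64,
    `event_date` Date
)
ENGINE = MergeTree
ORDER BY (event_id, event_date)
SETTINGS index_granularity = 1024;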
Structure
The figure below shows how ClickHouse stores data:
- Data is split into multiple parts (by ClickHouse's default partitioning or a user-defined partition key).
- Parts are split into granules. A granule is a logical concept: ClickHouse doesn't physically split the data into granules; instead, it locates them via marks. The granules' locations (start and end) are stored in mark files with the mrk2 extension.
- Index values are stored in the primary.idx file, which contains one row per granule.
- Columns are stored as compressed blocks in .bin files: one file per column in the Wide format and a single file for all columns in the Compact format. Whether a part is Wide or Compact is decided by ClickHouse based on the size of its columns, and you can check which one was chosen, as shown in the query below.
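If you are curious which format ClickHouse chose for a table's parts, the system.parts system table exposes it; a quick illustrative query:
-- part_type reports whether each active part is stored in the Wide or Compact format
SELECT database, table, name AS part_name, part_type, rows
FROM system.parts
WHERE active;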
Now let's see how ClickHouse finds the matching rows using primary keys:
- ClickHouse finds the matching granule marks using the primary.idx file via binary search.
- It then looks into the mark files to find those granules' locations in the .bin files.
- Finally, it loads the matching granules from the .bin files into memory in parallel and looks for the matching rows within them using binary search.
Case Study
To clarify the flow mentioned above, let's create a table and insert data into it:
CREATE TABLE default.projects
(
`project_id` UInt32,
`name` String,
`created_date` Date
)
ENGINE = MergeTree
ORDER BY (project_id, created_date);
INSERT INTO projects
SELECT * FROM generateRandom('project_id UInt32, name String, created_date Date', 10, 10, 1)
LIMIT 10000000;
First, if you don't specify primary keys separately, ClickHouse considers the sorting keys (the ORDER BY clause) as the primary keys. Hence, in this table, project_id and created_date are the primary keys. Every time you insert data into this table, it is sorted first by project_id and then by created_date.
If we look at the data structure stored on disk, we see something like this:
We have five parts, and one of them is all_1_1_0. You can visit this link if you're curious about the naming convention. As you can see, the columns are stored in .bin files, each column also has a mark file, and both sit alongside the primary.idx file.
Filter on the first primary key
Now let's filter on project_id, which is the first primary key, and explain the query's index usage:
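For reference, the explained query looked roughly like this (the filter value 700 is just an arbitrary example):
EXPLAIN indexes = 1
SELECT * FROM projects WHERE project_id = 700;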
As you can see, the system has detected project_id as a primary key and ruled out 1224 of the 1225 granules using it!
Filter on the second primary key
What if we filter on created_date, the second primary key?
EXPLAIN indexes=1
SELECT * FROM projects WHERE created_date=today()
The database has detected created_date as a primary key, but it hasn't been able to exclude any granules. Why?
Because ClickHouse uses binary search only for the first key and a generic exclusion search for the other keys, which is much less efficient. So how can we make it more efficient?
If we recreate the table with project_id and created_date swapped in the sorting keys, we achieve better results when filtering on the non-first keys, since created_date has a lower cardinality (fewer unique values) than project_id:
CREATE TABLE default.projects
(
`project_id` UInt32,
`name` String,
`created_date` Date
)
ENGINE = MergeTree
ORDER BY (created_date, project_id);
EXPLAIN indexes=1
SELECT * FROM projects WHERE project_id=700
If we filter on project_id, now the second key, ClickHouse reads only 909 granules instead of the whole table.
So, to summarize: always try to order the primary key columns from low to high cardinality.
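If you're not sure which column has the lower cardinality, a quick check with uniq (ClickHouse's approximate distinct count) can guide the ordering; a sketch against our table:
-- Compare the approximate number of distinct values per candidate key column
SELECT
    uniq(project_id) AS project_id_cardinality,
    uniq(created_date) AS created_date_cardinality
FROM projects;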
Order Key
I mentioned earlier that if you don't specify the PRIMARY KEY option, ClickHouse considers the sorting keys as the primary keys. However, if you want to set the primary keys separately, they must be a prefix of the sorting keys. As a result, the additional columns specified in the sorting keys are only used for sorting and play no role in indexing.
CREATE TABLE default.projects
(
`project_id` UInt32,
`name` String,
`created_date` Date
)
ENGINE = MergeTree
PRIMARY KEY (created_date, project_id)
ORDER BY (created_date, project_id, name)
In this example, the created_date and project_id columns are used both in the sparse index and for sorting, while the name column is only used as the last sorting criterion.
Use this option if you wish to use a column in the ORDER BY part of your queries, since having it in the table's sorting key eliminates the database's sorting effort at query time.
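For example, with the table above, a query like the following sketch can be served by reading rows in their stored order rather than re-sorting them (whether ClickHouse applies this also depends on settings such as optimize_read_in_order):
-- Rows in each part are already ordered by (created_date, project_id, name),
-- so this ORDER BY can be satisfied without a full re-sort
SELECT *
FROM projects
WHERE created_date = today()
ORDER BY created_date, project_id, name
LIMIT 100;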
Partition Key
A partition is a logical combination of parts in ClickHouse. By default, all parts fall under a single, unnamed partition. To see this, look at the system.parts table for the projects table defined in the previous section:
SELECT
name,
partition
FROM
system.parts
WHERE
table = 'projects';
You can see that the projects table has no particular partition. However, you can customize this using the PARTITION BY option:
CREATE TABLE default.projects_partitioned
(
`project_id` UInt32,
`name` String,
`created_date` Date
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(created_date)
PRIMARY KEY (created_date, project_id)
ORDER BY (created_date, project_id, name)
In the above table, ClickHouse partitions the data by the month of the created_date column:
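Assuming data has been inserted into it as before, you can verify the resulting partitions with a query similar to the earlier one on system.parts; a sketch:
-- One row per partition, with the number of parts and rows it currently holds
SELECT
    partition,
    count() AS parts,
    sum(rows) AS total_rows
FROM system.parts
WHERE table = 'projects_partitioned' AND active
GROUP BY partition
ORDER BY partition;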
Index
ClickHouse creates a min-max index for the partition key and uses it as the first filtering layer when running a query. Let's see what happens when we filter data by a column used in the partition key expression:
EXPLAIN indexes=1
SELECT * FROM projects_partitioned WHERE created_date='2020-02-01'
You can see that the database has chosen one part out of 16 using the min-max index of the partition key.
Usage
Partitioning in ClickHouse is meant to bring data manipulation capabilities to the table. For instance, you can delete or move the parts belonging to partitions older than a year, as shown in the sketch below. This is far more efficient than on an unpartitioned table, since ClickHouse has physically split the data by month on the storage, so such operations can be performed cheaply.
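As a sketch of what such manipulation looks like (the partition value 202002 is just an example that matches the toYYYYMM expression above):
-- Drop a whole month of data in one cheap, partition-level operation
ALTER TABLE projects_partitioned DROP PARTITION 202002;

-- Or detach it instead, keeping the data on disk for a possible re-attach later
ALTER TABLE projects_partitioned DETACH PARTITION 202002;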
Although ClickHouse creates an additional index for the partition key, it should never be treated as a query performance tool, because it loses that battle to defining the column in the sorting keys. So if you wish to improve query performance, put those columns in the sorting keys, and use a column as the partition key only if you have specific data manipulation plans based on it.
Finally, don't confuse partitions in ClickHouse with the same term in distributed systems, where data is split across different nodes. If that's what you're after, you should use shards and distributed tables.
Skip Index
You may have noticed that putting a column among the last items of the sorting key doesn't help much, especially if you only filter on that column without the leading sort keys. What should you do in those cases?
Consider a dictionary you want to read. You can find words using its table of contents, which is sorted alphabetically; those entries play the role of the sorting keys in a table. You can easily find a word starting with W, but how would you find the pages containing words related to wars?
You can put marks or sticky notes on those pages to reduce your effort next time. That's how a skip index works: it helps the database skip granules that don't contain the desired values of certain columns by maintaining additional, lightweight indexes.
Case Study
Consider the projects table defined in the Order Key section, where created_date and project_id were defined as the primary keys. Now if we filter on the name column, we encounter this:
EXPLAIN indexes=1
SELECT * FROM projects WHERE name='hamed'
The result was expected: ClickHouse could not use the primary index to skip any granules. Now, what if we define a skip index on this column?
ALTER TABLE projects ADD INDEX name_index name TYPE bloom_filter GRANULARITY 1;
The above command creates a skip index on the name column. I've used the bloom_filter type because the column is a String. You can find more about the other kinds here.
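As a sketch of what the other kinds look like (the index names and column choices here are purely illustrative), a minmax or set index is declared the same way:
-- minmax keeps the min and max of the column for each block of granules
ALTER TABLE projects ADD INDEX date_minmax created_date TYPE minmax GRANULARITY 4;

-- set(N) keeps up to N distinct values of the column per block
ALTER TABLE projects ADD INDEX id_set project_id TYPE set(100) GRANULARITY 4;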
Note that ADD INDEX only builds the index for newly inserted data. To build it for data that has already been inserted, you can use this:
ALTER TABLE projects MATERIALIZE INDEX name_index;
Let's see the query analysis this time:
As you can see, the skip index made a big difference in how many granules were ruled out, and therefore in performance.
While the skip index performed well in this example, it can perform poorly in other cases. Its effectiveness depends on the correlation between the indexed column and the sorting keys, as well as on settings such as the index type and granularity.
Conclusion
In conclusion, understanding and using ClickHouse's primary keys, sorting keys, partition keys, and skip indexes is crucial for optimizing query performance and scalability. Choosing appropriate primary keys, sorting keys, and partitioning strategies can improve data layout and query execution speed while keeping database load under control. Additionally, using skip indexes intelligently helps minimize disk I/O and reduce query execution time. By considering these factors in your ClickHouse schema design, you can unlock its full potential for efficient and performant data solutions.