ClickHouse Functions You Should Know: A Practical Guide

#clickhousefunctions

ClickHouse offers a rich set of built-in functions that make working with large datasets faster and more intuitive. From the addDays() function for date arithmetic to generateUUIDv4() for unique identifier generation, these functions simplify complex queries and optimize storage, making data insights more accessible. This guide walks through the most practical categories.

Array Functions

arrayMap() transforms an array by applying an expression to every element and returning a new array with the results. For example, passing [1, 2, 3] through x -> x+1 produces [2, 3, 4]. It is ideal for in-place data transformation without needing joins or subqueries.
groupArray() collects column values within a group into a single array. It supports an optional max_size parameter to cap array length. This is useful for aggregating a list of players per team or actions per user in chronological order when combined with arraySort.
argMax() / argMin() are aggregate functions that return the value of one column corresponding to the maximum or minimum of another. For instance, querying which product had the highest revenue returns the product name alongside that peak revenue value — making it easy to identify top or bottom performers without complex subqueries.

Window Functions

row_number() assigns a sequential rank to rows within a partition, ordered by a specified column. A practical example is ranking NBA players by salary in descending order — Stephen Curry at rank 1 with $48M, LeBron James at rank 2 with $45M, and so on.
runningDifference() calculates the difference between the current row's value and the previous row's value in a column. This function is particularly useful for analyzing trends over time, such as changes in sales, stock prices, or other numerical data. Note that the first row of each group returns 0, as there is no prior value to compare.

Date and Time Functions

toStartOfYear() resets any date or DateTime value to January 1st at 00:00:00 of the same year. This is handy for grouping and summing data by year, such as calculating total annual sales across multiple years.
addDays(date, n) adds a fixed number of days to a date. A typical use case is computing estimated delivery dates by adding 5 days to an order timestamp.
INTERVAL syntax offers an SQL-standard alternative to addDays(). It uses singular unit keywords — INTERVAL 60 DAY, INTERVAL 2 MONTH, INTERVAL 1 YEAR, etc. — and is especially clean in WHERE clause filters like WHERE order_date >= now() - INTERVAL 60 DAY. Use INTERVAL for static, readable date offsets in WHERE clauses; use addDays() when adding a dynamic number of days from a column value.
timeDiff(dateTime1, dateTime2) returns the difference between two DateTime values in seconds, making it well-suited for measuring event durations or process gaps.

Aggregate Functions

quantile() calculates percentile values across a dataset. The 0.5 quantile represents the 50th percentile (median), while 0.9 and 0.99 represent the 90th and 99th percentiles respectively. This is particularly valuable for web server performance analysis — understanding median, p90, and p99 response times per server.
stddevPop() / stddevSamp() measures data spread. stddevPop computes population standard deviation, while stddevSamp computes sample standard deviation. A low value means data points cluster near the mean; a high value indicates wider variation — useful for detecting inconsistent server latencies.
Aggregate combinators extend standard functions with suffixes. The -If combinator (e.g., sumIf) processes only rows matching a condition. The -Array combinator (e.g., sumArray) processes array elements instead of rows. Both can be combined, but Array must always come before If (e.g., uniqArrayIf).

Full-Text Search

match(string, pattern) performs regular expression matching against string data. It is useful for filtering log messages, validating formats, or extracting patterns — for example, retrieving only log entries that contain the word "Error".

UUID Functions

generateUUIDv4() generates a random version-4 UUID with each call. It is commonly used as a default value for primary key columns to ensure uniqueness across distributed inserts.
generateUUIDv7() (available from ClickHouse v24.1+) produces time-ordered UUIDs based on the UUIDv7 specification. Unlike v4, these UUIDs are chronologically sortable, making them well-suited for primary keys in time-series workloads.

Visual Representation

bar() renders ASCII bar charts directly in query output. It accepts the value, a minimum, a maximum, and a bar width, making it easy to visualize relative server loads or sales figures without needing an external tool.

User Defined Functions (UDFs)

UDFs let users define custom reusable logic using SQL lambda syntax: CREATE FUNCTION name AS (params) -> expression. A simple example is an age-group classifier that returns "Child", "Adult", or "Senior" based on an age value.
**Executable UDFs **go further — they call external scripts (such as Python files) to process data. The configuration is defined in XML files and referenced in config.xml. A practical example shown in the blog masks sensitive data like email addresses and phone numbers, replacing characters while preserving just enough for identification.

Readable Formatting Functions

formatReadableSize() converts raw byte counts into human-readable strings (KB, MB, GB), making storage reports from system.tables far easier to scan.
formatReadableQuantity() formats large numbers into compact representations like thousands, millions, and billions.
formatReadableTimeDelta() converts a seconds value into a descriptive duration — for example, 432,546,534 seconds becomes "13 years, 8 months, 17 days, 7 hours, 48 minutes and 54 seconds."

Recent Version Updates

Notable recent additions include generateUUIDv7() in v24.1+, compound INTERVAL support in v23.8+, arrayFold() in v23.4+ for reducing arrays to a single value, and Variant & Dynamic Types in v24.1+. Date functions like addDays() now consistently support the Date32 type for extended date ranges from 1900 to 2299.Together, these functions cover the breadth of what most data engineers need — from array manipulation and time arithmetic to statistical analysis, unique ID generation, and custom business logic — all within ClickHouse's fast, columnar query engine.
For a detailed understanding of each function with examples and query outputs, refer to our blog Clickhouse Functions

DEV Community