Kanishga Subramani

Posted on Jun 27

Day 44 – Exploring ClickHouse® System Tables for Troubleshooting

#clickhouse #devops #database #dataengineering

Introduction

ClickHouse® provides a rich collection of built-in system tables that expose valuable information about the server's internal operations. These tables make it easier to monitor queries, storage, replication, background tasks, and overall server health without relying entirely on external monitoring tools.

Whether you're investigating slow queries, monitoring disk usage, tracking replication, or diagnosing performance bottlenecks, system tables provide real-time visibility into what's happening inside your ClickHouse server.

In this article, we'll explore the most useful system tables and learn how to use them with practical SQL examples for effective troubleshooting.

What Are System Tables?

The system database is a built-in database that contains read-only tables describing the internal state of a ClickHouse server.

These tables automatically expose metadata, runtime statistics, logs, and configuration information.

Common information available includes:

Database and table metadata
Column definitions
Running queries
Historical query logs
Storage parts
Background merges
Mutations
Replication status
Disk usage
Server errors
Configuration details

Unlike user-created tables, system tables are automatically maintained by ClickHouse and can be queried using standard SQL.

Commonly Used System Tables

Below are some of the most useful system tables for troubleshooting and monitoring.

system.tables

Contains metadata about every table.

Useful information includes:

Database name
Table name
Storage engine
Total rows
Storage settings

system.columns

Displays metadata for every column.

Information includes:

Column name
Data type
Default values
Compression codec
Expressions

system.parts

Provides details about MergeTree data parts.

Useful for:

Row counts
Disk usage
Active parts
Storage growth

system.processes

Shows all currently running queries.

Useful for monitoring:

Active SQL statements
Execution time
Memory usage
Current user

system.query_log

Stores historical query execution information.

Useful for:

Slow query analysis
Query history
Performance optimization
User activity

system.errors

Contains server error statistics.

Useful for:

Identifying recurring problems
Debugging server issues
Monitoring error frequency

system.merges

Displays currently running background merge operations.

Useful for:

Monitoring MergeTree maintenance
Identifying merge bottlenecks

system.mutations

Tracks asynchronous UPDATE and DELETE operations.

Useful for:

Mutation progress
Pending updates
Failed mutations

system.replicas

Shows replication status for replicated tables.

Useful for:

Replication lag
Replica health
Queue size
Read-only replicas

system.disks

Displays configured storage disks.

Useful for:

Free space
Total capacity
Disk health

When Should You Use System Tables?

System tables become extremely valuable whenever performance or operational issues occur.

Typical troubleshooting scenarios include:

Finding slow queries
Investigating long-running queries
Monitoring storage growth
Tracking background merges
Checking mutation progress
Verifying replication health
Monitoring disk usage
Investigating server errors

These tables provide direct visibility into the database without requiring additional monitoring software.

Troubleshooting with System Tables

1. Identify Slow Queries

Review the slowest executed queries.

SELECT
    query,
    query_duration_ms
FROM system.query_log
ORDER BY query_duration_ms DESC
LIMIT 10;

This helps identify:

Expensive queries
Poorly optimized SQL
Queries requiring indexes or redesign

2. Monitor Running Queries

View currently executing queries.

SELECT
    query,
    elapsed,
    memory_usage
FROM system.processes
ORDER BY elapsed DESC;

Useful for:

Long-running queries
Memory-heavy queries
Current server activity

3. Monitor Background Merges

View ongoing MergeTree merges.

SELECT *
FROM system.merges;

Useful for understanding whether background merges are affecting server performance.

4. Track Pending Mutations

Check UPDATE and DELETE operations that have not yet completed.

SELECT *
FROM system.mutations
WHERE is_done = 0;

Useful for:

Monitoring asynchronous mutations
Identifying stalled operations

5. Verify Replication Health

Inspect replica status.

SELECT
    database,
    table,
    queue_size,
    is_readonly
FROM system.replicas;

Useful for:

Replication lag
Queue backlog
Replica availability

6. Monitor Disk Usage

View storage utilization.

SELECT
    name,
    free_space,
    total_space
FROM system.disks;

Useful for:

Preventing storage exhaustion
Capacity planning
Monitoring available disk space

Additional Monitoring Queries

View Largest Tables

SELECT
    database,
    table,
    formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active = 1
GROUP BY
    database,
    table
ORDER BY sum(bytes_on_disk) DESC;

View Recent Query History

SELECT
    event_time,
    query,
    query_duration_ms
FROM system.query_log
ORDER BY event_time DESC
LIMIT 20;

List All Databases

SELECT *
FROM system.databases;

View Configured Storage Disks

SELECT *
FROM system.disks;

Check Server Errors

SELECT *
FROM system.errors;

View ClickHouse Version

SELECT version();

Example Troubleshooting Workflow

Imagine users report that dashboards have become noticeably slower.

A systematic troubleshooting approach could be:

Inspect system.processes to identify long-running queries.
Review system.query_log for recently executed slow queries.
Check system.merges to determine whether background merges are consuming resources.
Verify replica health using system.replicas.
Monitor available disk space using system.disks.

Following this workflow helps isolate performance bottlenecks efficiently.

Best Practices

For effective monitoring and troubleshooting:

Monitor system.processes regularly.
Review system.query_log to optimize slow queries.
Track system.parts to monitor storage growth.
Monitor system.merges and system.mutations to ensure background operations complete successfully.
Check system.replicas regularly in replicated clusters.
Monitor system.disks to avoid storage bottlenecks.
Enable query logging in production environments.
Integrate system tables into dashboards such as Grafana for continuous monitoring.
Configure alerts for abnormal query durations, replication lag, disk usage, and server errors.

Conclusion

ClickHouse® system tables are one of the most powerful built-in tools for monitoring and troubleshooting database performance.

Tables such as system.processes, system.query_log, system.parts, system.mutations, and system.replicas provide deep visibility into query execution, storage, replication, and server health.

By making system tables part of your regular monitoring strategy, you can quickly diagnose issues, optimize performance, and maintain a reliable, high-performing ClickHouse deployment.

DEV Community

Day 44 – Exploring ClickHouse® System Tables for Troubleshooting

Introduction

What Are System Tables?

Commonly Used System Tables

system.tables

system.columns

system.parts

system.processes

system.query_log

system.errors

system.merges

system.mutations

system.replicas

system.disks

When Should You Use System Tables?

Troubleshooting with System Tables

1. Identify Slow Queries

2. Monitor Running Queries

3. Monitor Background Merges

4. Track Pending Mutations

5. Verify Replication Health

6. Monitor Disk Usage

Additional Monitoring Queries

View Largest Tables

View Recent Query History

List All Databases

View Configured Storage Disks

Check Server Errors

View ClickHouse Version

Example Troubleshooting Workflow

Best Practices

Conclusion

Top comments (0)