DEV Community

Cover image for Day 44 – Exploring ClickHouse® System Tables for Troubleshooting
Kanishga Subramani
Kanishga Subramani

Posted on

Day 44 – Exploring ClickHouse® System Tables for Troubleshooting

Introduction

ClickHouse® provides a rich collection of built-in system tables that expose valuable information about the server's internal operations. These tables make it easier to monitor queries, storage, replication, background tasks, and overall server health without relying entirely on external monitoring tools.

Whether you're investigating slow queries, monitoring disk usage, tracking replication, or diagnosing performance bottlenecks, system tables provide real-time visibility into what's happening inside your ClickHouse server.

In this article, we'll explore the most useful system tables and learn how to use them with practical SQL examples for effective troubleshooting.


What Are System Tables?

The system database is a built-in database that contains read-only tables describing the internal state of a ClickHouse server.

These tables automatically expose metadata, runtime statistics, logs, and configuration information.

Common information available includes:

  • Database and table metadata
  • Column definitions
  • Running queries
  • Historical query logs
  • Storage parts
  • Background merges
  • Mutations
  • Replication status
  • Disk usage
  • Server errors
  • Configuration details

Unlike user-created tables, system tables are automatically maintained by ClickHouse and can be queried using standard SQL.


Commonly Used System Tables

Below are some of the most useful system tables for troubleshooting and monitoring.

system.tables

Contains metadata about every table.

Useful information includes:

  • Database name
  • Table name
  • Storage engine
  • Total rows
  • Storage settings

system.columns

Displays metadata for every column.

Information includes:

  • Column name
  • Data type
  • Default values
  • Compression codec
  • Expressions

system.parts

Provides details about MergeTree data parts.

Useful for:

  • Row counts
  • Disk usage
  • Active parts
  • Storage growth

system.processes

Shows all currently running queries.

Useful for monitoring:

  • Active SQL statements
  • Execution time
  • Memory usage
  • Current user

system.query_log

Stores historical query execution information.

Useful for:

  • Slow query analysis
  • Query history
  • Performance optimization
  • User activity

system.errors

Contains server error statistics.

Useful for:

  • Identifying recurring problems
  • Debugging server issues
  • Monitoring error frequency

system.merges

Displays currently running background merge operations.

Useful for:

  • Monitoring MergeTree maintenance
  • Identifying merge bottlenecks

system.mutations

Tracks asynchronous UPDATE and DELETE operations.

Useful for:

  • Mutation progress
  • Pending updates
  • Failed mutations

system.replicas

Shows replication status for replicated tables.

Useful for:

  • Replication lag
  • Replica health
  • Queue size
  • Read-only replicas

system.disks

Displays configured storage disks.

Useful for:

  • Free space
  • Total capacity
  • Disk health

When Should You Use System Tables?

System tables become extremely valuable whenever performance or operational issues occur.

Typical troubleshooting scenarios include:

  • Finding slow queries
  • Investigating long-running queries
  • Monitoring storage growth
  • Tracking background merges
  • Checking mutation progress
  • Verifying replication health
  • Monitoring disk usage
  • Investigating server errors

These tables provide direct visibility into the database without requiring additional monitoring software.


Troubleshooting with System Tables

1. Identify Slow Queries

Review the slowest executed queries.

SELECT
    query,
    query_duration_ms
FROM system.query_log
ORDER BY query_duration_ms DESC
LIMIT 10;
Enter fullscreen mode Exit fullscreen mode

This helps identify:

  • Expensive queries
  • Poorly optimized SQL
  • Queries requiring indexes or redesign

2. Monitor Running Queries

View currently executing queries.

SELECT
    query,
    elapsed,
    memory_usage
FROM system.processes
ORDER BY elapsed DESC;
Enter fullscreen mode Exit fullscreen mode

Useful for:

  • Long-running queries
  • Memory-heavy queries
  • Current server activity

3. Monitor Background Merges

View ongoing MergeTree merges.

SELECT *
FROM system.merges;
Enter fullscreen mode Exit fullscreen mode

Useful for understanding whether background merges are affecting server performance.


4. Track Pending Mutations

Check UPDATE and DELETE operations that have not yet completed.

SELECT *
FROM system.mutations
WHERE is_done = 0;
Enter fullscreen mode Exit fullscreen mode

Useful for:

  • Monitoring asynchronous mutations
  • Identifying stalled operations

5. Verify Replication Health

Inspect replica status.

SELECT
    database,
    table,
    queue_size,
    is_readonly
FROM system.replicas;
Enter fullscreen mode Exit fullscreen mode

Useful for:

  • Replication lag
  • Queue backlog
  • Replica availability

6. Monitor Disk Usage

View storage utilization.

SELECT
    name,
    free_space,
    total_space
FROM system.disks;
Enter fullscreen mode Exit fullscreen mode

Useful for:

  • Preventing storage exhaustion
  • Capacity planning
  • Monitoring available disk space

Additional Monitoring Queries

View Largest Tables

SELECT
    database,
    table,
    formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active = 1
GROUP BY
    database,
    table
ORDER BY sum(bytes_on_disk) DESC;
Enter fullscreen mode Exit fullscreen mode

View Recent Query History

SELECT
    event_time,
    query,
    query_duration_ms
FROM system.query_log
ORDER BY event_time DESC
LIMIT 20;
Enter fullscreen mode Exit fullscreen mode

List All Databases

SELECT *
FROM system.databases;
Enter fullscreen mode Exit fullscreen mode

View Configured Storage Disks

SELECT *
FROM system.disks;
Enter fullscreen mode Exit fullscreen mode

Check Server Errors

SELECT *
FROM system.errors;
Enter fullscreen mode Exit fullscreen mode

View ClickHouse Version

SELECT version();
Enter fullscreen mode Exit fullscreen mode

Example Troubleshooting Workflow

Imagine users report that dashboards have become noticeably slower.

A systematic troubleshooting approach could be:

  1. Inspect system.processes to identify long-running queries.
  2. Review system.query_log for recently executed slow queries.
  3. Check system.merges to determine whether background merges are consuming resources.
  4. Verify replica health using system.replicas.
  5. Monitor available disk space using system.disks.

Following this workflow helps isolate performance bottlenecks efficiently.


Best Practices

For effective monitoring and troubleshooting:

  • Monitor system.processes regularly.
  • Review system.query_log to optimize slow queries.
  • Track system.parts to monitor storage growth.
  • Monitor system.merges and system.mutations to ensure background operations complete successfully.
  • Check system.replicas regularly in replicated clusters.
  • Monitor system.disks to avoid storage bottlenecks.
  • Enable query logging in production environments.
  • Integrate system tables into dashboards such as Grafana for continuous monitoring.
  • Configure alerts for abnormal query durations, replication lag, disk usage, and server errors.

Conclusion

ClickHouse® system tables are one of the most powerful built-in tools for monitoring and troubleshooting database performance.

Tables such as system.processes, system.query_log, system.parts, system.mutations, and system.replicas provide deep visibility into query execution, storage, replication, and server health.

By making system tables part of your regular monitoring strategy, you can quickly diagnose issues, optimize performance, and maintain a reliable, high-performing ClickHouse deployment.

Top comments (0)