Introduction
When managing a production ClickHouse cluster, you might face numerous challenges. One of them is finding the root cause in case of a crash. During such incidents, querying system tables like system.errors or system.text_log is not possible. Hence, we need to search into the clickhouse-server.err.log
file to identify the root cause and address it effectively. Although using Unix tools is the most efficient approach, we should aim to do it as quickly as possible.
To gain better and faster insights from ClickHouse error logs, I developed an Ansible Playbook named ClickSight. This playbook performs log aggregation on ClickHouse logs from all specified nodes. ClickSight is available under the Apache license on GitHub as the ClickSight project. In the next section, we will see how to use it and customize it to suit your needs.
It's essential to have a monitoring dashboard, such as this one, which can help us identify issues like a high number of mutations or high RAM usage. However, in complex situations, it might not be sufficient to find the root cause, and that's when error logs become crucial.
How ClickSight Works
ClickSight leverages various useful Unix commands, such as grep
, cat
, tail
, cut
, and sed
, for working with text files. Additionally, Unix's ability to pass the output of one command to another using pipes (|
) allows us to chain commands together. For example, we can use grep
to find specific lines in a file and then use cut
to select all text after the >
character with cat myfile.txt | grep "some arbitrary phrase" | cut -d">" -f2
. ClickSight harnesses the power of these Unix commands in an Ansible playbook, making log aggregation a breeze. With ClickSight, you only need to run the playbook, and the results will be available on your local machine.
Setting Up ClickSight
To use ClickSight, you need at least one active ClickHouse server and Ansible installed on your system. It's recommended to install it on a Unix-based OS. Additionally, ensure that you have access to the ClickHouse server as a sudo user, or at least your user should have access to the clickhouse-server.err.log
file.
For detailed setup instructions for ClickSight and Ansible, please refer to the Prerequisites section of the repository.
Running ClickSight
A comprehensive guide on how to run ClickSight can be found in the Run ClickSight section of the repository. Feel free to follow that guide, and if you have any questions, don't hesitate to ask here or on GitHub.
Analyzing ClickHouse Error Logs
ClickSight offers five modes of operation:
- All fatal errors in
clickhouse-server.err.log
with details: In case of a crash, afatal
error is likely present, explaining the cause of the crash. - Timeline of errors: On production systems, there might be numerous errors every minute, making it challenging to track them in log files. ClickSight can help by displaying the timeline of errors extracted from ClickHouse log lines. The error names are based on the logger_name, providing valuable information about the error category.
- Timeline of errors associated with a
query_id
: ClickHouse providesquery_id
for some logs associated with executed queries, denoted by{}
. ClickSight can display the timeline of errors linked to specificquery_id
s. - Timeline of errors associated with a specific
query_id
: ClickSight allows you to view all errors and their occurrences associated with a specificquery_id
. - Timeline of a specific error: Sometimes, we suspect a particular error and want to know when and how often it occurs. ClickSight can provide a detailed timeline for specific errors.
Contributing to ClickSight
I welcome and appreciate contributions to the ClickSight project. Whether you want to report issues, suggest improvements, or submit new features, your input is valuable in making ClickSight even more useful.
1. Reporting Issues or Feature Requests
If you encounter any problems while using ClickSight or have ideas for new features, please don't hesitate to report them. To do so:
- Go to the ClickSight GitHub repository: ClickSight GitHub repository.
- Click on the "Issues" tab.
- Click on the green "New Issue" button.
- Provide a descriptive title and detailed description of the issue you encountered or the feature you want to suggest.
- If it's a bug, include steps to reproduce the problem.
Contributing Code
If you're a developer and want to contribute directly to the ClickSight codebase, follow these steps:
- Fork the ClickSight repository to your GitHub account using the "Fork" button in the top-right corner.
- Clone the forked repository to your local development environment.
- Create a new branch for your contribution:
git checkout -b my-feature
. - Make your changes and improvements.
- Test your changes thoroughly to ensure they work as expected.
- Commit your changes with clear and concise messages:
git commit -m "Add my awesome feature"
. - Push the changes to your forked repository:
git push origin my-feature
. - Create a pull request (PR) by navigating to the original ClickSight repository and clicking on "New Pull Request."
- Describe your changes in the PR, including any relevant information or context.
- I will review your PR, provide feedback, and work with you to merge the changes into the main ClickSight project.
I'm Mohammad Anvaari, a Data Engineer at Snapp! I'm curious about data engineering and often write about my challenges and experiences on my blog.
Top comments (2)
Nice!
Thanks Hamed jan :)