DEV Community

Cover image for Migrating to ClickHouse: Releem's Journey
Roman Agabekov
Roman Agabekov

Posted on • Originally published at releem.com

Migrating to ClickHouse: Releem's Journey

At Releem, the "autopilot" for databases, our mission has always been to streamline MySQL performance tuning, ensuring databases run as efficiently as possible. To accomplish this, we rely heavily on metrics collected from hundreds of database servers across various operating systems and cloud solutions like AWS RDS.

Releem utilized MySQL for storing all metrics. However, as our service grew, the sheer volume of metrics we needed to analyze and store presented a formidable challenge that led us to migrate from MySQL to ClickHouse. This article delves into the details of that migration.

The Challenge: Scaling MySQL for Metrics Storage

Our journey began against the backdrop of an ever-increasing dataset. Our daily metric count ballooned from 500,000 to 22 million within 15 months, amassing over 4.7 billion records, while the number of customer servers increased 5X to a few thousands.

Currently, our database size stands at 300 GB, growing rapidly, with increasing costs to maintain its fast performance. The extensive database required continuous server resource augmentation due to the need to analyze an increasing number of rows for metric aggregation operations, data selection for analysis and training, and user charts.

Metric aggregation for the past hour is performed every hour, calculating minimum, average, and maximum values. With data growth, aggregation time increased to 45 minutes, leading to issues generating aggregated data over extended periods.

The increasing metrics displayed on the Releem Dashboard led to a slowdown in metrics selection and increased disk load because not all data could fit in RAM. The slowdown in metric selection adversely impacted the user experience with the Releem Dashboard.

Throughput of Releem database instance:
Throughput of Releem database instance

The impact was multifaceted:

  • Continuously augmenting server resources to manage growing data became a costly affair.
  • The time required for metric aggregation skyrocketed, affecting our ability to deliver timely insights.
  • Increased load times in the Releem Dashboard.

Choosing Solution and Testing the Waters

Our search for a solution led us to ClickHouse, an advanced DBMS renowned for its exceptional handling of TimeSeries data and some team members had previously worked with ClickHouse, providing us with valuable insights into its capabilities.
The decision to pivot to ClickHouse was driven by its promise of scalability, high-speed query execution, and significant data compression capabilities.

Transitioning to ClickHouse involved deploying a test server, creating the necessary tables, and implementing dual data-saving functionality to both MySQL and ClickHouse in the Releem Platform. We meticulously migrated existing metrics, testing and adjusting until we were confident in the system's performance.

Our tests revealed remarkable improvements: database size was reduced by over 20 times, aggregation times were cut to mere minutes, and a significant decrease in queries latency, enhancing the client portal experience by 25%.

This phase was critical, allowing us to test the waters, so to speak, and understand how ClickHouse could best fit within our ecosystem.

The Migration and Results

In our case, the implementation was relatively fast. Within a week, we deployed a ClickHouse server instance and rolled out new functionality for the platform and the Releem client portal for the production environment. We successfully integrated ClickHouse into our production environment without stopping our services.

The outcome of our migration exceeded expectations:

  • Data compression techniques in ClickHouse reduced our database size by more than 20 times.
  • The metrics aggregation process, which took 45 minutes, now took only ~2 minutes, significantly improving our data processing capabilities.
  • The page load time of the Releem Dashboard was improved by 25%, enhancing the user experience.

This enhancement allowed us to deliver timely, accurate insights to our users, significantly improving the performance and reliability of our service.

Releem Dashboard:
Releem Dashboard

The deployment of ClickHouse into our production environment marked a significant milestone in Releem's journey. We had fully integrated ClickHouse for metric storage while continuing to use MySQL for application logic. This hybrid approach has allowed us to leverage the strengths of both databases, ensuring robust performance and scalability.

For fellow engineers considering a similar migration, our key takeaways would be to thoroughly test of ClickHouse with your tasks in a controlled environment.

Top comments (3)

Collapse
 
rexfordkode profile image
Rexford Koomson

How did you managed the timeout

Collapse
 
drupaladmin profile image
Roman Agabekov

Rexford, thanks for your question.

Releem Dashboard sends request to the database for metrics to display charts for users.
We changed requests from MySQL to ClickHouse in the Releem Dashboard that helped us to decrease response time.

Collapse
 
rexfordkode profile image
Rexford Koomson

This is is really cool