DEV Community

AutoMQ

How to Monitor AutoMQ Cluster using Guance Cloud

Preface
Guance Cloud
Guance Cloud [1] is a unified real-time monitoring application designed for cloud platforms, cloud-native environments, applications, and business-related needs. It integrates three main signals: metrics, logs, and traces, covering testing, prerelease, and production environments to achieve observability across the entire software development lifecycle. Through Guance Cloud, enterprises can build comprehensive application full-link observability, enhancing the transparency and controllability of the overall IT architecture.
As a powerful data analysis platform, Guance Cloud includes several core modules such as the DataKit [2] unified data collector and the DataFlux Func data processing development platform.

AutoMQ
AutoMQ [3] is a next-generation Apache Kafka distribution redesigned around cloud-native principles. It offers up to 10x cost and elasticity advantages while remaining 100% compatible with the Apache Kafka protocol. AutoMQ stores data entirely on S3, so a cluster can scale out to absorb sudden traffic spikes without replicating data between nodes. Apache Kafka, in contrast, needs substantial bandwidth to replicate partition data after scaling, which makes sudden traffic surges hard to handle. With automatic scaling, self-balancing, and automatic fault recovery, AutoMQ is largely self-managing and provides higher availability without manual intervention.

Observability Interface of AutoMQ
Because AutoMQ is fully compatible with Kafka and exposes its metrics over an open, Prometheus-based collection port, it can be integrated with DataKit, Guance Cloud's data collection tool. This lets users monitor and manage the status of AutoMQ clusters conveniently. The Guance Cloud platform also supports user-defined aggregation and querying of metrics data. Using the provided dashboard templates or custom dashboards, we can summarize key information about the AutoMQ cluster, such as Topic, Broker, Partition, and Group statistics.
From the collected metrics, we can also query the errors encountered while the AutoMQ cluster is running, along with current system utilization such as JVM CPU usage, JVM heap usage, and cache size. These metrics help us identify and resolve issues quickly when the cluster behaves abnormally, which supports high availability and fast recovery. Next, I will show how to monitor the AutoMQ cluster status using the Guance Cloud platform.

Steps to Integrate with Guance Cloud
Enable Metric Fetch Interface in AutoMQ
Refer to the AutoMQ documentation: Cluster Deployment | AutoMQ [4]. Before deploying and starting the cluster, add the following configuration parameters to enable the Prometheus fetch interface. With these parameters set, each node opens an additional HTTP interface that serves AutoMQ monitoring metrics in the Prometheus metrics format.

bin/kafka-server-start.sh ...\
--override  s3.telemetry.metrics.exporter.type=prometheus \
--override  s3.metrics.exporter.prom.host=0.0.0.0 \
--override  s3.metrics.exporter.prom.port=8890 \
....

Once the metrics interface is enabled, you can fetch Prometheus-format monitoring metrics from any node over HTTP at http://{node_ip}:8890. A sample response is as follows:

....
kafka_request_time_mean_milliseconds{otel_scope_name="io.opentelemetry.jmx",type="DescribeDelegationToken"} 0.0 1720520709290
kafka_request_time_mean_milliseconds{otel_scope_name="io.opentelemetry.jmx",type="CreatePartitions"} 0.0 1720520709290
...

For more information on metrics, refer to the official AutoMQ documentation: Metrics | AutoMQ [5].
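Before wiring up a collector, it can help to confirm that a node really returns well-formed metric lines. The following is a minimal Python sketch, not part of AutoMQ or DataKit: the helper names `fetch_metrics`, `parse_prometheus_line`, and `parse_metrics` are hypothetical, and the regular expression only covers simple lines of the shape shown in the sample response (name, optional labels, value, optional millisecond timestamp).

```python
import re
from urllib.request import urlopen

# metric_name{label="value",...} value [timestamp_ms]
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Two lines copied from the sample response above, elisions included.
SAMPLE = """....
kafka_request_time_mean_milliseconds{otel_scope_name="io.opentelemetry.jmx",type="DescribeDelegationToken"} 0.0 1720520709290
kafka_request_time_mean_milliseconds{otel_scope_name="io.opentelemetry.jmx",type="CreatePartitions"} 0.0 1720520709290
...
"""


def fetch_metrics(url, timeout=5):
    """Download the raw metrics text from a node (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_prometheus_line(line):
    """Return (name, labels, value, timestamp_ms), or None for lines to skip."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank line or HELP/TYPE comment
    match = LINE_RE.match(line)
    if match is None:
        return None  # not a metric line, e.g. an elision such as "..."
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    ts = int(match.group("ts")) if match.group("ts") else None
    return match.group("name"), labels, float(match.group("value")), ts


def parse_metrics(text):
    """Parse every metric line in a response body, skipping everything else."""
    parsed = (parse_prometheus_line(line) for line in text.splitlines())
    return [item for item in parsed if item is not None]
```

Calling `parse_metrics(fetch_metrics("http://{node_ip}:8890"))` with a real node address substituted would return one tuple per metric line. For anything beyond a quick check, a maintained Prometheus parsing library is a safer choice than a hand-written regular expression.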
Install and Configure the DataKit Collection Tool
DataKit is an open-source monitoring collection tool provided by Guance Cloud, and it supports fetching Prometheus metrics. We can use DataKit to fetch monitoring data from AutoMQ and aggregate it in the Guance Cloud platform.
Installation of DataKit Tool
For more details on installing DataKit, refer to the documentation: Host Installation - Guance Cloud Documentation [6].
First, register for a Guance Cloud account and log in. Then, from the main interface, click "Integration" on the left side and select "DataKit" at the top. You will see the DataKit installation command:

DK_DATAWAY="https://openway.guance.com?token=<TOKEN>" bash -c "$(curl -L https://static.guance.com/datakit/install.sh)" 

Copy the command above and run it on every node in the cluster that you want to monitor. DataKit must be installed on each Broker whose metrics are to be collected.

After the installation command completes, run the command datakit monitor to verify that DataKit was installed successfully.

AutoMQ Collector Configuration and Activation
In this section, we configure the AutoMQ collector for DataKit on each server that collects data. Navigate to the directory /usr/local/datakit/conf.d/prom and create a collector configuration file named prom.conf. The configuration specifies the metrics endpoint to scrape, the collector (source) name, the tags to attach, and the collection interval. You can adjust the configuration on each server as needed:

  [[inputs.prom]]

  urls = ["http://clientIP:8890/metrics"]   # clientIP is your own server address
  source = "AutoMQ"

  ## Collection interval. Keep it here, above the sub-tables: placed under
  ## [inputs.prom.tags], TOML would read it as a tag named "interval".
  interval = "10s"

  ## Keep Exist Metric Name
  ## If the keep_exist_metric_name is true, keep the raw value for field names.
  keep_exist_metric_name = true

  [inputs.prom.tags_rename]
    overwrite_exist_tags = true

  [inputs.prom.tags_rename.mapping]
    service_name = "job"
    service_instance_id = "instance"

  [inputs.prom.tags]
    component="AutoMQ"

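Since the same file is needed on every Broker with only the address changed, the per-node configuration can be generated rather than edited by hand. The sketch below is one possible approach, not an official DataKit tool: `render_prom_conf` and `render_all` are hypothetical helpers, and the IP addresses in the usage note are placeholders. It keeps `interval` at the top level of `[[inputs.prom]]` so that TOML does not read it as a tag.

```python
from string import Template

# Collector settings from this section; only $node_ip and $interval vary.
PROM_CONF = Template('''[[inputs.prom]]
  urls = ["http://$node_ip:8890/metrics"]
  source = "AutoMQ"
  interval = "$interval"
  keep_exist_metric_name = true

  [inputs.prom.tags_rename]
    overwrite_exist_tags = true

  [inputs.prom.tags_rename.mapping]
    service_name = "job"
    service_instance_id = "instance"

  [inputs.prom.tags]
    component = "AutoMQ"
''')


def render_prom_conf(node_ip, interval="10s"):
    """Return the prom.conf text for a single node."""
    return PROM_CONF.substitute(node_ip=node_ip, interval=interval)


def render_all(node_ips, interval="10s"):
    """Return a dict mapping each node IP to its rendered prom.conf text."""
    return {ip: render_prom_conf(ip, interval) for ip in node_ips}
```

For example, `render_all(["10.0.0.1", "10.0.0.2"])` produces one configuration per placeholder address, which can then be written to /usr/local/datakit/conf.d/prom/prom.conf on the matching server. DataKit generally needs a restart to load a new collector file; in recent versions this is `datakit service -R`, but check the DataKit documentation for your version.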

Monitor the AutoMQ Cluster through Guance Cloud Visualization
The Guance Cloud platform has integrated AutoMQ and offers multiple default dashboards. You can view them at Dashboard Example [7]. Below are some commonly used templates, with a brief introduction to what each one shows:
Cluster Monitoring
This dashboard primarily displays the number of active Brokers, the total number of Topics, the number of Partitions, and so on. You can also choose which node to query by selecting it in the Cluster_id field.

By monitoring the state of the Kafka cluster, we can promptly detect and resolve potential issues, such as node failures, insufficient disk space, and network latency, to ensure the system remains controllable and stable.
Broker Monitoring
The AutoMQ Broker dashboard on Guance Cloud describes various metrics for all Brokers, such as the number of connections, the number of partitions, the number of messages received per second (ops), and the input/output data volume per second, measured in bytes.
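Per-second figures like these are usually derived from cumulative counters sampled at each scrape. How Guance Cloud computes them internally is not covered here, so the following is only a rough sketch of the arithmetic, assuming a counter-type metric and millisecond timestamps like those in the sample response. `per_second_rate` is a hypothetical helper, not a DataKit or Guance Cloud function.

```python
def per_second_rate(prev_value, prev_ts_ms, curr_value, curr_ts_ms):
    """Average per-second increase of a cumulative counter between two scrapes."""
    elapsed_s = (curr_ts_ms - prev_ts_ms) / 1000.0
    if elapsed_s <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr_value - prev_value
    if delta < 0:
        # The counter went backwards, which usually means the process
        # restarted and the counter reset to zero. Count only what has
        # accumulated since the reset.
        delta = curr_value
    return delta / elapsed_s
```

With a 10-second collection interval, a byte counter that grows from 1000 to 6000 between two scrapes corresponds to `per_second_rate(1000, 0, 6000, 10000)`, an average of 500 bytes per second.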

Topic Monitoring
This section provides an overview of information for all Topics contained within all nodes. As mentioned above, you can specify and query Topic information under a specific node. These metrics mainly include the space occupied by each Topic, the number of messages received, and the Request Throughput, which indicates the ability to process requests per unit time.

At this point, we are monitoring the status of the AutoMQ cluster with Guance Cloud. The data on the dashboards is obtained by aggregating or querying the collected metrics.
Conclusion
In this article, we showed how to integrate the Guance Cloud platform with AutoMQ to monitor the status of an AutoMQ cluster. More advanced features, such as custom alerts and custom data queries, can be configured by following the official documentation. Experiment with them to find the ones that suit your needs. We hope this article helps you when integrating the Guance Cloud platform with AutoMQ!
References
[1] Guance Cloud: https://docs.guance.com/getting-started/product-introduction/
[2] DataKit: https://docs.guance.com/datakit/
[3] AutoMQ: https://www.automq.com
[4] Cluster Deployment of AutoMQ: https://docs.automq.com/en/docs/automq-opensource/IyXrw3lHriVPdQkQLDvcPGQdnNh
[5] Metrics | AutoMQ: https://docs.automq.com/zh/docs/automq-opensource/ArHpwR9zsiLbqwkecNzcqOzXn4b
[6] Host Installation - Guance Cloud Documentation: https://docs.guance.com/datakit/datakit-install/
[7] Dashboard Example: https://console.guance.com/scene/dashboard/createDashboard?w=wksp_63b96920660e4962a07429b65ef163e7&lak=Scene
