Chameera Dulanga

Posted on Mar 13, 2023

Debugging Kafka and other Message Brokers


Message brokers like Kafka help scale microservices by giving them a reliable and efficient way to communicate asynchronously. However, this same asynchronous behavior makes tracking and troubleshooting microservices challenging, especially as the number of messages passing through the message broker grows.

This article examines tools such as Kafka Owl, Redpanda, and Helios that help overcome these difficulties without sacrificing scalability.

What are message brokers?

A message broker is a software component that allows applications or services to communicate with each other by exchanging messages. Brokers provide a standard communication protocol, ensuring that messages are delivered consistently and efficiently across the system. Message brokers stand out from other communication techniques like REST and gRPC because they can handle complex message flows through asynchronous communication.

In asynchronous communication, messages sent by a service (the producer) are stored in a message queue and do not need to be consumed immediately. Other services (consumers) process those messages later, based on their requirements or availability. This behavior encourages developers to build decoupled applications and increases the reliability and redundancy of the communication process, because the broker holds messages until consumers are ready to read them, which reduces the risk of messages getting lost when a consumer is temporarily unavailable.
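To make the producer/consumer split concrete, here is a minimal in-memory sketch that uses Python's standard `queue` module as a stand-in for a broker. It is not Kafka: there is no persistence, partitioning, or network, and `produce`, `consume`, and the order IDs are hypothetical names for illustration. It only shows that the producer returns as soon as a message is queued, while the consumer picks messages up later at its own pace.

```python
import queue
import threading
import time

# A thread-safe FIFO standing in for a broker topic (illustration only, not Kafka).
topic = queue.Queue()
processed: list[str] = []


def produce(order_id: str) -> None:
    """Hypothetical producer: enqueue the message and return without waiting."""
    topic.put({"order_id": order_id})


def consume() -> None:
    """Hypothetical consumer: process messages whenever it gets to them."""
    while True:
        message = topic.get()
        if message is None:  # sentinel used in this sketch to stop the consumer
            topic.task_done()
            break
        time.sleep(0.01)  # simulate slow processing
        processed.append(message["order_id"])
        topic.task_done()


consumer = threading.Thread(target=consume)
consumer.start()

for i in range(3):
    produce(f"order-{i}")
# The producer has already finished, even if the consumer has not caught up yet.
print("produced 3 messages, processed so far:", len(processed))

topic.put(None)  # ask the consumer to stop once it has drained the queue
consumer.join()
print("processed:", processed)
```

The number printed after producing varies from run to run, which is exactly the point: the producer's work is decoupled from when (and whether) the consumer has handled each message.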

However, asynchronous communication can pose significant challenges for developers as the system scales up.

Challenges of using message brokers

When working with message brokers, one of the main challenges developers face is identifying and troubleshooting errors. The asynchronous behavior of message brokers makes it difficult to track and monitor the flow of messages, since there is no immediate response telling the sender the result of its request. So, if there is an error, you won't get an error message or a response right away to help you troubleshoot the issue.

For example, consider a scenario with two services, 'Service A' and 'Service B', where Service B fails to process a request from Service A.

In asynchronous communication, Service A won't wait until Service B processes the request and won't be notified immediately about the failure. It will continue processing other requests, and the failure response can get mixed up with the responses to those requests, making it difficult to track and debug the issue.

Note that Service A does receive a synchronous acknowledgement stating that its request was received (with Kafka, this acknowledgement comes from the broker rather than directly from Service B). But the request processing itself happens asynchronously.
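A rough way to see this scenario in code is to use Python's `concurrent.futures` as a stand-in for asynchronous messaging. No broker is involved, and `service_b`, the payloads, and the request IDs are hypothetical. `submit()` returns immediately, much like the acknowledgement Service A receives, and failures surface later in completion order. The error itself does not say which request caused it; the only link back is the request ID that Service A kept alongside each pending call.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def service_b(payload: int) -> str:
    """Hypothetical Service B handler; fails on negative payloads."""
    time.sleep(0.01 * (payload % 3))  # varying latency, so completion order varies
    if payload < 0:
        raise ValueError("invalid payload")
    return f"processed {payload}"


requests = {"req-1": 5, "req-2": -1, "req-3": 7}
failures: dict[str, str] = {}

with ThreadPoolExecutor(max_workers=3) as pool:
    # Service A hands off every request and moves on; submit() returns at once.
    futures = {pool.submit(service_b, payload): rid for rid, payload in requests.items()}
    # ... Service A could keep doing other work here ...
    for future in as_completed(futures):
        error = future.exception()
        if error is not None:
            # "invalid payload" alone doesn't identify the request; the ID we
            # stored next to the future is what ties the failure back to it.
            failures[futures[future]] = str(error)

print(failures)
```

Without that stored ID, Service A would only know that some request failed, which is the tracking problem described above.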

On the other hand, synchronous communication makes Service A wait until Service B processes the request before proceeding, which makes it easier to track and monitor the flow of messages. Troubleshooting is also easier, since developers can map each error directly to the request that caused it.
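For contrast, here is the same hypothetical Service B called synchronously, as a plain function call standing in for a blocking REST or gRPC request. This is an illustrative sketch with made-up names and payloads. Because Service A blocks on each call, the failing request is still in scope when the exception is raised, so mapping the error to the request needs no extra bookkeeping.

```python
def service_b(payload: int) -> str:
    """Hypothetical Service B handler; fails on negative payloads."""
    if payload < 0:
        raise ValueError("invalid payload")
    return f"processed {payload}"


results: dict[str, str] = {}
for request_id, payload in {"req-1": 5, "req-2": -1, "req-3": 7}.items():
    try:
        # Service A blocks here until Service B has finished with this request.
        results[request_id] = service_b(payload)
    except ValueError as error:
        # The request that failed is right here, so the mapping is trivial.
        results[request_id] = f"failed: {error}"

print(results)
# {'req-1': 'processed 5', 'req-2': 'failed: invalid payload', 'req-3': 'processed 7'}
```

The price is that Service A can do nothing else while it waits, which is the scalability trade-off message brokers avoid.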

Even so, message brokers with asynchronous communication remain a great option for building scalable applications. Hence, various solutions have been introduced to overcome this root-cause identification problem by improving monitoring support for message brokers.

Solutions for monitoring microservices in message brokers

Kafka Owl, Redpanda, and Helios are a few of the solutions developers use to enable message monitoring in message brokers.

Kafka Owl and Redpanda

Kafka Owl allows developers to explore and fetch messages in Kafka clusters to get better insights into what's happening within the cluster.

Features of Kafka Owl:

Message viewer - Lets you explore topics' messages with ad-hoc queries and dynamic filters. You can use JavaScript functions to filter messages, and it supports JSON, Avro, XML, Text, and Binary (hex view) data encoding types.
Consumer groups overview - Provides detailed information on consumer groups, including their members, member state, and partition assignments.
Cluster overview - Lists available brokers with information like space usage and rack ID to provide a high-level overview of the brokers.
Topic overview - Lets you browse the list of Kafka topics and check their configurations and space usage.
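Kafka Owl's filters are JavaScript functions written in its UI. The Python sketch below only illustrates the idea behind such an ad-hoc filter; it is not Kafka Owl's filter API. It assumes messages have already been fetched as records with hypothetical `partition`, `offset`, `key`, and raw `value` fields, decodes JSON values, and keeps the records whose decoded value matches a predicate.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Optional


@dataclass
class Record:
    """Hypothetical shape of a fetched Kafka message."""
    partition: int
    offset: int
    key: Optional[bytes]
    value: bytes


def filter_messages(
    records: Iterable[Record], predicate: Callable[[dict[str, Any]], bool]
) -> list[Record]:
    """Return records whose JSON-object value satisfies predicate; skip non-JSON values."""
    matches = []
    for record in records:
        try:
            decoded = json.loads(record.value)
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue  # not JSON; a real viewer would fall back to text or hex display
        if isinstance(decoded, dict) and predicate(decoded):
            matches.append(record)
    return matches


records = [
    Record(0, 41, b"user-1", b'{"status": "ok", "amount": 10}'),
    Record(0, 42, b"user-2", b'{"status": "error", "amount": 99}'),
    Record(1, 7, None, b"\x00\x01binary"),
]

# Ad-hoc query: find failed events, similar in spirit to a JavaScript
# predicate you might type into a message viewer's filter box.
failed = filter_messages(records, lambda v: v.get("status") == "error")
print([(r.partition, r.offset) for r in failed])
```

Running the predicate over decoded values rather than raw bytes is what makes filters like "status is error" possible; the viewer's encoding support determines which messages can be decoded at all.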

Redpanda Console, Redpanda's web UI, is another popular tool for managing and debugging your Kafka and Redpanda workloads.

Features of Redpanda Console:

Schema registry - Lists all Avro, Protobuf, and JSON schemas.
Message viewer - Lets you explore topics' messages with ad-hoc queries and dynamic filters. Supports JSON, Avro, Protobuf, XML, MessagePack, Text, and Binary (hex view) encoding types, and you can use JavaScript functions to filter messages.
Consumer groups - Lists active consumer groups and lets you edit and delete them.
Topic overview - Lets you explore Kafka topics, check their configurations and space usage, list the consumers of a specific topic, and view partition details.
Kafka Connect - Lets you manage connectors across multiple Connect clusters, patch their configs, view their current state, and restart tasks.
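Supporting several encodings means the viewer has to decide how to display each payload. As a much-simplified illustration, and not Redpanda's implementation, a viewer might try decoders in order: JSON, then UTF-8 text, then a hex dump. Avro, Protobuf, and MessagePack are left out because they need schemas or extra libraries. The function name `render_payload` is hypothetical.

```python
import json


def render_payload(raw: bytes) -> tuple[str, str]:
    """Return (encoding, display string), trying JSON, then UTF-8 text, then hex."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: show the bytes as hex, like a "Binary (hex view)" mode.
        return "binary", raw.hex(" ")
    try:
        # Valid JSON: pretty-print it with stable key order.
        return "json", json.dumps(json.loads(text), indent=2, sort_keys=True)
    except json.JSONDecodeError:
        return "text", text


print(render_payload(b'{"b": 1, "a": 2}'))
print(render_payload(b"plain log line"))
print(render_payload(b"\xff\x00\x01"))  # ('binary', 'ff 00 01')
```

Falling back to hex guarantees that every message can at least be inspected, even when its format is unknown.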

However, if you want to observe the full context of each message request so you can reproduce, troubleshoot, and debug an issue easily (or prevent it from recurring), you will need another tool. Keep reading.

Trace-based monitoring and troubleshooting solution for message brokers

Trace-based monitoring and troubleshooting solutions like Helios allow developers to see all the operations triggered in their distributed system regardless of the communication type.

Unlike logs, traces do not need to be inserted into the code manually. They can be created automatically and provide a complete picture of a message, including its flow and behavior across different components and services. So, when there is an error, developers can see the complete path of the message through the services involved and quickly identify where they need to debug.
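Tracing across a broker works by carrying the trace context inside each message, typically in message headers, so the consumer's span can join the producer's trace. The sketch below hand-rolls this with the W3C Trace Context `traceparent` format (`version-traceid-parentid-flags`) purely for illustration. Instrumentation libraries such as OpenTelemetry's do this automatically, and the `Message` type and the `inject`/`extract` helpers here are hypothetical, not any Kafka client's API.

```python
import secrets
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    """Hypothetical message with Kafka-style (key, value) headers."""
    value: bytes
    headers: list[tuple[str, bytes]] = field(default_factory=list)


def new_span_id() -> str:
    return secrets.token_hex(8)  # 8 random bytes -> 16 hex characters


def inject(message: Message, trace_id: str, span_id: str) -> None:
    """Producer side: record the current trace and span in a traceparent header."""
    traceparent = f"00-{trace_id}-{span_id}-01"  # version 00, sampled flag 01
    message.headers.append(("traceparent", traceparent.encode("ascii")))


def extract(message: Message) -> Optional[tuple[str, str]]:
    """Consumer side: return (trace_id, parent_span_id) if a traceparent header exists."""
    for key, raw in message.headers:
        if key == "traceparent":
            _version, trace_id, parent_id, _flags = raw.decode("ascii").split("-")
            return trace_id, parent_id
    return None


# Service A starts a trace and publishes a message.
trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
producer_span = new_span_id()
msg = Message(value=b'{"order_id": 1}')
inject(msg, trace_id, producer_span)

# Service B consumes it later and continues the same trace with a child span.
context = extract(msg)
assert context is not None
consumer_trace_id, parent_span = context
consumer_span = new_span_id()
print(consumer_trace_id == trace_id, parent_span == producer_span)  # True True
```

Because the context travels with the message itself, the link between producer and consumer survives however long the message waits in the broker.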

The above image shows how the solution provides visualizations and details of a Kafka message, including connected services, the location of the error, reasons for the error, and more.
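Once every service reports spans that share a trace ID and point to their parent span, a tool can rebuild a message's path and pinpoint where it failed. The following is a minimal sketch of that idea over hypothetical span records. The field names, services, and operations are assumptions for illustration, not Helios's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """Hypothetical span record from one trace."""
    span_id: str
    parent_id: Optional[str]
    service: str
    operation: str
    error: Optional[str] = None


def path_to_first_error(spans: list[Span]) -> list[str]:
    """Return 'service:operation' from the root down to the first failing span found."""
    by_id = {s.span_id: s for s in spans}
    failing = next((s for s in spans if s.error), None)
    if failing is None:
        return []
    path = []
    current: Optional[Span] = failing
    while current is not None:
        path.append(f"{current.service}:{current.operation}")
        # Walk up through the parents; stop at the root or at a missing parent.
        current = by_id.get(current.parent_id) if current.parent_id else None
    return list(reversed(path))


spans = [
    Span("a1", None, "service-a", "POST /orders"),
    Span("a2", "a1", "service-a", "kafka.produce orders"),
    Span("b1", "a2", "service-b", "kafka.consume orders"),
    Span("b2", "b1", "service-b", "db.insert", error="unique constraint violated"),
]
print(" -> ".join(path_to_first_error(spans)))
```

The printed path runs from Service A's HTTP handler, through the Kafka produce and consume steps, to the failing database call in Service B. That is the cross-service view the parent links make possible.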

Features needed for efficient troubleshooting:

Full-context tracing information, regardless of the environment.
Trace data visualizations.
Easy integration with the existing ecosystem, including logs, tests, error monitoring, and more.
Quick workflow reproduction, including HTTP requests, Kafka and RabbitMQ messages, and Lambda invocations, in just a few clicks.
Automatic test generation based on trace data.
Easy sharing of traces, tests, and triggers with team members.
Multi-language support, including Python, JavaScript (Node.js), Java, Ruby, .NET, Go, and C++.

In addition to monitoring message brokers, Helios is useful for distributed tracing, bottleneck analysis, API call automation, serverless application observability, and multi-language application trace integration.

Conclusion

Message brokers allow developers to design scalable, asynchronous communication between microservices with complex and flexible data flows. However, this asynchronous nature makes troubleshooting and debugging issues in message brokers difficult. Even with tools like Kafka Owl and Redpanda, developers struggle to identify the root cause of errors, since these tools do not provide full context.

A trace-based monitoring and troubleshooting solution for message brokers, such as Helios, addresses these issues. Developers can gain a complete understanding of how messages flow and behave between services using the trace visualizations and actionable data the tool provides.

DEV Community
