Determine Kafka broker health using a Kafka Streams application’s JMX metrics and set up a Grafana alert

Reading Time: 3 minutes

As we all know, Kafka exposes JMX metrics, whether from a Kafka broker, connectors, or Kafka applications. A few days ago, I came across a scenario where I needed to determine Kafka broker health with the help of a Kafka Streams application’s JMX metrics. It looks a bit awkward, right? Shouldn’t I be using the broker’s JMX metrics for this? Why am I looking at the application’s JMX metrics? Yes, it is a bit awkward, but that was my use case.

So I started looking into it and found a solution. I would say the solution is not straightforward, but I tried what I found both locally and in a test environment, and it worked perfectly. So as long as it does not give false results, we can consider it a good solution.

You should also try it first, either locally or in some other environment, before using it in production as a foolproof solution. I am sharing it because I thought it might help someone and might fit some use cases.

So let’s begin !!!

I ran 3 brokers locally and one Kafka Streams application (user-streamapp). The Streams application was consuming messages from a topic named user-profiles.

When we run a Kafka Streams application, it generates metrics for the consumer, producer, and stream. Here, we will use the consumer metrics for our purpose. See the screenshot below for the metrics in Graphite:
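These metrics originate as JMX MBeans (for example, under the kafka.consumer domain) and are flattened into the dotted Graphite paths shown in the screenshot by whatever reporter ships them to Graphite. As an illustrative sketch only, the flattening might look like the hypothetical helper below (the exact scheme depends on the reporter you use):

```python
# Hypothetical sketch: flatten a Kafka consumer MBean name into the kind
# of dotted Graphite path seen in the screenshot above. The real mapping
# depends on your JMX-to-Graphite reporter; this is an assumption.
def graphite_path(prefix, mbean, attribute):
    # mbean looks like:
    # "kafka.consumer:type=consumer-node-metrics,client-id=user-profiles-v1-<uuid>,node-id=node-0"
    domain, props = mbean.split(":", 1)
    pairs = [p.split("=", 1) for p in props.split(",")]
    type_value = dict(pairs)["type"]
    # Join the remaining key=value properties into one path segment.
    ids = "-".join(f"{k}_{v}" for k, v in pairs if k != "type")
    return f"{prefix}.{domain}.type_{type_value}.{ids}.{attribute}"
```

With prefix "user-streamapp.local", this reproduces the path structure walked through below.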

graphite_metrics

In the consumer metrics, we will look at type_consumer-node-metrics:

user-streamapp → local → kafka → consumer → type_consumer-node-metrics

Under type_consumer-node-metrics, there are metrics for each consumer (an application can have multiple streams, and so multiple consumers). Here we are monitoring only one consumer, user-profiles-v1 (the application.id of the Streams app).

user-streamapp → local → kafka → consumer → type_consumer-node-metrics → client-id_user-profiles-v1-*

Here, we have added * because a random UUID gets appended to the stream’s application.id, so we cannot use a fixed value.
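To see why the wildcard works, here is a small sketch using Python’s fnmatch, which supports the same * glob semantics as Graphite path queries (the UUIDs below are made up for illustration):

```python
from fnmatch import fnmatch

# Made-up client IDs: the stream's application.id plus a random UUID
# suffix, as described above.
client_ids = [
    "client-id_user-profiles-v1-7f3c9a1e-1a2b-4c5d-8e9f-0a1b2c3d4e5f-node-id_node-0",
    "client-id_user-profiles-v1-7f3c9a1e-1a2b-4c5d-8e9f-0a1b2c3d4e5f-node-id_node-1",
]

# The fixed prefix and suffix pin down the application.id and the node;
# the * absorbs whatever random UUID the client generated.
pattern = "client-id_user-profiles-v1-*-node-id_node-0"
matches = [c for c in client_ids if fnmatch(c, pattern)]
```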

Under the consumer metric client-id_user-profiles-v1-*, since there are 3 brokers, there are metrics for all 3 brokers.

user-streamapp → local → kafka → consumer → type_consumer-node-metrics → client-id_user-profiles-v1-*-node-id_node-0

user-streamapp → local → kafka → consumer → type_consumer-node-metrics → client-id_user-profiles-v1-*-node-id_node-1

user-streamapp → local → kafka → consumer → type_consumer-node-metrics → client-id_user-profiles-v1-*-node-id_node-2

For each node, we are monitoring the incoming byte rate.

user-streamapp → local → kafka → consumer → type_consumer-node-metrics → client-id_user-profiles-v1-*-node-id_node-0 → incoming-byte-rate

user-streamapp → local → kafka → consumer → type_consumer-node-metrics → client-id_user-profiles-v1-*-node-id_node-1 → incoming-byte-rate

user-streamapp → local → kafka → consumer → type_consumer-node-metrics → client-id_user-profiles-v1-*-node-id_node-2 → incoming-byte-rate
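For reference, the arrow notation above corresponds to the following dotted Graphite target paths, which is the form you would paste into a Grafana query (the user-streamapp and local segments come from my setup; adjust them for yours):

```
user-streamapp.local.kafka.consumer.type_consumer-node-metrics.client-id_user-profiles-v1-*-node-id_node-0.incoming-byte-rate
user-streamapp.local.kafka.consumer.type_consumer-node-metrics.client-id_user-profiles-v1-*-node-id_node-1.incoming-byte-rate
user-streamapp.local.kafka.consumer.type_consumer-node-metrics.client-id_user-profiles-v1-*-node-id_node-2.incoming-byte-rate
```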

When a broker is up, the value of incoming-byte-rate is greater than 0.001; otherwise it is 0.000. So we have added alerts as follows:

alerts

If the condition is true, we will get the alerts. We have added the OR condition here because we want an alert if any of the brokers is down.
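The alert logic can be sketched in plain Python (this is an illustration of the condition, not Grafana’s actual evaluation engine; the threshold 0.001 comes from the observation above, and the rate values are made up):

```python
# Latest incoming-byte-rate per broker node, e.g. as read from Graphite.
# Values here are made up for illustration: node-1 looks unreachable.
rates = {"node-0": 5.2, "node-1": 0.0, "node-2": 3.8}

THRESHOLD = 0.001  # below this, we treat the broker as down

down_nodes = [node for node, rate in rates.items() if rate < THRESHOLD]

# OR across brokers: alert if ANY broker appears down.
should_alert = len(down_nodes) > 0
```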

That’s it. I hope it helps you in some way, either directly or indirectly.

Stay tuned for more blogs !!!


Written by 

Rishi is a tech enthusiast with around 10 years of experience who loves to solve complex problems with pure quality. He is a functional programmer and loves to learn new trending technologies. His leadership skills are well proven, and he has delivered multiple distributed applications with high scalability and availability by keeping the Reactive principles in mind. He is well versed in Scala, Akka, Akka HTTP, Akka Streams, Java 8, Reactive principles, microservice architecture, async programming, functional programming, distributed systems, AWS, and Docker.