Determine Kafka broker health using a Kafka Streams application’s JMX metrics and set up a Grafana alert

As we all know, Kafka exposes JMX metrics, whether it is a Kafka broker, a connector, or a Kafka application. A few days ago, I came across a scenario where I needed to determine Kafka broker health with the help of a Kafka Streams application’s JMX metrics. It looks a bit odd, right? Normally I would use the broker’s own JMX metrics for this, so why am I looking at the application’s JMX metrics? Yes, it is a bit unusual, but that was my use case.

So I started looking into it and found a solution. I would not call it straightforward, but I tried it locally as well as on a test environment, and it worked perfectly. As long as it does not produce false results, we can consider it a good solution.

You should also try it first, either locally or in a lower environment, before relying on it in production as a foolproof solution. I am sharing it because I thought it might help someone and might fit some use cases.

So let’s begin!

I ran 3 brokers on my local machine and one Kafka Streams application (user-streamapp). The Streams application was consuming messages from a topic named user-profiles.

When we run a Kafka Streams application, it generates metrics for the consumer, the producer, and the streams layer. Here, we will use the consumer metrics for our purpose. See the screenshot below for the metrics in Graphite:

[Screenshot: Kafka Streams JMX metrics as they appear in Graphite]
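For context, here is a minimal sketch of what such a Streams application might look like. The topology, serdes, and broker addresses below are assumptions for illustration; the important part is the application.id, because it becomes the prefix of the consumer client-id that shows up in the JMX metrics. Kafka Streams registers these metrics with JMX automatically; getting them into Graphite requires a JMX-to-Graphite bridge (for example jmxtrans or a metrics reporter), which this post assumes is already in place.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class UserStreamApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        // application.id becomes the prefix of the consumer client-id in the JMX metrics
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "user-profiles-v1");
        // assumed addresses for the 3 local brokers
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092,localhost:9093,localhost:9094");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // trivial topology for illustration: consume from user-profiles and print each record
        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("user-profiles")
               .foreach((key, value) -> System.out.println(key + " -> " + value));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}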

Within the consumer metrics, we will use the type_consumer-node-metrics group:

user-streamapp → local → kafka → consumer → type_consumer-node-metrics
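For reference, this Graphite subtree corresponds to the consumer’s per-node JMX metrics. Assuming the standard Kafka client MBeans, the underlying JMX object name looks something like the following, and the reporter simply flattens it into the path above:

kafka.consumer:type=consumer-node-metrics,client-id=<client-id>,node-id=node-<broker-id>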

Under type_consumer-node-metrics, there are metrics for each consumer (an application can have multiple streams and therefore multiple consumers). Here we are monitoring only one consumer, user-profiles-v1 (the application.id of the stream app).

user-streamapp → local → kafka → consumer → type_consumer-node-metrics → client-id_user-profiles-v1-*

Here, we have added the * because a random UUID gets appended to the application.id in the consumer’s client-id, so we cannot use a fixed value.

Under the consumer metric path client-id_user-profiles-v1-*, there are metrics for each of the 3 brokers; the node-id in each path corresponds to the broker the consumer is connected to.

user-streamapp → local → kafka → consumer → type_consumer-node-metrics → client-id_user-profiles-v1-*-node-id_node-0

user-streamapp → local → kafka → consumer → type_consumer-node-metrics → client-id_user-profiles-v1-*-node-id_node-1

user-streamapp → local → kafka → consumer → type_consumer-node-metrics → client-id_user-profiles-v1-*-node-id_node-2

For each node, we monitor the incoming-byte-rate metric.

user-streamapp → local → kafka → consumer → type_consumer-node-metrics → client-id_user-profiles-v1-*-node-id_node-0 → incoming-byte-rate

user-streamapp → local → kafka → consumer → type_consumer-node-metrics → client-id_user-profiles-v1-*-node-id_node-1 → incoming-byte-rate

user-streamapp → local → kafka → consumer → type_consumer-node-metrics → client-id_user-profiles-v1-*-node-id_node-2 → incoming-byte-rate
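In the Grafana panel, these series are queried from Graphite as dot-separated targets. Assuming the prefix visible in the Graphite screenshot above, the three queries (call them A, B, and C) would look roughly like this; the exact prefix depends on how your JMX-to-Graphite reporter is configured:

user-streamapp.local.kafka.consumer.type_consumer-node-metrics.client-id_user-profiles-v1-*-node-id_node-0.incoming-byte-rate
user-streamapp.local.kafka.consumer.type_consumer-node-metrics.client-id_user-profiles-v1-*-node-id_node-1.incoming-byte-rate
user-streamapp.local.kafka.consumer.type_consumer-node-metrics.client-id_user-profiles-v1-*-node-id_node-2.incoming-byte-rate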

When a broker is up, the value of incoming-byte-rate for that node is greater than 0.001; when it is down, the consumer receives nothing from that node and the value drops to 0.000. So we have added alerts as follows:

[Screenshot: Grafana alert conditions on incoming-byte-rate for each broker node]

If the condition is true, we will get the alert. We have added the OR condition because we want an alert if any of the brokers is down.
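As a rough sketch of what the alert conditions in the screenshot express (the 5m evaluation window and avg() aggregation are assumptions, not values taken from the screenshot), with A, B, and C being the three queries above, the classic Grafana alert conditions would read something like:

WHEN avg() OF query(A, 5m, now) IS BELOW 0.001
OR   avg() OF query(B, 5m, now) IS BELOW 0.001
OR   avg() OF query(C, 5m, now) IS BELOW 0.001

Whichever broker stops serving the consumer, its incoming-byte-rate drops to 0.000, the corresponding condition becomes true, and the alert fires.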

That’s it. I hope it helps you in some way, either directly or indirectly.

Stay tuned for more blogs!



