As we all know, Kafka exposes JMX metrics, whether for Kafka brokers, connectors, or Kafka applications. A few days ago, I ran into a scenario where I needed to determine Kafka broker health using a Kafka Streams application's JMX metrics. It looks a bit awkward, right? I should use the broker's own JMX metrics for this, so why am I looking at the application's JMX metrics? Yes, it is a bit awkward, but that was my use case.
So I started looking into it and found a solution. I would say the solution is not straightforward, but I tried it both locally and in a test environment, and it worked perfectly. As long as we do not get false results, we can consider it a good solution.
You should also try it first, either locally or in some other environment, before relying on it in production as a foolproof solution. I am sharing it because I thought it might help someone and might fit some use cases.
So let's begin !!!
I ran 3 brokers locally along with one Kafka Streams application (user-streamapp). The stream application was consuming messages from the topic named user-profiles.
When we run a Kafka Streams application, it generates metrics for the consumer, producer, and stream. Here, we will use the consumer metrics for our purpose. See the screenshot below for the metrics in Graphite:
In consumer metrics, we will measure the type_consumer-node-metrics:
user-streamapp → local → kafka → consumer → type_consumer-node-metrics
Under type_consumer-node-metrics, there are metrics for each consumer (an application can have multiple streams and therefore multiple consumers). Here we are monitoring only one consumer, user-profiles-v1 (the application.id of the stream app).
user-streamapp → local → kafka → consumer → type_consumer-node-metrics → client-id_user-profiles-v1-*
Here, we have added * because a random UUID gets appended to the application.id of the stream, so we cannot use a fixed value.
Under the consumer metrics client-id_user-profiles-v1-*, there is a set of metrics for each of the 3 brokers.
user-streamapp → local → kafka → consumer → type_consumer-node-metrics → client-id_user-profiles-v1-*-node-id_node-0
user-streamapp → local → kafka → consumer → type_consumer-node-metrics → client-id_user-profiles-v1-*-node-id_node-1
user-streamapp → local → kafka → consumer → type_consumer-node-metrics → client-id_user-profiles-v1-*-node-id_node-2
For each node, we are monitoring the incoming byte rate.
user-streamapp → local → kafka → consumer → type_consumer-node-metrics → client-id_user-profiles-v1-*-node-id_node-0 → incoming-byte-rate
user-streamapp → local → kafka → consumer → type_consumer-node-metrics → client-id_user-profiles-v1-*-node-id_node-1 → incoming-byte-rate
user-streamapp → local → kafka → consumer → type_consumer-node-metrics → client-id_user-profiles-v1-*-node-id_node-2 → incoming-byte-rate
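To make the wildcard paths above concrete, here is a small sketch in Python that builds the three patterns and checks a metric name against them. It assumes the arrows in the hierarchy map to dots in the Graphite path, which depends on how your metrics reporter is configured. The helper name node_metric_pattern and the sample metric name are made up for illustration only.

```python
from fnmatch import fnmatch

# Assumption: each arrow in the hierarchy becomes a dot in the Graphite path.
PREFIX = "user-streamapp.local.kafka.consumer.type_consumer-node-metrics"

# The * stands for the random suffix appended to the application.id.
CLIENT_ID_PATTERN = "client-id_user-profiles-v1-*"


def node_metric_pattern(node_id: int, metric: str = "incoming-byte-rate") -> str:
    """Build the wildcard path for one broker node (hypothetical helper)."""
    return f"{PREFIX}.{CLIENT_ID_PATTERN}-node-id_node-{node_id}.{metric}"


# One pattern per broker; 3 brokers are running in this setup.
patterns = [node_metric_pattern(node_id) for node_id in range(3)]

# A made-up concrete metric name, only to show how the wildcard matches.
sample = (
    PREFIX
    + ".client-id_user-profiles-v1-3f2a-StreamThread-1-consumer"
    + "-node-id_node-1.incoming-byte-rate"
)

for pattern in patterns:
    print(pattern, fnmatch(sample, pattern))
```

Only the pattern for node-1 matches the sample name, so each broker can be tracked separately even though the client id suffix changes on every restart.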
When a broker is up, the value of incoming-byte-rate is greater than 0.001; otherwise it is 0.000. So we have added alerts as follows:
If the condition is true, we get an alert. We used an OR condition here because we want an alert if any one of the brokers is down.
That's it. I hope it helps you in some way, either directly or indirectly.
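The same OR logic can be sketched in a few lines of Python. This is not the actual alert definition from my monitoring tool, just an illustration of the condition. It assumes a broker counts as down when its incoming-byte-rate is below 0.001, and the function names and sample values are made up.

```python
from typing import Dict, List

# Assumption: a rate below this value means the broker is considered down.
THRESHOLD = 0.001


def down_nodes(rates: Dict[str, float], threshold: float = THRESHOLD) -> List[str]:
    """Return the nodes whose incoming-byte-rate is below the threshold."""
    return sorted(node for node, rate in rates.items() if rate < threshold)


def should_alert(rates: Dict[str, float], threshold: float = THRESHOLD) -> bool:
    """OR condition: alert as soon as any single node looks down."""
    return len(down_nodes(rates, threshold)) > 0


# Made-up sample readings: node-1 has dropped to 0.000.
rates = {"node-0": 152.4, "node-1": 0.000, "node-2": 98.7}

print(should_alert(rates))  # True
print(down_nodes(rates))    # ['node-1']
```

Returning the list of affected nodes alongside the boolean makes it easy to say in the alert message which broker is the one that looks down.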
Stay tuned for more blogs !!!