Kafka Broker Down? No Worries! Get Alerts!


In the previous post, we monitored our Kafka metrics using Prometheus and visualized the health of Kafka in Grafana. Now we will set up an alert so that whenever any Kafka broker goes down, we receive a notification.

For Kafka, a single broker is just a cluster of size one, but we usually don’t run a single broker. If that one broker goes down, Kafka itself is down and we won’t be able to generate any metrics. So, let’s get started by:

Setting up a multi-broker cluster

First, we make a config file for each of the additional brokers:

> cd Downloads/kafka_2.12-2.2.0
> cp config/server.properties config/server-1.properties
> cp config/server.properties config/server-2.properties

Now edit these new files and set the following properties:

config/server-1.properties:
broker.id=1
listeners=PLAINTEXT://:9093
log.dirs=/tmp/kafka-logs-1
 
config/server-2.properties:
broker.id=2
listeners=PLAINTEXT://:9094
log.dirs=/tmp/kafka-logs-2

The broker.id property is the unique and permanent name of each node in the cluster. We have to override the port and log directory only because we are running these all on the same machine and we want to keep the brokers from all trying to register on the same port or overwrite each other’s data.
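As a rough illustration of the three per-broker overrides described above, here is a minimal Python sketch that derives them from a broker id and applies them to the text of a base properties file. The helper names `broker_overrides` and `apply_overrides` are hypothetical, and the port rule (9092 plus the broker id) is only an assumption that happens to match the values used in this post.

```python
from typing import Dict


def broker_overrides(broker_id: int, base_port: int = 9092) -> Dict[str, str]:
    """Return the properties that must differ for each broker on one machine.

    Assumption: port = base_port + broker_id, matching 9093 and 9094 above.
    """
    return {
        "broker.id": str(broker_id),
        "listeners": f"PLAINTEXT://:{base_port + broker_id}",
        "log.dirs": f"/tmp/kafka-logs-{broker_id}",
    }


def apply_overrides(base_text: str, overrides: Dict[str, str]) -> str:
    """Rewrite uncommented `key=value` lines and append keys that were not found."""
    remaining = dict(overrides)
    out = []
    for line in base_text.splitlines():
        stripped = line.strip()
        key = stripped.split("=", 1)[0].strip()
        if not stripped.startswith("#") and "=" in stripped and key in remaining:
            # Replace the existing value with the per-broker value.
            out.append(f"{key}={remaining.pop(key)}")
        else:
            # Keep comments, blank lines and unrelated settings untouched.
            out.append(line)
    # Keys that only existed as comments (or not at all) are appended at the end.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"
```

For example, `apply_overrides(open("config/server.properties").read(), broker_overrides(1))` would produce text suitable for config/server-1.properties, under the assumptions stated above.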

Now that we have set up our multi-broker cluster, let’s start generating metrics.

  • Start ZooKeeper.
bin/zookeeper-server-start.sh config/zookeeper.properties
  • Start the first Kafka broker with the JMX exporter running as a Java agent.
KAFKA_OPTS="$KAFKA_OPTS -javaagent:$PWD/jmx_prometheus_javaagent-0.6.jar=7071:$PWD/kafka-0-8-2.yml" \
./bin/kafka-server-start.sh config/server.properties &
  • Now, start the second Kafka broker with the JMX exporter running as a Java agent.
KAFKA_OPTS="$KAFKA_OPTS -javaagent:$PWD/jmx_prometheus_javaagent-0.6.jar=7072:$PWD/kafka-0-8-2.yml" \
./bin/kafka-server-start.sh config/server-1.properties &
  • Finally, start the third Kafka broker with the JMX exporter running as a Java agent.
KAFKA_OPTS="$KAFKA_OPTS -javaagent:$PWD/jmx_prometheus_javaagent-0.6.jar=7073:$PWD/kafka-0-8-2.yml" \
./bin/kafka-server-start.sh config/server-2.properties &

View Metrics

Visit http://localhost:7071/ to see the metrics generated for the first broker, http://localhost:7072/ for the second broker, and http://localhost:7073/ for the third broker.
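The pages above are plain text in the Prometheus exposition format, so they can also be checked from a script. The following is a minimal sketch, not part of the original setup: `fetch_exposition` and `metric_names` are hypothetical helper names, and the sketch only extracts metric names rather than parsing the full format.

```python
import re
from typing import Set
from urllib.request import urlopen

# Metric names in the Prometheus text format match this pattern.
_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def fetch_exposition(port: int, host: str = "localhost") -> str:
    """Download the metrics page served by one broker's JMX exporter."""
    with urlopen(f"http://{host}:{port}/", timeout=5) as response:
        return response.read().decode("utf-8")


def metric_names(exposition: str) -> Set[str]:
    """Collect the distinct metric names, skipping comments and blank lines."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # "# HELP" and "# TYPE" lines carry no sample
        match = _NAME.match(line)
        if match:
            names.add(match.group(0))
    return names
```

Calling `metric_names(fetch_exposition(7071))` for each of the three ports gives a quick way to confirm that every exporter is serving something before pointing Prometheus at it.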

Start Prometheus for monitoring Kafka metrics

  • Edit the prometheus.yml file as:
# One scrape job covering the JMX exporter of each Kafka broker.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'kafka'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:7071','localhost:7072','localhost:7073']
  • Start Prometheus:
cd prometheus-*
./prometheus
  • Provide the query in the Expression field:
up{job="kafka"}
  • Click on Execute.

You will see all the active brokers. You can also find the same information in the Status menu under Targets, which lists each broker as up or down.
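The same `up{job="kafka"}` query can be run outside the browser through Prometheus's HTTP API (`/api/v1/query`). The sketch below assumes Prometheus is listening on its default address, http://localhost:9090, which the post does not state explicitly. The function names are hypothetical.

```python
import json
from typing import Dict
from urllib.parse import urlencode
from urllib.request import urlopen


def query_url(expr: str, base: str = "http://localhost:9090") -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base}/api/v1/query?{urlencode({'query': expr})}"


def broker_status(payload: dict) -> Dict[str, bool]:
    """Map each scraped instance to True (up) or False (down)."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload}")
    return {
        # Each sample value is a [timestamp, "value"] pair; up is "1" or "0".
        series["metric"]["instance"]: float(series["value"][1]) == 1.0
        for series in payload["data"]["result"]
    }


def fetch_broker_status(base: str = "http://localhost:9090") -> Dict[str, bool]:
    """Run up{job="kafka"} against a live Prometheus (assumed address)."""
    with urlopen(query_url('up{job="kafka"}', base), timeout=5) as response:
        return broker_status(json.load(response))
```

With all three brokers running, `fetch_broker_status()` should report every target from prometheus.yml as up.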

Setting up Grafana

  • Start Grafana
./bin/grafana-server web
  • By default, Grafana listens on http://localhost:3000. The default login is “admin” / “admin”.

In the previous post, I explained how to add Prometheus as a data source. If you haven’t set it up yet, please refer to: Monitoring Kafka with Prometheus and Grafana.

Now we need to configure how Grafana sends alerts. For this post, I’ll be sending alert notifications by email, so we will set the SMTP configuration.

cd Downloads/grafana-6.1.4/conf
gedit default.ini

In the [smtp] section, make the following changes:

####################### SMTP / Emailing #####################
[smtp]
enabled = true
host = smtp.gmail.com:465
user = <your_email_id>
# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
password = <your_password>
cert_file =
key_file =
skip_verify = true
from_address = <your_email_id>
from_name = Grafana
ehlo_identity =

Make the same changes in the sample.ini file.
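Before relying on Grafana to deliver mail, it can help to confirm that the same SMTP credentials work on their own. Below is a minimal sketch using Python's standard `smtplib` with the host and port from the [smtp] section above. The function names and the message text are hypothetical, and nothing here is part of Grafana itself.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Create a small plain-text message for checking the SMTP settings."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "SMTP settings check for Grafana alerts"
    msg.set_content("If you can read this, the SMTP credentials work.")
    return msg


def send_over_ssl(msg: EmailMessage, user: str, password: str,
                  host: str = "smtp.gmail.com", port: int = 465) -> None:
    """Send the message over an SSL connection, as port 465 expects."""
    with smtplib.SMTP_SSL(host, port, timeout=10) as server:
        server.login(user, password)
        server.send_message(msg)
```

A call such as `send_over_ssl(build_test_message(user, user), user, password)` sends a test mail to yourself. If it fails here, it will most likely fail from Grafana as well.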

In Grafana, create a new dashboard and select Prometheus as the data source under Queries.
  • Provide the alert rule name and the evaluation interval, i.e. how often Grafana should check the status of the Kafka brokers.
  • Define the condition. As I am running 3 brokers, I set the condition to:
WHEN sum() OF query(A, 5m, now) IS BELOW 3

This means that if the sum of my active brokers is less than 3, i.e. even if a single broker is down, Grafana sends me an alert.

  • Configure the error handling, set the state to “Alerting”, and set up the notification.
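The intent of the condition above (alert when fewer than 3 brokers report as up) can be written out as a small Python sketch. This only models the stated intent. It does not reproduce how Grafana reduces the 5-minute query window internally, and the names `EXPECTED_BROKERS` and `evaluate` are hypothetical.

```python
from typing import Dict, List, Tuple

EXPECTED_BROKERS = 3  # assumption: the three brokers started earlier in this post


def evaluate(status: Dict[str, int],
             expected: int = EXPECTED_BROKERS) -> Tuple[bool, List[str]]:
    """Return (should_alert, brokers_reported_down).

    `status` maps each instance to its latest `up` value (1 or 0).
    A broker missing from the map also lowers the sum, so it triggers the alert too.
    """
    brokers_up = sum(status.values())
    down = sorted(name for name, up in status.items() if up == 0)
    return brokers_up < expected, down
```

For instance, `evaluate({"localhost:7071": 1, "localhost:7072": 0, "localhost:7073": 1})` signals an alert and names localhost:7072 as the broker that is down.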

Now, in the Alerting menu, configure the notification channel. Choose the type of notification, i.e. the medium through which the alert is delivered. I am using Email here. Then provide the recipients of the alerts and any other necessary settings.

Now, let’s bring one of our brokers down to check the alerting. Either stop the broker process or kill the process that holds one of the broker’s ports. For example:

fuser -k 7072/tcp

We can see the status of the down broker in Prometheus.

The alert email now arrives in the inbox of each address provided in the notification channel.

Use Case

Suppose one of our brokers goes down because of a bug. During that time, our producer is not able to produce messages (at least to some partitions). If the offline broker was a leader, a new leader is elected from the in-sync replicas.

What happens when a broker is down depends on your configuration. If you’re using the synchronous producer (where the ordering of messages is important), you need to implement your own retries, because the synchronous producer doesn’t handle the scenario where a broker is down.

When a send fails, the producer flags its cluster info as stale, and the next attempt re-fetches it (including the new leadership info). But you may need to back off a bit and allow the cluster to recover first. So, rescue the exception, sleep a bit, then try again.
The async producer does this automatically. Either way, getting an alert on time is beneficial.
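The "rescue the exception, sleep a bit, then try again" advice can be sketched as a small retry helper with exponential backoff. This is a generic illustration, not the API of any particular Kafka client: `send` stands for whatever call your synchronous producer exposes, and `send_with_retry` and its parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def send_with_retry(send: Callable[[], T],
                    attempts: int = 5,
                    first_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `send`, backing off between failures so the cluster can recover."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except Exception:
            if attempt == attempts:
                raise  # out of retries: surface the last error to the caller
            sleep(delay)  # give the cluster time to elect a new leader
            delay *= 2    # wait twice as long before the next attempt
    raise AssertionError("unreachable")
```

Wrapping the producer call as `send_with_retry(lambda: producer_send(message))`, where `producer_send` is your own function, keeps the retry policy in one place. In real code you would likely catch only the client's connection and leadership errors rather than every exception.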

Conclusion

This setup collects metrics from every broker, helps you catch problems early, and alerts you when a broker goes down.

References

Kafka Quickstart
Grafana with Prometheus
