What is Apache Kafka
Apache Kafka is an open-source distributed event streaming platform. It is mostly used in real-time streaming data architectures, for example to feed real-time analytics. It is fault-tolerant, high-throughput, and horizontally scalable, and it supports geographically distributed data streams and stream processing applications.
Basic Components of Kafka
A producer is an entity/application that publishes data to a Kafka cluster.
A broker is responsible for receiving and storing the data when a producer publishes.
A consumer then consumes data from a broker at a specified offset, i.e. position.
A message contains the data and also the metadata. The metadata contains information such as the offset, timestamp, compression type, etc.
These messages are organized into logical groupings or categories called topics, to which producers publish data.
A producer can publish to multiple topics. You define the topics and choose which topics each producer publishes to.
A topic is divided into partitions, each of which contains a subset of the topic's messages. A broker can host multiple partitions.
The offset tracks the sequential order in which messages are appended to a partition of a topic. A consumer's offset can be reset in order to read the data from a different position.
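The relationship between topics, partitions, and offsets can be sketched with a toy in-memory model. This is only an illustration, not Kafka's API: the `ToyTopic` and `Message` names are made up, the CRC32-based partition choice stands in for Kafka's real partitioner, and sending keyless messages to partition 0 is a simplification.

```python
from dataclasses import dataclass, field
from typing import List, Optional
import time
import zlib


@dataclass
class Message:
    offset: int            # position of the message within its partition
    timestamp: float       # part of the metadata stored with the data
    key: Optional[str]
    value: str


@dataclass
class ToyTopic:
    name: str
    num_partitions: int = 3
    partitions: List[List[Message]] = field(init=False)

    def __post_init__(self):
        # One append-only list per partition.
        self.partitions = [[] for _ in range(self.num_partitions)]

    def publish(self, value, key=None):
        """Append a message and return (partition, offset)."""
        if key is None:
            partition = 0  # simplification for this sketch only
        else:
            # Assumption: CRC32 stands in for Kafka's actual hash function.
            partition = zlib.crc32(key.encode()) % self.num_partitions
        log = self.partitions[partition]
        message = Message(len(log), time.time(), key, value)
        log.append(message)
        return partition, message.offset

    def read(self, partition, offset):
        """Return every message in the partition from `offset` onward."""
        return self.partitions[partition][offset:]


topic = ToyTopic("first-events", num_partitions=2)
partition, _ = topic.publish("signup", key="user-1")
topic.publish("login", key="user-1")
print([m.value for m in topic.read(partition, 1)])
```

Messages with the same key land in the same partition, and offsets count up independently inside each partition, which is why a consumer always reads at a partition and an offset together.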
Basic Kafka CLI commands
1. First, download the latest Kafka release and extract it.
$tar -xzf kafka_2.13-3.3.1.tgz
$cd kafka_2.13-3.3.1
2. Start the Kafka Environment.
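To run Kafka locally, first start the ZooKeeper service by using the following command.
$bin/zookeeper-server-start.sh config/zookeeper.properties
Then open another terminal and start the Kafka broker service.
$bin/kafka-server-start.sh config/server.properties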
Once all services have successfully launched, you will have a basic Kafka environment running and ready to use.
3. Command to create a topic for the producers to publish to.
$bin/kafka-topics.sh --create --topic first-events --bootstrap-server localhost:9092
This will create a topic called first-events.
4. Command to start a console producer and write some data into the topic. Each line you type is published as a separate message.
$bin/kafka-console-producer.sh --topic first-events --bootstrap-server localhost:9092
5. Open another terminal and start a consumer to read the messages that we have written into our topic with the kafka-console-producer.sh script.
$bin/kafka-console-consumer.sh --topic first-events --from-beginning --bootstrap-server localhost:9092
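6. Command to read messages from a specific partition, starting at a given offset. The --offset option is a starting position, not a message count, and it must be used together with --partition.
$bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic first-events --partition 0 --offset 3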
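Because `--offset` is a starting position rather than a count, reading only the last N messages means working out the start offset first. The helper below is a small sketch of that arithmetic. The function name is made up, and it assumes you already know the partition's log-end offset (the offset the next message will receive).

```python
def start_offset_for_last_n(log_end_offset, n, log_start_offset=0):
    """Return the offset to start from so that at most the last n messages are read.

    log_end_offset is one past the last stored message; log_start_offset is the
    earliest offset still available in the partition.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    # Never go below the earliest offset that still exists.
    return max(log_start_offset, log_end_offset - n)


# A partition holding offsets 0..9 has a log-end offset of 10.
print(start_offset_for_last_n(10, 3))
```

The returned value is what would be passed to `--offset` for that partition.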
7. Command to use the --group option to define a consumer group. A group basically denotes a particular application or service. A consumer group can consume data from multiple topics.
$bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic first-events --group First-group
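When several consumers share one group, each partition is read by only one member of the group at a time. The sketch below shows one simple way partitions could be shared out, using round-robin. It is an illustration only: the function name is hypothetical, and Kafka's built-in assignment strategies are more involved than this.

```python
def assign_round_robin(partitions, consumers):
    """Spread partitions across the consumers of one group, one at a time."""
    assignment = {consumer: [] for consumer in consumers}
    if not consumers:
        return assignment
    ordered = sorted(consumers)
    for index, partition in enumerate(sorted(partitions)):
        # Each partition goes to exactly one consumer in the group.
        assignment[ordered[index % len(ordered)]].append(partition)
    return assignment


print(assign_round_robin([0, 1, 2], ["consumer-a", "consumer-b"]))
```

If a group has more consumers than there are partitions, the extra consumers receive nothing and sit idle.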
8. Command to describe the consumer group. It lists the topics and partitions the group is consuming, along with the current offset, log-end offset, lag, and active consumer ID for each partition. Note that this uses the kafka-consumer-groups.sh script, not the console consumer.
$bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group First-group
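The lag reported for each partition is the gap between the newest offset in the partition and the offset the group has reached. A minimal sketch of that calculation follows. The function names and the sample numbers are invented for illustration and are not real command output.

```python
def partition_lag(current_offset, log_end_offset):
    """Messages the group still has to read in one partition."""
    return max(0, log_end_offset - current_offset)


def total_lag(rows):
    """Sum the lag over (current_offset, log_end_offset) pairs, one per partition."""
    return sum(partition_lag(current, end) for current, end in rows)


# Hypothetical offsets for a topic with two partitions.
rows = [(40, 100), (75, 75)]
print(total_lag(rows))
```

A lag of zero on every partition means the group has caught up with the producers.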
9. Here is a command to consume messages together with their keys. When the producer attaches keys to the data it writes, we need to set the print.key and key.separator properties when consuming the messages. If the producer writes the data without attaching a key, the key is null.
$bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic first-events --from-beginning --property print.key=true --property key.separator=,
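With keys printed, each output line has the form key, separator, value, so a script reading that output has to split it again. The sketch below shows one way to do this. The function name is made up, and it assumes that a missing key is printed as the text `null` and that keys never contain the separator themselves.

```python
def parse_keyed_line(line, separator=","):
    """Split one printed line into (key, value); a missing key becomes None."""
    key, found, value = line.partition(separator)
    if not found:
        raise ValueError("separator not found in line: " + repr(line))
    # Assumption: a message without a key is printed with the key text "null".
    return (None if key == "null" else key), value


print(parse_keyed_line("user-1,hello, world"))
print(parse_keyed_line("null,no key here"))
```

Splitting only on the first separator keeps commas inside the value intact. A key whose text really is "null" cannot be told apart from a missing key, so a less common separator or a different output format is safer for real data.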
10. Command to reset the offsets. Resetting the offset value means defining the point from which the consumer group reads the messages again. Here, --shift-by 100 moves the group's current offset forward by 100 messages, and a negative value moves it backward. The reset applies to only one consumer group at a time, and the group must have no active instances.
$bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group First-group --topic first-events --reset-offsets --shift-by 100 --execute
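The effect of `--shift-by` on a single partition can be sketched as simple arithmetic. The function name is hypothetical, and the clamping to the partition's earliest and latest offsets is an assumption about how out-of-range targets are handled, so check the tool's output before relying on it.

```python
def shifted_offset(current_offset, shift_by, log_start_offset, log_end_offset):
    """Offset a group would land on after shifting, kept inside the partition's range."""
    target = current_offset + shift_by
    # Assumption: targets outside the available range are pulled back to the nearest bound.
    return min(max(target, log_start_offset), log_end_offset)


print(shifted_offset(50, 10, 0, 120))
print(shifted_offset(50, 100, 0, 120))
print(shifted_offset(50, -100, 0, 120))
```

Shifting forward skips messages, while shifting backward causes messages to be read again.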
This post is aimed at those who have just started learning Kafka. It covers the basic terms of Kafka along with some of the most important basic Kafka CLI commands.