What is Apache Kafka?
Apache Kafka is a distributed commit log for fast, fault-tolerant communication between producers and consumers using message-based topics.
Kafka provides the messaging backbone for building a new generation of distributed applications capable of handling billions of events and millions of transactions.
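To make the "commit log" idea concrete, here is a minimal in-memory sketch. It is not Kafka's implementation or client API; `ToyCommitLog` and its methods are hypothetical names chosen for illustration. It only shows the core idea: each topic is an append-only sequence, every message gets a position (offset), and reading does not remove anything.

```python
from collections import defaultdict


class ToyCommitLog:
    """Hypothetical in-memory stand-in for a commit log: one append-only list per topic."""

    def __init__(self):
        self._topics = defaultdict(list)

    def append(self, topic, message):
        """Append a message to the end of a topic and return its offset."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def read(self, topic, offset, max_records=10):
        """Return up to max_records messages starting at offset; nothing is removed."""
        return self._topics[topic][offset:offset + max_records]


log = ToyCommitLog()
first = log.append("orders", "order-1 created")
second = log.append("orders", "order-2 created")

# Two independent readers see the same records, because reading does not delete.
reader_a = log.read("orders", 0)
reader_b = log.read("orders", 0)
```

A real Kafka topic is additionally split into partitions and replicated across brokers, which this sketch leaves out.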
Why would you use Kafka?
- Apache Kafka is capable of handling millions of messages per second.
- Kafka is used to build real-time streaming data pipelines and real-time streaming applications.
- Kafka is also often used as a message broker, a platform that processes and mediates communication between two applications.
- It is a fast, scalable, durable, and fault-tolerant publish-subscribe messaging system.
- Apache Kafka delivers extremely high performance.
How does Kafka work?
Kafka combines two messaging models, queuing and publish-subscribe.
In the queuing model, a pool of consumers may read from a server, and each record goes to exactly one of them. The strength of this model is that it lets us divide the processing of data across multiple consumer instances, which helps us scale our processing. Its weakness is that it is not multi-subscriber: as soon as one process reads a record, it is gone.
The publish-subscribe model is multi-subscriber, but because every message goes to every subscriber, it cannot be used to distribute work across multiple worker processes.
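Kafka combines the two models through consumer groups: every group receives every record (publish-subscribe), while inside a group each record is handled by only one consumer (queuing). The sketch below is a toy model of that behaviour, not Kafka code. The function `deliver`, the group names, and the modulo-based partition assignment are all assumptions made for illustration; real Kafka uses its own assignment strategies.

```python
from collections import defaultdict


def deliver(records, groups):
    """Deliver records to consumer groups (toy model).

    records: list of (partition, value) tuples.
    groups:  dict mapping group name -> list of consumer names.
    Returns a dict mapping (group, consumer) -> list of received values.
    """
    received = defaultdict(list)
    for group, consumers in groups.items():
        for partition, value in records:
            # Within a group, each partition is owned by exactly one consumer
            # (queue behaviour). Modulo assignment is a simplification.
            owner = consumers[partition % len(consumers)]
            received[(group, owner)].append(value)
    return dict(received)


records = [(0, "a"), (1, "b"), (0, "c"), (1, "d")]
groups = {
    "billing": ["billing-1", "billing-2"],  # two workers share the load
    "audit": ["audit-1"],                   # a second, independent subscriber
}
result = deliver(records, groups)
```

Here the two hypothetical `billing` workers split the records between them, while the single `audit` consumer still sees all four, so work is distributed and the data remains available to more than one subscriber.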
Kafka Architectural Components
Kafka components include brokers, ZooKeeper, producers, and consumers.
Let’s see all of them one by one.
- A Kafka cluster typically consists of multiple brokers to balance load.
- Kafka brokers do not maintain cluster state themselves; in ZooKeeper-based deployments they rely on ZooKeeper to maintain it.
- One Kafka broker instance can handle hundreds of thousands of reads and writes per second, and each broker can handle terabytes of messages without performance impact.
- Broker leader election is coordinated through ZooKeeper. If a broker fails, the election decides which broker becomes the leader and which brokers act as followers.
- ZooKeeper plays an important role in a Kafka system: it is used to manage and coordinate the brokers.
- Its service is mainly used to notify producers and consumers when a new broker joins the Kafka system or an existing broker fails.
- Based on the notification from ZooKeeper about the presence or failure of a broker, producers and consumers decide how to proceed and start coordinating their work with another broker.
- Producers push data to brokers.
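- When a new broker starts, producers discover it automatically and begin sending messages to it.
- Depending on its acks setting, a Kafka producer may not wait for acknowledgments from the broker: with acks=0 it sends messages as fast as the broker can handle, while acks=1 and acks=all make it wait for the partition leader or for all in-sync replicas, respectively.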
- For more details about Kafka producer internals, check out the link.
- Because brokers do not track what each consumer has read, the consumer has to keep track of how many messages it has consumed by using the partition offset.
- If the consumer acknowledges a particular message offset, it implies that the consumer has consumed all prior messages.
- The consumer issues an asynchronous pull request to the broker to have a buffer of bytes ready to consume.
- Consumers can rewind or skip to any point in a partition simply by supplying an offset value. In older Kafka versions the consumer offset is stored in ZooKeeper; newer clients commit it to an internal Kafka topic instead.
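The consumer-side offset handling described in the last few points can be sketched with a toy model. This is not the Kafka consumer API; `ToyPartition`, `ToyConsumer`, and their methods are hypothetical names used only to illustrate pulling batches, acknowledging with a single offset, and rewinding or skipping by supplying an offset.

```python
class ToyPartition:
    """Append-only list of records; the index of a record is its offset."""

    def __init__(self, records):
        self.records = list(records)


class ToyConsumer:
    """Tracks its own read position and committed offset for one partition."""

    def __init__(self, partition):
        self.partition = partition
        self.position = 0   # next offset to read
        self.committed = 0  # every offset below this counts as consumed

    def poll(self, max_records=2):
        """Pull the next batch of records and advance the read position."""
        start = self.position
        batch = self.partition.records[start:start + max_records]
        self.position += len(batch)
        return batch

    def commit(self):
        """Acknowledge everything read so far with a single offset."""
        self.committed = self.position

    def seek(self, offset):
        """Rewind or skip ahead by choosing the next offset to read."""
        if not 0 <= offset <= len(self.partition.records):
            raise ValueError("offset out of range")
        self.position = offset


partition = ToyPartition(["m0", "m1", "m2", "m3", "m4"])
consumer = ToyConsumer(partition)

first_batch = consumer.poll()  # reads offsets 0 and 1
consumer.commit()              # one committed offset covers both records
consumer.seek(0)               # rewind to the beginning
replayed = consumer.poll()     # the same two records again
consumer.seek(4)               # skip ahead to the last record
tail = consumer.poll()
```

Because the position is kept by the consumer in this sketch, replaying or skipping is just a matter of changing one number; the records in the partition are never modified.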
Where Apache Kafka fits in
To get started with basic hands-on work in Apache Kafka, refer to the Example link.
This basic information is enough to get started with Apache Kafka and hands-on practice.
Our next blog will be dedicated to Kafka operations, such as adding and deleting Kafka topics, modifying Kafka topics, Distinguishing the Turnoff, and many more.