Hello everyone. There are plenty of blogs that cover the basics of Kafka, so rather than explaining its basic concepts and architecture, here we will look into the Kafka producer internals.
We will see what happens internally when a producer sends a message to a topic, and also what happens when a consumer consumes messages.
Let’s suppose a producer wants to send 10 messages, as shown in the diagram below, to the topic “mytopic”, and mytopic has 3 partitions. The producer application has no idea exactly how the messages are going to be stored inside “mytopic”. So what happens internally?
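By default, when no key is set, messages are spread across the partitions in round-robin fashion. (Kafka clients from version 2.4 onward use a “sticky” variant that fills a batch for one partition before moving to the next; the simple round-robin model is used here because it is easier to follow.)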
Let’s suppose the producer sends ‘a’ first: ‘a’ is stored in partition ‘P0’ at index 0. Then the producer sends ‘b’. Now ‘b’ is not stored in ‘P0’; it is stored in partition ‘P1’ at index 0. Then ‘c’ goes to ‘P2’ at index 0. The cycle continues in round-robin order, i.e. ‘d’ in ‘P0’ at index 1, ‘e’ in ‘P1’ at index 1, ‘f’ in ‘P2’ at index 1, and so on, as shown below.
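The walk-through above can be sketched in a few lines of Python. This is only a toy model of the round-robin idea, not the Kafka client; the function name `assign_round_robin` and the in-memory lists are assumptions made for illustration.

```python
from collections import defaultdict


def assign_round_robin(messages, num_partitions):
    """Place each message in the next partition in turn and record its offset."""
    partitions = defaultdict(list)  # partition id -> messages stored so far
    placements = []                 # (message, partition, offset) in send order
    for i, message in enumerate(messages):
        partition = i % num_partitions
        offset = len(partitions[partition])  # next free index in that partition
        partitions[partition].append(message)
        placements.append((message, partition, offset))
    return placements


placements = assign_round_robin("abcdefghij", 3)
for message, partition, offset in placements:
    print(f"{message} -> P{partition} @ offset {offset}")
```

With 10 messages and 3 partitions, P0 ends up with one more message than P1 and P2, because 10 is not a multiple of 3.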
OK. So now you can see in the diagrams that each partition has its own indexing, i.e. 0, 1, 2, and so on. What is the significance of these indexes? They are called offsets.
We can define it as “The records in the partitions are each assigned a sequential id number called offset that uniquely identifies each record within the partition”.
Whenever a message is stored in a topic, it gets a sequence id. This offset is generally used to identify the location of a message. We can think of offsets as array indexes, but they provide more functionality: consumers also use them to keep track of how far they have read.
Note: The most important thing to remember is that an offset is only meaningful at the partition level, not at the topic level. You will never see a statement like “this message is stored at this offset of the topic”.
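To make the note concrete, here is a small sketch in which a record can only be found by giving both a partition and an offset. The dictionary `mytopic` and the helper `read` are hypothetical stand-ins for the stored data, assuming the ten messages a to j were distributed round-robin as described earlier.

```python
# Hypothetical in-memory picture of "mytopic" after the ten sends.
mytopic = {
    0: ["a", "d", "g", "j"],
    1: ["b", "e", "h"],
    2: ["c", "f", "i"],
}


def read(topic, partition, offset):
    """A record is addressed by (partition, offset), never by offset alone."""
    return topic[partition][offset]


# Offset 0 exists in every partition and names a different record in each.
records_at_offset_0 = [read(mytopic, p, 0) for p in sorted(mytopic)]
print(records_at_offset_0)  # ['a', 'b', 'c']
```

Asking for "offset 0 of mytopic" has no single answer here, which is exactly why offsets only make sense per partition.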
Three variations of offset:
- Log-end offset: Offset of the last message written to a log/partition.
The log-end offset tells us how many messages are present in a partition. Consider the above diagram, i.e. 1.0.2. You can see that in partition P0 the log-end offset is 3, which means the number of messages is 4, i.e. “log-end offset + 1”, since indexing starts at 0. (Kafka's own tooling generally reports the end offset as the position where the next record will be written, which is one higher than the definition used here.) Therefore the log-end offsets for the partitions are:
P0 = 3
P1 = 2
P2 = 2
- Current offset: Pointer to the last record that Kafka has already sent to a consumer in the most recent poll.
- Committed offset: Marking an offset as consumed is called committing an offset; the committed offset is the last offset marked this way.
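The three offsets above can be seen side by side in a toy model of one consumer reading partition P0. This is a sketch that follows the definitions used in this post; the class `PartitionConsumer` and its methods are invented for illustration and are not the Kafka client API, whose conventions may differ by one.

```python
class PartitionConsumer:
    """Toy model of one consumer reading one partition (not the Kafka client API)."""

    def __init__(self, log):
        self.log = log
        self.current_offset = -1    # last record handed out by poll(); -1 = none yet
        self.committed_offset = -1  # last record marked as consumed; -1 = none yet

    @property
    def log_end_offset(self):
        # Definition used in this post: offset of the last record written.
        return len(self.log) - 1

    def poll(self, max_records):
        """Hand out the next batch and advance the current offset."""
        start = self.current_offset + 1
        batch = self.log[start:start + max_records]
        self.current_offset += len(batch)
        return batch

    def commit(self):
        """Mark everything handed out so far as consumed."""
        self.committed_offset = self.current_offset


p0 = PartitionConsumer(["a", "d", "g", "j"])
first_batch = p0.poll(max_records=2)   # hands out 'a' and 'd'
p0.commit()                            # marks offsets 0 and 1 as consumed
second_batch = p0.poll(max_records=2)  # hands out 'g' and 'j', not yet committed
print(p0.log_end_offset, p0.current_offset, p0.committed_offset)  # 3 3 1
```

If the consumer restarted at this point, it would resume after the committed offset, so ‘g’ and ‘j’ would be delivered again.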
When a producer sends a message, it sends it in the form of a key and a payload. The key is mainly used when you want to direct messages to a particular partition. So far we have been sending messages at the topic level, without a key.
The default value for the key is null. When the key is null, messages are stored in round-robin manner, as shown in the diagrams above.
When the key has a particular value (it could be anything, such as a letter or a number), the producer hashes the key to choose a partition, so all messages with the same key are sent to the same partition.
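The key rule can be sketched as follows. This is a minimal illustration, not the real partitioner: the function `partition_for`, the sample keys, and the use of `zlib.crc32` are all assumptions (the Kafka Java client hashes the key bytes with murmur2, so actual partition numbers will differ).

```python
import zlib


def partition_for(key, num_partitions, counter):
    """Pick a partition: round-robin for a null key, hash-based otherwise."""
    if key is None:
        return counter % num_partitions
    # crc32 is a stand-in hash; the Kafka Java client uses murmur2 on the key bytes.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


sends = [("user-1", "login"), ("user-2", "login"), ("user-1", "logout"), (None, "ping")]
chosen = [partition_for(key, 3, i) for i, (key, _) in enumerate(sends)]
print(chosen)
```

Both "user-1" messages land in the same partition, which is what preserves their relative order, while the keyless "ping" falls back to the round-robin counter.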
So, this was all about the Kafka Producer Internals.
If you are a beginner and want an introduction to Kafka, refer to this