Let us discuss Kafka Connect and some of its fundamentals. Before we start, we need basic knowledge of Kafka, or we can go through this document.
We’ll have a look at:
- What is Kafka Connect
- Why we need Kafka Connect/connectors
- How Kafka Connect works
- Source Connector
- Sink Connector
- The Kafka Connect framework
- How to implement a connector with the Kafka Connect framework
1- What is Kafka Connect
Kafka Connect is the ecosystem of connectors for moving data into or out of Kafka. There are lots of existing connectors, e.g. for databases, key-value stores, and file systems. So you can read data from an RDBMS and push it to Elasticsearch or to flat files.
2- Why we need Kafka Connect/connector
As we know, Kafka was initially developed to solve data integration problems. You may have many independent, running applications: some of them are custom-designed and developed in house, while others may have been purchased from a third-party application vendor. However, they all need some additional data which is created and owned by other systems.
So, as we can see in the diagram, the main problem is data integration.
Let’s see how we can solve this problem using Kafka Connect.
When you reach this stage, the pipelines become almost impossible to maintain. Here Kafka Connect comes in to simplify the pipeline; let’s see how.
Let’s look at a simplified version of the problem. We have an application; it could be an invoicing application or anything else, but we will assume the app has a back-end database where it maintains all the data it generates. Now we have a requirement to bring some of that data into a Snowflake data warehouse.
We decided to use Kafka as a broker because Kafka keeps the data flow simple: bringing data from your invoicing system to the Kafka broker is a one-time activity. Once the data is in the Kafka cluster, you can bring it to Snowflake, and if you need to move the same data to other applications, they can also consume it from the broker, so every pipeline is a one-to-one link. If Kafka Connect were not there, you would have to build many separate pipelines, and the workload would still not be manageable.
That’s what Kafka Connect is designed for. Kafka Connect is a system which you can place between your data source and the Kafka cluster. After that, you just need to do the configuration, and that’s it.
3- How Kafka Connect works
So, as we already discussed, you just place Kafka Connect between your source or target systems and the Kafka cluster, and then configure it. Kafka Connect takes care of copying data from one to the other; you don’t have to write a single line of code.
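For example, with the FileStreamSource connector that ships with Kafka, moving lines of a file into a topic is purely a matter of configuration. A minimal sketch (the file path and topic name below are example values):

```properties
# connect-file-source.properties -- a minimal sketch; file and topic are example values
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/tmp/invoices.txt
topic=invoices
```

You would then start a standalone worker with something like `bin/connect-standalone.sh config/connect-standalone.properties connect-file-source.properties`; no producer code is written anywhere.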
Here, as we can see in the diagram, there are two major components of Kafka Connect:
- Source Connector
- Sink Connector
4- Source Connector
We use a source connector to pull data from a source system and send it to the Kafka cluster. A source connector internally uses the Kafka producer API.
5- Sink Connector
We use a sink connector to consume data from a Kafka topic and sink it to an external system. A sink connector internally uses the Kafka consumer API.
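The sink side is configured the same way. As a sketch, the FileStreamSink connector bundled with Kafka can drain a topic into a file (the topic and file path below are example values):

```properties
# connect-file-sink.properties -- a minimal sketch; topics and file are example values
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
topics=invoices
file=/tmp/invoices-out.txt
```

Note the plural `topics` on the sink side: a sink connector can consume from several topics at once.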
6- The Kafka Connect framework
The Kafka developers made a smart decision and built a brand-new framework for implementing connectors, and they named it the Kafka Connect framework. The framework is open source, and it allows you to write your own connectors.
7- How to implement a connector with the Kafka Connect framework
A connector is implemented in one of two ways:
1- Source Connector
- SourceConnector
- SourceTask
2- Sink Connector
- SinkConnector
- SinkTask
The Kafka Connect framework takes care of all the heavy lifting: scalability, fault tolerance, error handling, and a bunch of other things.
As a connector developer, all you need to do is implement two Java classes: the first one is the connector class (SourceConnector or SinkConnector) and the second one is the task class (SourceTask or SinkTask). There are many more details to it, but at a high level that is all we do to create a Kafka connector.
SourceConnector.java
package org.apache.kafka.connect.source;
import org.apache.kafka.connect.connector.Connector;
public abstract class SourceConnector extends Connector {
    @Override
    protected SourceConnectorContext context() {
        return (SourceConnectorContext) context;
    }
}
SourceTask.java (excerpt)
// Hook invoked after a record has been sent; a no-op by default.
public void commitRecord(SourceRecord record) throws InterruptedException {
}

public void commitRecord(SourceRecord record, RecordMetadata metadata)
        throws InterruptedException {
    commitRecord(record);
}
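To make the connector/task split concrete, here is a self-contained sketch of the pattern. Note that `MiniTask`, `MiniSourceConnector`, and the classes below are hypothetical stand-ins invented for illustration, not the real `org.apache.kafka.connect` API: the connector’s job is to accept configuration and split the work into per-task configs, while each task does the actual record fetching.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a task: the unit that actually copies records.
interface MiniTask {
    List<String> poll();
}

// Hypothetical stand-in for a connector: accepts config, splits it into task configs.
abstract class MiniSourceConnector {
    abstract void start(Map<String, String> props);
    abstract List<Map<String, String>> taskConfigs(int maxTasks);
}

class NumbersConnector extends MiniSourceConnector {
    private Map<String, String> props;

    @Override
    void start(Map<String, String> props) {
        this.props = props; // remember the user-supplied configuration
    }

    @Override
    List<Map<String, String>> taskConfigs(int maxTasks) {
        return List.of(this.props); // a single task is enough here
    }
}

class NumbersTask implements MiniTask {
    @Override
    public List<String> poll() {
        return List.of("record-1", "record-2"); // pretend these came from a source system
    }
}

public class ConnectSketch {
    public static void main(String[] args) {
        MiniSourceConnector connector = new NumbersConnector();
        connector.start(Map.of("topic", "numbers"));
        System.out.println(connector.taskConfigs(1)); // the framework would hand this to a task
        System.out.println(new NumbersTask().poll()); // the task's poll loop produces records
    }
}
```

In the real framework, the worker runtime plays the role of `main` here: it calls `taskConfigs`, instantiates the tasks, and drives their poll loop for you.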
In this blog, we had a look at the basics of Kafka Connect, and we looked at the two types of connectors, source and sink. Finally, we learned some key points about the Kafka Connect framework.
That’s it, folks. I hope you liked the blog. Thanks!
Reference
https://docs.confluent.io/platform/current/connect/index.html
To read more tech blogs, visit Knoldus Blogs.