What is Kafka Connect?
Kafka Connect is an open-source component of Apache Kafka and a framework for connecting Kafka with external systems such as databases. Connectors move large data sets into and out of Kafka. Its scope is deliberately narrow: it focuses only on copying streaming data between Kafka and other systems. It can run as a standalone process for development and testing, or as a distributed, scalable service supporting an entire organization.
Ready-made connector implementations exist for moving data between Kafka and many common systems. There are two types of Kafka connectors:
1. Source Connector: A source connector ingests data from an external system into Kafka topics. For example, it can ingest entire databases and stream table updates to topics, or collect metrics from all of a user’s application servers into topics. This makes the data available for low-latency stream processing.
2. Sink Connector: A sink connector delivers data from Kafka topics into other systems, such as secondary indexes or batch systems like Hadoop for offline analysis.
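To make the two connector types concrete, here is a minimal sketch in Python that builds one configuration of each kind as a plain dictionary and prints it as JSON. It assumes the FileStreamSource and FileStreamSink example connectors that ship with Apache Kafka; the connector names, file paths, and topic name are placeholders chosen for illustration, and `connector_kind` is a hypothetical helper, not part of Kafka Connect.

```python
import json

# Source connector: reads lines from a file and publishes them to one topic.
# The name, file path and topic are illustrative placeholders.
source_config = {
    "name": "demo-file-source",
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp/input.txt",
    "topic": "demo-topic",
}

# Sink connector: reads records from one or more topics and appends them to a file.
# Note that sink connectors use the plural key "topics".
sink_config = {
    "name": "demo-file-sink",
    "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
    "tasks.max": "1",
    "file": "/tmp/output.txt",
    "topics": "demo-topic",
}


def connector_kind(config):
    """Classify a config by its class name (a naming heuristic for this sketch only)."""
    class_name = config["connector.class"]
    if class_name.endswith("SourceConnector"):
        return "source"
    if class_name.endswith("SinkConnector"):
        return "sink"
    return "unknown"


if __name__ == "__main__":
    for config in (source_config, sink_config):
        print(connector_kind(config))
        print(json.dumps(config, indent=2))
```

The same key/value pairs could also be written as a properties file for a standalone worker. Depending on the Kafka version, these example connectors may need to be added to the worker's plugin path before they can be used.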
Architecture of Kafka Connect
The diagram shows the architecture of Kafka Connect, in which we have:
- Sources: databases (for example, accessed through JDBC), MongoDB, Redis, Solr, etc., whose data we want to copy to the Kafka cluster.
- Between the sources and the Kafka cluster sits a Kafka Connect cluster, a group of Kafka Connect workers on which connectors and tasks run. The tasks pull data from the sources and push it safely to the Kafka cluster.
- We can also send data from the Kafka cluster to any sink: Amazon S3, Cassandra, Redis, MongoDB, HDFS, etc. Here the tasks pull data from the Kafka cluster and write it to the sinks.
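The data flow described above can be sketched as a toy, in-memory model in Python. This is only an illustration of the source, Kafka, and sink roles: the classes `ToyKafkaCluster`, `ToySourceTask`, and `ToySinkTask` and their methods are hypothetical names invented for this sketch and are not the Kafka Connect API.

```python
from collections import defaultdict


class ToyKafkaCluster:
    """In-memory stand-in for a Kafka cluster: topic name -> list of records."""

    def __init__(self):
        self.topics = defaultdict(list)

    def produce(self, topic, record):
        self.topics[topic].append(record)

    def consume(self, topic, offset):
        """Return all records in the topic starting at the given offset."""
        return self.topics[topic][offset:]


class ToySourceTask:
    """Pulls rows from a source system and pushes them to a topic."""

    def __init__(self, source_rows, cluster, topic):
        self.source_rows = source_rows
        self.cluster = cluster
        self.topic = topic
        self.position = 0  # number of source rows copied so far

    def copy_new_rows(self):
        new_rows = self.source_rows[self.position:]
        for row in new_rows:
            self.cluster.produce(self.topic, row)
        self.position += len(new_rows)
        return len(new_rows)


class ToySinkTask:
    """Pulls records from a topic and writes them to a sink system."""

    def __init__(self, cluster, topic, sink_store):
        self.cluster = cluster
        self.topic = topic
        self.sink_store = sink_store
        self.offset = 0  # number of topic records delivered so far

    def deliver(self):
        records = self.cluster.consume(self.topic, self.offset)
        self.sink_store.extend(records)
        self.offset += len(records)
        return len(records)


if __name__ == "__main__":
    database = ["row-1", "row-2"]  # stands in for a source such as a database
    object_store = []              # stands in for a sink such as Amazon S3
    cluster = ToyKafkaCluster()
    source = ToySourceTask(database, cluster, "db-updates")
    sink = ToySinkTask(cluster, "db-updates", object_store)

    source.copy_new_rows()
    sink.deliver()
    database.append("row-3")       # a new update arrives at the source
    source.copy_new_rows()
    sink.deliver()
    print(object_store)
```

Because each task remembers how far it has read, a second pass copies only the new row rather than the whole data set again. In a real deployment the workers, not the tasks themselves, schedule this work and store the positions.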
Features of Kafka Connect
Now let's look at some of the features of Kafka Connect:
- Common framework: Kafka Connect provides a common framework for connectors, standardizing how other systems are integrated with Kafka. This simplifies connector development, deployment, and management.
- Standalone and distributed modes: Kafka Connect can scale up to a centrally managed service supporting an entire organization, or scale down for development, testing, and small production deployments.
- REST interface: Connectors are submitted to and managed in a Kafka Connect cluster through a REST API.
- Automatic offset management: With only a little information from connectors, Kafka Connect can manage the offset commit process automatically.
- Distributed and scalable by default: Kafka Connect is distributed and scalable by default, so we can add workers to scale up a Kafka Connect cluster.
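As an illustration of the REST interface, the following Python sketch builds, but does not send, the HTTP requests for creating a connector and checking its status. It assumes a Connect worker listening on `localhost:8083` (the default REST port); the connector name and configuration values are placeholders, and the two helper functions are hypothetical names for this example.

```python
import json
import urllib.request

# Assumption: a Connect worker's REST API is reachable at this address.
CONNECT_URL = "http://localhost:8083"


def build_create_connector_request(name, config, base_url=CONNECT_URL):
    """Build (without sending) a POST /connectors request for a new connector."""
    payload = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/connectors",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_status_request(name, base_url=CONNECT_URL):
    """Build (without sending) a GET /connectors/{name}/status request."""
    return urllib.request.Request(
        url=f"{base_url}/connectors/{name}/status",
        method="GET",
    )


if __name__ == "__main__":
    request = build_create_connector_request(
        "demo-file-source",
        {
            "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
            "tasks.max": "1",
            "file": "/tmp/input.txt",   # placeholder path
            "topic": "demo-topic",      # placeholder topic
        },
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))

    # To submit the request to a running worker, uncomment:
    # with urllib.request.urlopen(request) as response:
    #     print(response.status, response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the sketch runnable without a live cluster. In distributed mode, a request like this can be sent to any worker in the Connect cluster.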
Use cases of Kafka Connect:
- Building streaming pipelines from a source system to a target system.
- Writing application data from Kafka to data stores.
- Moving data from legacy applications to new systems through Kafka.
Advantages of Kafka Connect:
- Data-centric pipeline: Kafka Connect uses data abstractions to pull data into, or push data out of, Apache Kafka.
- Flexible and scalable: Kafka Connect works with both streaming and batch-oriented systems, and can run on a single node or scale out to an organization-wide service.
- Reusability and extensibility: Kafka Connect lets us reuse existing connectors or extend them to fit our needs.
In this blog, we learned about Kafka Connect: its architecture, features, use cases, and advantages. We will learn more in future blogs. Thank you for reading.
For more, you can refer to: https://kafka.apache.org/documentation/
Also, for more technical blogs, you can refer to the Knoldus blog: https://blog.knoldus.com/