Kafka Connect Concepts

Reading Time: 3 minutes

Kafka Connect is a framework for streaming data into and out of Apache Kafka®. Confluent Platform ships with a few built-in connectors that can be used to stream data to or from commonly used systems such as relational databases or HDFS.

Kafka Connect

To effectively discuss the internal functionality of Kafka Connect, it is helpful to establish a few key concepts:

  • Connectors
  • Tasks
  • Workers
  • Converters
  • Transforms
  • Dead letter queue

Connectors: A connector coordinates and manages the copying of data between Kafka and another system. It describes where data should be copied from or to. Creating a connector instance creates the logical job responsible for managing that copying, and the classes a connector uses or implements are packaged as a connector plugin.
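
For illustration, here is a minimal sketch of what a connector class can look like with the Kafka Connect Java API. The names MySourceConnector and MySourceTask and the config keys are hypothetical and not taken from any real plugin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.source.SourceConnector;

// Hypothetical connector: it only validates configuration and tells the
// framework which Task class to run and how to split the work.
public class MySourceConnector extends SourceConnector {

    private Map<String, String> props;

    @Override
    public void start(Map<String, String> props) {
        // Keep the connector configuration supplied by the worker.
        this.props = props;
    }

    @Override
    public Class<? extends Task> taskClass() {
        return MySourceTask.class;   // the task that actually copies data
    }

    @Override
    public List<Map<String, String>> taskConfigs(int maxTasks) {
        // Hand every task the same configuration; a real connector would
        // partition the work (tables, files, shards) across maxTasks entries.
        List<Map<String, String>> configs = new ArrayList<>();
        for (int i = 0; i < maxTasks; i++) {
            configs.add(props);
        }
        return configs;
    }

    @Override
    public void stop() { }

    @Override
    public ConfigDef config() {
        return new ConfigDef()
            .define("topic", ConfigDef.Type.STRING, ConfigDef.Importance.HIGH,
                    "Topic to write records to");
    }

    @Override
    public String version() {
        return "0.1.0";
    }
}
```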

Tasks: Tasks carry out the actual copying of data and are the unit of parallelism in the Connect data model. Because a connector instance splits its job into a set of tasks, Kafka Connect provides built-in support for parallel, scalable data copying with very little configuration.
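
A sketch of the matching task, again with hypothetical names, to show where the copying happens: the worker repeatedly calls poll() and produces whatever records the task returns.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

// Hypothetical task: each poll() returns a batch of records that the worker
// serializes (via the configured converter) and produces to Kafka.
public class MySourceTask extends SourceTask {

    private String topic;

    @Override
    public void start(Map<String, String> props) {
        topic = props.get("topic");
    }

    @Override
    public List<SourceRecord> poll() throws InterruptedException {
        Thread.sleep(1000); // throttle this demo so it does not flood the topic

        // A real task would read from the external system; here we emit a
        // single hard-coded record to show the shape of the API.
        Map<String, String> sourcePartition = Collections.singletonMap("source", "demo");
        Map<String, Long> sourceOffset = Collections.singletonMap("position", 0L);
        SourceRecord record = new SourceRecord(
            sourcePartition, sourceOffset, topic,
            Schema.STRING_SCHEMA, "hello from MySourceTask");
        return Collections.singletonList(record);
    }

    @Override
    public void stop() { }

    @Override
    public String version() {
        return "0.1.0";
    }
}
```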

Workers: Connectors and tasks are logical units of work and must be scheduled to execute in a process. Kafka Connect calls these processes workers, and they come in two types:

  • Standalone – a single worker process runs all connectors and tasks; suitable for development and simple setups.
  • Distributed – several workers sharing the same group.id form a cluster and automatically balance connectors and tasks among themselves; connectors are usually submitted to them over the REST interface, as in the sketch after this list.
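
In distributed mode, connectors are typically created by POSTing a JSON definition to a worker's REST interface (port 8083 by default). Below is a rough sketch using Java's built-in HttpClient; the connector name, file path, and topic are placeholder values, and the example assumes a worker running on localhost.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateConnector {
    public static void main(String[] args) throws Exception {
        // Connector definition: a name plus the connector's configuration.
        // The file path and topic are placeholders for illustration.
        String json = """
            {
              "name": "demo-file-source",
              "config": {
                "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
                "tasks.max": "1",
                "file": "/tmp/input.txt",
                "topic": "demo-topic"
              }
            }
            """;

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8083/connectors"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(json))
            .build();

        // Print the worker's response (201 Created on success).
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```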

Converters: Converters are used by tasks to change data from the byte format stored in Kafka into Connect's internal data format and vice versa; in other words, they are responsible for serializing and deserializing the data. A small round-trip sketch follows the two cases below.

  • When Kafka Connect runs as a source, the converter serializes the data handed over by the connector and then pushes the serialized data into the Kafka topic.
  • When Kafka Connect runs as a sink, the converter deserializes the data read from the Kafka topic and then passes it to the connector.
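
As a rough illustration of both directions, the snippet below round-trips a value through the JsonConverter that ships with Kafka; the topic name and payload are arbitrary.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaAndValue;
import org.apache.kafka.connect.json.JsonConverter;

public class ConverterRoundTrip {
    public static void main(String[] args) {
        JsonConverter converter = new JsonConverter();
        // "schemas.enable=false" keeps the JSON payload plain; isKey=false
        // means this converter instance handles record values.
        converter.configure(Map.of("schemas.enable", "false"), false);

        // Source direction: Connect data -> bytes written to Kafka.
        byte[] serialized = converter.fromConnectData(
            "demo-topic", Schema.STRING_SCHEMA, "hello converter");
        System.out.println(new String(serialized, StandardCharsets.UTF_8));

        // Sink direction: bytes read from Kafka -> Connect data for the connector.
        SchemaAndValue deserialized = converter.toConnectData("demo-topic", serialized);
        System.out.println(deserialized.value());
    }
}
```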

Transforms: The purpose of a transform is to make simple and lightweight modifications to individual messages. Transforms are ideal for small data adjustments and event routing: each one accepts a single record as input and outputs a modified record.
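
A sketch of what a custom single message transform can look like, assuming a hypothetical TopicPrefix transform that only reroutes records to a prefixed topic and leaves key and value untouched:

```java
import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.transforms.Transformation;

// Hypothetical single message transform: one record in, one record out.
public class TopicPrefix<R extends ConnectRecord<R>> implements Transformation<R> {

    private String prefix;

    @Override
    public void configure(Map<String, ?> configs) {
        Object value = configs.get("prefix");
        prefix = value == null ? "copy-" : value.toString();
    }

    @Override
    public R apply(R record) {
        // Route the record to a prefixed copy of its original topic.
        return record.newRecord(
            prefix + record.topic(),
            record.kafkaPartition(),
            record.keySchema(), record.key(),
            record.valueSchema(), record.value(),
            record.timestamp());
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef()
            .define("prefix", ConfigDef.Type.STRING, "copy-",
                    ConfigDef.Importance.MEDIUM, "Prefix added to the topic name");
    }

    @Override
    public void close() { }
}
```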

Dead letter queue: An invalid record can occur for many reasons. The most common are serialization and deserialization (serde) errors, for example when a record reaches the sink connector in JSON format but the sink connector's configuration expects another format, such as Avro. When a dead letter queue is configured and errors are tolerated, the connector does not stop on serde errors. Instead, it continues processing records and routes the failing ones to the dead letter queue topic. You can then use the record headers written to the dead letter queue to identify and troubleshoot an error when it occurs.
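
As a sketch, the snippet below collects the usual error-handling properties for a sink connector into a plain map; the connector class, topic names, and replication factor are placeholders chosen for illustration.

```java
import java.util.Map;

public class DeadLetterQueueConfig {
    public static void main(String[] args) {
        // Error-handling settings for a sink connector: tolerate bad records
        // instead of failing the task, and route them to a dead letter queue topic.
        Map<String, String> config = Map.of(
            "connector.class", "com.example.DemoSinkConnector",   // placeholder
            "topics", "orders",
            "errors.tolerance", "all",
            "errors.deadletterqueue.topic.name", "dlq-orders",
            "errors.deadletterqueue.topic.replication.factor", "1",
            "errors.deadletterqueue.context.headers.enable", "true"
        );
        config.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```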

The Purpose of Kafka Connect

Kafka Connect defines two kinds of connectors –

  • Source Connector
  • Sink Connector

A source connector’s purpose is to pull data from external data sources and publish it to the Kafka cluster. To achieve this, the source side of Kafka Connect internally uses the Kafka producer API.

A sink connector’s purpose is to consume data from the Kafka cluster and sync it to a target system. This is achieved by internally using the Kafka consumer API.
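
A minimal sketch of a sink task with hypothetical names: the framework consumes from Kafka internally and delivers already-deserialized records to put(), where a real connector would write them to the target system.

```java
import java.util.Collection;
import java.util.Map;

import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

// Hypothetical sink task: batches of records consumed from Kafka arrive in put().
public class MySinkTask extends SinkTask {

    @Override
    public void start(Map<String, String> props) { }

    @Override
    public void put(Collection<SinkRecord> records) {
        // A real task would write these to the target system (database, HDFS, ...).
        for (SinkRecord record : records) {
            System.out.printf("topic=%s partition=%d offset=%d value=%s%n",
                record.topic(), record.kafkaPartition(), record.kafkaOffset(), record.value());
        }
    }

    @Override
    public void stop() { }

    @Override
    public String version() {
        return "0.1.0";
    }
}
```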

Kafka Connect Features

  • A common framework for Kafka connectors
  • Distributed and standalone modes
  • REST interface
  • Automatic offset management
  • Distributed and scalable by default

Limitations of Kafka Connect

  • At the time of writing, the selection of ready-made connectors is still fairly small.
  • The separation between commercial and open-source features is unclear.
  • It also lacks configuration and management tooling.
  • The approach for deploying custom connectors (plugins) is fairly primitive.
  • It is very much Java/Scala-centric.

References

To read more about Kafka connectors, visit this blog.


Written by 

Hi, I'm a Software Consultant with experience in technologies like Core Java, Advanced Java, and Functional Programming, and I look forward to learning and exploring more in this field. I also love competitive programming and solving live problems on LeetCode and CodeChef.