The architecture of Kafka Connect

Reading Time: 3 minutes

Kafka Connect

Kafka Connect is an open-source component and framework for connecting Kafka with external systems, including databases. Connectors help move huge data sets into and out of Kafka. Kafka Connect is used only to copy streaming data, so its scope is deliberately narrow. It can run as a standalone process for development and testing, or as a distributed, scalable service supporting an entire organization.

The architecture of Kafka Connect

First, we have to look at the key building blocks of the Kafka Connect API. To implement a new connector, you provide a SourceConnector or SinkConnector class along with a corresponding SourceTask or SinkTask.

The connector describes the configuration of its tasks (the name of the task implementation class and its parameters). Connectors return a set of task configurations and can notify Kafka Connect when those tasks need to be reconfigured.
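To make this concrete, here is a minimal sketch of the connector side of that contract. The names DemoSourceConnector and DemoSourceTask are hypothetical, used only for illustration; a real connector would partition its work across the task configurations rather than duplicating the same settings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.source.SourceConnector;

// Hypothetical connector illustrating the Connector-side contract: it does no
// real I/O, it only tells the framework which Task class to run and hands
// each task its slice of the configuration.
public class DemoSourceConnector extends SourceConnector {

    private Map<String, String> connectorProps;

    @Override
    public void start(Map<String, String> props) {
        // Called once when the connector is created; keep the user's config.
        this.connectorProps = new HashMap<>(props);
    }

    @Override
    public Class<? extends Task> taskClass() {
        return DemoSourceTask.class; // the task implementation, sketched later
    }

    @Override
    public List<Map<String, String>> taskConfigs(int maxTasks) {
        // Produce one config map per task. Here every task gets the same
        // settings; a real connector would divide the work among them.
        List<Map<String, String>> configs = new ArrayList<>();
        for (int i = 0; i < maxTasks; i++) {
            configs.add(new HashMap<>(connectorProps));
        }
        return configs;
    }

    @Override
    public void stop() { /* release any resources held by the connector */ }

    @Override
    public ConfigDef config() {
        return new ConfigDef(); // declare the connector's expected settings here
    }

    @Override
    public String version() {
        return "0.1.0";
    }
}
```

When the configuration changes in a way that affects how work should be divided, the connector can ask the framework to reconfigure its tasks, and Kafka Connect will call taskConfigs() again.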

Kafka Connect has three major models in its architectural design:

  • Connector Model
  • Worker Model
  • Data Model

Connector Model

A connector is defined by specifying a connector class and configuration options to control what data is copied and how to format it. Each Connector instance is responsible for defining and updating a set of Tasks that actually copy the data. Kafka Connect manages the Tasks; the Connector is only responsible for generating the set of Tasks and indicating to the framework when they need to be updated.

There are two types of tasks:

  • Source – Source tasks pull records from an external system and ingest that data into Kafka.
  • Sink – Sink tasks deliver data from Kafka topics into other systems, such as indexes like Elasticsearch, batch systems like Hadoop, or any kind of database.

Source and Sink Connectors/Tasks are distinguished in the API to ensure the simplest possible API for both; a minimal source task sketch follows below.
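Continuing the hypothetical example above, this is roughly what the task side looks like. The topic name demo-topic and the hard-coded record are placeholders; a real poll() implementation would read from the external system and track real source offsets.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

// Hypothetical task paired with DemoSourceConnector: poll() is where a
// source task pulls records from the external system and hands them to
// Kafka Connect for delivery into a topic.
public class DemoSourceTask extends SourceTask {

    private static final String TOPIC = "demo-topic"; // illustrative topic name

    @Override
    public void start(Map<String, String> props) {
        // Open connections to the external system here.
    }

    @Override
    public List<SourceRecord> poll() throws InterruptedException {
        // A real task would read from the external system; this sketch
        // emits a single hard-coded record and then pauses briefly.
        Thread.sleep(1000);
        SourceRecord record = new SourceRecord(
                Collections.singletonMap("source", "demo"), // source partition
                Collections.singletonMap("position", 0L),   // source offset
                TOPIC,
                Schema.STRING_SCHEMA,
                "hello from a source task");
        return Collections.singletonList(record);
    }

    @Override
    public void stop() { /* close external connections */ }

    @Override
    public String version() {
        return "0.1.0";
    }
}
```

The source partition and offset maps returned with each record are what allow Kafka Connect to commit progress automatically and resume from the right position after a restart.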

Worker Model

The worker model allows Kafka Connect to scale with the application. It can scale down to a single worker process that also acts as its own coordinator, or run in clustered mode where connectors and tasks are dynamically balanced across workers. Because it assumes very little about how worker processes are managed, it can easily run under a variety of cluster managers or with traditional service supervision. This architecture allows scaling up and down, and Kafka Connect's implementation adds utilities to support both modes well. The REST interface for managing and monitoring jobs makes it easy to run Kafka Connect as an organization-wide service serving many users. Command-line utilities for ad hoc jobs make it easy to get up and running in a development environment, for testing, or in production environments where an agent-based approach is required.
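As a rough sketch of what clustered mode looks like in practice, a distributed worker is configured with a shared group id and a set of internal topics; the group id and topic names below are placeholders:

```properties
# Distributed worker sketch: workers that share the same group.id form one
# Connect cluster and balance connectors and tasks among themselves.
bootstrap.servers=localhost:9092
group.id=connect-cluster-demo

key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter

# Internal topics where the cluster stores connector configs, source
# offsets, and task status (placeholder names).
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
```

Every worker started with the same group.id joins the same cluster; standalone mode, by contrast, keeps offsets in a local file and needs no group coordination.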

Data Model

The data model addresses the remaining requirements. Many of the benefits come from coupling tightly with Kafka. Kafka serves as a natural buffer for both streaming and batch systems, removing much of the burden of managing data and ensuring delivery from connector developers. Additionally, by always requiring Kafka as one of the endpoints, the larger data pipeline can leverage the many tools that integrate well with Kafka. This allows Kafka Connect to focus exclusively on copying data, because a variety of stream processing tools are available to further process the data, which keeps it simple, both conceptually and in implementation. This differs greatly from other systems where ETL must occur before hitting a sink. In contrast, Kafka Connect can bookend an ETL process, leaving any transformation to tools specifically designed for that purpose. Finally, Kafka includes partitions in its core abstraction, providing another point of parallelism.

Architectural Features of Kafka Connect

  • Common Framework: Kafka Connect allows other systems to be integrated with Kafka and therefore acts as a common framework for connectors. This makes connector development, deployment, and management simple.
  • Can work in standalone or distributed modes: It can either scale up to provide a centrally managed service for an organization, or scale down for development, testing, and small production deployments.
  • REST interface: Connectors are submitted to and managed in a Kafka Connect cluster through a REST API (see the sketch after this list).
  • Manages offsets automatically: It automatically manages the offset commit process, requiring only minimal information from connectors.
  • Distributed as well as scalable: It is distributed and scalable by default; a Kafka Connect cluster is scaled up by adding more workers.
  • Streaming or batch integration: Kafka Connect provides a solution for bridging streaming and batch systems.
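As an illustration of the REST interface, the sketch below registers a file source connector with a worker on the default REST port 8083 by POSTing to the /connectors endpoint. The connector name, file path, and topic are placeholders chosen for this example:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConnectorRegistration {
    public static void main(String[] args) throws Exception {
        // Connector definition as JSON: a name plus a config map.
        // The values here are illustrative placeholders.
        String payload = """
                {
                  "name": "file-source-demo",
                  "config": {
                    "connector.class": "FileStreamSource",
                    "tasks.max": "1",
                    "file": "/tmp/input.txt",
                    "topic": "demo-topic"
                  }
                }""";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors")) // default Connect REST port
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

The same REST API can be used to list, pause, restart, and delete connectors, which is what makes Kafka Connect manageable as a shared service.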

Conclusion

In this blog, we have learned about the basic architecture of Kafka Connect and its components. We have also discussed some of the architectural features of Kafka Connect. I will cover more topics in further blogs.


For more technical blogs, you can refer to the Knoldus blog: https://blog.knoldus.com/

Written by 

Bhavya is a Software Intern at Knoldus Inc. He graduated from IIMT College of Engineering. He is passionate about Java development and curious about learning Java technologies.