A Basic Understanding of Kafka Connect

Reading Time: 4 minutes

Let us discuss Kafka Connect and some of its fundamentals. Before we start, we need a basic knowledge of Kafka, or we can go through this document.

Apache Kafka is a distributed, resilient, fault-tolerant platform and a well-known name in the world of Big Data. It is one of the most widely used distributed streaming platforms. Kafka is not just a messaging queue but a full-fledged event streaming platform.

It is a framework for storing, reading, and analyzing streaming data. It is a durable, publish-subscribe messaging system that exchanges data between processes, applications, and servers.

Table of contents

  1. What is Kafka Connect
  2. Architecture of Kafka Connect
  3. Connectors and Tasks
  4. Sources and Sinks
  5. Workers
  6. Standalone vs Distributed Mode
  7. Features
  8. Alternatives
  9. Conclusion

What is Kafka Connect?

Apache Kafka is a distributed streaming platform, and Kafka Connect is a framework for connecting Kafka with external systems such as databases, key-value stores, search indexes, and file systems, using so-called connectors. Kafka Connect only copies streamed data between Kafka and these systems, so its scope is deliberately narrow. It can run as a single standalone process for development and testing, or as a distributed, scalable service supporting an entire organization.

Kafka Connect makes it much easier to connect Kafka to other systems, without having to write all the glue code ourselves.

Common Kafka use cases:

  • Source -> Kafka: Producer API, or Kafka Connect Source
  • Kafka <-> Kafka: Consumer API and Producer API, or Kafka Streams
  • Kafka -> Sink: Consumer API, or Kafka Connect Sink

Architecture of Kafka Connect

[Figure: Kafka Connect architecture diagram. Source: "AMQ Streams With Kafka Connect on Openshift", DZone Big Data]
Let’s walk through the architecture diagram above:
  • Kafka Connect runs as its own cluster, separate from the Kafka brokers.
  • Each worker runs one or more connector tasks.
  • A cluster can have multiple workers, and workers run only within the cluster.
  • If a worker fails, its tasks are automatically rebalanced across the remaining workers.
  • Tasks in Kafka Connect act as producers or consumers, depending on the type of connector.
  • A Kafka Connect cluster can have multiple connectors loaded at the same time.

Connectors and Tasks

Connectors are responsible for managing the tasks that will run. They decide how the data is split across tasks and provide each task with the specific configuration it needs to do its job.

Tasks are responsible for getting data into and out of Kafka. They get their context from the worker. Once initialized, they are started with a map of configuration properties derived from the connector's configuration. Once started, a source task polls an external system and returns a list of records, which the worker then sends to a Kafka broker. A sink task works in the opposite direction: the worker consumes records from Kafka and hands them to the task to write to the external system.
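To make the split between connectors and tasks concrete, here is a minimal sketch in plain Python. It is only a toy model of the idea, not the real Kafka Connect API (which is Java); the names task_configs and SourceTask, the table names, and the round-robin split are all assumptions made for illustration.

```python
from typing import Dict, List


def task_configs(tables: List[str], max_tasks: int) -> List[Dict[str, str]]:
    """Connector side: split a list of tables into at most max_tasks task configs."""
    if max_tasks < 1:
        raise ValueError("max_tasks must be at least 1")
    num_tasks = min(max_tasks, len(tables))
    groups: List[List[str]] = [[] for _ in range(num_tasks)]
    for i, table in enumerate(tables):
        groups[i % num_tasks].append(table)  # round-robin assignment
    return [{"tables": ",".join(group)} for group in groups]


class SourceTask:
    """Task side: started with its own config, then polled for records."""

    def start(self, props: Dict[str, str]) -> None:
        self.tables = props["tables"].split(",")

    def poll(self) -> List[Dict[str, str]]:
        # A real task would read new rows; this toy emits one fake record per table.
        return [{"topic": f"db.{t}", "value": f"row from {t}"} for t in self.tables]


# Hypothetical connector covering three tables, limited to two tasks.
configs = task_configs(["users", "orders", "payments"], max_tasks=2)

tasks = []
for props in configs:
    task = SourceTask()
    task.start(props)
    tasks.append(task)

# The worker would send these records on to a Kafka broker.
records = [record for task in tasks for record in task.poll()]
print(configs)
print(records)
```

In this sketch the connector only decides who handles what, and each task moves data for its own share of the tables.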

Sources and Sinks 

Kafka Connect focuses on streaming data to and from Kafka. Depending on the direction in which the data moves, a connector is classified as:

Source connector – Ingests entire databases and streams table updates to Kafka topics. A source connector can also collect metrics from all your application servers and store these in Kafka topics, making the data available for stream processing with low latency.
Sink connector – Delivers data from Kafka topics into secondary indexes such as Elasticsearch, or batch systems such as Hadoop for offline analysis.
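As an illustration, the sketch below builds one source and one sink connector definition as Python dictionaries and prints them as JSON. It assumes the simple file connectors that ship with Apache Kafka as examples; the connector names, file paths, and topic name are made up for this example.

```python
import json

# Source: reads lines from a file and writes them to a Kafka topic.
source_connector = {
    "name": "file-source-demo",  # hypothetical connector name
    "config": {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "tasks.max": "1",
        "file": "/tmp/input.txt",  # assumed input path
        "topic": "demo-topic",     # a source writes to a single "topic"
    },
}

# Sink: reads records from Kafka topics and appends them to a file.
sink_connector = {
    "name": "file-sink-demo",  # hypothetical connector name
    "config": {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
        "tasks.max": "1",
        "file": "/tmp/output.txt",  # assumed output path
        "topics": "demo-topic",     # a sink reads from one or more "topics"
    },
}

print(json.dumps(source_connector, indent=2))
print(json.dumps(sink_connector, indent=2))
```

Note the small asymmetry: the file source is configured with topic, while sinks take topics, since a sink can consume from several topics at once.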

Workers 

Tasks are executed by Kafka Connect workers.

  • A worker is a single Java process.
  • Workers run connectors (each connector is a class packaged in a JAR file).
  • A worker can run in standalone mode or distributed mode.
  • If a worker crashes, a rebalance occurs (this relies on the heartbeat mechanism of the Kafka consumer group protocol).
  • If a worker joins a Connect cluster, the other workers notice and connectors or tasks are assigned to the new worker to balance the cluster. To join a cluster, a worker must have the same group.id property as the other workers.
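The rebalancing behaviour described above can be pictured with a small simulation. This is a deliberately simplified sketch: it reassigns everything round-robin, whereas the real protocol is more sophisticated, and the worker and task names are hypothetical.

```python
from typing import Dict, List


def assign(tasks: List[str], workers: List[str]) -> Dict[str, List[str]]:
    """Spread tasks over the live workers in round-robin order."""
    if not workers:
        raise ValueError("at least one live worker is required")
    assignment: Dict[str, List[str]] = {w: [] for w in workers}
    for i, task in enumerate(tasks):
        assignment[workers[i % len(workers)]].append(task)
    return assignment


tasks = ["jdbc-source-0", "jdbc-source-1", "es-sink-0", "es-sink-1"]

before = assign(tasks, ["worker-a", "worker-b"])
# worker-b stops sending heartbeats, so its tasks move to worker-a.
after_crash = assign(tasks, ["worker-a"])
# worker-c starts with the same group.id and picks up part of the load.
after_join = assign(tasks, ["worker-a", "worker-c"])

for label, state in [("before", before), ("crash", after_crash), ("join", after_join)]:
    print(label, state)
```

The point of the sketch is only that no task is lost when a worker disappears, and that a new worker with the same group.id receives a share of the existing work.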

Standalone vs Distributed Mode

Standalone

  • A single process runs both connectors and tasks.
  • Configuration uses .properties files.
  • Very easy to get started with; useful for development and testing.
  • Not fault tolerant, not scalable, and hard to monitor.
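For standalone mode, the two .properties files could look roughly like the ones generated below. This is a sketch under assumptions: a broker on localhost:9092, the JSON converter, the example file source connector, and made-up file paths and names. It writes the files to a temporary directory so that nothing on the system is overwritten.

```python
import tempfile
from pathlib import Path
from typing import Dict

# Worker-level settings (assumed local broker and offsets file location).
worker_props = {
    "bootstrap.servers": "localhost:9092",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "offset.storage.file.filename": "/tmp/connect.offsets",
}

# Connector-level settings (hypothetical name, path, and topic).
connector_props = {
    "name": "file-source-demo",
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp/input.txt",
    "topic": "demo-topic",
}


def to_properties(props: Dict[str, str]) -> str:
    """Render a dict as key=value lines, one per property."""
    return "".join(f"{key}={value}\n" for key, value in props.items())


out_dir = Path(tempfile.mkdtemp())
(out_dir / "worker.properties").write_text(to_properties(worker_props))
(out_dir / "connector.properties").write_text(to_properties(connector_props))
print(f"wrote config files to {out_dir}")

# From the Kafka installation directory, a standalone worker is then started with:
#   bin/connect-standalone.sh worker.properties connector.properties
```

The worker file describes the process itself, while the connector file describes one connector; a standalone worker takes both on the command line.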

Distributed

  • Multiple workers run connectors and tasks.
  • Connectors are configured through a REST API.
  • Easy to scale and fault tolerant (tasks are rebalanced if a worker dies).
  • Useful for production deployment of connectors.
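In distributed mode, a connector is typically created by sending its definition to a worker's REST endpoint. The sketch below only builds the HTTP request with the Python standard library and does not send it; the worker address (localhost on port 8083) and the connector settings are assumptions for illustration.

```python
import json
import urllib.request
from typing import Dict

CONNECT_URL = "http://localhost:8083"  # assumption: a worker on the default REST port


def build_create_request(name: str, config: Dict[str, str]) -> urllib.request.Request:
    """Build a POST /connectors request that would create a connector."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{CONNECT_URL}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_request(
    "file-sink-demo",  # hypothetical connector name
    {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
        "tasks.max": "2",
        "file": "/tmp/output.txt",
        "topics": "demo-topic",
    },
)
print(request.get_method(), request.full_url)

# To actually submit it against a running worker:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```

Because the configuration lives in the cluster rather than in a local file, any worker can accept the request and the connector survives the loss of an individual worker.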

Features

Kafka Connect features include:

  • Common framework for Kafka connectors – makes connector deployment easy.
  • REST interface – we can manage connectors using a REST API.
  • Automatic offset management – Kafka Connect handles the offset commit process for us, which saves us the trouble of implementing this error-prone part of connector development manually.
  • Distributed and standalone modes – scale up to a large, centrally managed service supporting an entire organization, or scale down to development, testing, and small production deployments.
  • Distributed and scalable by default – Kafka Connect builds on Kafka's existing group management protocol, and we can scale up a Connect cluster by adding more workers.
  • Streaming/batch integration – combined with Kafka’s existing capabilities, Kafka Connect is an ideal solution for bridging streaming and batch data systems.
  • Transformations – these allow us to make simple and lightweight modifications to individual messages.
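To give a feel for the transformations feature, the sketch below shows a connector configuration fragment that enables the built-in InsertField transformation, followed by a plain Python function that mimics its effect on one record value. The alias addSource, the field name, and the sample record are assumptions; the Python function is only an imitation of what the transformation does, not how Kafka Connect implements it.

```python
from typing import Any, Dict

# Config fragment that would be added to a connector's configuration.
smt_config = {
    "transforms": "addSource",  # hypothetical alias for this transformation
    "transforms.addSource.type": "org.apache.kafka.connect.transforms.InsertField$Value",
    "transforms.addSource.static.field": "data_source",
    "transforms.addSource.static.value": "file-source-demo",
}


def insert_static_field(value: Dict[str, Any], field: str, static_value: str) -> Dict[str, Any]:
    """Return a copy of the record value with one extra static field."""
    updated = dict(value)  # leave the original record untouched
    updated[field] = static_value
    return updated


record_value = {"id": 42, "name": "alice"}  # made-up record
transformed = insert_static_field(
    record_value,
    smt_config["transforms.addSource.static.field"],
    smt_config["transforms.addSource.static.value"],
)
print(transformed)
```

Each message is modified on its own as it passes through the connector, which is why transformations suit small changes like this rather than joins or aggregations.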

Alternatives

If you don’t want to use Kafka Connect to integrate Kafka with your other apps and databases, you can write your own code using the Producer and Consumer APIs, or use the Streams API.

Or you could even use an integration framework that supports Kafka, like Apache Camel or Spring Integration.

Conclusion

In this blog, we have learned the basics of Kafka Connect, such as its features, use cases, and architecture. In the next blog, we will see how to set up and launch a Kafka connector.

If you want to know more about Apache Kafka, Streams and Connect, then I recommend these articles:

Written by 

Anuradha Kumari is a Software Consultant at Knoldus Inc. She is a tech enthusiast who likes to play with new technology and write tech blogs.