A Quick Insight into Kafka Connect/Connectors

Reading Time: 3 minutes

Overview

In this blog, we are going to discuss Kafka Connect and connectors in detail. If you want a basic introduction to Kafka Connect, you can refer to this blog. We will cover concepts such as Kafka Connect, Kafka connectors, and Kafka converters.

Kafka Connect

Kafka Connect is a framework to stream data into and out of Apache Kafka®. The Confluent Platform ships with several built-in connectors that can be used to stream data to or from commonly used systems such as relational databases or HDFS.

To understand this better, let's discuss the components of Kafka Connect (a combined example follows the list):

  • Connectors are responsible for the interaction between Kafka Connect and the external technology being integrated with
  • Converters handle the serialization and deserialization of data
  • Transformations can optionally be applied to modify the data passing through the pipeline
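
All three of these components can be configured together for a single connector. The snippet below is a minimal, illustrative sketch (the topic name and the timestamp field name are placeholders): it names the connector class, overrides the value converter, and applies Kafka Connect's built-in InsertField transformation to stamp each record with an ingest timestamp.

{
    "connector.class" : "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "topics"          : "orders",
    "value.converter" : "org.apache.kafka.connect.json.JsonConverter",
    "transforms"      : "addTimestamp",
    "transforms.addTimestamp.type" : "org.apache.kafka.connect.transforms.InsertField$Value",
    "transforms.addTimestamp.timestamp.field" : "ingest_ts"
}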

Kafka Connector

Kafka connectors are ready-to-use components that run inside a Kafka Connect cluster to import data from external systems into Kafka topics and export data from Kafka topics into external systems. For instance, the Snowflake sink connector reads data from Kafka topics and writes it into Snowflake tables. We can use existing connector implementations for common data sources and sinks or implement our own connectors. For example:

  • The Debezium MySQL source connector uses the MySQL binlog to read change events from the database and stream them to Kafka Connect (a sample configuration appears after this list)
  • The Elasticsearch sink connector takes data from Kafka Connect, and using the Elasticsearch APIs, writes the data to Elasticsearch
  • The S3 connector from Confluent can act as both a source and sink connector, writing data to S3 or reading it back in
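
On the source side, a Debezium MySQL connector configuration might look roughly like the sketch below (the hostname, credentials, server name, and database list are placeholders, and the full set of required properties is described in the Debezium documentation):

{
    "connector.class"       : "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname"     : "mysql",
    "database.port"         : "3306",
    "database.user"         : "debezium",
    "database.password"     : "dbz",
    "database.server.id"    : "184054",
    "database.server.name"  : "dbserver1",
    "database.include.list" : "inventory"
}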

Here’s an example of creating a connector instance in Kafka Connect using the Elasticsearch sink connector:

curl -X PUT -H "Content-Type: application/json" http://localhost:8083/connectors/sink-elastic-01/config \
    -d '{
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "topics"         : "orders",
    "connection.url" : "http://elasticsearch:9200",
    "type.name"      : "_doc",
    "key.ignore"     : "false",
    "schema.ignore"  : "true"
}'
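
Once the connector has been created, the same REST API can be used to check that it is running:

curl -s http://localhost:8083/connectors/sink-elastic-01/status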

The above is a call to Kafka Connect’s REST API. You can also use ksqlDB to manage connectors; the equivalent syntax for the same connector looks like this:

CREATE SINK CONNECTOR `sink-elastic-01` WITH (
  'connector.class' = 'io.confluent.connect.elasticsearch.ElasticsearchSinkConnector',
  'topics'          = 'orders',
  'connection.url'  = 'http://elasticsearch:9200',
  'type.name'       = '_doc',
  'key.ignore'      = 'false',
  'schema.ignore'   = 'true'
);
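
From the ksqlDB prompt you can also list the connectors running in the Kafka Connect cluster:

SHOW CONNECTORS;

and inspect an individual connector with the DESCRIBE CONNECTOR statement.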

Workflow of Kafka Connectors

Taking the Snowflake sink connector as an example, the workflow looks like this:

  1. The connector subscribes to the Kafka topics specified in its configuration (provided via the configuration file or the command line)
  2. For each topic, the connector then creates the following objects:
       • One internal stage to temporarily store data files for the topic
       • One pipe to ingest the data files for each topic partition
       • One table for the topic. If the table specified for a topic does not exist, the connector creates it; otherwise, the connector adds the RECORD_CONTENT and RECORD_METADATA columns to the existing table and verifies that the other columns are nullable (and produces an error if they are not)
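
For reference, a minimal Snowflake sink connector configuration might look roughly like the sketch below (the account URL, user, key, database, and schema are placeholders; the exact property names and required values are described in the Snowflake documentation):

{
    "connector.class"         : "com.snowflake.kafka.connector.SnowflakeSinkConnector",
    "topics"                  : "orders",
    "snowflake.url.name"      : "myaccount.snowflakecomputing.com:443",
    "snowflake.user.name"     : "kafka_connector_user",
    "snowflake.private.key"   : "<private key>",
    "snowflake.database.name" : "KAFKA_DB",
    "snowflake.schema.name"   : "KAFKA_SCHEMA"
}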

Kafka Converters

Converters handle the serialization and deserialization of data as it moves between Kafka and the external system, translating it from one format to another. Converters are decoupled from connectors so that they can naturally be reused across different connectors. The converters used at the source and at the sink are configured independently, so a pipeline can accept input in one format and produce output in another.
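
Converters are set with the key.converter and value.converter properties, either at the worker level or per connector. For example, to read String keys and Avro values, a connector could include the following properties (the Schema Registry URL is a placeholder):

"key.converter"   : "org.apache.kafka.connect.storage.StringConverter",
"value.converter" : "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url" : "http://schema-registry:8081"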

Kafka Connect is modular in nature, providing a very powerful way of handling integration requirements. Some key components include:

  • Connectors – the JAR files that define how to integrate with the data store itself
  • Converters – handling serialization and deserialization of data
  • Transforms – optional in-flight manipulation of messages

Conclusion

In this blog, we have learned about Kafka Connect, connectors, and Kafka converters. I will cover more topics in upcoming blogs.

References:

Confluent Documentation, Snowflake Documentation, Oracle Documentation

For more technical blogs, you can refer to the Knoldus blog: https://blog.knoldus.com/

Written by 

Bhavya is a Software Intern at Knoldus Inc. He graduated from IIMT College of Engineering. He is passionate about Java development and curious to learn more about Java technologies.