Streaming Kafka Messages to Google Cloud Pub/Sub

Reading Time: 3 minutes

In this blog post, I present an example that creates a pipeline to read data from one or more topics in Apache Kafka and write it to a topic in Google Cloud Pub/Sub. The example provides code samples to implement simple yet powerful pipelines, and also provides an out-of-the-box solution that you can use as-is. The example is built with Apache Beam and can be downloaded here. I hope you will find it useful for setting up data pipelines between Kafka and Pub/Sub.

Example specs

Supported data formats:

  • Serializable plain-text formats, such as JSON
  • PubSubMessage

A publisher publishes messages that subscribers consume. Each message must contain either a non-empty data field or at least one attribute. Note that client libraries represent this object differently depending on the language, so see the corresponding client library documentation for more information. See quotas and limits for more information about message limits.

Google Cloud Pub/Sub is a reliable, scalable, fully managed asynchronous messaging service for exchanging event data among applications and services. By decoupling senders and receivers, it allows secure and highly available communication between independently written applications. Pub/Sub delivers low-latency, durable messaging and is commonly used by developers to implement asynchronous workflows, distribute event notifications, and stream data from various processes or devices.

JSON representation

{
  "data": string,
  "attributes": {
    string: string,
    ...
  },
  "messageId": string,
  "publishTime": string,
  "orderingKey": string
}

Supported input source configurations:

  • Single/multiple Apache Kafka bootstrap servers
  • Apache Kafka SASL/SCRAM authentication over plaintext/SSL connection
  • Secrets vault service HashiCorp Vault

Supported destination configuration:

  • Single Google Pub/Sub topic

In a simple scenario, the example creates an Apache Beam pipeline that reads messages from a source Kafka server and topic, and streams the text messages into the specified Pub/Sub destination topic. Other scenarios may need Kafka SASL/SCRAM authentication, which can be performed over a plaintext or SSL-encrypted connection. The example supports using a single Kafka user account to authenticate against the provided source Kafka servers and topics. To support SASL authentication over SSL, you will need an SSL certificate and access to a secrets vault service, currently HashiCorp Vault.
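To make the simple scenario concrete, here is a minimal sketch of such a pipeline using Beam's KafkaIO and PubsubIO connectors. The bootstrap server address, topic names, and project ID are placeholders; the downloadable example adds the authentication and multi-topic options described above.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Values;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaToPubsub {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline pipeline = Pipeline.create(options);

    pipeline
        // Read key/value records from the source Kafka topic.
        .apply(KafkaIO.<String, String>read()
            .withBootstrapServers("localhost:9092")
            .withTopic("to-pubsub")
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            .withoutMetadata())
        // Keep only the message payloads.
        .apply(Values.<String>create())
        // Publish each payload to the destination Pub/Sub topic.
        .apply(PubsubIO.writeStrings()
            .to("projects/my-project/topics/from-kafka"));

    pipeline.run();
  }
}
```

The same shape works for multiple source topics by swapping withTopic for withTopics and a list of names.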

Create and use topics

To create a topic, follow these steps:

  1. In the Cloud Console, go to the Pub/Sub Topics page.
  2. Click Create topic.
  3. In the Topic ID field, enter an ID for your topic.
  4. Retain the option Add a default subscription.
  5. Do not select the other options.
  6. Click Create topic.
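If you prefer the command line, the steps above can be sketched with the gcloud CLI; the topic and subscription IDs here are placeholders:

```shell
# Create a Pub/Sub topic named "my-topic" in the current project.
gcloud pubsub topics create my-topic

# Attach a pull subscription to it (the console's default subscription).
gcloud pubsub subscriptions create my-topic-sub --topic=my-topic
```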

Pub/Sub Topic and Subscription setup

  1. In the Cloud Shell window, set an environment variable to the project identifier:

export PROJECT_ID=$(gcloud config get-value project)

  2. Configure the Pub/Sub topics that will communicate with Kafka:

gcloud pubsub topics create to-kafka from-kafka

  3. Create a subscription for the to-kafka topic:

gcloud pubsub subscriptions create to-kafka-sub --topic=to-kafka --topic-project=$PROJECT_ID

Now Pub/Sub is configured with two topics, and a subscription has been created on the “to-kafka” topic using the PROJECT_ID variable. Pub/Sub can now be used to consume messages; see Pub/Sub in the console.

  4. Create a subscription for traffic published from Kafka:

gcloud pubsub subscriptions create from-kafka --topic=from-kafka

Data exchange between Kafka and Pub/Sub

Test Kafka-to-Pub/Sub (producer/consumer) communication by opening a new SSH window in which the Kafka commands will be run.

  1. Open a new SSH connection to the Kafka VM; this is the SSH window.
  2. Enter the following command to start a Kafka console producer:

kafka-console-producer.sh --broker-list localhost:9092 --topic to-pubsub

  3. From the Kafka console, enter the following data element, then press Ctrl+C to terminate the command entry:

{"message":"Big Data"}

  4. Return to Cloud Shell to see the information submitted earlier:

gcloud pubsub subscriptions pull from-kafka --auto-ack --limit=10

Where can I run this example?

There are two methods to implement the pipeline.

  1. Locally: this method has many options, such as running directly from IntelliJ, or creating a .jar file and running it in the terminal. Use your favourite method of running Beam pipelines.
  2. In Google Cloud, using Google Cloud Dataflow:
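Assuming the example is packaged as a Maven project, a Dataflow run could look roughly like this; the main class name, region, and bucket are placeholders to substitute with your own values:

```shell
# Launch the Beam pipeline on Google Cloud Dataflow.
mvn compile exec:java \
  -Dexec.mainClass=com.example.KafkaToPubsub \
  -Dexec.args="--runner=DataflowRunner \
    --project=$PROJECT_ID \
    --region=us-central1 \
    --tempLocation=gs://my-bucket/temp"
```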



In conclusion, I hope this blog helps you understand how such pipelines work and look. If you are already using Beam, I hope these samples will be useful for your use cases.

Written by 

He is a Software Consultant at Knoldus Inc. He did his B.Tech at Dr. A.P.J. Abdul Kalam Technical University, Uttar Pradesh. He is passionate about his work and has knowledge of various programming languages like Java, C++, and Python, but he is most passionate about Java development and curious to learn Java technologies. He is always enthusiastic to learn new things, has good critical-thinking and problem-solving skills, and always enjoys helping others. He likes to play outdoor games like football, volleyball, hockey, and kabaddi. Apart from technology, he likes to read scriptures originating in ancient India, such as the Vedas, Upanishads, and the Gita.