In this blog post, I present an example that creates a pipeline to read data from a single topic or multiple topics in Apache Kafka and write the data into a topic in Google Pub/Sub. The example provides code samples to implement simple yet powerful pipelines, and it also serves as an out-of-the-box solution that you can use directly. The example is built in Apache Beam and can be downloaded here. We hope you will find it useful for setting up data pipelines between Kafka and Pub/Sub.



Example specs
Supported data formats:
- Serializable plain text formats, such as JSON
- PubsubMessage
PubsubMessage
A publisher publishes messages that subscribers consume. A message must contain either a non-empty data field or at least one attribute. Note that client libraries represent this object differently depending on the language; see the corresponding client library documentation for more information. See quotas and limits for more information about message limits.
Google Cloud Pub/Sub is a reliable, scalable, fully managed asynchronous messaging service for exchanging event data among applications and services. By decoupling senders and receivers, it allows for secure and highly available communication between independently written applications. Pub/Sub delivers low-latency, durable messaging and is commonly used by developers to implement asynchronous workflows, distribute event notifications, and stream data from various processes or devices.
JSON representation:
{
  "data": string,
  "attributes": {
    string: string,
    ...
  },
  "messageId": string,
  "publishTime": string,
  "orderingKey": string
}
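To make the structure above concrete, here is a minimal sketch of building such a message with the Google Cloud Pub/Sub Java client library. The payload and the attribute are illustrative values only, not part of the example itself; messageId and publishTime are assigned by the service at publish time.

import com.google.protobuf.ByteString;
import com.google.pubsub.v1.PubsubMessage;

public class BuildMessageSketch {
  public static void main(String[] args) {
    // Minimal sketch: a PubsubMessage needs a non-empty data field or at least one attribute.
    // Payload and attribute below are example values.
    PubsubMessage message = PubsubMessage.newBuilder()
        .setData(ByteString.copyFromUtf8("{\"message\":\"Hello\"}")) // non-empty data field
        .putAttributes("source", "kafka")                            // optional attribute
        .build();
    // messageId and publishTime are set by the Pub/Sub service when the message is published.
    System.out.println(message);
  }
}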
Supported input source configurations:
- Single/multiple Apache Kafka bootstrap servers
- Apache Kafka SASL/SCRAM authentication over plaintext/SSL connection
- Secrets vault service: HashiCorp Vault
Supported destination configuration:
- Single Google Pub/Sub topic
In a simple scenario, the example creates an Apache Beam pipeline that reads messages from a source Kafka server and topic and streams the text messages into the specified Pub/Sub destination topic. Other scenarios may require Kafka SASL/SCRAM authentication, which can be performed over a plaintext or SSL-encrypted connection; the example supports using a single Kafka user account to authenticate against the provided source Kafka servers and topics. To support SASL authentication over SSL, the example requires an SSL certificate and access to a secrets vault service, currently HashiCorp Vault.
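To illustrate the simple scenario, here is a minimal sketch of such a Beam pipeline, assuming Beam's KafkaIO and PubsubIO connectors (beam-sdks-java-io-kafka and beam-sdks-java-io-google-cloud-platform). The bootstrap server, topic names, and project ID are placeholders, not the actual configuration of the downloadable example.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Values;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaToPubsub {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    pipeline
        // Read string records from the source Kafka topic(s).
        .apply("ReadFromKafka",
            KafkaIO.<String, String>read()
                .withBootstrapServers("kafka-server-1:9092")   // placeholder bootstrap server(s)
                .withTopic("to-pubsub")                        // placeholder source topic
                .withKeyDeserializer(StringDeserializer.class)
                .withValueDeserializer(StringDeserializer.class)
                .withoutMetadata())                            // keep only KV<key, value>
        // Drop the Kafka keys and keep the message payloads.
        .apply("ExtractValues", Values.<String>create())
        // Publish each payload to the destination Pub/Sub topic.
        .apply("WriteToPubsub",
            PubsubIO.writeStrings().to("projects/my-project/topics/from-kafka"));

    pipeline.run();
  }
}

The real example adds configuration options (multiple topics, SASL/SCRAM, Vault integration) on top of this basic shape, but the read-extract-write structure is the core of the pipeline.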
Create and use topics
To create a topic, follow these steps:
- In the Cloud Console, go to the Pub/Sub Topics page.
- Click Create topic.
- In the Topic ID field, enter an ID for your topic.
- Retain the option Add a default subscription.
- Do not select the other options.
- Click Create topic.
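As an alternative to the console steps above, a topic can also be created programmatically. Below is a minimal sketch assuming the Google Cloud Pub/Sub Java admin client; the project and topic IDs are placeholders.

import com.google.cloud.pubsub.v1.TopicAdminClient;
import com.google.pubsub.v1.TopicName;

public class CreateTopic {
  public static void main(String[] args) throws Exception {
    // Placeholder project and topic IDs.
    TopicName topicName = TopicName.of("my-project", "from-kafka");
    try (TopicAdminClient topicAdminClient = TopicAdminClient.create()) {
      topicAdminClient.createTopic(topicName); // creates the topic only
    }
  }
}

Unlike the console flow, this does not add a default subscription; creating a subscription programmatically is sketched in the next section.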



Pub/Sub Topic and Subscription setup
1. In the Cloud Shell window, set an environment variable to the project identifier:
export PROJECT_ID=[PROJECT_ID]
2. Configure the Pub/Sub topics that will communicate with Kafka:
gcloud pubsub topics create to-kafka from-kafka
3. Create a subscription for the to-kafka topic:
gcloud pubsub subscriptions create to-kafka-sub --topic=to-kafka --topic-project=$PROJECT_ID
Pub/Sub is now configured with two topics, and a subscription has been created on the "to-kafka" topic using the PROJECT_ID variable. Pub/Sub can now be used to consume messages; see Pub/Sub in the console.
4. Create a subscription for traffic published from Kafka:
gcloud pubsub subscriptions create from-kafka --topic=from-kafka
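The same subscription setup can be done programmatically. Below is a minimal sketch assuming the Pub/Sub Java admin client, equivalent to the last gcloud command above; the project ID is a placeholder for your PROJECT_ID.

import com.google.cloud.pubsub.v1.SubscriptionAdminClient;
import com.google.pubsub.v1.PushConfig;
import com.google.pubsub.v1.SubscriptionName;
import com.google.pubsub.v1.TopicName;

public class CreateSubscription {
  public static void main(String[] args) throws Exception {
    String projectId = "my-project"; // placeholder, use your PROJECT_ID
    TopicName topic = TopicName.of(projectId, "from-kafka");
    SubscriptionName subscription = SubscriptionName.of(projectId, "from-kafka");
    try (SubscriptionAdminClient client = SubscriptionAdminClient.create()) {
      // An empty PushConfig makes this a pull subscription; 10-second ack deadline.
      client.createSubscription(subscription, topic, PushConfig.getDefaultInstance(), 10);
    }
  }
}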
Data exchange between Kafka and Pub/Sub
Test Kafka-to-Pub/Sub (producer/consumer) communication by opening a new SSH window where the Kafka commands will be run.
1. Open a new SSH connection to the Kafka VM (this is the SSH window) and enter the following command to start a Kafka console producer (a programmatic alternative is sketched after these steps):
kafka-console-producer.sh --broker-list localhost:9092 --topic to-pubsub
2. From the Kafka console, enter the following data elements, then press Ctrl+C to terminate command entry:
{"message":"Hello"}
{"message":"Big Data"}
3. Return to Cloud Shell to see the information submitted earlier:
gcloud pubsub subscriptions pull from-kafka --auto-ack --limit=10
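As referenced in step 1, the same test messages can also be published from a small Java program instead of the console producer. This is a minimal sketch using the standard Kafka producer client; the broker address and topic match the console example above.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TestMessageProducer {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");              // same broker as the console example
    props.put("key.serializer", StringSerializer.class.getName());
    props.put("value.serializer", StringSerializer.class.getName());

    // Send the same two JSON test messages to the "to-pubsub" topic.
    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      producer.send(new ProducerRecord<>("to-pubsub", "{\"message\":\"Hello\"}"));
      producer.send(new ProducerRecord<>("to-pubsub", "{\"message\":\"Big Data\"}"));
    } // closing the producer flushes any buffered records
  }
}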
Where can I run this example?
There are two ways to run the pipeline:
- Locally. This way has many options: run it directly from IntelliJ, create a .jar file and run it in the terminal, or use your favourite method of running Beam pipelines.
- In Google Cloud using Google Cloud Dataflow:
  - With the gcloud command-line tool you can create a Flex Template and execute it in Google Cloud Platform. This requires corresponding modifications of the example to turn it into a template.
  - This example also exists as a Flex Template version within the Google Cloud Dataflow Template Pipelines repository and can be run with no additional code modifications.



Conclusion:
In conclusion, I hope this blog post helps you understand how such pipelines work and look. If you are already using Beam, I hope these samples will be useful for your use cases.