Set Up a Kafka Cluster Using a Kubernetes StatefulSet


Hi readers! In this blog, we will set up a Kafka cluster on Kubernetes using a StatefulSet, and pick up some basic knowledge of StatefulSets along the way.

Kafka on K8s


StatefulSet is the workload API object used to manage stateful applications.

It manages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of these Pods.


Apache Kafka is an open-source stream-processing software platform developed by LinkedIn and donated to the Apache Software Foundation, written in Scala and Java.
Basic Terminology
ZooKeeper keeps track of the status of the Kafka cluster nodes, as well as Kafka topics, partitions, etc.

Kafka producer is an application that can act as a source of data in a Kafka cluster.

The primary role of a Kafka consumer is to use the Kafka connection and consumer properties to read records from the appropriate Kafka broker.

Set-up Kubernetes cluster

We will be setting up our Kubernetes cluster on the Google Cloud Platform. Here are some basic steps to get Kafka running on a Google Kubernetes Engine cluster.

Dashboard Google Cloud Platform

Select the project in which you want to set up the cluster. Hover over Kubernetes Engine, then select the Clusters option.

Kubernetes Cluster Menu

Select the Create Cluster option and configure the cluster according to your needs. I have set up a basic cluster for a sample Kafka application.

Set-up Cluster -1

Give your cluster a name and change the settings if you have any other specific needs.

Set-up Cluster-2

I have allocated 2 CPU cores to each node for proper resource availability.

Connect to Cluster Command

Install the gcloud CLI on your system, then run the following command to connect to the cluster:

$ gcloud container clusters get-credentials k8 --zone us-central1-a --project knoldus-264306
Fetching cluster endpoint and auth data.
kubeconfig entry generated for k8.

After running the command, you will see the statements above. Now verify the cluster by listing its nodes.

$ kubectl get nodes
NAME                                STATUS   ROLES    AGE   VERSION
gke-k8-default-pool-de2de537-4n5g   Ready    <none>   27h   v1.13.11-gke.14
gke-k8-default-pool-de2de537-7dj1   Ready    <none>   27h   v1.13.11-gke.14
gke-k8-pool-1-36b3b91a-j56f         Ready    <none>   27h   v1.13.11-gke.14

As we can see, I have set up a cluster of 3 nodes. Next, deploy ZooKeeper, which Kafka uses for cluster coordination, by applying the zookeeper.yaml manifest.

$ kubectl apply -f zookeeper.yaml 
service/zk-cs created
poddisruptionbudget.policy/zk-pdb created
statefulset.apps/zk created

Running zookeeper.yaml creates the ZooKeeper service, PodDisruptionBudget, and StatefulSet.
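The manifest itself is not shown in this post. As a rough sketch (the image, port, and replica count here are assumptions you should adapt; persistent volumes are omitted for brevity), a zookeeper.yaml producing the objects above could look like:

```yaml
# Client service so brokers can reach ZooKeeper at a stable address
apiVersion: v1
kind: Service
metadata:
  name: zk-cs
spec:
  selector:
    app: zk
  ports:
  - port: 2181
    name: client
---
# Allow at most one ZooKeeper pod to be down during voluntary disruptions
apiVersion: policy/v1beta1   # use policy/v1 on Kubernetes 1.21+
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  selector:
    matchLabels:
      app: zk
  maxUnavailable: 1
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: zk
spec:
  serviceName: zk-cs
  replicas: 3          # odd number for a ZooKeeper quorum
  selector:
    matchLabels:
      app: zk
  template:
    metadata:
      labels:
        app: zk
    spec:
      containers:
      - name: zookeeper
        image: zookeeper:3.5   # example image; adapt to your setup
        ports:
        - containerPort: 2181
          name: client
```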

$ kubectl apply -f kafka.yaml 
poddisruptionbudget.policy/kafka-pdb created
statefulset.apps/kafka created

Running kafka.yaml creates the Kafka PodDisruptionBudget and StatefulSet.
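Likewise, kafka.yaml is not shown here. A minimal sketch follows; the image, port, replica count, and start-up command are assumptions to adapt, and the kafka-hs headless Service (which gives each broker the stable DNS name used in the commands below) may be defined in this file or separately:

```yaml
# Headless service: gives brokers stable DNS names like
# kafka-0.kafka-hs.default.svc.cluster.local
apiVersion: v1
kind: Service
metadata:
  name: kafka-hs
spec:
  clusterIP: None
  selector:
    app: kafka
  ports:
  - port: 9093
    name: server
---
apiVersion: policy/v1beta1   # use policy/v1 on Kubernetes 1.21+
kind: PodDisruptionBudget
metadata:
  name: kafka-pdb
spec:
  selector:
    matchLabels:
      app: kafka
  maxUnavailable: 1
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka-hs
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
      - name: kafka
        image: gcr.io/google_containers/kubernetes-kafka:1.0-10.2.1
        ports:
        - containerPort: 9093
          name: server
        command:
        - sh
        - -c
        # Point the brokers at the ZooKeeper client service created earlier
        - kafka-server-start.sh config/server.properties --override zookeeper.connect=zk-cs.default.svc.cluster.local:2181 --override listeners=PLAINTEXT://:9093
```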

$ kubectl run -ti --image=gcr.io/google_containers/kubernetes-kafka:1.0-10.2.1 kafka-produce --restart=Never --rm -- kafka-console-producer.sh --topic test --broker-list kafka-0.kafka-hs.default.svc.cluster.local:9093,kafka-1.kafka-hs.default.svc.cluster.local:9093,kafka-2.kafka-hs.default.svc.cluster.local:9093
If you don't see a command prompt, try pressing enter.


The command above runs a Kafka console producer inside the cluster, connected to all three brokers.

$ kubectl run -ti --image=gcr.io/google_containers/kubernetes-kafka:1.0-10.2.1 kafka-consume --restart=Never --rm -- kafka-console-consumer.sh --topic test --bootstrap-server kafka-0.kafka-hs.default.svc.cluster.local:9093
If you don't see a command prompt, try pressing enter.


This creates a Kafka consumer, which is now ready to consume data. I typed "Knoldus, welcome" into the producer, and it was delivered to the consumer.


Kafka runs as a cluster of brokers, and these brokers can be deployed across a Kubernetes system and made to land on different workers across separate fault domains. Kubernetes automatically recovers pods when nodes or containers fail, so it can do this for your brokers too.


Written by 

I always love to learn and explore new technologies. I have working skills in Linux, AWS, DevOps tools (Jenkins, Git, Maven, CI/CD, Ansible), scripting languages (shell/bash), and Docker, as well as the ELK stack and Grafana for logging and visualization.