How to avoid outages in your Kubernetes cluster using PodDisruptionBudgets


Want to protect your application from becoming unavailable due to node disruptions? Then you are at the right place. In this blog, I will show you how to keep a node disruption from affecting your application, be it a node upgrade, a node failure, or an unintentional deletion of a pod, with the help of a PDB, that is, a PodDisruptionBudget. But before moving on to PDBs, let's look at the kinds of disruptions that might occur. There are two kinds of disruptions that can affect the availability of your application: voluntary and involuntary disruptions.

Types of Disruptions:

Involuntary Disruption

Pods do not disappear until a person or a controller destroys them, or there is an unavoidable hardware or system software error. Involuntary disruptions can happen in the following cases:

- A hardware failure of the physical machine backing the node
- The cluster administrator deletes the VM (instance) by mistake
- A cloud provider or hypervisor failure makes the VM disappear
- A kernel panic
- The node disappears from the cluster due to a cluster network partition
- Eviction of a pod because the node is out of resources

Voluntary Disruption

Voluntary disruptions, in contrast, are initiated by the application owner or the cluster administrator. Typical reasons include:

- Deleting the Deployment or other controller that manages the pods
- Updating a Deployment's pod template, which causes the pods to restart
- Directly deleting a pod
- Draining a node for repair or upgrade
- Draining a node to scale the cluster down
- Removing a pod from a node to permit something else to fit on that node

We can't do much about the involuntary ones, but the impact of voluntary ones can be limited, and yes, by using a PDB. A PodDisruptionBudget is a Kubernetes resource that defines a budget for voluntary disruptions: it tells the cluster the minimum number of pods that must remain available at all times, ensuring a baseline of availability and performance. An application owner can create a PodDisruptionBudget object (PDB) for each application. A PDB limits the number of pods of a replicated application that can be down simultaneously from voluntary disruptions and will not let the number of available pods drop below the defined threshold. Now, imagine the cluster administrator draining a node: Kubernetes will try to evict the pods on that node, but once the minimum threshold is reached, further eviction requests are temporarily rejected, and the PDB will not let any more pods be evicted until replacement pods are scheduled on another node.

PodDisruptionBudget:

To create a PDB, we write a Kubernetes YAML file that will look something like the one below. There are two points to note while creating a PDB resource. First, we set a threshold using the minAvailable or maxUnavailable parameter in the spec section. Second, the matchLabels selector must match your pod labels; it is through these labels that the PDB knows which pods to watch.

Deployment.yml

So we have a simple nginx deployment shown below with replicas equal to 4.
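The original post showed this manifest as an image, so here is a minimal sketch of what such a Deployment could look like (the resource name and the app: nginx label are assumptions; only replicas: 4 comes from the post):

```yaml
# deployment.yml -- a sketch; the name and labels are assumed
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 4                 # four replicas, as described in the post
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx            # the PDB below selects pods by this label
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
```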

Pdb.yml

And here is the YAML file for our PodDisruptionBudget with minAvailable equals 2.
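Again a sketch under the same assumptions: minAvailable: 2 comes from the post, the PDB name is made up, and the matchLabels selector must match the pod labels of the Deployment above:

```yaml
# pdb.yml -- a sketch; the name is assumed, minAvailable: 2 is from the post
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
spec:
  minAvailable: 2             # at least two matching pods must stay available
  selector:
    matchLabels:
      app: nginx              # must match the Deployment's pod labels
```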

Let us apply these configurations using the command kubectl apply -f deployment.yml -f pdb.yml.
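Assuming the file and resource names from the sketches above, kubectl should confirm each resource along these lines:

```sh
$ kubectl apply -f deployment.yml -f pdb.yml
deployment.apps/nginx created
poddisruptionbudget.policy/nginx-pdb created
```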

Now, if we look at the nodes, we have a two-node minikube cluster. Both nodes are in the Ready state, and our pods are deployed on the worker node (multinode-demo-m02), with the 4 replicas we specified in the deployment. Finally, the PDB reports Allowed Disruptions as 2: with minAvailable set to 2 and 4 replicas, at most two pods may be disrupted at a time, so at least two pods are up and running in the cluster at all times.
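These are the commands used to inspect that state (node and pod names come from this minikube multinode setup and will vary):

```sh
# both nodes should be in the Ready state
kubectl get nodes

# show each pod together with the node it landed on
kubectl get pods -o wide

# ALLOWED DISRUPTIONS should be 2: 4 replicas minus minAvailable: 2
kubectl get pdb
```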

Now let us drain the node using the kubectl drain command: kubectl drain multinode-demo-m02 --ignore-daemonsets. When the node starts draining, we can see that it successfully evicts two pods, but when it tries to evict the third pod, it reports that it cannot evict the pod, as this would violate the pod disruption budget. Getting our pods now with wide output shows that until two pods are up and running on the other node, the remaining pods on the drained node will not be evicted. And as soon as the other node can accommodate them, all the pods get scheduled on it.
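Here is a sketch of the drain and the kind of error kubectl reports when the budget blocks an eviction (the pod name is elided, and the exact wording may differ between Kubernetes versions):

```sh
$ kubectl drain multinode-demo-m02 --ignore-daemonsets
# two evictions succeed; the third is rejected with something like:
#   error when evicting pods/"nginx-..." (will retry after 5s):
#   Cannot evict pod as it would violate the pod's disruption budget.
```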

Now, what if we drain this node as well? Let us try. As we can see, two pods are still running, and the PDB will not let the drain complete until a new node is up and at least two pods are scheduled on it.
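Trying the same on the remaining node (multinode-demo is the default control-plane node name in a minikube multinode setup, an assumption here) should stall in the same way:

```sh
$ kubectl drain multinode-demo --ignore-daemonsets
# the two surviving pods cannot be evicted without violating minAvailable: 2,
# so the drain keeps retrying until the pods can be rescheduled on a new node
```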

So, in this way, you can protect your application from becoming unavailable by setting a threshold, ensuring that a certain number of pods stay available at all times and continue serving traffic in the face of disruption.

Reference:

You can refer to this blog for an understanding of the AWS Node Termination Handler: https://ec2spotworkshops.com/using_ec2_spot_instances_with_eks/070_selfmanagednodegroupswithspot/deployhandler.html

Link to the blog on how to safely upgrade nodes in a Kubernetes cluster: https://blog.knoldus.com/safely-upgrading-nodes-in-kubernetes-cluster/
