Safely Upgrading Nodes in a Kubernetes Cluster

Reading Time: 3 minutes

In this blog, we will look at scenarios where you might have to take down a node in your cluster for maintenance purposes, such as a kernel upgrade, hardware maintenance, upgrading base software, or applying security patches, and we will see the options available to handle such cases.

So let’s assume that you have a cluster with a few nodes and pods serving applications.


What happens when one of these nodes goes down?

Let’s assume that the node with the blue and green pods goes down. The pods on that node will no longer be accessible, and depending upon how you deployed those pods, your users may be impacted.

For example, since you have multiple replicas of the blue pod, the users accessing the blue application are not impacted; they are served by the other blue pods that are still online. However, users accessing the green pod are impacted, as that was the only pod running the green application.
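
As a quick check, you can list the pods along with the nodes they are scheduled on, and see how many replicas back each application (this assumes the applications are managed by Deployments/ReplicaSets):

kubectl get pods -o wide
kubectl get deployments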

Now, what will Kubernetes do in this case?

If the node comes back online immediately, the kubelet process starts and the pods come back online. However, if the node is down for more than 5 minutes (the default pod eviction timeout), the pods are terminated from that node and Kubernetes considers them dead.
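
You can watch this from the control plane side; for example, the following commands show the node's status and the events recorded against it (node-1 is just an example name):

kubectl get nodes
kubectl describe node node-1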

However, if the pods are part of a ReplicaSet, they are recreated on other nodes.

Since the blue pod was part of a ReplicaSet, a new pod was created for it on another node. However, since the green pod was not part of a ReplicaSet, it's just gone!

Thus, if you have maintenance tasks to perform on a node, and you know that the pods running on that node have other replicas, that it's okay for them to go down for a short period of time, and that the node will come back online within five minutes, you can do a quick upgrade and reboot.
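
Before taking that shortcut, it is worth confirming what is actually running on the node. A quick way to list only the pods scheduled on node-1 (the node name is just an example) is:

kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=node-1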

However, if you are not sure that the node is going to be back online within five minutes (well, you cannot be sure it is going to come back at all), then there is a safer way to do it.

A safer way to perform patches

We can purposefully drain the node so that the pods are moved to other nodes in the cluster. Well, technically they are not moved. When you drain the node, the pods are gracefully terminated on that node and recreated on another.

We can easily drain our node named node-1, which contains the blue and green pods, using the following command:

kubectl drain node-1
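
In practice, drain may refuse to proceed if the node runs DaemonSet-managed pods or pods using local (emptyDir) data, so you may need extra flags. A commonly used form is shown below; treat it as a sketch and check the flags against your kubectl version (older versions use --delete-local-data instead of --delete-emptydir-data):

kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data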


The node is also cordoned, or marked as unschedulable, when we drain it. This simply means no new pods can be scheduled on this node until you specifically remove the restriction.
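
You can verify this by listing the nodes; a drained node typically shows a status like Ready,SchedulingDisabled:

kubectl get nodes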

Now that the pods are safe on the other nodes, you can reboot the first node (node-1).

When node-1 comes back online, it is still unschedulable. You then need to uncordon it so that pods can be scheduled on it again. You can uncordon the node with the following command:

kubectl uncordon node-1

Note that the pods that were moved to the other nodes don't automatically fall back onto this node. Only if any of those pods are deleted, or if new pods are created in the cluster, may they be scheduled on this node again.
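
If you want workloads back on node-1 sooner, one option implied above is to delete a replica-managed pod and let its replacement be scheduled fresh; the scheduler may still pick a different node, though:

# the pod name below is hypothetical; replace it with one of your replica-managed pods
kubectl delete pod blue-pod-abc123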

Extra 💡

Apart from “drain” and “uncordon”, there is another command named “cordon”. cordon simply marks a node as unschedulable. It does not terminate or move the pods already running on the node. It simply makes sure that new pods are not scheduled on that node.
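
For example, to stop new pods from being scheduled on node-1 without evicting anything already running on it:

kubectl cordon node-1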

Conclusion

In this blog, we discussed how we can safely upgrade the nodes in a Kubernetes cluster without affecting the users of an application. Happy Learning!
