How to configure horizontal pod autoscaling

Hello readers, in this post I’ll cover what horizontal pod autoscaling is and how to configure it.

Horizontal Pod Autoscaling

A HorizontalPodAutoscaler (HPA) automatically updates a workload resource such as a Deployment, ReplicaSet, or StatefulSet, changing its number of Pods so that the workload scales to match demand.

If the demand drops and the number of Pods is above the configured minimum, the HorizontalPodAutoscaler instructs the workload resource (the Deployment, StatefulSet, or other similar resource) to scale back down.

The implementation steps are:

  1. Create a Kubernetes deployment
  2. Create a Kubernetes service
  3. Create the HPA
  4. Increase the Load
  5. Stop the Load

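First, you will have to install a tool called Metrics Server.

Metrics Server collects metrics about your resources, such as CPU and memory usage, and the HPA reads these metrics to make its scaling decisions.

Metrics Server is installed from a YAML manifest named “components.yaml”.

Open this file and add “--kubelet-insecure-tls” to the args of the metrics-server container. This skips verification of the kubelet certificates, which is usually needed on local test clusters and should not be used in production. Then apply the file with kubectl apply -f components.yaml.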
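
The steps above can be sketched as a small shell helper. This is only an illustration under a few assumptions: the manifest is fetched from the Metrics Server project's latest GitHub release, `curl` and `kubectl` are on your PATH, and the function name `install_metrics_server` is a hypothetical wrapper rather than part of any tool. It refuses to apply the manifest until the flag has been added by hand.

```shell
#!/bin/sh
# Assumption: the manifest comes from the latest metrics-server GitHub release.
METRICS_SERVER_URL="https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml"

# Hypothetical helper; nothing runs until you call it.
install_metrics_server() {
  # Download the manifest only once, so a hand-edited copy is never overwritten.
  if [ ! -f components.yaml ]; then
    curl -fsSL -o components.yaml "$METRICS_SERVER_URL" || return 1
  fi

  # Refuse to continue until the flag has been added to the container args.
  if ! grep -q -- '--kubelet-insecure-tls' components.yaml; then
    echo "add --kubelet-insecure-tls to the metrics-server args in components.yaml, then rerun" >&2
    return 1
  fi

  kubectl apply -f components.yaml
}

# Usage: install_metrics_server && kubectl top nodes
```

Once the metrics-server Pod is ready, `kubectl top nodes` should start returning CPU and memory figures, which is a quick way to confirm that metrics are available to the HPA.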

Create a Kubernetes deployment and service

Create a manifest file (deployment-svc.yaml) that defines a Kubernetes deployment and service:

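apiVersion: apps/v1
kind: Deployment
metadata:
  name: server
  labels:
    app: server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: server
  template:
    metadata:
      labels:
        app: server
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 100m
          requests:
            cpu: 50m
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: server
  name: server
  namespace: default
spec:
  ports:
  - name: server
    port: 80
  selector:
    app: server
  sessionAffinity: None
  type: NodePort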

Apply this manifest file using the command:

kubectl apply -f deployment-svc.yaml

Check the deployment, service, and Pods by running the commands:

kubectl get deployments
kubectl get service
kubectl get pods

Now create the HPA by running the command:

kubectl autoscale deployment server --cpu-percent=50 --min=1 --max=10

This command autoscales the “server” deployment that we created, with a minimum of 1 Pod, a maximum of 10, and a CPU target of 50%. The HPA controller will increase and decrease the number of replicas (by updating the Deployment) to keep the average CPU utilization across all Pods at 50%.
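
To make the 50% target concrete, here is a small sketch of the core calculation described in the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), limited to the configured minimum and maximum. The function name `desired_replicas` is made up for illustration, and the sketch ignores the controller's tolerance and stabilization behaviour. CPU utilization here is a percentage of each Pod's CPU request.

```shell
#!/bin/sh
# desired_replicas CURRENT_REPLICAS CURRENT_CPU_PERCENT TARGET_CPU_PERCENT MIN MAX
# Illustrative helper only; the real controller also applies a tolerance
# and stabilization windows that are not modelled here.
desired_replicas() {
  current=$1
  usage=$2
  target=$3
  min=$4
  max=$5

  # Integer ceiling of (current * usage / target).
  desired=$(( (current * usage + target - 1) / target ))

  # Clamp the result to the configured bounds.
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi

  echo "$desired"
}

desired_replicas 1 250 50 1 10   # 1 Pod at 250% of its request -> 5
desired_replicas 4 20 50 1 10    # 4 Pods at 20% -> 2
desired_replicas 3 900 50 1 10   # would be 54, capped at the maximum -> 10
```

With --min=1 and --max=10 as in the command above, a single Pod running at 250% of its CPU request would be scaled to 5 replicas, and no amount of load can push the deployment past 10.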

Check the status of the newly created HPA by running the command:

kubectl get hpa

You can now increase the load to see how the autoscaler reacts. For this, you’ll start a different Pod to act as a client. The container within the client Pod runs in an infinite loop, sending queries to the server service. Type the command:

kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://server; done"

This command creates another Pod from the busybox image. Inside it, the wget loop repeatedly requests the nginx service and puts a lot of load on it.

Open another terminal and type the command:

kubectl get hpa server --watch

You can see the CPU usage go up and keep rising. You can also run kubectl get deployment to see the replica count increase; in this run it grew to 7 Pod replicas.

To stop the load, press Ctrl+C in the terminal where you ran the load-generator command.

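Second method

You can also configure the HPA with a manifest file. If you already created an HPA with kubectl autoscale above, delete it first (kubectl delete hpa server) so that two autoscalers do not manage the same deployment.

Let’s configure a HorizontalPodAutoscaler that targets the server deployment, using the autoscaling/v1 API version.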

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: servers
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: server
  minReplicas: 1
  maxReplicas: 15
  targetCPUUtilizationPercentage: 20

Save it as hpa.yaml and apply this manifest file:

kubectl apply -f hpa.yaml

Now check the status of the HPA by typing the command:

kubectl get hpa

Let’s generate some web traffic that is directed to our web servers and then analyse the results.
We’ll use hey, a simple web load generator, to generate load.

First, forward the port of the service to your local machine:

kubectl port-forward svc/server 5000:80

Run hey from your host terminal against http://localhost:5000 with the options -n 10000 and -c 2 to submit 10000 requests using two concurrent workers.
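
The hey invocation described here would look roughly like the sketch below. It assumes the port-forward to local port 5000 is still running in another terminal and that hey is installed on the host; `-n` sets the total number of requests and `-c` the number of concurrent workers. The variable names are only for readability.

```shell
#!/bin/sh
# Assumption: `kubectl port-forward svc/server 5000:80` is running in another terminal.
URL="http://localhost:5000/"
REQUESTS=10000   # total number of requests (-n)
WORKERS=2        # number of concurrent workers (-c)

if command -v hey >/dev/null 2>&1; then
  hey -n "$REQUESTS" -c "$WORKERS" "$URL"
else
  # Fall back to printing the command when hey is not on the PATH.
  echo "hey is not installed; would run: hey -n $REQUESTS -c $WORKERS $URL"
fi
```

While it runs, kubectl get hpa --watch in a second terminal shows the CPU figure and the replica count changing.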

We can observe a significant rise in CPU and memory consumption. We can also see the number of replicas in the status of the HPA.



This blog explained how to configure horizontal pod autoscaling. I hope you enjoyed this practical walkthrough. Try configuring horizontal pod autoscaling on your own workloads, and look up more examples as you go.