Hello readers, in this post I'll cover what horizontal pod autoscaling is and how to configure it.
Horizontal Pod Autoscaling
A HorizontalPodAutoscaler (HPA) automatically updates a workload resource, such as a Deployment, ReplicaSet, or StatefulSet, scaling it to match demand. Horizontal scaling means that the response to increased load is to run more Pods.
If the load drops and the number of Pods is above the configured minimum, the HorizontalPodAutoscaler instructs the workload resource (the Deployment, StatefulSet, or other similar resource) to scale back down.

The implementation steps are:
- Create a Kubernetes deployment
- Create a Kubernetes service
- Create the HPA
- Increase the Load
- Stop the Load
First, you have to install a tool called Metrics Server. Download its manifest:
wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Metrics Server collects resource metrics, such as CPU and memory usage, from the kubelets and exposes them through the Kubernetes Metrics API, which the HPA reads.

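The download gives you a YAML file named "components.yaml".
Open this file and add "--kubelet-insecure-tls" to the container args of the metrics-server Deployment. This flag skips verification of the kubelet certificates, so use it only on test clusters. Then install Metrics Server by running "kubectl apply -f components.yaml".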

Create a Kubernetes deployment and service
Create a manifest file (deployment-svc.yaml) that contains the Deployment and the Service:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: server
  labels:
    app: server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: server
  template:
    metadata:
      labels:
        app: server
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - containerPort: 80
          resources:
            limits:
              cpu: 100m
            requests:
              cpu: 50m
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: server
  name: server
  namespace: default
spec:
  ports:
    - name: server
      port: 80
  selector:
    app: server
  sessionAffinity: None
  type: NodePort

Apply this manifest file using the command
kubectl apply -f deployment-svc.yaml

Check the Deployment, Service, and Pods by running the commands
kubectl get deployments
kubectl get service
kubectl get pods

Now create the HPA by running the command
kubectl autoscale deployment server --cpu-percent=50 --min=1 --max=10

This command creates an HPA for the "server" Deployment that we created, with a minimum of 1 Pod, a maximum of 10 Pods, and a target CPU utilization of 50%. The HPA controller increases and decreases the number of replicas (by updating the Deployment) to maintain an average CPU utilization of 50% across all Pods. Utilization is measured relative to each Pod's CPU request, which is 50m here.
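To make the scaling rule concrete, here is a minimal sketch of the replica calculation described in the Kubernetes documentation: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), clamped to the configured minimum and maximum. The function name desired_replicas and its argument order are made up for this illustration. The real controller also applies a tolerance and a scale-down stabilization window, which this sketch leaves out.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only (not part of kubectl).
# Usage: desired_replicas CURRENT_REPLICAS CURRENT_CPU_PERCENT TARGET_CPU_PERCENT MIN MAX
desired_replicas() {
  current=$1
  usage=$2
  target=$3
  min=$4
  max=$5

  # ceil(current * usage / target) using integer arithmetic
  desired=$(( (current * usage + target - 1) / target ))

  # Clamp the result to the configured --min and --max
  if [ "$desired" -lt "$min" ]; then
    desired=$min
  fi
  if [ "$desired" -gt "$max" ]; then
    desired=$max
  fi

  echo "$desired"
}

# 1 replica at 250% average utilization with a 50% target
desired_replicas 1 250 50 1 10   # prints 5

# 5 replicas at 20% average utilization with a 50% target
desired_replicas 5 20 50 1 10    # prints 2
```

With --cpu-percent=50, a single Pod running at 250% of its CPU request would therefore be scaled to 5 replicas, and the count would shrink again once the average utilization falls well below the target.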
Check the status of the newly created HPA by running the command
kubectl get hpa

You can now increase the load to see how the autoscaler reacts. To do this, start a different Pod to act as a client. The container within the client Pod runs in an infinite loop, sending requests to the server Service. Run the command
kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://server; done"
This command creates another Pod from the busybox image and runs wget in a loop against the nginx Service, which puts sustained load on it.
Open another terminal and run the command
kubectl get hpa server --watch
You can see the CPU usage go up and keep rising. You can also run kubectl get deployment to watch the replica count increase; in this run it grew to 7 Pod replicas.
To stop the load, press Ctrl+C in the terminal where you ran the load-generator command. Once the load stops, the HPA scales the Deployment back down after a few minutes.
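Second method
You can also configure the HPA with a manifest file. If you created an HPA with kubectl autoscale above, delete it first (kubectl delete hpa server) so that two autoscalers do not target the same Deployment.
Let's configure a HorizontalPodAutoscaler that targets the server Deployment, using the autoscaling/v1 API version (hpa.yaml):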
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: servers
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: server
  minReplicas: 1
  maxReplicas: 15
  targetCPUUtilizationPercentage: 20
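Because CPU utilization is expressed as a percentage of the container's CPU request, it can help to translate the target into millicores. The sketch below does that arithmetic for the 50m request in the Deployment manifest; target_millicores is a hypothetical helper written for this post.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Usage: target_millicores REQUEST_MILLICORES TARGET_PERCENT
target_millicores() {
  echo $(( $1 * $2 / 100 ))
}

# targetCPUUtilizationPercentage: 20 with a 50m request
target_millicores 50 20   # prints 10

# --cpu-percent=50 (first method) with the same 50m request
target_millicores 50 50   # prints 25
```

So with this manifest the HPA starts adding replicas once the Pods average noticeably more than about 10m of CPU each, which is why even light traffic triggers scaling.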

Apply this manifest file
kubectl apply -f hpa.yaml
Now check the status of the HPA by running the command
kubectl get hpa

Let's generate some web traffic directed at our web servers and then analyse the results.
We'll use hey, a simple web load generator, to generate the load.
First, forward a local port to the Service
kubectl port-forward svc/server 5000:80

In a second terminal on your host, run hey against http://localhost:5000 with the options -n 10000 and -c 2 to send 10000 requests using two concurrent workers.
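The original command is not shown, so the following is a sketch of what the invocation could look like. It assumes hey is installed on the host and that the port-forward to localhost:5000 is still running; the variable names are only for readability.

```shell
#!/bin/sh
# Assumed values: the URL matches the port-forward on local port 5000.
URL="http://localhost:5000/"
REQUESTS=10000   # hey -n: total number of requests
WORKERS=2        # hey -c: number of concurrent workers

if command -v hey >/dev/null 2>&1; then
  hey -n "$REQUESTS" -c "$WORKERS" "$URL"
else
  echo "hey is not installed; see https://github.com/rakyll/hey" >&2
fi
```

When the run finishes, hey prints a summary of response times and status codes, which you can compare against the replica count reported by the HPA.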

We should observe a significant rise in CPU and memory consumption. We can also see the number of replicas in the status of the HPA.

Reference:
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
https://blog.knoldus.com/introduction-to-google-kubernetes/
Conclusion:
This blog explained how to configure horizontal pod autoscaling. I hope you enjoyed this practical walkthrough. Try configuring horizontal pod autoscaling on your own workloads, and look up more examples as you go.