When Kubernetes schedules a Pod, it’s important that the containers have enough resources to actually run. If you schedule a large application on a node with limited resources, it is possible for the node to run out of memory or CPU resources and for things to stop working!
Requests and Limits
Requests and limits are the mechanisms Kubernetes uses to control resources such as CPU and memory. Requests are what the container is guaranteed to get.
If a container requests a resource, Kubernetes will only schedule it on a node that can give it that resource. Limits, on the other hand, make sure a container never goes above a certain value. The container is only allowed to go up to the limit, and then it is restricted.
It is important to remember that the limit can never be lower than the request. If you try this, Kubernetes will throw an error and won’t let you run the container.
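As an illustration, the fragment below sketches a container spec that would be rejected, because its CPU limit (100m) is lower than its CPU request (200m). The container name is a placeholder chosen for this example.

```yaml
# Invalid on purpose: the CPU limit is lower than the CPU request,
# so Kubernetes refuses to create the Pod.
containers:
- name: bad-example        # hypothetical name
  image: busybox
  resources:
    requests:
      cpu: "200m"
    limits:
      cpu: "100m"          # would need to be at least 200m to be accepted
```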
Requests and limits are on a per-container basis. While Pods usually contain a single container, it’s common to see Pods with multiple containers as well. Each container in the Pod gets its own individual limit and request, but because Pods are always scheduled as a group, you need to add the limits and requests for each container together to get an aggregate value for the Pod.
There are two types of resources: CPU and memory.
The Kubernetes scheduler uses these to figure out where to run your pods.
A typical Pod spec for resources might look something like this. This pod has two containers:
containers:
- name: container1
  image: busybox
  resources:
    requests:
      memory: "32Mi"
      cpu: "200m"
    limits:
      memory: "64Mi"
      cpu: "200m"
- name: container2
  image: busybox
  resources:
    requests:
      memory: "96Mi"
      cpu: "300m"
    limits:
      memory: "192Mi"
      cpu: "750m"
Each container in the Pod can set its own requests and limits, and these are all additive. So in the above example, the Pod has a total request of 500m CPU and 128MiB of memory, and a total limit of 950m CPU and 256MiB of memory.
CPU resources are defined in millicores. If your container needs two full cores to run, you would put the value “2000m”. If your container only needs ¼ of a core, you would put a value of “250m”.
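The two values just mentioned could be written as in the sketch below. Kubernetes also accepts whole or fractional core counts, so "2" means the same as "2000m" and "0.25" the same as "250m". The container names are placeholders.

```yaml
containers:
- name: two-cores          # hypothetical name
  image: busybox
  resources:
    requests:
      cpu: "2000m"         # two full cores; "2" is equivalent
- name: quarter-core       # hypothetical name
  image: busybox
  resources:
    requests:
      cpu: "250m"          # a quarter of a core; "0.25" is equivalent
```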
One thing to keep in mind about CPU requests is that if you put in a value larger than the core count of your biggest node, your pod will never be scheduled. Let’s say you have a pod that needs four cores, but your Kubernetes cluster is made up of dual-core VMs. Your pod will never be scheduled!
Unless your app is specifically designed to take advantage of multiple cores (scientific computing and some databases come to mind), it is usually a best practice to keep the CPU request at ‘1’ or below, and run more replicas to scale it out. This gives the system more flexibility and reliability.
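A minimal sketch of that pattern is a Deployment that keeps the per-container CPU request small and scales out with replicas instead. The name, labels, image, and replica count below are assumptions for illustration, not values from this article.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                # hypothetical name
spec:
  replicas: 4              # scale out with more Pods rather than more cores per Pod
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx       # placeholder image
        resources:
          requests:
            cpu: "500m"    # stays below one core
```

Four small replicas can be spread across several nodes, whereas a single two-core Pod needs one node with two free cores.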
Memory resources are defined in bytes. Normally, you give a mebibyte value for memory (a mebibyte is 1,048,576 bytes, so slightly larger than a megabyte), but you can give anything from plain bytes upward, using suffixes such as Ki, Mi, and Gi.
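As a rough sketch of the notation, the fragment below writes one value with the Mi suffix and one as plain bytes. The amounts are arbitrary examples.

```yaml
resources:
  requests:
    memory: "64Mi"         # 64 mebibytes = 64 * 1024 * 1024 bytes
  limits:
    memory: "134217728"    # plain bytes; the same amount as "128Mi"
```

Note that "64M" (megabytes, 64,000,000 bytes) is also accepted and is slightly smaller than "64Mi", so it is worth being consistent about which suffix you use.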
Just like CPU, if you put in a memory request that is larger than the amount of memory on your nodes, the pod will never be scheduled.
Unlike CPU resources, memory cannot be compressed. Because there is no way to throttle memory usage, if a container goes past its memory limit it will be terminated. If your pod is managed by a Deployment, StatefulSet, DaemonSet, or another type of controller, the workload comes back automatically: the terminated container is restarted, and the controller spins up a replacement if the Pod itself is lost.
It is important to remember that you cannot set requests that are larger than the resources provided by your nodes. For example, if you have a cluster of dual-core machines, a Pod with a request of 2.5 cores will never be scheduled! You can find the total resources for Kubernetes Engine VMs in the Kubernetes Engine documentation.
After creating Namespaces, you can lock them down using ResourceQuotas. ResourceQuotas are very powerful, but let’s just look at how you can use them to restrict CPU and Memory resource usage.
A Quota for resources might look something like this:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: sample
spec:
  hard:
    requests.cpu: 500m
    requests.memory: 100Mi
    limits.cpu: 700m
    limits.memory: 500Mi
Looking at this example, you can see there are four sections. Configuring each of these sections is optional.
requests.cpu is the maximum combined CPU requests in millicores for all the containers in the Namespace. In the above example, you can have 50 containers with 10m requests, five containers with 100m requests, or even one container with a 500m request, as long as the total requested CPU in the Namespace is no more than 500m!
requests.memory is the maximum combined memory requests for all the containers in the Namespace. In the above example, you can have 50 containers with 2MiB requests, five containers with 20MiB requests, or even a single container with a 100MiB request, as long as the total requested memory in the Namespace is no more than 100MiB!
limits.cpu is the maximum combined CPU limits for all the containers in the Namespace. It’s just like requests.cpu but for the limit.
limits.memory is the maximum combined memory limits for all containers in the Namespace. It’s just like requests.memory but for the limit.
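To make the bookkeeping concrete, here is a sketch of a Pod that would fit inside a quota of 500m/100Mi for requests and 700m/500Mi for limits, as in the sample above. The Namespace name `dev` and the Pod name are assumptions for illustration. Once a quota covers CPU and memory, Pods in that Namespace need to declare those values (or receive defaults from a LimitRange); otherwise they are rejected.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: quota-demo         # hypothetical name
  namespace: dev           # assumed Namespace that holds the quota
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    resources:
      requests:
        cpu: 100m          # counts 100m against the 500m requests.cpu budget
        memory: 20Mi       # counts 20Mi against the 100Mi requests.memory budget
      limits:
        cpu: 100m          # counts 100m against the 700m limits.cpu budget
        memory: 50Mi       # counts 50Mi against the 500Mi limits.memory budget
```

Five such Pods would use up the requests.cpu and requests.memory budgets exactly, so a sixth would be refused.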
If you are using a production and development Namespace (in contrast to a Namespace per team or service), a common pattern is to put no quota on the production Namespace and strict quotas on the development Namespace. This allows production to take all the resources it needs in case of a spike in traffic.
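That pattern might be sketched as follows. The Namespace names, the quota name, and all of the quota values are illustrative assumptions rather than recommendations.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production         # no ResourceQuota is created in this Namespace
---
apiVersion: v1
kind: Namespace
metadata:
  name: development
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota          # hypothetical name
  namespace: development   # the quota only constrains this Namespace
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 4Gi
    limits.cpu: "4"
    limits.memory: 8Gi
```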
You can also create a LimitRange in your Namespace. Unlike a Quota, which looks at the Namespace as a whole, a LimitRange applies to an individual container. This can help prevent people from creating super tiny or super large containers inside the Namespace.
A LimitRange might look something like this:
apiVersion: v1
kind: LimitRange
metadata:
  name: sample
spec:
  limits:
  - default:
      cpu: 600m
      memory: 100Mi
    defaultRequest:
      cpu: 100m
      memory: 50Mi
    max:
      cpu: 1000m
      memory: 200Mi
    min:
      cpu: 10m
      memory: 10Mi    # must not exceed defaultRequest.memory
    type: Container
Looking at this example, you can see there are four sections. Again, setting each of these sections is optional.
The default section sets up the default limits for a container in a pod. If you set these values in the LimitRange, any containers that don’t explicitly set these themselves will get assigned the default values.
The defaultRequest section sets up the default requests for a container in a pod. If you set these values in the LimitRange, any containers that don’t explicitly set these themselves will get assigned the default values.
The max section will set up the maximum limits that a container in a Pod can set. The default section cannot be higher than this value. Likewise, limits set on a container cannot be higher than this value. It is important to note that if this value is set and the default section is not, any containers that don’t explicitly set these values themselves will get assigned the max values as the limit.
The min section sets up the minimum Requests that a container in a Pod can set. The defaultRequest section cannot be lower than this value. Likewise, requests set on a container cannot be lower than this value either. It is important to note that if this value is set and the defaultRequest section is not, the min value becomes the defaultRequest value too.
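As a sketch of how the defaults play out, assume a Namespace with a LimitRange whose default is 600m CPU and 100Mi memory and whose defaultRequest is 100m CPU and 50Mi memory. A container submitted without a resources block, like the hypothetical one below, would then be admitted as if it had declared the values shown in the comments.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: defaults-demo      # hypothetical name
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    # No resources block here. With the LimitRange assumed above,
    # the container is treated as if it had specified:
    #   resources:
    #     requests:
    #       cpu: 100m
    #       memory: 50Mi
    #     limits:
    #       cpu: 600m
    #       memory: 100Mi
```

A container that explicitly asked for more than the max or less than the min would be rejected rather than adjusted.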