
Creating Elastic Web Applications with Kubernetes Horizontal Pod Autoscaling

Zack Wynne, Apr 11, 2022, 8:47:09 AM

As the resource requirements of our application change over time, provisioning an appropriate amount of CPU and memory becomes a challenge. If we simply provision enough resources to support the maximum expected load, we waste those resources during periods of low utilization. On the other hand, if we under-provision, we risk the service failing to accommodate our user base.

 

To solve this problem, we need a way to automatically add and remove compute resources for a workload as demand changes.

 

Enter the HorizontalPodAutoscaler

One of the major benefits of a container orchestration platform like Kubernetes is the ability to autoscale workloads based on key metrics. Kubernetes offers several features in this category; here we'll focus on Horizontal Pod Autoscaling (HPA).

 

HPA specifically refers to scaling a workload, such as a Deployment, horizontally by adding or removing Pods. Scaling decisions can be based on resource metrics like CPU and memory utilization, or on custom metrics that we define and that our application reports. For resource metrics, HPA relies on the Kubernetes Metrics Server being installed, which serves as the source of container metrics. The autoscaler itself consists of a Kubernetes API resource (HorizontalPodAutoscaler) and a controller.

 

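In a typical scenario, the HPA controller periodically (every 15 seconds by default) fetches the current metric values for the target's Pods from the metrics API, compares them against the target defined in the HorizontalPodAutoscaler, and updates the Deployment's replica count accordingly.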
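At the heart of that loop is the ratio described in the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance of 1.0 (10% by default) and clamped to the policy's minimum and maximum. The sketch below is a simplified Python model of that calculation; the function name is our own, and it deliberately ignores unready Pods, missing metrics, stabilization windows and scaling-rate policies.

```python
import math


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, min_replicas: int,
                     max_replicas: int, tolerance: float = 0.1) -> int:
    """Simplified model of the HPA replica calculation.

    Assumes every Pod is ready and reporting metrics. The real controller
    also handles missing metrics, unready Pods, stabilization windows and
    scaling-rate policies, none of which are modelled here.
    """
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count alone.
        desired = current_replicas
    else:
        # desiredReplicas = ceil(currentReplicas * current / target)
        desired = math.ceil(current_replicas * ratio)
    # Clamp the result to the policy's minReplicas/maxReplicas bounds.
    return max(min_replicas, min(max_replicas, desired))


# One Pod at 100% CPU against a 60% target, bounded to 1..5 replicas.
print(desired_replicas(1, 100, 60, min_replicas=1, max_replicas=5))  # 2
```

The tolerance band is what keeps the controller from flapping between replica counts when utilization hovers just above or below the target.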

Local HPA Demo with k3d

For a quick hands-on demo, we'll use k3d, which makes it easy to create k3s clusters (k3s is a lightweight Kubernetes distribution suited to IoT, local development, etc.) inside Docker containers. To follow along, install k3d on your platform by following the official documentation.

 

First, let's create a cluster:

$ k3d cluster create hpa-demo

 

This should complete fairly quickly. Once k3d has finished provisioning the cluster, we can confirm that kubectl is connected to it:

$ kubectl get node
---
NAME                    STATUS   ROLES                  AGE     VERSION
k3d-hpa-demo-server-0   Ready    control-plane,master   2m22s   v1.22.7+k3s1

 

k3s, and therefore k3d, installs the Metrics Server by default. If you're using a Kubernetes distribution that doesn't, you can install it by following the official installation instructions. We can confirm the Metrics Server is running with this command:

$ kubectl get pods -l k8s-app=metrics-server -n kube-system
---
NAME                             READY   STATUS    RESTARTS   AGE
metrics-server-ff9dbcb6c-6zrvc   1/1     Running   0          3m26s

 

We can also use the kubectl top command to track the resource utilization of nodes and pods:

$ kubectl top node
---
NAME                    CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
k3d-hpa-demo-server-0   113m         2%     755Mi           9%
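The `m` and `Mi` suffixes in this output, like the `200m` and `128Mi` values we use below, are Kubernetes resource quantities: `m` means thousandths of a CPU core, and `Ki`/`Mi`/`Gi` are powers of 1024 bytes. The helpers below are a minimal, hypothetical parser covering only these common suffixes and their decimal siblings; they are not the full Kubernetes quantity grammar (exponent forms and the `P`/`E` suffixes, for example, are not handled).

```python
from decimal import Decimal

# Binary (power-of-1024) and decimal (power-of-1000) memory suffixes.
_MEMORY_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '113m' or '0.5' to millicores."""
    if quantity.endswith("m"):
        return int(Decimal(quantity[:-1]))
    return int(Decimal(quantity) * 1000)


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '755Mi' or '128M' to bytes."""
    # Try two-character suffixes first so 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_MEMORY_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            number = Decimal(quantity[: -len(suffix)])
            return int(number * _MEMORY_SUFFIXES[suffix])
    return int(Decimal(quantity))


print(cpu_to_millicores("113m"), memory_to_bytes("755Mi"))
```

Using `Decimal` rather than floats keeps values like `0.5` cores converting to exactly 500 millicores.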

 

Now that the necessary infrastructure is in place, let's create an example Deployment with CPU and memory limits, along with a Service in front of it.

$ kubectl create deployment nginx --image=nginx:stable-alpine --port=80 && \
  kubectl set resources deployment/nginx -c=nginx --limits=cpu=200m,memory=128Mi
---
deployment.apps/nginx created
deployment.apps/nginx resource requirements updated
$ kubectl create service clusterip nginx --tcp=80
---
service/nginx created
$ kubectl get deployment,service
---
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx   1/1     1            1           2m4s

NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.43.0.1      <none>        443/TCP   32m
service/nginx        ClusterIP   10.43.53.169   <none>        80/TCP    61s

 

Once we have these resources in place, we can create an autoscaling policy:

$ kubectl autoscale deployment nginx --min=1 --max=5 --cpu-percent=60
---
horizontalpodautoscaler.autoscaling/nginx autoscaled

 

This policy tells the HPA controller to aim for an average CPU utilization of 60% across our Pods, measured as a percentage of each Pod's CPU request, while never scaling below 1 replica or above 5. We can check the status of our policy with:

$ kubectl get hpa
---
NAME    REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
nginx   Deployment/nginx   0%/60%    1         5         1          2m59s
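Because we only set limits on the nginx container, Kubernetes defaulted its CPU request to the same 200m, and HPA measures CPU utilization against that request. The 60% target therefore corresponds to an average of about 120m per Pod. The hypothetical helper below sketches that arithmetic, assuming one container per Pod and that every Pod is ready; it divides total usage by total requests, which equals the plain average when all Pods request the same amount.

```python
def cpu_utilization_percent(usage_millicores: list[int],
                            request_millicores: list[int]) -> int:
    """Average CPU utilization across Pods, as a whole percent of requests."""
    if not usage_millicores or len(usage_millicores) != len(request_millicores):
        raise ValueError("need one request per Pod usage sample")
    return sum(usage_millicores) * 100 // sum(request_millicores)


REQUEST_M = 200      # request defaulted from the 200m limit
TARGET_PERCENT = 60  # from --cpu-percent=60

# Average per-Pod CPU usage that corresponds to the target.
target_millicores = REQUEST_M * TARGET_PERCENT // 100
print(target_millicores)  # 120
```

For instance, if the single nginx Pod were pinned at its 200m limit, it would report 100% utilization, and the documented ratio ceil(1 × 100 / 60) gives 2 replicas, which lines up with the first rescale event in the load test below.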

 

Next, let's create another Deployment that repeatedly calls our Service to generate some load:

$ kubectl create deployment load-test --image=busybox --replicas 6 -- \
  /bin/sh -c 'while true; do wget -qO- http://nginx.default.svc; done'

 

After a minute or two, we should see the HPA start adding Pods to the Deployment:

$ kubectl get events | grep Rescale
---
15m         Normal    SuccessfulRescale        horizontalpodautoscaler/nginx     New size: 2; reason: cpu resource utilization (percentage of request) above target
$ kubectl get pods -l app=nginx
---
NAME                     READY   STATUS    RESTARTS   AGE
nginx-64d84d6958-w82mq   1/1     Running   0          31m
nginx-64d84d6958-hr222   1/1     Running   0          15m

 

Finally, let's remove the load-test Deployment:

$ kubectl delete deployment/load-test
---
deployment.apps "load-test" deleted

 

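The Deployment should eventually scale back down. This takes several minutes, because by default the HPA waits out a five-minute stabilization window before removing Pods: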

$ kubectl get hpa nginx
---
NAME    REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
nginx   Deployment/nginx   0%/60%    1         5         1          38m
$ kubectl get events | grep Rescale
---
27m         Normal    SuccessfulRescale        horizontalpodautoscaler/nginx     New size: 2; reason: cpu resource utilization (percentage of request) above target
4m9s        Normal    SuccessfulRescale        horizontalpodautoscaler/nginx     New size: 1; reason: All metrics below target
$ kubectl get pods -l app=nginx
---
NAME                     READY   STATUS    RESTARTS   AGE
nginx-64d84d6958-w82mq   1/1     Running   0          43m
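That delay comes from the default scale-down stabilization window of 300 seconds: the HPA remembers its recent replica recommendations and, when scaling down, goes no lower than the highest recommendation made within the window, while scale-up uses a 0-second window by default. The sketch below is a simplified Python model of that rule; the function name and the `(timestamp, replicas)` history format are our own, and rate-limiting policies are not modelled.

```python
from typing import List, Tuple

SCALE_DOWN_WINDOW_S = 300  # default scale-down stabilization window (5 minutes)


def stabilized_replicas(history: List[Tuple[float, int]], now: float,
                        current_replicas: int,
                        window_s: float = SCALE_DOWN_WINDOW_S) -> int:
    """Simplified model of HPA scale-down stabilization.

    `history` is a chronological list of (timestamp_seconds, recommendation)
    pairs whose last entry is the recommendation computed at `now`.
    """
    newest = history[-1][1]
    # Highest recommendation made within the stabilization window.
    window_max = max(rec for ts, rec in history if now - ts <= window_s)
    result = current_replicas
    # Scale-up uses a 0 s window by default, so act on the newest value.
    if newest > result:
        result = newest
    # Scale down no further than the highest recent recommendation.
    if result > window_max:
        result = window_max
    return result


# Load dropped two minutes ago, but a recommendation of 2 is still in the window.
print(stabilized_replicas([(0, 2), (60, 1), (120, 1)], now=120, current_replicas=2))  # 2
```

Once the last recommendation of 2 ages out of the window, the same call returns 1, which is why the scale-down event appears well after the load test was deleted.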

 

Scaling on Custom Metrics with Prometheus Adapter

In addition to scaling on container resource metrics, we can also scale on custom metrics. If you already monitor your infrastructure with Prometheus, you can use the Prometheus Adapter to expose your Prometheus metrics as custom metrics for use in HPA policies. For brevity, we'll assume you already have Prometheus configured for your cluster; to install the Prometheus Adapter, follow the official instructions. Note that the Prometheus Adapter can also serve resource metrics in place of the Kubernetes Metrics Server.

 

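With both Prometheus and the Prometheus Adapter added to our cluster, the flow looks like this: Prometheus scrapes metrics from our application, the Prometheus Adapter queries Prometheus and serves the results through the Kubernetes custom metrics API, and the HPA controller reads from that API when making scaling decisions.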

As a practical example, at Minds we run PHP-FPM and scale on a custom metric: the number of busy PHP processes relative to the maximum number of processes we allow per container.

 

The exporter we use for PHP-FPM exposes two metrics of interest: phpfpm_active_processes and phpfpm_total_processes. We can use PromQL to query Prometheus for the current utilization of the processes with something like this:

(sum(phpfpm_active_processes) / sum(phpfpm_total_processes)) * 100
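Note that this query divides the sum of active processes by the sum of total processes across all matching series, which is not the same as averaging each container's own utilization when pool sizes differ. The Python sketch below mirrors the query's arithmetic on made-up sample values (the numbers are purely illustrative, not measurements from our cluster) and contrasts it with a mean of per-container ratios.

```python
def process_utilization_percent(active: list[float], total: list[float]) -> float:
    """Python mirror of (sum(active) / sum(total)) * 100 over a set of series."""
    if len(active) != len(total) or sum(total) == 0:
        raise ValueError("need matching process counts with a non-zero total")
    return sum(active) / sum(total) * 100


# Hypothetical per-container samples, for illustration only.
active_processes = [4, 10, 1]
total_processes = [10, 20, 10]

ratio_of_sums = process_utilization_percent(active_processes, total_processes)
# For comparison: averaging each container's own utilization instead.
mean_of_ratios = sum(
    a / t for a, t in zip(active_processes, total_processes)
) / len(total_processes) * 100

print(ratio_of_sums)  # 37.5
```

Here the ratio of sums is 37.5% while the mean of ratios is about 33.3%, because the larger 20-process pool carries more weight in the sum.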

 

With this query, we can add a custom rule to Prometheus Adapter in order to execute this query on our behalf and present the result as a custom metric for use in our autoscaling policies (see the official documentation for more details on the rule configuration syntax for Prometheus Adapter):

rules:
- seriesQuery: 'phpfpm_total_processes'
  # Map Prometheus labels onto Kubernetes namespaces and Deployments
  resources:
    template: '<<.Resource>>'
    overrides:
      kubernetes_namespace: { resource: 'namespace' }
      app_minds_io_name: { group: 'apps', resource: 'deployment' }
  # Expose the series under the name our HPA policies will reference
  name:
    matches: 'phpfpm_total_processes'
    as: 'engine_process_utilization'
  # Busy processes as a percentage of total processes, per queried object
  metricsQuery: '(sum(phpfpm_active_processes{<<.LabelMatchers>>}) by (<<.GroupBy>>) / sum(phpfpm_total_processes{<<.LabelMatchers>>}) by (<<.GroupBy>>)) * 100'
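When the HPA requests this metric, the Prometheus Adapter fills in the `<<.LabelMatchers>>` and `<<.GroupBy>>` placeholders (its rule templates use Go template syntax with `<<` and `>>` delimiters) based on the objects being queried. The Python sketch below only imitates that substitution with plain string replacement, to show roughly what PromQL ends up being executed; the namespace, the matcher format and the `render_query` helper are illustrative assumptions, and the adapter's real output may differ in detail.

```python
METRICS_QUERY = (
    "(sum(phpfpm_active_processes{<<.LabelMatchers>>}) by (<<.GroupBy>>)"
    " / sum(phpfpm_total_processes{<<.LabelMatchers>>}) by (<<.GroupBy>>)) * 100"
)


def render_query(template: str, label_matchers: dict[str, str], group_by: str) -> str:
    """Rough, hypothetical imitation of the adapter's placeholder substitution."""
    matchers = ",".join(f'{label}="{value}"' for label, value in label_matchers.items())
    return (template
            .replace("<<.LabelMatchers>>", matchers)
            .replace("<<.GroupBy>>", group_by))


# Hypothetical request: the example-deployment Deployment in "default".
query = render_query(
    METRICS_QUERY,
    {"kubernetes_namespace": "default", "app_minds_io_name": "example-deployment"},
    group_by="app_minds_io_name",
)
print(query)
```

Grouping by the Deployment label is what turns the cluster-wide ratio from the earlier PromQL query into one value per Deployment.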

 

We can confirm that our custom metric is available for use by asking the custom metrics API:

$ kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq '.resources[] | select(.name == "deployments.apps/engine_process_utilization")'
---
{
  "name": "deployments.apps/engine_process_utilization",
  "singularName": "",
  "namespaced": true,
  "kind": "MetricValueList",
  "verbs": [
    "get"
  ]
}
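The discovery output above only tells us the metric exists. To read its current value, the custom metrics API uses paths of the form `/apis/custom.metrics.k8s.io/v1beta1/namespaces/<namespace>/<resource>/<name>/<metric>`, where `<name>` can be `*` together with a `labelSelector` query parameter to select several objects. The small helper below is our own, hypothetical convenience for building such a path to pass to `kubectl get --raw`; the namespace and object names are placeholders.

```python
from typing import Optional
from urllib.parse import urlencode

CUSTOM_METRICS_BASE = "/apis/custom.metrics.k8s.io/v1beta1"


def custom_metric_path(namespace: str, resource: str, metric: str,
                       name: str = "*", label_selector: Optional[str] = None) -> str:
    """Build a custom metrics API path for `kubectl get --raw`.

    name="*" requests every object of that resource in the namespace,
    optionally narrowed with a label selector.
    """
    path = f"{CUSTOM_METRICS_BASE}/namespaces/{namespace}/{resource}/{name}/{metric}"
    if label_selector:
        path += "?" + urlencode({"labelSelector": label_selector})
    return path


print(custom_metric_path("default", "deployments.apps",
                         "engine_process_utilization", name="example-deployment"))
```

Quote the resulting path when passing it to `kubectl get --raw`, since `*` and `?` are special characters in most shells.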

 

Now that we have a custom metric in place, we can use it in our HPA policies:

apiVersion: autoscaling/v2beta2 # use autoscaling/v2 on Kubernetes 1.23+ (v2beta2 was removed in 1.26)
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods # requires the metric per Pod; values are averaged across the Deployment's Pods
    pods:
      metric:
        name: engine_process_utilization # Custom metric name
      target:
        type: AverageValue
        averageValue: 50 # 50% utilization

 

Conclusion

With HPA in place, the service should now be more resilient in the face of heavy utilization as well as less likely to waste resources during times of lower utilization. HPA is a powerful tool, especially when used in combination with custom metrics that allow us to be very detailed about the conditions under which we autoscale. 

 

Of course, this is only a high-level overview of HPA and how it can be used with the Prometheus Adapter. If this interests you, I encourage you to explore the official documentation for these projects, which I've linked throughout this post.