One of the star features of Kubernetes is, without a doubt, Horizontal Pod Autoscaling. It gives our service extra capacity at specific moments when the original sizing of the cluster is not sufficient, avoiding degradation or even loss of service.
Out of the box, Kubernetes can autoscale on resource consumption, such as CPU and memory. A temporary load peak can raise the consumption of these resources in our pods and therefore trigger autoscaling, but this is not always helpful, particularly for memory and even more so for Java applications, because of how the JVM manages memory. For these reasons, and from experience with customers, horizontal autoscaling is sometimes better driven by other metrics, such as the HTTP thread pool of the application server. When the system comes under stress, the first symptom is usually saturation of the application server pools or, in the case of Liferay Portal / DXP, of the connection pool to Elasticsearch.
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq .
The last step is to configure a HorizontalPodAutoscaler that targets our Liferay Portal / DXP Deployment, so that it scales up when the metric reaches the desired threshold (and scales down when it decreases):
- type: Pods
With the previous manifest, we use our "tomcat_threadpool_threadcount_avg" metric, exported through the custom.metrics endpoint. When it reaches a value of 100 (half of the maximum of 200 threads in Tomcat's default pool), Kubernetes increases the number of Deployment replicas, up to a maximum of 5.
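To make the threshold concrete, the following is a minimal Python sketch of the replica calculation documented for the HPA: the current replica count is multiplied by the ratio between the observed and the target metric value, rounded up, and clamped to the replica bounds. The function name is hypothetical, the minimum of 2 replicas is an assumption inferred from the 2-node cluster mentioned later in this article, and the 10% tolerance is the controller's default. It is an approximation for illustration, not the controller's actual code.

```python
import math


def desired_replicas(current_replicas, current_avg, target_avg=100,
                     min_replicas=2, max_replicas=5, tolerance=0.1):
    """Approximate the HPA replica calculation for an averaged per-pod metric.

    min_replicas=2 is an assumption; target_avg and max_replicas come from
    the values described in the article.
    """
    ratio = current_avg / target_avg
    # Inside the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # The result is always clamped to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


# 2 pods averaging 150 busy threads against a target of 100
print(desired_replicas(2, 150))  # 3
```

With both pods fully saturated at 200 threads, the same calculation would ask for 4 replicas, and anything beyond 5 is capped by the configured maximum.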
Since Kubernetes version 1.18, it is possible to configure the behavior of the HPA in terms of scaleUp, scaleDown and the stabilization windows to perform them:
- type: Percent
- type: Percent
- type: Pods
With the above configuration, scaling down is performed with a stabilization window of 5 minutes. A single scale-down policy is configured, which allows 100% of the additional replicas to be removed.
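The effect of the 5-minute window on scaling down can be pictured with a short sketch. According to the Kubernetes documentation, when scaling down the HPA looks at the desired states computed during the window and uses the highest one. The function name and the sample history below are hypothetical, and the real controller also takes the current replica count and the policies into account.

```python
def stabilized_scale_down(history, now, window_seconds=300):
    """Return the highest replica recommendation seen inside the window.

    history: list of (timestamp_in_seconds, recommended_replicas) tuples;
    it must contain at least one entry inside the window.
    """
    recent = [
        replicas
        for timestamp, replicas in history
        if 0 <= now - timestamp <= window_seconds
    ]
    return max(recent)


# Hypothetical timeline: the load stops shortly after t=0, so the raw
# recommendation drops from 5 to 2, but the old value still counts.
history = [(0, 5), (60, 2), (120, 2), (330, 2)]
print(stabilized_scale_down(history, now=120))  # 5, still inside the window
print(stabilized_scale_down(history, now=330))  # 2, the peak has aged out
```

This is why short dips in load do not immediately remove pods: the replica count only drops once every recommendation in the last 5 minutes agrees.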
No stabilization window is configured for scaling up (it is set to 0), so when the metric reaches the threshold, the number of replicas is increased immediately. Two scale-up policies are configured: every 15 seconds, either 2 pods or 100% of the currently running replicas can be added, until the HPA reaches its stable state again.
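As a rough illustration of how the two scale-up policies interact, the sketch below assumes the default selectPolicy (Max), meaning the policy that allows the larger change wins in each 15-second period. The function names are hypothetical, and the simulation simplifies the controller by treating each period independently.

```python
import math


def scale_up_limit(current, pods=2, percent=100):
    """Largest replica count allowed after one period (selectPolicy: Max assumed)."""
    by_pods = current + pods
    by_percent = math.ceil(current * (1 + percent / 100))
    return max(by_pods, by_percent)


def scale_up_steps(start, desired, max_replicas=5):
    """Replica count after each 15-second period until the target is reached."""
    target = min(desired, max_replicas)
    steps = [start]
    while steps[-1] < target:
        steps.append(min(target, scale_up_limit(steps[-1])))
    return steps


print(scale_up_steps(2, 5))  # [2, 4, 5]
```

Starting from 2 replicas, both policies allow a jump to 4 in the first period, and the maximum of 5 is reached in the second one, so under these assumptions a full scale-up takes about 30 seconds of controller time.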
Running kubectl describe hpa -n liferay, we can check the status of our new HPA:
kubectl describe hpa -n liferay
With JMeter, we will inject load into our Liferay Portal / DXP cluster to test autoscaling. We can use our Prometheus instance to monitor the thread pool of our pods while injecting load:
Once the threshold is reached, we see how the number of Liferay Portal / DXP instances increases:
Five minutes after stopping the load, we can see how the additional instances are removed until our Liferay Portal / DXP cluster is back to its minimum of 2 nodes:
Autoscaling is a valuable feature of a Kubernetes infrastructure, but it has to be adapted to the needs of each project. The choice of metric to scale on must be analyzed to check that it fits how we use Liferay Portal / DXP. Afterwards, with the help of performance tests, the scaleUp and scaleDown behaviors and the thresholds should be adjusted so that they best serve our solution built on Liferay Portal / DXP and Kubernetes.