
Kubernetes Pod Scaling: How to Scale Down Pods

By Jan on 02/04/2025

Learn how to safely and efficiently scale down Kubernetes Pods to optimize resource utilization and reduce costs without impacting application performance.

Introduction

In Kubernetes, managing the number of pods in a deployment is crucial for resource optimization and application stability. While Kubernetes offers automated scaling through Horizontal Pod Autoscaler (HPA), there are situations where you might need to manually scale down pods, even to zero. This article provides a quick way to scale down deployments in a specific namespace to zero replicas and delves into the concepts of scaling in Kubernetes, including HPA, manual scaling with kubectl, and the lifecycle of pods during scaling down.

Step-by-Step Guide

To manually scale down pods in a specific namespace to zero, you can use the following command:

kubectl get deploy -n <namespace> -o name | xargs -I % kubectl scale % --replicas=0 -n <namespace>

This command first retrieves all deployments in the specified namespace and then uses xargs to scale each deployment down to zero replicas.
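
To confirm the result, you can list the deployments and check that the desired replica count is now zero:

kubectl get deploy -n <namespace>

To bring a single deployment back up later, set its replica count explicitly (replace <deployment-name> with a real name):

kubectl scale deployment <deployment-name> --replicas=2 -n <namespace>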

Kubernetes offers a feature called Horizontal Pod Autoscaler (HPA) that can automatically adjust the number of pods in a deployment based on observed CPU utilization, memory usage, or other custom metrics.

HPA scales deployments horizontally, meaning it adds or removes pods to meet demand. This differs from vertical scaling, which involves adjusting the resources allocated to existing pods.
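
As a quick illustration, a CPU-based HPA can be created imperatively with kubectl autoscale (the deployment name here is illustrative):

kubectl autoscale deployment my-deployment --cpu-percent=50 --min=1 --max=10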

You can interact with your Kubernetes cluster using kubectl, a command-line interface. For instance, to scale a deployment named "my-deployment" to three replicas, you would execute:

kubectl scale deployment my-deployment --replicas=3

When HPA scales down a deployment, it terminates pods to reach the desired replica count. You cannot directly pick which pods go: the ReplicaSet controller selects victims using built-in heuristics (for example, unready pods and more recently created pods are preferred for deletion).
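
If you need to influence which pods are removed first, Kubernetes 1.22+ honors the controller.kubernetes.io/pod-deletion-cost annotation: among otherwise-equal pods, those with a lower deletion cost are terminated first. A minimal sketch, assuming a pod named my-pod-abc123:

kubectl annotate pod my-pod-abc123 controller.kubernetes.io/pod-deletion-cost="-100"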

You can also shape how termination proceeds using the preStop lifecycle hook and the terminationGracePeriodSeconds pod setting. The hook defines actions to run before a container is stopped, and the grace period gives the pod time to shut down gracefully.

Note that scaling down is not the same as deleting a pod. Scaling down lowers the deployment's desired replica count, which causes the controller to terminate surplus pods. Deleting a pod directly leaves the replica count unchanged, so the controller replaces it; and if the current replica count already matches the desired state, scaling down terminates nothing.
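
You can see the difference in practice: deleting a pod owned by a deployment leaves the desired replica count unchanged, so the ReplicaSet controller immediately creates a replacement (the pod name below is illustrative):

kubectl delete pod my-deployment-5d8f7c9b4-x2k8q
kubectl get pods   # a new pod appears to restore the replica count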

Code Example

This section provides code examples illustrating the main aspects of scaling pods in Kubernetes: manually scaling all deployments in a namespace to zero, configuring a Horizontal Pod Autoscaler (HPA) based on CPU utilization, and using the preStop lifecycle hook together with terminationGracePeriodSeconds for graceful termination.

1. Manually Scaling Down Pods in a Namespace

#!/bin/bash

# Define the namespace
NAMESPACE="your-namespace"

# Scale down all deployments in the namespace to zero replicas
kubectl get deploy -n "$NAMESPACE" -o name | xargs -I % kubectl scale % --replicas=0 -n "$NAMESPACE"

echo "All deployments in namespace '$NAMESPACE' scaled down to zero replicas."

This script first defines the target namespace. Then, it retrieves all deployments within that namespace and pipes their names to xargs. Finally, xargs executes kubectl scale for each deployment, setting the replica count to zero.
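
One practical refinement, sketched below under the assumption that you will want to restore the original replica counts later: record each deployment's current replica count in an annotation (the name previousReplicas is arbitrary) before scaling it to zero.

#!/bin/bash

NAMESPACE="your-namespace"

# Record each deployment's current replica count, then scale it to zero
for deploy in $(kubectl get deploy -n "$NAMESPACE" -o name); do
  replicas=$(kubectl get "$deploy" -n "$NAMESPACE" -o jsonpath='{.spec.replicas}')
  kubectl annotate "$deploy" -n "$NAMESPACE" previousReplicas="$replicas" --overwrite
  kubectl scale "$deploy" -n "$NAMESPACE" --replicas=0
done

# Later, restore each deployment from its recorded replica count
for deploy in $(kubectl get deploy -n "$NAMESPACE" -o name); do
  replicas=$(kubectl get "$deploy" -n "$NAMESPACE" -o jsonpath='{.metadata.annotations.previousReplicas}')
  kubectl scale "$deploy" -n "$NAMESPACE" --replicas="$replicas"
done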

2. Using Horizontal Pod Autoscaler (HPA)

2.1. Creating an HPA based on CPU Utilization

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-deployment-hpa
  namespace: your-namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

This YAML file defines an HPA named "my-deployment-hpa" targeting the deployment "my-deployment". It sets the minimum and maximum replicas to 1 and 10, respectively. The HPA will scale the deployment based on CPU utilization, aiming to maintain an average utilization of 50%.

2.2. Applying the HPA Configuration

kubectl apply -f my-deployment-hpa.yaml

This command applies the HPA configuration defined in the "my-deployment-hpa.yaml" file to your Kubernetes cluster.
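
Note that resource-based metrics require a metrics pipeline such as metrics-server to be running in the cluster. After applying the configuration, you can check that the HPA sees its target and metrics:

kubectl get hpa my-deployment-hpa -n your-namespace
kubectl describe hpa my-deployment-hpa -n your-namespace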

3. Utilizing Lifecycle Hooks for Graceful Termination

3.1. Defining a preStop Hook in a Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  # ... other deployment configurations ...
  template:
    spec:
      containers:
      - name: my-container
        # ... other container configurations ...
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 5 && echo 'Performing cleanup tasks...'"]

This snippet demonstrates adding a preStop hook to a container within a deployment. This hook executes a command that sleeps for 5 seconds and then prints a message. You can replace this with your desired cleanup logic.

3.2. Setting terminationGracePeriodSeconds

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  # ... other deployment configurations ...
  template:
    spec:
      terminationGracePeriodSeconds: 30

This example sets the terminationGracePeriodSeconds to 30 seconds for pods in the "my-deployment" deployment. This allows pods 30 seconds to complete any ongoing operations before being forcefully terminated.
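
Keep in mind that the grace period countdown begins before the preStop hook runs, so the hook's runtime counts against it. A minimal sketch combining both settings (the values are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  # ... other deployment configurations ...
  template:
    spec:
      # Total time the pod gets to shut down, including the preStop hook
      terminationGracePeriodSeconds: 30
      containers:
      - name: my-container
        lifecycle:
          preStop:
            exec:
              # Must finish well within the grace period above
              command: ["/bin/sh", "-c", "sleep 5"]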

These code examples provide a starting point for understanding and implementing pod scaling in Kubernetes. Remember to adapt these examples to your specific needs and environment.

Additional Notes

Scaling Down to Zero:

  • Use with caution: Scaling down a deployment to zero replicas means your application will become unavailable. Ensure this is the desired outcome before executing the command.
  • Alternatives to zero replicas: Consider using a replica count of 1 as a minimum to maintain a baseline level of availability, even with minimal traffic. This can help avoid cold starts when traffic picks up again.

Horizontal Pod Autoscaler (HPA):

  • Metrics: HPA can be configured with various metrics beyond CPU and memory, such as custom metrics exposed by your application. This allows for more fine-grained scaling based on application-specific indicators.
  • Scaling behavior: The HPA spec's behavior field includes settings such as scaleDown.stabilizationWindowSeconds and scale-down policies to fine-tune scaling and prevent rapid oscillations in replica count; see the sketch below.
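
A minimal sketch of such a behavior section, placed under the spec of an autoscaling/v2 HorizontalPodAutoscaler (the values are illustrative):

behavior:
  scaleDown:
    # Wait 5 minutes of consistently low metrics before scaling down
    stabilizationWindowSeconds: 300
    policies:
    # Remove at most one pod per minute
    - type: Pods
      value: 1
      periodSeconds: 60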

Manual Scaling:

  • Granularity: Manual scaling offers precise control over the number of replicas. This is useful in scenarios such as maintenance windows, staged rollouts, or when automated scaling is not suitable.
  • Monitoring: When manually scaling, closely monitor your application's performance and resource utilization to ensure the desired state is maintained.

Lifecycle Hooks:

  • Customization: Lifecycle hooks provide a powerful mechanism to execute custom logic during pod termination. This can include tasks like deregistering from service discovery, flushing buffers, or gracefully closing connections.
  • Timeouts: Be mindful of the terminationGracePeriodSeconds when implementing lifecycle hooks. Ensure your cleanup logic completes within the grace period to avoid forceful termination.

General Considerations:

  • Resource limits: Always define resource requests and limits for your containers to ensure predictable resource allocation and prevent resource starvation.
  • Pod Disruption Budgets (PDB): Use PDBs to define how many replicas of your application can be unavailable during voluntary disruptions such as node drains and evictions. Note that a PDB does not block lowering a deployment's replica count; it protects against eviction-based disruptions, helping maintain a minimum level of availability during maintenance operations. A sketch follows below.
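
A minimal PodDisruptionBudget sketch, assuming the application's pods carry the label app: my-app:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  # Keep at least one pod running during voluntary disruptions
  minAvailable: 1
  selector:
    matchLabels:
      app: my-app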

Summary

This article provides a concise guide on manually and automatically scaling down pods in Kubernetes:

Manual Scaling:

  • Command: kubectl get deploy -n <namespace> -o name | xargs -I % kubectl scale % --replicas=0 -n <namespace> scales down all deployments in a specific namespace to zero replicas.
  • Explanation: This command first retrieves all deployments within the specified namespace and then leverages xargs to scale each deployment down.

Automatic Scaling with HPA:

  • Horizontal Pod Autoscaler (HPA): Automatically adjusts the number of pods in a deployment based on metrics like CPU utilization and memory usage.
  • Horizontal vs. Vertical Scaling: HPA performs horizontal scaling by adding or removing pods, unlike vertical scaling, which modifies resources allocated to existing pods.

Key Concepts:

  • kubectl: Command-line interface for interacting with Kubernetes clusters.
  • Lifecycle Hooks: preStop and terminationGracePeriodSeconds allow for controlled pod termination during scaling down.
  • Scaling vs. Deletion: Scaling down reduces replicas in a deployment, potentially leading to pod termination. It differs from directly deleting a pod, which doesn't affect the desired replica count.

This summary provides a quick overview of scaling down pods in Kubernetes, covering both manual and automatic approaches.

Conclusion

In conclusion, effectively managing the number of pods in your Kubernetes deployments is vital for achieving optimal resource utilization and ensuring your applications run smoothly. While Kubernetes offers powerful automated scaling capabilities through the Horizontal Pod Autoscaler, understanding how to manually scale your deployments provides you with greater control and flexibility. Whether you need to precisely adjust replica counts, troubleshoot scaling issues, or implement custom scaling logic, mastering the concepts and techniques outlined in this article will empower you to confidently manage your Kubernetes deployments and keep your applications running at peak performance.

References

  • Horizontal Pod Autoscaling | Kubernetes — official documentation; a HorizontalPodAutoscaler automatically updates a workload resource (such as a Deployment or StatefulSet) to match demand by deploying more Pods, in contrast to vertical scaling, which assigns more resources to already-running Pods.
  • Scale Up/Down Kubernetes Pods | Baeldung on Ops — a quick and practical guide to scaling Kubernetes pods.
  • How to Scale Kubernetes Pods with Kubectl Scale Deployment — covers kubectl, the command-line interface for interacting with Kubernetes clusters.
  • Can we scale down Pods in Kubernetes HPA only when a validation ... | r/kubernetes discussion.
  • GKE scaled-down nodes won't terminate | Discuss Kubernetes — thread on GKE nodes that remain after autoscaler scale-down with only system pods on them.
  • Choosing which pods are scaled down | r/kubernetes discussion.
  • Handling Long running request during HPA Scale-down | Discuss Kubernetes — thread on long-running requests, preStop, and terminationGracePeriodSeconds during HPA scale-down.
  • Question, is it possible to scale down without deleting the pod ... | r/kubernetes discussion.
  • kube-dns-autoscaler preventing GKE standard cluster to scale down | Discuss Kubernetes — thread on kube-dns-autoscaler blocking node-pool scale-down in GKE.
