šŸ¶
Kubernetes

Kubernetes Pods Stuck Terminating: Troubleshooting Guide

By Jan on 01/14/2025

Learn how to troubleshoot and resolve Kubernetes pods stuck in Terminating status, including common causes and practical solutions.

Introduction

Troubleshooting a Kubernetes pod stuck in "Terminating" status can be tricky. This guide provides a step-by-step approach to identify and resolve the issue. We'll use various kubectl commands to inspect the pod's status, events, and logs, and explore potential solutions.

Step-by-Step Guide

To troubleshoot a Kubernetes pod stuck in "Terminating" status:

  1. Verify the pod status:

    kubectl get pods <pod-name> -n <namespace>
  2. Check the pod's events:

    kubectl describe pod <pod-name> -n <namespace>

    Look for any error messages or warnings related to termination.

  3. Inspect the kubelet logs on the node where the pod is running:

    journalctl -u kubelet -f

    This might reveal issues with the container runtime or network.

  4. Check for processes running inside the pod:

    kubectl exec -it <pod-name> -n <namespace> -- ps aux

    A process might be preventing the pod from terminating gracefully.

  5. Force delete the pod:

    kubectl delete pod <pod-name> -n <namespace> --grace-period=0 --force

    Caution: This bypasses the graceful termination process and can lead to data loss.

If the issue persists:

  • Investigate network connectivity issues between the node and the Kubernetes control plane.
  • Check for resource constraints on the node, such as CPU, memory, or disk space.
  • Examine the pod's lifecycle hooks for potential issues.
  • Consult the Kubernetes documentation and community forums for specific error messages or scenarios.
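Two of the checks above can be scripted. A hedged sketch, assuming kubectl is on PATH; it also lists finalizers, since a finalizer whose controller is gone is a frequent reason a pod never leaves Terminating:

```shell
# Hedged sketch, assuming kubectl is on PATH: print a pod's lifecycle hooks
# and finalizers. Pod and namespace arguments are placeholders.
inspect_termination_blockers() {
  pod="$1"; ns="$2"
  # preStop/postStart hooks defined per container
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .spec.containers[*]}{.name}{": "}{.lifecycle}{"\n"}{end}'
  # Finalizers: an orphaned finalizer keeps a pod in Terminating indefinitely
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .metadata.finalizers[*]}{@}{"\n"}{end}'
}

# Usage: inspect_termination_blockers <pod-name> <namespace>
```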

Code Example

This script automates the troubleshooting steps above: it verifies the pod status, checks events and kubelet logs, inspects processes inside the pod, and offers force deletion as a last resort. Replace the placeholders with actual values, exercise caution with the force-delete step, and treat the script as a starting point; further debugging may be needed for specific scenarios.

#!/bin/bash

# Set the pod name and namespace
POD_NAME="your-pod-name"
NAMESPACE="your-namespace"

# 1. Verify the pod status
kubectl get pods "$POD_NAME" -n "$NAMESPACE"

# 2. Check the pod's events
kubectl describe pod "$POD_NAME" -n "$NAMESPACE"

# 3. Inspect the kubelet logs on the node where the pod is running
NODE_NAME=$(kubectl get pod "$POD_NAME" -n "$NAMESPACE" -o jsonpath='{.spec.nodeName}')
ssh "$NODE_NAME" "journalctl -u kubelet --no-pager -n 200"

# 4. Check for processes running inside the pod
kubectl exec -it "$POD_NAME" -n "$NAMESPACE" -- ps aux

# 5. Force delete the pod (use with caution!)
kubectl delete pod "$POD_NAME" -n "$NAMESPACE" --grace-period=0 --force

# If the issue persists:

# - Investigate network connectivity issues between the node and the Kubernetes control plane.
# - Check for resource constraints on the node, such as CPU, memory, or disk space.
# - Examine the pod's lifecycle hooks for potential issues.
# - Consult the Kubernetes documentation and community forums for specific error messages or scenarios.

Before running the script:

  • Replace your-pod-name and your-namespace with the actual pod name and namespace.
  • Replace <node-name> with the actual node name where the pod is running.
  • Use the force delete option (kubectl delete ... --grace-period=0 --force) with extreme caution. It bypasses the graceful termination process and can lead to data loss.

This script provides a starting point for troubleshooting. You might need to adapt it based on your specific environment and the error messages you encounter.
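One follow-up worth automating: after a force delete, confirm the API server no longer lists the pod. A minimal sketch, with placeholder names:

```shell
# Hedged sketch: returns 0 once the pod is no longer listed by the API
# server, 1 while it still exists. Arguments are placeholders.
pod_is_gone() {
  if kubectl get pod "$1" -n "$2" >/dev/null 2>&1; then
    return 1  # still listed, possibly still Terminating
  else
    return 0  # NotFound: the API side of the deletion has completed
  fi
}

# Usage: pod_is_gone your-pod-name your-namespace && echo "pod gone"
```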

Additional Notes

This section provides additional context and considerations for the troubleshooting steps outlined in the main article:

General Considerations:

  • Timeouts: Kubernetes gives each pod a termination grace period (30 seconds by default, configurable via terminationGracePeriodSeconds). If the containers cannot be killed even after the grace period expires, for example because the kubelet or container runtime is unresponsive, the pod remains in "Terminating" status.
  • Underlying Issues: A "Terminating" pod is often a symptom of a larger issue. Don't just focus on killing the pod; investigate the root cause to prevent recurrence.
  • Control Plane Health: Ensure the Kubernetes control plane (API server, etcd, etc.) is healthy and reachable from the affected node.
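The grace-period timeout above can be checked concretely. A hedged sketch (GNU date assumed; the helper takes values you would normally pull from the pod object):

```shell
# Hedged sketch, assuming GNU date: how many seconds a pod is past its
# termination deadline. Inputs come from the pod object, e.g.:
#   kubectl get pod <pod-name> -n <namespace> \
#     -o jsonpath='{.metadata.deletionTimestamp} {.spec.terminationGracePeriodSeconds}'
seconds_past_deadline() {
  deletion_ts="$1"  # e.g. 2025-01-14T10:00:00Z
  grace="$2"        # e.g. 30
  now="$3"          # current time as epoch seconds, e.g. $(date -u +%s)
  deadline=$(( $(date -u -d "$deletion_ts" +%s) + grace ))
  echo $(( now - deadline ))
}

# A positive result means the grace period has already expired and the
# kubelet should have force-killed the containers by now.
```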

Specific Notes for Each Step:

  1. Verify Pod Status:

    • The kubectl get pods command can also show the pod's status reason (e.g., "Evicted", "NodeLost", etc.), providing clues about the termination issue.
  2. Check Pod's Events:

    • Events provide a timeline of what happened to the pod. Look for events related to container failures, image pulls, resource limits, etc.
    • Events are retained only for a limited time (one hour by default), so check them soon after the problem appears. kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name> lists the pod's events directly.
  3. Inspect Kubelet Logs:

    • Kubelet is responsible for managing containers on a node. Its logs can reveal issues with container runtime (Docker, containerd), network, or resource constraints.
    • Use journalctl on systemd-based systems or check the appropriate log file location for your system.
  4. Check Processes Inside the Pod:

    • Identify any processes that might be preventing the pod from exiting gracefully. This could be a process stuck in a loop, waiting for a resource, or ignoring SIGTERM signals.
    • Use kubectl logs to check application logs for errors or warnings related to shutdown.
  5. Force Delete the Pod:

    • Use this as a last resort! Force deletion can lead to data loss and inconsistencies if the application is not designed for it.
    • After force deletion, monitor the node for any orphaned containers or resources.
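Steps 2 and 4 above can be supplemented with two quick read-only queries; a hedged sketch with placeholder names:

```shell
# Hedged sketch with placeholder names: read-only queries that complement
# the event and process checks.
termination_checks() {
  pod="$1"; ns="$2"
  # Container names and current states (running/waiting/terminated)
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{" "}{.state}{"\n"}{end}'
  # Events recorded for this pod, oldest first
  kubectl get events -n "$ns" \
    --field-selector involvedObject.name="$pod" \
    --sort-by=.lastTimestamp
}

# Usage: termination_checks your-pod-name your-namespace
```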

Additional Tools and Techniques:

  • Network Troubleshooting: Use tools like ping, traceroute, nslookup, and tcpdump to diagnose network connectivity issues between the node and the control plane or other services.
  • Resource Monitoring: Monitor node resources (CPU, memory, disk) using tools like top, free, df, or Kubernetes-specific monitoring solutions.
  • Debugging Pods: Use kubectl debug to launch an ephemeral container in the pod's namespace for interactive troubleshooting.
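The kubectl debug approach can be sketched as a small wrapper; the busybox:1.36 image and the target container name are assumptions, not from the original:

```shell
# Hedged sketch: attach an ephemeral debug container to a stuck pod with
# kubectl debug. The busybox:1.36 image and the --target container name are
# assumptions; adjust them for your environment.
debug_stuck_pod() {
  pod="$1"; ns="$2"; target="$3"
  kubectl debug -it "$pod" -n "$ns" \
    --image=busybox:1.36 --target="$target" -- sh
}

# Usage: debug_stuck_pod your-pod-name your-namespace your-container-name
# With --target, the debug container shares the target's process namespace,
# so `ps aux` inside the debug shell shows the stuck processes.
```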

Remember:

  • Consult the Kubernetes documentation and community forums for specific error messages or scenarios.
  • Document your findings and resolution steps for future reference.

Summary

  1. Verify Pod Status: confirm the pod is indeed stuck in "Terminating" status.
     Command: kubectl get pods <pod-name> -n <namespace>
  2. Check Pod Events: look for error messages or warnings related to termination in the pod's events.
     Command: kubectl describe pod <pod-name> -n <namespace>
  3. Inspect Kubelet Logs: examine kubelet logs on the pod's node for container runtime or network issues.
     Command: journalctl -u kubelet -f
  4. Check for Running Processes: identify any processes inside the pod that might be preventing graceful termination.
     Command: kubectl exec -it <pod-name> -n <namespace> -- ps aux
  5. Force Delete Pod (Caution!): bypass graceful termination and forcefully delete the pod. Warning: potential data loss!
     Command: kubectl delete pod <pod-name> -n <namespace> --grace-period=0 --force

Persistent Issues:

  • Investigate network connectivity between the node and Kubernetes control plane.
  • Check for resource constraints (CPU, memory, disk space) on the node.
  • Examine the pod's lifecycle hooks for potential issues.
  • Consult Kubernetes documentation and community forums for specific error messages or scenarios.

Conclusion

By following these troubleshooting steps, you can identify the root cause of a "Terminating" pod and resolve the issue. Remember to investigate potential network problems, resource constraints, and lifecycle hook issues. If the problem persists, consult the Kubernetes documentation and community forums for help. Always exercise caution when using the force delete option, as it can lead to data loss. Understanding the intricacies of Kubernetes pod lifecycle and employing systematic troubleshooting techniques are crucial for maintaining a healthy and efficient Kubernetes cluster.
