Learn how to fix the "UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress" error and get your Helm release upgrades working again.
The error message "UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress" is a common issue when working with Helm, the package manager for Kubernetes. It typically appears when a previous Helm operation was interrupted or failed to complete. The release is then left in a pending state (pending-install, pending-upgrade, or pending-rollback), and Helm refuses to run a new upgrade against it. This guide provides a step-by-step approach to troubleshooting and resolving the error so that Helm operations in your Kubernetes environment run smoothly again.
Here's a breakdown of how to troubleshoot and resolve this issue:
Check Helm History:
helm history <RELEASE_NAME>
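This command shows the revision history of your release. Look for revisions whose STATUS is pending-install, pending-upgrade, or pending-rollback (the state that triggers this error), as well as any failed revisions.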
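If the history is long, a small filter can pick out the stuck revisions. The sketch below assumes Helm 3, whose helm history table prints the revision number in the first column and the status as a separate word. The function names pending_revisions and find_pending_revisions are illustrative and are not part of Helm. For a namespace-wide view, helm list --pending lists releases currently in a pending state.

```bash
#!/usr/bin/env bash
# Sketch (assumes Helm 3 table output): list revisions of a release that are
# stuck in pending-install, pending-upgrade or pending-rollback.

# Reads `helm history` output on stdin, skips the header row, and prints the
# revision number (first column) of every row whose STATUS is pending-*.
pending_revisions() {
  local rev rest
  read -r _ || return 0                  # no input at all: nothing pending
  while read -r rev rest || [[ -n $rev ]]; do
    rest=" ${rest//$'\t'/ } "            # treat tabs like spaces, pad both ends
    case $rest in
      *" pending-install "* | *" pending-upgrade "* | *" pending-rollback "*)
        echo "$rev" ;;
    esac
  done
}

# Hypothetical convenience wrapper: run `helm history` for one release and filter it.
find_pending_revisions() {
  local release="$1"
  helm history "$release" | pending_revisions
}
```

For example, find_pending_revisions my-release prints the number of any revision that is still pending, which is the one blocking further upgrades.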
Rollback to a Previous Revision: If a previous operation is stuck, try rolling back:
helm rollback <RELEASE_NAME> <REVISION_NUMBER>
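Replace <REVISION_NUMBER> with the desired revision from the helm history output, usually the most recent one with status deployed or superseded. If you omit the revision number, Helm rolls back to the previous release.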
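Picking the revision can also be scripted. This is a sketch that assumes Helm 3 table output and treats the newest revision marked deployed or superseded as the last good one. The status check is a simple word match on each history row, and last_good_revision and rollback_to_last_good are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch (assumes Helm 3 table output): roll a release back to the newest
# revision whose STATUS is "deployed" or "superseded".

# Reads `helm history` output on stdin and prints the highest revision number
# with a good status; returns 1 if there is none.
last_good_revision() {
  local rev rest best=""
  read -r _ || return 1                  # skip the header row
  while read -r rev rest || [[ -n $rev ]]; do
    [[ $rev =~ ^[0-9]+$ ]] || continue   # ignore anything that is not a revision row
    rest=" ${rest//$'\t'/ } "            # treat tabs like spaces, pad both ends
    case $rest in
      *" deployed "* | *" superseded "*)
        if [[ -z $best ]] || (( rev > best )); then best=$rev; fi ;;
    esac
  done
  [[ -n $best ]] || return 1
  echo "$best"
}

# Hypothetical helper: look up the last good revision and pass it to `helm rollback`.
rollback_to_last_good() {
  local release="$1" rev
  if ! rev=$(helm history "$release" | last_good_revision); then
    echo "No deployed or superseded revision found for $release" >&2
    return 1
  fi
  echo "Rolling back $release to revision $rev"
  helm rollback "$release" "$rev"
}
```

Calling rollback_to_last_good my-release runs helm rollback for that revision. Review the history yourself before relying on this in automation.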
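Verify and Resolve Underlying Issues: Before retrying, investigate what made the operation fail. Check for resource conflicts (kubectl get pods, kubectl get events), network problems between your machine and the cluster, and insufficient cluster resources (kubectl describe nodes, kubectl top nodes).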
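Running these checks together helps make sure none is skipped. This is a minimal sketch that assumes kubectl is already configured for the target cluster. The helper names run_check and collect_diagnostics are illustrative, and kubectl top nodes only returns data when a metrics API provider such as metrics-server is installed.

```bash
#!/usr/bin/env bash
# Sketch: run the cluster checks for this step in one go.
# Assumes kubectl already points at the right cluster; `kubectl top nodes`
# additionally needs metrics-server (or another metrics API provider).

# Runs one command under a header; a failure is reported on stderr but does
# not stop the remaining checks.
run_check() {
  echo "== $* =="
  "$@" || echo "(check failed: $*)" >&2
}

# $1: namespace of the Helm release (defaults to "default").
collect_diagnostics() {
  local ns="${1:-default}"
  run_check kubectl cluster-info        # network: is the API server reachable?
  run_check kubectl get pods -n "$ns"   # conflicts: pending or crash-looping pods
  run_check kubectl get events -n "$ns" --sort-by=.metadata.creationTimestamp
  run_check kubectl describe nodes      # capacity: allocatable vs. requested
  run_check kubectl top nodes           # capacity: live CPU/memory usage
}
```

Because each check runs independently, one failing command (for example, top nodes without metrics-server) does not hide the output of the others.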
Manual Cleanup (Use with Caution): If the above steps don't work, you might need to manually clean up resources. Proceed with extreme caution, as this can lead to data loss.
Use kubectl get pods and other relevant kubectl commands to find resources associated with the failed release, then remove the stuck resources with kubectl delete.
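Before deleting anything by hand, it helps to list exactly what belongs to the release. This sketch makes two assumptions. First, the chart applies the common app.kubernetes.io/instance label, which many charts do but not all. Second, Helm 3 is using its default Secret storage driver, which keeps one Secret per revision named sh.helm.release.v1.<RELEASE_NAME>.v<REVISION> and labelled owner=helm and name=<RELEASE_NAME>. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: list what belongs to a release before cleaning up manually.
# Assumptions: the chart sets app.kubernetes.io/instance=<release>, and Helm 3
# stores release records as Secrets labelled owner=helm,name=<release>.

# Print the name of the Helm 3 storage Secret for one release revision.
helm_release_secret() {
  local release="$1" revision="$2"
  echo "sh.helm.release.v1.${release}.v${revision}"
}

# Show the release's workload objects and Helm's own release records.
# $1: release name, $2: namespace (defaults to "default").
list_release_objects() {
  local release="$1" ns="${2:-default}"
  # "all" covers common workload kinds only (not ConfigMaps, Secrets, Ingresses, PVCs)
  kubectl get all -n "$ns" -l "app.kubernetes.io/instance=${release}"
  kubectl get secrets -n "$ns" -l "owner=helm,name=${release}"
}
```

A commonly used last-resort workaround is to delete only the storage Secret of the revision stuck in a pending state, for example kubectl delete secret -n <NAMESPACE> "$(helm_release_secret my-release 3)". The previous revision then becomes the latest one again. This permanently removes that revision from Helm's history, so confirm the revision number with helm history first.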
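Prevent Future Occurrences: Ensure stable network connectivity during Helm operations, provide sufficient resources for your cluster, and avoid interrupting Helm processes while they run.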
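A simple guard can also stop a second operation from starting while the first is still pending. This sketch assumes Helm 3, where helm list --pending --short prints the names of releases in a pending state. The names guarded_upgrade and release_is_pending are illustrative, and the upgrade command is deliberately minimal.

```bash
#!/usr/bin/env bash
# Sketch (assumes Helm 3): refuse to start an upgrade while the release is
# still in a pending-* state.

# Return 0 if `helm list --pending` shows the release in the given namespace.
release_is_pending() {
  local release="$1" ns="${2:-default}" name
  while read -r name; do
    [[ $name == "$release" ]] && return 0
  done < <(helm list --pending --short -n "$ns")
  return 1
}

# Hypothetical wrapper: upgrade only when no other operation is in flight.
guarded_upgrade() {
  local release="$1" chart="$2" ns="${3:-default}"
  if release_is_pending "$release" "$ns"; then
    echo "Release $release is still pending; not starting another operation." >&2
    return 1
  fi
  helm upgrade --install "$release" "$chart" -n "$ns"
}
```

In a pipeline, calling guarded_upgrade my-release ./chart my-namespace instead of helm upgrade makes a stuck release fail fast with a clear message. The guard does not protect against two jobs that start at the same moment, so deployments to one release still need to be serialized.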
If the issue persists, consult the documentation for your specific Helm version and Kubernetes distribution.
This bash script helps troubleshoot Helm upgrade failures. It first checks the Helm history for failed or pending operations and allows rolling back to a previous revision. If no issues are found in the history, it guides users to manually inspect their Kubernetes cluster for resource conflicts, network problems, or insufficient resources. As a last resort, it provides instructions for manually cleaning up stuck resources, emphasizing caution. The script concludes with tips on preventing similar issues in the future, such as ensuring stable network connectivity, providing sufficient cluster resources, and avoiding interrupting Helm processes.
This script demonstrates how to troubleshoot the "UPGRADE FAILED" error in Helm.
Note: This script is for illustrative purposes only. Always back up your data and proceed with caution, especially when manually deleting resources.
```bash
#!/bin/bash
# Troubleshoot: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress
# Usage: pass the release name as the first argument (defaults to "my-release").

# Set the release name
RELEASE_NAME="${1:-my-release}"

# 1. Check Helm History
echo "Checking Helm history for release: $RELEASE_NAME"
helm history "$RELEASE_NAME"

# Ask whether a failed or pending operation is visible in the history
read -p "Do you see any failed or pending operations in the history? (y/n) " -n 1 -r
echo # (move to next line)
if [[ $REPLY =~ ^[Yy]$ ]]; then
  # 2. Rollback to a Previous Revision (User Input Required)
  read -p "Enter the revision number to roll back to: " -r REVISION_NUMBER
  if [[ ! $REVISION_NUMBER =~ ^[0-9]+$ ]]; then
    echo "Invalid revision number '$REVISION_NUMBER'. Skipping rollback."
  else
    echo "Rolling back $RELEASE_NAME to revision $REVISION_NUMBER..."
    # Test helm's exit status directly to see whether the rollback succeeded
    if helm rollback "$RELEASE_NAME" "$REVISION_NUMBER"; then
      echo "Rollback successful. Check your application."
    else
      echo "Rollback failed. Proceeding to further troubleshooting."
    fi
  fi
else
  echo "No failed or pending operations found in history. Proceeding to resource checks."
fi

# 3. Verify and Resolve Underlying Issues (Manual Inspection Required)
echo "
# --- Manual Inspection Required ---
# Check for:
# - Resource Conflicts: kubectl get pods, kubectl get events
# - Network Problems: Connectivity between your machine and the cluster
# - Insufficient Resources: kubectl describe nodes, kubectl top nodes
# ---
"

# 4. Manual Cleanup (Use with Extreme Caution - User Input Required)
read -p "Have you identified and resolved any underlying issues? (y/n) " -n 1 -r
echo # (move to next line)
if [[ $REPLY =~ ^[Nn]$ ]]; then
  echo "
# --- Manual Cleanup (Proceed with Extreme Caution) ---
# 1. Identify Stuck Resources:
#    - kubectl get pods -l 'app.kubernetes.io/instance=$RELEASE_NAME'
#    - kubectl get services -l 'app.kubernetes.io/instance=$RELEASE_NAME'
#    - ... other relevant kubectl commands ...
# 2. Delete Stuck Resources (Carefully!):
#    - kubectl delete <resource_type> <resource_name>
# ---
"
fi

# 5. Prevent Future Occurrences
echo "
# --- Prevent Future Occurrences ---
# - Ensure stable network connectivity during Helm operations.
# - Provide sufficient resources for your cluster.
# - Avoid interrupting Helm processes.
# ---
"
```
This script provides a starting point for troubleshooting the "UPGRADE FAILED" error. Adapt it to your specific environment, and use caution when performing any manual operations.
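Additional tips:
Avoid concurrent operations: Don't run more than one Helm operation against the same release at a time. Tools like helmfile or CI/CD pipelines can help manage sequential deployments.
Check for uninstalled releases: A release uninstalled with --keep-history still has history records in the cluster, and Helm will not reuse its name for a new install. Use helm list --uninstalled to find such releases, and run helm uninstall <RELEASE_NAME> to purge their remaining history.
Check Kubernetes events: Review Kubernetes events (kubectl get events) for more context on why the Helm operation might have failed. Events often provide valuable clues about resource conflicts, pod failures, or other issues.
Use debug output: Add the --debug flag to Helm commands (e.g., helm upgrade --debug) to get more verbose output, which can help pinpoint the root cause of the problem.
Clean up release history: If the release's resources are already gone, run helm uninstall <RELEASE_NAME> to remove the release from Helm's history. In Helm 2 the equivalent was helm delete --purge <RELEASE_NAME>; the --purge flag does not exist in Helm 3.
This table summarizes how to troubleshoot the Helm error "UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress":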
| Step | Description | Command | Caution |
|---|---|---|---|
| 1. Check Helm History | Identify pending or failed operations for the release. | helm history <RELEASE_NAME> | |
| 2. Rollback to Previous Revision | Revert to a working state if a previous operation is stuck. | helm rollback <RELEASE_NAME> <REVISION_NUMBER> | |
| 3. Verify and Resolve Underlying Issues | Investigate and address potential root causes. | | |
| * Resource Conflicts | Check for conflicting resources in the cluster. | kubectl get pods, kubectl get events | |
| * Network Problems | Ensure stable network connectivity to the cluster. | | |
| * Insufficient Resources | Verify adequate cluster resources (CPU, memory). | kubectl describe nodes, kubectl top nodes | |
| 4. Manual Cleanup | Delete stuck resources associated with the failed release. | kubectl delete ... | Proceed with extreme caution! Data loss possible. |
| 5. Prevent Future Occurrences | Take preventive measures to avoid similar errors. | | |
| * Stable Network | Maintain a reliable network connection during operations. | | |
| * Sufficient Resources | Ensure the cluster has enough resources. | | |
| * Avoid Interruptions | Avoid interrupting Helm processes. | | |
Note: If the issue persists, consult the documentation for your specific Helm version and Kubernetes distribution.
In conclusion, encountering the "UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress" error in Helm can be disruptive but is usually resolvable. By systematically checking Helm history, attempting rollbacks, and verifying your Kubernetes cluster's state, you can often pinpoint the issue. Remember to proceed with caution, especially when manually manipulating resources. Prioritizing a stable network, sufficient resources, and uninterrupted Helm operations will minimize the likelihood of encountering this error in the future. If problems persist, leverage the wealth of knowledge available in the Kubernetes and Helm communities and their respective documentation.