Auto-Delete Completed Kubernetes CronJob Jobs

Introduction

Managing completed Kubernetes Jobs is crucial for maintaining a clean and efficient cluster. This article outlines three methods to automatically clean up finished Jobs, preventing resource accumulation and improving your cluster's health.

Step-by-Step Guide

Use ttlSecondsAfterFinished: This is the recommended way to automatically remove finished Jobs. Add the ttlSecondsAfterFinished field to your Job spec, giving the number of seconds after the Job finishes (whether it succeeded or failed) before it becomes eligible for deletion.
```
apiVersion: batch/v1
kind: Job
spec:
  ttlSecondsAfterFinished: 3600  # Delete the Job 1 hour after it finishes
```

Configure successfulJobsHistoryLimit and failedJobsHistoryLimit in CronJobs: These settings in your CronJob spec control how many successful and failed Jobs are kept.
```
apiVersion: batch/v1
kind: CronJob
spec:
  successfulJobsHistoryLimit: 3  # Keep the last 3 successful Jobs
  failedJobsHistoryLimit: 1     # Keep the last failed Job
```

Implement a custom cleanup solution: For more complex scenarios, you can create a separate CronJob that periodically lists and deletes old Jobs based on your criteria. You can use kubectl or client libraries to interact with the Kubernetes API. Note that kubectl delete has no age filter such as --older-than, so any age check has to be scripted. The command below deletes every succeeded Job in the current namespace.
```
kubectl delete jobs --field-selector status.successful=1
```

Remember that deleting a Job also deletes its associated Pods. Choose the method that best suits your needs and always test your cleanup strategy thoroughly.

Code Example

The following examples demonstrate how to manage the lifecycle of Kubernetes Jobs and CronJobs. The first shows how to automatically delete a Job one hour after it finishes using the ttlSecondsAfterFinished field. The second shows how to configure history limits for CronJobs, specifying how many successful and failed Jobs are retained. The third presents a custom cleanup solution that uses a CronJob to periodically delete old Jobs based on their age and status.

1. Deleting a Job with ttlSecondsAfterFinished:

```
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job
spec:
  ttlSecondsAfterFinished: 3600 # Delete the Job 1 hour after it finishes
  template:
    spec:
      containers:
      - name: my-container
        image: busybox
        command: ["sleep", "30"]
      restartPolicy: Never
```

This example defines a Job named "example-job" that runs a simple "sleep" command for 30 seconds. The ttlSecondsAfterFinished field is set to 3600, so the Job becomes eligible for automatic deletion one hour after it finishes, whether it succeeded or failed.
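Since the topic here is Jobs created by CronJobs, it may help to see where the TTL goes in that case. The sketch below assumes the field is set under jobTemplate.spec, so that each Job the CronJob creates carries it. The name "ttl-cronjob", the schedule, and the 3600-second value are illustrative assumptions, not values from the examples in this article.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: ttl-cronjob # hypothetical name, for illustration only
spec:
  schedule: "*/5 * * * *" # Run every 5 minutes
  jobTemplate:
    spec:
      # Copied into every Job this CronJob creates: each Job becomes
      # eligible for deletion 1 hour (3600 seconds) after it finishes.
      ttlSecondsAfterFinished: 3600
      template:
        spec:
          containers:
          - name: my-container
            image: busybox
            command: ["echo", "Hello from CronJob"]
          restartPolicy: OnFailure
```

Setting the TTL this way removes Jobs by age, independently of how many Jobs the CronJob's history limits would otherwise keep.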

2. Configuring History Limits in a CronJob:

```
apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-cronjob
spec:
  schedule: "*/5 * * * *" # Run every 5 minutes
  successfulJobsHistoryLimit: 3  # Keep the last 3 successful Jobs
  failedJobsHistoryLimit: 1     # Keep the last failed Job
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: my-container
            image: busybox
            command: ["echo", "Hello from CronJob"]
          restartPolicy: OnFailure
```

This example defines a CronJob named "example-cronjob" that runs a simple "echo" command every 5 minutes. The successfulJobsHistoryLimit is set to 3, so only the last 3 successful Jobs are kept. Similarly, failedJobsHistoryLimit is set to 1, so only the most recent failed Job is kept. These values are also the Kubernetes defaults, so writing them out mainly documents the intent.
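As a variation, and assuming no finished Jobs need to be kept for debugging, both limits can be set to 0. The fragment below shows only the two fields; the rest of the CronJob spec stays the same as in the example above.

```yaml
spec:
  successfulJobsHistoryLimit: 0  # Keep no successful Jobs
  failedJobsHistoryLimit: 0      # Keep no failed Jobs
```

With this setting the Pods and logs of finished Jobs are removed along with the Jobs, so it fits best when logs are shipped elsewhere.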

3. Custom Cleanup Solution with a CronJob:

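```
apiVersion: batch/v1
kind: CronJob
metadata:
  name: job-cleanup
spec:
  schedule: "0 0 * * *" # Run daily at midnight
  jobTemplate:
    spec:
      template:
        spec:
          # The Pod's ServiceAccount needs RBAC permission to list and delete Jobs.
          containers:
          - name: cleanup-job
            image: bitnami/kubectl:latest
            command:
            - /bin/bash
            - -c
            - |
              # kubectl has no --older-than flag, so compare completion times here.
              # Assumes GNU date is available in the image.
              cutoff=$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)
              kubectl get jobs --field-selector status.successful=1 \
                -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.completionTime}{"\n"}{end}' |
              while read -r name finished; do
                # RFC 3339 UTC timestamps sort lexicographically.
                if [[ -n "$finished" && "$finished" < "$cutoff" ]]; then
                  kubectl delete job "$name"
                fi
              done
          restartPolicy: OnFailure
```

This example defines a CronJob named "job-cleanup" that runs daily at midnight. It lists the succeeded Jobs in its own namespace with kubectl and deletes those that completed more than 24 hours ago. Because kubectl delete has no built-in age filter, the script compares each Job's completion time against a cutoff.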

Remember:

Replace example values such as names, images, and schedules with your own.
Adapt the code examples to your specific needs and environment.
Thoroughly test your cleanup strategy before deploying it to production.

Additional Notes

Resource Limits: While the methods described help manage completed Jobs, consider setting resource limits (CPU, memory) for your Jobs and CronJobs to prevent resource exhaustion in case of unexpected behavior.
Monitoring: Monitor your cluster's resource usage and Job history to ensure your cleanup strategy is effective. Tools like Prometheus and Grafana can help visualize this data.
Alternatives to kubectl delete: For production environments, consider calling the Kubernetes API through a client library instead of shelling out to kubectl, or setting cleanup fields such as ttlSecondsAfterFinished in the manifests you manage with Kustomize or Helm.
Job Retention Policies: For auditing or debugging purposes, you might need to retain Jobs longer than your default cleanup settings. Implement a separate strategy for archiving or exporting important Job logs and data.
Namespace Considerations: Apply cleanup strategies at the appropriate namespace level. You might have different cleanup needs for different namespaces based on their purpose and workload.
Security Context: When using a custom cleanup CronJob, ensure the Pod has the necessary permissions to list and delete Jobs in the target namespaces. Define appropriate Service Accounts and Role-Based Access Control (RBAC) rules.
Testing: Always test your cleanup strategy in a non-production environment to avoid accidental deletion of critical Jobs. Simulate different scenarios, including successful and failed Job executions, to validate its behavior.
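The permissions mentioned under Security Context can be sketched as a namespaced ServiceAccount, Role, and RoleBinding. This is a minimal sketch under assumptions: the name "job-cleanup" and the "default" namespace are hypothetical, and the verbs are limited to what a kubectl-based cleanup of Jobs would typically need.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: job-cleanup      # hypothetical name
  namespace: default     # assumed target namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: job-cleanup
  namespace: default
rules:
- apiGroups: ["batch"]   # Jobs live in the batch API group
  resources: ["jobs"]
  verbs: ["get", "list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: job-cleanup
  namespace: default
subjects:
- kind: ServiceAccount
  name: job-cleanup
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: job-cleanup
```

The cleanup Pod would then reference the account with serviceAccountName: job-cleanup in its Pod spec. A Role only covers one namespace, so cleaning several namespaces would need a Role and RoleBinding in each, or a ClusterRole.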

Summary

This table summarizes different methods for automatically cleaning up completed Kubernetes Jobs:

Method	Description	Configuration	Granularity	Complexity
`ttlSecondsAfterFinished`	Automatically deletes a Job a specified number of seconds after it finishes.	Set the `ttlSecondsAfterFinished` field in the Job spec (e.g., `ttlSecondsAfterFinished: 3600`).	Per Job	Simple
CronJob History Limits	Limits the number of successful and failed Job history entries retained for CronJobs.	Configure `successfulJobsHistoryLimit` and `failedJobsHistoryLimit` in the CronJob spec.	Per CronJob	Easy
Custom Cleanup Solution	Provides flexibility to define custom criteria for deleting old Jobs.	Create a separate CronJob that uses kubectl commands or client libraries to list and delete Jobs.	Highly customizable	More complex

Note: Deleting a Job also deletes its associated Pods. Choose the method that best suits your needs and test your cleanup strategy thoroughly.

Conclusion

By implementing these strategies, you can ensure efficient resource utilization, prevent clutter, and maintain a healthy and well-organized Kubernetes environment for your applications. Remember to tailor the chosen method to your specific requirements and always test your implementation thoroughly in a non-production environment before deploying it to production.
