How to Automatically Delete S3 Objects Older Than 30 Days Using Boto: Fixing Expiration Issues When Objects Persist
Amazon S3 (Simple Storage Service) is a cornerstone of cloud storage, but unmanaged object accumulation can lead to ballooning costs, compliance risks, and storage inefficiencies. While AWS offers S3 Lifecycle Policies to automate object expiration, many users encounter scenarios where objects persist beyond their intended lifespan—even with policies in place. Common culprits include versioning, object locks, misconfigured rules, or incomplete multipart uploads.
In this blog, we’ll explore how to programmatically delete S3 objects older than 30 days using Boto3 (AWS’s Python SDK), addressing these "stubborn object" issues head-on. We’ll walk through scripting, troubleshooting, and automating the process to ensure your S3 buckets stay lean and cost-effective.
Table of Contents#
- Understanding S3 Object Expiration: Why Policies Might Fail
- Prerequisites
- Step-by-Step Guide: Deleting Old S3 Objects with Boto3
- Troubleshooting Persistent Objects: Common Issues & Fixes
- Automating the Process: Scheduling with Lambda or Cron
- Best Practices
- Conclusion
- References
Understanding S3 Object Expiration: Why Policies Might Fail#
S3 Lifecycle Policies are designed to automate actions like transitioning objects to cheaper storage classes or deleting them after a set period. However, they often fall short in real-world scenarios:
- Versioning: If versioning is enabled, Lifecycle Policies only delete current versions by default. Non-current versions (old revisions) persist unless explicitly targeted.
- Object Lock: Objects with legal holds or retention periods (via S3 Object Lock) cannot be deleted until the lock expires.
- Incomplete Multipart Uploads: Lifecycle Policies for objects don’t apply to incomplete multipart uploads, which linger and consume storage.
- Misconfiguration: Typos in prefixes, incorrect time units (e.g., "days" vs. "years"), or overlapping rules can render policies ineffective.
- Replication Delays: If objects are replicated to another bucket, deletions on the source may not sync immediately with the replica.
For these cases, a Boto3 script offers granular control to target and delete stubborn objects.
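Before falling back to a script, it is worth confirming that the lifecycle rule itself covers these gaps. As a sketch (the rule ID and day counts are illustrative choices, not AWS defaults), a single rule can expire current versions, expire non-current versions, and abort stale multipart uploads:

```python
# Illustrative lifecycle configuration covering the gaps discussed above.
lifecycle_config = {
    'Rules': [{
        'ID': 'expire-after-30-days',
        'Filter': {'Prefix': ''},  # Empty prefix = applies to the whole bucket
        'Status': 'Enabled',
        'Expiration': {'Days': 30},                             # Current versions
        'NoncurrentVersionExpiration': {'NoncurrentDays': 30},  # Old revisions
        'AbortIncompleteMultipartUpload': {'DaysAfterInitiation': 7},
    }]
}

# Applied with (requires the s3:PutLifecycleConfiguration permission):
# s3.put_bucket_lifecycle_configuration(
#     Bucket='your-bucket-name', LifecycleConfiguration=lifecycle_config)
```

If a rule like this is already in place and objects still persist, the script below gives you direct control.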
Prerequisites#
Before diving in, ensure you have the following:
- AWS Account: With access to an S3 bucket (test with a non-production bucket first!).
- IAM Permissions: An IAM user/role with:
  - `s3:ListBucket` (to list objects)
  - `s3:ListBucketVersions` (if versioning is enabled)
  - `s3:DeleteObject` (to delete current versions)
  - `s3:DeleteObjectVersion` (to delete non-current versions)
- Python & Boto3: Python 3.x installed, plus Boto3 (AWS SDK for Python):

```bash
pip install boto3
```

- AWS Credentials: Configured via the `~/.aws/credentials` file, environment variables, or IAM roles (for Lambda).
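The permissions above can be combined into a minimal IAM policy; a sketch, with `your-bucket-name` as a placeholder. Note that the list actions attach to the bucket ARN, while the delete actions attach to the object ARN (`/*`):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:ListBucketVersions"],
      "Resource": "arn:aws:s3:::your-bucket-name"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    }
  ]
}
```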
Step-by-Step Guide: Deleting Old S3 Objects with Boto3#
We’ll create a Python script to:
- List objects in an S3 bucket (including versions, if enabled).
- Filter objects older than 30 days.
- Delete eligible objects.
Step 1: Set Up Boto3 and Initialize the S3 Client#
First, import Boto3 and initialize an S3 client. Boto3 uses your AWS credentials (from ~/.aws/credentials or environment variables) to authenticate.
```python
import boto3
from datetime import datetime, timedelta

# Initialize S3 client
s3 = boto3.client('s3')
```

Step 2: Define Parameters#
Specify your bucket name and the age threshold (30 days):

```python
BUCKET_NAME = 'your-bucket-name'  # Replace with your bucket
DAYS_THRESHOLD = 30  # Delete objects older than 30 days
```

Step 3: List Objects and Filter by Age#
S3 returns objects in pages (max 1000 per request), so we’ll handle pagination. We’ll also check the LastModified timestamp to filter old objects.
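Because `LastModified` is timezone-aware, the age comparison is easy to get wrong when mixed with naive datetimes. As a small sketch, the check can be isolated into a helper (the name `is_older_than` is mine, not part of the script) so it can be unit-tested without touching AWS:

```python
from datetime import datetime, timedelta, timezone

def is_older_than(last_modified, days, now=None):
    """True if a timezone-aware timestamp is more than `days` days old."""
    now = now or datetime.now(timezone.utc)
    return last_modified < now - timedelta(days=days)
```

Comparing a naive `datetime.now()` against S3's aware timestamps would raise a `TypeError`, which is why the script below uses `datetime.now().astimezone()`.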
Case 1: Non-Versioned Bucket#
For buckets without versioning, use list_objects_v2 to list current objects:
```python
def get_old_objects(bucket, days_threshold):
    """List objects in bucket older than days_threshold."""
    old_objects = []
    # Calculate cutoff date (30 days ago from now)
    cutoff_date = datetime.now().astimezone() - timedelta(days=days_threshold)

    # List objects with pagination
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket):
        if 'Contents' not in page:
            continue  # No objects in bucket
        for obj in page['Contents']:
            # Check if object is older than cutoff
            if obj['LastModified'] < cutoff_date:
                old_objects.append({
                    'Key': obj['Key'],
                    'LastModified': obj['LastModified']
                })
    return old_objects

# Get old objects
old_objects = get_old_objects(BUCKET_NAME, DAYS_THRESHOLD)
print(f"Found {len(old_objects)} old objects to delete.")
```

Case 2: Versioned Bucket#
If versioning is enabled, you must delete both current and non-current versions. Use list_object_versions to list all versions:
```python
def get_old_versions(bucket, days_threshold):
    """List versions (current + non-current) older than days_threshold."""
    old_versions = []
    cutoff_date = datetime.now().astimezone() - timedelta(days=days_threshold)

    paginator = s3.get_paginator('list_object_versions')
    for page in paginator.paginate(Bucket=bucket):
        # Object versions and delete markers arrive in separate lists, and
        # neither record carries an 'IsDeleteMarker' field, so tag each
        # entry ourselves
        for version in page.get('Versions', []):
            if version['LastModified'] < cutoff_date:
                old_versions.append({
                    'Key': version['Key'],
                    'VersionId': version['VersionId'],
                    'IsDeleteMarker': False,
                    'LastModified': version['LastModified']
                })
        for marker in page.get('DeleteMarkers', []):
            if marker['LastModified'] < cutoff_date:
                old_versions.append({
                    'Key': marker['Key'],
                    'VersionId': marker['VersionId'],
                    'IsDeleteMarker': True,
                    'LastModified': marker['LastModified']
                })
    return old_versions

# Get old versions (for versioned buckets)
old_versions = get_old_versions(BUCKET_NAME, DAYS_THRESHOLD)
print(f"Found {len(old_versions)} old versions to delete.")
```

Step 4: Delete Old Objects/Versions#
Use `delete_objects` to remove entries in batches of up to 1,000 per request; for versioned buckets, include each entry's `VersionId`.
Deleting Non-Versioned Objects#
```python
def delete_objects(bucket, objects):
    """Delete list of objects (non-versioned)."""
    if not objects:
        print("No old objects to delete.")
        return

    # Delete objects in batches (max 1000 per request)
    for i in range(0, len(objects), 1000):
        batch = objects[i:i+1000]
        delete_request = {'Objects': [{'Key': obj['Key']} for obj in batch]}
        response = s3.delete_objects(Bucket=bucket, Delete=delete_request)
        if 'Deleted' in response:
            print(f"Deleted {len(response['Deleted'])} objects.")

# Delete old objects
delete_objects(BUCKET_NAME, old_objects)
```

Deleting Versioned Objects#
For versions, include VersionId in the delete request:
```python
def delete_versions(bucket, versions):
    """Delete list of versions (current + non-current)."""
    if not versions:
        print("No old versions to delete.")
        return

    # Delete versions in batches (max 1000 per request)
    for i in range(0, len(versions), 1000):
        batch = versions[i:i+1000]
        delete_request = {
            'Objects': [
                {'Key': ver['Key'], 'VersionId': ver['VersionId']}
                for ver in batch
            ]
        }
        response = s3.delete_objects(Bucket=bucket, Delete=delete_request)
        if 'Deleted' in response:
            print(f"Deleted {len(response['Deleted'])} versions.")

# Delete old versions
delete_versions(BUCKET_NAME, old_versions)
```

Full Script (Combined)#
Here’s the complete script, with a flag to toggle versioning handling:
```python
import boto3
from datetime import datetime, timedelta

BUCKET_NAME = 'your-bucket-name'
DAYS_THRESHOLD = 30
HANDLE_VERSIONING = True  # Set to False for non-versioned buckets

s3 = boto3.client('s3')

def get_old_objects(bucket, days_threshold):
    ...  # Same as Case 1 above

def get_old_versions(bucket, days_threshold):
    ...  # Same as Case 2 above

def delete_objects(bucket, objects):
    ...  # Same as non-versioned delete above

def delete_versions(bucket, versions):
    ...  # Same as versioned delete above

if __name__ == "__main__":
    if HANDLE_VERSIONING:
        old_entries = get_old_versions(BUCKET_NAME, DAYS_THRESHOLD)
        delete_versions(BUCKET_NAME, old_entries)
    else:
        old_entries = get_old_objects(BUCKET_NAME, DAYS_THRESHOLD)
        delete_objects(BUCKET_NAME, old_entries)
    print("Cleanup complete!")
```

Troubleshooting Persistent Objects: Common Issues & Fixes#
Even with the script, objects may persist. Here’s how to diagnose:
Issue 1: Versioning Enabled but Script Isn’t Handling Versions#
If HANDLE_VERSIONING is False but your bucket has versioning, only current versions are deleted (non-current versions remain).
Fix: Set HANDLE_VERSIONING = True to delete non-current versions.
Issue 2: Object Lock is Enabled#
S3 Object Lock prevents deletion of locked objects (via legal holds or retention periods).
Check: Use `get_object_lock_configuration` to verify:

```python
response = s3.get_object_lock_configuration(Bucket=BUCKET_NAME)
# 'ObjectLockEnabled' is 'Enabled' when Object Lock is on; the call errors
# if the bucket has no lock configuration at all
print(response.get('ObjectLockConfiguration', {}).get('ObjectLockEnabled'))
```

Fix: Remove legal holds or wait for retention periods to expire before deleting.
Issue 3: Insufficient Permissions#
If you see AccessDenied, ensure your IAM role has s3:DeleteObjectVersion (for versioned buckets) and s3:ListBucketVersions.
Issue 4: Incomplete Multipart Uploads#
These are not listed in list_objects_v2 or list_object_versions. Use list_multipart_uploads to find and abort them:
```python
def abort_old_multipart_uploads(bucket, days_threshold):
    """Abort multipart uploads initiated before the cutoff date."""
    cutoff = datetime.now().astimezone() - timedelta(days=days_threshold)

    paginator = s3.get_paginator('list_multipart_uploads')
    for page in paginator.paginate(Bucket=bucket):
        for upload in page.get('Uploads', []):
            if upload['Initiated'] < cutoff:
                s3.abort_multipart_upload(
                    Bucket=bucket,
                    Key=upload['Key'],
                    UploadId=upload['UploadId']
                )
                print(f"Aborted multipart upload: {upload['Key']}")

# Add this to your script to clean up multipart uploads
abort_old_multipart_uploads(BUCKET_NAME, DAYS_THRESHOLD)
```

Automating the Process: Scheduling with Lambda or Cron#
To run the script automatically, use:
Option 1: AWS Lambda (Serverless)#
- Package the Script: Zip the Python script (no dependencies needed if using Lambda’s built-in Boto3).
- Create Lambda Function: Use Python 3.x runtime, and paste the script into the Lambda code editor.
- IAM Role: Attach a policy with the required S3 permissions (listed in Prerequisites).
- Schedule with CloudWatch Events: Trigger the Lambda daily via a CloudWatch Events rule (e.g., `rate(1 day)`).
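A minimal sketch of the Lambda entry point is shown below. The handler name, environment variable names, and defaults are assumptions, and the actual calls into the main script are elided as comments; reading configuration from environment variables lets the same code target different buckets per deployment:

```python
import os

def lambda_handler(event, context):
    """Hypothetical Lambda wrapper around the cleanup script."""
    bucket = os.environ.get('BUCKET_NAME', 'your-bucket-name')
    days = int(os.environ.get('DAYS_THRESHOLD', '30'))

    # From the main script:
    # old_entries = get_old_versions(bucket, days)
    # delete_versions(bucket, old_entries)

    return {'bucket': bucket, 'daysThreshold': days, 'status': 'complete'}
```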
Option 2: Cron (On-Prem/EC2)#
For servers, use cron to run the script daily:
```bash
# Edit crontab
crontab -e

# Add: Run daily at 2 AM
0 2 * * * /usr/bin/python3 /path/to/your/script.py >> /var/log/s3_cleanup.log 2>&1
```

Best Practices#
- Test First: Run the script with `print` statements (instead of delete calls) to preview deletions.
- Enable Versioning Temporarily: For critical buckets, enable versioning before cleanup to recover accidentally deleted objects.
- Log Deletions: Add logging to track deleted objects (e.g., write to CloudWatch Logs or a CSV file).
- Handle Errors: Add retry logic for transient failures (e.g., `botocore.exceptions.ClientError`).
- Avoid Throttling: Add delays between delete requests if deleting 10k+ objects (S3 has request rate limits).
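Two of the practices above, previewing deletions and respecting the 1,000-key batch limit, can be sketched as pure helpers that run without any AWS calls (the function names are mine, not from the script above), which makes them easy to unit-test:

```python
def chunk(items, size=1000):
    """Yield successive batches of at most `size` items,
    matching S3's delete_objects limit of 1,000 keys per request."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def plan_deletions(objects, dry_run=True):
    """Return the keys that would be deleted; in dry-run mode,
    print them instead of calling the API."""
    keys = [obj['Key'] for obj in objects]
    if dry_run:
        for key in keys:
            print(f"[DRY RUN] would delete: {key}")
    return keys
```

Wiring a `dry_run` flag through the delete functions lets you run the whole pipeline safely before committing to real deletions.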
Conclusion#
S3 Lifecycle Policies work for simple cases, but persistent objects due to versioning, locks, or misconfigurations require a more hands-on approach. With Boto3, you can build a flexible script to delete old objects, handle edge cases, and automate cleanup—keeping your S3 costs in check.
By following this guide, you’ll ensure no object overstays its welcome!