Habil BOZALİ

Posted on • Originally published at habil.Medium

How to List AWS S3 Directory Contents Using Python and Boto3



When working with AWS S3, you might need to get a list of all files in a specific bucket or directory. This is particularly useful for inventory management, backup operations, or content synchronization. In this article, we’ll explore how to use Python and boto3 to list directory contents in an S3 bucket.

Common Use Cases

  • Creating inventory reports of S3 bucket contents
  • Verifying uploaded files after bulk transfers
  • Monitoring content changes in specific folders
  • Synchronizing content between different environments
  • Automated file management and cleanup operations

Prerequisites

  • Python 3.x installed
  • AWS account with appropriate permissions
  • boto3 library installed (pip install boto3)
  • AWS credentials configured

Understanding the Code

Let’s break down a simple yet effective solution:

import boto3

def list_bucket():
    # Configuration
    bucket = "your-bucket-name"
    folder = "path/to/folder"

    # Initialize S3 resource
    s3 = boto3.resource("s3",
        aws_access_key_id="YOUR_ACCESS_KEY",
        aws_secret_access_key="YOUR_SECRET_KEY"
    )

    # Get bucket reference
    s3_bucket = s3.Bucket(bucket)

    # List files and extract relative paths
    files_in_s3 = [
        f.key.split(folder + "/")[1]
        for f in s3_bucket.objects.filter(Prefix=folder).all()
    ]

    # Write results to file
    with open('bucket-contents.txt', 'w', encoding='UTF8') as file:
        file.write(str(files_in_s3))

if __name__ == '__main__':
    list_bucket()

Code Breakdown

  1. Client Initialization
s3 = boto3.resource("s3",
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY"
)

This section establishes a connection to AWS S3 using your credentials. For better security, consider using AWS CLI profiles or environment variables instead of hardcoded credentials.
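
For example, here is a minimal sketch of both safer approaches (the profile name "my-profile" is just a placeholder, not part of the original script):

# Option 1: let boto3 resolve credentials from environment variables
# (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY) or the default credential chain
import boto3
s3 = boto3.resource("s3")

# Option 2: use a named profile from ~/.aws/credentials
# ("my-profile" is a placeholder profile name)
session = boto3.Session(profile_name="my-profile")
s3 = session.resource("s3")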

  2. File Listing
files_in_s3 = [
    f.key.split(folder + "/")[1] 
    for f in s3_bucket.objects.filter(Prefix=folder).all()
]

This part:

  • Uses filter(Prefix=folder) to list only files in the specified folder
  • Splits the full path to get relative file paths (an alternative sketch follows this list)
  • Creates a list using list comprehension
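
As a side note, here is a minimal alternative sketch for the same step (assuming Python 3.9+ for str.removeprefix): it trims the prefix instead of splitting, so a key that does not contain folder + "/" is returned unchanged rather than raising an IndexError.

files_in_s3 = [
    f.key.removeprefix(folder + "/")
    for f in s3_bucket.objects.filter(Prefix=folder)
]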
  3. Output Generation
with open('bucket-contents.txt', 'w', encoding='UTF8') as file:
    file.write(str(files_in_s3))

Writes the results to a text file using proper UTF-8 encoding.
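
If you prefer one file name per line (easier to diff or grep), a small variation on the same write, reusing the files_in_s3 list from above:

# Write one key per line instead of the raw Python list repr
with open('bucket-contents.txt', 'w', encoding='UTF8') as file:
    file.write('\n'.join(files_in_s3))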

Enhanced Version

Here’s an improved version with additional features:

import boto3
import csv
from datetime import datetime
def list_bucket_contents(bucket_name, folder_prefix, output_format='txt'):
    try:
        # Initialize S3 client using AWS CLI profile
        session = boto3.Session(profile_name='default')
        s3 = session.resource('s3')
        bucket = s3.Bucket(bucket_name)

        # Get file listing
        files = []
        for obj in bucket.objects.filter(Prefix=folder_prefix):
            files.append({
                'name': obj.key.split(folder_prefix + '/')[1],
                'size': obj.size,
                'last_modified': obj.last_modified
            })

        # Generate output filename
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        output_file = f'bucket_contents_{timestamp}.{output_format}'

        # Write output
        if output_format == 'csv':
            with open(output_file, 'w', newline='', encoding='UTF8') as f:
                writer = csv.DictWriter(f, fieldnames=['name', 'size', 'last_modified'])
                writer.writeheader()
                writer.writerows(files)
        else:
            with open(output_file, 'w', encoding='UTF8') as f:
                f.write(str(files))

        return True, output_file

    except Exception as e:
        return False, str(e)
if __name__ == '__main__':
    success, result = list_bucket_contents(
        'your-bucket-name',
        'path/to/folder',
        'csv'
    )
    print(f"Operation {'successful' if success else 'failed'}: {result}")

This enhanced version includes:

  • Support for multiple output formats (CSV/TXT)
  • Additional file metadata (size, last modified date)
  • Error handling
  • Timestamp-based output files
  • AWS CLI profile support

Best Practices

  1. Never hardcode AWS credentials in your code
  2. Use error handling to manage potential failures
  3. Consider pagination for large buckets (see the sketch after this list)
  4. Include relevant metadata in the output
  5. Use appropriate file encoding (UTF-8)
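
On point 3: the resource-level objects.filter() call used above pages through results for you, but if you switch to the low-level client you need a paginator, since list_objects_v2 returns at most 1,000 keys per call. A minimal sketch, assuming the same bucket and prefix placeholders as earlier:

import boto3

def iter_keys(bucket_name, prefix):
    # Paginate through list_objects_v2 so large prefixes are fully listed
    s3_client = boto3.client("s3")
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]

for key in iter_keys("your-bucket-name", "path/to/folder"):
    print(key)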

Conclusion

Directory listing in AWS S3 using Python and boto3 is a powerful tool for managing your cloud storage. Whether you’re doing inventory management, migrations, or routine maintenance, this script provides a solid foundation that you can build upon.

See you in the next article! 👻
