Originally published at https://gist.github.com/neo01124/dc31d0b08bd7ac6906d06197e20dc9b6
This must be at least the fifth time I've written this kind of code for different projects, so I've decided to make a note of it for good.
This might seem like a very trivial task until you realise that S3 has no concept of a folder hierarchy. S3 only has buckets and keys. Buckets are flat, i.e. there are no folders: the whole path (folder1/folder2/folder3/file.txt) is the key for your object. The S3 UI presents it like a file browser, but there aren't any folders inside a bucket; there are only keys. From the S3 docs:
> The Amazon S3 data model is a flat structure: you create a bucket, and the bucket stores objects. There is no hierarchy of subbuckets or subfolders; however, you can infer logical hierarchy using key name prefixes and delimiters as the Amazon S3 console does.
The challenge in this task is essentially to recreate the directory structure implied by the key (folder1/folder2/folder3/) locally before downloading the actual content of the S3 object.
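In other words, the "directory" part of a key can be recovered with plain string/path manipulation. A minimal sketch (the key shown here is made up):

```python
import os

# An S3 key is just a string; the slashes only *imply* a hierarchy.
key = "folder1/folder2/folder3/file.txt"

# os.path.split peels off the filename; the rest is the "directory" prefix
# we have to create locally before downloading the object's content.
prefix, filename = os.path.split(key)

print(prefix)    # folder1/folder2/folder3
print(filename)  # file.txt
```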
Option 1 - Shell command
The AWS CLI will do this for you with a sync operation:
aws s3 sync s3://yourbucket /local/path
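A couple of common variations, in case you only want part of the bucket or use a named credentials profile (the profile and prefix names below are illustrative):

```shell
# Use a specific credentials profile and only mirror one "folder" (key prefix)
aws s3 sync s3://yourbucket/folder1 /local/path/folder1 --profile myprofile

# Do a dry run first to see what would be copied
aws s3 sync s3://yourbucket /local/path --dryrun
```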
Option 2 - Python
- Install boto3 (`pip install boto3`)
- Create an IAM user with a policy similar to this one:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",
                    "s3:ListBucketMultipartUploads",
                    "s3:ListMultipartUploadParts",
                    "s3:GetObject",
                    "s3:GetBucketLocation"
                ],
                "Resource": [
                    "arn:aws:s3:::your_bucket_name",
                    "arn:aws:s3:::your_bucket_name/*"
                ]
            }
        ]
    }
- Create a profile in ~/.aws/credentials with the access details of this IAM user, as explained in the boto3 documentation
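The credentials file looks something like this (the profile name and keys below are placeholders):

```ini
# ~/.aws/credentials
[profile_name]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
```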
- Code
    import errno
    import os

    import boto3


    def mkdir_p(path):
        # mkdir -p functionality, from https://stackoverflow.com/a/600612/2448314
        try:
            os.makedirs(path)
        except OSError as exc:
            if exc.errno == errno.EEXIST and os.path.isdir(path):
                pass
            else:
                raise


    def get_s3_path_filename(key):
        # Split a key such as "folder1/folder2/file.txt" into
        # ("folder1/folder2", "file.txt").
        return os.path.split(str(key))


    def download_s3_bucket(bucket_name, local_folder, aws_user_with_s3_access):
        session = boto3.Session(profile_name=aws_user_with_s3_access)
        s3 = session.resource('s3')
        s3_bucket = s3.Bucket(bucket_name)
        for obj in s3_bucket.objects.all():
            if obj.key.endswith('/'):
                # Skip the zero-byte "folder" placeholder objects the console creates
                continue
            s3_path, s3_filename = get_s3_path_filename(obj.key)
            local_folder_path = os.path.join(os.curdir, local_folder, s3_path)
            local_fullpath = os.path.join(local_folder_path, s3_filename)
            mkdir_p(local_folder_path)
            s3_bucket.download_file(obj.key, local_fullpath)


    # Replace the arguments with your own bucket and profile names
    download_s3_bucket(bucket_name="your_bucket_name",
                       local_folder="/tmp/s3_bucket",
                       aws_user_with_s3_access="profile_name")
I'd make a package if there is enough interest :)
Top comments (1)
Hi! I know that I might be a little late on this one, but I am new to AWS. What is my profile_name supposed to be? Is it my user name on AWS? How do I retrieve it? Sorry for the (probably) dumb question.