Ever heard of a Requester Pays Amazon S3 bucket? Great!
Ever heard of an Amazon Pre-signed URL? Great!
Question: Can you combine them together?
If you're interested to know, read on...
Requester Pays buckets
First, let's cover some terminology. Imagine you are NASA and you have some very detailed scientific pictures of the Earth. NASA would like to make these pictures available to everybody (since it is financed by the government) but bandwidth is costly. If lots of people downloaded the images, the Data Transfer costs could be eye-watering.
Enter Requester Pays buckets, which allow you to avoid paying Data Transfer charges by transferring that cost to the person requesting the photos. Such buckets cannot provide 'anonymous access' because AWS needs to know who is requesting the objects (so they can be charged for the Data Transfer component). Thus, it's not possible to simply put the URL of a photo in a browser to access it (eg http://my-bucket/photo.jpg). Instead, the request must come from an IAM Role or IAM User so that the related AWS account can be appropriately charged.
Pre-signed URLs
Second, we have the concept of a pre-signed URL. The best way to explain this is by way of an example.
Imagine that you have a photo-sharing website using Amazon S3 and you want to keep users' photos private. A user should be able to login and view their own pictures, but the pictures should not be publicly accessible. In addition, you want the ability for users to share selected photos with other users.
In such a scenario, it would not be easy to create an IAM policy that grants access to a user's own photos and also those that have been shared with them, since there could be hundreds of shared photos. Instead, the ability to access photos in S3 would work like this:
- A user authenticates to the application
- A user requests a photo, or they navigate to a page that displays photos on the web page (via an <img> tag)
- The application then checks whether they are authorized to view the image.
- If they are permitted access, the application generates a pre-signed URL that grants time-limited access to the object in Amazon S3
- The user's browser then uses that URL to request access to the object and Amazon S3 grants access to the object
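The steps above can be sketched roughly as follows. This is a minimal illustration rather than a real application: the `ACCESS` table, `is_authorized`, `url_for_photo` and `fake_sign` are all hypothetical names, and `fake_sign` merely stands in for a real signer (such as a wrapper around boto3's `generate_presigned_url`) so the sketch runs without AWS credentials.

```python
from typing import Callable, Dict, Optional, Set

# Hypothetical stand-in for the application's sharing database:
# maps each user to the object keys they own or that were shared with them.
ACCESS: Dict[str, Set[str]] = {
    "alice": {"alice/cat.jpg", "bob/beach.jpg"},
    "bob": {"bob/beach.jpg"},
}


def is_authorized(user: str, key: str) -> bool:
    """Application-level check: may this user view this object?"""
    return key in ACCESS.get(user, set())


def url_for_photo(
    user: str,
    key: str,
    sign: Callable[[str, int], str],
    expires_in: int = 300,
) -> Optional[str]:
    """Return a time-limited URL if the user is authorized, else None."""
    if not is_authorized(user, key):
        return None
    return sign(key, expires_in)


def fake_sign(key: str, expires_in: int) -> str:
    """Placeholder signer (assumption): a real one would call the AWS SDK."""
    return (
        f"https://my-bucket.s3.amazonaws.com/{key}"
        f"?X-Amz-Expires={expires_in}&X-Amz-Signature=placeholder"
    )


allowed = url_for_photo("alice", "bob/beach.jpg", fake_sign)
denied = url_for_photo("bob", "alice/cat.jpg", fake_sign)
```

The point of the design is that the authorization decision lives in the application, and S3 only ever sees a short-lived URL for an object the application has already approved.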
A pre-signed URL looks like this:
https://my-bucket.s3.ap-southeast-2.amazonaws.com/foo.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAXXXX%2F20200414%2Fap-southeast-2%2Fs3%2Faws4_request&X-Amz-Date=20200414T034309Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=e89331d191...
The pre-signed URL contains:
- A URL to the actual object in Amazon S3
- The time that the URL expires
- The authorized user's Access Key (which is okay to be public)
- A hash signature (based on the authorized user's Secret Access Key)
When Amazon S3 receives such a URL, it recalculates the signature and checks that the time has not expired. If everything seems okay, it grants access to the object in Amazon S3. If anything in the URL has been changed or falsified (such as the expiry time), the request will fail.
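To make the expiry check concrete, here is a small sketch that reads the expiry time out of a pre-signed URL using only the Python standard library. It assumes the URL follows the query-string layout shown earlier, where `X-Amz-Date` is the signing time and `X-Amz-Expires` is the lifetime in seconds. The helper names are made up for illustration, and the signature value is a placeholder. This only mirrors the time check; it does not verify the signature, which requires the Secret Access Key.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def presigned_url_expiry(url: str) -> datetime:
    """Compute when a pre-signed URL stops being valid (UTC)."""
    query = parse_qs(urlsplit(url).query)
    signed_at = datetime.strptime(
        query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=timezone.utc)
    lifetime = timedelta(seconds=int(query["X-Amz-Expires"][0]))
    return signed_at + lifetime


def is_expired(url: str, now: Optional[datetime] = None) -> bool:
    """True if the URL's validity window has already closed."""
    now = now or datetime.now(timezone.utc)
    return now >= presigned_url_expiry(url)


# Same shape as the example URL above; the signature is a placeholder.
example_url = (
    "https://my-bucket.s3.ap-southeast-2.amazonaws.com/foo.jpg"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
    "&X-Amz-Credential=AKIAXXXX%2F20200414%2Fap-southeast-2%2Fs3%2Faws4_request"
    "&X-Amz-Date=20200414T034309Z"
    "&X-Amz-Expires=3600"
    "&X-Amz-SignedHeaders=host"
    "&X-Amz-Signature=placeholder"
)

expiry = presigned_url_expiry(example_url)
```

For the example URL, the signing time of 03:43:09 UTC plus 3600 seconds gives an expiry of 04:43:09 UTC on 14 April 2020.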
A pre-signed URL can be created in a number of ways.
Using the AWS CLI:
aws s3 presign s3://my-bucket/foo.jpg
Or, using the boto3 AWS SDK in Python:
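import boto3

# Placeholder values for illustration
bucket_name = 'my-bucket'
object_name = 'foo.jpg'
expiration = 3600  # URL lifetime in seconds

s3_client = boto3.client('s3')
response = s3_client.generate_presigned_url(
    'get_object',
    Params={
        'Bucket': bucket_name,
        'Key': object_name
    },
    ExpiresIn=expiration
)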
Combining Requester Pays + Pre-Signed URLs
Here's the fun bit.
A question on StackOverflow had a scenario where:
- Account A is providing a requester-pays bucket (Bucket-A)
- An application in Account B wants to allow users to view/access an object in Bucket-A
- These users do not have their own AWS account, so the application will need to generate a pre-signed URL (which will send the Data Transfer costs to Account B)
However, they were getting an AccessDenied error.
The reason for this error turned out to be that requests going to a Requester Pays bucket need an acknowledgement that the requester is willing to pay for the Data Transfer costs (otherwise they might access a URL without knowing it would cost them money).
To do this, a URL to a Requester Pays bucket requires an additional query parameter in the request:
http://...&x-amz-request-payer=requester
However, simply appending this to an existing pre-signed URL results in a SignatureDoesNotMatch error because the request URL no longer matches the signature that was originally generated.
The solution, it turned out, was to specify this additional parameter within the signing request:
import boto3

s3_client = boto3.client('s3')
response = s3_client.generate_presigned_url(
    'get_object',
    Params={
        'Bucket': bucket_name,
        'Key': object_name,
        'RequestPayer': 'requester'  # Additional parameter
    },
    ExpiresIn=expiration
)
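To see why the parameter has to be present at signing time, here is a simplified sketch of how a Signature Version 4 query-string signature is derived, using only the standard library. It is an illustration written from the public description of the algorithm, not the SDK's actual code. The secret key and host are made-up placeholders, and `sign_query` is a hypothetical helper name. The key point is that the sorted query parameters are part of the data being hashed, so adding `x-amz-request-payer` after the fact produces a request the original signature no longer covers.

```python
import hashlib
import hmac
from typing import Dict
from urllib.parse import quote


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sign_query(secret_key: str, host: str, path: str,
               params: Dict[str, str], amz_date: str, region: str) -> str:
    """Return a SigV4-style hex signature for a GET with these query params."""
    # Query parameters are URI-encoded and sorted, then hashed into the signature.
    canonical_query = "&".join(
        f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}"
        for k, v in sorted(params.items())
    )
    canonical_request = "\n".join([
        "GET",
        quote(path, safe="/-_.~"),
        canonical_query,
        f"host:{host}\n",    # canonical headers (only 'host' is signed)
        "host",              # signed header names
        "UNSIGNED-PAYLOAD",  # pre-signed URLs do not sign the body
    ])
    scope = f"{amz_date[:8]}/{region}/s3/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    # Derive the signing key from the secret via a chain of HMACs.
    key = ("AWS4" + secret_key).encode("utf-8")
    for part in (amz_date[:8], region, "s3", "aws4_request"):
        key = _hmac(key, part)
    return _hmac(key, string_to_sign).hex()


# Placeholder inputs (assumptions for illustration only).
SECRET = "example-secret-key"
HOST = "my-bucket.s3.ap-southeast-2.amazonaws.com"
REGION = "ap-southeast-2"
DATE = "20200414T034309Z"
base_params = {
    "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
    "X-Amz-Credential": f"AKIAXXXX/{DATE[:8]}/{REGION}/s3/aws4_request",
    "X-Amz-Date": DATE,
    "X-Amz-Expires": "3600",
    "X-Amz-SignedHeaders": "host",
}

sig_without = sign_query(SECRET, HOST, "/foo.jpg", base_params, DATE, REGION)
sig_with = sign_query(
    SECRET, HOST, "/foo.jpg",
    {**base_params, "x-amz-request-payer": "requester"}, DATE, REGION,
)
```

The two signatures differ, which is why a URL signed without the parameter fails once the parameter is tacked on, and why asking the SDK to include `RequestPayer` in the signing request fixes it.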
Resulting flow
The final flow for this solution is:
- User wishes to access object in Requester Pays bucket
- Application generates a pre-signed URL (since anonymous access is not permitted), but also specifies the additional parameter to accept charges
- The pre-signed URL is sent to the user's web browser
- The web browser requests the object from Amazon S3
- Amazon S3 verifies the signature and expiry time on the pre-signed URL
- Amazon S3 then provides the contents of the object and charges the Data Transfer portion to the AWS Account that created the pre-signed URL
It's not Rocket Science, but does help the Rocket Scientists.