Michael Hausenblas

Posted on Dec 4, 2018 • Originally published at dev.to

A Lesson Learned In Going Serverless

#serverless #aws #lambda #apigateway

When I was working on the serverless variant of imgn, a simple image sharing demo app, I ran into an issue around handling multipart form-data POST requests with the API Gateway and AWS Lambda. This post shows how I solved it.

The imgn app is very simple: it lets you upload image files and then view them in a public gallery. In the background, it extracts some image metadata (just the dimensions, for now) and displays it along with the images.

During the development of the serverless variant of imgn I ran into an issue in the context of uploading the image to S3 using a Lambda function via the API Gateway. You can read up on the issue description in greater detail on StackOverflow, but the essence is: the payload that the API Gateway hands over to the Lambda function gets somehow butchered, so the images end up being corrupted when they land in the respective S3 bucket.

To narrow down where this SNAFU happens, I first replaced my initial code, which created an http.Request and used the ParseForm method to get to the image data, with my own implementation of parsing the multipart form-data. Still, the corruption was the same.

Next, in order to exclude the "upload to the S3 bucket" part as the culprit for the data corruption, I returned the parsed image data I got from the API Gateway as a result of the Lambda call. This allowed me to use hexdump to compare the resulting file with the original, but still no bueno. The problem persisted.
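The same comparison can be done programmatically. The following is a rough sketch, assuming both files are already loaded into memory; the helper name firstDiff is hypothetical and the byte values are made-up sample data, not the actual corruption I observed.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// firstDiff (hypothetical helper) returns the offset of the first byte at
// which a and b differ, or -1 if they are identical. A pure length mismatch
// counts as a difference at the end of the shorter slice.
func firstDiff(a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			return i
		}
	}
	if len(a) != len(b) {
		return n
	}
	return -1
}

func main() {
	// Made-up sample data standing in for the original file and the
	// version returned by the Lambda function.
	original := []byte{0x89, 'P', 'N', 'G', 0x0d, 0x0a, 0x1a, 0x0a}
	returned := []byte{0xef, 0xbf, 0xbd, 'P', 'N', 'G', 0x0d, 0x0a, 0x1a, 0x0a}

	fmt.Printf("first difference at offset %d\n", firstDiff(original, returned))
	// hex.Dump produces output in the style of `hexdump -C`.
	fmt.Print(hex.Dump(original))
	fmt.Print(hex.Dump(returned))
}
```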

The only thing left in the request path was the handover of the (binary) image data from the API Gateway to the Lambda function.

After trawling StackOverflow and reading many seemingly relevant posts in AWS fora, I decided it was time for a different approach. I read up on pre-signed URLs and decided to give this a try. Pre-signed URLs let you grant other people access to objects in S3 buckets, and you can restrict which operation is allowed, on which object, and for how long.

So, the idea would be that when a user issues a multipart form-data POST request via the UI, the UploadImageFunction would not directly upload the image to S3 but create a pre-signed URL for a PUT request into said S3 bucket, valid for the file referenced in the previous form-data POST request and restricted to 5 minutes:

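// Build (but do not send) a PutObject request for the target object.
// The second return value is the output placeholder, not an error.
req, _ := s3c.PutObjectRequest(&s3.PutObjectInput{
        Bucket: aws.String(gallerybucket),
        Key:    aws.String(imgname),
})
// Sign the request; the resulting URL is valid for 5 minutes.
presurl, err := req.Presign(5 * time.Minute)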

… and the JavaScript code in the UI would then issue a second PUT request, using the pre-signed URL to directly upload the image into the S3 bucket. And that works like a charm!

I'm still unsure whether I should qualify this strategy as a hack or a good practice and would certainly appreciate hearing what other serverless practitioners think.

Top comments (11)

David J. Felix 🔮 • Dec 4 '18 • Edited

Just to validate that this isn't a "hack" and you did the right thing -- this is a pretty common workflow with S3 and pretty weird until you grok it fully. We use it for much larger files. You can also initiate a multipart s3 upload from the lambda and then upload individual segments via presigned put part requests. You'll still have to call the API to get those presigned put part urls.

Michael Hausenblas • Dec 4 '18

Thanks a lot for confirming this, David! Really appreciate your feedback here. Still wondering why the Gateway causes the issue but I'm glad to learn this is indeed the proper way of doing it.

David J. Felix 🔮 • Dec 4 '18 • Edited

API gateway has payload limits. docs.aws.amazon.com/apigateway/lat... 10MB for input payloads. Also lambda will have similar payload limits, so usually you want to sidestep this by only passing metadata about big files, similar to how you'd want to model databases for files.

An alternative to this URL-passing strategy is to utilize AWS cognito. docs.aws.amazon.com/IAM/latest/Use... Cognito users can be granted IAM permissions allowing them to write to restricted areas of S3 buckets. I'm not a huge fan of cognito user pools, and getting custom auth working through cognito for IAM is a pretty huge task.

One more thing to note: I typically tell people "don't request the presigned URL unless you are going to use it literally ASAP". This is important for when people present download links or upload links. The link should not pre-populate with presigned urls, but instead should be JS that fetches the URL and immediately fires it. I use a 300s time limit on all presigned URLs and have no issues. It's important to know that the time limit is only on the START of the request, not the end -- so people with slow connections only need to start the call in 300s, not finish it (good to know for uploads). Of course, when you use put-part, you should get each part URL right before putting the part -- not all at the beginning.

Hope this helps!

johnny-cw-chan • Nov 7 '19 • Edited

Did you mean 300ms? I'm having the same problem with downloading large S3 objects via API Gateway and we are currently using pre-signed URLs returned from the backend to the UI. However, we were concerned about a potential security breach if somebody malicious managed to get hold of the pre-signed URL. We currently set the expiry time limit to 5 minutes.

The alternative approach of using Cognito seemed like a large task to undertake, even though it should help mitigate the security risk.

David J. Felix 🔮 • Nov 7 '19 • Edited

No, I meant 300s (5m). 300ms is far too low and you'd certainly have issues with people not having time to execute it. Keep in mind you're minting this URL and then sending it over network to a client who is expected to then call S3 within the time you have allotted, which includes the wire time to send it to them and their wire time to request it from s3. If I'm remembering correctly, S3 doesn't even deal in milliseconds, the lowest value you can set is 1 second which I think is still too low.

I think if your front end is using a pre-signed URL and it's never shown to a user you should ask yourself, what is the attack vector that you're worried about? A hacker that can get ahold of a 5min URL can certainly get ahold of a 1min URL or even a 30s URL. The closer you get to real world latencies on cell networks, the more likely your users are going to see failures when following the presigned URL. We decided 5 minutes was low enough to mitigate risk, but I'd be cautious recommending anything less than 1 minute or like 30 seconds if you have cell users.

From experience though, it does seem like the timeout is until the user calls S3, not finishes the s3 call, so s3 won't hang up on users who have slow connections, which was something we worried would happen when we selected 5min. It is possible that their connection naturally times out or disconnects, which may have some recovery implications with lower presigned URL durations.

iwaduarte • Dec 5 '18

One of the things that you could consider is configuring your API Gateway endpoint. It is just a question of encapsulating the content in a buffer on the S3 side (I have used base64 encoding) and treating the HTTP request as a binary media type by adding the entry "multipart/form-data" under settings -> media type in the API console (or at least that is what I did recently to solve the same problem). I do not know if this solution is the best practice either, but I believe it is the easiest.

Michael Hausenblas • Dec 5 '18

Thank you for the follow-up here and your suggestion. Indeed, I've seen this method documented, for example here, and it looks sensible to me as an alternative as well.

iwaduarte • Dec 5 '18

Although that seems like a solution, it is different from what I did. My request went directly to Lambda (via mock, not integration at all) and the binary media types setting is just a text input in the API settings section.

Ravaka Razafimanantsoa • Dec 6 '18

Hi !

I think the problem was that in AWS API Gateway, you need to activate the handling of binary types (Binary Media Types) manually and then redeploy your API (I do not remember exactly where this setting is). If this is not done, your binary data gets corrupted.

Bonus story: One other thing I ran into when sharing my bucket with other IAM users was that I did not set an ACL and ended up not being able to access the data in the bucket or remove it (even though I am the bucket owner). Adding the ACL bucket-owner-full-control granted me all the rights I needed on these files. But I don't think you will run into this problem with Lambda or pre-signed links.

Michael Hausenblas • Dec 6 '18

Thanks for the suggestion! If you look at the SO question I linked to in the post, that was indeed one of the things I tried (without success). Or maybe you mean something else?