Sydney Schreckengost

Posted on Dec 18, 2018

Building a super cheap transcoder using AWS Lambda

#aws #lambda #functions #scaling

Have you ever looked at the price of Amazon's Elastic Transcoder? If you haven't, I'll spare you the lookup - it's 3 cents (USD) per minute of video transcoded, if the video is considered "HD". If you have a handful of videos, or you need a lot of flexibility, Elastic Transcoder is pretty cool.

Just one problem - if that's not your use case, Elastic Transcoder can get really expensive. For example, the company I work for processes hundreds of thousands of videos every month, and our needs are very simple - we need to turn a specific type of video into a different specific type of video. This is all we need to do, nothing more and nothing less.

Using Elastic Transcoder, our average monthly bill is high enough to really worry about. While there's no reason not to use the right tool for the job, Elastic Transcoder is overkill for our needs.

By the end of this post, we'll have a usable, cheap transcoder built on Lambda instead. It does have some limitations, but it is very cost-effective for the jobs it suits, and building it is a good way to understand a lot of really interesting aspects of using S3 and Lambda.

Creating our Lambda function

First, we need to create the Lambda function that we will be using. Log in to AWS, and navigate to Lambda. Click Create Function and create your function. Any language with the ability to call system commands should work fine, though I used Go here because, well, I like it. It's that simple; Go doesn't provide a major advantage here.

Name your transcoder, select a language, and create a custom role - if you're not familiar with AWS, this part might get confusing. Something like this should be right for your custom role, if you're looking to get started:



{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObjectAcl",
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:GetBucketAcl",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "arn:aws:s3:::testBucketName",
                "arn:aws:s3:::testBucketName/*"
            ]
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "s3:HeadBucket",
            "Resource": "*"
        }
    ]
}

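In the above example, note that you do need both testBucketName and testBucketName/* - the first resource covers the bucket itself (for bucket-level actions such as s3:GetBucketAcl), and the second covers the files inside it (for object-level actions such as s3:GetObject and s3:PutObject).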

Coding our function

After we've created the function, we need to tell it how to operate. This will vary from language to language, but the general gist is like this:

1) Download the existing file from S3. The filename and bucket are sent in as part of an S3 Event in most cases, depending on the library used. As a side note, because of the way Lambda works, make sure you have a unique identifier for the video, or you can end up with weird errors. Go on, ask me how I know.
2) Transcode using ffmpeg - we'll make use of a newer feature to make this simple and portable across any function we might need ffmpeg for. For now, just expect to use /opt/ffmpeg/ffmpeg.
3) Upload the transcoded video to S3.

That's really about it. The specifics depend on the language you picked, but AWS fortunately provides pretty usable documentation for this.
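The three steps above can be sketched in Go roughly as follows. This is a minimal outline under stated assumptions, not the code running in production: the ObjectStore interface, the transcode hook, the "outputs/" key prefix and the ".mp4" target are hypothetical names picked for illustration, and the real S3 calls (through the AWS SDK) and the Lambda entry point are left out so the sketch depends only on the standard library. Using the invocation's request ID in the local filenames is one plausible way to follow the "unique identifier" advice.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"path"
	"path/filepath"
	"strings"
)

// s3Event mirrors only the fields of an S3 notification used below.
type s3Event struct {
	Records []struct {
		S3 struct {
			Bucket struct {
				Name string `json:"name"`
			} `json:"bucket"`
			Object struct {
				Key string `json:"key"`
			} `json:"object"`
		} `json:"s3"`
	} `json:"Records"`
}

// ObjectStore is a hypothetical stand-in for an S3 client.
type ObjectStore interface {
	Download(bucket, key, localPath string) error
	Upload(bucket, key, localPath string) error
}

// handle runs download -> transcode -> upload for each record in the event
// and returns the keys it uploaded. transcode is a hook that would run ffmpeg.
func handle(raw []byte, requestID string, store ObjectStore, transcode func(in, out string) error) ([]string, error) {
	var ev s3Event
	if err := json.Unmarshal(raw, &ev); err != nil {
		return nil, fmt.Errorf("decode event: %w", err)
	}
	var uploaded []string
	for i, rec := range ev.Records {
		bucket := rec.S3.Bucket.Name
		// Object keys arrive URL-encoded in S3 notifications ('+' for spaces).
		key, err := url.QueryUnescape(rec.S3.Object.Key)
		if err != nil {
			return uploaded, fmt.Errorf("decode key: %w", err)
		}
		name := strings.TrimSuffix(path.Base(key), path.Ext(key))
		// Unique local names, since /tmp can survive between invocations.
		inPath := filepath.Join("/tmp", fmt.Sprintf("%s-%d-in%s", requestID, i, path.Ext(key)))
		outPath := filepath.Join("/tmp", fmt.Sprintf("%s-%d-out.mp4", requestID, i)) // assumed target format
		outKey := "outputs/" + name + ".mp4"                                      // hypothetical destination

		if err := store.Download(bucket, key, inPath); err != nil {
			return uploaded, fmt.Errorf("download %s: %w", key, err)
		}
		if err := transcode(inPath, outPath); err != nil {
			return uploaded, fmt.Errorf("transcode %s: %w", key, err)
		}
		if err := store.Upload(bucket, outKey, outPath); err != nil {
			return uploaded, fmt.Errorf("upload %s: %w", outKey, err)
		}
		uploaded = append(uploaded, outKey)
	}
	return uploaded, nil
}

// logStore is a fake ObjectStore that only records what it was asked to do.
type logStore struct{ calls []string }

func (s *logStore) Download(bucket, key, localPath string) error {
	s.calls = append(s.calls, "download "+bucket+"/"+key+" -> "+localPath)
	return nil
}

func (s *logStore) Upload(bucket, key, localPath string) error {
	s.calls = append(s.calls, "upload "+localPath+" -> "+bucket+"/"+key)
	return nil
}

func main() {
	event := []byte(`{"Records":[{"s3":{"bucket":{"name":"testBucketName"},"object":{"key":"uploads/my+clip.mov"}}}]}`)
	store := &logStore{}
	noop := func(in, out string) error { return nil } // ffmpeg would run here
	keys, err := handle(event, "req-123", store, noop)
	fmt.Println(keys, err)
	for _, c := range store.calls {
		fmt.Println(c)
	}
}
```

Keeping the S3 client behind a small interface is only a convenience for the sketch: it lets the flow be exercised locally with a fake store before wiring in the real SDK calls.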

Setting up S3 events

Now that we've written our function, the last major thing we need to do is set up S3 to actually send events to Lambda.

Go to the S3 bucket you want to use, click "Properties", and under "Advanced" click "Events". Then click "Add Notification", and select the object actions that should trigger a transcode (most likely "All Object Create Events").

If there are specific folders you want to include, that can go into prefix; if you are looking for a specific extension, set that in the suffix. Click the dropdown for Send To: and select "Lambda Function". Another dropdown will be created; in here, select the function you created previously.
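To make the prefix and suffix behaviour concrete, here is a small Go illustration of which object keys would pass such a filter. It only imitates the matching rule (the key must start with the prefix and end with the suffix, case-sensitively); the function name and the "uploads/" and ".mov" values are hypothetical examples, not settings from this project.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesFilter reports whether an object key would pass a notification
// filter with the given prefix and suffix. An empty value matches any key.
func matchesFilter(key, prefix, suffix string) bool {
	return strings.HasPrefix(key, prefix) && strings.HasSuffix(key, suffix)
}

func main() {
	prefix, suffix := "uploads/", ".mov" // hypothetical filter values
	keys := []string{
		"uploads/a.mov",
		"uploads/sub/b.mov",
		"uploads/a.mp4",
		"outputs/a.mov",
	}
	for _, key := range keys {
		fmt.Printf("%-20s triggers=%v\n", key, matchesFilter(key, prefix, suffix))
	}
}
```

One thing worth checking with a table like this: if the function uploads its result to the same bucket under a key that also passes the filter, that upload will trigger the function again, so it is safer to pick an output prefix or extension the filter excludes.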

Creating a layer

Layers are .zip files that are included in the filesystem of a Lambda function; their contents are unpacked under /opt. In this case, we are using a statically compiled (that is, it doesn't require anything outside the binary, generally speaking) version of ffmpeg. You can find the one I used here: https://johnvansickle.com/ffmpeg/builds/ffmpeg-git-amd64-static.tar.xz. Extract this, and you should get an ffmpeg directory; add this directory to a zip file, so that the binary ends up at /opt/ffmpeg/ffmpeg inside the function.

Go to your Lambda function, and in the "Designer" section at the top of the page, click "Layers" (it sits immediately under the function itself).

Click "Add Layers" and upload the zip file containing ffmpeg. Attach it to the function, and you're done.

Done!

Mostly, at least. There are some minor changes to make here and there once it's up, and it can take a little while to figure out the right way to do each step in a given language.

When a video gets uploaded, it will be sent to the Lambda function, which will then transcode it and upload it.

This project was designed as part of a way to reduce costs, so let's take a look at that aspect.

A 10-minute video in this setup takes about 105 seconds to transcode. With Lambda, we can give it more RAM to work with, and for longer or higher resolution videos we absolutely do need more; I have our current configuration set to 1280MB.

If our transcoder takes 1280MB and 105 seconds, we can easily calculate our cost to transcode a 10-minute video:

$0.00001667 (Lambda's price per GB-second) * (1280/1024) gives us our cost to run the function for a second, which works out to ~$0.000021. Multiplying this by the 105-second run time gives us a figure of about $0.0022. Keep in mind that with Elastic Transcoder, a 10-minute video would cost $0.30 (10 minutes at $0.03 per minute). This means that our transcoder, running on Lambda, comes in more than 99% cheaper than Elastic Transcoder.
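The same arithmetic as a small Go program, in case you want to plug in your own memory size and run time. The two prices are the ones quoted in this post; the function and constant names are made up for the example, and the calculation deliberately ignores Lambda's per-request charge and any S3 costs, as the figures above do.

```go
package main

import "fmt"

const (
	lambdaPricePerGBSecond = 0.00001667 // USD, the rate used in this post
	elasticPricePerMinute  = 0.03       // USD per minute of HD video, from this post
)

// lambdaCost returns the compute cost in USD of one invocation that runs for
// the given number of seconds with the given memory setting.
func lambdaCost(memoryMB, seconds float64) float64 {
	return lambdaPricePerGBSecond * (memoryMB / 1024) * seconds
}

func main() {
	lambda := lambdaCost(1280, 105)       // 1280MB for 105 seconds
	elastic := elasticPricePerMinute * 10 // a 10-minute video

	fmt.Printf("Lambda:             $%.4f\n", lambda)                  // $0.0022
	fmt.Printf("Elastic Transcoder: $%.2f\n", elastic)                 // $0.30
	fmt.Printf("Savings:            %.1f%%\n", (1-lambda/elastic)*100) // 99.3%
}
```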

This does have some limitations:

This will work much better on shorter videos; due to the run time limitation on Lambda, I doubt anything over 45 minutes to an hour is worth transcoding this way.
Longer videos will require more RAM, so the savings do drop a little bit - but even with the RAM maxed out to 4GB, it's still 97+% cheaper than Elastic Transcoder.
It is not very flexible - though this can be overcome with a little creativity.

For our needs, this is perfect, but remember that all engineering is knowing your constraints and getting the best for what you have.

Top comments (17)

voltrus • Sep 12 '19 • Edited

Hi Harold, thanks for the life-saving post. I'm a newbie to coding, and I got stuck at the tweaking part that you mention after extracting the ffmpeg directory. I couldn't change any ffmpeg parameters and also couldn't find where the compressed files are going. Can you help me with that?

Sydney Schreckengost • Sep 12 '19 • Edited

That's handled by the script that actually calls ffmpeg. So, in this case, all I'm really after is to change codecs/container formats, so I just have the script/program that calls it do the output options. It never changes, so I can hard code most of the options, but you can also set it dynamically depending on your needs.

You'll need to download the file to /tmp, and also transcode it to there. This introduces the big issue with this setup - large files will fail, plainly.
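As a rough illustration of what "the script that calls ffmpeg" can look like in Go, here is a sketch with hard-coded output options and both files under /tmp. The codec choices (libx264 and aac), the file names and the helper names are placeholders rather than the settings used in this post; swap in whatever options your target format needs.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"path/filepath"
)

// ffmpegPath is where the layer places the binary in this post's setup.
const ffmpegPath = "/opt/ffmpeg/ffmpeg"

// ffmpegArgs returns the hard-coded command line for one transcode.
// The codecs here are placeholder choices, not the author's settings.
func ffmpegArgs(inPath, outPath string) []string {
	return []string{
		"-y",         // overwrite the output file without prompting
		"-i", inPath, // input file, already downloaded to /tmp
		"-c:v", "libx264", // video codec (placeholder)
		"-c:a", "aac", // audio codec (placeholder)
		outPath, // output file, also under /tmp
	}
}

// transcode runs ffmpeg and returns its combined output on failure so the
// reason ends up in the function's logs.
func transcode(ctx context.Context, inPath, outPath string) error {
	cmd := exec.CommandContext(ctx, ffmpegPath, ffmpegArgs(inPath, outPath)...)
	out, err := cmd.CombinedOutput()
	if err != nil {
		return fmt.Errorf("ffmpeg failed: %w: %s", err, out)
	}
	return nil
}

func main() {
	in := filepath.Join("/tmp", "req-123-in.mov")   // hypothetical input name
	out := filepath.Join("/tmp", "req-123-out.mp4") // hypothetical output name
	fmt.Println(ffmpegPath, ffmpegArgs(in, out))
	if err := transcode(context.Background(), in, out); err != nil {
		fmt.Println(err) // expected when run outside Lambda without the layer
	}
}
```

To set options dynamically instead, ffmpegArgs could take the codecs as parameters; the rest stays the same.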

onlinecheckwriter • Aug 17 '20

Coding our function:

Can you give the code too? We are struggling on this

Sydney Schreckengost • Sep 10 '20

I can't yet, but I can give you the basics of it (the details are dependent on language, but the general gist):
1) Receive the lambda message and use the details to download the input file
2) Run ffmpeg with the options you need
3) Upload the file to S3

jackson007 • Mar 3 '20

Hi Harold:
Thank you for your great instructions.
As far as I know, the /tmp directory is only 512MB, so do you save the video in memory? Does it take much time to download the video from S3, and how do you speed it up?
Thanks again!

Sydney Schreckengost • Dec 18 '18

If I missed anything, please do mention it so I can update it.

Edithouse • Oct 9 '19

Hey Harold, thank you for your great instructions. Does this mean that, because of the Lambda limitation, it will be unable to encode a full-length film, e.g. a feature film of 1 hour and 45 minutes or longer?

Thanks again!

Sydney Schreckengost • Jan 19 '20

Yeah, that's the big downside. It can't really do that. In fact, I found out that anything with loads of motion will break, because it only has so much room. But it's pretty good for the things we really need it for.

Timothy McJak • May 21 '19

New to Layers -- how do I import / reference the ffmpeg layer from within Lambda?

Sydney Schreckengost • Jun 2 '19

Ah! My apologies - from within the function page, under "Designer", immediately under the function you should see a layers button. Everything you need is in there.

Thiago Cardoso • Jul 3 '20

I'd like to know how you overcome errors like 'moov atom not found', which I find very often in my process; I'm doing a similar job, but I'm using node.js in my case

James Dixon • Aug 1 '20

Unrelated question but have you found transcoding video to be especially slow with Node on Lambda? Harold mentioned a 10 minute video taking 105 seconds. I have a 30 second video that's taking over a minute to transcode. On my local system it takes seconds...

Sydney Schreckengost • Sep 10 '20

If you're doing it using ffmpeg, it should be about the same across any runtime.

Are you accounting for the time to download, etc? Also, it will always run faster on a local machine with real resources behind it, and the videos I'm transcoding are going to be relatively low-motion by nature.