<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jiju Thomas Mathew</title>
    <description>The latest articles on DEV Community by Jiju Thomas Mathew (@jthoma).</description>
    <link>https://dev.to/jthoma</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F664563%2Ffe9cf53c-581f-4179-9544-28750ef69d86.png</url>
      <title>DEV Community: Jiju Thomas Mathew</title>
      <link>https://dev.to/jthoma</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jthoma"/>
    <language>en</language>
    <item>
      <title>From $40 to $7: How We Slashed Video Transcoding Costs with Serverless and Spot Instances</title>
      <dc:creator>Jiju Thomas Mathew</dc:creator>
      <pubDate>Fri, 24 May 2024 23:21:23 +0000</pubDate>
      <link>https://dev.to/jthoma/from-40-to-7-how-we-slashed-video-transcoding-costs-with-serverless-and-spot-instances-o6d</link>
      <guid>https://dev.to/jthoma/from-40-to-7-how-we-slashed-video-transcoding-costs-with-serverless-and-spot-instances-o6d</guid>
      <description>&lt;p&gt;We all know the struggle of managing cloud costs, especially when it comes to workloads with fluctuating demands. In this blog post, I'll share how we successfully migrated our video transcoding system from a costly EC2 on-demand setup to a serverless architecture with spot instances, achieving a dramatic reduction in monthly bills while significantly improving performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Challenge: Expensive and Inflexible Transcoding&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Our previous video transcoding system utilized a continuously running EC2 instance to process video files uploaded via FTP. This approach had several drawbacks:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;High Cost&lt;/strong&gt;: Even during idle periods with no video uploads, the EC2 instance incurred a recurring monthly cost of around $40.&lt;br&gt;
&lt;strong&gt;Limited Scalability&lt;/strong&gt;: The single instance couldn't automatically scale to handle bursts of video uploads, leading to processing delays.&lt;br&gt;
&lt;strong&gt;Slow Processing Times&lt;/strong&gt;: Videos could take up to an hour to process due to the fixed resources of the EC2 instance.&lt;br&gt;
&lt;strong&gt;High Error Rate&lt;/strong&gt;: We experienced a concerning 10% error rate during video processing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Solution: Serverless and Spot Instances to the Rescue&lt;/strong&gt;&lt;br&gt;
To address these challenges, we adopted a serverless architecture leveraging AWS Lambda functions and cost-effective EC2 Spot Instances. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;FTP File Processing &amp;amp; S3 Upload Lambda:&lt;/strong&gt; This Lambda runs on a scheduled trigger and checks the FTP server for newly uploaded video files. It retrieves each file, copies it to a designated S3 bucket, and adds metadata for tracking.&lt;br&gt;
&lt;strong&gt;Process Decider Lambda:&lt;/strong&gt; This Lambda analyzes the uploaded video file. If the file size is below a certain threshold, it triggers the Transcoder Lambda for processing. For larger files, it initiates the Spot Bidder Lambda.&lt;br&gt;
&lt;strong&gt;Transcoder Lambda:&lt;/strong&gt; This Lambda utilizes a pre-compiled ffmpeg binary from a lambda layer to process smaller video files directly within the serverless environment.&lt;br&gt;
&lt;strong&gt;Spot Bidder Lambda:&lt;/strong&gt; This Lambda estimates the processing time for the large video file and analyzes spot instance pricing trends across regions. It pushes the file information into an SQS queue and then requests a spot instance at the most competitive price to handle the transcoding task.&lt;br&gt;
&lt;strong&gt;EC2 Spot Instance:&lt;/strong&gt; Upon launch, the spot instance retrieves processing details from an SQS queue and utilizes ffmpeg to transcode the large video file. Once completed, it updates the SQS queue and terminates itself.&lt;br&gt;
&lt;strong&gt;DynamoDB:&lt;/strong&gt; This NoSQL database stores detailed metadata about each processed video file, including processing time and completion status.&lt;br&gt;
&lt;strong&gt;SNS Topics:&lt;/strong&gt; These topics are used for sending notifications regarding Lambda execution, spot instance launch/termination, and processing completion.&lt;br&gt;
&lt;strong&gt;IAM Roles:&lt;/strong&gt; Granular IAM roles are assigned to each component, ensuring least-privilege access and enhanced security.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9rmrex0483suwr17csgy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9rmrex0483suwr17csgy.png" alt="The architecture diagram" width="800" height="514"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benefits Achieved: A Cost-Effective and High-Performance System&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The migration to a serverless architecture with spot instances yielded significant benefits:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Drastic Cost Reduction&lt;/strong&gt;: Our monthly bill plummeted from $40 to a mere $7, a remarkable 82.5% cost saving!&lt;br&gt;
&lt;strong&gt;Automatic Scaling&lt;/strong&gt;: Lambda functions and spot instances automatically scale based on workload, eliminating idle costs and ensuring efficient resource utilization.&lt;br&gt;
&lt;strong&gt;Blazing-Fast Processing&lt;/strong&gt;: Video processing times are now down to a single minute, a significant improvement over the previous one-hour wait.&lt;br&gt;
&lt;strong&gt;Near-Zero Error Rate&lt;/strong&gt;: Our error rate has practically vanished thanks to the inherent reliability of serverless functions and spot instances.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lessons Learned: The Power of Serverless and Spot Instances&lt;/strong&gt;&lt;br&gt;
This migration project highlighted the power of serverless architectures and spot instances for cost-effective and scalable cloud solutions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Event-driven architecture&lt;/strong&gt;: Utilizing event triggers for Lambda functions streamlines the workflow and ensures resources are used only when needed.&lt;br&gt;
&lt;strong&gt;Process Splitting&lt;/strong&gt;: Dividing the work into smaller, independent Lambdas enhances modularity and facilitates individual scaling for each process.&lt;br&gt;
&lt;strong&gt;Intelligent Decision Making&lt;/strong&gt;: Leveraging Lambdas for file size analysis and spot instance cost optimization automates decision-making and minimizes resource costs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Future Considerations: Continuous Improvement&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We're constantly striving to refine our system:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Error Handling and Retries&lt;/strong&gt;: Implementing robust error handling mechanisms with retries for failed processing attempts will further enhance system reliability.&lt;br&gt;
&lt;strong&gt;Monitoring and Logging&lt;/strong&gt;: Granular monitoring and logging across all components will provide valuable insights for troubleshooting and performance optimization.&lt;br&gt;
&lt;strong&gt;Testing and Scalability&lt;/strong&gt;: Regular stress testing under high loads ensures the system scales effectively and maintains performance during peak workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion: A Winning Transformation&lt;/strong&gt;&lt;br&gt;
The migration of our video transcoding system from EC2 on-demand to a serverless architecture with spot instances proved to be a resounding success. We achieved significant cost savings, improved processing speed and reliability, and gained a highly scalable solution. This case study demonstrates the potential of serverless architectures and spot instances for optimizing cloud resource utilization and managing costs effectively.&lt;/p&gt;
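&lt;p&gt;For readers curious about the error-handling-with-retries idea mentioned under Future Considerations, a minimal retry wrapper could look like this. The attempt count and delays are assumed values for illustration.&lt;/p&gt;

```javascript
// Minimal sketch of retry-with-exponential-backoff around a processing
// step; attempt count and base delay are illustrative assumptions.
async function withRetries(fn, attempts = 3, baseDelayMs = 500) {
  let remaining = attempts;
  let delay = baseDelayMs;
  while (remaining > 0) {
    try {
      return await fn();
    } catch (err) {
      remaining -= 1;
      if (remaining === 0) throw err; // out of attempts, surface the error
      await new Promise((resolve) => setTimeout(resolve, delay));
      delay *= 2; // back off between attempts
    }
  }
}
```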

&lt;p&gt;If you're considering a similar migration, feel free to leave a comment below or contact me directly for further details. We're happy to share our learnings and help you embark on your own cloud optimization journey.&lt;/p&gt;

</description>
      <category>costcontrol</category>
      <category>performanceboost</category>
      <category>serverless</category>
    </item>
    <item>
      <title>Refactored a Lambda Heap to use Layers</title>
      <dc:creator>Jiju Thomas Mathew</dc:creator>
      <pubDate>Sat, 10 Jul 2021 05:54:10 +0000</pubDate>
      <link>https://dev.to/jthoma/refactored-a-lambda-heap-to-use-layers-og2</link>
      <guid>https://dev.to/jthoma/refactored-a-lambda-heap-to-use-layers-og2</guid>
      <description>&lt;p&gt;Till recently, in fact, till last week, was not too worried about writing all code into single code folder, and mapping multiple &lt;a href="https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/sam-resource-function.html"&gt;AWS::Serverless::Function&lt;/a&gt; into individual named handlers. Till I stumbled on this &lt;a href="https://medium.com/disney-streaming/a-betmanage-node-js-aws-lambda-b17df4ceb1b2"&gt;article&lt;/a&gt;, where I started wondering how my folder structure and sam templates were going into the stack. A detailed inspection was not required, though this was the time when I used the GUI ( after a long time ). But the outcome showed how pathetic the condition was.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--SmJcGSJ8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dfo2xo8ds35dr0a06cys.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SmJcGSJ8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dfo2xo8ds35dr0a06cys.png" alt="Filtered view of mess"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It was clear that the whole mess was being uploaded into every function's code. What does this mean? Any small change anywhere would update all the functions: the last-modified timestamp is identical across them, and every function carries the node_modules folder along with other artifacts like templates and custom modules.&lt;/p&gt;

&lt;p&gt;So today I started the optimization, armed with the credits earned from the &lt;a href="https://aws.amazon.com/developer/community/community-builders/"&gt;AWS Community Builder&lt;/a&gt; Program. I duplicated the codebase to another folder and began the refactoring. I might have copied the template code from &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/nodejs-tracing.html"&gt;here&lt;/a&gt;, which explains why my layer carries a different description in the screenshot. It took me about three or four layer versions to arrive at the correct structure.&lt;/p&gt;

&lt;p&gt;I started with lib/, then added nodejs/ beneath that, and finally moved node_modules into this folder. The structure is simple: the layer should contain the runtime folder, inside which the modules live. I was then unsure how to make my custom modules, which are not available on any registry, importable by the handlers using require. I figured that creating a folder inside node_modules and putting all my custom modules there should work. It did, though some relative paths started breaking; these were finally resolved by requiring /opt/nodejs/node_modules/custom-module/ instead of ./ &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7mWicf5q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2wa4r5s9ig14ze3b14jg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7mWicf5q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2wa4r5s9ig14ze3b14jg.png" alt="Final filtered view"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--osKiAN15--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3g2dz46w3bt7osnzav0q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--osKiAN15--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3g2dz46w3bt7osnzav0q.png" alt="Layer Download"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The size difference is one big advantage: when we deploy a minor change in any handler, instead of packing the whole whopping 500 KB, only that single handler is zipped and uploaded. A change in one handler also no longer affects the container version of the other functions, a worthwhile tradeoff for heavily active applications that benefit from reuse of warm containers.&lt;/p&gt;

&lt;p&gt;I am now considering adding one more layer for our custom modules, since these evolve and may need slight modifications during further development. Currently, because they sit alongside the main layer, any change triggers an upload of the whole node_modules folder.&lt;/p&gt;

&lt;p&gt;After a round of smoke testing, the codebase was migrated back into the production system and deployed there. Once the deploy completed, a full round of functional tests was run.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.jijutm.com/aws/refactored-a-lambda-heap-to-use-layers/index.html"&gt;Read the article at jijutm.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>serverless</category>
      <category>node</category>
    </item>
  </channel>
</rss>
