Discussion on: Upload file to S3 -> Launch EC2 instance

Thanks for the good article. I'm curious what the advantages are of using an EC2 instance instead of running the data processing in a Lambda function, i.e. completely serverless?

Ambar • Dec 13 '19 • Edited

Some of the main factors that can help you decide between the two:

how long your processing job is going to take
what kind of resources it needs
how frequently such jobs run

Lambda has a hard limit of 15 min execution time. If your job will ever need to run longer than that (which is often the case in "big data" scenarios), it automatically eliminates Lambda from your choices.

If your job needs more than 3 GB of memory (Lambda's cap at the time of writing) or heavy compute power, e.g. cluster compute, then again, Lambda won't cut it for you.

If your jobs need to run very frequently (say hundreds of times per day), or need persistent up-time, Lambda might become very expensive, despite the generous free tier.
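The three checks above can be sketched as a small helper. This is only an illustration: the `Job` fields, the function name, and the `FREQUENT_RUNS_PER_DAY` cutoff are hypothetical, and the limits are the ones quoted in this comment, so check the current AWS quotas before relying on them.

```python
from dataclasses import dataclass
from typing import List

# Limits as quoted in this comment; treat them as assumptions, not current AWS quotas.
LAMBDA_MAX_SECONDS = 15 * 60
LAMBDA_MAX_MEMORY_GB = 3.0
# Hypothetical cutoff standing in for "hundreds of times per day".
FREQUENT_RUNS_PER_DAY = 100


@dataclass
class Job:
    max_runtime_seconds: float
    peak_memory_gb: float
    runs_per_day: float
    needs_persistent_uptime: bool = False


def lambda_concerns(job: Job) -> List[str]:
    """Return the reasons (if any) why Lambda may be a poor fit for this job."""
    concerns = []
    if job.max_runtime_seconds > LAMBDA_MAX_SECONDS:
        concerns.append("runtime exceeds the 15 minute limit")
    if job.peak_memory_gb > LAMBDA_MAX_MEMORY_GB:
        concerns.append("memory exceeds the Lambda cap")
    if job.runs_per_day >= FREQUENT_RUNS_PER_DAY or job.needs_persistent_uptime:
        concerns.append("frequent or always-on usage may cost more than EC2")
    return concerns


if __name__ == "__main__":
    # A 1-hour, 8 GB job that runs 500 times a day trips all three checks.
    print(lambda_concerns(Job(max_runtime_seconds=3600, peak_memory_gb=8, runs_per_day=500)))
```

An empty list means none of the three factors rules Lambda out; it does not mean Lambda is automatically the cheaper choice.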

Lambda is much better suited to sporadic, event-driven compute tasks that don't need state and typically finish executing in a few seconds or minutes, especially ad-hoc jobs such as one-off service requests. Why keep an EC2 instance running 24/7 when you only need it to handle 1-minute executions that occur unpredictably?
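To put rough numbers on that trade-off, here is a minimal cost sketch. It assumes Lambda is billed per request plus per GB-second of execution, and EC2 per instance-hour; it ignores the free tier, storage, and data transfer. The function names are hypothetical and the prices in the usage example are placeholders, not real AWS rates, so substitute the current figures for your region.

```python
def lambda_monthly_cost(invocations, avg_seconds, memory_gb,
                        price_per_gb_second, price_per_request):
    """Approximate monthly Lambda bill: compute (GB-seconds) plus request charges."""
    compute = invocations * avg_seconds * memory_gb * price_per_gb_second
    requests = invocations * price_per_request
    return compute + requests


def ec2_monthly_cost(hourly_price, hours=730):
    """Approximate monthly bill for one instance left running all month."""
    return hourly_price * hours


if __name__ == "__main__":
    # Placeholder prices for illustration only.
    sporadic = lambda_monthly_cost(
        invocations=1_000, avg_seconds=60, memory_gb=1.0,
        price_per_gb_second=0.00001, price_per_request=0.0000002,
    )
    always_on = ec2_monthly_cost(hourly_price=0.01)
    print(f"Lambda: {sporadic:.2f}  EC2: {always_on:.2f}")
```

With the same placeholder prices, raising `invocations` far enough will eventually push the Lambda figure above the always-on instance, which is the point about very frequent jobs.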

This is why micro-services are often suited to Lambda functions: smallish workloads that start quickly (cold-start times are typically < 5 seconds) and finish quickly.

Otherwise, running your own fleet of auto-scaling containers (as opposed to Lambda's containers with their own limitations) with something like AWS Fargate, ECS, or Kubernetes (e.g. EKS) can also be a good option.