
Right Scaling AWS Lambda

AWS Lambda is the compute service that popularised the serverless movement. Lambda allows you to run your application at scale without managing servers.

So how does your Lambda function scale? Lambda scales out automatically in response to incoming requests, and it scales down to zero when the traffic stops. Lambda manages the infrastructure and the scaling for you, but it also gives you a few controls over scaling, which in turn help you balance cost and performance.

Before we look at how the scaling controls work, let's understand two concepts first - Execution Environment and Concurrency.

Behind the scenes, your Lambda functions run inside an isolated runtime environment which is called an Execution Environment. When your function is invoked, Lambda spins up an instance of the execution environment to run your function code.

An Execution Environment is a secure and isolated container that runs your function code.

What happens when requests arrive faster than your function can handle them, so that it is invoked again before an earlier invocation has finished? In that case, Lambda creates additional execution environments, which run the function concurrently.

Thus Lambda Concurrency is the number of execution environments of your function that are active and serving requests at any given time.
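
A handy back-of-the-envelope estimate: concurrency ≈ request rate × average function duration. A quick sketch with hypothetical traffic numbers:

```python
# Rough estimate of Lambda concurrency (hypothetical traffic numbers):
# concurrency ≈ requests per second × average duration in seconds.
requests_per_second = 100
average_duration_seconds = 0.5

estimated_concurrency = requests_per_second * average_duration_seconds
print(f"Estimated concurrent execution environments: {estimated_concurrency:.0f}")
# -> Estimated concurrent execution environments: 50
```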

(Figure: execution environments)

So how far can your function scale concurrently? There is a limit to how many execution environments can be provisioned concurrently: Lambda has a soft limit of 1,000 concurrent execution environments per Region. A function continues to scale until the Region's concurrency limit is reached; after that, all additional requests fail with a throttling error.

Note that this common pool (the quota) is shared among all the functions running in the Region, so your function may be throttled when other functions consume the pool. How do we mitigate this throttling? By using the Reserved Concurrency control.
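
To check how much of the shared pool your account has and is using in a Region, you can query the Lambda account settings. A minimal boto3 sketch (assuming configured AWS credentials and region):

```python
import boto3

lambda_client = boto3.client("lambda")

# Account-level concurrency quota and current usage for this Region.
settings = lambda_client.get_account_settings()
print("Concurrency limit:", settings["AccountLimit"]["ConcurrentExecutions"])
print("Unreserved concurrency:", settings["AccountLimit"]["UnreservedConcurrentExecutions"])
```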

You can reserve execution environment capacity from the common pool for a function using Reserved Concurrency. Reserved Concurrency guarantees that the function can always scale up to the reserved number of concurrent execution environments - and also caps it at that number.

Use Reserved concurrency for your business-critical functions to ensure guaranteed scaling.

For example, if you set Reserved Concurrency to 4, your function is guaranteed to scale to 4 concurrent instances (beyond that, it is throttled). There are other reasons to use Reserved Concurrency beyond guaranteed scaling: you can use it to control the cost of Lambda, to protect a backend resource from being overwhelmed by your function's scale, or as a kill switch (set the reservation to zero and all invocations of the function stop).
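
Setting Reserved Concurrency is a one-liner. A minimal boto3 sketch, using a hypothetical function name:

```python
import boto3

lambda_client = boto3.client("lambda")

# Reserve 4 concurrent execution environments for a (hypothetical) function.
lambda_client.put_function_concurrency(
    FunctionName="my-function",
    ReservedConcurrentExecutions=4,
)

# Kill switch: setting the reservation to 0 throttles every invocation.
# lambda_client.put_function_concurrency(
#     FunctionName="my-function",
#     ReservedConcurrentExecutions=0,
# )
```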

(Figure: reserved concurrency)

But... how fast can the function scale? To understand this we need to know what happens behind the scenes when a Lambda function is invoked.

(Figure: function lifecycle)

When the Lambda service receives a request to run a function, it does a few things -

  • downloads function code
  • creates a new execution environment
  • runs initialization code
  • runs handler code

So before your function handler runs, the other steps must complete. The added latency is referred to as a Cold Start. Cold starts are undesirable because, of course, you want your function to scale fast.
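
To see where these phases live in code, here is a minimal (hypothetical) Python function: module-level code runs once, during the cold start of each new execution environment, while the handler runs on every invocation.

```python
import os
import boto3

# Initialization code: runs once per cold start. Do expensive setup
# (SDK clients, connections, config) here so warm invocations reuse it.
s3 = boto3.client("s3")
BUCKET = os.environ.get("BUCKET_NAME", "my-example-bucket")  # hypothetical

def handler(event, context):
    # Handler code: runs on every invocation, warm or cold.
    response = s3.list_objects_v2(Bucket=BUCKET, MaxKeys=1)
    return {"statusCode": 200, "keyCount": response.get("KeyCount", 0)}
```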

(Figure: cold start)

So how do we reduce cold starts? There are a few ways to mitigate the impact of cold starts, like Execution environment reuse, Function warmers and Provisioned concurrency. Other factors like language runtime, memory size and optimised code also help improve startup latency.

Execution environment reuse: What happens after a function execution completes? Instead of destroying the execution environment immediately, the Lambda service retains it for a non-deterministic period. If another request for the same function arrives during this time, the service may reuse the environment, so the second request typically completes faster. However, you should not depend on execution environment reuse to reduce cold starts, for several reasons: when your function scales up due to traffic, or when you update your function code or configuration, the next invocation results in a new execution environment. Also, AWS runs Lambda in multiple Availability Zones (AZs) for high availability, so a function can be invoked in a different AZ, again resulting in a new execution environment.

(Figure: execution environment reuse)

Function warmers: A simple hack is to ping the function on a schedule to keep it warm, like this open source lambda warmer. However, this again is not a guaranteed way to reduce cold starts; for example, it does not help if the Lambda service runs your function in another AZ.
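
The idea is to invoke the function on a schedule (for example, an EventBridge rule every few minutes) and short-circuit those warming invocations in the handler. A hypothetical sketch:

```python
def handler(event, context):
    # Short-circuit scheduled warming pings. The "warmer" key is a
    # convention of this sketch, not a Lambda feature.
    if isinstance(event, dict) and event.get("warmer"):
        return {"warmed": True}

    # ... normal request handling below ...
    return {"statusCode": 200}
```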

Provisioned Concurrency: The recommended solution to reduce cold starts is to use Provisioned Concurrency. The Provisioned Concurrency feature prepares execution environments in advance, keeping functions initialised and ready to respond.

Use Provisioned Concurrency to solve the cold start issue for your latency-sensitive application.

For example, if you set Provisioned Concurrency to 4, four execution environments are initialised ahead of time, so your function scales to 4 instances without cold-start latency fluctuations.
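
A minimal boto3 sketch; the function name and alias are hypothetical, and note that Provisioned Concurrency applies to a published version or alias, not to $LATEST:

```python
import boto3

lambda_client = boto3.client("lambda")

# Keep 4 execution environments initialised for a (hypothetical)
# alias "live" of the function.
lambda_client.put_provisioned_concurrency_config(
    FunctionName="my-function",
    Qualifier="live",
    ProvisionedConcurrentExecutions=4,
)
```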

(Figure: provisioned concurrency)

Note that Provisioned Concurrency incurs charges: you pay for each warm environment. So is there a way to control the cost of Provisioned Concurrency? Yes, you can use Application Auto Scaling with Provisioned Concurrency. Application Auto Scaling adds or removes warm environments only when needed, for example when traffic ramps up or during peak usage.

Combine Application Auto Scaling with Provisioned Concurrency to control the cost of Provisioned Concurrency.
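
A sketch of a target-tracking policy for Provisioned Concurrency with boto3, reusing the hypothetical function and alias from above:

```python
import boto3

autoscaling = boto3.client("application-autoscaling")
resource_id = "function:my-function:live"  # hypothetical function and alias

# Let Application Auto Scaling vary Provisioned Concurrency between 1 and 10.
autoscaling.register_scalable_target(
    ServiceNamespace="lambda",
    ResourceId=resource_id,
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    MinCapacity=1,
    MaxCapacity=10,
)

# Track roughly 70% utilisation of the provisioned environments.
autoscaling.put_scaling_policy(
    PolicyName="provisioned-concurrency-tracking",
    ServiceNamespace="lambda",
    ResourceId=resource_id,
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 0.7,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
        },
    },
)
```

With a policy like this, Application Auto Scaling adds provisioned environments as traffic grows and removes them as traffic falls, so you pay for warm capacity only when it is likely to be used.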

Closing

I hope this post helps you understand how AWS Lambda scales and how concurrency controls can be applied to optimise cost and performance.

Thank you for reading!
