How to Optimize Lambda Memory and CPU

#awslambda #lambda #serverless #optimization

Using AWS Lambda dramatically changes how we think about computing resources.

On EC2, we rent servers by the hour, choosing among machine types that each offer a different combination of resources. The Lambda model is a bit different, and this article shows how to benefit from some interesting optimization possibilities it opens up.

The Lambda Model

Lambda's resource allocation model is dead simple: choose how much memory your function will need and boom, you’re done. At the time of writing, Lambda provides options ranging from 128 MB to 3,008 MB.

One advantage is that you don’t have to account for memory used by the OS or anything else other than your function and the runtime you need (Java Virtual Machine, Python interpreter, etc.).

There are two important caveats to this model, though, that many developers do not pay close attention to.

CPU allocation

Lambda will allocate CPU power linearly in proportion to the amount of memory configured. This will have relevant implications for many optimization strategies, so keep this in mind.

It is known that at 1,792 MB we get 1 full vCPU. Did you notice the “v” in front of “CPU”? The definition of a vCPU is “a thread of either an Intel Xeon core or an AMD EPYC core”. This is valid for the compute-optimized instance types, which are the underlying Lambda infrastructure (not a hard commitment by AWS, but a general rule).

Thus, if you assign 1,024 MB to a function, you will get roughly 57% of a vCPU (1,024 / 1,792 ≈ 0.57). Of course, it’s not possible to “divide” a CPU thread, so what AWS is doing in the background is dividing the CPU’s time. At 1,024 MB, your function will receive 57% of the processing time. The CPU may switch to perform other tasks during the remaining 43% of the time.
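As a quick sanity check, the proportional rule above can be written as a one-line calculation. This is only a sketch: cpu_share is a hypothetical helper, and 1,792 MB is the full-vCPU figure quoted in this article, not a value read from any AWS API.

```python
# Memory setting assumed (per this article) to yield one full vCPU.
FULL_VCPU_MB = 1792

def cpu_share(memory_mb, full_vcpu_mb=FULL_VCPU_MB):
    """Approximate number of vCPUs for a memory setting, assuming linear scaling."""
    return memory_mb / full_vcpu_mb

print(f"{cpu_share(1024):.0%}")  # 1,024 MB -> 57%
print(f"{cpu_share(128):.0%}")   # 128 MB -> 7%
```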

Obviously, the result of this CPU allocation model is: the more memory you allocate to your function, the faster it will accomplish a given CPU-bound task.

Multi-core (kind of)

Provided that 1,792 MB gets 1 full vCPU, then how does AWS keep increasing CPU power as we move along the configuration scale, up until 3,008 MB? Obviously, it has to increase the number of vCPUs.

The catch here is that, for single-threaded programs, you will see no speed gains from increasing memory above this threshold. The only way to reap the benefits from more CPU power above 1,792 MB is by writing your code to run in two threads simultaneously. This is obviously not possible in all cases, so be sure to use this wisely.

In our tests, we observed this threshold at 2,048 MB, not 1,792 MB. We kept increasing memory for a single-threaded CPU-intensive task, and we observed speed gains above 1,792 MB, up until 2,048 MB, where memory increases did not translate to increased speed anymore. This leads us to believe that 2,048 MB is the actual threshold that gets 1 full vCPU.
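For Python functions specifically, CPU-bound work generally needs separate processes rather than threads to occupy two vCPUs, because of CPython's global interpreter lock. The sketch below splits a toy computation across two worker processes. The function parallel_sum_squares and its workload are hypothetical stand-ins for real work. It uses Process and Pipe because multiprocessing.Pool and Queue reportedly depend on shared memory that the Lambda environment does not provide; treat that as an assumption to verify for your runtime.

```python
import multiprocessing as mp

# Lambda runs on Linux, where the "fork" start method is available.
_ctx = mp.get_context("fork")

def _sum_squares(start, stop, conn):
    """Worker: CPU-bound loop over [start, stop); result is sent through the pipe."""
    conn.send(sum(i * i for i in range(start, stop)))
    conn.close()

def parallel_sum_squares(n, workers=2):
    """Split range(n) into `workers` chunks and sum the squares in parallel."""
    step = -(-n // workers)  # ceiling division
    jobs = []
    for w in range(workers):
        # duplex=False returns (receive-only end, send-only end)
        parent_conn, child_conn = _ctx.Pipe(duplex=False)
        start, stop = w * step, min((w + 1) * step, n)
        proc = _ctx.Process(target=_sum_squares, args=(start, stop, child_conn))
        proc.start()
        child_conn.close()  # the parent only reads
        jobs.append((proc, parent_conn))
    total = 0
    for proc, parent_conn in jobs:
        total += parent_conn.recv()
        proc.join()
    return total
```

With two workers, a gain should only show up at memory settings above the single-vCPU threshold; below it, both processes share the same slice of CPU time.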

Time-sensitive workloads

In some cases, we will want a Lambda function to respond as quickly as possible. Say there’s a real person waiting for the response and we don’t want to see them growing impatient at our application.

In those cases, the best we can do is allocate more memory to our Lambda function, but not more than 1,792 MB if it’s a single-threaded program. Whenever possible, we can parallelize the execution and take advantage of the dual-vCPU setup with higher memory settings.

You can use CloudWatch metrics to compare the average duration before and after the memory tweak and see how much your function benefited from the increase.
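One way to make that comparison, sketched with boto3: fetch the Duration metric from the AWS/Lambda namespace for a window before and after the change. The helper names and the one-hour period are assumptions chosen for illustration, and the result is an unweighted mean of hourly averages, which is only a rough figure.

```python
def duration_query(function_name, start, end):
    """Parameters for CloudWatch get_metric_statistics on the Lambda Duration metric."""
    return {
        "Namespace": "AWS/Lambda",
        "MetricName": "Duration",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "StartTime": start,   # datetime objects
        "EndTime": end,
        "Period": 3600,       # one datapoint per hour (assumed granularity)
        "Statistics": ["Average"],
    }

def average_duration_ms(function_name, start, end):
    """Unweighted mean of the hourly averages, or None if there is no data."""
    import boto3  # imported lazily so the pure helpers work without the SDK

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(
        **duration_query(function_name, start, end)
    )
    points = response["Datapoints"]
    if not points:
        return None
    return sum(p["Average"] for p in points) / len(points)

def percent_faster(before_ms, after_ms):
    """Relative reduction in duration, as a percentage of the original."""
    return (before_ms - after_ms) / before_ms * 100

print(percent_faster(200, 150))  # 25.0
```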

Memory-bound

For workloads that are not time-sensitive, we should allocate only as much memory as the function requires, because the more memory we assign, the more we pay per 100 milliseconds.

Here there's another caveat: since we pay for execution time and more memory makes our program faster, we can actually save money by increasing memory - even if we don't really need it. Check this article I wrote recently with more details and a benchmark of which memory settings optimize Lambda costs.
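The trade-off can be illustrated with a little arithmetic. The sketch below assumes the 100 ms billing step mentioned above and a price of $0.0000166667 per GB-second. Check the pricing page for your region before relying on it. The per-request charge is ignored, and the durations are hypothetical measurements, not benchmark results.

```python
import math

# Assumed price per GB-second; verify against current Lambda pricing.
PRICE_PER_GB_SECOND = 0.0000166667

def billed_gb_seconds(memory_mb, duration_ms, granularity_ms=100):
    """GB-seconds billed for one invocation, rounding duration up to the billing step."""
    billed_ms = math.ceil(duration_ms / granularity_ms) * granularity_ms
    return (memory_mb / 1024) * (billed_ms / 1000)

def invocation_cost(memory_mb, duration_ms):
    """Compute cost in USD for one invocation (duration charge only)."""
    return billed_gb_seconds(memory_mb, duration_ms) * PRICE_PER_GB_SECOND

# Hypothetical case: doubling memory cuts the duration by more than half.
slow = invocation_cost(512, 820)    # billed as 900 ms -> 0.45 GB-seconds
fast = invocation_cost(1024, 390)   # billed as 400 ms -> 0.40 GB-seconds
print(fast < slow)  # True
```

Whether the larger setting wins depends entirely on how much the duration actually drops, which is why measuring matters.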

Now, the question we need to answer is: for each of our Lambda functions, how much memory is required? Although it should be simple to answer, the standard CloudWatch metrics don't help us here, since they do not report memory usage.

We have two options to find the answer, which are outlined below. The first is cumbersome and tedious, but there’s a second one that is much easier.

Extracting Memory Usage from CloudWatch Logs

CloudWatch Logs records the data we need, but not in a way that is easy to consume.

At the end of each Lambda invocation log, you will find a REPORT line that summarizes the invocation. Notice two data points there:

Memory Size: 512 MB
Max Memory Used: 50 MB

By reading this line from all invocations in all of our Lambda logs, we would be able to compile statistics to answer the question above. We could use a regular expression, for example, to extract the values we need. Start by allocating the maximum memory possible, run some sample tests or run the function in production for some time, check the statistics, and then adjust the memory allocation according to how much the function really needs at peak usage.
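A minimal sketch of that extraction in Python follows. The sample line is illustrative only: its request ID and durations are made up, and only the two memory fields match the values shown above. parse_report is a hypothetical helper name.

```python
import re

# Matches the two memory fields of a Lambda REPORT log line.
REPORT_PATTERN = re.compile(
    r"Memory Size: (?P<size>\d+) MB\s+Max Memory Used: (?P<used>\d+) MB"
)

def parse_report(line):
    """Return (memory_size_mb, max_memory_used_mb), or None if the fields are absent."""
    match = REPORT_PATTERN.search(line)
    if match is None:
        return None
    return int(match["size"]), int(match["used"])

# Illustrative line; the request ID and durations are placeholders.
sample = (
    "REPORT RequestId: 00000000-0000-0000-0000-000000000000\t"
    "Duration: 102.25 ms\tBilled Duration: 200 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 50 MB"
)
print(parse_report(sample))  # (512, 50)
```

Taking the maximum of the second value across many invocations gives the peak usage to size against.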

The downside of this solution is that it’s cumbersome. It takes time to implement, and we would need to maintain it as a custom monitoring service that is readily available whenever we create a new Lambda function.

EDIT: as suggested by Rehan van der Merwe in the comments, you could also use CloudWatch Logs Insights to run aggregate functions over the logs and compile a performance time series to support your analysis. It requires a bit of work, but in theory it should do the job.
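Here is a rough sketch of what such a query might look like when started from Python. It is adapted from the AWS sample query for overprovisioned memory and assumes that Logs Insights exposes the @memorySize and @maxMemoryUsed fields in bytes for Lambda log groups. The helper names, the one-hour bins and the 24-hour window are my own choices.

```python
import time

# Assumption: @memorySize and @maxMemoryUsed are reported in bytes.
MEMORY_QUERY = (
    'filter @type = "REPORT" '
    "| stats max(@memorySize / 1000 / 1000) as provisionedMB, "
    "avg(@maxMemoryUsed / 1000 / 1000) as avgUsedMB, "
    "max(@maxMemoryUsed / 1000 / 1000) as maxUsedMB "
    "by bin(1h)"
)

def build_query_params(function_name, hours=24, now=None):
    """Parameters for logs.start_query over the function's default log group."""
    now = int(time.time()) if now is None else int(now)
    return {
        "logGroupName": f"/aws/lambda/{function_name}",
        "startTime": now - hours * 3600,  # epoch seconds
        "endTime": now,
        "queryString": MEMORY_QUERY,
    }

def start_memory_query(function_name, hours=24):
    """Start the query and return its ID; poll get_query_results for the rows."""
    import boto3  # imported lazily so the helpers above work without the SDK

    logs = boto3.client("logs")
    response = logs.start_query(**build_query_params(function_name, hours))
    return response["queryId"]
```

The query runs asynchronously, so the rows have to be fetched afterwards with get_query_results once its status is Complete.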

Tracking Memory with Third-party Services

Serverless has reached a reasonably mature state by now, and there are several third-party services that fill the gaps left by the AWS Lambda and CloudWatch offerings.

Dashbird, for example, is a monitoring and debugging platform built from the ground up for AWS Lambda. One of the key measures it provides is memory usage (average, minimum, maximum and 99th percentile). With these data points, it is easy to set the ideal memory size for each function. The service is free up to 1 million invocations per month, so you may be able to benefit from it at no additional cost.

Wrapping up

We covered the AWS Lambda resource allocation model and how it differs from traditional server models. The Lambda model has implications for how we should optimize our functions in CPU-bound and memory-bound workloads, and there are some caveats we need to be aware of in any optimization strategy. By combining CloudWatch Metrics, Logs and third-party services, we have all the data we need to find the right resource allocation setting for any Lambda function.

If you want to read more about AWS Lambda best practices, I recommend this free e-book, which covers everything from microservice approaches to debugging techniques.