Taavi Rehemägi for Dashbird

Posted on May 13, 2021

What AWS Lambda metrics should you definitely be monitoring?

#aws #serverless #devops #sre

What are the crucial AWS Lambda metrics you should definitely be monitoring?

Your application does not need to be "huge" to have enough functions and abstraction for you to get lost in it. As a DevOps engineer, you can't cover every single factor, so surfacing the relevant facts and asking the right questions is crucial. That way, when there's a fire, you can troubleshoot in no time.

Every organization is unique, and every workload has its own utility. That said, we can still have a generalized approach and start by listing a few desirable qualities that you may want from your AWS application:

Performance
Responsiveness
Cost-effectiveness

AWS Lambda Pricing

Lambda pricing is very straightforward, and the billable factors include:

Number of requests
Compute time
Amount of memory provisioned

Compute time and provisioned memory are coupled. We'll cover this in more detail below. Let's start with the number of requests. The first one million requests are free every month. After that, you will be charged $0.20 per million requests for the remainder of that month. That's stupid cheap.

In addition to the number of requests, you also pay for the memory allocated to your function along with the computing time that the function takes to do its job. You specify the memory allocation when you deploy a function. The computing time can vary from one invocation to the next but is limited by the maximum timeout which you configured.

Suppose your function runs for two seconds (2,000 milliseconds) and has been allocated 512 MB (0.5 GB) of memory. You will then be billed for 2 s * 0.5 GB = 1 GB-second of utilization. This is called GB-seconds billing, where you pay for compute time based on the memory allocation and the time your function runs. Likewise, if your function has 1 GB of memory allocated and runs for one second (1,000 milliseconds), you pay for 1 GB-second.

The first 400,000 GB-seconds are free every month. After the free quota is reached, you pay $0.0000166667 per GB-second (not worth scratching your head over, use our cost calculator instead) for the rest of that month. On top of this, you may incur additional charges for resources like an Amazon S3 bucket, a VPC, DynamoDB, etc.
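To make the arithmetic concrete, here is a minimal sketch of the pricing model described above. It uses only the prices and free-tier quotas quoted in this article; the function names and the sample workload are made up for illustration, and other charges (S3, data transfer, and so on) are ignored.

```python
# Prices and free-tier quotas as quoted in this article.
# Check the current AWS pricing page before relying on them.
PRICE_PER_MILLION_REQUESTS = 0.20
PRICE_PER_GB_SECOND = 0.0000166667
FREE_REQUESTS = 1_000_000
FREE_GB_SECONDS = 400_000


def gb_seconds(invocations: int, avg_duration_ms: float, memory_mb: int) -> float:
    """Total GB-seconds consumed: seconds of runtime times allocated GB."""
    return invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)


def monthly_cost(invocations: int, avg_duration_ms: float, memory_mb: int) -> float:
    """Estimated monthly Lambda bill in USD after subtracting the free tier."""
    billable_requests = max(0, invocations - FREE_REQUESTS)
    used_gb_s = gb_seconds(invocations, avg_duration_ms, memory_mb)
    billable_gb_s = max(0.0, used_gb_s - FREE_GB_SECONDS)
    request_cost = billable_requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    compute_cost = billable_gb_s * PRICE_PER_GB_SECOND
    return request_cost + compute_cost


if __name__ == "__main__":
    # The article's example: one 2-second invocation at 512 MB.
    print(gb_seconds(1, 2000, 512))
    # A hypothetical workload: 5 million invocations, 200 ms each, 512 MB.
    print(round(monthly_cost(5_000_000, 200, 512), 2))
```

For the hypothetical workload, 4 million billable requests cost $0.80 and 100,000 billable GB-seconds cost about $1.67, so the estimate lands at roughly $2.47 for the month.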

Amazon's pay-for-what-you-use business model ties your bill to the success of your business rather than to a fixed estimate of your needs. If your apps are accessed more often, your organization benefits more and pays a slightly larger AWS bill. This, in turn, benefits Amazon.

Which Metrics Does Lambda Track by Default?

The Lambda service comes with seven metrics for your functions out of the box: invocations, duration, error count, throttles, async delivery failures, iterator age, and concurrent executions.

Setting them up to alert you when it's needed is a challenge we can easily solve. Let's see what these metrics mean in practice.

The invocations metric tells you how often a function was called in a specific time period.

The duration metric tells you how long your function ran in a specific time period. It is split into the slowest, fastest, and average duration.

The error count is about how often your function crashed in a specific time period. It also includes a success rate as a percentage, because a hundred errors with a success rate of 99% mean something different than a hundred errors with a success rate of 1%.
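To put numbers on that comparison, here is a small sketch that derives the success rate from the invocation and error counts. The function name and the sample figures are illustrative only.

```python
def success_rate(invocations: int, errors: int) -> float:
    """Percentage of invocations that finished without an error."""
    if invocations == 0:
        return 100.0  # nothing ran, so nothing failed
    return (invocations - errors) / invocations * 100


if __name__ == "__main__":
    # Same error count, very different health.
    print(success_rate(10_000, 100))  # a busy, mostly healthy function
    print(success_rate(101, 100))     # a function that fails almost every time
```

A hundred errors out of 10,000 invocations is a 99% success rate, while a hundred errors out of 101 invocations is a success rate of about 1%.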

The throttles metric shows how many function invocations couldn't be handled because your function was already scaled up to Lambda's concurrency limit. The default is 1,000 concurrent executions per region, but it can be increased if needed.
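Throttles are a natural first candidate for an alert. The sketch below only builds the parameters for a CloudWatch alarm on the `Throttles` metric in the `AWS/Lambda` namespace; the helper name, the alarm naming scheme, and the default threshold are assumptions made for this example.

```python
def throttle_alarm_params(function_name: str, threshold: int = 0, period_s: int = 60) -> dict:
    """Build keyword arguments for CloudWatch's put_metric_alarm call."""
    return {
        "AlarmName": f"{function_name}-throttles",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Throttles",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",            # total throttles within each period
        "Period": period_s,            # length of one period in seconds
        "EvaluationPeriods": 1,
        "Threshold": threshold,        # alarm as soon as the sum exceeds this
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no data means no throttles
    }


if __name__ == "__main__":
    # "checkout-api" is a made-up function name.
    print(throttle_alarm_params("checkout-api"))
```

With boto3 installed and credentials configured, passing the result to `boto3.client("cloudwatch").put_metric_alarm(**params)` should create the alarm. You would usually add `AlarmActions` (for example an SNS topic) so that somebody actually gets notified.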

Async delivery failures occur when your function tries to write to a destination or dead-letter queue but fails.

The iterator age metric is helpful when your function reads from a stream (e.g., Kinesis or DynamoDB Streams). Such streams are often used as a buffer so that other services aren't overwhelmed by Lambda's ability to scale really fast. If your consuming Lambda function is too slow, data stays in the stream longer and longer, and the iterator age metric rises.

The concurrent executions metric shows how many containers for your functions are running in parallel in a specific time period. If you get near the region limit of 1,000, you should either try to improve the performance of your functions or talk to AWS about increasing that limit for your account.
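Why does a faster function help with the concurrency limit? A rough rule of thumb (Little's law, not an official AWS formula) is that the number of invocations in flight equals the request rate times the average duration. The sketch below applies it; the function names and traffic figures are hypothetical.

```python
import math

REGION_LIMIT = 1_000  # default concurrency limit mentioned in the article


def estimated_concurrency(requests_per_second: float, avg_duration_ms: float) -> int:
    """Little's law: requests in flight = arrival rate x time per request."""
    return math.ceil(requests_per_second * avg_duration_ms / 1000)


def headroom(requests_per_second: float, avg_duration_ms: float,
             limit: int = REGION_LIMIT) -> int:
    """Concurrent executions left before throttling starts (negative if over)."""
    return limit - estimated_concurrency(requests_per_second, avg_duration_ms)


if __name__ == "__main__":
    # Hypothetical traffic: 500 requests per second.
    print(estimated_concurrency(500, 800))  # slow function
    print(estimated_concurrency(500, 400))  # same traffic, duration halved
    print(headroom(1500, 800))              # a burst that exceeds the limit
```

At 500 requests per second, cutting the average duration from 800 ms to 400 ms drops the estimate from 400 to 200 concurrent executions. Real traffic is burstier than an average, so treat the result as a lower bound.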

Enter Lambda Cold Starts

Each Lambda function runs inside its own isolated, container-like execution environment. When it's invoked for the first time, the Lambda service first spins up a new container and then executes the function inside it. This increases latency and may make your application seem slow to the user initially. After this initial latency, the function is kept loaded for a period of time. During that period, new invocations don't suffer similar latencies and feel much more responsive to the client.

Think of it as accessing files within a high-speed cache; when the files are not used for long periods of time, they're flushed out of the cache. When you request those files again, it takes longer to open them.

This process is called a cold start, and it can be mitigated in various ways.

Dashbird introduced cold start monitoring to show you which invocations were cold starts.
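Independently of any monitoring tool, a function can also flag its own cold starts. The sketch below relies on the fact that module-level code runs once per container, while the handler runs on every invocation. It is a common pattern, not a description of how Dashbird detects cold starts, and the names and log format are made up.

```python
import time

# Module-level code runs once per execution environment, i.e. on a cold start.
_is_cold_start = True
_loaded_at = time.time()


def handler(event, context):
    """Hypothetical handler that reports whether this invocation was cold."""
    global _is_cold_start
    was_cold = _is_cold_start
    _is_cold_start = False  # every later call in this container is warm

    # Anything printed ends up in the function's CloudWatch log stream.
    age_s = time.time() - _loaded_at
    print(f"cold_start={was_cold} environment_age_s={age_s:.1f}")
    return {"cold_start": was_cold}
```

The first invocation in a fresh container returns `cold_start: True`; every following invocation in the same container returns `False` until the container is recycled.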

The period for which the function stays warm is speculated to be between 5 and 30 minutes, but Amazon has not confirmed anything. If the function relies on an elastic network interface (ENI) to talk to other services, for example inside a VPC, that adds another layer of latency.

There's one additional complication, and that's the concurrency issue. If you receive a burst of traffic, AWS scales the function by spinning up more containers to handle all the new requests. This causes a whole different sequence of cold starts, which has nothing to do with resources being left idle for too long.

Optimizing the right resource

Okay, so we have listed quite a few problems. Now let's solve a few of them. We hope that combining the following strategies can help you achieve a desirable balance between responsive applications and lower bills. After all, if your system becomes sluggish, nobody will use it even if it is cheap.

1. Increasing Memory Allocation

Increasing the memory allocation is accompanied by an implicit increase in CPU allocation. This may very well result in faster execution of your function. Reducing the actual execution time to offset the latency caused by cold starts can directly improve the user experience. Moreover, this is the easiest hypothesis to test when latencies are high, which makes it the first line of defense for your DevOps team.

AWS allocates more CPU power if you allocate more memory to your Lambda function.
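Whether more memory pays off depends on how much faster the function actually gets, because you are billed for memory times duration. Here is a rough sketch of that trade-off, using the GB-second price quoted earlier in this article. The measurements are invented; you would plug in durations observed for your own function.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # figure quoted in this article


def cost_per_million(memory_mb: int, duration_ms: float) -> float:
    """Compute cost in USD of one million invocations.

    Ignores the free tier and the per-request fee.
    """
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * 1_000_000 * PRICE_PER_GB_SECOND


if __name__ == "__main__":
    # Hypothetical measurements: memory size in MB -> average duration in ms.
    measurements = {512: 800, 1024: 400, 2048: 350}
    for memory_mb, duration_ms in measurements.items():
        cost = cost_per_million(memory_mb, duration_ms)
        print(f"{memory_mb} MB, {duration_ms} ms: ${cost:.2f} per million")
```

In this made-up data, going from 512 MB to 1,024 MB halves the duration, so users get a faster response for the same compute cost. Going to 2,048 MB barely improves the duration and costs noticeably more.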

2. Keeping Functions Warm

Another way to tackle the latency issue is to buy provisioned concurrency for your Lambda functions. Provisioned concurrency asks the Lambda service to keep a specified number of containers with your function warm, so they can respond to events quicker even if they haven't been called for a long period of time. When the real-world workload comes in, hundreds of tiny little Lambda instances are prepared for the mighty battle.

Dashbird can help you see patterns hiding in users' behavior. Use it to time your warm-up calls. Making numerous warm-up requests is often cheaper than increasing memory.
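If you go the warm-up call route, the function should recognize those calls and return right away so that they stay cheap. Below is a minimal sketch, assuming a scheduled rule (for example in Amazon EventBridge) invokes the function with a payload containing a `warmup` key. That key and both function names are conventions invented for this example, not something Lambda provides.

```python
def handler(event, context):
    """Hypothetical handler that short-circuits on scheduled warm-up pings."""
    # "warmup" is an assumed key that the scheduled rule puts in its payload;
    # it is not a field that Lambda sets for you.
    if isinstance(event, dict) and event.get("warmup"):
        return {"warmed": True}  # return at once to keep the billed time minimal

    return do_real_work(event)


def do_real_work(event):
    """Stand-in for the function's actual business logic."""
    return {"warmed": False, "echo": event}
```

Note that a single ping keeps only one container warm. Keeping several warm this way takes concurrent pings, which is the point where provisioned concurrency tends to be the simpler option.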

3. Optimize your code

Possibly the cruelest option would be to ask your developers to optimize their code. This doesn't necessarily involve making changes in the actual logic or functionality, but one can often trim down the dependencies and make the function a bit more lightweight.

It's also a good idea to keep as much work as possible outside your function's handler, for example setting up API clients or calculating fixed values from environment variables. This way, the work only slows down the cold start, not the subsequent calls to your function.
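As a minimal sketch of this idea, everything at module level below runs once per container, and the handler only does per-invocation work. The environment variable names, their defaults, and the rate-limit logic are hypothetical.

```python
import json
import os

# Runs once per execution environment, during the cold start.
# TABLE_NAME and RATE_LIMITS are hypothetical environment variables.
TABLE_NAME = os.environ.get("TABLE_NAME", "orders")
RATE_LIMITS = json.loads(os.environ.get("RATE_LIMITS", '{"default": 10}'))
MAX_PER_MINUTE = RATE_LIMITS["default"] * 60  # fixed value, computed once

# An API client would be created here as well, for example
# boto3.resource("dynamodb").Table(TABLE_NAME) if boto3 is available.


def handler(event, context):
    """Per-invocation work only; everything above is reused while warm."""
    requested = event.get("count", 0)
    return {"table": TABLE_NAME, "allowed": requested <= MAX_PER_MINUTE}
```

Had the JSON parsing and the client setup lived inside `handler`, every single invocation would pay for them again.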

If possible, you should stick to languages like Node.js and Python since they have significantly lower cold start times than their C# or Java counterparts.

You can also try out custom runtimes. If you have very computation-heavy tasks, using a Rust runtime and implementing your function in Rust can save you quite a bit of money. Since Lambda functions are billed per millisecond, functions with a high invocation frequency can profit tremendously from such a refactor.

Wrapping up

To apply any of the strategies above, you need to know the specific pathology of your misbehaving and cost-inefficient function. From there, it's a matter of asking the right questions, making an educated guess about the solution, and then optimizing your Lambda functions.

Dashbird helps you every step of the way, from keeping track of subtleties like cold starts to knowing whether or not a new solution has made any difference. Of course, there are more parameters to consider, like concurrency and synchrony. They would need a much deeper dive. More about that in one of my next articles!

Stay tuned, and sign up to monitor everything we have gone over in this article.

Further reading:

Quick ways to cut cost on your AWS Lambda

Dashbird's Serverless Events Library: Lambda error messages and debugging

Navigating CloudWatch logs with Dashbird

Using Observability to Scale AWS Lambda [Webinar]