<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: AJ Stuyvenberg</title>
    <description>The latest articles on DEV Community by AJ Stuyvenberg (@astuyve).</description>
    <link>https://dev.to/astuyve</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F95513%2Ffa4a2b59-c2d5-484b-85a9-e97634fdd4cf.jpg</url>
      <title>DEV Community: AJ Stuyvenberg</title>
      <link>https://dev.to/astuyve</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/astuyve"/>
    <language>en</language>
    <item>
      <title>Ultimate guide to Secrets in Lambda</title>
      <dc:creator>AJ Stuyvenberg</dc:creator>
      <pubDate>Sat, 30 Mar 2024 14:19:11 +0000</pubDate>
      <link>https://dev.to/aws-heroes/ultimate-guide-to-secrets-in-lambda-2db7</link>
      <guid>https://dev.to/aws-heroes/ultimate-guide-to-secrets-in-lambda-2db7</guid>
      <description>&lt;p&gt;We all have secrets. Some are small secrets which we barely hide (sometimes I roll through stop signs on my bike). Others are so sensitive that we don't even want to think about them (&lt;em&gt;serverless actually has servers&lt;/em&gt;).&lt;/p&gt;

&lt;p&gt;Managing and securing secrets in your applications spans a similar range of sensitivity! Handling a random third-party API key is different from handling the root signing key for an operating system, or nuclear launch codes.&lt;/p&gt;

&lt;p&gt;This work is a fundamental requirement for any production-quality software system. Unfortunately, AWS doesn't make it easy to select a secrets management tool within its ecosystem. For serverless developers, this is even more difficult! Lambda is simply one service in a constellation of supporting services you can use to control application secrets. This guide lays out the most common ways to store and manage secrets for Lambda, the performance impact of each option, and a framework for considering your specific use cases.&lt;/p&gt;

&lt;h2&gt;Quick best practices primer&lt;/h2&gt;

&lt;p&gt;Plaintext secrets should &lt;strong&gt;NEVER&lt;/strong&gt; be hardcoded in your application code or source control. Typically you want to follow the &lt;code&gt;principle of least privilege&lt;/code&gt; and limit the access of any runtime secret to only the runtime environment (Lambda, in this case).&lt;/p&gt;

&lt;p&gt;This means passing &lt;em&gt;references&lt;/em&gt; or &lt;em&gt;encrypted&lt;/em&gt; data to configuration files or infrastructure-as-code tools whenever possible. It also means that decrypting or fetching secrets from a secure storage system at runtime is the most secure option. This post is geared toward deploying your Lambda applications along this dimension.&lt;/p&gt;

&lt;h2&gt;Lambda Secret Options&lt;/h2&gt;

&lt;p&gt;Within Lambda, there are four major options for storing configuration parameters and secrets. They are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Lambda Environment Variables&lt;/li&gt;
&lt;li&gt;AWS Systems Manager Parameter Store (Formerly known as Simple Systems Manager, or SSM)&lt;/li&gt;
&lt;li&gt;AWS Secrets Manager&lt;/li&gt;
&lt;li&gt;AWS Key Management Service&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This post will rate each option along the following dimensions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ease of use&lt;/li&gt;
&lt;li&gt;Cost&lt;/li&gt;
&lt;li&gt;Auditability&lt;/li&gt;
&lt;li&gt;Rotation Complexity&lt;/li&gt;
&lt;li&gt;Capability&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We'll also cover the &lt;a href="https://aws.amazon.com/blogs/compute/using-the-aws-parameter-and-secrets-lambda-extension-to-cache-parameters-and-secrets/"&gt;AWS Lambda Parameter and Secret extension&lt;/a&gt;, which is used to retrieve secrets from both Parameter Store and Secrets Manager from within a Lambda function.&lt;/p&gt;

&lt;p&gt;Then, we'll consider several example secrets with various blast radii, and decide which service best suits our needs.&lt;/p&gt;

&lt;h2&gt;Service breakdown TL;DR&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Ease of Use&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Auditability&lt;/th&gt;
&lt;th&gt;Rotation Complexity&lt;/th&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Environment Variables&lt;/td&gt;
&lt;td&gt;Easiest&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Free!&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Poor&lt;/td&gt;
&lt;td&gt;Requires UpdateFunctionConfiguration or deployment&lt;/td&gt;
&lt;td&gt;Encrypted at rest; decrypted when &lt;code&gt;GetFunctionConfiguration&lt;/code&gt; is called.&lt;br&gt; Limited to 4KB total&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parameter Store Standard&lt;/td&gt;
&lt;td&gt;Some assembly required&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Free storage&lt;/strong&gt;&lt;br&gt;&lt;br&gt;Free calls up to 40 calls/second.&lt;br&gt;$0.05/10,000 calls after&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Easy manual rotation, not automatic&lt;/td&gt;
&lt;td&gt;4KB size limit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parameter Store Advanced&lt;/td&gt;
&lt;td&gt;Some assembly required&lt;/td&gt;
&lt;td&gt;$0.05 per month per secret.&lt;br&gt;&lt;br&gt;$0.05/10,000 calls&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Easy manual rotation, not automatic&lt;/td&gt;
&lt;td&gt;Supports TTL for secrets. 8KB size limit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Secrets Manager&lt;/td&gt;
&lt;td&gt;Some assembly required&lt;/td&gt;
&lt;td&gt;$0.40 per secret per month&lt;br&gt;$0.05/10,000 calls.&lt;br&gt;30 day free tier.&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Easiest &amp;amp; Automatic&lt;br&gt;Built into the product&lt;/td&gt;
&lt;td&gt;Largest secret size, 64KB per secret&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Key Management Service (KMS)&lt;/td&gt;
&lt;td&gt;Most work&lt;/td&gt;
&lt;td&gt;$1 per key per month&lt;br&gt;$0.03/10,000 requests&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Depends on ciphertext storage.&lt;br&gt;Easy with DynamoDB/S3, more manual with env vars.&lt;/td&gt;
&lt;td&gt;Most flexible option.&lt;br&gt; 4KB per &lt;code&gt;encrypt&lt;/code&gt; operation.&lt;br&gt;Binary size is limited by storage mechanism.&lt;br&gt;Roll your own Secrets Manager or Parameter Store.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;Lambda Environment Variables&lt;/h2&gt;

&lt;p&gt;Environment variables in Lambda are where most folks start out in their journey. They're baked right in, and can be fetched easily (using something like &lt;code&gt;process.env.MY_SECRET&lt;/code&gt; for Node or &lt;code&gt;os.environ.get('MY_SECRET')&lt;/code&gt; for Python). Unfortunately they are not the &lt;em&gt;most&lt;/em&gt; secure option.&lt;/p&gt;
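&lt;p&gt;For example, a minimal Python handler might read the variable once at module scope, so the lookup happens during initialization rather than on every invocation (&lt;code&gt;MY_SECRET&lt;/code&gt; is a hypothetical name):&lt;/p&gt;

```python
import os

# Read the secret once, at module scope, so it is resolved during the
# INIT phase rather than on every invocation.
MY_SECRET = os.environ.get("MY_SECRET", "")

def handler(event, context):
    # Use the secret here; never log it or return it to callers.
    return {"configured": bool(MY_SECRET)}
```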

&lt;p&gt;However, one common misconception is that environment variables are &lt;code&gt;stored as plain text&lt;/code&gt; by AWS Lambda. This is &lt;strong&gt;false&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Lambda environment variables are &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-envvars.html"&gt;encrypted at rest&lt;/a&gt;, and only decrypted when the Lambda function initializes, or when you take an action resulting in a call to &lt;code&gt;GetFunctionConfiguration&lt;/code&gt;. This includes visiting the &lt;code&gt;Environment Variables&lt;/code&gt; section of the Lambda page in the AWS Console. It startles some people to see their secrets on this page, but you can easily prevent this by denying the &lt;code&gt;lambda:GetFunctionConfiguration&lt;/code&gt; or &lt;code&gt;kms:Decrypt&lt;/code&gt; permissions to your AWS console user.&lt;/p&gt;

&lt;p&gt;Auditability is another challenge of Lambda environment variables. For the principle of least privilege to be effective, we should limit access to secrets only to when they are needed. To ensure this is followed, or investigate and remediate a leaked secret, we need to know which Lambda function used a specific secret and at what time.&lt;/p&gt;

&lt;p&gt;Environment variables are automatically decrypted and injected into every function sandbox upon initialization. Given that CloudTrail reflects one call to &lt;code&gt;kms:Decrypt&lt;/code&gt;, I presume the entire 4KB environment variable package is encrypted together. This means you lack the ability to audit an individual secret - it's all or nothing.&lt;/p&gt;

&lt;p&gt;If you're in a regulated environment, or otherwise distrust Amazon, you can create a customer-managed key (CMK) and use it to encrypt your environment variables instead.&lt;/p&gt;

&lt;p&gt;It's important to note that when you update environment variables, you will trigger a cold start (as long as you're invoking the &lt;code&gt;$LATEST&lt;/code&gt; function version). Existing function sandboxes are permanently shut down; when the next request arrives, you'll experience a cold start and the new sandbox will pull the latest environment variables into scope.&lt;/p&gt;

&lt;p&gt;Environment variables are also the best-performing option. Parameter Store, Secrets Manager, Lambda environment variables, and KMS all fundamentally rely on a call to &lt;code&gt;kms:Decrypt&lt;/code&gt; at some point; with environment variables, that call happens once, during sandbox initialization.&lt;/p&gt;

&lt;p&gt;Lambda Function environment variables add around 25ms to your cold start duration, according to an article David Behroozi &lt;a href="https://speedrun.nobackspacecrew.com/blog/2024/03/13/lambda-environment-variables-impact-on-coldstarts.html"&gt;just wrote&lt;/a&gt;. These calls are logged in CloudTrail whenever your function starts.&lt;/p&gt;

&lt;p&gt;However, storing secrets purely as environment variables is not the most secure option. Although they are encrypted at rest, the &lt;code&gt;lambda:GetFunctionConfiguration&lt;/code&gt; permission that exposes them is part of the &lt;code&gt;ReadOnly&lt;/code&gt; policy used by AWS internally, by auditors, and by cloud security SaaS products. This broadens your risk: a compromised vendor or third-party auditor could leak your secrets.&lt;/p&gt;

&lt;p&gt;One risk is that you may accidentally leak a secret when sharing your screen while viewing or modifying a Lambda environment variable. It's unfortunate that AWS automatically decrypts and displays these values in plain text. AWS has no excuse for this, and should absolutely hide environment variable values unless toggled on, which is how Parameter Store and Secrets Manager both work.&lt;/p&gt;

&lt;p&gt;Furthermore, CloudFormation treats environment variables as regular parts of a template, so they are available when looking at the full template or historical templates for a given stack. Additionally, AWS does not recommend storing &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-envvars.html"&gt;anything secret in an environment variable&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You can improve that somewhat for no (or little) cost using a pattern I lay out further on. Before we get there, you should be familiar with the first-class products AWS offers to store your secrets.&lt;/p&gt;

&lt;h2&gt;AWS Systems Manager Parameter Store&lt;/h2&gt;

&lt;p&gt;The title is a mouthful, and the service is equally byzantine. It includes features for managing nodes, patching systems, handling feature flags, and much more. It was formerly called Simple Systems Manager, but it's truly anything but simple.&lt;/p&gt;

&lt;p&gt;Today we'll focus only on Lambda and exclusively on the Parameter Store feature which allows us to store a plaintext or secure string either as a simple value or structured item.&lt;/p&gt;

&lt;p&gt;You &lt;strong&gt;always want to use SecureString&lt;/strong&gt; for secrets.&lt;/p&gt;

&lt;p&gt;Parameter Store offers the choice between Standard and Advanced Parameters. Standard Parameters are free to store, Advanced Parameters incur a $0.05 per month per parameter charge.&lt;/p&gt;

&lt;p&gt;Standard parameters are limited to 4KB in size (each), with 10,000 total per region. Advanced Parameters have higher limits of 8KB per item and 100,000 total per region. They come with the bonus of attaching &lt;a href="https://docs.aws.amazon.com/systems-manager/latest/userguide/parameter-store-policies.html"&gt;Parameter Policies&lt;/a&gt;, which are effectively TTLs for a given parameter.&lt;/p&gt;

&lt;p&gt;Standard Parameters are free up to 40 requests per second (across all values stored in Parameter Store). Beyond that, the cost is $0.05 per 10,000 Parameter Store API interactions. Advanced Parameters are always billed at $0.05/10,000 requests. Fetching each parameter counts as an interaction, so fetching 10 parameters triggers 10 interactions. Parameters are individually versioned, and you can fetch a specific version or omit the version to get the latest.&lt;/p&gt;

&lt;p&gt;Historically one major advantage of Secrets Manager over Parameter Store is the ability to share secrets across AWS accounts using a resource-based policy. This is now &lt;a href="https://aws.amazon.com/about-aws/whats-new/2024/02/aws-systems-manager-parameter-store-cross-account-sharing/"&gt;supported by Parameter Store for Advanced Parameters&lt;/a&gt; as well.&lt;/p&gt;

&lt;p&gt;Finally, individual Parameter calls are auditable in CloudTrail so you can prove who accessed a Parameter and when.&lt;/p&gt;
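&lt;p&gt;As a rough sketch (not the only way to do this), fetching a SecureString with boto3 looks like the following; the parameter name is hypothetical, and the client is passed in so the same code is easy to exercise without AWS:&lt;/p&gt;

```python
def get_secure_param(ssm_client, name):
    # WithDecryption=True has Parameter Store KMS-decrypt the
    # SecureString server-side before returning the value.
    resp = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return resp["Parameter"]["Value"]
```

&lt;p&gt;In a real function, create the client once at module scope (&lt;code&gt;boto3.client('ssm')&lt;/code&gt;) so warm invocations reuse the TCP connection.&lt;/p&gt;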

&lt;h3&gt;Performance&lt;/h3&gt;

&lt;p&gt;For a new TCP connection, Parameter Store fetched a parameter in around 217ms, including 99ms to set up the connection itself:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2pxu90ymvot1unyzbjsj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2pxu90ymvot1unyzbjsj.png" alt="Systems Manager Parameter Store cold request" width="800" height="329"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With an existing connection, fetching the parameter took around 39.3ms:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2e7bf4c4jhx4rw82hlk8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2e7bf4c4jhx4rw82hlk8.png" alt="Systems Manager Parameter Store warm request" width="800" height="337"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;AWS Secrets Manager&lt;/h2&gt;

&lt;p&gt;Secrets Manager is purpose-built for encrypting and storing secrets for your application. It also has the largest cost at $0.40 per secret per month. This cost is multiplied by the number of regions you choose to replicate each secret to, so this can add up quickly. Fetching a secret costs $0.05 per 10,000 API calls, and there is a free 30-day trial.&lt;/p&gt;

&lt;p&gt;The big features you'll gain over Parameter Store are automatic cross-region secret replication and automatic (or manual) secret rotation. These features often satisfy requirements for applications subject to regulations like PCI-DSS or HIPAA. If they are must-haves for your application, it makes sense to use Secrets Manager.&lt;/p&gt;

&lt;p&gt;Secret values can be up to 64KB in size, which is far larger than environment variables or Parameter Store allow. Like Parameter Store, calls to &lt;code&gt;GetSecretValue&lt;/code&gt; are logged in CloudTrail. The big advantage Secrets Manager has over Parameter Store is the ability to simply rotate or change a secret everywhere it's used, either on a schedule if you're in an environment which demands it, or ad hoc.&lt;/p&gt;
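&lt;p&gt;A minimal boto3 sketch of fetching a JSON secret might look like this (the secret ID is hypothetical, and the client is injectable for illustration):&lt;/p&gt;

```python
import json

def get_secret(sm_client, secret_id):
    # Secrets Manager returns SecretString for text/JSON secrets and
    # SecretBinary for binary payloads; JSON key/value secrets arrive
    # as a SecretString.
    resp = sm_client.get_secret_value(SecretId=secret_id)
    return json.loads(resp["SecretString"])
```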

&lt;h3&gt;Performance&lt;/h3&gt;

&lt;p&gt;Similar to Parameter Store, it takes Secrets Manager a bit to warm up: creating the TCP connection and making the request took 177ms:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx31ejeby8typn6c6dxaj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx31ejeby8typn6c6dxaj.png" alt="Secrets Manager cold request" width="800" height="328"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With a warm connection, fetching a secret from Secrets Manager took only 29.4ms:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmh8vjevrq8mprbguhvlq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmh8vjevrq8mprbguhvlq.png" alt="Secrets Manager warm request" width="800" height="326"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Key Management Service&lt;/h2&gt;

&lt;p&gt;AWS Key Management Service (KMS) is the system which underpins &lt;em&gt;all of these other services&lt;/em&gt;. If you look carefully at either the documentation or CloudTrail logs, you'll see KMS!&lt;/p&gt;

&lt;p&gt;KMS allows us to create an encryption key, securely store it within AWS, and then use IAM policies (and the key's own resource-based key policy) to let Lambda decrypt ciphertext when your function runs. Instead of passing around a reference to a secret, you'll need to pass your Lambda function the encrypted ciphertext.&lt;/p&gt;

&lt;p&gt;Storing and fetching the ciphertext can be implemented many ways; the right choice generally tracks the size of the encrypted blob. Small strings can be encrypted and stored as environment variables. If you need to share the same secret across functions, you can store the ciphertext in DynamoDB. For large shared secrets, ciphertexts can be stored in S3.&lt;/p&gt;

&lt;p&gt;Most often these secrets are decrypted during the initialization phase of a Lambda function. Fun fact: you don't need to store or pass the ID of the key used to encrypt data. That key ID is &lt;a href="https://docs.aws.amazon.com/kms/latest/APIReference/API_Decrypt.html"&gt;encoded&lt;/a&gt; right along with the encrypted data in the ciphertext! Simply call &lt;code&gt;kms:Decrypt&lt;/code&gt; on the blob, and KMS takes care of the rest. Neat!&lt;/p&gt;
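&lt;p&gt;A minimal sketch of that decrypt-at-init flow (names are hypothetical):&lt;/p&gt;

```python
import base64

def decrypt_env_secret(kms_client, env_value):
    # The ciphertext carries the ID of the key that encrypted it, so no
    # KeyId parameter is needed; KMS routes the request itself.
    blob = base64.b64decode(env_value)
    return kms_client.decrypt(CiphertextBlob=blob)["Plaintext"]
```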

&lt;p&gt;KMS bills $1 per key per month. There is no charge for the keys created and used by Parameter Store, Secrets Manager, or AWS Lambda. You're also charged $0.03 per 10,000 requests to &lt;code&gt;kms:Decrypt&lt;/code&gt; (or other API actions). These calls are individually auditable in CloudTrail.&lt;/p&gt;

&lt;p&gt;You'll have to implement rotation yourself, but if you store ciphertexts in DynamoDB, this can be relatively straightforward and cheaper than either Parameter Store or Secrets Manager, especially if you want to distribute a secret across multiple regions.&lt;/p&gt;

&lt;p&gt;I see KMS used most frequently to encrypt slowly changing items like certificates, .PEM files, or to securely store signing keys.&lt;/p&gt;

&lt;h3&gt;Performance&lt;/h3&gt;

&lt;p&gt;Decrypting one small (~200-byte) ciphertext with KMS is notably faster than Parameter Store or Secrets Manager. This request took 64.4ms, including creating the TCP connection:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn26yr0x57hqg0azr38bj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn26yr0x57hqg0azr38bj.png" alt="KMS cold request" width="800" height="329"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With a warm connection, KMS decrypted my secret in a blistering &lt;strong&gt;6.45ms&lt;/strong&gt;: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fea4kqcx1b4z65g5md93e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fea4kqcx1b4z65g5md93e.png" alt="KMS warm request" width="800" height="326"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Presumably a big advantage here is that my ciphertext was already present in Lambda (as an environment variable) and didn't need to be fetched from a remote datastore. KMS merely needed to decrypt the ciphertext and return!&lt;/p&gt;

&lt;h2&gt;AWS Parameter and Secrets Lambda Extension&lt;/h2&gt;

&lt;p&gt;To more easily use either Parameter Store or Secrets Manager in Lambda, AWS has published a &lt;a href="https://docs.aws.amazon.com/secretsmanager/latest/userguide/retrieving-secrets_lambda.html"&gt;Lambda extension&lt;/a&gt; which handles API calls to the underlying services for you, along with caching and refreshing secrets. You can &lt;a href="https://docs.aws.amazon.com/secretsmanager/latest/userguide/retrieving-secrets_lambda.html"&gt;tune&lt;/a&gt; these parameters to your liking as well.&lt;/p&gt;

&lt;p&gt;Your function interacts with this extension via a lightweight API running on &lt;code&gt;localhost&lt;/code&gt;. It's reasonably well designed, although I find it a bit clumsy overall. This really feels like the type of feature Lambda should implement themselves, and then &lt;code&gt;magically&lt;/code&gt; make secrets appear in your function runtime. In contrast, ECS &lt;a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/specifying-sensitive-data.html"&gt;has this behavior built in&lt;/a&gt; and I find the experience far superior compared to Lambda.&lt;/p&gt;
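&lt;p&gt;A rough sketch of calling the extension's localhost API from Python, assuming the documented defaults (port 2773, overridable via &lt;code&gt;PARAMETERS_SECRETS_EXTENSION_HTTP_PORT&lt;/code&gt;, and the &lt;code&gt;X-Aws-Parameters-Secrets-Token&lt;/code&gt; header set to the function's session token):&lt;/p&gt;

```python
import os
import urllib.parse
import urllib.request

# Default port per the extension's documentation; can be overridden.
PORT = os.environ.get("PARAMETERS_SECRETS_EXTENSION_HTTP_PORT", "2773")

def extension_url(name):
    # withDecryption asks the extension to return the decrypted SecureString.
    query = urllib.parse.urlencode({"name": name, "withDecryption": "true"})
    return f"http://localhost:{PORT}/systemsmanager/parameters/get?{query}"

def fetch_param(name):
    # The extension authenticates local callers with the sandbox's
    # session token, which Lambda injects as AWS_SESSION_TOKEN.
    token = os.environ.get("AWS_SESSION_TOKEN", "")
    req = urllib.request.Request(
        extension_url(name),
        headers={"X-Aws-Parameters-Secrets-Token": token},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()
```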

&lt;p&gt;Furthermore, this extension isn't open source. Because extensions run inside the same sandbox as your function code, it leaves a bit of a foul taste in my mouth that I'm blessing a closed-source extension with carte blanche access to both my function code and secrets.&lt;/p&gt;

&lt;p&gt;I'm of the firm opinion that we as users shouldn't seriously consider any Lambda Extension unless the code is open source (and can be built/published to my own account if I choose). If AWS changes this behavior, I'll happily update the post.&lt;/p&gt;

&lt;p&gt;For these reasons, I prefer interacting with the Parameter Store or Secrets Manager APIs instead, using the &lt;code&gt;aws-sdk&lt;/code&gt;. The (excellent) AWS Lambda &lt;a href="https://github.com/aws-powertools"&gt;PowerTools project&lt;/a&gt; also supports fetching parameters from &lt;a href="https://docs.powertools.aws.dev/lambda/python/latest/utilities/parameters/"&gt;multiple sources&lt;/a&gt; and is absolutely worth considering. &lt;/p&gt;

&lt;p&gt;Now let's consider four example secrets. We'll look at the attack vectors and the blast radius of a leak or compromise, and identify the best cost/benefit solution for each.&lt;/p&gt;

&lt;h2&gt;Patterns and Practices&lt;/h2&gt;

&lt;h3&gt;Safely securing environment variables&lt;/h3&gt;

&lt;p&gt;The biggest issue with storing sensitive data in environment variables isn't Lambda itself - it's CloudFormation (and your CI pipeline)! When your stack is created or updated, those environment variables &lt;strong&gt;are&lt;/strong&gt; plaintext values in the CloudFormation stack template. Templates are also stored and retrievable in the CloudFormation UI.&lt;/p&gt;

&lt;p&gt;To keep sensitive information out of your CloudFormation template while avoiding the cost of calling Parameter Store &lt;strong&gt;at function runtime&lt;/strong&gt;, you can adopt the following strategy:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Store your secrets as SecureStrings in Systems Manager Parameter Store.&lt;/li&gt;
&lt;li&gt;Use CloudFormation &lt;a href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/dynamic-references.html"&gt;dynamic references&lt;/a&gt; to pass a &lt;em&gt;reference&lt;/em&gt; to your secret into the template.&lt;/li&gt;
&lt;/ol&gt;
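&lt;p&gt;A SAM-style sketch of step 2 might look like the following; the names are hypothetical, and note that the CloudFormation docs restrict &lt;code&gt;ssm-secure&lt;/code&gt; references to a specific list of resource properties:&lt;/p&gt;

```yaml
# Sketch only - names are hypothetical. 'ssm' references resolve at
# deploy time; 'ssm-secure' references are limited to the resource
# properties listed in the CloudFormation dynamic-reference docs.
MyFunction:
  Type: AWS::Serverless::Function
  Properties:
    Handler: index.handler
    Runtime: nodejs20.x
    CodeUri: src/
    Environment:
      Variables:
        TELEMETRY_KEY: '{{resolve:ssm:/myapp/telemetry-key:3}}'
```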

&lt;p&gt;Now your secret will land safely encrypted at rest in a Lambda environment variable, and never be visible in CloudFormation.&lt;/p&gt;

&lt;p&gt;Standard Parameters are free to store and free to use under 40 req/s, so if you're only fetching secrets at deploy time via CloudFormation references, you'll likely never receive a bill for these secrets.&lt;/p&gt;

&lt;p&gt;The downside is that your secrets are still viewable in the Lambda Console via &lt;code&gt;lambda:GetFunctionConfiguration&lt;/code&gt;, and if you update your secret in Parameter Store, it won't be updated in Lambda until you redeploy your functions.&lt;/p&gt;

&lt;h3&gt;Envelope Encryption&lt;/h3&gt;

&lt;p&gt;Consider a case where you have ~100KB of secrets to store: a handful of signing keys, a couple of tokens, maybe an mTLS certificate. Here's where you can use a technique called &lt;a href="https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#enveloping"&gt;envelope encryption&lt;/a&gt; to secure your data.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a KMS key&lt;/li&gt;
&lt;li&gt;Generate a 256-bit AES key for each customer, application, or secrets payload&lt;/li&gt;
&lt;li&gt;Encrypt all of your secrets with the AES key. This is the "envelope"&lt;/li&gt;
&lt;li&gt;Include the encrypted secrets in your function zip.&lt;/li&gt;
&lt;li&gt;Finally, encrypt the AES key with your KMS key and pass the encrypted key to your function in an environment variable.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You've just encrypted an envelope, and passed the encrypted key to your Lambda Function securely! This also helps save money on KMS keys, as you can re-use one KMS key for multiple AES keys. This pattern is also useful if you need to secure keys for customers in a multi-tenant environment, but laying that out is beyond the scope of this post.&lt;/p&gt;
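&lt;p&gt;On the decrypt side, your function's init code might look like this sketch (names are hypothetical, and the AES primitive is injected - use a vetted AEAD implementation, never your own):&lt;/p&gt;

```python
import base64
import json
import os

def load_envelope(kms_client, aes_decrypt, path="secrets.enc"):
    # 1. The KMS-encrypted AES key arrives in an environment variable.
    encrypted_key = base64.b64decode(os.environ["ENCRYPTED_DATA_KEY"])
    # 2. One kms:Decrypt call recovers the plaintext data key. KMS finds
    #    the right key from metadata inside the ciphertext itself.
    data_key = kms_client.decrypt(CiphertextBlob=encrypted_key)["Plaintext"]
    # 3. The envelope ships inside the function zip; open and decrypt it
    #    locally. aes_decrypt(key, blob) is your AEAD primitive.
    with open(path, "rb") as f:
        envelope = f.read()
    return json.loads(aes_decrypt(data_key, envelope))
```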

&lt;h2&gt;Sensitive Data Exercise&lt;/h2&gt;

&lt;p&gt;We've covered the fundamental building blocks for securing sensitive information within AWS and using it within Lambda. We've also composed a few patterns you can use to reduce costs or handle specific use cases.&lt;/p&gt;

&lt;p&gt;Now, let's consider 4 common secrets used in Lambda and think about how best to secure them.&lt;/p&gt;

&lt;h3&gt;Telemetry API Key&lt;/h3&gt;

&lt;p&gt;First up is a telemetry API key. Consider an ELK stack, or any provider you prefer. These keys are free to create, so it's best to create one key per application to limit blast radius and, as a bonus, better track costs. Telemetry keys are also usually write-only: leaking one only lets an attacker send additional data to the API.&lt;/p&gt;

&lt;p&gt;With this in mind, &lt;em&gt;environment variables&lt;/em&gt; are likely a good enough option here. They have minimal performance overhead, no cost, and minimal blast radius.&lt;/p&gt;

&lt;p&gt;Keys can be easily created for exactly one Lambda function, or CloudFormation stack. If someone peers over your shoulder at a coffee shop, or inadvertently leaks the environment variable - it's simple to change with a few clicks and a re-deploy.&lt;/p&gt;

&lt;p&gt;You can also use dynamic references and limit the read permissions for console users or 3rd party roles to further prevent access.&lt;/p&gt;

&lt;p&gt;Using a SecureString with Parameter Store would also be a good option as it would likely be free - especially if your application doesn't have any users.&lt;/p&gt;

&lt;p&gt;In this case, the blast-radius is small, the rotation complexity is easy, and a key encrypted at rest is likely more than suitable for our use case.&lt;/p&gt;

&lt;h3&gt;Database Username and Password&lt;/h3&gt;

&lt;p&gt;Your RDBMS may only allow one username and password, shared across all applications - or maybe you just need to share a secret for the sake of simplicity. If you're not using a stateful connection pooler (like &lt;code&gt;pgbouncer&lt;/code&gt;), you may need to share this secret with all your functions.&lt;/p&gt;

&lt;p&gt;Here's where Parameter Store is probably also a great fit. If you ever have to change the credential, your functions can reference an unversioned Parameter and pick up the latest value. For one key, it's pretty affordable. However, this math changes if you have a larger bundle of secrets that exceeds the 4KB or 8KB size limits of Parameter Store.&lt;/p&gt;
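&lt;p&gt;One pattern worth sketching here (an assumption on my part, not something prescribed above): cache the fetched credential at module scope with a TTL, so warm invocations don't pay for a Parameter Store call every time. The fetch function and clock are injected for illustration:&lt;/p&gt;

```python
import time

class CachedSecret:
    """Cache a fetched secret for ttl seconds so warm invocations
    don't trigger (or get billed for) a Parameter Store call each time."""

    def __init__(self, fetch, ttl=300.0, clock=time.monotonic):
        self._fetch = fetch      # e.g. a closure over ssm.get_parameter
        self._ttl = ttl
        self._clock = clock
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        stale = self._fetched_at is None or now - self._fetched_at > self._ttl
        if stale:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```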

&lt;h3&gt;GitHub Application Private Key&lt;/h3&gt;

&lt;p&gt;For our third example, consider building and deploying a GitHub Application. Authenticating as a GitHub Application is not quite as simple as a 128-bit UUID.&lt;/p&gt;

&lt;p&gt;Instead, you must download and save an &lt;a href="https://docs.github.com/en/apps/creating-github-apps/authenticating-with-a-github-app/managing-private-keys-for-github-apps"&gt;application key in PEM format&lt;/a&gt;. These keys can be a bit large (around 2KB), which may push you close to the 4KB environment variable limit.&lt;/p&gt;

&lt;p&gt;You &lt;em&gt;can&lt;/em&gt; create multiple keys for the same application at no cost, so deploying one key per stack is still tenable.&lt;/p&gt;

&lt;p&gt;If the key were to be leaked, someone could conceivably authenticate as your application and access &lt;strong&gt;ANY&lt;/strong&gt; of the repositories your application is installed into (with whatever permissions your application is configured to use). This is risky!&lt;/p&gt;

&lt;p&gt;In this case, you'd probably want to use something like Parameter Store if you choose to create multiple keys and rotate them yourself. You'll avoid the size limit on Lambda environment variables, and it won't be too costly.&lt;/p&gt;

&lt;p&gt;If you're dealing with a larger key but don't want to eat the cost of Secrets Manager, KMS or DynamoDB can make sense as well.&lt;/p&gt;

&lt;p&gt;I'd be remiss if I didn't mention that, like Lambda environment variables, DynamoDB records are also &lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/EncryptionAtRest.html"&gt;encrypted at rest&lt;/a&gt;, optionally with your own customer-managed key. I assume this is mostly at the hardware (disk) level, so data in memory may not be encrypted. But if you're also concerned with someone peeking over your shoulder as you browse DynamoDB items in the AWS console, you could encrypt them with your own key.&lt;/p&gt;

&lt;h3&gt;PCI-DSS or HIPAA credential rotation&lt;/h3&gt;

&lt;p&gt;If you're in a regulated environment with mandated credential rotation, Secrets Manager makes this so easy. As this post has mentioned several times, it's certainly possible to build this yourself. However - it's often worth the cost of $0.40 per secret to have the peace of mind that Secrets Manager will automatically rotate your secrets on a regular cadence. Your auditor will thank you as well.&lt;/p&gt;

&lt;h2&gt;Wrapping up&lt;/h2&gt;

&lt;p&gt;My hot take after writing this guide is that Lambda environment variables are generally fine for a one-off API key with a small blast radius. They're fast, free, and easy to use.&lt;/p&gt;

&lt;p&gt;For secrets with larger blast radii, use SecureStrings from Parameter Store. If you're working in a regulated environment or you'd like to regularly rotate a secret, it's probably easiest to use Secrets Manager.&lt;/p&gt;

&lt;p&gt;Reach for KMS and another storage mechanism if your use case doesn't quite fit into these boxes, or if doing so would be prohibitively expensive. &lt;/p&gt;

&lt;p&gt;Ultimately security is a balancing act. I realize best practices are all about limiting risks at every turn, but it still feels wrong to harp on the risks of environment variables when so many developers run around with &lt;code&gt;Administrator&lt;/code&gt; IAM roles (and can easily read any secret anyway).&lt;/p&gt;

&lt;p&gt;At the same time, AWS should do more to gate access to environment variable values behind a more granular permission than &lt;code&gt;lambda:GetFunctionConfiguration&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This post would not exist without &lt;a href="https://speedrun.nobackspacecrew.com/blog/index.html"&gt;David Behroozi&lt;/a&gt; challenging me to finish it, and helping out with his CloudTrail digging. You should follow him on &lt;a href="https://twitter.com/rooToTheZ"&gt;twitter&lt;/a&gt;. Thanks, David!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://twitter.com/Frichette_n"&gt;Nick Frichette&lt;/a&gt;, &lt;a href="https://twitter.com/alexbdebrie"&gt;Alex DeBrie&lt;/a&gt;, and &lt;a href="http://awsteele.com/"&gt;Aidan Steele&lt;/a&gt; also helped review this, thanks friends! &lt;/p&gt;

&lt;p&gt;If you like this type of content please subscribe to my &lt;a href="https://aaronstuyvenberg.com"&gt;blog&lt;/a&gt; or follow me on &lt;a href="https://twitter.com/astuyve"&gt;twitter&lt;/a&gt; and send me any questions or comments. You can also ask me questions directly if I'm &lt;a href="//twitch.tv/aj_stuyvenberg"&gt;streaming on Twitch&lt;/a&gt; or &lt;a href="https://www.youtube.com/channel/UCsWwWCit5Y_dqRxEFizYulw"&gt;YouTube&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>lambda</category>
      <category>aws</category>
      <category>serverless</category>
      <category>secrets</category>
    </item>
    <item>
      <title>How Lambda starts containers 15x faster (deep dive)</title>
      <dc:creator>AJ Stuyvenberg</dc:creator>
      <pubDate>Tue, 23 Jan 2024 20:26:53 +0000</pubDate>
      <link>https://dev.to/aws-heroes/how-lambda-starts-containers-15x-faster-deep-dive-5077</link>
      <guid>https://dev.to/aws-heroes/how-lambda-starts-containers-15x-faster-deep-dive-5077</guid>
      <description>&lt;p&gt;This is also available as a YouTube video!&lt;br&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/qAYY9df2hVQ"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://aaronstuyvenberg.com/posts/containers-on-lambda"&gt;first post&lt;/a&gt; of this series, we demonstrated that container-based Lambda functions can initialize as fast as or faster than zip-based functions. This is counterintuitive, as zip-based functions are usually much smaller (up to 250mb), while container images typically contain far more data and are supported up to 10gb in size. So how is this technically possible?&lt;/p&gt;

&lt;p&gt;"On demand container loading on AWS Lambda" was &lt;a href="https://arxiv.org/abs/2305.13162"&gt;published&lt;/a&gt; on May 23rd, 2023 by Marc Brooker et al. I suggest you read the full paper, as it's quite approachable and extremely interesting, but I'll break it down here.&lt;/p&gt;

&lt;p&gt;The key to this performance improvement can be summarized in four steps, all performed during &lt;strong&gt;function creation&lt;/strong&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Deterministically serialize container layers (which are tar.gz files) onto an ext4 file system&lt;/li&gt;
&lt;li&gt;Divide filesystem into 512kb chunks&lt;/li&gt;
&lt;li&gt;Encrypt each chunk&lt;/li&gt;
&lt;li&gt;Cache the chunks and share them &lt;em&gt;across all customers&lt;/em&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With these chunks stored and shared safely in a multi-tier cache, they can be fetched more quickly during &lt;strong&gt;function cold start&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But how can one safely encrypt, cache, and share actual bits of a container image &lt;em&gt;between&lt;/em&gt; users?!&lt;/p&gt;

&lt;h2&gt;
  
  
  Container images are sparse
&lt;/h2&gt;

&lt;p&gt;One interesting fact about container images is that they're an objectively inefficient method for distributing software applications. It's true!&lt;/p&gt;

&lt;p&gt;Container images are sparse blobs, with only a fraction of the contained bytes required to actually run the packaged application. &lt;a href="https://www.usenix.org/conference/fast16/technical-sessions/presentation/harter"&gt;Harter et al&lt;/a&gt; found that only 6.5% of bytes on average were needed at startup.&lt;/p&gt;

&lt;p&gt;When we consider a collection of container images, the frequency and quantity of similar bytes is very high between images. This means there are lots of duplicated bytes copied over the wire every time you push or pull an image!&lt;/p&gt;

&lt;p&gt;This is attributed to the fact that container images include a ton of stuff that doesn't vary between us as users. These are things like the kernel, the operating system, system libraries like libc or curl, and runtimes like the jvm, python, or nodejs.&lt;/p&gt;

&lt;p&gt;Not to mention all of the code in your app which you copied from ChatGPT (like everyone else).&lt;/p&gt;

&lt;p&gt;The reality is that we're all shipping ~80% of the same code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deterministic serialization onto ext4
&lt;/h2&gt;

&lt;p&gt;Container images are stacks of tarballs, layered on top of each other to form a filesystem like the one on your own computer. This process is typically done at container runtime, using a &lt;a href="https://docs.docker.com/storage/storagedriver/"&gt;storage driver&lt;/a&gt; like &lt;a href="https://docs.docker.com/storage/storagedriver/overlayfs-driver/"&gt;overlayfs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa84e9vt6qsb3z7uzgw77.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa84e9vt6qsb3z7uzgw77.png" alt="Containers are layers of tarballs" width="800" height="216"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In a typical filesystem, this process of copying files from the tar.gz file to the filesystem's underlying block device is &lt;em&gt;nondeterministic&lt;/em&gt;. Files always land in the same directory, but their locations &lt;em&gt;on disk&lt;/em&gt; may end up on different parts of the block device over the course of multiple instantiations of the container.&lt;br&gt;
This is a concurrency-based performance optimization used by filesystems, and it introduces nondeterminism.&lt;/p&gt;

&lt;p&gt;In order to de-duplicate and cache function container images, Lambda also needs a filesystem. This process happens when a function is created or updated. But for Lambda to efficiently cache chunks of a function container image, this process needed to be deterministic. So they made filesystem creation a serial operation, and thus the creation of Lambda filesystem blocks is deterministic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6zcz6lw07vc4zfiuchsc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6zcz6lw07vc4zfiuchsc.png" alt="An example filesystem created by the tarballs" width="800" height="247"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Filesystem chunking
&lt;/h2&gt;

&lt;p&gt;Now that each byte of a container image will land in the same block each time a function is created, Lambda can divide the blocks into 512kb chunks. They specifically call out that larger chunks reduce metadata duplication, and smaller chunks lead to better deduplication and thus cache hit rate, so they expect this exact value to change over time.&lt;/p&gt;
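&lt;p&gt;The chunking step can be sketched in a few lines: split the serialized filesystem into fixed-size chunks and hash each one, so identical chunks across images hash to the same ID. Chunk sizes are shrunk here purely for illustration:&lt;/p&gt;

```python
import hashlib

CHUNK_SIZE = 512 * 1024  # Lambda uses 512kb chunks

def chunk_ids(filesystem_bytes, chunk_size=CHUNK_SIZE):
    """Divide a byte blob into fixed-size chunks and hash each one."""
    chunks = [filesystem_bytes[i:i + chunk_size]
              for i in range(0, len(filesystem_bytes), chunk_size)]
    return [hashlib.sha256(c).hexdigest() for c in chunks]
```

&lt;p&gt;Two images built on the same base produce overlapping chunk IDs for the shared bytes, which is exactly what makes cross-customer deduplication possible.&lt;/p&gt;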

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5lpoh7pqbbgkwsxirw8j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5lpoh7pqbbgkwsxirw8j.png" alt="The Lambda filesystem divided into chunks and hashed" width="800" height="241"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The next two steps are the most important.&lt;/p&gt;

&lt;h2&gt;
  
  
  Convergent encryption
&lt;/h2&gt;

&lt;p&gt;Lambda code is considered unsafe, as any customer can upload anything they want. But then how can AWS deduplicate and share chunks of function code between customers?&lt;br&gt;
The answer is something called Convergent Encryption, which sounds scarier than it is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Hash each 512kb chunk, and from that, derive an encryption key.&lt;/li&gt;
&lt;li&gt;Encrypt each block with the derived key.&lt;/li&gt;
&lt;li&gt;Create a manifest file containing a SHA256 hash of each chunk, the key, and file offset for the chunk.&lt;/li&gt;
&lt;li&gt;Encrypt the keys list in the manifest file using a per-customer key managed by KMS.&lt;/li&gt;
&lt;/ol&gt;
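&lt;p&gt;The steps above can be sketched with the standard library. This toy version derives each key from the chunk's own hash and uses a simple SHA-256 counter-mode keystream in place of a real cipher (a production system would use a proper cipher like AES, with the manifest keys protected by KMS). The point is that identical plaintext chunks always produce identical ciphertext, so they deduplicate:&lt;/p&gt;

```python
import hashlib

def derive_key(chunk: bytes) -> bytes:
    # Convergent encryption: the key is derived from the content itself.
    # The "chunk-key:" prefix is toy domain separation, not Lambda's scheme.
    return hashlib.sha256(b"chunk-key:" + chunk).digest()

def toy_encrypt(chunk: bytes, key: bytes) -> bytes:
    # Toy SHA-256 counter-mode keystream standing in for a real cipher.
    # XOR-based, so applying it twice with the same key decrypts.
    out = bytearray()
    for offset in range(0, len(chunk), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        part = chunk[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(part, pad))
    return bytes(out)

def encrypt_chunk(chunk: bytes):
    """Return (chunk hash for the manifest, derived key, ciphertext)."""
    key = derive_key(chunk)
    return hashlib.sha256(chunk).hexdigest(), key, toy_encrypt(chunk, key)
```

&lt;p&gt;Two customers uploading the same runtime bytes produce byte-identical ciphertext, so only one copy needs to be stored; only the per-customer manifest holding the keys needs customer-specific encryption.&lt;/p&gt;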

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs0apdx7r1lyymluve2sq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs0apdx7r1lyymluve2sq.png" alt="The encrypted chunks and manifest file for a Lambda container function" width="800" height="730"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These chunks are then de-duplicated and stored in S3 when a Lambda function is created.&lt;/p&gt;

&lt;p&gt;Now that each block is hashed and encrypted, the chunks can be efficiently de-duplicated and shared across customers. The manifest and chunk key list are decrypted by the Lambda worker during a cold start, and only chunks matching those keys are downloaded and decrypted.&lt;br&gt;
This is safe because for any customer's manifest to contain a chunk hash (and the key derived from it), that customer's function must have created and sent that chunk of bytes to Lambda.&lt;/p&gt;

&lt;p&gt;Put another way, all users with an identical chunk of bytes also all share the identical key.&lt;/p&gt;

&lt;p&gt;This is key to sharing chunks of container images without trust. Now if you and I both run a node20.x container on Lambda, the bytes for nodejs itself (and its dependencies like libuv) can be shared, so they may already be on the worker before my function runs or is even created!&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-tiered cache strategy
&lt;/h2&gt;

&lt;p&gt;The last component to this performance improvement is creating a multi-tiered cache. Tier three is the source cache, and lives in an S3 bucket controlled by AWS.&lt;/p&gt;

&lt;p&gt;The second tier is an AZ-level cache, which is replicated and separated into an in-memory system for hot data, and flash storage for colder chunks.&lt;br&gt;
Fun fact - to reduce p99 outliers, this cache data is stored using erasure coding in a 4-of-5 code strategy. This is the same sharding technique &lt;a href="https://youtu.be/v3HfUNQ0JOE?t=508"&gt;used in s3&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This allows workers to make redundant requests to this cache while fetching chunks, and abandon the slowest request as soon as 4 of the 5 chunks return. This is a &lt;a href="https://dl.acm.org/doi/10.1145/2796314.2745873"&gt;common pattern&lt;/a&gt;, which AWS also uses when fetching zip-based Lambda function code from s3 (among many other applications).&lt;/p&gt;
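&lt;p&gt;The redundant-fetch pattern is simple to sketch: issue all five shard requests concurrently and return as soon as any four complete, so one slow replica can't drag out the tail latency. A minimal version with simulated cache reads:&lt;/p&gt;

```python
import concurrent.futures
import random
import time

def fetch_shard(shard_id):
    # Simulated cache read with variable latency.
    time.sleep(random.uniform(0.001, 0.02))
    return shard_id

def fetch_4_of_5(fetch, shard_ids):
    """Request all 5 shards, keep the first 4, abandon the straggler."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=5)
    futures = [pool.submit(fetch, s) for s in shard_ids]
    done = []
    for f in concurrent.futures.as_completed(futures):
        done.append(f.result())
        if len(done) == 4:  # 4-of-5 erasure code: enough to reconstruct
            break
    # Don't wait for the slowest request to finish.
    pool.shutdown(wait=False, cancel_futures=True)
    return done
```

&lt;p&gt;With a 4-of-5 erasure code, any four shards are sufficient to reconstruct the chunk, so the p99 is no longer hostage to the single slowest replica.&lt;/p&gt;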

&lt;p&gt;Finally the tier-one cache lives on each Lambda worker and is entirely in-memory. This is the fastest cache, and most performant to read from when initializing a new Lambda function.&lt;/p&gt;

&lt;p&gt;In a given week, 67% of chunks were served from on-worker caches!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff5ur6fr876msag7qjk53.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff5ur6fr876msag7qjk53.png" alt="For a given week, 67% of chunks were served from the worker" width="800" height="754"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Putting it together
&lt;/h2&gt;

&lt;p&gt;During a cold start, these chunk IDs are looked up using the manifest, and then fetched from the cache(s) and decrypted. The Lambda worker reassembles the chunks and then the function initialization begins. It doesn't matter who uploaded the chunk, they're all shared safely!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbztp2azg0boff0jdfyyd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbztp2azg0boff0jdfyyd.png" alt="The encrypted chunks fetched from the cache during a cold start and reassembled." width="800" height="514"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Crazy stat
&lt;/h2&gt;

&lt;p&gt;This leads to a staggering statistic. If, after subscribing and sharing this post, you close this page and create a brand new container-based Lambda function right now, there is an &lt;strong&gt;80% chance&lt;/strong&gt; that new container image will contain &lt;em&gt;zero unique bytes&lt;/em&gt; compared to what Lambda has already seen.&lt;/p&gt;

&lt;p&gt;AWS has seen the code and dependencies you are likely to deploy before you have even deployed it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;The whole paper is excellent and includes many other interesting topics like cache eviction, and how this was implemented (in Rust!), so I suggest you &lt;a href="https://arxiv.org/abs/2305.13162"&gt;read the full paper&lt;/a&gt; to learn more. The Lambda team even had to contend with some cached chunks being &lt;strong&gt;too popular&lt;/strong&gt;, so they had to salt the chunk hashes!&lt;/p&gt;

&lt;p&gt;It's interesting to me that the Fargate team went a totally different direction here with &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/07/aws-fargate-container-startup-seekable-oci/"&gt;SOCI&lt;/a&gt;. My understanding is that SOCI is less effective for images smaller than 1GB, so I'd be curious if some lessons from this paper could further improve Fargate launches.&lt;/p&gt;

&lt;p&gt;At the same time, I'm curious if this type of multi-tenant cache would make sense to improve launch performance of something like GCP Cloud Run, or Azure Container Instances.&lt;/p&gt;

&lt;p&gt;If you like this type of content please subscribe to my &lt;a href="https://aaronstuyvenberg.com"&gt;blog&lt;/a&gt; or reach out on &lt;a href="https://twitter.com/astuyve"&gt;twitter&lt;/a&gt; with any questions. You can also ask me questions directly if I'm &lt;a href="//twitch.tv/aj_stuyvenberg"&gt;streaming on Twitch&lt;/a&gt; or &lt;a href="https://www.youtube.com/channel/UCsWwWCit5Y_dqRxEFizYulw"&gt;YouTube&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>serverless</category>
      <category>docker</category>
      <category>lambda</category>
      <category>aws</category>
    </item>
    <item>
      <title>The case for containers on Lambda (with benchmarks)</title>
      <dc:creator>AJ Stuyvenberg</dc:creator>
      <pubDate>Tue, 02 Jan 2024 14:45:03 +0000</pubDate>
      <link>https://dev.to/aws-heroes/the-case-for-containers-on-lambda-with-benchmarks-2alf</link>
      <guid>https://dev.to/aws-heroes/the-case-for-containers-on-lambda-with-benchmarks-2alf</guid>
      <description>&lt;p&gt;When AWS Lambda first introduced support for container-based functions, the initial reactions from the community were mostly negative. Lambda isn't meant to run large applications, it is meant to run small bits of code, scaled widely by executing many functions simultaneously.&lt;/p&gt;

&lt;p&gt;Containers were not only antithetical to the philosophy of Lambda and the serverless mindset writ large, they were also far slower to initialize (or cold start) compared with their zip-based function counterparts.&lt;/p&gt;

&lt;p&gt;If we're being honest, I think the &lt;strong&gt;biggest roadblock to adoption&lt;/strong&gt; was the cold start performance penalty associated with using containers. That penalty has now all but evaporated.&lt;/p&gt;

&lt;p&gt;The AWS Lambda team put in tremendous amounts of work and improved the cold-start times by a shocking &lt;strong&gt;15x&lt;/strong&gt;, according to the paper and &lt;a href="https://www.youtube.com/watch?v=Wden61jKWvs"&gt;talk given by Marc Brooker&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This post focuses on analyzing the performance of container-based Lambda functions with simple, reproducible tests. It also lays out the pros and cons for containers on Lambda. The next post will delve into how the Lambda team pulled off this performance win.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance Tests
&lt;/h2&gt;

&lt;p&gt;I set off to test this new container image strategy by creating several identical functions across zip and container-based packaging schemes. These varied from 0mb of additional dependencies, up to the 250mb limit of zip-based Lambda functions. I'm &lt;strong&gt;not&lt;/strong&gt; directly comparing the size of the final image with the size of the zip file, because containers include an OS and system libraries, so they are natively much larger than zip files.&lt;/p&gt;

&lt;p&gt;As usual, I'm testing the &lt;strong&gt;round trip&lt;/strong&gt; request time for a cold start from within the same region. I'm not using init duration, which &lt;a href="https://youtu.be/2EDNcPvR45w?t=1421"&gt;does not include the time to load bytes into the function sandbox&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I created a cold start by updating the function configuration (setting a new environment variable), and then sending a simple test request. The code for this project is &lt;a href="https://github.com/astuyve/cold-start-benchmarker"&gt;open source&lt;/a&gt;. I also streamed this entire process &lt;a href="https://twitch.tv/aj_stuyvenberg"&gt;live on twitch&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;These results were based on the p99 response time, but I've included the p50 times for python below.&lt;/p&gt;

&lt;p&gt;This first test contains a set of NodeJS functions running Node18.x. After several days and thousands of invocations, we see the final result. The top row represents zip-based Lambda functions, and the bottom row reports container-based Lambda functions (lower is better):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flm8p2n32v0e67tc42i9u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flm8p2n32v0e67tc42i9u.png" alt="Round trip cold start request time for thousands of invocations over several days" width="800" height="249"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;An earlier version of this post reversed the rows. I've changed this to be consistent with the python result format. Thanks to those who corrected me!&lt;/p&gt;

&lt;p&gt;It's easier to read a bar chart:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fij7nlr01o7gjtrizh1q3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fij7nlr01o7gjtrizh1q3.png" alt="Round trip cold start request time for thousands of invocations over several days, as a bar chart" width="800" height="494"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The second test was similar and performed with Python functions running Python 3.11. We see a very similar pattern, with slightly more variance and overlap on the lower end of function sizes. Here is the p99:&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F344ozjyabceui2m9j7dh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F344ozjyabceui2m9j7dh.png" alt="Round trip cold start request time for python functions, p99" width="800" height="331"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;and here is the p50:&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu1x9qtaku18oadrm8j6j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu1x9qtaku18oadrm8j6j.png" alt="Round trip cold start request time for python functions, p50" width="800" height="332"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here it is in chart form, once again looking at p99 over a week:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj1mnoipoq225kedbec8q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj1mnoipoq225kedbec8q.png" alt="Round trip cold start request time for python functions, p99, in chart form" width="800" height="494"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see the closer variance at the 100mb and 150mb marks. For the 150mb test I was using Pandas, Flask, and Psycopg as dependencies. I'm not familiar with the internals of these libraries, so I don't want to speculate on why these results are slightly unexpected.&lt;/p&gt;

&lt;p&gt;My simplest answer is that this is a "real world" test using real dependencies, running on top of a managed service like Lambda with some amount of network latency in a shared multi-tenant system. Many variables could be confounding here.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance Takeaways
&lt;/h2&gt;

&lt;p&gt;For NodeJS, beyond ~30mb, container images &lt;em&gt;outperform&lt;/em&gt; zip based Lambda functions in cold start performance.&lt;/p&gt;

&lt;p&gt;For Python, container images &lt;strong&gt;vastly outperform&lt;/strong&gt; zip based Lambda functions beyond 200mb in size.&lt;/p&gt;

&lt;p&gt;This result is incredible, because Lambda container images are, in total, much larger than the comparable zip files.&lt;/p&gt;

&lt;p&gt;I want to stress that the size of dependencies is only one factor that plays into cold starts. Besides size, other factors impact static initialization time including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Size and number of heap allocations&lt;/li&gt;
&lt;li&gt;Computations performed during init&lt;/li&gt;
&lt;li&gt;Network requests made during init&lt;/li&gt;
&lt;/ul&gt;
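&lt;p&gt;The last two factors are under your control: defer heavy work out of module scope so it runs on first use (or not at all on code paths that don't need it). A minimal sketch, where &lt;code&gt;build_client&lt;/code&gt; is a hypothetical stand-in for an expensive import, connection, or network call:&lt;/p&gt;

```python
# Eager version: `heavy_client = build_client()` at module scope would
# run during init, inflating every cold start.

_client = None

def get_client(build_client):
    """Lazily build the expensive dependency on first use."""
    global _client
    if _client is None:
        _client = build_client()  # paid once, on the first invocation
    return _client

def handler(event, context, build_client=lambda: {"connected": True}):
    # build_client is a hypothetical stand-in for e.g. a DB connection.
    client = get_client(build_client)
    return {"ok": client["connected"]}
```

&lt;p&gt;The trade-off is that the first invocation absorbs the cost instead of the init phase, so this is most useful for dependencies that only some requests need.&lt;/p&gt;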

&lt;p&gt;These nuances are covered in my &lt;a href="https://youtu.be/2EDNcPvR45w"&gt;talk at AWS re:Invent&lt;/a&gt; if you want to dig deeper on the topic of cold starts.&lt;br&gt;
All of these individual projects are &lt;a href="https://github.com/astuyve/benchmarks"&gt;available on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Should you use containers on Lambda?
&lt;/h2&gt;

&lt;p&gt;I am not advocating that you choose containers as a packaging mechanism for your Lambda function based &lt;em&gt;solely&lt;/em&gt; on cold start performance.&lt;/p&gt;

&lt;p&gt;That said, &lt;strong&gt;you should be using containers on Lambda&lt;/strong&gt; anyway. With these cold start performance improvements, there are very few reasons &lt;em&gt;not&lt;/em&gt; to.&lt;/p&gt;

&lt;p&gt;While it's technically true that container images are an objectively less efficient means of deploying software applications, they should be the standard for Lambda functions going forward.&lt;/p&gt;

&lt;p&gt;Pros:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Containers are ubiquitous in software development, and so many tools and developer workflows already revolve around them. It's easy to find and hire developers who already know how to use containers.&lt;/li&gt;
&lt;li&gt;Multi-stage builds are clear and easy to understand, allowing you to easily create the lightest and smallest image possible.&lt;/li&gt;
&lt;li&gt;Graviton on Lambda is quickly becoming the preferred architecture, and container images make x86/ARM cross-compilation easy. This is even more relevant now, as Apple silicon becomes a popular choice for developers. &lt;/li&gt;
&lt;li&gt;Base images for Lambda are updated frequently, and it's easy enough to auto-deploy the latest image version containing security updates&lt;/li&gt;
&lt;li&gt;Containers support larger functions, up to 10gb&lt;/li&gt;
&lt;li&gt;You can use custom runtimes like Bun or Deno, and adopt new runtime versions more easily&lt;/li&gt;
&lt;li&gt;Using the excellent &lt;a href="https://github.com/awslabs/aws-lambda-web-adapter"&gt;Lambda web adapter extension&lt;/a&gt; with a container, you can very easily move a function from Lambda to Fargate or Apprunner if cost becomes an issue. This optionality is of high value, and shouldn't be overlooked.&lt;/li&gt;
&lt;li&gt;AWS and the broader software development community continues to invest heavily in the container image standard. These improvements to Lambda represent the result of this investment, and I expect that to continue.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;To update dependencies managed by Lambda runtimes, you'll need to re-build your container image and re-deploy your function occasionally. This is something dependabot can easily do, but it could be painful if you have thousands of functions. These updates come free with managed runtimes anyway.&lt;/li&gt;
&lt;li&gt;You do pay for the init duration. Today, Lambda documentation claims that init duration is &lt;a href="https://aws.amazon.com/lambda/pricing/"&gt;always billed&lt;/a&gt;, but in practice we see that init duration for managed runtimes is not included in the billed duration logged in the REPORT line at the end of every execution.&lt;/li&gt;
&lt;li&gt;Slower deployment speeds&lt;/li&gt;
&lt;li&gt;The very first cold start for a new function or function update seems to be quite slow (p99 ~5+ seconds for a large function). This makes the iterate + test loop feel slow. In any production environment, this should be mitigated by invoking an alias (other than &lt;code&gt;$LATEST&lt;/code&gt;). In practice I've noticed this goes away if I wait a bit between deployment and invocation. This isn't great and ideally the Lambda team fixes it soon, but in production it shouldn't be a problem.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If all of your functions are under 30mb and your team is comfortable with zip files, then it may be worth continuing with zip files.&lt;br&gt;
For me personally, all new Lambda-backed APIs I create are based on container images using the Lambda web adapter.&lt;/p&gt;

&lt;p&gt;Ultimately your team and anyone you hire likely &lt;strong&gt;already knows how to use containers&lt;/strong&gt;. Containers start as fast or faster than zip functions, have more powerful build configurations, and more easily support existing workflows. Finally, containers make it easy to optionally move your application to something like Fargate or AppRunner if costs become a primary concern.&lt;/p&gt;

&lt;p&gt;It's time to use containers on Lambda.&lt;/p&gt;

&lt;h2&gt;
  
  
  Thanks for reading!
&lt;/h2&gt;

&lt;p&gt;The next post in this series explores how this performance improvement was designed. It's an example of excellent systems engineering work, and it represents why I'm so bullish on serverless in the long term.&lt;/p&gt;

&lt;p&gt;If you like this type of content please subscribe to my &lt;a href="https://aaronstuyvenberg.com"&gt;blog&lt;/a&gt; or reach out on &lt;a href="https://twitter.com/astuyve"&gt;twitter&lt;/a&gt; with any questions. You can also ask me questions directly if I'm &lt;a href="//twitch.tv/aj_stuyvenberg"&gt;streaming on Twitch&lt;/a&gt; or &lt;a href="https://www.youtube.com/channel/UCsWwWCit5Y_dqRxEFizYulw"&gt;YouTube&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>lambda</category>
      <category>docker</category>
    </item>
    <item>
      <title>Stop using Lambda Layers (use this instead)</title>
      <dc:creator>AJ Stuyvenberg</dc:creator>
      <pubDate>Wed, 08 Nov 2023 13:38:26 +0000</pubDate>
      <link>https://dev.to/aws-heroes/stop-using-lambda-layers-use-this-instead-46o0</link>
      <guid>https://dev.to/aws-heroes/stop-using-lambda-layers-use-this-instead-46o0</guid>
      <description>&lt;p&gt;This post is also available on YouTube:&lt;br&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/Y4EJPIpqmuk"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/chapter-layers.html"&gt;Lambda layers&lt;/a&gt; are a special packaging mechanism provided by AWS Lambda to manage dependencies for zip-based Lambda functions. Layers themselves are nothing more than a &lt;em&gt;sparkling&lt;/em&gt; zip file, but they have a few interesting properties which prove useful in some cases. Unfortunately Lambda layers are also difficult to work with as a developer, tricky to deploy safely, and typically don't offer benefits over native package managers. These downsides frequently outweigh the upsides, and we'll examine both in detail.&lt;/p&gt;

&lt;p&gt;By the end of this post, you'll understand the pitfalls of general Lambda layer use as well as the niche cases where layers may make sense.&lt;/p&gt;

&lt;h2&gt;
  
  
  Busting Lambda Layer Myths
&lt;/h2&gt;

&lt;p&gt;When I ask developers why they are using Lambda layers, I often learn the underlying reasons are misguided. It's not entirely their fault: the &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/chapter-layers.html"&gt;documentation&lt;/a&gt; makes some imprecise claims which may perpetuate these myths.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lambda layers do not circumvent the 250 MB size limit
&lt;/h3&gt;

&lt;p&gt;I frequently hear folks say they are leveraging Lambda layers to "raise the 250 MB limit placed on zip-based Lambda functions". That's simply &lt;em&gt;not true&lt;/em&gt;. The size of the unzipped function &lt;em&gt;and all attached layers&lt;/em&gt; &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html"&gt;must be less than 250 MB&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This misunderstanding springs from the very first point in the documentation which states that Lambda layers "reduce the size of your deployment packages". While technically it is true that the specific &lt;em&gt;function code&lt;/em&gt; you deploy can be reduced with layers, the overall size of the function when it runs in Lambda does not change.&lt;/p&gt;
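&lt;p&gt;To make the arithmetic concrete, here's a small sketch (the helper and numbers are mine, not an AWS API) showing that the function package and every attached layer count against the same unzipped limit:&lt;/p&gt;

```javascript
// Illustrative helper (not an AWS API): the 250 MB limit applies to the
// unzipped size of the function package plus all attached layers, combined.
const LIMIT_BYTES = 250 * 1024 * 1024;

function fitsInLimit(functionBytes, layerByteSizes) {
  const total = functionBytes + layerByteSizes.reduce((sum, bytes) => sum + bytes, 0);
  return { total, fits: total <= LIMIT_BYTES };
}

// A 200 MB function package plus a 60 MB layer is rejected, even though the
// function package alone got "smaller" by moving code into the layer:
console.log(fitsInLimit(200 * 1024 * 1024, [60 * 1024 * 1024]).fits); // false
```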

&lt;p&gt;This leads me to my next point.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lambda layers do not improve or reduce cold start initialization duration
&lt;/h3&gt;

&lt;p&gt;Developers often assume that a "reduced deployment package" size will reduce cold start latency. This is also untrue: we already know that the &lt;a href="https://twitter.com/astuyve/status/1716125268060860768"&gt;code you load&lt;/a&gt; is the single largest contributor to cold start latency. Whether these bytes come from a layer or from the function zip itself is irrelevant to the resulting initialization duration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Development pain with Layers
&lt;/h2&gt;

&lt;p&gt;One of the biggest challenges for developers leveraging Lambda layers is that dependencies appear &lt;em&gt;magically&lt;/em&gt; when a handler executes. While that feat is technically impressive, it poses a problem: text editors and IDEs expect dependencies to be locally available, as do bundlers, test runners, and lint tools. If you run your function code locally or use an emulator, only a subset of those tools cooperate with layers. Although these issues can be solved, external dependencies provided by Lambda layers require special consideration and handling for limited benefit.&lt;/p&gt;

&lt;p&gt;Often, the friction of building and deploying layers separately is reason enough to avoid them, but there are other drawbacks as well.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cross-architecture woes
&lt;/h2&gt;

&lt;p&gt;We're writing software for a world which is increasingly powered by ARM chips. It may be your shiny new M3 laptop, or Amazon's own (admittedly excellent) &lt;a href="https://aws.amazon.com/blogs/aws/aws-lambda-functions-powered-by-aws-graviton2-processor-run-your-functions-on-arm-and-get-up-to-34-better-price-performance/"&gt;Graviton&lt;/a&gt; processor. Your Lambda functions are likely running on x86 or a combination of ARM and x86 processors today.&lt;/p&gt;

&lt;p&gt;Lambda layers &lt;em&gt;do&lt;/em&gt; support metadata attributes called "supported runtimes" and "supported architectures", but these are merely &lt;em&gt;labels&lt;/em&gt;. They don't enforce any compatibility at deployment time or runtime. Imagine your surprise when you attach a binary compiled for x86 to your ARM-based Lambda function and receive &lt;code&gt;exec format error&lt;/code&gt;!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://youtu.be/LrenCkwFhZs?t=4917"&gt;I demonstrated this failure live&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Deployment difficulties
&lt;/h2&gt;

&lt;p&gt;Lambda layers do not support semantic versioning. Instead, they are immutable and versioned incrementally. While this does help prevent unintentional upgrades, incremental versioning offers no clues about backwards compatibility or what changed in the updated layer package. Additionally, Lambda layers are completely runtime agnostic and offer no manifest, lockfile, or packaging hints. Layers don't provide a &lt;code&gt;package.json&lt;/code&gt;, &lt;code&gt;pyproject.toml&lt;/code&gt;, or &lt;code&gt;gemspec&lt;/code&gt; file to ensure adequate dependency resolution. Instead, it's incumbent on the authors to package only compatible code.&lt;/p&gt;

&lt;p&gt;One of the main selling points of Lambda layers is that they can share common dependencies between many functions, which is great if every function requires exactly the same compatible version of a dependency. But what happens when you want to upgrade a major version?&lt;/p&gt;

&lt;p&gt;You'll need to release a new version of the layer with the new major version, ensure that no developer accidentally applies the incrementally-adjusted layer (remember – no semantic versioning, manifest files, or lockfiles!), and then simultaneously upgrade the Lambda function code and layer at the same time.&lt;/p&gt;

&lt;p&gt;But even &lt;em&gt;that&lt;/em&gt; doesn't work out automatically, as I've &lt;a href="https://aaronstuyvenberg.com/posts/lambda-arch-switch"&gt;already documented&lt;/a&gt;. Deploying a function + layer results in two separate, asynchronous API calls. &lt;code&gt;updateFunctionCode&lt;/code&gt; updates the function &lt;em&gt;code&lt;/em&gt; while &lt;code&gt;updateFunctionConfiguration&lt;/code&gt; updates the &lt;em&gt;configured layers&lt;/em&gt;, and both of these are &lt;em&gt;separate&lt;/em&gt; control plane operations which can happen in parallel. This means that invoking &lt;code&gt;$LATEST&lt;/code&gt; will fail until both calls complete. To avoid this, you'll need to create a new function &lt;em&gt;version&lt;/em&gt;, apply the new layer, and then update your integration (e.g. API Gateway) to point to the new alias once both steps are complete.&lt;/p&gt;
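&lt;p&gt;One way to sequence a safe rollout is to publish an immutable version after both updates land, and only then repoint an alias. The sketch below assumes a hypothetical &lt;code&gt;lambda&lt;/code&gt; wrapper whose methods mirror the real API operations (&lt;code&gt;UpdateFunctionCode&lt;/code&gt;, &lt;code&gt;UpdateFunctionConfiguration&lt;/code&gt;, &lt;code&gt;PublishVersion&lt;/code&gt;, &lt;code&gt;UpdateAlias&lt;/code&gt;); it illustrates the ordering only, not a drop-in deployment script:&lt;/p&gt;

```javascript
// Sketch of a safer rollout order. `lambda` is assumed to be a thin promise
// wrapper around the AWS Lambda API operations named below; the wrapper
// itself is hypothetical, not a real SDK interface, and error handling plus
// waiting for LastUpdateStatus=Successful between steps is omitted.
async function deployFunctionWithLayer(lambda, fnName, zipFile, layerArn, aliasName) {
  // 1. Update $LATEST: code and layer configuration are two separate,
  //    asynchronous control-plane calls.
  await lambda.updateFunctionCode({ FunctionName: fnName, ZipFile: zipFile });
  await lambda.updateFunctionConfiguration({ FunctionName: fnName, Layers: [layerArn] });

  // 2. Snapshot both changes into a single immutable version.
  const { Version } = await lambda.publishVersion({ FunctionName: fnName });

  // 3. Only now repoint the alias your integration (e.g. API Gateway)
  //    invokes, so callers never see a half-updated function.
  await lambda.updateAlias({ FunctionName: fnName, Name: aliasName, FunctionVersion: Version });
  return Version;
}
```

With this ordering, traffic on the alias only ever sees a version where the code and its layers were published together.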

&lt;p&gt;Now, semantic versioning is not perfect, and flexible version specifiers (e.g. &lt;code&gt;~&lt;/code&gt; or &lt;code&gt;^&lt;/code&gt; for version ranges) mean that the combination of bits executing your Lambda function may run together for the very first time in a staging or production environment. This has caused enough issues that package managers offer solutions like &lt;code&gt;npm shrinkwrap&lt;/code&gt;, but the problem can be even worse with Lambda layers.&lt;/p&gt;

&lt;p&gt;And that's the gist of my point – this is what your package manager should be doing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dependency collisions
&lt;/h2&gt;

&lt;p&gt;Lambda layers can cause a particularly nasty bug, and it stems from how Lambda creates a filesystem from your deployment artifacts. If you've followed this blog, you know that &lt;a href="https://aaronstuyvenberg.com/posts/impossible-assumptions"&gt;zip archives themselves&lt;/a&gt; can already create interesting edge cases when unpacked onto a filesystem, and Lambda is not immune to that. When a Lambda function sandbox is created, the main function package is copied into the sandbox and then each layer is copied &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/adding-layers.html"&gt;in order&lt;/a&gt; into the same filesystem directory. This means that layers containing files with the same path and filename are squashed.&lt;/p&gt;

&lt;p&gt;Although Lambda handler code is copied into a different directory than layer code, the runtime decides where to look &lt;em&gt;first&lt;/em&gt; for dependencies. This is typically handled by the order of directories listed in the &lt;code&gt;PATH&lt;/code&gt; environment variable, or a runtime-specific variant like &lt;code&gt;NODE_PATH&lt;/code&gt;, Ruby's &lt;code&gt;GEM_PATH&lt;/code&gt;, or Java's &lt;code&gt;CLASSPATH&lt;/code&gt;, as &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/packaging-layers.html"&gt;documented here&lt;/a&gt;.&lt;/p&gt;
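&lt;p&gt;A toy resolver makes the shadowing behavior easier to see. This is not Node's actual resolution algorithm, just a sketch of the "first directory on the search path wins" rule; the paths mirror where Lambda unpacks function code (&lt;code&gt;/var/task&lt;/code&gt;) and layers (&lt;code&gt;/opt&lt;/code&gt;):&lt;/p&gt;

```javascript
// Toy resolver, not Node's actual algorithm: like NODE_PATH, the first
// directory on the search path that contains the module wins. Directories
// are modeled as a map of path -> { moduleName: version }.
function resolve(moduleName, searchPath, dirs) {
  for (const dir of searchPath) {
    const contents = dirs[dir];
    if (contents && contents[moduleName]) {
      return { from: dir, version: contents[moduleName] };
    }
  }
  return null; // not found anywhere on the path
}

const dirs = {
  '/var/task/node_modules': { libA: '1.0.0' },   // function package
  '/opt/nodejs/node_modules': { libA: '2.0.0' }, // Lambda layer
};

// The function's copy shadows the layer's copy because /var/task comes first:
console.log(resolve('libA', ['/var/task/node_modules', '/opt/nodejs/node_modules'], dirs));
// { from: '/var/task/node_modules', version: '1.0.0' }
```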

&lt;p&gt;Consider a Lambda function and two layers which all depend on different versions of the same library. Layers don't provide lockfiles or content metadata, so as a developer you may not be aware of this dependency conflict at build time or deployment time.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feuawzby9sfcdes8iag7k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feuawzby9sfcdes8iag7k.png" alt="Lambda function code requiring A @ 1.0, layer 1 requiring A @ 2.0, and layer 2 requiring A @ 3.0" width="800" height="324"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At runtime, the layer code and function code are copied to their respective directories, but when the handler begins processing a request, it crashes with a syntax error! But your code ran fine locally?! What happened?&lt;/p&gt;

&lt;p&gt;The code and dependencies in the Lambda layer expect to have access to version 2 of library A, but the runtime has already loaded version 1 of library A from the function zip file!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Favqqsq5lnkljtve5l6ze.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Favqqsq5lnkljtve5l6ze.png" alt="Lambda function code loading library A @ 3.0!" width="800" height="150"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If this seems farfetched, it can happen to you – because it &lt;a href="https://github.com/DataDog/serverless-plugin-datadog/issues/321#issuecomment-1349044506"&gt;happened to me&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Lambda layers can do for you
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Lambda layers &lt;em&gt;can&lt;/em&gt; improve function deployment speeds (but so can your CI pipeline)
&lt;/h3&gt;

&lt;p&gt;Consider two Lambda functions with identical dependencies, one using layers (A) and one without (B).&lt;br&gt;
It's true that you can expect relatively shorter deployments for A, provided you aren't also modifying and deploying the associated layer(s). However, the vast majority of CI/CD pipelines support dependency caching, so most users have a clear path to fast deployments regardless of their use of layers. Yes, your CloudFormation deployment will take a bit longer, but ultimately there is no distinct advantage here.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lambda layers can share code across functions
&lt;/h3&gt;

&lt;p&gt;Within the same region, one layer can be used across different Lambda functions. This can admittedly be super useful for sharing libraries for authentication or other cross-functional dependencies. It's especially useful if you (like me) need to &lt;a href="https://github.com/datadog/datadog-lambda-extension"&gt;share layers&lt;/a&gt; with other users, even publicly.&lt;/p&gt;

&lt;p&gt;I don't really agree with the other two points in the &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/chapter-layers.html"&gt;documentation&lt;/a&gt;. Layers may "separate core function logic from dependencies", but only as much as putting that dependency in another file and &lt;code&gt;import&lt;/code&gt;ing it. Your runtime does this already so this point falls a bit flat.&lt;/p&gt;

&lt;p&gt;Finally, I don't think it's best to edit your production Lambda function code live in the console editor, and I &lt;em&gt;especially&lt;/em&gt; don't think you should modify your software development process to support this. (Cloud9 IDE is a good product, just don't use the version in the Lambda console.)&lt;/p&gt;

&lt;h2&gt;
  
  
  Where you should use Lambda layers
&lt;/h2&gt;

&lt;p&gt;Lambda layers aren't all bad; they're a tool with some sharp edges (which AWS should fix!). There are a couple of exceptions where you can and should use Lambda layers.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Shared binaries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have a commonly used binary like &lt;code&gt;ffmpeg&lt;/code&gt; or &lt;code&gt;sharp&lt;/code&gt;, it may be easier to compile those projects once and deploy them as a layer. It's handy to share them across functions, and this specific layer will rarely need to be rebuilt and updated. Layers are best with established binaries containing solid API contracts, so you won't need to deal with the deployment difficulties I listed earlier pertaining to major version upgrades.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Custom runtimes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The immensely popular &lt;a href="https://bref.sh/docs/runtimes#aws-lambda-layers"&gt;Bref&lt;/a&gt; PHP runtime is available as a layer. Bref is precompiled for both ARM and x86, so it can make sense to use as a layer. The same is true for the &lt;a href="https://bun.sh"&gt;Bun&lt;/a&gt; JavaScript runtime. That being said - container images have become &lt;a href="https://twitter.com/astuyve/status/1715789135804354734"&gt;far more performant&lt;/a&gt; recently and are worth reconsidering, but that's a subject for another post.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lambda Extensions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Extensions are a special type of layer with access to extra lifecycle events, asynchronous work, and post-processing that regular Lambda handlers cannot use. Extensions can perform work asynchronously from the main handler function, and can execute code &lt;em&gt;after&lt;/em&gt; the handler has returned a result to the caller. This makes Lambda Extensions a worthwhile exception to the above risks, especially if they are pre-compiled, statically linked binary executables which won't suffer from dependency collisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;In specific cases, namely Lambda extensions or heavy compiled binaries, it can be worthwhile to use Lambda layers. However, Lambda layers should not replace the runtime-specific packaging and ecosystem you already have. Layers don't offer semantic versioning, make breaking changes difficult to synchronize, cause headaches during development, and leave your software susceptible to dependency collisions.&lt;/p&gt;

&lt;p&gt;If or when AWS offers semantic versioning, support for layer lockfiles, and integration with native package managers, I'll happily reconsider these thoughts.&lt;/p&gt;

&lt;p&gt;Use your package manager wherever you can; it's a more capable tool and already solves these issues for you.&lt;/p&gt;

&lt;p&gt;If you like this type of content please subscribe to my &lt;a href="https://aaronstuyvenberg.com"&gt;blog&lt;/a&gt; or reach out on &lt;a href="https://twitter.com/astuyve"&gt;twitter&lt;/a&gt; with any questions. You can also ask me questions directly if I'm &lt;a href="//twitch.tv/aj_stuyvenberg"&gt;streaming on Twitch&lt;/a&gt; or &lt;a href="https://www.youtube.com/channel/UCsWwWCit5Y_dqRxEFizYulw"&gt;YouTube&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>lambda</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>Understanding AWS Lambda Proactive Initialization</title>
      <dc:creator>AJ Stuyvenberg</dc:creator>
      <pubDate>Sun, 16 Jul 2023 18:52:21 +0000</pubDate>
      <link>https://dev.to/aws-heroes/understanding-aws-lambda-proactive-initialization-477g</link>
      <guid>https://dev.to/aws-heroes/understanding-aws-lambda-proactive-initialization-477g</guid>
      <description>&lt;p&gt;This post was first published on my &lt;a href="https://aaronstuyvenberg.com/posts/understanding-proactive-initialization"&gt;blog&lt;/a&gt; and shared on &lt;a href="https://twitter.com/astuyve/status/1679531741860577280"&gt;twitter&lt;/a&gt;, so if you like this post – please subscribe!&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS Lambda can warm your functions (for free)
&lt;/h2&gt;

&lt;p&gt;In March 2023, AWS updated the documentation for the &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtime-environment.html"&gt;Lambda Function Lifecycle&lt;/a&gt;, and included this interesting new statement:&lt;/p&gt;

&lt;p&gt;"For functions using unreserved (on-demand) concurrency, Lambda may proactively initialize a function instance, even if there's no invocation."&lt;/p&gt;

&lt;p&gt;It goes on to say:&lt;/p&gt;

&lt;p&gt;"When this happens, you can observe an unexpected time gap between your function's initialization and invocation phases. This gap can appear similar to what you would observe when using provisioned concurrency."&lt;/p&gt;

&lt;p&gt;This sentence, buried in the docs, indicates something not widely known about AWS Lambda: AWS may warm your functions to reduce the impact and frequency of cold starts, even when used on-demand!&lt;/p&gt;

&lt;p&gt;Today, July 13th, they clarified this &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/troubleshooting-invocation.html#troubleshooting-invocation-initialization-gap"&gt;further&lt;/a&gt;:&lt;br&gt;
"For functions using unreserved (on-demand) concurrency, Lambda occasionally pre-initializes execution environments to reduce the number of cold start invocations. For example, Lambda might initialize a new execution environment to replace an execution environment that is about to be shut down. If a pre-initialized execution environment becomes available while Lambda is initializing a new execution environment to process an invocation, Lambda can use the pre-initialized execution environment."&lt;/p&gt;

&lt;p&gt;This update is no accident. In fact it's the result of several months I spent working closely with the AWS Lambda service team:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqk5hcv9ktel6mboo2xnh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqk5hcv9ktel6mboo2xnh.png" alt="Screenshot of a support ticket I filed with AWS, showing that they've added documentation about Proactive Initialization" width="800" height="178"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtime-environment.html"&gt;1 - Execution environments (see 'Init Phase' section)&lt;/a&gt;, and &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/troubleshooting-invocation.html#troubleshooting-invocation-initialization-gap"&gt;2 - Invocation Initialization gap&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this post we'll define what a Proactively Initialized Lambda Sandbox is, how they differ from cold starts, and measure how frequently they occur.&lt;/p&gt;
&lt;h2&gt;
  
  
  Tracing Proactive Initialization
&lt;/h2&gt;

&lt;p&gt;This adventure began when I noticed what appeared to be a bug in a distributed trace. The trace correctly measured the Lambda initialization phase, but appeared to show the first invocation occurring several minutes after initialization. This can happen with SnapStart, or Provisioned Concurrency - but this function wasn't using either of these capabilities and was otherwise entirely unremarkable.&lt;/p&gt;

&lt;p&gt;Here's what the flamegraph looks like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F898ldf8rd6kpk8yf17c8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F898ldf8rd6kpk8yf17c8.png" alt="Screenshot of a flamegraph showing a large gap between initialization and invocation" width="800" height="364"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see a massive gap between function initialization and invocation - in this case the invocation request wasn't even made by the client until ~12 seconds after the sandbox was warmed up.&lt;/p&gt;

&lt;p&gt;We've also observed cases where initialization occurs several minutes before the first invocation; in this case, the gap was nearly 6 minutes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fblxsu2vr60xwtz8fzjl8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fblxsu2vr60xwtz8fzjl8.png" alt="Screenshot of a flamegraph showing an even larger gap between initialization and invocation" width="800" height="313"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After much discussion with the AWS Lambda Service team - I learned that I was observing a Proactively Initialized Lambda Sandbox.&lt;/p&gt;

&lt;p&gt;It's difficult to discuss Proactive Initialization at a technical level without first defining a cold start, so let's start there.&lt;/p&gt;
&lt;h2&gt;
  
  
  Defining a Cold Start
&lt;/h2&gt;

&lt;p&gt;AWS Lambda defines a cold start in the &lt;a href="https://aws.amazon.com/blogs/compute/operating-lambda-performance-optimization-part-1/"&gt;documentation&lt;/a&gt; as the time taken to download your application code and start the application runtime.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F370stju8soblkr2lk31v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F370stju8soblkr2lk31v.png" alt="AWS's diagram showing the Lambda initialization phase" width="800" height="177"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Until now, it was understood that cold starts would happen for any function invocation where there is no idle, initialized sandbox ready to receive the request (absent using SnapStart or Provisioned Concurrency).&lt;/p&gt;

&lt;p&gt;When a function invocation experiences a cold start, users experience something ranging from 100ms to several additional seconds of latency, and developers observe an &lt;code&gt;Init Duration&lt;/code&gt; reported in the CloudWatch logs for the invocation.&lt;/p&gt;

&lt;p&gt;With cold starts defined, let's expand this to understand the definition of Proactive Initialization.&lt;/p&gt;
&lt;h2&gt;
  
  
  Technical Definition of Proactive Initialization
&lt;/h2&gt;

&lt;p&gt;Proactive Initialization occurs when a Lambda Function Sandbox is initialized without a pending Lambda invocation. &lt;/p&gt;

&lt;p&gt;As a developer this is desirable, because each proactively initialized sandbox means one less painful cold start that a user would otherwise experience.&lt;/p&gt;

&lt;p&gt;As a user of the application powered by Lambda, it's as if there were never any cold starts at all.&lt;/p&gt;

&lt;p&gt;It's like getting Lambda Provisioned Concurrency - for free.&lt;/p&gt;
&lt;h2&gt;
  
  
  Aligned interests in the Shared Responsibility Model
&lt;/h2&gt;

&lt;p&gt;According to the AWS Lambda service team, Proactive Initialization is the result of aligned interests by both the team running AWS Lambda and developers running applications on Lambda.&lt;/p&gt;

&lt;p&gt;We know that from an economic perspective, AWS Lambda wants to run as many functions on the same server as possible (yes, serverless has servers...). We also know that developers want their cold starts to be as infrequent and fast as possible.&lt;/p&gt;

&lt;p&gt;Understanding that cold starts absorb valuable CPU time in a shared, multi-tenant system (time which is currently not billed), it's clear that any option AWS has to minimize this time is mutually beneficial.&lt;/p&gt;

&lt;p&gt;AWS Lambda is a distributed service. Worker fleets need to be redeployed, scaled out, scaled in, and respond to failures in the underlying hardware. After all - &lt;a href="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bzldgk7yumbl3luo4296.png"&gt;everything fails all the time&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This means that even with steady-state throughput, Lambda will need to rotate function sandboxes for users over the course of hours or days. AWS does not publish minimum or maximum lease durations for a function sandbox, although in practice I've observed ~7 minutes on the low side and several hours on the high side.&lt;/p&gt;

&lt;p&gt;The service also needs to run efficiently, combining as many functions onto one machine as possible. In distributed systems parlance, this is known as &lt;code&gt;bin packing&lt;/code&gt; (aka shoving as much stuff as possible into the same bucket).&lt;/p&gt;

&lt;p&gt;The less time spent initializing functions which AWS &lt;em&gt;knows&lt;/em&gt; will serve invocations, the better for everyone.&lt;/p&gt;
&lt;h2&gt;
  
  
  When Lambda will Proactively Initialize your function
&lt;/h2&gt;

&lt;p&gt;On a call, the AWS Lambda service team confirmed two logical cases of Proactive Initialization: deployments and eager assignments.&lt;/p&gt;

&lt;p&gt;Consider a function which at steady state experiences 100 concurrent invocations. When you deploy a change to your function (or function configuration), AWS can make a pretty reasonable guess that you'll continue to invoke that same function 100 times concurrently after the deployment finishes.&lt;/p&gt;

&lt;p&gt;Instead of waiting for each invocation to trigger a cold start, AWS will automatically re-provision (roughly) 100 sandboxes to absorb that load when the deployment finishes. Some users will still experience the full cold start duration, but some won't (depending on the request duration and when requests arrive).&lt;/p&gt;

&lt;p&gt;This can similarly occur when Lambda needs to rotate or roll out new Lambda Worker hosts.&lt;/p&gt;

&lt;p&gt;These aren't novel optimizations in the realm of distributed systems, but this is the first time AWS has confirmed they make these optimizations.&lt;/p&gt;
&lt;h2&gt;
  
  
  Proactive Initialization due to Eager Assignments
&lt;/h2&gt;

&lt;p&gt;In certain cases, Proactive Initialization is a consequence of natural traffic patterns in your application, where an internal system called the AWS Lambda Placement Service assigns pending Lambda invocation requests to sandboxes as they become available.&lt;/p&gt;

&lt;p&gt;Here's how it works:&lt;/p&gt;

&lt;p&gt;Consider a running Lambda function which is currently processing a request. In this case, only one sandbox is running. When a new request triggers a Lambda function, AWS's Lambda Control Plane will check for available &lt;code&gt;warm&lt;/code&gt; sandboxes to run your request.&lt;/p&gt;

&lt;p&gt;If none are available, a new sandbox is initialized by the Control Plane:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F41ntqjlsjz65x9g7z0p4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F41ntqjlsjz65x9g7z0p4.png" alt="Step one where the Lambda control plane has assigned a pending request to a warm sandbox" width="800" height="484"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, it's possible that in this time a warm sandbox completes a request and becomes ready to receive a new one.&lt;br&gt;
In this case, Lambda will assign the request to the newly-freed warm sandbox.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2gsiyc8fvg3qwgsqyycu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2gsiyc8fvg3qwgsqyycu.png" alt="Step two where the Lambda control plane has assigned a pending request to a newly-freed sandbox" width="800" height="514"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The newly created sandbox now has no request to serve. It is still kept warm and can serve new requests, but no user had to wait for it to warm up.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyyc93365mfj42ub7wf59.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyyc93365mfj42ub7wf59.png" alt="Proactive init after being assigned a warm sandbox!" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is a proactive initialization.&lt;/p&gt;

&lt;p&gt;When a new request arrives, it can be routed to this warm container with no delay!&lt;/p&gt;

&lt;p&gt;Request B did spend some time waiting for a sandbox (though less than the full duration of a cold start). This latency is not reflected in the duration metric, which is why it's important to monitor the end-to-end latency of any synchronous request through the calling service (like API Gateway)!&lt;/p&gt;
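&lt;p&gt;The race above can be boiled down to a single comparison. In this toy model (all names and timings are hypothetical), a new sandbox ends up proactively initialized whenever a warm sandbox frees up before the new one finishes initializing:&lt;/p&gt;

```javascript
// Toy model of the placement race described above (all names and timings are
// hypothetical). A request arrives with no free warm sandbox, so a new
// sandbox starts initializing. If a warm sandbox frees up first, the request
// is routed there and the new sandbox becomes "proactively initialized":
// warm, ready, but never having served the request that created it.
function assignRequest({ initFinishAt, warmFreeAt }) {
  if (warmFreeAt < initFinishAt) {
    return { servedBy: 'warm', newSandboxProactive: true };
  }
  return { servedBy: 'new', newSandboxProactive: false };
}

// A warm sandbox frees up at t=3, before the new sandbox finishes init at t=5:
console.log(assignRequest({ initFinishAt: 5, warmFreeAt: 3 }));
// { servedBy: 'warm', newSandboxProactive: true }
```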
&lt;h2&gt;
  
  
  Detecting Proactive Initializations
&lt;/h2&gt;

&lt;p&gt;We can leverage the fact that AWS Lambda functions must &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtime-environment.html"&gt;initialize within 10 seconds&lt;/a&gt;, otherwise the Lambda runtime is re-initialized from scratch. Using this fact, we can safely infer that a Lambda Sandbox is proactively initialized when:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;More than 10 seconds have passed between the earliest part of function initialization and the first invocation processed, and&lt;/li&gt;
&lt;li&gt;We're processing the first invocation for a sandbox.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Both of these are easily tested; here's the code for Node:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;coldStartSystemTime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;functionDidColdStart&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;functionDidColdStart&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handlerWrappedTime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;proactiveInitialization&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;handlerWrappedTime&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;coldStartSystemTime&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="nx"&gt;proactiveInitialization&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="nx"&gt;functionDidColdStart&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt; 
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and for Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="n"&gt;init_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time_ns&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;1_000_000&lt;/span&gt;
&lt;span class="n"&gt;cold_start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;hello&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;cold_start&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cold_start&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time_ns&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;1_000_000&lt;/span&gt;
        &lt;span class="n"&gt;cold_start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
        &lt;span class="n"&gt;proactive_initialization&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
        &lt;span class="nf"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;init_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;10_000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;proactive_initialization&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{{proactiveInitialization: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;proactive_initialization&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;}}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Go Serverless v1.0! Your function executed successfully!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;statusCode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
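&lt;p&gt;Stripped of runtime specifics, both snippets above reduce to one comparison between two timestamps. As an illustrative sketch (the function name &lt;code&gt;classifyInit&lt;/code&gt; is mine, not part of any Lambda API), the 10-second threshold from the documented init limit looks like this:&lt;/p&gt;

```javascript
// Classify a sandbox's first invocation given two timestamps in ms since epoch:
// when the init code ran, and when the handler first executed.
// A gap of more than 10s means the sandbox was proactively initialized.
function classifyInit(initTimeMs, firstInvokeTimeMs) {
  return firstInvokeTimeMs - initTimeMs > 10_000 ? "proactive" : "cold";
}
```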



&lt;h2&gt;
  
  
  Frequency of Proactive Initializations
&lt;/h2&gt;

&lt;p&gt;At low throughput, there are virtually no proactive initializations for AWS Lambda functions. But I called this function over and over in an endless loop (thanks to AWS credits provided by the AWS Community Builder program), and noticed that almost &lt;em&gt;65%&lt;/em&gt; of my cold starts were actually proactive initializations, and did not contribute to user-facing latency.&lt;/p&gt;

&lt;p&gt;Here's the query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fields @timestamp, @message.proactiveInitialization
| filter proactiveInitialization == 0 or proactiveInitialization == 1
| stats count() by proactiveInitialization
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
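&lt;p&gt;The percentages quoted in this section are simply the proactive count divided by all initializations returned by that query. A trivial sketch (helper name is mine), using the counts from the chart below:&lt;/p&gt;

```javascript
// Share of initializations that were proactive, given the two
// counts returned by the Logs Insights query above.
function proactiveShare(proactiveCount, coldStartCount) {
  return proactiveCount / (proactiveCount + coldStartCount);
}

// e.g. 56 proactive initializations vs 33 true cold starts is roughly 63%
```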



&lt;p&gt;Here's the detailed breakdown; note that each bar reflects the sum of initializations:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9mqneq0x6mjfms0ulhcf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9mqneq0x6mjfms0ulhcf.jpg" alt="Count of proactively initialized Lambda Sandboxes showing 56 proactive initializations and 33 cold starts." width="800" height="252"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Running this query over several days across multiple runtimes and invocation methods, I observed between 50% and 75% of initializations were Proactive (the remaining 25% to 50% were true Cold Starts):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjlfr7oormtz3wc3gvge2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjlfr7oormtz3wc3gvge2.jpg" alt="Count of proactively initialized Lambda Sandboxes across node and python (including API Gateway)." width="800" height="213"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see this reflected in the cumulative sum of invocations for a one day window. Here’s a python function invoked at a very high frequency:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg5vmmrmlkn81doqtv6vg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg5vmmrmlkn81doqtv6vg.png" alt="Count of proactively initialized Lambda Sandboxes versus cold starts for a python function" width="800" height="397"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see that after one day, we've had 63 Proactively Initialized Lambda Sandboxes and only 11 Cold Starts - 85% of initializations were proactive!&lt;/p&gt;

&lt;p&gt;AWS Serverless Hero &lt;a href="https://github.com/metaskills"&gt;Ken Collins&lt;/a&gt; maintains a very popular &lt;a href="https://github.com/rails-lambda"&gt;Rails-Lambda&lt;/a&gt; package. After some discussion, he &lt;a href="https://github.com/rails-lambda/lamby/pull/169"&gt;added the capability&lt;/a&gt; to track Proactive Initializations and came to a similar conclusion - in his case after a 3-day test using Ruby with a custom runtime, 80% of initializations were proactive:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkytbz56szfqask8sbowc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkytbz56szfqask8sbowc.png" alt="Count of proactively initialized Lambda Sandboxes versus cold starts for a ruby function" width="800" height="519"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Confirming what we suspected
&lt;/h2&gt;

&lt;p&gt;This post confirms what we've all speculated but never knew with certainty - AWS Lambda is warming your functions. We've demonstrated how you can observe this behavior, and even spoken with the AWS Lambda service team to confirm some triggers for this warming.&lt;/p&gt;

&lt;p&gt;But that raises the question - what should you do about AWS Lambda Proactive Initialization?&lt;/p&gt;

&lt;h2&gt;
  
  
  What you should do about Proactive Initialization
&lt;/h2&gt;

&lt;p&gt;Nothing.&lt;/p&gt;

&lt;p&gt;This is the fulfillment of the promise of Serverless in a big way. You get to focus on your own application while AWS improves the underlying infrastructure. Cold starts become something the cloud provider manages away, and you never have to think about them.&lt;/p&gt;

&lt;p&gt;We use Serverless services because we offload undifferentiated heavy lifting to cloud providers. Your autoscaling needs and mine probably aren't that similar, but taking workloads in aggregate, millions of functions across thousands of customers, AWS can predictively scale out functions and improve performance for everyone involved.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping it up
&lt;/h2&gt;

&lt;p&gt;I hope you enjoyed this first look at Proactive Initialization, and learned a bit more about how to observe and understand your workloads on AWS Lambda. If you want to track metrics and/or APM traces for proactively initialized functions, it's available for anyone using Datadog.&lt;/p&gt;

&lt;p&gt;This was also my first post as an &lt;a href="https://aws.amazon.com/developer/community/heroes/aj-stuyvenberg/"&gt;AWS Serverless Hero&lt;/a&gt;! If you like this type of content, please subscribe to my &lt;a href="https://aaronstuyvenberg.com"&gt;blog&lt;/a&gt; or reach out on &lt;a href="https://twitter.com/astuyve"&gt;twitter&lt;/a&gt; with any questions.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>lambda</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Thawing your Lambda Cold Starts with Lazy Loading</title>
      <dc:creator>AJ Stuyvenberg</dc:creator>
      <pubDate>Tue, 30 May 2023 12:48:59 +0000</pubDate>
      <link>https://dev.to/aws-builders/thawing-your-lambda-cold-starts-with-lazy-loading-2mop</link>
      <guid>https://dev.to/aws-builders/thawing-your-lambda-cold-starts-with-lazy-loading-2mop</guid>
      <description>&lt;p&gt;If you've heard anything about Serverless Applications or AWS Lambda Functions, you've certainly heard of the dreaded Cold Start. I've written a lot about Cold Starts, and I spend a great deal of time measuring them as I did for my post on &lt;a href="//aaronstuyvenberg.com/aws-sdk-comparison/"&gt;Benchmarking the AWS SDK&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In this post we'll recap what a Cold Start is, then we'll define a technique called Lazy Loading, show you when and how to use it, and measure the outcome!&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a Cold Start?
&lt;/h2&gt;

&lt;p&gt;Lambda sandboxes are created on demand when a new request arrives, but live for multiple sequential invocations of a function. When an application experiences an increase in traffic, Lambda must create additional sandboxes.&lt;/p&gt;

&lt;p&gt;The additional latency caused by this sandbox creation (which the user also experiences) is known as a Cold Start:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ws401e9emsx7uhiogef.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ws401e9emsx7uhiogef.jpg" alt="Cold Start diagram"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Sample App
&lt;/h2&gt;

&lt;p&gt;This application is a Todo list, which is built for multiple tenants. This application is built using AWS Lambda, API Gateway, and DynamoDB.&lt;/p&gt;

&lt;p&gt;One particular user (we can pick on me, AJ, in this case) demands to be notified via SNS any time a new &lt;code&gt;Todo item&lt;/code&gt; is added to his list. &lt;br&gt;
The architecture of this application looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw57ft4z8r3oz1vfpsph8.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw57ft4z8r3oz1vfpsph8.jpg" alt="Lazy Load Todo Architecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To view full-resolution flamegraphs, I suggest reading this post on my &lt;a href="https://aaronstuyvenberg.com/posts/lambda-lazy-loading" rel="noopener noreferrer"&gt;blog&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Eager Loading
&lt;/h2&gt;

&lt;p&gt;Eager loading happens when you load a dependency by calling &lt;code&gt;require&lt;/code&gt;, or &lt;code&gt;import&lt;/code&gt; at the top of your function code.&lt;/p&gt;

&lt;p&gt;Normally, dependencies in your function are Eager loaded - or loaded during initialization. For Node, Python, and Ruby runtimes - your dependencies are loaded when the runtime begins reading your handler files and processing each &lt;code&gt;require&lt;/code&gt; or &lt;code&gt;import&lt;/code&gt; in the order they are written. If you're writing Rust or Go, this is the default behavior as well because binaries are statically compiled into one file.&lt;/p&gt;

&lt;p&gt;This code is very typical and you've probably seen it many times. At the top of the file, we load a DynamoDB client along with an SNS client, then we move on to process the payload:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;

&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;use strict&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;DynamoDBClient&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@aws-sdk/client-dynamodb&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;DynamoDBDocumentClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;PutCommand&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@aws-sdk/lib-dynamodb&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;dynamoClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DynamoDBClient&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;region&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;AWS_REGION&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ddbClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;DynamoDBDocumentClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;dynamoClient&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;SNSClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;PublishBatchCommand&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@aws-sdk/client-sns&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;snsClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SNSClient&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;region&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;AWS_REGION&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;v4&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;uuidv4&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;uuid&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// handler code in gist&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The full code is available &lt;a href="https://gist.github.com/astuyve/2e7fe4b39a7ffcfa0646deb9e147802d" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Eager Loading Cold Start
&lt;/h2&gt;

&lt;p&gt;We can measure this Cold Start Trace and see that the DynamoDB client loads in around 360ms. The DynamoDB client also depends on the AWS STS client, which is true of SNS and most other services. The trace looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6gdxqk89tpre5zat317y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6gdxqk89tpre5zat317y.png" alt="Eager Load DynamoDB Cold Start Trace"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Further down the flamegraph we see SNS loads in another 50ms:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffmdt4m4ktdt3q0m94p54.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffmdt4m4ktdt3q0m94p54.png" alt="Eager Load SNS Cold Start Trace"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Lazy Loading to improve performance
&lt;/h2&gt;

&lt;p&gt;If we have hundreds or thousands of users, AJ's &lt;code&gt;todo&lt;/code&gt; items may represent only 5% or 1% of calls to this endpoint. However, we load the SNS client on &lt;em&gt;every single initialization&lt;/em&gt;, regardless of whether we'll use SNS!&lt;/p&gt;

&lt;p&gt;Let's fix this!&lt;/p&gt;

&lt;p&gt;To improve this performance we can move our &lt;code&gt;require&lt;/code&gt; statement into a method which we'll call only when a &lt;code&gt;Todo item&lt;/code&gt; from AJ is received. Don't worry that we reassign this variable - in NodeJS, calls to &lt;code&gt;require&lt;/code&gt; are cached, so this module load will only occur once, on the first call to &lt;code&gt;loadSns()&lt;/code&gt;. We could also check whether the &lt;code&gt;snsClient&lt;/code&gt; variable is undefined before calling the method, but brevity is preferred here.&lt;/p&gt;

&lt;p&gt;This strategy is also effective for Ruby and Python (as well as Java and other languages).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;

&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;use strict&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;DynamoDBClient&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@aws-sdk/client-dynamodb&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;DynamoDBDocumentClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;PutCommand&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@aws-sdk/lib-dynamodb&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;dynamoClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DynamoDBClient&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;region&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;AWS_REGION&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ddbClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;DynamoDBDocumentClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;dynamoClient&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;snsClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;PublishBatchCommand&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;SNSClient&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;v4&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;uuidv4&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;uuid&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;loadSns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;SNSClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;PublishBatchCommand&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@aws-sdk/client-sns&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="nx"&gt;snsClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SNSClient&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;region&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;AWS_REGION&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;exports&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;addItem&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;promises&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;newItemId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;uuidv4&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="c1"&gt;// It's for AJ - load the SNS client!&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aj&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;loadSns&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="c1"&gt;// ... rest of handler code in gist&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The full code is available &lt;a href="https://gist.github.com/astuyve/94029d6206eaf144903579cb5d1ea843" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;
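&lt;p&gt;The undefined-check mentioned earlier can make the laziness explicit. Here's a minimal, self-contained sketch of the pattern, using Node's built-in &lt;code&gt;os&lt;/code&gt; module as a stand-in for the SNS client (the helper names here are illustrative, not from the gist):&lt;/p&gt;

```javascript
let snsClient; // stays undefined until the rare code path needs it

function loadSns() {
  // require() results are cached by Node's module system, so even if
  // this runs again later, the expensive load only happens once.
  const os = require("os"); // stand-in for "@aws-sdk/client-sns"
  snsClient = { endpoint: os.hostname() }; // stand-in for new SNSClient(...)
}

function handleItem(userId) {
  if (userId === "aj") {
    // Guard: only the special-cased user ever pays the lazy-load cost.
    if (snsClient === undefined) loadSns();
    return { notified: true };
  }
  return { notified: false };
}
```

<p>The key property is that ordinary users never trigger <code>loadSns()</code>, so their cold starts stay lean.</p>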

&lt;p&gt;Lazy Loading means that we only load the &lt;code&gt;SNS&lt;/code&gt; client when we need it - so let's take a look at the Cold Start Trace when a normal user creates a &lt;code&gt;Todo item&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpww721zav2u6vker4ktu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpww721zav2u6vker4ktu.png" alt="Lazy Load DynamoDB Cold Start Trace"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see that the handler loads in 401ms compared to the previous 478ms - that's a 16% decrease in latency for normal users experiencing a Cold Start! &lt;/p&gt;

&lt;p&gt;So what happens when a &lt;code&gt;Todo item&lt;/code&gt; is created for AJ? You can see that the ~80ms is shifted to the AWS Lambda Handler function span, where AJ has to wait for the SNS client to load:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmtmq4j00ytkzmaut5k54.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmtmq4j00ytkzmaut5k54.png" alt="Lazy Load SNS Cold Start Trace"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;Keen observers would point out that the &lt;code&gt;init&lt;/code&gt; portion of a Lambda execution lifecycle is free. And they're right! For now. AWS doesn't promise that the init duration is free (although this is &lt;a href="https://bitesizedserverless.com/bite/when-is-the-lambda-init-phase-free-and-when-is-it-billed/" rel="noopener noreferrer"&gt;widely observed&lt;/a&gt; and has been for some time).&lt;/p&gt;

&lt;p&gt;Cost in dollars shouldn't really be a factor here: the overall number of cold starts is limited, and shifting this dependency onto one special-cased user is worth it to save every other user the initialization time.&lt;/p&gt;

&lt;p&gt;This technique is especially applicable to &lt;a href="https://aaronstuyvenberg.com/posts/monolambda-vs-individual-function-api" rel="noopener noreferrer"&gt;mono-lambda APIs&lt;/a&gt; where dependencies can vary by route, or specific users like in this simple example. I'd also make a strong case that this type of atypical behavior ought to be refactored out into a separate Lambda Function, but that will be a topic for a different day.&lt;/p&gt;
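&lt;p&gt;The pattern itself is tiny. Here's a minimal, self-contained sketch of lazy loading with memoization - note that &lt;code&gt;zlib&lt;/code&gt; stands in for an AWS SDK client, and the &lt;code&gt;special-user&lt;/code&gt; check is purely hypothetical:&lt;/p&gt;

```javascript
// Minimal sketch of the lazy-load pattern from this post: defer an
// expensive require until the rare code path that needs it, and memoize
// it so later warm invocations skip the cost. zlib stands in for an AWS
// SDK client, and the 'special-user' check is hypothetical.
let lazyClient;

function getClient() {
  if (!lazyClient) {
    lazyClient = require('zlib'); // paid once, by the first caller who needs it
  }
  return lazyClient;
}

const handler = async (event) => {
  if (event.userId === 'special-user') {
    // Only this rare path pays the module load time
    const client = getClient();
    return client.gzipSync(JSON.stringify(event)).length > 0;
  }
  return true; // everyone else never loads the dependency
};

exports.handler = handler;
```

&lt;p&gt;Because the module cache persists across warm invocations, even the special user only pays the load cost on the first request per sandbox.&lt;/p&gt;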

&lt;p&gt;As you embark on your Serverless journey, keep an eye out for opportunities to be lazy!&lt;/p&gt;

&lt;p&gt;Hopefully you enjoyed this post. If you're interested in other Serverless minutia, be sure to check out the rest of my &lt;a href="https://aaronstuyvenberg.com" rel="noopener noreferrer"&gt;blog&lt;/a&gt;, where this article was first published, and my &lt;a href="https://twitter.com/astuyve" rel="noopener noreferrer"&gt;twitter feed&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>lambda</category>
      <category>serverless</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Introducing AWS Lambda Response Streaming</title>
      <dc:creator>AJ Stuyvenberg</dc:creator>
      <pubDate>Fri, 07 Apr 2023 22:23:49 +0000</pubDate>
      <link>https://dev.to/aws-builders/introducing-streaming-response-from-aws-lambda-511f</link>
      <guid>https://dev.to/aws-builders/introducing-streaming-response-from-aws-lambda-511f</guid>
      <description>&lt;p&gt;Today, AWS has announced support for &lt;a href="https://aws.amazon.com/blogs/compute/introducing-aws-lambda-response-streaming/"&gt;Streaming Responses from Lambda Functions&lt;/a&gt;. This long-awaited capability helps developers stream responses from their functions to their users without necessarily waiting for the entire response to be finished. It's especially useful for server-side rendering, commonly used by modern javascript frameworks.&lt;/p&gt;

&lt;p&gt;This capability reduces Time to First Byte, which makes your application feel snappier, and load more quickly - especially for users who are geographically far from the AWS datacenter you’re using, or users with poor connections.&lt;/p&gt;

&lt;p&gt;Let's dive in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Enabling
&lt;/h2&gt;

&lt;p&gt;To enable Streaming Responses, developers will have to modify their function code slightly: your handler needs to be wrapped in a new decorator available in the Lambda runtime for Node 14, 16, or 18. This decorator is injected directly into the runtime, so you don't need to import any packages. A user &lt;a href="https://gist.github.com/magJ/63bac8198469b6a25d5697ad490d31e6#file-index-mjs-L925"&gt;extracted the method from the base lambda image&lt;/a&gt; quite some time ago, so this launch has clearly been planned for a while.&lt;/p&gt;

&lt;p&gt;Here's an example from the launch post:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;exports&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;awslambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;streamifyResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;responseStream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;responseStream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setContentType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;“&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;plain&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nx"&gt;responseStream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;“&lt;/span&gt;&lt;span class="nx"&gt;Hello&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;world&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nx"&gt;responseStream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;end&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you're familiar with Node's &lt;a href="https://nodejs.org/docs/latest-v14.x/api/stream.html#stream_writable_streams"&gt;writable stream API&lt;/a&gt;, then you'll recognize that this decorator implements one. AWS suggests you use stream pipelines to write to the stream - again, here's the example from the launch post:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;util&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;promisify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;stream&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;zlib&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zlib&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Readable&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;stream&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;exports&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;gzip&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;awslambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;streamifyResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;responseStream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;_context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// As an example, convert event to a readable stream.&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;requestStream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;Readable&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)));&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;requestStream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;zlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createGzip&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nx"&gt;responseStream&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apart from something like server-side HTML rendering, this feature also helps transmit media back to API callers. Here's an example of a Lambda function rendering an image, using response streaming:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="cm"&gt;/**
 * Response streaming function which loads a large image.
 */&lt;/span&gt;
&lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;exports&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;awslambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;streamifyResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;responseStream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;_context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;responseStream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setContentType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;image/jpeg&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createReadStream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;large-photo.jpeg&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;responseStream&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;    
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can see the response streaming to the browser, which looks like this:&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F94qd6vbf8wqk6ymx43bz.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F94qd6vbf8wqk6ymx43bz.gif" alt="Streaming response to browser" width="600" height="338"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Calling these functions
&lt;/h2&gt;

&lt;p&gt;Next, if you're going to programmatically call a function which issues a Streaming Response using the NodeJS AWS SDK, you'll need to use v3. I've &lt;a href="https://aaronstuyvenberg.com/aws-sdk-comparison/"&gt;written about this change extensively&lt;/a&gt;, but most importantly for this feature, the v2 SDK doesn't appear to be supported at all - so you'll need to upgrade before you can take advantage of Streaming Responses. If you're looking to invoke a function using Streaming Responses from other languages, it's also supported in the AWS SDK for Java 2.x and the AWS SDKs for Go versions 1 and 2. I hope Python's boto3 support is coming soon.&lt;/p&gt;

&lt;h2&gt;
  
  
  But wait, one catch
&lt;/h2&gt;

&lt;p&gt;Finally, developers can use the streaming response capability only with the newer Lambda Function URL integration. Function URLs are one of several ways to trigger a Lambda Function via an HTTP request, which I've covered &lt;a href="https://dev.to/aws-builders/introducing-lambda-function-urls-4ahd"&gt;previously, in another post&lt;/a&gt;. This will be a bit limiting in terms of authentication mechanisms.&lt;/p&gt;

&lt;p&gt;API Gateway and ALB are more common HTTP Integration methods for Lambda, and they do not support chunked transfer encoding - so you can't stream responses directly from a Lambda function to API Gateway or ALB using this feature.&lt;/p&gt;

&lt;p&gt;You can put API Gateway in front of a Lambda Function URL and use it to increase the response size from the previous limit of 10mb up to the new soft limit of 20mb - but users won't see an improvement in Time to First Byte.&lt;/p&gt;

&lt;h2&gt;
  
  
  My take
&lt;/h2&gt;

&lt;p&gt;If you're using Lambda to serve media such as images, videos, or audio - Streaming Responses will help immensely. That's not been a core use case for me personally, but I suspect this will be most leveraged by developers using Lambda to serve frontend applications using server-side rendering. For those users, I think this launch is particularly exciting.&lt;br&gt;
Ultimately, Streaming Response for Lambda is an important step in bringing the capability of Lambda closer to what users can get in other, traditional server-ful compute environments. It's an exciting new feature, and I'm looking forward to seeing the capabilities it unlocks for users.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;As always, if you liked this post you can find more of my thoughts on my &lt;a href="https://aaronstuyvenberg.com"&gt;blog&lt;/a&gt; and on &lt;a href="https://twitter.com/astuyve"&gt;twitter&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>lambda</category>
      <category>nextjs</category>
      <category>serverless</category>
    </item>
    <item>
      <title>Benchmarking the AWS SDK</title>
      <dc:creator>AJ Stuyvenberg</dc:creator>
      <pubDate>Mon, 20 Mar 2023 14:32:30 +0000</pubDate>
      <link>https://dev.to/aws-builders/benchmarking-the-aws-sdk-2pd4</link>
      <guid>https://dev.to/aws-builders/benchmarking-the-aws-sdk-2pd4</guid>
      <description>&lt;p&gt;If you're building Serverless applications on AWS, you will likely use the AWS SDK to interact with some other service. You may be using DynamoDB to store data, or publishing events to SNS, SQS, or EventBridge.&lt;/p&gt;

&lt;p&gt;Today the NodeJS runtime for AWS Lambda is at a bit of a crossroads. Like all available Lambda runtimes, NodeJS includes the &lt;code&gt;aws-sdk&lt;/code&gt; in the base image for each supported version of Node.&lt;/p&gt;

&lt;p&gt;This means Lambda users don't need to manually bundle the commonly-used dependency into their applications. This reduces the deployment package size, which is key for Lambda. Functions packaged as zip files can be a maximum of &lt;code&gt;250mb&lt;/code&gt; including code + layers, and container-based functions support up to &lt;code&gt;10GB&lt;/code&gt; image sizes.&lt;/p&gt;

&lt;p&gt;The decision about which SDK you use and how you use it in your function seems simple at first - but it's actually a complex multidimensional engineering decision.&lt;/p&gt;

&lt;p&gt;In the Node runtime, &lt;code&gt;Node12.x&lt;/code&gt;, &lt;code&gt;14.x&lt;/code&gt;, and &lt;code&gt;16.x&lt;/code&gt; each contain the AWS SDK v2 packaged in the runtime. This means virtually all Lambda functions up until recently have been built to use the v2 SDK. When AWS launched the &lt;code&gt;Node18.x&lt;/code&gt; runtime for Lambda, they packaged the AWS SDK v3 by default. Since the AWS SDK is likely the most commonly used library in Lambda, I decided to break down the cold start performance of each version across a couple of dimensions.&lt;/p&gt;

&lt;p&gt;We'll trace the cold start and measure the time to load the SDK via the following configurations from the runtime:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The entire v2 SDK&lt;/li&gt;
&lt;li&gt;One v2 SDK client&lt;/li&gt;
&lt;li&gt;One v3 SDK client&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Then we'll use &lt;a href="https://esbuild.github.io/" rel="noopener noreferrer"&gt;esbuild&lt;/a&gt; to tree-shake and minify each client, and run the tests again:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Tree-shaken v2 SDK&lt;/li&gt;
&lt;li&gt;One tree-shaken v2 SDK client&lt;/li&gt;
&lt;li&gt;One tree-shaken v3 SDK client&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each of these tests was performed in my local AWS region (us-east-1), using x86 Lambda Functions configured with 1024mb of RAM. The client I selected was SNS. I ran each test 3 times and screengrabbed one run. Limitations are noted at the end.&lt;/p&gt;

&lt;p&gt;To view full-resolution flamegraphs, I suggest reading this post on my &lt;a href="https://aaronstuyvenberg.com/aws-sdk-comparison/" rel="noopener noreferrer"&gt;blog&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Loading the entire v2 SDK
&lt;/h2&gt;

&lt;p&gt;There are a few common ways to use the v2 SDK.&lt;br&gt;
In most blogs and documentation (including AWS's &lt;a href="https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/dynamodb-examples-using-tables.html" rel="noopener noreferrer"&gt;own&lt;/a&gt;, but not &lt;a href="https://aws.amazon.com/blogs/compute/operating-lambda-performance-optimization-part-2/" rel="noopener noreferrer"&gt;always&lt;/a&gt;), you'll find something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;AWS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;snsClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;AWS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SNS&lt;/span&gt;&lt;span class="p"&gt;({});&lt;/span&gt;
&lt;span class="c1"&gt;// ... some handler code&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Although functional, this code is suboptimal as it loads the entire AWS SDK into memory. Let's take a look at that flamegraph for the cold start trace:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Faaronstuyvenberg.com%2Fassets%2Fimages%2Faws-sdk%2Fv2_all.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Faaronstuyvenberg.com%2Fassets%2Fimages%2Faws-sdk%2Fv2_all.png" alt="Loading the entire v2 AWS SDK"&gt;&lt;/a&gt;&lt;br&gt;
In this case we can see that this function loaded the entire &lt;code&gt;aws-sdk&lt;/code&gt; in &lt;code&gt;324ms&lt;/code&gt;. Check out all of this extra &lt;em&gt;stuff&lt;/em&gt; that we're loading!&lt;/p&gt;

&lt;p&gt;Here we see that we're loading not only SNS, but also every other client in the &lt;code&gt;/clients&lt;/code&gt; directory - DynamoDB, S3, Kinesis, Sagemaker - plus so many small files that they don't even appear in this cold start trace:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnu9rq3r3t1d5r3d4apuc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnu9rq3r3t1d5r3d4apuc.png" alt="Loading the entire v2 AWS SDK - focusing on client"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;First run: 324ms
Second run: 344ms
Third run: 343ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Packaging and loading the entire v2 SDK
&lt;/h2&gt;

&lt;p&gt;One common piece of advice I've read suggests that users should pin a specific version of the &lt;code&gt;aws-sdk&lt;/code&gt;, and package it into their application.&lt;/p&gt;

&lt;p&gt;Although the &lt;code&gt;aws-sdk&lt;/code&gt; is already provided by AWS in the Lambda runtime, the logic is that AWS can roll out changes to the SDK at any point with no changes on your side. These changes &lt;em&gt;should&lt;/em&gt; be backwards compatible, but unless you're specifically &lt;a href="https://aws.amazon.com/blogs/compute/introducing-aws-lambda-runtime-management-controls/" rel="noopener noreferrer"&gt;managing runtime updates&lt;/a&gt;, those new SDK changes will be applied automatically - potentially breaking your application.&lt;/p&gt;

&lt;p&gt;But does manually packaging the &lt;code&gt;aws-sdk&lt;/code&gt; impact the cold start duration? In this test, the code is still the same:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;AWS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;snsClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;AWS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SNS&lt;/span&gt;&lt;span class="p"&gt;({});&lt;/span&gt;
&lt;span class="c1"&gt;// ... some handler code&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yet the flame graph is not:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbozbspvbz650yas73o9v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbozbspvbz650yas73o9v.png" alt="Loading the entire v2 sdk, packaged by us"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note the difference from the first example. When we load node modules from the runtime, the span labels are &lt;code&gt;aws.lambda.require_runtime&lt;/code&gt;. Now that we're packaging our own version of the sdk, we see the same general composition of spans labeled &lt;code&gt;aws.lambda.require&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We also see that packaging our own v2 &lt;code&gt;aws-sdk&lt;/code&gt; clocks in at &lt;code&gt;540ms&lt;/code&gt;!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;First run: 540ms
Second run: 531ms
Third run: 502ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The v3 &lt;code&gt;aws-sdk&lt;/code&gt; is modularized by default, so we won't test importing the entire v3 SDK; instead, we'll move on to individual client imports.&lt;/p&gt;

&lt;h2&gt;
  
  
  Loading one v2 SDK client
&lt;/h2&gt;

&lt;p&gt;Let's consider a more efficient approach. We can instead simply pluck the SNS client (or whichever client you please) from the library itself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;SNS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-sdk/clients/sns&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;snsClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SNS&lt;/span&gt;&lt;span class="p"&gt;({});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This should save us a fair amount of time - check out the flame graph:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frzloo1a5ce2fcsw6dsca.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frzloo1a5ce2fcsw6dsca.png" alt="Loading only the v2 SNS client"&gt;&lt;/a&gt;&lt;br&gt;
This is much nicer: &lt;code&gt;110ms&lt;/code&gt;. Since we're not loading clients we won't use, that saves us around &lt;code&gt;238&lt;/code&gt; milliseconds!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;First run: 110ms
Second run: 104ms
Third Run: 109ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  AWS SDK v3
&lt;/h2&gt;

&lt;p&gt;The v3 SDK is entirely client-based, so we have to import the SNS client specifically. Here's what that looks like in code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;SNSClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;PublishBatchCommand&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@aws-sdk/client-sns&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;snsClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SNSClient&lt;/span&gt;&lt;span class="p"&gt;({})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This results in a pretty deep cold start trace:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsb7yg03kcfktxrq766es.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsb7yg03kcfktxrq766es.png" alt="Loading only the v3 SNS client"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see that the SNS client in v3 loaded in &lt;code&gt;250ms&lt;/code&gt;.&lt;br&gt;
The Security Token Service (STS) contributed &lt;code&gt;84ms&lt;/code&gt; of this time:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fppr2436q6r0trbeclnrh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fppr2436q6r0trbeclnrh.png" alt="Loading only the v3 SNS client, zooming in on STS"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;First run: 250ms
Second run: 259ms
Third run: 236ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Bundled JS benchmarks
&lt;/h2&gt;

&lt;p&gt;The other option I want to highlight is packaging the project using something like &lt;a href="https://webpack.js.org/" rel="noopener noreferrer"&gt;Webpack&lt;/a&gt; or &lt;a href="https://esbuild.github.io/" rel="noopener noreferrer"&gt;esbuild&lt;/a&gt;. JS Bundlers transpile all of your separate files and classes (along with all node_modules) into one single file, a practice originally developed to reduce package size for frontend applications. This helps improve the cold start time in Lambda, as unimported files can be pruned and the entire handler becomes one file.&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS SDK v2 - minified in its entirety
&lt;/h2&gt;

&lt;p&gt;Now we'll load the entire AWS SDK v2 again, this time using esbuild to transpile the handler and SDK v2:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;AWS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;snsClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;AWS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SNS&lt;/span&gt;&lt;span class="p"&gt;({});&lt;/span&gt;
&lt;span class="c1"&gt;// ... some handler code&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here's the cold start trace:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz7l5pq7ftfyqh0gmfxvg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz7l5pq7ftfyqh0gmfxvg.png" alt="Loading the entire minified v2 AWS SDK"&gt;&lt;/a&gt;&lt;br&gt;
You'll note that now we only have one span tracing the handler (as the entire SDK is now included in the same output file) - but the interesting thing is that the load time is almost &lt;em&gt;600ms&lt;/em&gt;!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;First run: 597ms
Second run: 570ms
Third run: 621ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why is this so much slower than the non-bundled version?
&lt;/h2&gt;

&lt;p&gt;Handler-contributed cold start duration is primarily driven by the syscalls the runtime (NodeJS) makes to open and read files, e.g. &lt;code&gt;fs.readFileSync&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To break this down:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Your code tells NodeJS to &lt;code&gt;require&lt;/code&gt; the file.&lt;/li&gt;
&lt;li&gt;NodeJS finds the file (this happens inside the &lt;code&gt;require&lt;/code&gt; method).&lt;/li&gt;
&lt;li&gt;NodeJS makes a system call, which tells the Firecracker VM instance to open the file.&lt;/li&gt;
&lt;li&gt;Firecracker opens the file.&lt;/li&gt;
&lt;li&gt;NodeJS reads the file entirely.&lt;/li&gt;
&lt;li&gt;Your function code continues running.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The handler file is now &lt;code&gt;7.5MB&lt;/code&gt; uncompressed, and Node has to load it entirely.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2r3kl6xfprryaem0dhf9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2r3kl6xfprryaem0dhf9.png" alt="Loading the entire minified v2 AWS SDK"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Additionally, I suspect that AWS can separately cache the built-in SDK with better locality (on each worker node) than your individual handler package, which must be fetched after a Lambda Worker is assigned to run your function.&lt;/p&gt;

&lt;p&gt;In simple terms - AWS knows most functions will need to load the AWS SDK, so the library is cached on each machine before your function is even created.&lt;/p&gt;
&lt;h2&gt;
  
  
  Minified v2 SDK - loading only the SNS client
&lt;/h2&gt;

&lt;p&gt;Once again we're importing only the SNS client, but this time we've minified it, so the code is the same as before:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;SNS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-sdk/clients/sns&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;snsClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SNS&lt;/span&gt;&lt;span class="p"&gt;({});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can see in the cold start trace that the SDK is no longer being loaded from the runtime; rather, it's all part of the handler:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0y5y0us9gobo6euzl6lh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0y5y0us9gobo6euzl6lh.png" alt="Loading the minified v2 AWS SDK, containing only the SNS client"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;63ms&lt;/code&gt; is &lt;em&gt;much&lt;/em&gt; better than the entire minified SDK from the previous test. Here are all three runs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;First run: 63ms
Second run: 71ms
Third run: 67ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Minified v3 SDK
&lt;/h2&gt;

&lt;p&gt;Next, let's look at a minified project using the SNS client from the v3 SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;SNSClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;PublishBatchCommand&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@aws-sdk/client-sns&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;snsClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SNSClient&lt;/span&gt;&lt;span class="p"&gt;({})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's the flamegraph:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fijofzjhilcdbohku1ddv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fijofzjhilcdbohku1ddv.png" alt="Loading only the SNS client, minified v3 AWS SDK"&gt;&lt;/a&gt;&lt;br&gt;
Far better now, &lt;code&gt;104ms&lt;/code&gt;. After repeating this test a few times, I saw that &lt;code&gt;104ms&lt;/code&gt; tended towards the high end, and I measured some runs as low as &lt;code&gt;83ms&lt;/code&gt;. No surprise that this will vary a bit (see the caveats), but I thought it was interesting that we got around the same performance as the minified v2 SNS client code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;First run: 104ms 
Second run: 83ms
Third run: 110ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I also find it fun to see that the core modules, which are provided by Node Core, are traced as well:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnj5oj30vka2l9d43wr51.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnj5oj30vka2l9d43wr51.png" alt="Loading a core module from the v3 AWS SDK"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Scoring
&lt;/h2&gt;

&lt;p&gt;Here's the list of fastest to slowest packaging strategies for the AWS SDK:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Config&lt;/th&gt;
&lt;th&gt;Runtime&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;esbuild + individual v2 SNS client&lt;/td&gt;
&lt;td&gt;Node16x&lt;/td&gt;
&lt;td&gt;63ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;esbuild + individual v3 SNS client&lt;/td&gt;
&lt;td&gt;Node18x&lt;/td&gt;
&lt;td&gt;83ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;v2 SNS client from the runtime&lt;/td&gt;
&lt;td&gt;Node16x&lt;/td&gt;
&lt;td&gt;104ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;v3 SNS client from the runtime&lt;/td&gt;
&lt;td&gt;Node18x&lt;/td&gt;
&lt;td&gt;250ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Entire v2 client from the runtime&lt;/td&gt;
&lt;td&gt;Node16x&lt;/td&gt;
&lt;td&gt;324ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Entire v2 client, packaged by us&lt;/td&gt;
&lt;td&gt;Node16x&lt;/td&gt;
&lt;td&gt;540ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;esbuild + entire v2 SDK&lt;/td&gt;
&lt;td&gt;Node16x&lt;/td&gt;
&lt;td&gt;570ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Caveats, etc
&lt;/h2&gt;

&lt;p&gt;Measuring the cold start time of a Lambda function and drawing concrete conclusions at the millisecond level is a bit of a perilous task. Deep below a running Lambda function lives an actual server whose load factor is totally unknowable to us as users. There could be an issue with noisy neighbors, where other Lambda functions are stealing too many resources. The host could have failing hardware, older components, etc. It could be networked via an overtaxed switch or subnet, or simply have a bug somewhere in the millions of lines of code needed to run a Lambda function.&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;p&gt;Most importantly, users should &lt;em&gt;know&lt;/em&gt; how long their function takes to initialize and understand specifically &lt;em&gt;which&lt;/em&gt; modules are contributing to that duration.&lt;/p&gt;

&lt;p&gt;As the old adage goes:&lt;br&gt;
&lt;em&gt;You can't fix what you can't measure.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Based on this experiment, I can offer a few key takeaways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Load only the code you need. Consider adding a lint rule to disallow loading the entire v2 SDK.&lt;/li&gt;
&lt;li&gt;Small, focused Lambda functions will experience less painful cold starts. If your application has a number of dependencies, consider breaking it up across several functions.&lt;/li&gt;
&lt;li&gt;Bundling can be worthwhile, but may not always make sense.&lt;/li&gt;
&lt;/ol&gt;
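&lt;p&gt;For the first takeaway, ESLint can enforce this with its built-in &lt;code&gt;no-restricted-imports&lt;/code&gt; rule (which covers ES module imports; for &lt;code&gt;require&lt;/code&gt;-style code the older &lt;code&gt;no-restricted-modules&lt;/code&gt; core rule is the analogue). A sketch of an &lt;code&gt;.eslintrc.js&lt;/code&gt; fragment, with a hypothetical message you'd adapt to your team:&lt;/p&gt;

```javascript
// .eslintrc.js (sketch): ban importing the whole v2 SDK while still
// allowing per-client imports like 'aws-sdk/clients/sns'.
module.exports = {
  rules: {
    'no-restricted-imports': ['error', {
      paths: [{
        name: 'aws-sdk',
        message: "Import an individual client (e.g. 'aws-sdk/clients/sns') instead of the entire SDK.",
      }],
    }],
  },
};
```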

&lt;p&gt;For me, this means using the runtime-bundled SDK and importing clients directly. For you, that might be different.&lt;/p&gt;

&lt;p&gt;As far as the Node18.x runtime and v3 SDK, AWS has already said they're &lt;a href="https://github.com/aws/aws-lambda-base-images/issues/47#issuecomment-1423915498" rel="noopener noreferrer"&gt;aware of this issue and working on it&lt;/a&gt;. I'll happily re-run this test when there's a notable change in the performance.&lt;/p&gt;

&lt;p&gt;Keep in mind, the AWS SDK is only one dependency! Most applications have several, or even dozens of dependencies in each function. Optimizing the AWS SDK may not have a large impact on your service, which brings me to my final point:&lt;/p&gt;

&lt;h2&gt;
  
  
  Try this on your own functions
&lt;/h2&gt;

&lt;p&gt;I traced these functions using a feature I built for Datadog called &lt;a href="https://www.datadoghq.com/blog/serverless-cold-start-traces/" rel="noopener noreferrer"&gt;Cold Start Tracing&lt;/a&gt;. It's available now for Python and Node, and I'd encourage you to try this yourself with your own functions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;You can find more of my thoughts on my &lt;a href="https://aaronstuyvenberg.com" rel="noopener noreferrer"&gt;blog&lt;/a&gt; and on &lt;a href="https://twitter.com/astuyve" rel="noopener noreferrer"&gt;twitter&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;This post has been updated to add a new example, packaging the entire v2 SDK. You can view the diff on &lt;a href="https://github.com/astuyve/astuyve.github.io/commit/eda5dd061bcff5231f3e8c36de3271cbb4d8846e" rel="noopener noreferrer"&gt;github&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>lambda</category>
      <category>serverless</category>
      <category>node</category>
    </item>
    <item>
      <title>AJ's re:Invent Recap - 2022</title>
      <dc:creator>AJ Stuyvenberg</dc:creator>
      <pubDate>Mon, 05 Dec 2022 15:06:50 +0000</pubDate>
      <link>https://dev.to/aws-builders/ajs-reinvent-recap-2022-332k</link>
      <guid>https://dev.to/aws-builders/ajs-reinvent-recap-2022-332k</guid>
      <description>&lt;p&gt;AWS re:Invent was a whirlwind! I had a great time meeting a number of AWS Community Builders, Heroes, and cloud enthusiasts. A huge part of re:Invent is the highly anticipated product launches, and there were far &lt;a href="https://techcrunch.com/2022/11/30/heres-everything-aws-announced-today/"&gt;too many&lt;/a&gt; for me to discuss individually. Instead, here are three new features that I'm most excited about.&lt;/p&gt;

&lt;h2&gt;
  
  
  EventBridge Pipes
&lt;/h2&gt;

&lt;p&gt;EventBridge is one of my favorite serverless services. It's made building event-driven applications quite simple. You can easily create an Event Bus, define a few events, and set up targets to receive those events. This gave users a clear path to build loosely coupled, fully serverless systems.&lt;/p&gt;

&lt;p&gt;However - I often found the need to use a Lambda function as a target to filter events in some way. Occasionally I'd do some transformation and re-publish an event back into a bus. This is easy enough, but there are operational and development considerations to adding any additional Lambda function to your application.&lt;/p&gt;

&lt;p&gt;Therefore, I'm happy to use any service which allows me to remove custom Lambda functions and replace them entirely with something managed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnia7m7zxz2eggp9nngww.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnia7m7zxz2eggp9nngww.png" alt="EventBridge Pipes, image from AWS Blog" width="800" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Enter EventBridge Pipes. Pipes allow you to define optional filter, transform, or enrich stages between sources and target destinations. The Pipe will maintain order for you, and doesn't have to be used with an Event Bus.&lt;/p&gt;

&lt;p&gt;Perhaps the most important aspect of EventBridge Pipes is &lt;a href="https://aws.amazon.com/eventbridge/pricing/"&gt;pricing&lt;/a&gt;. Events published to a pipe are $0.40/million &lt;em&gt;after filtering&lt;/em&gt;, where EventBridge is $1/million. More on this later.&lt;/p&gt;
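&lt;p&gt;To see what that difference can mean in practice, here's a rough back-of-the-envelope calculation (the traffic numbers and filter rate are made up; the prices are the ones cited above):&lt;/p&gt;

```javascript
// 100M events/month, where filtering drops 60% of them.
const eventsPerMonth = 100_000_000;
const matchRate = 0.4; // 40% of events survive the filter

// EventBridge bills every published event at $1/million.
const eventBridgeCost = (eventsPerMonth / 1e6) * 1.0;

// Pipes bill only the events remaining after filtering, at $0.40/million.
const pipesCost = ((eventsPerMonth * matchRate) / 1e6) * 0.4;

console.log(`EventBridge: $${eventBridgeCost}, Pipes: $${pipesCost}`);
```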

&lt;p&gt;You can learn more about EventBridge Pipes in the &lt;a href="https://aws.amazon.com/blogs/aws/new-create-point-to-point-integrations-between-event-producers-and-consumers-with-amazon-eventbridge-pipes"&gt;blog post&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  SnapStart
&lt;/h2&gt;

&lt;p&gt;I've already written &lt;a href="https://dev.to/aws-builders/introducing-lambda-snapstart-46no"&gt;extensively&lt;/a&gt; about SnapStart, so I won't dive in here. That said, SnapStart for Lambda is how Lambda should have been from the very beginning.&lt;/p&gt;

&lt;p&gt;I discussed this opinion in depth with Tarun, the Lambda Product Manager behind this feature, who understands my perspective (although I won't say he necessarily agrees. This blog is my opinion, not his).&lt;/p&gt;

&lt;p&gt;SnapStart is the result of many years of work, requiring infrastructure changes, new caching system deployments, and runtime changes to make the hooks function. It was a heavy lift, and I'm pleased to see this one land.&lt;/p&gt;

&lt;p&gt;Hopefully we see SnapStart for more runtimes very soon.&lt;/p&gt;

&lt;h2&gt;
  
  
  Application Composer
&lt;/h2&gt;

&lt;p&gt;I had never used Stackery for a production deployment, but given how complex some of my CloudFormation templates have been - I think I probably should have. Stackery was a SaaS product that helped you build Serverless applications with a simple drag and drop interface.&lt;/p&gt;

&lt;p&gt;Stackery was acqui-hired by AWS in 2021, and the product was shut down. It seems that some of those innovations and influences have been rolled into a new feature called Application Composer, and the UX seems surprisingly polished.&lt;/p&gt;

&lt;p&gt;AWS is infamous for building reliable, scalable infrastructure tools with a clunky developer experience. But from the videos I've seen so far, Application Composer looks excellent. I haven't played with it yet, but I'm looking forward to it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsn5daua21y04nfoya1ak.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsn5daua21y04nfoya1ak.png" alt="Application Composer, image from AWS Blog" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can import existing CloudFormation or SAM templates and visualize them, make changes, and then re-export them without ever having to use another intrinsic function like &lt;code&gt;!Ref&lt;/code&gt; or &lt;code&gt;!GetAtt&lt;/code&gt;.&lt;br&gt;
Check out the &lt;a href="https://aws.amazon.com/blogs/compute/visualize-and-create-your-serverless-workloads-with-aws-application-composer/"&gt;blog post&lt;/a&gt; from Julian Wood to learn more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bonus: Javascript resolvers for AppSync
&lt;/h2&gt;

&lt;p&gt;AppSync helps developers write GraphQL APIs on AWS, which I haven't used seriously - mainly due to my aversion to authoring VTL.&lt;/p&gt;

&lt;p&gt;Now I'll need to give it a serious second look, as we can use a subset of Javascript to implement business logic in AppSync.&lt;/p&gt;

&lt;p&gt;There are limitations, but this release likely helps many users remove Lambda functions which they previously used between AppSync and other AWS resources. For that alone, it deserves a mention here. Learn more &lt;a href="https://aws.amazon.com/blogs/mobile/getting-started-with-javascript-resolvers-in-aws-appsync-graphql-apis/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Extra Bonus: Step Functions Distributed Map
&lt;/h2&gt;

&lt;p&gt;I'm not an avid Step Functions user, but given that &lt;a href="https://youtu.be/RfvL_423a-I?t=1354"&gt;Werner Vogels&lt;/a&gt; mentioned this in his keynote - I assume it's huge; and it's always safe to assume that I'm wrong, and that this is game-changing.&lt;/p&gt;

&lt;p&gt;I look forward to being wrong on this one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;Alright, if you've made it to the end, I assume I have either deeply offended you, or piqued your interest. You can find more of my thoughts on my &lt;a href="https://aaronstuyvenberg.com"&gt;blog&lt;/a&gt; and on &lt;a href="https://twitter.com/astuyve"&gt;twitter&lt;/a&gt; and let me know!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>reinvent</category>
    </item>
    <item>
      <title>Introducing Lambda SnapStart</title>
      <dc:creator>AJ Stuyvenberg</dc:creator>
      <pubDate>Tue, 29 Nov 2022 04:44:48 +0000</pubDate>
      <link>https://dev.to/aws-builders/introducing-lambda-snapstart-46no</link>
      <guid>https://dev.to/aws-builders/introducing-lambda-snapstart-46no</guid>
      <description>&lt;p&gt;Today, AWS announced a new feature called SnapStart for Lambda. SnapStart is a feature aimed at reducing the duration of Cold Starts, which can be especially painful for Java environments. SnapStart is now available for Java 11 functions.&lt;/p&gt;

&lt;p&gt;SnapStart works by taking a snapshot of your function just before the invocation begins, and then storing that snapshot in a multi-tiered cache in order to make subsequent Lambda container initializations much faster.&lt;/p&gt;

&lt;p&gt;This snapshot capability is powered by &lt;a href="https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/snapshot-support.md#about-microvm-snapshotting" rel="noopener noreferrer"&gt;MicroVM Snapshot&lt;/a&gt; technology inside Firecracker, the underlying MicroVM framework used by Lambda as well as AWS Fargate. In practice this means function handler code can start running with sub-second latency (up to 10x faster).&lt;/p&gt;

&lt;p&gt;SnapStart is available initially for Java (Amazon Corretto 11), but given that the underlying system providing this capability is runtime-agnostic, it seems likely we'll see SnapStart support for other runtimes very soon.&lt;/p&gt;

&lt;h2&gt;
  
  
  How does it work?
&lt;/h2&gt;

&lt;p&gt;Let's recall this slide from Julian Wood's 2021 re:Invent talk, &lt;a href="https://www.youtube.com/watch?v=dnFm6MlPnco" rel="noopener noreferrer"&gt;Best practices of advanced serverless developers&lt;/a&gt;:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7fzjcfutbxcljcfq212y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7fzjcfutbxcljcfq212y.png" alt="Julian talking about Cold Starts" width="800" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We see that a traditional Lambda invocation (known as an on-demand invocation) begins by the Lambda placement service creating a new execution environment. Your code (or container image) is downloaded to the environment, and the runtime is initialized. Then your handler is loaded, and finally your handler is executed.&lt;/p&gt;

&lt;p&gt;Now with SnapStart, a snapshot is taken after a new version of the function is created.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyrdypxs221cjiuk1984k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyrdypxs221cjiuk1984k.png" alt="SnapStart vs Cold Starts" width="800" height="222"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Creating and publishing a new Version takes some additional time, compared to simply using &lt;code&gt;$LATEST&lt;/code&gt;. Thankfully, snapshots are somewhat long-lived: they are only reaped by Lambda if the function is not invoked for a couple of weeks, in which case the next invocation would be on-demand and generate a new snapshot.&lt;/p&gt;

&lt;p&gt;Once the snapshot is recorded, all new concurrent invocations &lt;em&gt;to fully qualified Lambda ARNs&lt;/em&gt; will utilize the snapshot to resume. This is where the payoff occurs, as resuming a snapshot can be up to 10x faster than creating and initializing a new Lambda execution environment.&lt;/p&gt;

&lt;p&gt;One important note is that SnapStart doesn't change anything for serial "warm" invocations. Only a new request or event triggering a concurrent invocation (where there is no warm Lambda container available to receive it) will use SnapStart.&lt;/p&gt;
&lt;h2&gt;
  
  
  What's in a snapshot?
&lt;/h2&gt;

&lt;p&gt;Snapshots contain both the memory and disk state of the function after it's been initialized (but before the invocation has begun). Snapshot data is chunked into 512KB fragments, and cached in a multi-tier strategy.&lt;/p&gt;

&lt;p&gt;When a function snapshot resumes, it will only load the chunks required by the function code itself. This is pretty clever, and I presume it's done using mmap's &lt;a href="https://man7.org/linux/man-pages/man2/mmap.2.html" rel="noopener noreferrer"&gt;MAP_PRIVATE&lt;/a&gt;, as documented in the firecracker &lt;a href="https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/snapshot-support.md#about-microvm-snapshotting" rel="noopener noreferrer"&gt;repo&lt;/a&gt;. However, reads to snapshot memory or disk are lazy-loaded: a chunk isn't fetched until the first time it's accessed, so there may be some latency when referencing variables or other data, as the entire snapshot may not be resident when the function resumes.&lt;/p&gt;
&lt;h2&gt;
  
  
  Some important caveats
&lt;/h2&gt;

&lt;p&gt;SnapStart is only usable when invoking fully qualified Lambda ARNs, which means publishing and invoking a specific function version. AWS always recommends using versions for your Lambda integrations as a matter of best practice, but the simple fact is that our development tools (including AWS-backed CDK and SAM) don't do this by default.&lt;/p&gt;

&lt;p&gt;This means you'll likely need to make some changes to your infrastructure-as-code tool if you want to take advantage of SnapStart.&lt;br&gt;
As a quick reminder, here's the difference between an unqualified and qualified function ARN.&lt;br&gt;
Qualified ARN:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;arn:aws:lambda:aws-region:acct-id:function:helloworld:42
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Unqualified ARN:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;arn:aws:lambda:aws-region:acct-id:function:helloworld
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
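&lt;p&gt;A qualified ARN simply carries one extra colon-separated segment (the version number or alias) after the function name. A small sketch to distinguish the two forms:&lt;/p&gt;

```javascript
// arn:aws:lambda:region:account-id:function:name[:qualifier]
// An unqualified ARN has 7 colon-separated segments; a qualified
// ARN has 8 (the trailing version number or alias).
function isQualifiedArn(arn) {
  return arn.split(':').length === 8;
}

console.log(isQualifiedArn('arn:aws:lambda:aws-region:acct-id:function:helloworld:42')); // true
console.log(isQualifiedArn('arn:aws:lambda:aws-region:acct-id:function:helloworld'));    // false
```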



&lt;h2&gt;
  
  
  Pricing
&lt;/h2&gt;

&lt;p&gt;Free!! Free is good. I like free.&lt;/p&gt;

&lt;h2&gt;
  
  
  Randomness and Uniqueness
&lt;/h2&gt;

&lt;p&gt;MicroVM Snapshots have inherent &lt;a href="https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/random-for-clones.md" rel="noopener noreferrer"&gt;Uniqueness and Randomness&lt;/a&gt; concerns, as a snapshot of memory from a singular invocation will be re-used across multiple (perhaps concurrent) invocations. Fortunately this is mitigated by using cryptographically-secure pseudo-random number generators, instead of ordinary seeded PRNGs.&lt;/p&gt;

&lt;p&gt;AWS also provides a tool to help verify that your function doesn't assume uniqueness; it's available &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/snapstart-uniqueness.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ephemeral Data and Temporary Credentials
&lt;/h2&gt;

&lt;p&gt;Another consequence of snapshot-resuming is that ephemeral data or temporary credentials have no expiry guarantees. For example, a library which creates an expiring token at function initialization may find that the token has expired when a new container spins up via SnapStart. Therefore, it's best practice to verify that any ephemeral token or data is still valid before using it.&lt;/p&gt;
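&lt;p&gt;A sketch of that best practice (the token shape and refresh function here are hypothetical stand-ins for your real credential provider):&lt;/p&gt;

```javascript
let refreshCount = 0;
let cached = null; // { value, expiresAt } - may be restored stale by a snapshot resume

// Hypothetical refresh; stands in for your real credential provider.
function fetchToken() {
  refreshCount += 1;
  return { value: `token-${refreshCount}`, expiresAt: Date.now() + 60_000 };
}

// Re-check expiry on every use instead of trusting the cached copy,
// with a 5s safety margin before the actual expiry.
function getToken() {
  if (!cached || Date.now() > cached.expiresAt - 5_000) {
    cached = fetchToken();
  }
  return cached.value;
}
```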

&lt;h2&gt;
  
  
  Network connections
&lt;/h2&gt;

&lt;p&gt;The last likely pitfall that serverless developers may fall into is storing and resuming network connections. It's common practice to &lt;code&gt;memoize&lt;/code&gt; a database or network connection outside of the function handler, so it's available for subsequent invocations. This won't work with SnapStart, because although the HTTP or database library is still initialized, the actual socket connection can't be transferred or multiplexed to the new containers. So you'll have to re-establish these connections.&lt;/p&gt;
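&lt;p&gt;The same "verify before reuse" pattern applies here. A simplified synchronous sketch (&lt;code&gt;connect&lt;/code&gt; and &lt;code&gt;isAlive&lt;/code&gt; are hypothetical stand-ins for your database client's connect and ping methods, which in real code would be async):&lt;/p&gt;

```javascript
let conn = null; // memoized outside the handler, as usual

// Return the cached connection only if it still responds; after a
// snapshot resume the cached socket will be dead, so reconnect.
function getConnection(connect) {
  if (!conn || !conn.isAlive()) {
    conn = connect();
  }
  return conn;
}
```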

&lt;p&gt;The documentation doesn't cover VPC connections, but I anticipate SnapStart won't help here either; as function containers are created and then their network devices are &lt;em&gt;attached&lt;/em&gt; to a VPC, versus the somewhat common theory that functions will be created &lt;em&gt;inside&lt;/em&gt; a VPC.&lt;/p&gt;

&lt;h2&gt;
  
  
  My thoughts
&lt;/h2&gt;

&lt;p&gt;To me, SnapStart feels like the way Lambda should have been designed &lt;em&gt;from the very beginning&lt;/em&gt;. If the claimed performance improvements hold up, it'll change the way Lambda scaling is perceived in the Serverless space and the industry at large. That said, while SnapStart seems truly compelling, I can't help but consider the developer experience.&lt;/p&gt;

&lt;p&gt;Although I think SnapStart likely represents the de facto standard for all new Lambda functions going forward, our tools need to adapt before SnapStart is easy to use.&lt;/p&gt;

&lt;p&gt;Using SnapStart means only invoking qualified ARNs (via versioning). As I previously &lt;a href="https://dev.to/aws-builders/serverless-tools-cut-both-ways-7o2"&gt;discussed&lt;/a&gt;, this isn't the default for our tools and likely means building more complex deployment processes. It also means we, as Serverless developers, need to improve how we build and ship Serverless applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;If you want to learn more about SnapStart, you can check out the full &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/snapstart.html" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's all I've got for you today. If you've got questions, or I missed something - feel free to reach out to me on &lt;a href="https://twitter.com/astuyve" rel="noopener noreferrer"&gt;twitter&lt;/a&gt; and let me know!&lt;/p&gt;

&lt;p&gt;Update:&lt;br&gt;
An earlier version of this article included a &lt;a href="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4knpxarwymug50situ9f.png" rel="noopener noreferrer"&gt;graphic&lt;/a&gt; which, upon reflection, was confusing. I've updated the graphic to be more clear.&lt;/p&gt;

</description>
      <category>lambda</category>
      <category>aws</category>
      <category>serverless</category>
    </item>
    <item>
      <title>Safely migrating Lambda functions from x86 to ARM</title>
      <dc:creator>AJ Stuyvenberg</dc:creator>
      <pubDate>Fri, 18 Nov 2022 16:26:14 +0000</pubDate>
      <link>https://dev.to/aws-builders/serverless-tools-cut-both-ways-7o2</link>
      <guid>https://dev.to/aws-builders/serverless-tools-cut-both-ways-7o2</guid>
      <description>&lt;p&gt;As Serverless developers, we often take our tools for granted. We press &lt;code&gt;"serverless deploy"&lt;/code&gt; or &lt;code&gt;"cdk deploy"&lt;/code&gt;, sip some coffee, and it all &lt;code&gt;"just works"&lt;/code&gt;. But in reality we're wielding powerful managed services and infrastructure as code; the underlying systems which actually run our software are abstracted away from us - and that's kind of the point. These tools give us zero-downtime deployments, rollbacks, and zero-to-large scale compute &lt;em&gt;right out of the box&lt;/em&gt;. It's amazing!&lt;/p&gt;

&lt;p&gt;These magical abstractions also mean that we often forget that our tools are sharp. Like a knife used absentmindedly, we'll occasionally leave unsafe defaults in place from development to production. That's not a bad thing necessarily! But sharp tools can unpredictably cut us. And unlike chef knives, software is constantly shifting.&lt;/p&gt;

&lt;h2&gt;
  
  
  I love getting stuff for free
&lt;/h2&gt;

&lt;p&gt;Serverless developers have the benefit of cloud providers deploying new features which improve our experience and reduce costs. Recently AWS introduced Graviton for Lambda, which leverages their custom ARM-based processor. Using Graviton, AWS says that users can see &lt;a href="https://aws.amazon.com/blogs/aws/aws-lambda-functions-powered-by-aws-graviton2-processor-run-your-functions-on-arm-and-get-up-to-34-better-price-performance/"&gt;19% better performance at 20% lower cost&lt;/a&gt; - and many users wouldn't even have to change any of their code at all! At my day job at Datadog, we quickly rolled out ARM-compatible versions of the &lt;a href="https://github.com/DataDog/datadog-agent"&gt;Datadog Extension&lt;/a&gt; and our IaC integrations like the &lt;a href="https://www.github.com/DataDog/serverless-plugin-datadog"&gt;Serverless Plugin&lt;/a&gt; and the &lt;a href="https://github.com/DataDog/datadog-cdk-constructs"&gt;CDK Construct&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Before long a &lt;a href="https://github.com/DataDog/datadog-cdk-constructs/issues/110"&gt;bug was reported&lt;/a&gt; by some folks from Vercel, and I started digging in. At first it seemed like a simple bug; switching from x86 to arm64-based Lambda functions caused unix launch errors. It appeared as though an x86-based binary extension was being applied to an arm64-based function. These binaries are incompatible, as x86 and arm64 have different instruction sets. I was able to reproduce the issue, and started to suspect the CloudFormation template generated by the CDK construct.&lt;/p&gt;

&lt;h2&gt;
  
  
  That's impossible!
&lt;/h2&gt;

&lt;p&gt;But the CloudFormation template was correct! I couldn't create a condition where we'd erroneously match up the ARM function with the x86 Lambda Extension, or vice versa! It was frustrating. No matter what the template said, for a few seconds during the deployment, the Lambda function would fail to initialize with a Unix process launch error.&lt;/p&gt;

&lt;p&gt;At this point I had a hunch that this wasn't a bug per se, but rather a sharp edge between CloudFormation and the Lambda control plane. I decided to try to reproduce the issue with the Serverless Framework. It also relies on CloudFormation, but generates different templates, which would rule out a bug in the CDK construct itself. I created a &lt;a href="https://github.com/astuyve/lambda-architecture-bug"&gt;demo project&lt;/a&gt; and was able to reproduce the failure immediately.&lt;/p&gt;

&lt;p&gt;With two reproducible cases, I filed an AWS support ticket. After some back and forth with the support team, my case ended with a Chime call to the Lambda Workers team. They were super helpful, and pointed out that CloudFormation deployments result in two important API calls to different systems: an &lt;code&gt;UpdateFunctionCode&lt;/code&gt; call and an &lt;code&gt;UpdateFunctionConfiguration&lt;/code&gt; call. These API calls happen in parallel: the &lt;code&gt;UpdateFunctionCode&lt;/code&gt; call updates the Lambda function's code and architecture, while the &lt;code&gt;UpdateFunctionConfiguration&lt;/code&gt; call sets layers/extensions, as well as environment variables, tags, and settings like the timeout value and memory size.&lt;/p&gt;

&lt;p&gt;This race condition is inherent to Lambda and CloudFormation today, and can occur for Layers, Extensions, Environment Variables, or tags! Ultimately we can entirely avoid this failure mode by adhering to best practices: For any production system, you should &lt;em&gt;never&lt;/em&gt; directly invoke the &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-versions.html"&gt;unqualified function ARN&lt;/a&gt;. This means specifically rolling out changes to a new Lambda function version or alias, and then mapping that version to your integration (API Gateway, SQS, SNS, EventBridge, etc).&lt;/p&gt;

&lt;p&gt;Qualified ARN:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;arn:aws:lambda:aws-region:acct-id:function:helloworld:42
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Unqualified ARN:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;arn:aws:lambda:aws-region:acct-id:function:helloworld
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
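
&lt;p&gt;In CloudFormation terms, the versioned rollout described above looks roughly like this. This is a minimal sketch, not a full template; the &lt;code&gt;HelloWorldFunction&lt;/code&gt; resource name is hypothetical:&lt;/p&gt;

```yaml
# Sketch only: publish an immutable version on each deploy and point an
# alias at it. Integrations then invoke the alias ARN, never $LATEST.
Resources:
  HelloWorldVersion:
    Type: AWS::Lambda::Version
    Properties:
      FunctionName: !Ref HelloWorldFunction  # hypothetical AWS::Lambda::Function resource
  HelloWorldAlias:
    Type: AWS::Lambda::Alias
    Properties:
      FunctionName: !Ref HelloWorldFunction
      FunctionVersion: !GetAtt HelloWorldVersion.Version
      Name: live
```

&lt;p&gt;Note that because published versions are immutable, each deploy needs a fresh &lt;code&gt;AWS::Lambda::Version&lt;/code&gt; logical ID (or tooling that manages this for you) - which is exactly the ceremony the frameworks skip by default.&lt;/p&gt;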



&lt;h2&gt;
  
  
  Developer Ergonomics
&lt;/h2&gt;

&lt;p&gt;I freely admit that this is harder to do than simply invoking &lt;code&gt;$LATEST&lt;/code&gt;. There's a reason multi-phase deployments are not the default for these IaC tools. In many cases you'll need to deploy twice, once to update the function code/configuration, and once to update your integration to point to the new function. Of course if you've &lt;a href="https://dev.to/aws-builders/serverless-at-team-scale-a8"&gt;split your stacks&lt;/a&gt;, this would already be your deployment practice. But fundamentally I don't think this is your fault, developer. We need sensible defaults that include best practices. We need less risky IAM policies, and we need safer deployments.&lt;/p&gt;

&lt;p&gt;We need to demand more ergonomic tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;Although we can probably use &lt;code&gt;serverless deploy&lt;/code&gt; or &lt;code&gt;cdk deploy&lt;/code&gt; with regular, unversioned function ARNs in a lot of cases, we need to remember that we're orchestrating load balancers, messaging queues, and complex integrations. These are complex systems with complex failure modes, and those failure modes rarely appear in development environments; they will only burn us in production.&lt;/p&gt;

&lt;p&gt;Our tools need to improve as well. Both CDK and Serverless could offer interactive deployments that roll out new Lambda function versions. CloudFormation can and should detect when function code and function configuration changes are being deployed simultaneously, and warn, or require that versioned functions be used. I'd love to see this particular sharp edge documented in the &lt;a href="https://docs.aws.amazon.com/cdk/v2/guide/best-practices.html"&gt;CDK best practices&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Until then, let's agree not to point all our traffic straight to &lt;code&gt;$LATEST&lt;/code&gt;. Our tools are sharp and we use them regularly, but it's important not to forget that we can cut ourselves.&lt;/p&gt;

&lt;p&gt;That's all I've got for you today. If you've been burned by this or other Serverless side-effects, feel free to reach out to me on &lt;a href="https://twitter.com/astuyve"&gt;Twitter&lt;/a&gt; and let me know!&lt;/p&gt;

</description>
      <category>serverless</category>
      <category>lambda</category>
      <category>aws</category>
    </item>
    <item>
      <title>Introducing Lambda Function URLs</title>
      <dc:creator>AJ Stuyvenberg</dc:creator>
      <pubDate>Wed, 06 Apr 2022 21:07:29 +0000</pubDate>
      <link>https://dev.to/aws-builders/introducing-lambda-function-urls-4ahd</link>
      <guid>https://dev.to/aws-builders/introducing-lambda-function-urls-4ahd</guid>
      <description>&lt;p&gt;AWS has just launched a new, not entirely unfamiliar feature - there is now a new way to invoke a Lambda function via HTTP API call.&lt;/p&gt;

&lt;p&gt;Lambda Function URLs are built into Lambda itself, so there's no need to configure an external API Gateway (V1) or HTTP API (V2).&lt;/p&gt;

&lt;p&gt;You can create one right now through the AWS console, either by creating a new function or editing an existing function:&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy8lw2k48uv17ab2eltdv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy8lw2k48uv17ab2eltdv.png" alt="Function URL" width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This short post will help you understand what Lambda Function URLs are, when to choose them, and when to reach for a more traditional API integration.&lt;/p&gt;
&lt;h2&gt;
  
  
  At a glance
&lt;/h2&gt;

&lt;p&gt;Lambda Function URLs allow your function to be called via an HTTP request. This capability isn't new; previously you'd need to pair Lambda with API Gateway (v1 or v2) to invoke a function via an HTTP request. API Gateway had a free tier, but after that you'd be charged $1.00/million requests (not including the time your Lambda function required to execute).&lt;/p&gt;

&lt;p&gt;The key distinction between API Gateway and Lambda Function URLs is that Function URLs are a &lt;em&gt;free&lt;/em&gt;* way to invoke your Lambda function via HTTP request *(you only pay for the very small additional running time incurred by serializing the request and response).&lt;/p&gt;
&lt;h2&gt;
  
  
  That's right, Lambda Function URLs are free
&lt;/h2&gt;

&lt;p&gt;This is clearly the biggest selling point for Function URLs because it's not uncommon for API Gateway to be the biggest part of a Serverless bill!&lt;/p&gt;

&lt;p&gt;There are other significant advantages, too:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Function timeout is 15 minutes, instead of API Gateway's 29 seconds&lt;/li&gt;
&lt;li&gt;Ease of setup and use&lt;/li&gt;
&lt;li&gt;Performance seems to be &lt;em&gt;really&lt;/em&gt; good for an API-based Lambda integration. With a vanilla Node.js app, cold starts take about 900ms until the function is invoked, and warm starts are a &lt;em&gt;blistering&lt;/em&gt; 8.35ms 🤯
&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6gg0qwhchexbj2rfhy2y.png" alt="Cold Start" width="718" height="562"&gt;
and
&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftijosywcewg4bodmjx0a.png" alt="Warm Start" width="800" height="601"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But there are some drawbacks compared to API Gateway/HTTP API:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No specified routes or payload formatting options&lt;/li&gt;
&lt;li&gt;No custom domain names. Your URL is randomly assigned an ID: &lt;code&gt;https://&amp;lt;url-id&amp;gt;.lambda-url.&amp;lt;region&amp;gt;.on.aws&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;IAM authorization or public endpoints only; no custom authorizers are supported.&lt;/li&gt;
&lt;li&gt;Only synchronous invocation is supported&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Routing
&lt;/h2&gt;

&lt;p&gt;Function URLs are similar to the &lt;code&gt;proxy+&lt;/code&gt; integration you may be familiar with in API Gateway.&lt;br&gt;
This means that any HTTP method to any endpoint will route to your function, e.g.:&lt;br&gt;
&lt;code&gt;POST https://&amp;lt;url-id&amp;gt;.lambda-url.&amp;lt;region&amp;gt;.on.aws/foo/123/bar&lt;/code&gt;&lt;br&gt;
and&lt;br&gt;
&lt;code&gt;GET https://&amp;lt;url-id&amp;gt;.lambda-url.&amp;lt;region&amp;gt;.on.aws/biz/456/&lt;/code&gt;&lt;br&gt;
will both invoke your function.&lt;/p&gt;

&lt;p&gt;If you want to serve multiple resources from the same Function URL, you'll need to parse the route from the request event in your Lambda function. This effectively places you into a &lt;a href="https://dev.to/aws-builders/the-what-why-and-when-of-mono-lambda-vs-single-function-apis-5cig"&gt;Mono-Lambda&lt;/a&gt; API pattern.&lt;/p&gt;
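
&lt;p&gt;A minimal sketch of that Mono-Lambda routing, assuming the API Gateway payload v2 event shape that Function URLs use (the routes themselves are hypothetical):&lt;/p&gt;

```javascript
// Route on the HTTP method and raw path from the payload v2 event.
// The /foo and /bar route names are just illustrative.
const handler = async (event) => {
  const method = event.requestContext.http.method;
  const path = event.rawPath;

  if (method === 'POST' && path.startsWith('/foo/')) {
    return { statusCode: 201, body: JSON.stringify({ route: 'foo' }) };
  }
  if (method === 'GET' && path.startsWith('/biz/')) {
    return { statusCode: 200, body: JSON.stringify({ route: 'biz' }) };
  }
  // Anything else falls through to a 404.
  return { statusCode: 404, body: JSON.stringify({ error: 'not found' }) };
};

exports.handler = handler;
```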
&lt;h2&gt;
  
  
  Authorization Options
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxpuozfsyd9qdtlss6gx1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxpuozfsyd9qdtlss6gx1.png" alt="Function URL Authorization" width="762" height="512"&gt;&lt;/a&gt;&lt;br&gt;
Your authorization choices are limited to Public or IAM-authorized. This lets you write IAM policies to restrict which users or services can invoke your Lambda function via the Function URL. It's worth noting that you can still use IAM to limit who can invoke the function explicitly via the AWS SDK or CLI, which opens up some interesting configuration choices. &lt;/p&gt;
&lt;h2&gt;
  
  
  Payload Specification
&lt;/h2&gt;

&lt;p&gt;As there is no way to specify the integration type, as there is with API Gateway, Lambda Function URLs infer the response format and use the API Gateway payload v2 request format.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If your function returns a string, the Function URL returns an HTTP 200 status code with your message as the body.&lt;/li&gt;
&lt;li&gt;If your function returns valid JSON, it is sent as the response body (along with an HTTP 200 status code).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most users will want more control over the full HTTP response, and thus specific keys like &lt;code&gt;headers&lt;/code&gt;, &lt;code&gt;statusCode&lt;/code&gt;, and &lt;code&gt;isBase64Encoded&lt;/code&gt; are properly assigned to the API response. &lt;code&gt;cookies&lt;/code&gt; can also be set, and are represented as a string array.&lt;/p&gt;

&lt;p&gt;Function output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"statusCode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;201&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"headers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"specified-header"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"specified-header-value"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"body"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;result&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;success&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"cookies"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"User_Id=abcd1234; Expires 19 Nov 2021 20:22 GMT"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Client response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HTTP 201
content-type: application/json
specified-header: specified-header-value
set-cookie: User_Id=abcd1234; Expires=19 Nov 2021 20:22 GMT
{
  "result": "success"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The full documentation is available &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-urls.html"&gt;here&lt;/a&gt;, and goes into several more examples.&lt;/p&gt;
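
&lt;p&gt;For reference, a handler producing a structured response like the one above might look like this sketch, which sets the documented response keys explicitly:&lt;/p&gt;

```javascript
// Return the documented response keys explicitly so the Function URL
// doesn't have to infer a format. Header and cookie values are examples.
const handler = async () => ({
  statusCode: 201,
  headers: { 'specified-header': 'specified-header-value' },
  cookies: ['User_Id=abcd1234; Expires=19 Nov 2021 20:22 GMT'],
  body: JSON.stringify({ result: 'success' }),
});

exports.handler = handler;
```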

&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;p&gt;Having played with Lambda Function URLs, I think they're useful in a couple of important cases - Mono-Lambda APIs, Service to Service communication, and lightweight webhooks. I think with a few iterations, Function URLs could get much better - and possibly be the default integration mechanism for HTTP-based Lambda invocation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Mono-Lambda API
&lt;/h2&gt;

&lt;p&gt;Given the caveat that your authentication and authorization are already handled via IAM, or that you can resolve them in your function against a provider like Auth0, Lambda Function URLs are a cheap and easy way to spin up a Mono-Lambda API. I've written extensively about why you might consider this pattern, so dig into this &lt;a href="https://dev.to/aws-builders/the-what-why-and-when-of-mono-lambda-vs-single-function-apis-5cig"&gt;blog post&lt;/a&gt; if you're curious to learn more.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Webhook use case
&lt;/h2&gt;

&lt;p&gt;Sometimes I just need a darn Lambda function to talk to Slack, or to receive a webhook from GitHub. Gluing workflows together has been one of the key attractions of Serverless technology, and Function URLs fit a great niche as they are easy to set up when I don't care to have an &lt;code&gt;api.company.com&lt;/code&gt; domain name.&lt;/p&gt;

&lt;h2&gt;
  
  
  Service-to-Service communication
&lt;/h2&gt;

&lt;p&gt;Serverless APIs often use Cognito or Auth0 to authenticate requests from users, but in a service-oriented architecture, one system often needs to authenticate with another system as a service (not acting as a user). Usually this is for things like bulk processing of records, or fetching data asynchronously.&lt;/p&gt;

&lt;p&gt;Function URLs protected with IAM roles fill a gap here, as previously you'd either need to pass user authentication context (which is not desirable, especially if the downstream service is being invoked via some persistent mechanism like DynamoDB Streams), or call the Lambda function directly with the AWS SDK (which is either a slight hassle or massive headache).&lt;/p&gt;
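
&lt;p&gt;An IAM policy for the calling service's role might look like this sketch (the account ID and function name are placeholders):&lt;/p&gt;

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "lambda:InvokeFunctionUrl",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:my-internal-service",
      "Condition": {
        "StringEquals": { "lambda:FunctionUrlAuthType": "AWS_IAM" }
      }
    }
  ]
}
```

&lt;p&gt;The caller then signs its HTTP requests with SigV4 using that role's credentials, and the Function URL rejects anything unsigned.&lt;/p&gt;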

&lt;h2&gt;
  
  
  New for 2023 - Response Streaming
&lt;/h2&gt;

&lt;p&gt;Almost exactly a year later, AWS launched response streaming for Node.js. This feature helps reduce time to first byte by allowing your function to start streaming the beginning of a response to the client before the entire response is finished. Function URLs are the only way to get streaming responses out of Lambda, so you'll need to use them if you'd like to harness this new capability. You can read more about Response Streaming &lt;a href="https://aaronstuyvenberg.com/posts/introducing-response-streaming"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;Long term, I see Function URLs fitting a pattern of service discovery via Outputs, where public APIs are served with API Gateway, and internal API endpoints are surfaced with Function URLs and shared via CloudFormation Outputs (which I &lt;a href="https://dev.to/aws-builders/serverless-at-team-scale-a8"&gt;suggest&lt;/a&gt; you use for sharing configuration between services).&lt;/p&gt;

&lt;p&gt;Good luck out there. Feel free to reach out on &lt;a href="https://twitter.com/astuyve"&gt;Twitter&lt;/a&gt; with specific questions, or to share something you're building!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>lambda</category>
      <category>serverless</category>
    </item>
  </channel>
</rss>
