WonderfulSoap

Posted on Feb 12

Deep Dive into AWS Lambda (1): How Is the Handler Invoked? What Exactly Is Lambda Runtime and How Does It Work?

#aws #serverless #awslambda #programming

AWS Lambda, one of the most well-known serverless services, is famous for its simplicity and the fact that you don't need to maintain any servers. All we need to do is write a handler and upload it to AWS Lambda to build all kinds of functionality.
But have you ever wondered:

How exactly is the handler invoked?
What are the underlying mechanisms behind Lambda's implementation?
And can we customize Lambda to support different languages?

This series will dive step by step into the details of Lambda.

The Magic of the Lambda Handler

Below are some very simple Lambda handlers written in different languages, all doing the same thing: returning hello world.

An example in Node.js:

// index.mjs (an ES module, since the handler uses export)
export async function handler(event) {
  return 'hello world';
}

An example in Python:

# lambda_function.py
def lambda_handler(event, context):
    return 'hello world'

When we invoke these Lambda functions, the handler is called automatically. Something outside the handler must therefore be responsible for invoking it. And since Lambda processes invocations correctly regardless of the language the handler is written in, that mechanism must work the same way across all languages.

What is this mechanism? It is the AWS Lambda Runtime.
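To make the idea of "a mechanism outside the handler" concrete, here is a minimal Python sketch. It takes a handler name in module.function form, imports the module, and calls the function with an event. The resolve_handler helper and the in-memory stand-in module are hypothetical and exist only for illustration; this is not Lambda's actual implementation.

```python
import importlib
import sys
import types


def resolve_handler(handler_spec):
    """Split 'module.function' and return the callable it names."""
    module_name, _, function_name = handler_spec.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


# Stand-in for a user's lambda_function.py so the sketch runs on its own.
user_module = types.ModuleType("lambda_function")
exec(
    "def lambda_handler(event, context):\n    return 'hello world'",
    user_module.__dict__,
)
sys.modules["lambda_function"] = user_module

# The code that calls the handler knows nothing about it except its name.
handler = resolve_handler("lambda_function.lambda_handler")
result = handler({"user-data": "hello"}, None)
print(result)
```

The point of the sketch is the separation of roles: the user only writes lambda_handler, while separate code decides when and with what event to call it.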

bootstrap: Where It All Begins

First, it's helpful to distinguish between cold starts and warm starts.

Lambda cold start: When a request arrives and no idle instance is available (for example, on the very first request, or when the existing instances cannot keep up with demand), Lambda spins up a brand new instance to handle the request, scaling out automatically. This is similar to starting a brand new server, although a Lambda instance is extremely lightweight and starts quickly. This is a Lambda cold start.

Lambda warm start: After a Lambda instance finishes processing a request and enters an idle state, the Lambda instance waits for new requests. When a new request comes in, the Lambda instance can immediately process it without needing to spin up a brand new instance. This is a warm start.

Terminology note: AWS officially refers to this as an "execution environment" rather than an "instance." For convenience, this article refers to the execution environment simply as an instance. Both terms mean the same thing here.
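The difference between the two kinds of start can be observed from inside a handler. The following is a minimal sketch, assuming a Python handler whose module is loaded once per instance, so module-level state survives across warm invocations. The invocation_count variable and the returned fields are hypothetical names chosen for illustration.

```python
import time

# Module-level code runs once, when the instance first loads the handler
# (that is, during a cold start).
INSTANCE_STARTED_AT = time.time()
invocation_count = 0


def lambda_handler(event, context):
    """Report whether this invocation was the first one on this instance."""
    global invocation_count
    invocation_count += 1
    return {
        "cold_start": invocation_count == 1,
        "invocation": invocation_count,
    }


# Simulate two invocations hitting the same instance.
first = lambda_handler({}, None)
second = lambda_handler({}, None)
print(first)
print(second)
```

Only the first call reports a cold start; the second one reuses the already initialized state, which is what a warm start amounts to.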

Q: When Lambda cold starts, it must run some program to enter the processing flow. What command or program does it run first?
A: The answer is bootstrap.

bootstrap is a customizable executable program. It can be an executable binary, a bash script, or any runnable program. Every time Lambda spins up a brand new instance, it immediately invokes bootstrap.

Now let's customize a bootstrap to demonstrate this.

First, create a test Lambda. When creating it, select Amazon Linux 2 as the Runtime. This is an OS-only runtime with no language runtime preinstalled, which is what allows us to supply our own bootstrap.

Then create the files needed for Lambda locally:

mkdir bootstrap-test
cd bootstrap-test
touch bootstrap
chmod +x bootstrap

Open the bootstrap-test/bootstrap file and add the following content:

#!/bin/sh
set -euo pipefail

echo "bootstrap is executed!"

As you can see, this bootstrap is just a shell script. (On Amazon Linux 2, /bin/sh is provided by bash, which is why the bash-specific set -o pipefail option works here.) We granted it executable permission with chmod +x bootstrap. When executed, it simply outputs bootstrap is executed! and nothing more.

Then, in the bootstrap-test directory, run zip -r ../bootstrap-test.zip . to create the corresponding zip package, and upload this zip package to the Lambda function we just created (for convenience, we upload it directly through the AWS Console).
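Lambda can only run bootstrap if the file is still executable after the package is unpacked. The zip command keeps the permission set by chmod, but if you build the package from code you have to set it yourself. Here is a minimal sketch using Python's zipfile module; build_package is a hypothetical helper, and the script content is the one shown above.

```python
import io
import zipfile

BOOTSTRAP = '#!/bin/sh\nset -euo pipefail\n\necho "bootstrap is executed!"\n'


def build_package(script):
    """Return the bytes of a zip archive containing an executable bootstrap."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        info = zipfile.ZipInfo("bootstrap")
        info.create_system = 3  # 3 = Unix, so the mode bits below are honored
        info.external_attr = 0o100755 << 16  # regular file, rwxr-xr-x
        info.compress_type = zipfile.ZIP_DEFLATED
        archive.writestr(info, script)
    return buffer.getvalue()


package = build_package(BOOTSTRAP)

# Read the archive back to check the stored permission bits.
with zipfile.ZipFile(io.BytesIO(package)) as archive:
    stored_mode = (archive.getinfo("bootstrap").external_attr >> 16) & 0o777
    stored_script = archive.read("bootstrap").decode("utf-8")
print(oct(stored_mode))
```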

Click the Test button in the Console to test the Lambda function execution, and we get the following execution logs:

......

Function Logs:
bootstrap is executed!
INIT_REPORT Init Duration: 2.39 ms  Phase: invoke   Status: error   Error Type: Runtime.ExitError
START RequestId: 38bf088e-bb1d-4666-82d6-f9ad57b9ec4c Version: $LATEST
RequestId: 38bf088e-bb1d-4666-82d6-f9ad57b9ec4c Error: Runtime exited without providing a reason
Runtime.ExitError
END RequestId: 38bf088e-bb1d-4666-82d6-f9ad57b9ec4c
REPORT RequestId: 38bf088e-bb1d-4666-82d6-f9ad57b9ec4c  Duration: 59.29 ms  Billed Duration: 60 ms  Memory Size: 128 MB Max Memory Used: 3 MB

Request ID: 38bf088e-bb1d-4666-82d6-f9ad57b9ec4c

Since our Lambda doesn't implement any actual functionality, the execution failed. However, the key thing to notice is the very first line in the logs: bootstrap is executed! — our custom bootstrap was invoked first when the instance started up.
The general flow so far: an invocation triggers a cold start, Lambda starts a new instance, and the instance runs bootstrap.

Now we understand that bootstrap gets invoked. But how do we receive and process request data?
This is where the Lambda Runtime API comes in.

Lambda Runtime API

The Lambda Runtime API is essentially a small set of simple HTTP endpoints. bootstrap calls them to receive invocations and to control Lambda's execution.
Continuing with our bootstrap script example from above:

#!/bin/sh
set -euo pipefail
echo "bootstrap is executed!"

RESPONSE=$(curl -s "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next")
echo "The event data is: $RESPONSE"

This time, our bootstrap script uses curl to call a special API and outputs the API response.

http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next

The function of this API is very simple: calling it returns the event data of the invocation that this instance is supposed to handle.

Repackage this file with zip -r ../bootstrap-test.zip ., upload it to Lambda, and try executing it. Again, for convenience, we test this Lambda execution directly in the Console.

Click the Save button, then click Test, and we get the following Lambda execution logs:

bootstrap is executed!
START RequestId: 0b2613f3-49b4-470f-b2f0-20ec523a40f1 Version: $LATEST
The event data is: {"user-data":"hello, I'm jack!"}
RequestId: 0b2613f3-49b4-470f-b2f0-20ec523a40f1 Error: Runtime exited without providing a reason
Runtime.ExitError
END RequestId: 0b2613f3-49b4-470f-b2f0-20ec523a40f1
REPORT RequestId: 0b2613f3-49b4-470f-b2f0-20ec523a40f1  Duration: 13.69 ms  Billed Duration: 42 ms  Memory Size: 128 MB Max Memory Used: 25 MB  Init Duration: 27.34 ms

First, we see bootstrap is executed! being output, confirming that bootstrap was executed.
Then, immediately following, is the log The event data is: {"user-data":"hello, I'm jack!"}.
The {"user-data":"hello, I'm jack!"} in the log is the return value of the /2018-06-01/runtime/invocation/next API — it returns the event from the user's current invocation. This is the most important function of the Lambda Runtime API.

What Is AWS_LAMBDA_RUNTIME_API?

Let's continue examining the API we called: "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next"
You'll notice we used the environment variable ${AWS_LAMBDA_RUNTIME_API}. Lambda sets this variable to the address of the Runtime API server. You might be curious about its actual value and where it points. Let's modify bootstrap again:

#!/bin/sh
set -euo pipefail
echo "bootstrap is executed!"

echo "AWS_LAMBDA_RUNTIME_API value: ${AWS_LAMBDA_RUNTIME_API}"
RESPONSE=$(curl -s "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next")
echo "The event data is: $RESPONSE"

We get the following log:

....
AWS_LAMBDA_RUNTIME_API value: 127.0.0.1:9001
...
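The value is a plain host:port pair, so a runtime written in any language can build the endpoint URLs from it. A minimal Python sketch follows; runtime_url and split_host_port are hypothetical helpers, and example_env stands in for the real environment so the code also runs outside Lambda.

```python
import os

API_VERSION = "2018-06-01"


def runtime_url(path, env=os.environ):
    """Build a Runtime API URL from the AWS_LAMBDA_RUNTIME_API variable."""
    host_and_port = env["AWS_LAMBDA_RUNTIME_API"]
    return f"http://{host_and_port}/{API_VERSION}/runtime/{path}"


def split_host_port(value):
    """Split 'host:port' into a (host, port) tuple."""
    host, _, port = value.rpartition(":")
    return host, int(port)


# Stand-in for the environment inside a Lambda instance.
example_env = {"AWS_LAMBDA_RUNTIME_API": "127.0.0.1:9001"}

next_url = runtime_url("invocation/next", example_env)
host, port = split_host_port(example_env["AWS_LAMBDA_RUNTIME_API"])
print(next_url)
print(host, port)
```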

So the AWS_LAMBDA_RUNTIME_API server actually runs locally, inside the Lambda instance itself! That raises the question of which process is listening on this port. Let's modify our bootstrap again to print the system's process list and find the process that owns port 9001.

Since the Lambda environment doesn't come with commands like ps, we can only obtain information by accessing /proc and /proc/net/tcp.

The script for getting processes and specific port information is quite complex — don't worry too much about reading every detail. Let's jump straight to the results.

#!/bin/sh
set -euo pipefail
echo "bootstrap is executed!"

echo "AWS_LAMBDA_RUNTIME_API value: ${AWS_LAMBDA_RUNTIME_API}"

# List all processes
echo "--- Process List ---"
for pid in /proc/[0-9]*; do
    p=${pid##*/}
    cmd=$(tr '\0' ' ' < "$pid/cmdline")
    echo "PID $p: $cmd"
done

# Check which process is listening on port 9001
PORT=9001
HEX_PORT=$(printf ':%04X' "$PORT")
# 1. Extract socket inodes (supports both IPv4 and IPv6)
INODES=$(awk -v port="$HEX_PORT" '$2 ~ port {print $10}' /proc/net/tcp /proc/net/tcp6 | sort -u)

if [ -z "$INODES" ]; then
    echo "Port $PORT is not in use."
else
    for INODE in $INODES; do
        echo "--- Searching for Inode: $INODE ---"
        # 2. Iterate through fds to find the matching socket inode
        for FD in /proc/[0-9]*/fd/*; do
            if [ -L "$FD" ] && [ "$(readlink "$FD" 2>/dev/null)" = "socket:[$INODE]" ]; then
                # 3. Extract PID from the path
                PID=$(echo "$FD" | cut -d'/' -f3)
                echo "[MATCH FOUND]"
                echo "PID: $PID"
                # 4. Fetch process info directly from /proc (zero dependencies)
                printf 'Process Name: '; cat "/proc/$PID/comm"
                printf 'Command Line: '; tr '\0' ' ' < "/proc/$PID/cmdline"; printf '\n\n'
            fi
        done
    done
fi

RESPONSE=$(curl -s "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next")
echo "The event data is: $RESPONSE"

After uploading this bootstrap to Lambda and running it, we get the following logs:

--- Process List ---
PID 1: /var/runtime/init --enable-extensions --logs-egress-api fluxpump --disable-tracing 
PID 8: /bin/sh /var/task/bootstrap 
--- Searching for Inode: 4608 ---
[MATCH FOUND]
PID: 1
Process Name: init
Command Line: /var/runtime/init --enable-extensions --logs-egress-api fluxpump --disable-tracing

In the Process List, we see two processes:

PID 1: /var/runtime/init
PID 8: /bin/sh /var/task/bootstrap

The former has PID 1, meaning it is the first process started in the instance. The other process is our custom bootstrap script.

Note that our bootstrap process has PID 8 rather than 2. This suggests that /var/runtime/init started several other short-lived processes during initialization before launching bootstrap. As for what exactly it ran, Lambda is a black box here, so we can't know for sure.

Then, looking at the port information, we can see that port 9001 is actually provided by /var/runtime/init. It is responsible for interacting with the external Lambda service and providing the corresponding API endpoint for other programs within the Lambda instance environment.
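The port lookup works because /proc/net/tcp lists each socket's local address as a hexadecimal IP:PORT pair, which is why the script searches for :2329 (9001 in hexadecimal). Here is a minimal Python sketch of decoding such a field. decode_ipv4_endpoint is a hypothetical helper, and the little-endian byte order it assumes matches x86_64 and arm64 hosts.

```python
import socket
import struct


def decode_ipv4_endpoint(field):
    """Decode a /proc/net/tcp address field such as '0100007F:2329'."""
    hex_ip, hex_port = field.split(":")
    # The IPv4 address is printed as one 32-bit number in host byte order
    # (assumed little-endian here); the port is plain hexadecimal.
    ip = socket.inet_ntoa(struct.pack("<I", int(hex_ip, 16)))
    return ip, int(hex_port, 16)


endpoint = decode_ipv4_endpoint("0100007F:2329")
print(endpoint)
```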

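After the above investigation, our understanding of Lambda's startup flow is much clearer. To reorganize it: Lambda starts a new instance; /var/runtime/init starts as PID 1 and serves the Runtime API on 127.0.0.1:9001; init then launches bootstrap; and bootstrap calls the Runtime API to fetch the event.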

Returning Processing Results via the Response API

Now that we understand how Lambda starts up, let's revisit the custom bootstrap we created earlier:

#!/bin/sh
set -euo pipefail
echo "bootstrap is executed!"

RESPONSE=$(curl -s "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next")
echo "The event data is: $RESPONSE"

It retrieved the user's request information from the /next API, but then did nothing and exited. As a result, our Lambda invocation didn't actually succeed, and the caller didn't receive any return value.
After obtaining the user's request, we need to return a response to the user. This requires another Runtime API:

#!/bin/sh
set -euo pipefail

echo "bootstrap is executed!"

HEADERS_FILE=$(mktemp)
RESPONSE=$(curl -sS -D "$HEADERS_FILE" "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next")
# Get request id from next API header
REQUEST_ID=$(grep -Fi Lambda-Runtime-Aws-Request-Id "$HEADERS_FILE" | tr -d '[:cntrl:]' | awk '{print $2}')

echo "The event data is: $RESPONSE"
echo "The Request ID is: $REQUEST_ID"


LAMBDA_RESPONSE="{\"message\": \"Hello from bootstrap!\", \"echo\": $RESPONSE}"
curl -s -X POST "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/$REQUEST_ID/response" -d "$LAMBDA_RESPONSE"

echo "Response sent successfully!"

In this bootstrap, we made the following changes:

Write the /next API's response headers to a temporary file (curl's -D option) so we can parse them afterwards.
Extract the Lambda-Runtime-Aws-Request-Id header from those response headers. It contains the request ID of the user's current request.
Construct a response body LAMBDA_RESPONSE and send it back by POSTing it to the Runtime API http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/$REQUEST_ID/response.
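The header extraction step (grep, tr, and awk in the script) can be sketched in Python as follows. request_id_from_headers is a hypothetical helper, and the sample header dump is illustrative rather than captured from a real instance. Header names are matched case-insensitively, as in the script's grep -i.

```python
from email.parser import Parser


def request_id_from_headers(raw_headers):
    """Extract the request ID from a header dump like the one curl -D writes."""
    # Drop the status line ("HTTP/1.1 200 OK"); the rest is "Name: value" lines.
    _status_line, _, header_block = raw_headers.partition("\n")
    headers = Parser().parsestr(header_block, headersonly=True)
    # Lookups on the parsed message ignore the case of the header name.
    return headers.get("Lambda-Runtime-Aws-Request-Id", "").strip()


# Illustrative header dump; a real response may carry other headers too.
sample = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/json\r\n"
    "lambda-runtime-aws-request-id: 368fec15-8cb0-4973-9bb0-dbd844edfce4\r\n"
    "\r\n"
)

request_id = request_id_from_headers(sample)
print(request_id)
```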

After uploading the updated bootstrap script to Lambda and testing it, the Lambda invocation succeeded this time.
The event we sent for this invocation was:

{
    "user-data": "hello, I'm jack!"
}

After invoking Lambda, we received the following return value, exactly as expected:

{
  "message": "Hello from bootstrap!",
  "echo": {
    "user-data": "hello, I'm jack!"
  }
}
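For comparison, the same response can be assembled in Python. This is a minimal sketch, assuming the endpoint layout used in the script above; build_response_request is a hypothetical helper. Serializing with json.dumps avoids the manual quoting needed in the shell version.

```python
import json
import urllib.request

API_VERSION = "2018-06-01"


def build_response_request(runtime_api, request_id, event):
    """Prepare the POST that reports an invocation result to the Runtime API."""
    payload = {"message": "Hello from bootstrap!", "echo": event}
    url = (
        f"http://{runtime_api}/{API_VERSION}"
        f"/runtime/invocation/{request_id}/response"
    )
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


request = build_response_request(
    "127.0.0.1:9001",
    "368fec15-8cb0-4973-9bb0-dbd844edfce4",
    {"user-data": "hello, I'm jack!"},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Inside a Lambda instance, urllib.request.urlopen(request) would send it.
```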

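Now we understand how the Runtime API operates. Summarizing Lambda's invocation flow: the caller invokes the function; /var/runtime/init receives the event from the Lambda service; bootstrap fetches it via the /next API, processes it, and posts the result to the /response API; and init passes that result back to the caller.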

Making bootstrap Support Warm Starts and Instance Reuse

Although we implemented a bootstrap that accepts a user invocation and returns a result, the entire bootstrap exits immediately after execution is complete.

Looking at the Lambda execution logs from the above bootstrap:

bootstrap is executed!
START RequestId: 368fec15-8cb0-4973-9bb0-dbd844edfce4 Version: $LATEST
The event data is: {"user-data":"hello, I'm jack!"}
The Request ID is: 368fec15-8cb0-4973-9bb0-dbd844edfce4
{"status":"OK"}
Response sent successfully!
RequestId: 368fec15-8cb0-4973-9bb0-dbd844edfce4 Error: Runtime exited without providing a reason
Runtime.ExitError
END RequestId: 368fec15-8cb0-4973-9bb0-dbd844edfce4
REPORT RequestId: 368fec15-8cb0-4973-9bb0-dbd844edfce4  Duration: 131.66 ms Billed Duration: 132 ms Memory Size: 128 MB Max Memory Used: 4 MB

We can see that after bootstrap output Response sent successfully!, Lambda logged the errors Error: Runtime exited without providing a reason and Runtime.ExitError.
This is because bootstrap exited, which was detected by /var/runtime/init.
Although bootstrap exiting doesn't cause the entire Lambda instance to shut down, it does cause the following problems:

Lambda detects an execution anomaly and reports an error, which can skew error-rate metrics and alerts.
The bootstrap startup process can be quite complex (although the bootstrap in this article is extremely simple, real-world bootstrap implementations are typically quite "heavy"). Restarting bootstrap on every Lambda invocation would significantly impact performance.

To solve this problem, let's continue to improve our bootstrap to keep it running:

#!/bin/sh
set -euo pipefail

echo "bootstrap is executed!"

HEADERS_FILE=$(mktemp)

while true; do
  RESPONSE=$(curl -sS -D "$HEADERS_FILE" "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next")
  # Get request id from next API header
  REQUEST_ID=$(grep -Fi Lambda-Runtime-Aws-Request-Id "$HEADERS_FILE" | tr -d '[:cntrl:]' | awk '{print $2}')

  echo "The event data is: $RESPONSE"
  echo "The Request ID is: $REQUEST_ID"

  LAMBDA_RESPONSE="{\"message\": \"Hello from bootstrap!\", \"echo\": $RESPONSE}"
  curl -s -X POST "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/$REQUEST_ID/response" -d "$LAMBDA_RESPONSE"

  echo "Response sent successfully!"
done

Compared to the previous version, we added an infinite while true loop to this version of bootstrap and wrapped the /next API request process inside it.
The /next API has a very special behavior: when there are no user requests, the /next API blocks until the next user invocation arrives.
In our example, after bootstrap starts and enters the while infinite loop, it first calls the /next API. Since a Lambda instance cold start is usually triggered by an actual user request, the /next API will return the user's request information immediately at this point.
After bootstrap calls the /response API to return the processing result, it calls the /next API again. This time no request is pending, so the call blocks: the script simply waits inside curl until the next user request arrives, at which point /next returns and the next processing cycle begins.
This is the core event loop of the Lambda Runtime API. Regardless of the language, every Lambda runtime is built on top of this loop that continuously calls the /next and /response APIs.
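The same loop can be sketched in Python. This is a minimal illustration under stated assumptions, not how any official runtime is implemented. run_event_loop, handler, and the max_invocations parameter are hypothetical, and StubRuntimeAPI is a local stand-in for the Runtime API so the sketch can run outside Lambda.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

API_VERSION = "2018-06-01"


def handler(event, context):
    return {"message": "Hello from bootstrap!", "echo": event}


def run_event_loop(runtime_api, max_invocations=None):
    """Fetch events from /next and post results to /response, repeatedly."""
    base = f"http://{runtime_api}/{API_VERSION}/runtime/invocation"
    handled = 0
    # A real runtime loops forever; max_invocations only exists for the demo.
    while max_invocations is None or handled < max_invocations:
        # On Lambda, this call blocks until an invocation arrives.
        with urllib.request.urlopen(f"{base}/next") as resp:
            request_id = resp.headers["Lambda-Runtime-Aws-Request-Id"]
            event = json.loads(resp.read())
        body = json.dumps(handler(event, None)).encode("utf-8")
        reply = urllib.request.Request(
            f"{base}/{request_id}/response", data=body, method="POST"
        )
        urllib.request.urlopen(reply).close()
        handled += 1
    return handled


class StubRuntimeAPI(BaseHTTPRequestHandler):
    """Local stand-in for the Runtime API: serves two queued events."""

    events = [{"n": 1}, {"n": 2}]
    responses = []

    def do_GET(self):
        event = json.dumps(self.events.pop(0)).encode("utf-8")
        self.send_response(200)
        request_id = f"req-{len(self.responses) + 1}"
        self.send_header("Lambda-Runtime-Aws-Request-Id", request_id)
        self.send_header("Content-Length", str(len(event)))
        self.end_headers()
        self.wfile.write(event)

    def do_POST(self):
        length = int(self.headers["Content-Length"])
        self.responses.append((self.path, json.loads(self.rfile.read(length))))
        self.send_response(202)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), StubRuntimeAPI)
threading.Thread(target=server.serve_forever, daemon=True).start()
handled = run_event_loop(f"127.0.0.1:{server.server_port}", max_invocations=2)
server.shutdown()
print(handled, StubRuntimeAPI.responses)
```

Inside a real instance, the loop would be started with the value of AWS_LAMBDA_RUNTIME_API and without a limit on the number of invocations.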

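Finally, with the infinite loop added, Lambda's execution flow reaches its final form: after a cold start, init launches bootstrap, and bootstrap then repeats the cycle of /next (blocking until an event arrives), processing, and /response for as long as the instance lives.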

Summary

At this point, we have a general understanding of Lambda's execution, and we know what bootstrap is and what the Runtime API is.
But this is far from enough. For example:

Error handling: Although error handling often seems unexciting, it is very important for Lambda. In the next article, we will discuss this topic.
Runtime internals: We now know what the Runtime API is, but the concept of a Runtime itself is still vague, and we're curious how the official Lambda runtimes for each language are built. We will discuss this in future articles.