AWS Lambda has become ubiquitous in serverless applications. And why shouldn't it be? With it, you can run code for virtually any type of application or backend service, all with zero administration. There is no need to provision or manage any infrastructure: just upload your code and Lambda takes care of everything required to run and scale it with high availability. But have you ever wondered how that happens?
How exactly does Lambda run our code? How does it autoscale? How does it balance load? How does it handle errors? What about security? Where exactly does the code run? What exactly happens when you call invoke?
To answer these questions we need to understand the architecture of AWS Lambda. Under the hood, Lambda has a really sophisticated architecture that keeps evolving; it is what takes care of all the undifferentiated heavy lifting for developers. This architecture can be dissected into two parts, the Control Plane and the Data Plane, as depicted below:
The Control Plane is responsible for our interaction with AWS Lambda through the AWS Console, the APIs, and tools such as AWS SAM. It handles packaging the code and provides the configuration APIs that let us set options such as concurrency, environment variables, and timeouts.
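As an illustration, here is a minimal sketch of driving these configuration APIs with boto3. The API names (`update_function_configuration`, `put_function_concurrency`) are real boto3 calls; the function name and the values are hypothetical.

```python
FUNCTION_NAME = "order-processor"  # hypothetical function name


def build_config_update():
    # Control-plane settings: timeout, memory, and environment variables.
    return {
        "FunctionName": FUNCTION_NAME,
        "Timeout": 30,        # seconds
        "MemorySize": 512,    # MB
        "Environment": {"Variables": {"STAGE": "prod"}},
    }


def apply_config():
    import boto3  # AWS SDK; requires credentials to actually run

    client = boto3.client("lambda")
    client.update_function_configuration(**build_config_update())
    # Reserve concurrency -- this is the limit the Counting Service enforces.
    client.put_function_concurrency(
        FunctionName=FUNCTION_NAME, ReservedConcurrentExecutions=100
    )
```

The same settings are available through the Console and AWS SAM templates; the SDK calls above are just one face of the Control Plane.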
The Data Plane is responsible for handling the synchronous and asynchronous invocation of Lambda. Its components handle authentication, scalability, errors, service limits, and more. To understand how these components work in synergy and make Lambda what it is, we first need to understand each one individually.
Front End Invoke: This service authenticates all Lambda invocation requests. It is also responsible for loading function metadata and coordinating with the Counting Service to confirm available concurrency. In short, it orchestrates both synchronous and asynchronous invocations of Lambda.
Counting Service: A critical component that provides a region-wide view of customer concurrency to help enforce the set limits. It tracks current concurrency and throttles further requests when the limit is hit.
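To make the idea concrete, here is a toy sketch of a concurrency counter. This is my own simplified model, not Lambda's actual implementation: it admits invocations up to a limit and rejects the rest, much as the Counting Service throttles requests over the account's concurrency limit.

```python
class ConcurrencyCounter:
    """Toy model of the Counting Service: tracks in-flight invocations
    and rejects requests once the configured limit is reached."""

    def __init__(self, limit: int):
        self.limit = limit
        self.in_flight = 0

    def try_acquire(self) -> bool:
        if self.in_flight >= self.limit:
            return False  # over the limit: the real service would throttle
        self.in_flight += 1
        return True

    def release(self) -> None:
        self.in_flight -= 1  # an invocation finished
```

With a limit of 2, two invocations are admitted and a third is rejected until one of the first two finishes and releases its slot.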
Worker Manager: It tracks the idle and busy state of containers and schedules incoming invoke requests to available ones. It also ensures that Lambda executes with the correct privileges, and it interacts with the Placement Service to scale containers up and down.
Placement Service: This is responsible for placing sandboxes on Workers. It determines where a sandbox should be put and how to deal with unhealthy Workers.
This is where our code actually executes. The Worker provisions a secure environment for customer code execution and provides multiple runtime environments. It downloads the customer's code and mounts it for execution, and it notifies the Worker Manager when execution is complete.
One Worker can run Lambda functions from various accounts. Virtualization mechanisms are in place to isolate the code of the different Lambda functions from one another. (More on this in the next tutorial.)
Let's answer our original question:
What exactly happens when we hit invoke on lambda?
- Lambda is invoked. An Application Load Balancer routes the request to the fleet of Front End Invoke services.
- The Front End Invoke service authenticates the request. It then downloads the function metadata and calls the Counting Service to check the concurrency limit. Upon successful authentication and concurrency validation, it asks the Worker Manager to reserve a sandbox.
- Upon receiving the request, the Worker Manager first locates a Worker and creates a sandbox on it. It then downloads the function code, initializes the runtime, and finally calls init on it. [cold start]
- Once the init function has completed, the Worker notifies the Worker Manager, which in turn notifies the Front End Invoke service.
- Front End now directly calls Invoke on the Worker.
- Once the code finishes executing, the Worker tells the Worker Manager that it is idle, meaning there is now a warm sandbox.
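The init step above is visible in how we write handlers. In Python, module-level code runs once per sandbox during init (the cold start), while the handler body runs on every invoke. The "database client" below is a hypothetical stand-in for real initialization work:

```python
import time

# Module scope runs once per sandbox, during init (the cold-start phase).
INIT_TIME = time.time()
DB_CLIENT = {"status": "connected"}  # stand-in for expensive client setup


def handler(event, context=None):
    # The handler body runs on every invoke, reusing whatever init built.
    return {
        "db": DB_CLIENT["status"],
        "sandbox_age_s": round(time.time() - INIT_TIME, 3),
    }
```

This is why the usual advice is to put connection setup and heavy imports at module scope: they are paid once per sandbox, not once per invoke.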
We all know that warm starts in Lambda run faster. This is the exact reason why: during a warm start, the Worker Manager doesn't need to locate a Worker and initialize a sandbox on it. The sandbox already exists and is ready for use. Consider the scenario above, where the code has just run on a Worker and the sandbox is warm.
- The first three steps remain the same.
- The Front End reaches out to the Worker Manager to reserve a sandbox, but this time the Worker Manager already has a warm sandbox, so it straight away returns the Worker's location to the Front End service.
- The Front End calls invoke on it, which is relatively faster.
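Because the sandbox (and the handler module's state) is reused on a warm start, you can observe the reuse yourself with a module-level flag. This is a common trick, not an official API:

```python
_IS_COLD = True  # set once, when the sandbox is first initialized


def handler(event, context=None):
    global _IS_COLD
    was_cold = _IS_COLD
    _IS_COLD = False  # every later invoke in this sandbox is a warm start
    return {"cold_start": was_cold}
```

The first invoke on a sandbox reports `cold_start: True`; every subsequent invoke routed to the same warm sandbox reports `False`.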
What happens when the Worker Manager can't find a Worker to provision a sandbox on? This occurs when all Workers are fully utilized; in that scenario, the Worker Manager asks the Placement Service, which provides it with a new Worker location. The Placement Service is solely responsible for fulfilling Worker capacity and maintaining scalability: whenever no Workers are left, it provides the Worker Manager with new Workers (with a lease of 6-10 hours).
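The interplay described above can be sketched as a toy model. This is my own simplification, not AWS code: the Worker Manager prefers a warm sandbox from its idle pool and falls back to the Placement Service only when none are available.

```python
from collections import deque


class PlacementService:
    """Toy model: hands out new Workers when the warm pool is exhausted."""

    def __init__(self):
        self._next_id = 0

    def new_worker(self) -> str:
        self._next_id += 1
        return f"worker-{self._next_id}"


class WorkerManager:
    """Toy model: reuse a warm sandbox, else ask the Placement Service."""

    def __init__(self, placement: PlacementService):
        self.placement = placement
        self.warm = deque()  # idle (warm) sandboxes

    def reserve_sandbox(self):
        if self.warm:
            return self.warm.popleft(), "warm"   # fast path: no init needed
        return self.placement.new_worker(), "cold"  # slow path: cold start

    def release(self, worker: str) -> None:
        self.warm.append(worker)  # code finished; the sandbox is now warm
```

A first reservation takes the cold path; after the sandbox is released, the next reservation reuses it via the warm path.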
Lambda continuously monitors the health of its hosts and removes unhealthy ones. When a Worker becomes unhealthy, the Worker Manager detects this and stops provisioning sandboxes on it. And if an entire Availability Zone fails, the system components keep working and stop routing traffic to the failed zone. This is how Lambda always remains fault-tolerant.
This is more or less the whole crux of how Lambda works under the hood and executes our code while maintaining scalability and fault tolerance. There are still a lot of questions left unanswered, though. How exactly does a Worker work? If a Worker runs functions from various accounts on the same hardware, how does it isolate their code and runtimes? What virtualization technology is used? I will be answering all these questions in the next part of this tutorial. Hope this was helpful.