DEV Community

Discussion on: How AWS Lambda Works Under The Hood

David Malouf

Nice read, thank you Oliver!

I'm not that well-versed in KVM microVMs, so I figured I'd ask you: do you know why/how KVM sometimes creates new Lambdas that contain artifacts from a previously started Lambda? For example, sometimes there are items in memory (AWS often encourages 'storing' DB connections in memory, as they can sometimes be reused by a 'new' Lambda), and sometimes there are files in the /tmp folder.

Just curious.

Oliver Jumpertz

Hey David, thank you very much!

The /tmp folder persists for as long as the microVM is kept around.
Imagine you use VMware to start a virtual machine on your computer. You can shut it down, but the files you stored are persisted and are there again when you restart the VM. The same holds true for microVMs managed by Firecracker.

Regarding in-memory storage:
Your handler is actually executed by a wrapper. That wrapper is an HTTP service that forwards requests to your handler function. As long as the VM is running, that wrapper is running, like a normal microservice. And for all that time it is running, you can indeed store things in memory.

As soon as your VM is shut down, however, that stored information gets lost. :)
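To make the in-memory part concrete, here is a minimal Python sketch. It assumes the usual pattern of module-level state that is set up once per VM and then reused by later invocations. The names `_connection`, `_get_connection`, and `_ENVIRONMENT_ID` are made up for illustration, and the "connection" is a plain dict standing in for a real DB client.

```python
import time
import uuid

# Module-level state: set up once when the wrapper loads this module
# (a cold start), then reused by every invocation that lands on the same VM.
_ENVIRONMENT_ID = str(uuid.uuid4())  # hypothetical marker for "this VM"
_connection = None                   # stand-in for a real DB connection
_invocation_count = 0


def _get_connection():
    """Create the placeholder connection lazily and reuse it afterwards."""
    global _connection
    if _connection is None:
        # Placeholder only: a real handler would open a DB client here.
        _connection = {"opened_at": time.time()}
    return _connection


def handler(event, context):
    """Report which environment served the call and how often it was used."""
    global _invocation_count
    _invocation_count += 1
    conn = _get_connection()
    return {
        "environment_id": _ENVIRONMENT_ID,
        "invocation_count": _invocation_count,
        "connection_opened_at": conn["opened_at"],
    }
```

Two calls that hit the same running VM would report the same `environment_id` and an increasing `invocation_count`. A call that lands on a fresh VM starts again at 1 with a new ID.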

David Malouf

That makes sense, thank you Oliver!

Following on, my experience is that the in-memory items and /tmp files are not always left over. Here's my scenario: I was tasked with a proof of concept in which a PDF uploaded to S3 triggered a Lambda that would take the PDF, break it into single-page PDFs, convert all of those single-page PDFs into full-size and thumbnail images, and then put all of these 'new' files (single-page PDFs, images) back into S3.

So I wrote a script that put 100 relatively large PDFs into S3 at roughly the same time (i.e. concurrently). I had the Lambda log what was in the /tmp folder (i.e. print the list of files in /tmp to stdout). I was surprised to see 'other' PDFs in the log (other than the one being worked on).

I was even more surprised that some Lambdas logged no 'other' PDFs, some a few 'others', some many 'others'.
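For anyone who wants to reproduce this, a logging step along those lines might look roughly like the Python sketch below. This is not the original code from the proof of concept. The helper name `list_scratch_files` and the returned `leftovers` key are assumptions made for the example.

```python
import os


def list_scratch_files(directory="/tmp"):
    """Return the sorted names of the regular files currently in `directory`."""
    return sorted(
        entry.name for entry in os.scandir(directory) if entry.is_file()
    )


def handler(event, context):
    # Anything listed here that this invocation did not create was left
    # behind by an earlier invocation that ran in the same environment.
    leftovers = list_scratch_files()
    # print() writes to stdout, which ends up in the function's logs.
    print(f"files in /tmp at start: {leftovers}")
    return {"leftovers": leftovers}
```

Logging the list at the very start of the handler, before the current PDF is downloaded, makes it easier to tell leftovers apart from files the current invocation wrote itself.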

Based on your article, I am wondering if having microVMs running on different physical servers might be the key to this? But I'm just guessing (haha).

Thanks a TON for replying to my first question - very kind to take trivial questions from strangers :-)

Oliver Jumpertz

Okay, yes, I wasn't clear enough on that one, and I also didn't cover it explicitly in the article.

It is actually never guaranteed that two invocations of your Lambda function use the same VM. That's up to the management layer.

Multiple invocations in your case will most likely have led to multiple VMs being spun up with your image. And some of those were reused later, so your temp files were still there.
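Since reuse can happen but is never guaranteed, one defensive option is to give every invocation its own working directory under /tmp and delete it afterwards. Here is a minimal Python sketch of that idea. The function name `process_in_private_dir`, the `base_dir` parameter, and the `input.pdf` file name are made up for illustration, and the actual PDF processing is left out.

```python
import os
import tempfile


def process_in_private_dir(payload: bytes, base_dir: str = "/tmp") -> dict:
    """Write `payload` into a fresh directory that is removed on exit."""
    # TemporaryDirectory creates a uniquely named directory under base_dir
    # and deletes it, including its contents, when the with block ends.
    with tempfile.TemporaryDirectory(dir=base_dir) as work_dir:
        input_path = os.path.join(work_dir, "input.pdf")
        with open(input_path, "wb") as f:
            f.write(payload)
        # Only files from this invocation are visible here, even if an
        # earlier invocation left files elsewhere under base_dir.
        files_seen = sorted(os.listdir(work_dir))
    return {"work_dir": work_dir, "files_seen": files_seen}
```

With this approach a reused VM no longer shows 'other' PDFs to the current invocation, and nothing is left behind for the next one.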

There sadly is not a lot of explicit documentation on that behavior, as it is also relevant for security, so information is pretty rare. :)