DEV Community

Lambda Just Got a File System. I Put AI Agents on It.

Eric D Johnson on May 13, 2026

You've written this code before. An S3 event fires, your Lambda function wakes up, and the first thing it does is download a file to /tmp. Process ...

PracHub • May 13

Mounting S3 as a local file system on Lambda with S3 Files is a great solution for avoiding the /tmp juggling act. I'm curious about the VPC setup, though. It seems to add complexity to something meant to simplify things. How do you manage the increased cold start times with the VPC requirement? If you're getting ready for system design interviews, PracHub has some good question banks that really reflect what interviewers ask, unlike trying to piece things together from blog posts.

Max Quimby • May 13

This pattern is going to land hard for code-analysis and document-processing agents — the "/tmp tax" is a real and unromantic cost that nobody talks about until they've written their fifth boto3 download-process-upload-cleanup wrapper. Two operational caveats worth raising for anyone considering this for production agents though:

First, S3 Files semantics are not POSIX in the way agents tend to assume. If your orchestrator and your two reviewer Lambdas all "write" to the same path concurrently, you don't get atomic last-writer-wins; you get S3's eventually-consistent object semantics underneath a filesystem-shaped API. For your security/style reviewer split where outputs are distinct paths this is fine — but the moment you have two agents updating a shared state file (e.g., a tasks.json), you need an explicit lease/lock or you'll get torn writes.

Second, the cold-start angle deserves measurement, not a one-line "no longer has the penalty" claim. VPC + S3 Files + Bedrock SDK + Strands runtime is a meaningful cold-start surface; for orchestrators that fan out 10+ agent invocations on demand, p99 is still where the user pain lives.

Genuinely excited about this primitive though — it makes the "agents as sandboxed processes with a shared working directory" mental model finally cheap enough to use.

Eric D Johnson AWS • May 13

Appreciate the thoughtful follow-up — both points are worth addressing.

On consistency: S3 Files provides full NFS v4.2 file system semantics — including read-after-write consistency, file locking, and POSIX permissions. It's not raw S3 eventual consistency exposed through a filesystem-shaped API. When a Lambda closes a file handle, the data commits to the high-performance storage layer and is visible to other clients on their next open (close-to-open semantics). Advisory locks via flock/fcntl are supported at the NFS layer.

That said, your instinct is right in spirit: concurrent writes to the same path from multiple Lambda invocations still need coordination. NFS close-to-open means "visible after close" — if two agents have the same file open simultaneously, last-close wins. For the architecture in the post, the agents write to distinct paths by design (each agent owns its output file), so this isn't an issue. But if you were building a shared tasks.json pattern, you'd want either distinct files per agent (append-only log) or advisory locks. I'd argue the better design is to keep the orchestrator as the single writer for shared state — which is what durable functions give you naturally.
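
For readers who want to see the advisory-lock option concretely, here is a minimal sketch. It assumes a POSIX host, a hypothetical shared tasks.json, and a sidecar lock file (the `.lock` and `.tmp` names are illustrative). It was written against an ordinary local directory, so behavior on an S3 Files mount is an assumption to verify rather than a tested result.

```python
import fcntl
import json
import os
from pathlib import Path


def update_tasks(tasks_path: Path, agent: str, status: str) -> dict:
    """Read-modify-write a shared JSON file while holding an exclusive advisory lock."""
    lock_path = tasks_path.with_suffix(".lock")  # hypothetical sidecar lock file
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until no other process holds it
        try:
            tasks = json.loads(tasks_path.read_text()) if tasks_path.exists() else {}
            tasks[agent] = status
            tmp_path = tasks_path.with_suffix(".tmp")
            tmp_path.write_text(json.dumps(tasks, indent=2))
            os.replace(tmp_path, tasks_path)  # swap in the new version with one rename
            return tasks
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

The lock is advisory: it only protects the file if every writer goes through the same function. A single-writer orchestrator avoids the need for it altogether.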

On cold starts: Fair challenge — I shouldn't hand-wave it. VPC-attached Lambda cold starts lost the ~10s ENI-attach penalty back in 2019 with Hyperplane networking, but the total runtime surface you're describing is real: VPC network setup + NFS mount + Bedrock SDK init + Strands runtime. I haven't published p99 numbers for this specific stack yet — that's a good follow-up post. What I can say: the NFS mount itself is fast once mount targets exist (sub-100ms in my testing), and Bedrock SDK initialization is dominated by the first inference call, not the client instantiation. SnapStart or provisioned concurrency would eliminate most of the tail, but that's a cost/latency tradeoff worth quantifying.

The "agents as sandboxed processes with a shared working directory" framing is exactly the mental model. Glad it resonates.

Eric D Johnson AWS • May 13

Shameless plug, but since cold starts came up: I recently benchmarked modern Lambda cold starts across runtimes with multi-concurrency and SnapStart in the mix. The results might surprise you if your mental model is still calibrated to 2021-era numbers: Cold Starts Are Dead

Mininglamp • May 15

S3 Files eliminating the /tmp tax is a big deal for multi-agent workloads. The download-process-upload ceremony has always been the awkward part of running AI pipelines on Lambda — agents need shared state and intermediate artifacts, not isolated blob operations. With a mounted filesystem, agent-to-agent handoff becomes just file writes, which maps much better to how local agent frameworks already work. The interesting follow-up is whether this changes the cold start calculus for heavier ML workloads.

Eric D Johnson AWS • Jun 3

Exactly — the "download-process-upload ceremony" is a great way to put it. That's precisely the friction S3 Files removes. Your agents just open() and write() like they would locally, and the shared filesystem handles the coordination. Agent-to-agent handoff becomes a file appearing in a directory, not a boto3 pipeline.

And since the S3 Files mount isn't tied to any single Lambda function — any compute that speaks NFS (other Lambdas, EC2, ECS, EKS) can mount the same filesystem — you also get flexibility in where each agent runs. Your orchestrator can be on Lambda, a heavier ML agent can run on ECS with a GPU, and they all share the same working directory. No S3 API glue code in between.

On the cold start question for heavier ML workloads: for inference via Bedrock (which is what this architecture uses), cold starts are mostly about runtime initialization, not model loading — the model is on Bedrock's side. For self-hosted models, you'd likely run those on ECS/EKS or SageMaker anyway, but they'd still mount the same S3 Files filesystem for I/O. The architecture doesn't force everything onto Lambda — it just makes Lambda viable for the orchestration and lightweight agent layers while heavier workloads run wherever makes sense, all sharing the same filesystem.

If you're curious about modern Lambda cold start numbers specifically, I did a deep dive on that recently: Cold Starts Are Dead.

Harjot Singh • May 31

Putting agents on Lambda with a real filesystem is a neat unlock, because so much of what makes an agent useful (scratch space, intermediate files, working with documents, caching between tool calls) assumes a persistent local disk that serverless didn't have. Giving agents a filesystem on ephemeral compute closes a real gap. The architectural tension worth naming is state versus statelessness: Lambda's model is stateless by default, agents want continuity, and a filesystem is a tempting place to put state. But ephemeral storage disappears between invocations, so anything that must survive (checkpoints for a resumable long run, durable memory) still needs to live in real external storage, with the local filesystem as fast scratch, not the source of truth. Get that split right and you get the best of both: cheap, scalable, isolated execution plus the working space agents need. The isolation is actually a security bonus too: each invocation's filesystem is its own sandbox, which is exactly the bounded-blast-radius property you want when an agent is writing files. Use the local FS as scratch and keep durable state external. That separate-scratch-from-source-of-truth instinct is core to how I think about agent infra in Moonshift. Are you treating the Lambda filesystem as pure ephemeral scratch, or trying to persist any agent state across invocations through it?

Eric D Johnson AWS • Jun 3

Great question — and the answer is: both, because S3 Files collapses that dichotomy.

The filesystem in this architecture isn't Lambda's ephemeral /tmp. It's an S3 Files mount — a shared NFS filesystem backed by your S3 bucket. Files written by one Lambda invocation persist and are visible to the next invocation, to a different Lambda function, or to any other compute that mounts the same filesystem (EC2, ECS, EKS). It's not tied to the Lambda function itself — it's infrastructure-level shared storage accessible from anything that speaks NFS. Close-to-open consistency means once a file handle is closed, it's committed and visible to other clients on their next open.

So the "separate scratch from source of truth" instinct you're describing — I agree with it as a general principle, but S3 Files makes the filesystem itself the external durable storage. You get the ergonomics of open() and pathlib in your agent code (no boto3 download-process-upload dance), the isolation of per-invocation Lambda sandboxes, and persistence across invocations and across services. The agent writes files like it's on a local disk, but the data lives in S3.

The architecture does still use Lambda's actual /tmp for truly ephemeral scratch (unpacking, temp processing), but the working directory where agents read inputs and write outputs is the S3 Files mount — a shared workspace the orchestrator and all agent functions coordinate through, regardless of what compute they run on.
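
As a rough illustration of that split, the sketch below reads an input from a shared workspace, does its scratch work in local temp storage, and writes the result back to the workspace using only pathlib. The `WORKSPACE` environment variable, the `/mnt/workspace` default, the job directory layout, and the `review_file` helper are all hypothetical, not taken from the post.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical mount location; the real path depends on how the function is configured.
WORKSPACE = Path(os.environ.get("WORKSPACE", "/mnt/workspace"))


def review_file(workspace: Path, job_id: str, name: str) -> Path:
    """Read an input from the shared workspace and write a result beside it."""
    source = workspace / job_id / "input" / name
    output = workspace / job_id / "output" / f"{name}.review.txt"
    output.parent.mkdir(parents=True, exist_ok=True)

    # Truly ephemeral scratch still goes to local temp storage (/tmp on Lambda).
    with tempfile.TemporaryDirectory() as scratch:
        draft = Path(scratch) / "draft.txt"
        line_count = len(source.read_text().splitlines())
        draft.write_text(f"{name}: {line_count} lines reviewed\n")
        output.write_text(draft.read_text())  # the durable result lands on the mount
    return output
```

There is no download or upload step in this code: the only storage decision is which directory a file belongs in.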

To directly answer: the Lambda filesystem (S3 Files mount) is the durable external storage, just accessed through filesystem semantics instead of API calls. No state split to manage.

Armorer Labs • May 13

The /tmp tax and IAM gotchas are real. One thing I'd add: when agents run in environments like Lambda, keeping track of what happened across partial failures gets harder fast.

For production agents, I'd want a run record that captures: task input, tool calls made, args/results, retries, and final artifact — regardless of whether the Lambda run succeeded or timed out. Without that, debugging a failed run in a stateless environment turns into guessing.
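
A run record along those lines could be sketched as a pair of dataclasses saved as JSON after every step. The field names, the `RunRecord` and `ToolCall` classes, and the status values are illustrative assumptions, not the schema of any particular product.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any, Optional


@dataclass
class ToolCall:
    name: str
    args: dict
    result: Any = None
    retries: int = 0


@dataclass
class RunRecord:
    run_id: str
    task_input: dict
    tool_calls: list = field(default_factory=list)
    final_artifact: Optional[str] = None
    status: str = "running"  # later set to "succeeded", "failed", or "timed_out"

    def save(self, directory: Path) -> Path:
        """Persist the record after each step so a crash still leaves a trail."""
        directory.mkdir(parents=True, exist_ok=True)
        path = directory / f"{self.run_id}.json"
        path.write_text(json.dumps(asdict(self), indent=2))
        return path
```

Saving after each tool call, not only at the end, is what keeps the record useful when the function times out mid-run.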

This is the angle we're building Armorer around: a local control plane for operating agents, not another agent framework.

Eric D Johnson AWS • May 13

I love it!

In my case, this is exactly why the project uses durable functions as the orchestration layer — they are the run record.

Every step in a durable function is checkpointed: task input, each agent invocation, tool calls, results, retries, and final output. If a Lambda times out or fails mid-run, the execution history is already persisted. You pick up exactly where you left off — no guessing, no reconstruction. And worst case, the full execution history is there in the console for manual debugging — you can see every step, what it received, what it returned, and where it broke.

So the observability problem you're describing is real for raw Lambda agents (fire-and-forget invocations with no coordination layer). But once you put a durable orchestrator in front, that audit trail comes built in. The orchestrator is the control plane.

That said, I get the angle — there's value in runtime-agnostic tooling that works across frameworks, not just AWS-native. Different problem surface.

Mykola Kondratiuk • May 18

ran into this exact problem - agents writing partial state to /tmp that the next invocation couldn't see. ended up routing through S3 manually, which was a mess. how does the mounted bucket handle concurrent writes from parallel runs?

Eric D Johnson AWS • Jun 3

Classic pain — I've been there. The boto3 download-process-upload-cleanup loop for every file operation adds up fast, especially when you have agents handing off intermediate artifacts.

To answer your question: S3 Files delivers NFS v4.2 semantics with close-to-open consistency. When a Lambda invocation closes a file handle, the data is committed and visible to any other client (another Lambda invocation, a different Lambda function, EC2, ECS, whatever mounts the same filesystem) on their next open. So parallel runs can safely write to the same mounted filesystem — as long as they're writing to different paths, which is the natural pattern for agents (each agent writes its own output file).

If parallel runs need to write to the same file, NFS advisory locks (flock/fcntl) are supported. But in practice, the better design is to avoid that entirely: have each agent write to its own path, and let the orchestrator (a durable function) be the single reader that assembles the results. That way you get full parallelism with zero contention.
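
The one-path-per-agent layout might look like the sketch below. The run directory, the agent names, and both helper functions are hypothetical; in the real architecture the orchestrator would call the assembly step only after each agent invocation has returned.

```python
import json
from pathlib import Path


def write_agent_output(run_dir: Path, agent: str, findings: list) -> Path:
    """Each agent owns exactly one file, so parallel agents never share a path."""
    run_dir.mkdir(parents=True, exist_ok=True)
    path = run_dir / f"{agent}.json"
    path.write_text(json.dumps({"agent": agent, "findings": findings}))
    return path


def assemble_results(run_dir: Path, agents: list) -> dict:
    """Orchestrator side: read each agent's file once that agent has finished."""
    results = {}
    for agent in agents:
        payload = json.loads((run_dir / f"{agent}.json").read_text())
        results[agent] = payload["findings"]
    return results
```

Because file names are derived from the agent name, adding a third reviewer adds a third file and no new coordination.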

The key mental model shift: this isn't Lambda's ephemeral /tmp. The S3 Files mount is shared, persistent, and backed by your S3 bucket. Every invocation — current, next, or from a completely different function — sees the same filesystem. The /tmp-disappears-between-invocations problem simply doesn't exist here.

Mykola Kondratiuk • Jun 4

the manual S3 loop becomes load-bearing before anyone questions it - that's the part that's hard to undo. by the time you hit perf issues it's woven into three different handoff patterns. curious how the NFS path holds up when agents are writing concurrently vs strict sequential handoffs.

xulingfeng • May 28

This is dangerous — now I have no excuse not to try it 👀 Did you run into this in production or was it more of a lab experiment?

Followed! Looking forward to more content like this.

GoDavaii - Advanced Health AI • May 15

That /tmp tax really hits when dealing with large, transient files. For us, that's often voice data. When a user speaks 'kaaichal' (Tamil for 'fever') to GoDavaii, we're not just moving text; it's audio streams, processed by agents.

Eliminating the explicit s3.download_file for intermediate steps, where agents might pass chunks, streamlines...

Rasmus Ros • May 14

Interesting article. The file API is the easy sell, but the real design question is the consistency model.

If writes from Lambda reach S3 within minutes and S3 changes appear on the mount within seconds, then a shared workspace across concurrent functions is not the same thing as a normal local file system. It is closer to a cached coordination layer with lag. That matters a lot for the agent pattern you describe.

For this kind of workflow I would want very explicit ownership rules:
one writer per path
immutable inputs where possible
append-only or versioned outputs
a done marker or manifest file instead of assuming directory state is current

Otherwise it is easy to get stale reads, clobbered writes, or an agent consuming partial output from another step.
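
The done-marker rule can be sketched as follows: outputs are written under a temporary name and renamed into place, and a manifest is published last. The `DONE.json` name and the three helpers are hypothetical, and the sketch assumes rename behaves atomically on the mount, which should be tested on the actual file system and not taken for granted.

```python
import json
import os
from pathlib import Path
from typing import Optional


def publish(step_dir: Path, name: str, data: bytes) -> None:
    """Write under a temporary name, then rename, so readers never see a partial file."""
    step_dir.mkdir(parents=True, exist_ok=True)
    tmp = step_dir / f".{name}.tmp"
    tmp.write_bytes(data)
    os.replace(tmp, step_dir / name)


def mark_done(step_dir: Path, outputs: list) -> None:
    """Publish the manifest last; its presence signals that the step finished."""
    publish(step_dir, "DONE.json", json.dumps({"outputs": outputs}).encode())


def read_outputs(step_dir: Path) -> Optional[dict]:
    """Consumer side: return None until the manifest exists, then read only what it lists."""
    manifest = step_dir / "DONE.json"
    if not manifest.exists():
        return None
    names = json.loads(manifest.read_text())["outputs"]
    return {name: (step_dir / name).read_bytes() for name in names}
```

A consumer that gets None simply retries later; it never infers completion from a directory listing.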

The other thing worth calling out is where this is a simplification versus where it is just moving complexity. It clearly removes a lot of /tmp boilerplate. But in exchange you take on VPC setup, mount targets, access points, IAM wrinkles, and a file system abstraction over an object store. That trade can still be good, but it is not free.

I also think readers should be careful about assuming POSIX-like semantics mean POSIX-like behavior under concurrency. For single-function or low-contention pipelines this looks great. For parallel workers sharing a repo-sized workspace, I would want some benchmarks and failure mode testing before treating the mount as the source of truth.

Still, this is a useful post because it focuses on the part people will actually care about in practice: does this make Lambda code less annoying to write. In many cases it probably does.

Xidao • May 14

The /tmp tax is real — I've spent way too many hours writing boilerplate download/process/upload patterns in Lambda functions. The S3 Files mount approach is a huge quality-of-life improvement.

The multi-agent workspace use case you described is particularly interesting. One thing I'd be curious about is how you handle write conflicts when multiple agents are working on the same files simultaneously. With the /tmp approach, each invocation had its own isolated copy, so conflicts weren't a concern. But with a shared mount, two agents could potentially write to the same file at the same time. Do you use any file-level locking, or is the orchestration layer (Step Functions or similar) responsible for serializing writes?

Also, the VPC requirement is worth calling out more. For teams already running in VPC, it's a non-issue. But for those with simpler architectures, adding a VPC + NAT gateway just for S3 Files could be a significant operational overhead. I've seen some teams use EFS directly with Lambda for similar shared-state patterns, and the VPC requirement was always the main friction point. It's good that AWS is making this easier, but it's still something to factor into the architecture decision.

Andrii Krugliak • May 14

Putting a real filesystem under an agent changes the failure modes more than the capabilities. The interesting part is not what the agent can now write to disk; it is what happens when two invocations race on the same file and you discover the agent never had a concurrency model in its head. Lambda's previous statelessness was actually a feature for this exact reason. Worth thinking through which agent workloads benefit from persistence vs which were quietly relying on amnesia.

Artemii Amelin • May 16

The shared S3 workspace approach is clever because it makes the coordination layer boring, which is exactly what you want. The gap I keep hitting with multi-agent Lambda setups is the networking piece. Once agents need to talk to each other across accounts or back to a local instance, you're wiring up auth and NAT from scratch per agent. Pilot Protocol (pilotprotocol.network) handles this. Agents get a persistent virtual address and encrypted tunnel regardless of which network they're on. Makes the communication layer feel as boring as the file access layer you described.

Max Quimby • May 16

The persistent-filesystem-on-Lambda story finally closes a gap I've been working around with S3-as-tmp for years. The honest cost of "stateless functions + remote storage for everything" was never the storage bill — it was the boilerplate to checkpoint partial work and the inevitable bug where step 4 silently re-ran step 2's side effects.

One thing I'd love more clarity on: cold-start behavior when the FS is non-trivial. If an agent has been writing intermediate artifacts to /mnt for a long-running task and the function gets reaped, does the next invocation get a warm view of that filesystem, or are you reattaching from a backing store on the first hit? The latency profile on that matters a lot for whether you can build a "resume where you left off" pattern, vs. needing a separate orchestrator to track which steps completed. Either way, this is the right direction — agents need somewhere to put their scratch work that isn't a database row.