The problem
Wingman is a worker-evaluator agent. A worker LLM tries to satisfy a set of success criteria, an evaluator decides whether it passed, and on failure the worker retries, up to 5 times. Standard loop.
The deployment is where it gets opinionated. It runs on AWS Lambda behind API Gateway, packaged as a container image on ECR. Lambda is stateless by design. The execution environment can be frozen, thawed, or thrown away between requests, and you get no say in when. So the question that drives the whole architecture is: where does the conversation live between turns?
LangGraph has a built-in answer. You attach a checkpointer, give each conversation a thread_id, and the framework persists graph state after every superstep. On the next call you pass the same thread_id and it resumes. That's the blessed path, and on a long-lived server it's the right one.
On Lambda it falls apart. MemorySaver, the in-memory checkpointer most examples reach for, keeps checkpoints in process memory. When Lambda freezes the environment, that memory might survive for the next warm invocation or it might be gone. You cannot tell from inside the handler. A user's second message could land on a fresh execution environment with an empty checkpoint store, and the agent forgets the conversation. There are durable checkpointer backends, but they pull you toward keeping a database connection warm and treating the LangGraph thread as your source of truth.
The decision
I stopped trying to make the framework's persistence survive Lambda. Instead I made the compute fully stateless and kept the durable state myself.
The conversation history is a plain list of {role, content} dicts. It lives in DynamoDB, keyed by session_id. On every request the handler reads that history, the agent rebuilds the entire LangGraph state from scratch, runs the graph once for the new message (one run_superstep call), and writes the updated history back.
The checkpointer is still there. It's just deliberately disposable:
self.graph = builder.compile(checkpointer=MemorySaver())
# Fresh thread ID each call — MemorySaver only lives for this invocation
config = {"configurable": {"thread_id": str(uuid.uuid4())}}
result = await self.graph.ainvoke(state, config=config)
A new thread_id every call means the checkpointer never resumes anything. It exists only because the graph wants one, and it dies with the invocation. The real memory is the history row in DynamoDB. LangGraph's persistence layer became a no-op on purpose.
Reconstruction is just a loop that turns stored dicts back into message objects:
past_messages = []
for h in history:
    if h["role"] == "user":
        past_messages.append(HumanMessage(content=h["content"]))
    elif h["role"] == "assistant" and not h["content"].startswith(EVALUATOR_PREFIX):
        past_messages.append(AIMessage(content=h["content"]))
past_messages.append(HumanMessage(content=message))
state = {
    "messages": past_messages,
    "success_criteria": success_criteria or "The answer should be clear and accurate",
    "feedback_on_work": None,
    "success_criteria_met": False,
    "user_input_needed": False,
    "turn_count": 0,
}
The request handler is boring, which is the point:
@app.post("/api/chat")
async def chat(request: ChatRequest):
    history = _get_session(request.session_id)   # read
    new_history = await wingman.run_superstep(   # rebuild + run
        request.message, request.success_criteria, history,
    )
    _put_session(request.session_id, new_history)  # write
    return {"history": new_history}
Read, rebuild, run, write. No connection to keep warm, no checkpoint to hope survived. Any Lambda environment can serve any request for any session, because nothing important lives in the compute.
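The handler leans on _get_session and _put_session, which aren't shown here. A minimal sketch of what they might look like, assuming a boto3 Table resource with session_id as the partition key and the history stored in a list attribute named history. The table name and the attribute name are placeholders, and the table is passed in explicitly so the helpers can be exercised without AWS.

```python
from typing import Any


def get_session(table: Any, session_id: str) -> list[dict]:
    """Read the stored history for a session, or [] if the session is new."""
    # get_item returns a response without an "Item" key when the key is absent.
    response = table.get_item(Key={"session_id": session_id})
    item = response.get("Item")
    return list(item["history"]) if item else []


def put_session(table: Any, session_id: str, history: list[dict]) -> None:
    """Overwrite the session item with the latest history (last write wins)."""
    table.put_item(Item={"session_id": session_id, "history": history})


# In the Lambda, the table would come from boto3, created once at import time
# so warm invocations reuse it (the table name below is a placeholder):
#
#   import boto3
#   table = boto3.resource("dynamodb").Table("wingman-sessions")
```

Nothing here holds a connection open or caches a session, so a cold environment behaves the same as a warm one.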
The alternatives I rejected. Keeping state in Lambda memory loses conversations on cold start, which is the bug I described above. A durable LangGraph checkpointer (DynamoDB or Postgres backend) would work, but it makes the framework's thread the source of truth and ties me to its serialization format. Putting the whole history in the request payload pushes state to the client and grows every round trip. DynamoDB on-demand keyed by session was the least clever option, and least clever is what you want for state.
What broke, and what I would change
Two things in this code are honest tradeoffs, not polish.
First, retries don't survive a request. turn_count resets to 0 on every reconstruction, and the MAX_TURNS = 5 cap only applies within a single superstep. So the worker-evaluator loop can burn up to 5 retries answering one message, but it carries no retry budget across messages. For this app that's fine, since each user turn is its own task. If a single user task spanned multiple requests, I'd be silently resetting the budget, and I'd need to persist turn_count into the DynamoDB item alongside history.
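If a task ever did span requests, the retry budget would have to travel with the history. A rough sketch of that change, assuming the session item carries a turn_count attribute next to history. The function names, the item shape, and the rule for resetting the count are all hypothetical, not what the current code does.

```python
MAX_TURNS = 5  # the same cap described above


def build_state(item: dict, messages: list, success_criteria: str) -> dict:
    """Rebuild graph state, carrying over retries already spent on this task."""
    return {
        "messages": messages,
        "success_criteria": success_criteria,
        "feedback_on_work": None,
        "success_criteria_met": False,
        "user_input_needed": False,
        # Hypothetical: read the persisted count instead of resetting to 0.
        # int() because boto3's resource layer returns numbers as Decimal.
        "turn_count": int(item.get("turn_count", 0)),
    }


def retries_left(state: dict) -> int:
    """Remaining worker attempts before the loop has to stop."""
    return max(0, MAX_TURNS - state["turn_count"])


def item_after_run(session_id: str, history: list, result: dict) -> dict:
    """Item to write back; the count resets once the task has finished."""
    done = result["success_criteria_met"] or result["user_input_needed"]
    return {
        "session_id": session_id,
        "history": history,
        "turn_count": 0 if done else result["turn_count"],
    }
```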
Second, I throw away evaluator feedback on reload. The reconstruction loop skips any assistant line starting with EVALUATOR_PREFIX, so the evaluator's reasoning never re-enters the worker's context on the next request. That keeps the stored history clean and the prompt short, but it means cross-turn the worker can't see why it was corrected before. Within a turn the feedback flows fine through feedback_on_work. Across turns it's gone. That was a deliberate call to keep context small, and I'd revisit it if quality on multi-turn tasks dropped.
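One way to revisit that call without storing anything new would be to keep the filter but let the most recent evaluator line through. A sketch under assumptions: the EVALUATOR_PREFIX value below is a placeholder, and select_history is a hypothetical helper that works on the stored dicts before they become message objects.

```python
EVALUATOR_PREFIX = "Evaluator feedback: "  # placeholder; the real value is not shown


def select_history(history: list[dict], keep_last_feedback: bool = False) -> list[dict]:
    """Pick which stored lines re-enter the worker's context on reload."""

    def is_feedback(h: dict) -> bool:
        return h["role"] == "assistant" and h["content"].startswith(EVALUATOR_PREFIX)

    # Index of the most recent evaluator line, or None if there is none.
    last_feedback = max(
        (i for i, h in enumerate(history) if is_feedback(h)), default=None
    )
    selected = []
    for i, h in enumerate(history):
        if is_feedback(h) and not (keep_last_feedback and i == last_feedback):
            continue  # default behaviour: evaluator lines are dropped
        if h["role"] in ("user", "assistant"):
            selected.append(h)
    return selected
```

With keep_last_feedback left at False this selects the same lines as the reconstruction loop above, so the flag could be flipped per experiment without touching stored data.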
There's also a concurrency hole I'm aware of. DynamoDB writes here are last-write-wins with no conditional check. Two requests racing on the same session_id would clobber each other's history. A single user clicking once at a time never hits it, but it's the first thing I'd harden with a conditional write or a version attribute if this saw real concurrent traffic.
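The hardening could look roughly like this: an optimistic-locking write that only succeeds if the version read at the start of the request is still the one stored. This is a sketch, not the current code. It assumes a boto3 Table resource and a numeric version attribute on the item, both of which are my additions.

```python
from typing import Any


def put_session_versioned(
    table: Any, session_id: str, history: list[dict], expected_version: int
) -> int:
    """Write history only if the stored version is still the one we read.

    Returns the new version. If another request wrote first, DynamoDB rejects
    the put with ConditionalCheckFailedException, which propagates to the
    caller (re-read and retry, or report a conflict to the client).
    """
    new_version = expected_version + 1
    table.put_item(
        Item={"session_id": session_id, "history": history, "version": new_version},
        # Succeed for a brand-new session, or when nobody has bumped the version.
        ConditionExpression="attribute_not_exists(session_id) OR #v = :expected",
        # A name placeholder sidesteps any clash with DynamoDB reserved words.
        ExpressionAttributeNames={"#v": "version"},
        ExpressionAttributeValues={":expected": expected_version},
    )
    return new_version
```

The read side would have to return the stored version along with the history (0 for a session that does not exist yet) so the handler can pass it back in as expected_version.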
On cost, the DynamoDB table uses on-demand billing with a single partition key, which keeps a personal-scale app inside the free tier. Reads and writes are single-item operations by primary key, so latency is typically a few milliseconds and predictable. The expensive part of a request is the LLM calls in the loop, not the state layer. The loop is where the money goes.
Takeaway
On Lambda, don't make your framework's in-process memory the source of truth. Keep the compute disposable, own your durable state in something built for it, and let the checkpointer die every request.