<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Akash Hadagali Persetti</title>
    <description>The latest articles on DEV Community by Akash Hadagali Persetti (@akashpersetti).</description>
    <link>https://dev.to/akashpersetti</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4008516%2F8863d5f9-a35a-4e4a-8d34-a9f7164efaae.png</url>
      <title>DEV Community: Akash Hadagali Persetti</title>
      <link>https://dev.to/akashpersetti</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/akashpersetti"/>
    <language>en</language>
    <item>
      <title>Why my LangGraph agent throws away its checkpoint on every request</title>
      <dc:creator>Akash Hadagali Persetti</dc:creator>
      <pubDate>Mon, 29 Jun 2026 16:56:22 +0000</pubDate>
      <link>https://dev.to/akashpersetti/why-my-langgraph-agent-throws-away-its-checkpoint-on-every-request-21ff</link>
      <guid>https://dev.to/akashpersetti/why-my-langgraph-agent-throws-away-its-checkpoint-on-every-request-21ff</guid>
      <description>&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;Wingman is a worker-evaluator agent. A worker LLM tries to meet a set of success criteria, an evaluator decides whether the answer passed, and if it failed the worker retries, up to 5 times. Standard loop.&lt;/p&gt;
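&lt;p&gt;For readers who haven't built one, the loop fits in a few lines. The sketch below is minimal and framework-free, and it is not Wingman's actual LangGraph graph. &lt;code&gt;worker_evaluator_loop&lt;/code&gt; and &lt;code&gt;Verdict&lt;/code&gt; are hypothetical names. It assumes the worker and evaluator are plain callables standing in for the two LLM calls, and that the cap counts total worker attempts.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_TURNS = 5  # cap from the post; assumed here to count total worker attempts


@dataclass
class Verdict:
    passed: bool
    feedback: str


def worker_evaluator_loop(
    task: str,
    criteria: str,
    worker: Callable[[str, Optional[str]], str],
    evaluator: Callable[[str, str], Verdict],
) -> tuple[str, int]:
    """Hypothetical sketch: alternate worker and evaluator until a pass or the cap.

    Returns the last answer and the number of worker attempts made.
    """
    feedback: Optional[str] = None
    answer = ""
    for attempt in range(1, MAX_TURNS + 1):
        answer = worker(task, feedback)        # worker sees the previous feedback, if any
        verdict = evaluator(answer, criteria)  # evaluator judges against the criteria
        if verdict.passed:
            return answer, attempt
        feedback = verdict.feedback            # handed to the next attempt
    return answer, MAX_TURNS
```

&lt;p&gt;Each failed attempt's feedback goes to the next worker call. In the real state, &lt;code&gt;feedback_on_work&lt;/code&gt; plays that role.&lt;/p&gt;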

&lt;p&gt;The deployment is where it gets opinionated. It runs on AWS Lambda behind API Gateway, packaged as a container image on ECR. Lambda is stateless by design. The execution environment can be frozen, thawed, or thrown away between requests, and you get no say in when. So the question that drives the whole architecture is: where does the conversation live between turns?&lt;/p&gt;

&lt;p&gt;LangGraph has a built-in answer. You attach a checkpointer, give each conversation a &lt;code&gt;thread_id&lt;/code&gt;, and the framework persists graph state after every superstep. On the next call you pass the same &lt;code&gt;thread_id&lt;/code&gt; and it resumes. That's the blessed path, and on a long-lived server it's the right one.&lt;/p&gt;

&lt;p&gt;On Lambda it falls apart. The usual in-memory checkpointer, &lt;code&gt;MemorySaver&lt;/code&gt;, keeps checkpoints in process memory. When Lambda freezes the environment, that memory might survive until the next warm invocation, or the environment might be discarded along with it. You cannot tell from inside the handler. A user's second message could land on a fresh execution environment with an empty checkpoint store, and the agent would forget the conversation. Durable checkpointer backends exist, but they pull you toward keeping a database connection warm and treating the LangGraph thread as your source of truth.&lt;/p&gt;

&lt;h2&gt;
  
  
  The decision
&lt;/h2&gt;

&lt;p&gt;I stopped trying to make the framework's persistence survive Lambda. Instead I made the compute fully stateless and kept the durable state myself.&lt;/p&gt;

&lt;p&gt;The conversation history is a plain list of &lt;code&gt;{role, content}&lt;/code&gt; dicts. It lives in DynamoDB, keyed by &lt;code&gt;session_id&lt;/code&gt;. On every request the handler reads that history. The agent then rebuilds the entire LangGraph state from scratch and runs the graph once on the new message, and the handler writes the updated history back.&lt;/p&gt;

&lt;p&gt;The checkpointer is still there. It's just deliberately disposable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;checkpointer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;MemorySaver&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Fresh thread ID each call — MemorySaver only lives for this invocation
&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;configurable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;thread_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())}}&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ainvoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A new &lt;code&gt;thread_id&lt;/code&gt; every call means the checkpointer never resumes anything. Each invocation writes checkpoints under an ID nobody will ask for again. The checkpointer is there only because the graph is compiled with one, and nothing it stores is read after the call returns. The real memory is the &lt;code&gt;history&lt;/code&gt; row in DynamoDB. LangGraph's persistence layer became a no-op on purpose.&lt;/p&gt;
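&lt;p&gt;A toy model makes the no-op concrete. Below, a plain dict keyed by thread ID stands in for the checkpointer. &lt;code&gt;ToyCheckpointer&lt;/code&gt; is hypothetical. It mimics only the resume-by-&lt;code&gt;thread_id&lt;/code&gt; behavior, not &lt;code&gt;MemorySaver&lt;/code&gt;'s real interface.&lt;/p&gt;

```python
import uuid


class ToyCheckpointer:
    """Toy stand-in for an in-memory checkpointer: state saved per thread_id.

    Illustration only; this is not LangGraph's MemorySaver API.
    """

    def __init__(self) -> None:
        self.threads: dict[str, list[str]] = {}

    def run(self, thread_id: str, new_messages: list[str]) -> list[str]:
        # Resume whatever was saved under this thread_id (nothing if unseen)...
        state = self.threads.get(thread_id, []) + new_messages
        # ...then save the result, as a checkpointer would after each step.
        self.threads[thread_id] = state
        return state


store = ToyCheckpointer()

# Stable thread_id: the second call resumes the first.
store.run("session-1", ["hi"])
resumed = store.run("session-1", ["again"])

# Fresh uuid4 per call: each run starts from only what the caller passes in,
# and the saved entries are never read back.
fresh_a = store.run(str(uuid.uuid4()), ["hi"])
fresh_b = store.run(str(uuid.uuid4()), ["again"])
```

&lt;p&gt;With a stable ID, the second call sees both messages. With a fresh &lt;code&gt;uuid4&lt;/code&gt;, each call sees only what the caller passed in, which is why the agent has to be handed the full history every time. In this model the orphaned entries also pile up for as long as the process lives.&lt;/p&gt;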

&lt;p&gt;Reconstruction is just a loop that turns stored dicts back into message objects:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;past_messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;past_messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;EVALUATOR_PREFIX&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;past_messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;AIMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;

&lt;span class="n"&gt;past_messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;past_messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;success_criteria&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;success_criteria&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The answer should be clear and accurate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;feedback_on_work&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;success_criteria_met&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_input_needed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;turn_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The request handler is boring, which is the point:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/api/chat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatRequest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_get_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;            &lt;span class="c1"&gt;# read
&lt;/span&gt;    &lt;span class="n"&gt;new_history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;wingman&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_superstep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;            &lt;span class="c1"&gt;# rebuild + run
&lt;/span&gt;        &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;success_criteria&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;_put_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_history&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;         &lt;span class="c1"&gt;# write
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;new_history&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Read, rebuild, run, write. No connection to keep warm, no checkpoint to hope survived. Any Lambda environment can serve any request for any session, because nothing important lives in the compute.&lt;/p&gt;

&lt;p&gt;I rejected three alternatives. Keeping state in Lambda memory loses conversations on cold start, which is the bug described above. A durable LangGraph checkpointer (DynamoDB or Postgres backend) would work, but it makes the framework's thread the source of truth and ties me to its serialization format. Putting the whole history in the request payload pushes state to the client and makes every round trip grow with the conversation. DynamoDB on-demand, keyed by session, was the least clever option, and least clever is what you want for state.&lt;/p&gt;

&lt;h2&gt;
  
  
  What broke, and what I would change
&lt;/h2&gt;

&lt;p&gt;Two things in this code are real tradeoffs, not loose ends waiting for polish.&lt;/p&gt;

&lt;p&gt;First, retries don't survive a request. &lt;code&gt;turn_count&lt;/code&gt; resets to 0 on every reconstruction, so the &lt;code&gt;MAX_TURNS = 5&lt;/code&gt; cap only applies within a single request. The worker-evaluator loop can burn up to 5 retries answering one message, but it carries no retry budget across messages. For this app that's fine, since each user turn is its own task. If a single user task spanned multiple requests, I'd be silently resetting the budget, and I'd need to persist &lt;code&gt;turn_count&lt;/code&gt; into the DynamoDB item alongside &lt;code&gt;history&lt;/code&gt;.&lt;/p&gt;
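&lt;p&gt;If tasks did span requests, the fix could be as small as seeding &lt;code&gt;turn_count&lt;/code&gt; from the stored item instead of 0. This is a hypothetical sketch. It assumes the item gains a &lt;code&gt;turn_count&lt;/code&gt; attribute that is written back after each run, and that the caller can tell whether a message starts a new task.&lt;/p&gt;

```python
MAX_TURNS = 5  # same cap as in the post


def initial_turn_count(item: dict, new_task: bool) -> int:
    """Hypothetical: seed turn_count from the stored item instead of always 0.

    Assumes the DynamoDB item carries a "turn_count" attribute written back
    after each request, and that the caller knows whether this message starts
    a new task (which resets the budget) or continues the current one.
    """
    if new_task:
        return 0
    # boto3's resource layer returns DynamoDB numbers as Decimal.
    return int(item.get("turn_count", 0))


def remaining_budget(turn_count: int) -> int:
    # Attempts left before the cap; never negative.
    return max(MAX_TURNS - turn_count, 0)
```

&lt;p&gt;The reconstructed state would then use &lt;code&gt;initial_turn_count(item, new_task)&lt;/code&gt; in place of the literal 0, and the final &lt;code&gt;turn_count&lt;/code&gt; would be written back alongside &lt;code&gt;history&lt;/code&gt;. Deciding what counts as a new task is the hard part, and this sketch leaves that to the caller.&lt;/p&gt;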

&lt;p&gt;Second, I throw away evaluator feedback on reload. The reconstruction loop skips any assistant message starting with &lt;code&gt;EVALUATOR_PREFIX&lt;/code&gt;, so the evaluator's reasoning never re-enters the worker's context on the next request. That keeps the stored history clean and the prompt short, but it means that on later turns the worker can't see why it was corrected earlier. Within a turn the feedback flows fine through &lt;code&gt;feedback_on_work&lt;/code&gt;. Across turns it's gone. That was a deliberate call to keep context small, and I'd revisit it if quality on multi-turn tasks dropped.&lt;/p&gt;
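&lt;p&gt;If that revisit happens, one middle ground is to keep dropping evaluator lines from the message list but recover the most recent one and pass it as &lt;code&gt;feedback_on_work&lt;/code&gt; instead of &lt;code&gt;None&lt;/code&gt;. This is a hypothetical sketch. The post doesn't show the real value of &lt;code&gt;EVALUATOR_PREFIX&lt;/code&gt;, so the one below is a placeholder.&lt;/p&gt;

```python
from typing import Optional

# Placeholder value: the post doesn't show the real EVALUATOR_PREFIX.
EVALUATOR_PREFIX = "[evaluator] "


def last_evaluator_feedback(
    history: list[dict], prefix: str = EVALUATOR_PREFIX
) -> Optional[str]:
    """Hypothetical: return the newest evaluator note in stored history, minus its prefix.

    An alternative to discarding every evaluator line on reload: keep only the
    latest one and feed it in as feedback_on_work for the next request.
    """
    for h in reversed(history):
        if h["role"] == "assistant" and h["content"].startswith(prefix):
            return h["content"][len(prefix):].strip()
    return None  # no evaluator feedback stored yet
```

&lt;p&gt;This adds at most one short note to the next prompt, so the context stays small. Whether stale feedback from the previous question helps or hurts the next answer would need testing.&lt;/p&gt;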

&lt;p&gt;There's also a concurrency hole I'm aware of. DynamoDB writes here are last-write-wins with no conditional check. Two requests racing on the same &lt;code&gt;session_id&lt;/code&gt; would each read the same history, and the later write would silently overwrite the other's turn. A single user sending one message at a time never hits it, but it's the first thing I'd harden with a conditional write or a version attribute if this saw real concurrent traffic.&lt;/p&gt;
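&lt;p&gt;A version attribute is the usual shape of that hardening. You read the version with the history, then make the write conditional on the version being unchanged. Below is a minimal sketch. It assumes a numeric &lt;code&gt;version&lt;/code&gt; attribute on the item (hypothetical; the current table has none) and a read helper that returns it. &lt;code&gt;ConditionExpression&lt;/code&gt;, &lt;code&gt;ExpressionAttributeNames&lt;/code&gt;, and &lt;code&gt;ExpressionAttributeValues&lt;/code&gt; are standard boto3 &lt;code&gt;put_item&lt;/code&gt; parameters, but the function names are made up.&lt;/p&gt;

```python
from typing import Any, Optional


def versioned_put_kwargs(
    session_id: str, history: list[dict], expected_version: Optional[int]
) -> dict:
    """Build put_item arguments for an optimistic-concurrency write.

    expected_version is the version read alongside the history;
    None means the session row did not exist at read time.
    """
    if expected_version is None:
        return {
            "Item": {"session_id": session_id, "history": history, "version": 1},
            # First write: fail if another request created the row meanwhile.
            "ConditionExpression": "attribute_not_exists(session_id)",
        }
    return {
        "Item": {
            "session_id": session_id,
            "history": history,
            "version": expected_version + 1,
        },
        # Later writes: fail unless nobody bumped the version since our read.
        "ConditionExpression": "#v = :expected",
        "ExpressionAttributeNames": {"#v": "version"},
        "ExpressionAttributeValues": {":expected": expected_version},
    }


def put_session_versioned(
    table: Any, session_id: str, history: list[dict], expected_version: Optional[int]
) -> bool:
    """Return True if the write landed, False if a concurrent request won the race."""
    try:
        table.put_item(**versioned_put_kwargs(session_id, history, expected_version))
        return True
    except Exception as err:  # botocore's ClientError in practice; duck-typed here
        code = getattr(err, "response", {}).get("Error", {}).get("Code")
        if code == "ConditionalCheckFailedException":
            return False
        raise
```

&lt;p&gt;When the write returns &lt;code&gt;False&lt;/code&gt;, the handler could re-read and rerun the turn, or return a conflict to the client. Either way, the losing request no longer silently erases the winner's turn.&lt;/p&gt;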

&lt;p&gt;On cost, the DynamoDB table uses on-demand billing and a single partition key, which keeps the state layer's cost negligible at personal scale. Reads and writes are single-item operations by primary key, so latency is a few milliseconds and predictable. The expensive part of a request is the LLM calls in the worker-evaluator loop, not the state layer. That loop is where the money goes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaway
&lt;/h2&gt;

&lt;p&gt;On Lambda, don't make your framework's in-process memory the source of truth. Keep the compute disposable, own your durable state in something built for it, and let the checkpointer die every request.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>architecture</category>
      <category>aws</category>
      <category>serverless</category>
    </item>
  </channel>
</rss>
