<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: karchichen</title>
    <description>The latest articles on DEV Community by karchichen (@karchic_dev).</description>
    <link>https://dev.to/karchic_dev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3922967%2Fe2744971-2034-4fb7-9331-613a9f9e7fbf.png</url>
      <title>DEV Community: karchichen</title>
      <link>https://dev.to/karchic_dev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/karchic_dev"/>
    <language>en</language>
    <item>
      <title>5 production bugs I debugged in popular AI libraries this week (and the fix patterns you can steal)</title>
      <dc:creator>karchichen</dc:creator>
      <pubDate>Sun, 10 May 2026 07:45:16 +0000</pubDate>
      <link>https://dev.to/karchic_dev/5-production-bugs-i-debugged-in-popular-ai-libraries-this-week-and-the-fix-patterns-you-can-steal-n8e</link>
      <guid>https://dev.to/karchic_dev/5-production-bugs-i-debugged-in-popular-ai-libraries-this-week-and-the-fix-patterns-you-can-steal-n8e</guid>
      <description>&lt;p&gt;If you're building anything with LangChain, OpenAI's Python SDK, or LangGraph in&lt;br&gt;
production, you've probably run into the same class of bugs that have been&lt;br&gt;
quietly breaking real apps for months. Over the last week I dug through ~70&lt;br&gt;
open issues across these libraries while doing client debugging work, and there&lt;br&gt;
are five patterns that come up &lt;em&gt;constantly&lt;/em&gt; — each with a non-obvious root&lt;br&gt;
cause and a 5-line fix.&lt;/p&gt;

&lt;p&gt;Here they are, ranked by how much pain they cause.&lt;/p&gt;


&lt;h2&gt;
  
  
  1. &lt;code&gt;_create_usage_metadata&lt;/code&gt; crashes when &lt;code&gt;service_tier&lt;/code&gt; is set and &lt;code&gt;cached_tokens&lt;/code&gt; is missing
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;code&gt;langchain-ai/langchain&lt;/code&gt; (&lt;a href="https://github.com/langchain-ai/langchain/issues/36657" rel="noopener noreferrer"&gt;#36657&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The trap:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;prompt_details&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt_details&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
&lt;span class="n"&gt;prompt_details&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cached_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# &amp;lt;-- this line is the bug
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When &lt;code&gt;service_tier&lt;/code&gt; is set, the response includes &lt;code&gt;cached_tokens&lt;/code&gt; with an&lt;br&gt;
explicit &lt;code&gt;None&lt;/code&gt; rather than omitting the key. A subsequent &lt;code&gt;.get(key, 0)&lt;/code&gt; then&lt;br&gt;
sees the key as &lt;em&gt;present&lt;/em&gt; and returns &lt;code&gt;None&lt;/code&gt; instead of the default &lt;code&gt;0&lt;/code&gt;: the&lt;br&gt;
classic Python &lt;code&gt;dict.get&lt;/code&gt; + explicit-&lt;code&gt;None&lt;/code&gt; trap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;prompt_details&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cached_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use &lt;code&gt;or 0&lt;/code&gt; instead of the default-arg form: it handles both missing keys &lt;em&gt;and&lt;/em&gt;&lt;br&gt;
explicit &lt;code&gt;None&lt;/code&gt;. (&lt;code&gt;or&lt;/code&gt; also coerces any falsy value, but for a token count whose&lt;br&gt;
fallback is already &lt;code&gt;0&lt;/code&gt;, that's harmless.)&lt;/p&gt;
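&lt;p&gt;The trap is easy to reproduce in isolation. A minimal standalone demo (a hypothetical dict, not the actual LangChain code path):&lt;/p&gt;

```python
# A provider response that includes the key with an explicit None.
details = {"cached_tokens": None}

# The default argument only applies when the key is *missing*,
# so this returns None, not 0.
value_with_default = details.get("cached_tokens", 0)

# `or 0` catches both a missing key and an explicit None.
value_with_or = details.get("cached_tokens") or 0

print(value_with_default)  # None
print(value_with_or)       # 0
```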

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt; The bug only triggers when you've enabled cached tokens&lt;br&gt;
&lt;em&gt;and&lt;/em&gt; are using a service tier — which is exactly when you're optimizing cost,&lt;br&gt;
i.e. production traffic. In dev with &lt;code&gt;cached_tokens&lt;/code&gt; disabled, you never see it.&lt;/p&gt;


&lt;h2&gt;
  
  
  2. &lt;code&gt;@langchain/aws&lt;/code&gt; crashes on empty AIMessage text blocks in Bedrock Converse
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;code&gt;langchain-ai/langchainjs&lt;/code&gt; (&lt;a href="https://github.com/langchain-ai/langchainjs/issues/10782" rel="noopener noreferrer"&gt;#10782&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The trap:&lt;/strong&gt; When converting a LangChain &lt;code&gt;AIMessage&lt;/code&gt; to Bedrock Converse format,&lt;br&gt;
empty text blocks (e.g. when an LLM emits a tool-call without any prose) fall&lt;br&gt;
through to an &lt;code&gt;else&lt;/code&gt; branch that throws.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;textBlock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;textBlock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// skip empty blocks instead of throwing&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The subtler issue worth flagging upstream:&lt;/strong&gt; if the &lt;em&gt;only&lt;/em&gt; content is empty&lt;br&gt;
text + tool calls, the resulting Converse message has an empty &lt;code&gt;content&lt;/code&gt; array.&lt;br&gt;
Bedrock's API rejects that with a different error. So you need a second guard&lt;br&gt;
on the final message construction.&lt;/p&gt;
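&lt;p&gt;The upstream fix is TypeScript, but the two-guard shape is language-agnostic. A minimal Python sketch, assuming simple dict-shaped content blocks (the names here are illustrative, not the &lt;code&gt;@langchain/aws&lt;/code&gt; internals):&lt;/p&gt;

```python
def to_converse_content(blocks):
    """Convert content blocks, guarding both failure modes."""
    content = []
    for block in blocks:
        # Guard 1: skip empty text blocks instead of throwing.
        if block.get("type") == "text" and not block.get("text"):
            continue
        content.append(block)
    # Guard 2: Bedrock rejects a message whose content array is empty,
    # so refuse to construct one.
    if not content:
        raise ValueError("refusing to build a message with empty content")
    return content
```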


&lt;h2&gt;
  
  
  3. SSRF bypass in &lt;code&gt;validate_safe_url&lt;/code&gt; when &lt;code&gt;LANGCHAIN_ENV=local_test&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;code&gt;langchain-ai/langchain&lt;/code&gt; (&lt;a href="https://github.com/langchain-ai/langchain/issues/37297" rel="noopener noreferrer"&gt;#37297&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The trap:&lt;/strong&gt; A &lt;code&gt;local_test&lt;/code&gt; branch in the URL validator does an &lt;strong&gt;early return&lt;/strong&gt;&lt;br&gt;
before the hostname/IP validation runs. Set &lt;code&gt;LANGCHAIN_ENV=local_test&lt;/code&gt; and&lt;br&gt;
the entire SSRF check short-circuits — not just the localhost allowlist.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attack surface:&lt;/strong&gt; any code path that lets an attacker influence env vars (a&lt;br&gt;
misconfigured container, a dev-staging promotion mistake, a shared CI runner)&lt;br&gt;
gets a free pass on outbound URL filtering. SSRF to internal services becomes&lt;br&gt;
one env-var-flip away.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix shape:&lt;/strong&gt; &lt;code&gt;local_test&lt;/code&gt; mode should &lt;em&gt;widen&lt;/em&gt; which destinations are&lt;br&gt;
permitted, not skip validation entirely:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;env&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;local_test&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;allowlist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LOCAL_ALLOWLIST&lt;/span&gt;  &lt;span class="c1"&gt;# broader, but still validated
&lt;/span&gt;&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;allowlist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PROD_ALLOWLIST&lt;/span&gt;
&lt;span class="nf"&gt;validate_against&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;allowlist&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# always runs
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt; This is a CVE-class bug hiding behind what looks like&lt;br&gt;
an env-gated dev convenience. If you're running LangChain in any env that&lt;br&gt;
ingests user URLs, audit your &lt;code&gt;validate_safe_url&lt;/code&gt; calls today.&lt;/p&gt;


&lt;h2&gt;
  
  
  4. &lt;code&gt;create_agent&lt;/code&gt; exits prematurely on next turn with stale &lt;code&gt;structured_response&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;code&gt;langchain-ai/langchain&lt;/code&gt; (&lt;a href="https://github.com/langchain-ai/langchain/issues/36957" rel="noopener noreferrer"&gt;#36957&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The trap:&lt;/strong&gt; &lt;code&gt;create_agent&lt;/code&gt; doesn't clear or re-validate &lt;code&gt;structured_response&lt;/code&gt;&lt;br&gt;
from the checkpoint state at the start of a new turn. If the previous run&lt;br&gt;
ended with a populated &lt;code&gt;structured_response&lt;/code&gt;, the graph's conditional edge&lt;br&gt;
sees it as a valid terminal signal and routes to END &lt;em&gt;before the LLM even runs&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;You'll see this as: "my agent worked perfectly on turn 1, then on turn 2 it&lt;br&gt;
just... returns immediately with no LLM call." Easy to misdiagnose as a tool&lt;br&gt;
binding issue.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; explicitly reset &lt;code&gt;structured_response&lt;/code&gt; to &lt;code&gt;None&lt;/code&gt; in your state&lt;br&gt;
initializer or in a pre-processing node before the agent node executes each turn:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;reset_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;structured_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reset&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reset_response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;START&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reset&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reset&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  5. Responses stream accumulator crashes when &lt;code&gt;response.output_item.added&lt;/code&gt; has &lt;code&gt;item=None&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;code&gt;openai/openai-python&lt;/code&gt; (&lt;a href="https://github.com/openai/openai-python/issues/3125" rel="noopener noreferrer"&gt;#3125&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The trap:&lt;/strong&gt; The streaming accumulator hard-crashes when a provider sends a&lt;br&gt;
&lt;code&gt;response.output_item.added&lt;/code&gt; event where &lt;code&gt;item&lt;/code&gt; is &lt;code&gt;None&lt;/code&gt;. This happens&lt;br&gt;
intermittently with some Responses API providers — especially during high&lt;br&gt;
concurrency or when streaming through proxies that re-batch events.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;snapshot&lt;/span&gt;  &lt;span class="c1"&gt;# graceful no-op
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Don't use &lt;code&gt;getattr(event, "item", None)&lt;/code&gt; — &lt;code&gt;item&lt;/code&gt; is a defined attribute on&lt;br&gt;
the Pydantic model, just nullable. The right check is &lt;code&gt;event.item is None&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Defense-in-depth:&lt;/strong&gt; the same guard belongs on &lt;code&gt;response.output_item.done&lt;/code&gt;,&lt;br&gt;
since a provider that sends null on &lt;code&gt;added&lt;/code&gt; could trivially do the same on&lt;br&gt;
&lt;code&gt;done&lt;/code&gt;.&lt;/p&gt;
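&lt;p&gt;One guard can cover both event types. A sketch with dict-shaped events standing in for the SDK's Pydantic models (the accumulator shape is simplified, not the real &lt;code&gt;openai-python&lt;/code&gt; internals):&lt;/p&gt;

```python
def apply_output_item_event(snapshot, event):
    """Fold an output_item event into the snapshot, tolerating null items."""
    # Graceful no-op for both .added and .done when item is null.
    if event.get("item") is None:
        return snapshot
    if event.get("type") == "response.output_item.added":
        return snapshot + [event["item"]]
    # A .done event with a real item would update the matching entry here.
    return snapshot
```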




&lt;h2&gt;
  
  
  The pattern across all five
&lt;/h2&gt;

&lt;p&gt;Notice anything? Every single one of these bugs is at the &lt;strong&gt;boundary&lt;/strong&gt; between&lt;br&gt;
your code and an external system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LLM provider returning ambiguous null vs missing&lt;/li&gt;
&lt;li&gt;LLM provider streaming events that don't match schema&lt;/li&gt;
&lt;li&gt;User-controlled input (env vars, URLs) flowing into a security check&lt;/li&gt;
&lt;li&gt;Checkpoint state surviving across turns when it shouldn't&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're building an AI app and want to harden it before something breaks at&lt;br&gt;
2am, &lt;strong&gt;audit your boundary code first&lt;/strong&gt;. Provider responses, streaming events,&lt;br&gt;
checkpoint deserialization, env-var-gated branches. That's where 80% of&lt;br&gt;
production AI bugs live.&lt;/p&gt;

&lt;p&gt;If you've hit a similar bug and want a second set of eyes on it, my GitHub is&lt;br&gt;
&lt;a href="https://github.com/KarchiChen" rel="noopener noreferrer"&gt;github.com/KarchiChen&lt;/a&gt; — I'm tracking issues&lt;br&gt;
across LangChain / LangGraph / OpenAI Python / Anthropic SDK and happy to look&lt;br&gt;
at specific repros.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally drafted from a week of debugging vibe-coded AI MVPs. Each issue&lt;br&gt;
above links to the live GitHub thread — feel free to upvote / pile on if it&lt;br&gt;
matches what you've seen.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>langchain</category>
      <category>debugging</category>
    </item>
  </channel>
</rss>
