<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: souvik roy</title>
    <description>The latest articles on DEV Community by souvik roy (@opengraph-tech).</description>
    <link>https://dev.to/opengraph-tech</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3895747%2Fb533a93e-8b0d-4c63-b9b5-e2fad878722c.jpg</url>
      <title>DEV Community: souvik roy</title>
      <link>https://dev.to/opengraph-tech</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/opengraph-tech"/>
    <language>en</language>
    <item>
      <title>The 5 Places Every AI Agent Dies (and the 4,000-Line Repo That Fixes All Five)</title>
      <dc:creator>souvik roy</dc:creator>
      <pubDate>Fri, 24 Apr 2026 10:25:00 +0000</pubDate>
      <link>https://dev.to/opengraph-tech/the-5-places-every-ai-agent-dies-and-the-4000-line-repo-that-fixes-all-five-2c7g</link>
      <guid>https://dev.to/opengraph-tech/the-5-places-every-ai-agent-dies-and-the-4000-line-repo-that-fixes-all-five-2c7g</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Most agents fail in one of five places. OpenAgent is built for all five."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I've spent the last six months debugging other people's agents in production. Different stacks, different models, different domains. Same five failure modes. Every time.&lt;/p&gt;

&lt;p&gt;If you've ever shipped an agent that worked beautifully in the demo and then produced a confident, fluent, &lt;strong&gt;completely wrong&lt;/strong&gt; answer in prod — this post is for you.&lt;/p&gt;

&lt;p&gt;We'll walk through the five places agents die, and we'll use a repo called &lt;a href="https://github.com/OpenGraph-AI/OpenAgent" rel="noopener noreferrer"&gt;&lt;strong&gt;OpenAgent&lt;/strong&gt;&lt;/a&gt; as our reference implementation. It's ~4,000 lines. MIT. Reads like a cookbook. By the end of this post, you'll either fork it, or you'll understand your own agent well enough to fix it.&lt;/p&gt;

&lt;p&gt;Either is a win.&lt;/p&gt;




&lt;h2&gt;
  
  
  The TL;DR (for the skimmers)
&lt;/h2&gt;

&lt;p&gt;Every production agent needs to answer five questions, in order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Intent&lt;/strong&gt; — what is the user &lt;em&gt;actually&lt;/em&gt; asking?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ambiguity&lt;/strong&gt; — what do I not know yet?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clarifier&lt;/strong&gt; — should I ask, or look it up?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Planner&lt;/strong&gt; — what's the smallest set of steps that gets us there?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Executor&lt;/strong&gt; — how do I run those steps without losing the thread?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Most agent frameworks collapse two or three of these into a single prompt. That's why they break in ways you can't debug. OpenAgent keeps all five as separate, typed, testable stages.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Intent ▸ Ambiguity ▸ Clarifier ▸ Planner ▸ Executor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each stage has a &lt;strong&gt;typed input&lt;/strong&gt; and a &lt;strong&gt;typed output&lt;/strong&gt;. The Pydantic schema between any two stages is your test surface — and your debug trail.&lt;/p&gt;
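&lt;p&gt;The repo uses Pydantic for these contracts; here is a dependency-free sketch of the same idea using stdlib dataclasses. The names and the 0.7 rule are illustrative, not lifted from the repo:&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Intent:
    goal: str
    confidence: float

    def __post_init__(self):
        # fail at the stage boundary, not six steps downstream
        if self.confidence > 1.0 or 0.0 > self.confidence:
            raise ValueError("confidence must be in [0, 1]")

@dataclass(frozen=True)
class AmbiguityReport:
    needs_clarification: bool
    reasoning: str

def ambiguity_stage(intent):
    # toy rule: shaky extraction confidence means "ask before planning"
    clear = intent.confidence >= 0.7
    return AmbiguityReport(
        needs_clarification=not clear,
        reasoning=f"extraction confidence {intent.confidence}",
    )

report = ambiguity_stage(Intent(goal="summarize the launch post", confidence=0.9))
print(report.needs_clarification)  # False
```

&lt;p&gt;A malformed hand-off fails at construction time, which is exactly the "parse time, not six steps later" property the typed boundary buys you.&lt;/p&gt;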




&lt;h2&gt;
  
  
  Why most agents fail
&lt;/h2&gt;

&lt;p&gt;Picture the most common agent architecture shipped in 2024–2025:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;conversation&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;done&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This "mega-prompt in a while loop" works for demos. It dies in prod because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;There's no notion of "what does the user actually want" separate from "what do I do next."&lt;/li&gt;
&lt;li&gt;There's no moment where the agent admits it doesn't know enough.&lt;/li&gt;
&lt;li&gt;There's no plan you can inspect, edit, or resume.&lt;/li&gt;
&lt;li&gt;When it breaks, the only debug signal is a 4,000-token transcript.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don't have an agent. You have a stochastic &lt;code&gt;while&lt;/code&gt; loop.&lt;/p&gt;

&lt;p&gt;OpenAgent's thesis: &lt;strong&gt;split the loop into five small specialists, wire them with typed contracts, and never let the LLM decide control flow.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let's go stage by stage.&lt;/p&gt;




&lt;h2&gt;
  
  
  Stage 1 — Intent: turn fuzz into a function signature
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The problem.&lt;/strong&gt; Humans don't type goals. They type fragments, vibes, half-sentences. &lt;code&gt;"can you make this better"&lt;/code&gt; is not a specification. Executing on it gives you a confident wrong answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The job.&lt;/strong&gt; Turn raw text into a typed object. Goal, context, constraints, output format, success criteria — and, crucially, &lt;strong&gt;alternative interpretations&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;IntentSchema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;constraints&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;expected_output_format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;success_criteria&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;alternative_interpretations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="c1"&gt;# your future self thanks you
&lt;/span&gt;    &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;                         &lt;span class="c1"&gt;# triggers the next stage
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That last field — &lt;code&gt;alternative_interpretations&lt;/code&gt; — is the one most tutorials skip and the one that saves you. If the model lists three plausible reads of the request, you have a signal that the intent is fuzzy. That signal flows into Stage 2.&lt;/p&gt;
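&lt;p&gt;As a toy sketch of that signal (the function and threshold here are mine, not the repo's):&lt;/p&gt;

```python
def is_fuzzy(alternative_interpretations, confidence):
    # more than one plausible reading, or shaky extraction confidence
    # (below 0.7), both mean: do not execute yet
    return len(alternative_interpretations) > 1 or 0.7 > confidence

print(is_fuzzy(["tighten prose", "shorten it", "fix grammar"], 0.9))  # True
print(is_fuzzy(["fix the failing test"], 0.95))                       # False
```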

&lt;p&gt;&lt;strong&gt;Mental model:&lt;/strong&gt; intent is the &lt;em&gt;function signature&lt;/em&gt;. Until you have it, you don't have a problem. You have a feeling.&lt;/p&gt;




&lt;h2&gt;
  
  
  Stage 2 — Ambiguity: your agent's epistemic humility layer
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The problem.&lt;/strong&gt; Even a cleanly extracted intent can be under-specified. &lt;code&gt;"Write a blog post about our launch"&lt;/code&gt; is structurally fine but missing audience, length, tone, deadline, channel. An agent that steamrolls past this produces a polished artifact no one asked for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The job.&lt;/strong&gt; Audit the intent along fixed dimensions — scope, audience, depth, format, deadline, domain — and flag each with a severity.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AmbiguityReport&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;AmbiguityFlag&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="c1"&gt;# dimension, level, impact
&lt;/span&gt;    &lt;span class="n"&gt;needs_clarification&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;
    &lt;span class="n"&gt;reasoning&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This report is a &lt;strong&gt;decision gate&lt;/strong&gt;. The pipeline branches on &lt;code&gt;needs_clarification&lt;/code&gt;, not on a gut feel. If medium-or-higher flags exist, we route to Stage 3. Otherwise we sail straight to planning.&lt;/p&gt;
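&lt;p&gt;The branch itself can live in plain Python. A hypothetical version of the gate, with severity levels as described above:&lt;/p&gt;

```python
SEVERITY = {"low": 0, "medium": 1, "high": 2}

def route(flags):
    # branch in code, not in a prompt: any medium-or-higher flag sends
    # the session to the clarifier, otherwise straight to planning
    needs_clarification = any(SEVERITY[f["level"]] >= SEVERITY["medium"] for f in flags)
    return "clarifier" if needs_clarification else "planner"

print(route([{"dimension": "audience", "level": "high"}]))  # clarifier
print(route([{"dimension": "format", "level": "low"}]))     # planner
```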

&lt;p&gt;&lt;strong&gt;Key separation:&lt;/strong&gt; the ambiguity agent flags &lt;em&gt;what's missing&lt;/em&gt;. It does &lt;strong&gt;not&lt;/strong&gt; write the clarifying questions. Mixing those two jobs optimizes both poorly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Stage 3 — Clarifier: ask, or look it up?
&lt;/h2&gt;

&lt;p&gt;This is the stage that makes OpenAgent &lt;em&gt;feel&lt;/em&gt; different from every other agent you've built.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The naive fix for ambiguity is "just ask the user."&lt;/strong&gt; Do that every time and your agent becomes a questionnaire. Users bounce after three questions. Seven is a bloodbath.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The better fix:&lt;/strong&gt; answer what the web can answer. Ask the user only for what they alone know.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# 1. Derive targeted web queries from each ambiguity flag
&lt;/span&gt;&lt;span class="n"&gt;questions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;search_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;clarifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_questions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ambiguity_report&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. For each question: can this be confidently answered from the search results?
&lt;/span&gt;&lt;span class="n"&gt;questions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;clarifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;auto_resolve_questions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;questions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;search_results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ambiguity_report&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Split resolved from unresolved
&lt;/span&gt;&lt;span class="n"&gt;unresolved&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;questions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;questions&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;auto_resolved&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;unresolved&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;send_to_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;unresolved&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;         &lt;span class="c1"&gt;# pause the pipeline
&lt;/span&gt;    &lt;span class="n"&gt;answers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;wait_for_user_response&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;answers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;clarifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build_auto_answers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;questions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# sail through
&lt;/span&gt;
&lt;span class="n"&gt;clarified&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;clarifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process_answers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ambiguity_report&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;answers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Think of this as a &lt;strong&gt;cost-triage&lt;/strong&gt; step. User attention is the most expensive resource your agent has. Spend it only on personal or organizational context — things that are &lt;em&gt;genuinely&lt;/em&gt; unknowable without the user.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Defaults worth stealing:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Confidence threshold for auto-resolve: &lt;code&gt;0.7&lt;/code&gt;. Lower and the model confabulates sources.&lt;/li&gt;
&lt;li&gt;Max questions to the user: &lt;strong&gt;3&lt;/strong&gt;. Users answer 3. They abandon 7.&lt;/li&gt;
&lt;/ul&gt;
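&lt;p&gt;Put together, the triage rule is a few lines of Python. This is a sketch of the idea, not the repo's code; the dict shape is assumed:&lt;/p&gt;

```python
MAX_USER_QUESTIONS = 3
AUTO_RESOLVE_THRESHOLD = 0.7

def triage(questions):
    # questions: dicts with "text", "web_answer", "web_confidence".
    # Anything the web answered confidently never reaches the user;
    # the remainder is capped, since users answer 3 and abandon 7.
    resolved = [
        q for q in questions
        if q["web_answer"] is not None and q["web_confidence"] >= AUTO_RESOLVE_THRESHOLD
    ]
    unresolved = [q for q in questions if q not in resolved]
    return resolved, unresolved[:MAX_USER_QUESTIONS]

questions = [
    {"text": "What is the product's pricing model?", "web_answer": "tiered SaaS", "web_confidence": 0.9},
    {"text": "Who is the post's audience?", "web_answer": None, "web_confidence": 0.0},
    {"text": "What tone do you want?", "web_answer": None, "web_confidence": 0.0},
]
resolved, to_user = triage(questions)
print(len(resolved), len(to_user))  # 1 2
```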

&lt;h2&gt;
  
  
  Stage 4 — Planner: a DAG, not a vibe
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The problem.&lt;/strong&gt; Dropping a clarified goal into a single "do the thing" prompt gives you a brittle monolith. The model can't back up. You can't resume. You can't verify anything until the whole thing finishes — and by then, you're five paragraphs deep into the wrong answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The job.&lt;/strong&gt; Turn a clarified intent into &lt;strong&gt;numbered, dependency-aware, independently verifiable steps&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PlanStep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;step_number&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;expected_output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;dependencies&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;       &lt;span class="c1"&gt;# topological execution
&lt;/span&gt;    &lt;span class="n"&gt;validation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;               &lt;span class="c1"&gt;# how to know it succeeded
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two fields here are non-negotiable in production:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;dependencies&lt;/code&gt;&lt;/strong&gt; — lets the executor topologically sort, and later parallelize independent branches.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;validation&lt;/code&gt;&lt;/strong&gt; — turns "done" from a feeling into a checkable predicate.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you can't write a validation criterion, the step is too vague. Rewrite it. This alone will make your agent 10× more reliable.&lt;/p&gt;
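&lt;p&gt;The &lt;code&gt;dependencies&lt;/code&gt; field maps directly onto the stdlib's &lt;code&gt;graphlib&lt;/code&gt; (Python 3.9+), which also catches cyclic plans for free. A minimal sketch, with the plan reduced to a step-to-dependencies dict:&lt;/p&gt;

```python
from graphlib import TopologicalSorter

# a plan as {step_number: list of dependency step_numbers}; the stdlib
# sorter yields a dependency-respecting order and raises CycleError at
# load time if the model ever emits a circular plan
plan = {1: [], 2: [1], 3: [1], 4: [2, 3]}
order = list(TopologicalSorter(plan).static_order())
print(order)  # a valid execution order: 1 first, 4 last
```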




&lt;h2&gt;
  
  
  Stage 4.5 — Context: gather before you act
&lt;/h2&gt;

&lt;p&gt;Most tutorials skip this. It's the highest-leverage hidden stage in the whole pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem.&lt;/strong&gt; An executor that reaches for tools &lt;em&gt;mid-generation&lt;/em&gt; is slow and erratic. The model decides while generating what to search for, then context-switches. Quality drops.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The job.&lt;/strong&gt; Before executing &lt;em&gt;any&lt;/em&gt; step, read the whole plan and, for each step, decide what it needs: knowledge-base lookups, web searches, outputs from dependency steps. Fan out all retrievals &lt;strong&gt;in parallel&lt;/strong&gt;. Attach the results to each step as a &lt;code&gt;StepContext&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;resource_plan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;context_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;clarified_intent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_state&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# resource_plan.step_contexts[i] contains everything step i needs
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Separate gathering (embarrassingly parallel) from reasoning (serial). Don't interleave them. Sequential retrievals leave 3–5× latency on the table.&lt;/p&gt;
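&lt;p&gt;In asyncio terms, "fan out in parallel" is one &lt;code&gt;asyncio.gather&lt;/code&gt; call. A minimal sketch with a stand-in fetch (the resource names are made up):&lt;/p&gt;

```python
import asyncio

async def fetch(resource):
    # stand-in for a knowledge-base lookup or web search
    await asyncio.sleep(0.1)
    return f"result for {resource}"

async def gather_contexts(resources):
    # fan out every retrieval at once: wall time is the slowest single
    # fetch, not the sum of all of them
    results = await asyncio.gather(*(fetch(r) for r in resources))
    return dict(zip(resources, results))

contexts = asyncio.run(gather_contexts(["kb:pricing", "web:competitors", "step:2 output"]))
print(contexts["kb:pricing"])  # result for kb:pricing
```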




&lt;h2&gt;
  
  
  Stage 5 — Executor: run the steps, keep the thread, prove you hit the goal
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The problem.&lt;/strong&gt; "Execute the plan" is another vibe. A real executor has to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run steps in dependency order.&lt;/li&gt;
&lt;li&gt;Pass prior outputs into dependents.&lt;/li&gt;
&lt;li&gt;Stream chunks to the UI so it doesn't freeze.&lt;/li&gt;
&lt;li&gt;Survive a step failure without corrupting the rest.&lt;/li&gt;
&lt;li&gt;At the end, &lt;strong&gt;prove the deliverable actually answers the original goal&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;topological_sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;steps&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resource_plan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;step_contexts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;step_number&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;deps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;step_results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dependencies&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;on_stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;send_chunk&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;step_results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;step_number&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

&lt;span class="n"&gt;final&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assemble_final&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;step_results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
&lt;span class="c1"&gt;# final includes:
#   output, completeness_check, clarity_check,
#   relevance_check, correctness_check, trace_to_goal
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That last field — &lt;strong&gt;&lt;code&gt;trace_to_goal&lt;/code&gt;&lt;/strong&gt; — is what catches a technically-correct, goal-irrelevant output. It's the check that turns a pipeline into an agent you can actually trust.&lt;/p&gt;




&lt;h2&gt;
  
  
  Five recipes you can steal today
&lt;/h2&gt;

&lt;p&gt;Even if you don't adopt OpenAgent, steal these five patterns. They each solve a class of bugs that costs teams days:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Stream by default, not as an afterthought.&lt;/strong&gt; Every agent should accept an &lt;code&gt;on_stream&lt;/code&gt; callback. Your UI code should never have two paths.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Pause the pipeline like a coroutine, not a state machine.&lt;/strong&gt; When the clarifier needs user input, suspend between phases and resume when the answer arrives. Use a session object that knows its current phase — not a pile of booleans.&lt;/p&gt;
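&lt;p&gt;A minimal sketch of such a session object (names are illustrative, not the repo's):&lt;/p&gt;

```python
from enum import Enum, auto

class Phase(Enum):
    INTENT = auto()
    CLARIFYING = auto()
    PLANNING = auto()
    EXECUTING = auto()
    DONE = auto()

class Session:
    """One field names where the pipeline is parked, instead of a pile
    of booleans (asked, answered, planned) drifting out of sync."""

    def __init__(self):
        self.phase = Phase.INTENT
        self.pending_questions = []
        self.answers = {}

    def pause_for_user(self, questions):
        self.pending_questions = questions
        self.phase = Phase.CLARIFYING

    def resume_with_answers(self, answers):
        if self.phase is not Phase.CLARIFYING:
            raise RuntimeError("can only resume a paused session")
        self.answers.update(answers)
        self.pending_questions = []
        self.phase = Phase.PLANNING

session = Session()
session.pause_for_user(["Who is the audience?"])
print(session.phase.name)  # CLARIFYING
```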

&lt;p&gt;&lt;strong&gt;3. Cache the intent, not the final output.&lt;/strong&gt; Intent is a deterministic function of &lt;code&gt;user_text + prompt&lt;/code&gt;. Cache it. Final output depends on the full session including clarifications — caching it will bite you.&lt;/p&gt;
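&lt;p&gt;A hypothetical cache keyed on exactly those inputs, with a prompt version in the key so prompt changes invalidate stale entries:&lt;/p&gt;

```python
import hashlib

_intent_cache = {}

def cached_intent(user_text, prompt_version, extract):
    # key on exactly the inputs that determine the result; bump
    # prompt_version whenever the intent prompt changes so stale
    # entries can never be served
    key = hashlib.sha256(f"{prompt_version}:{user_text}".encode()).hexdigest()
    if key not in _intent_cache:
        _intent_cache[key] = extract(user_text)
    return _intent_cache[key]

calls = []
def fake_extract(text):
    # stand-in for the real LLM call
    calls.append(text)
    return {"goal": text}

cached_intent("can you make this better", "v1", fake_extract)
cached_intent("can you make this better", "v1", fake_extract)
print(len(calls))  # 1: the second call hit the cache
```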

&lt;p&gt;&lt;strong&gt;4. Typed schemas at every boundary.&lt;/strong&gt; Pydantic between agents is not ceremony. It's your test surface. Bad outputs get caught at parse time, not six steps later when something dereferences a missing field.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Keep the LLM out of control flow.&lt;/strong&gt; The LLM decides &lt;em&gt;content&lt;/em&gt;. Python decides &lt;em&gt;flow&lt;/em&gt; — which phase runs, when to pause, when to retry, when to fall back. If your prompt has an &lt;code&gt;if tool_name == "ask_user":&lt;/code&gt; branch, you've inverted it.&lt;/p&gt;




&lt;h2&gt;
  
  
  How this compares to LangGraph / CrewAI / AutoGen
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;OpenAgent&lt;/th&gt;
&lt;th&gt;LangGraph&lt;/th&gt;
&lt;th&gt;CrewAI&lt;/th&gt;
&lt;th&gt;AutoGen&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Mental model&lt;/td&gt;
&lt;td&gt;Typed pipeline&lt;/td&gt;
&lt;td&gt;Graph of nodes&lt;/td&gt;
&lt;td&gt;Role-playing crew&lt;/td&gt;
&lt;td&gt;Multi-agent chat&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Typed contracts between stages&lt;/td&gt;
&lt;td&gt;✅ Pydantic&lt;/td&gt;
&lt;td&gt;⚠️ Optional&lt;/td&gt;
&lt;td&gt;⚠️ Loose&lt;/td&gt;
&lt;td&gt;⚠️ Loose&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auto-resolving clarifier&lt;/td&gt;
&lt;td&gt;✅ Built-in&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Framework weight&lt;/td&gt;
&lt;td&gt;~4k LOC&lt;/td&gt;
&lt;td&gt;Heavy&lt;/td&gt;
&lt;td&gt;Heavy&lt;/td&gt;
&lt;td&gt;Heavy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Pause for user" first-class&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;⚠️ Via interrupts&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;⚠️ Via prompts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reads like a cookbook&lt;/td&gt;
&lt;td&gt;✅ By design&lt;/td&gt;
&lt;td&gt;⚠️ Reference docs&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;When to pick OpenAgent:&lt;/strong&gt; you want to understand every moving part, control each prompt, and own your agent's reasoning end-to-end — not inherit someone else's abstraction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to pick a framework:&lt;/strong&gt; you want to ship fast without thinking about architecture, and the framework's defaults match your domain.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quickstart (literally 60 seconds)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/OpenGraph-AI/OpenAgent.git
&lt;span class="nb"&gt;cd &lt;/span&gt;OpenAgent
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env       &lt;span class="c"&gt;# set LLM_API_KEY at minimum&lt;/span&gt;
python run.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;code&gt;http://localhost:8000/static/index.html&lt;/code&gt;, type a fuzzy request, and watch each phase stream into the UI in real time: intent extraction, ambiguity flags, clarifying questions, the plan, and the executor producing the answer step-by-step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Minimum config:&lt;/strong&gt; one variable — &lt;code&gt;LLM_API_KEY&lt;/code&gt;. Works with any OpenAI-compatible provider. No Redis? Falls back to in-memory. No Exa? Skips web search. Missing keys are features, not errors.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where to start reading the code
&lt;/h2&gt;

&lt;p&gt;If you clone it, open files in this order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://github.com/OpenGraph-AI/OpenAgent/blob/main/backend/models/schemas.py" rel="noopener noreferrer"&gt;&lt;code&gt;backend/models/schemas.py&lt;/code&gt;&lt;/a&gt; — the contracts between phases. Read this first. Everything else is transformations.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/OpenGraph-AI/OpenAgent/blob/main/backend/agents/intent_agent.py" rel="noopener noreferrer"&gt;&lt;code&gt;backend/agents/intent_agent.py&lt;/code&gt;&lt;/a&gt; — the simplest agent. A clean template for your own.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/OpenGraph-AI/OpenAgent/blob/main/backend/agents/clarification_agent.py" rel="noopener noreferrer"&gt;&lt;code&gt;backend/agents/clarification_agent.py&lt;/code&gt;&lt;/a&gt; — the most interesting. Auto-resolve via web search is the trick worth stealing.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/OpenGraph-AI/OpenAgent/blob/main/backend/orchestrator/pipeline.py" rel="noopener noreferrer"&gt;&lt;code&gt;backend/orchestrator/pipeline.py&lt;/code&gt;&lt;/a&gt; — how phases are wired, paused, and resumed.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/OpenGraph-AI/OpenAgent/blob/main/backend/agents/execution_agent.py" rel="noopener noreferrer"&gt;&lt;code&gt;backend/agents/execution_agent.py&lt;/code&gt;&lt;/a&gt; — step-by-step execution with per-step context injection.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The meta-lesson
&lt;/h2&gt;

&lt;p&gt;The real insight in OpenAgent isn't any single stage. It's the &lt;strong&gt;shape of the solution&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The LLM hype cycle trained us to think bigger prompt = better agent. In production, the opposite is true. Better agents come from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Smaller prompts.&lt;/li&gt;
&lt;li&gt;Stronger contracts between prompts.&lt;/li&gt;
&lt;li&gt;Explicit control flow &lt;em&gt;outside&lt;/em&gt; the LLM.&lt;/li&gt;
&lt;li&gt;Pauses, retries, and fallbacks as first-class citizens.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're building an agent right now, forget the framework wars for a minute. Ask yourself: &lt;strong&gt;can I point to the five places mine could fail, and the typed object that lives at each boundary?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If yes, ship.&lt;/p&gt;

&lt;p&gt;If no, spend an afternoon reading &lt;a href="https://github.com/OpenGraph-AI/OpenAgent" rel="noopener noreferrer"&gt;OpenAgent&lt;/a&gt;. Then go back and fix yours.&lt;/p&gt;




&lt;h2&gt;
  
  
  Your turn
&lt;/h2&gt;

&lt;p&gt;Drop a comment with the stage &lt;strong&gt;you&lt;/strong&gt; struggle with most — intent extraction, ambiguity flagging, clarification UX, planning, execution tracing — and I'll reply with the specific file in OpenAgent that nails it.&lt;/p&gt;

&lt;p&gt;And if this saved you a 2am debugging session, the repo lives or dies on one thing:&lt;/p&gt;

&lt;p&gt;⭐ &lt;strong&gt;&lt;a href="https://github.com/OpenGraph-AI/OpenAgent" rel="noopener noreferrer"&gt;Star OpenAgent on GitHub&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Fork it. Break it. Build yours on top of it. That's the whole point.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with intent, by the folks at &lt;a href="https://opengraph.tech" rel="noopener noreferrer"&gt;OpenGraph.tech&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
