<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ray</title>
    <description>The latest articles on DEV Community by Ray (@raymondnl).</description>
    <link>https://dev.to/raymondnl</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3931931%2F4f9ea8fb-b436-4516-a76f-2e67cd780e73.png</url>
      <title>DEV Community: Ray</title>
      <link>https://dev.to/raymondnl</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/raymondnl"/>
    <language>en</language>
    <item>
      <title>My server pushes hints to agents — and the 3 iterations that led there</title>
      <dc:creator>Ray</dc:creator>
      <pubDate>Tue, 09 Jun 2026 22:59:20 +0000</pubDate>
      <link>https://dev.to/raymondnl/my-server-pushes-hints-to-agents-and-the-3-iterations-that-led-there-51a9</link>
      <guid>https://dev.to/raymondnl/my-server-pushes-hints-to-agents-and-the-3-iterations-that-led-there-51a9</guid>
      <description>&lt;p&gt;I avoided MCP from day one. No schema overhead, no token tax. The agent called my GraphQL API directly with a behavior spec and good documentation. I assumed that was enough: clear docs, correct architecture, let the agent figure it out.&lt;/p&gt;

&lt;p&gt;It wasn't. The moment that changed my thinking: watching the agent burn 1,500+ tokens on a single upload because it kept guessing JSON field formats wrong, reading docs across multiple pages, and retrying. I fixed the docs. The problem resurfaced on different fields. I fixed those too. It kept coming back. A CLI wasn't a nice-to-have. It was the only thing that actually stopped the bleeding. And it was only one of three iterations before I stumbled into the most interesting idea: letting the server talk directly to the agent.&lt;/p&gt;

&lt;h2&gt;Iteration 1: Avoiding MCP from day one&lt;/h2&gt;

&lt;p&gt;The MCP context tax is well-documented at this point, so I won't belabor it. My API surface covers 34 commands. As MCP tools, that's 34 schemas × ~180 tokens = &lt;strong&gt;~6,120 tokens of constant overhead&lt;/strong&gt; in every conversation turn, regardless of whether the agent uses them.&lt;/p&gt;
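
&lt;p&gt;As a back-of-the-envelope check, the arithmetic can be sketched in Python. The 34 commands and ~180 tokens per schema are the figures above; the 20-turn session is a made-up number for illustration, and the total assumes the schemas ride along on every turn.&lt;/p&gt;

```python
# Rough estimate of the fixed context cost of registering every command as an MCP tool.
COMMANDS = 34            # size of the API surface, from the post
TOKENS_PER_SCHEMA = 180  # approximate tokens per tool schema, from the post


def schema_overhead(commands, tokens_per_schema):
    """Tokens spent on tool schemas in a single conversation turn."""
    return commands * tokens_per_schema


per_turn = schema_overhead(COMMANDS, TOKENS_PER_SCHEMA)
print(per_turn)  # 6120

# Hypothetical 20-turn session, assuming the schemas are resent on every turn.
print(per_turn * 20)  # 122400
```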

&lt;p&gt;I understood this from first principles and chose a different path: a SKILL.md behavior spec + direct GraphQL API calls. No registered tools, no schema overhead. The agent reads the behavior spec once when the skill is invoked, then calls the API via curl.&lt;/p&gt;

&lt;p&gt;Architecture: correct. Problem: solved?&lt;/p&gt;

&lt;p&gt;No.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson: Making the right architectural choice doesn't mean the agent behaves well. The real work starts after the architecture is in place.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;Iteration 2: From raw GraphQL to CLI&lt;/h2&gt;

&lt;p&gt;The agent called my GraphQL API directly. Complex fields (provenance metadata as nested JSON, blocker objects, model config) required it to assemble raw JSON payloads in curl commands.&lt;/p&gt;

&lt;p&gt;It guessed wrong constantly. One wrong field type → GraphQL error → agent reads the API docs to figure out the correct format → docs are detailed and split across multiple pages → 2+ page fetches per retry attempt. A single operation that should cost ~200 tokens was burning 1,500+ in error-recovery loops.&lt;/p&gt;

&lt;p&gt;I tried fixing the docs. Made them more precise, added inline examples, consolidated pages. The problem kept resurfacing on different fields. Every time I fixed one, another appeared.&lt;/p&gt;

&lt;p&gt;The insight: &lt;strong&gt;the issue wasn't documentation quality. It was that raw APIs force agents to assemble structures without type safety, and LLMs are fundamentally bad at this.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The fix:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Before: agent assembles raw JSON in curl&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST /graphql &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"query":"mutation { uploadAsset(input: { shotId: \"...\", type: \"start_frame\", provenance: { method: \"ai_generated\", model: \"gpt-image-2\", prompt: \"...\" } }) { id } }"}'&lt;/span&gt;

&lt;span class="c"&gt;# After: typed CLI arguments, zero JSON assembly&lt;/span&gt;
python3 nl.py upload &amp;lt;shotId&amp;gt; start_frame frame.png &lt;span class="nt"&gt;--method&lt;/span&gt; ai_generated &lt;span class="nt"&gt;--model&lt;/span&gt; &lt;span class="s2"&gt;"gpt-image-2"&lt;/span&gt; &lt;span class="nt"&gt;--prompt&lt;/span&gt; &lt;span class="s2"&gt;"Winter city street"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The CLI dispatcher routes 34 commands through typed arguments. The agent doesn't guess field types or assemble nested objects. It passes flags.&lt;/p&gt;
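
&lt;p&gt;To make the idea concrete, here is a minimal sketch of such a dispatcher using Python's standard &lt;code&gt;argparse&lt;/code&gt;. It is not the actual &lt;code&gt;nl.py&lt;/code&gt;: only the &lt;code&gt;upload&lt;/code&gt; command is modelled, the helper names &lt;code&gt;build_parser&lt;/code&gt; and &lt;code&gt;to_graphql_input&lt;/code&gt; are hypothetical, and the lists of allowed values are assumptions (only &lt;code&gt;start_frame&lt;/code&gt; and &lt;code&gt;ai_generated&lt;/code&gt; appear in the example above).&lt;/p&gt;

```python
import argparse

# Assumption: the allowed values below are illustrative. Only "start_frame" and
# "ai_generated" appear in the post; the rest are placeholders.
ASSET_TYPES = ["start_frame", "end_frame"]
METHODS = ["ai_generated", "manual"]


def build_parser():
    """Build a parser with one subcommand per API operation (only upload is shown)."""
    parser = argparse.ArgumentParser(prog="nl.py")
    sub = parser.add_subparsers(dest="command", required=True)

    upload = sub.add_parser("upload", help="attach an asset to a shot")
    upload.add_argument("shot_id")
    upload.add_argument("asset_type", choices=ASSET_TYPES)
    upload.add_argument("path")
    upload.add_argument("--method", choices=METHODS, required=True)
    upload.add_argument("--model")
    upload.add_argument("--prompt")
    return parser


def to_graphql_input(args):
    """Assemble the nested input object once, in code, so the agent never has to."""
    provenance = {"method": args.method, "model": args.model, "prompt": args.prompt}
    return {
        "shotId": args.shot_id,
        "type": args.asset_type,
        # Drop unset optional fields so the payload carries no explicit nulls.
        "provenance": {k: v for k, v in provenance.items() if v is not None},
    }


args = build_parser().parse_args(
    ["upload", "shot-01A", "start_frame", "frame.png",
     "--method", "ai_generated", "--model", "gpt-image-2"]
)
payload = to_graphql_input(args)
print(payload)
```

&lt;p&gt;With a parser like this, a misspelled value such as &lt;code&gt;--method ai-generated&lt;/code&gt; makes &lt;code&gt;argparse&lt;/code&gt; exit with a usage message listing the valid choices before any request is sent, which is a much cheaper failure than a GraphQL error followed by a docs lookup.&lt;/p&gt;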

&lt;p&gt;A bonus: the &lt;code&gt;--json&lt;/code&gt; flag gives the agent structured data for reasoning, while the default gives a human-readable table. One CLI, two audiences:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# For the agent: structured JSON for parsing&lt;/span&gt;
python3 nl.py overview &amp;lt;noteId&amp;gt; &lt;span class="nt"&gt;--json&lt;/span&gt;

&lt;span class="c"&gt;# For the developer watching: readable progress&lt;/span&gt;
python3 nl.py overview &amp;lt;noteId&amp;gt;
&lt;span class="c"&gt;# Episode 01: The Algorithm Hunter&lt;/span&gt;
&lt;span class="c"&gt;#   [===done===|--review--|......not_started.......] 3/12&lt;/span&gt;
&lt;span class="c"&gt;#   Shot   Status       Rolls    Best   PF&lt;/span&gt;
&lt;span class="c"&gt;#   01A    done         3        48     Y&lt;/span&gt;
&lt;span class="c"&gt;#   01B    review       2        41     Y&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
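
&lt;p&gt;One way to get both outputs from a single command is to keep the data and the rendering separate. The sketch below assumes a hypothetical &lt;code&gt;render&lt;/code&gt; helper and a simplified row shape loosely modelled on the table above; the real &lt;code&gt;overview&lt;/code&gt; output has more columns.&lt;/p&gt;

```python
import json

# Assumption: the row shape is hypothetical, loosely modelled on the table above.
SHOTS = [
    {"shot": "01A", "status": "done", "rolls": 3},
    {"shot": "01B", "status": "review", "rolls": 2},
]


def render(shots, as_json=False):
    """Return machine-readable JSON for the agent, or an aligned table for a human."""
    if as_json:
        return json.dumps(shots, indent=2)
    lines = ["{:6} {:12} {}".format("Shot", "Status", "Rolls")]
    for s in shots:
        lines.append("{:6} {:12} {}".format(s["shot"], s["status"], s["rolls"]))
    return "\n".join(lines)


print(render(SHOTS, as_json=True))  # what the agent parses
print(render(SHOTS))                # what the developer reads
```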



&lt;p&gt;&lt;strong&gt;Lesson: Don't make your agent assemble what you can pre-structure. Typed CLI flags leave an LLM far less room to get the structure wrong than a raw JSON payload does. If your agent is doing error→doc→retry loops, the fix isn't better docs. It's eliminating the assembly step entirely.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;Iteration 3: The pause-and-reflect methodology&lt;/h2&gt;

&lt;p&gt;CLI fixed execution. But the agent still made bad decisions. It would re-roll a rejected video without changing the prompt first. It would write new prompts without checking what had already been tried. It would skip the preflight check and waste a generation on incomplete assets.&lt;/p&gt;

&lt;p&gt;These weren't execution failures. They were judgment failures. The agent ran each command correctly, but it chose the wrong commands to run.&lt;/p&gt;

&lt;p&gt;I stopped production and asked the agent:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Stop. Before we continue, did you have a lot of inefficient actions just now? Which were because the docs or skill spec aren't clear enough? And is there a recurring scenario where a new API that gives you everything at once would've helped?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The agent pointed to specific gaps in my SKILL.md:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No explicit rule saying "never re-roll without changing something"&lt;/li&gt;
&lt;li&gt;No guidance on checking past insights before writing new prompts&lt;/li&gt;
&lt;li&gt;Missing decision thresholds (what score means "fix and retry" vs "debug first"?)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I patched those gaps. Ran more production. Stopped again. Asked again. Each cycle surfaced new blind spots in the behavior spec.&lt;/p&gt;

&lt;p&gt;This isn't a one-time audit. It's a repeating feedback loop:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;produce → reflect → polish spec → produce → reflect → ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Lesson: Your agent is both the consumer and the best auditor of your behavior spec. It knows exactly where the spec failed it. You just have to stop and ask, then actually fix what it tells you.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;Server-pushed guidance&lt;/h2&gt;

&lt;p&gt;Even after several reflect cycles, one class of failures persisted, and it taught me the most interesting lesson.&lt;/p&gt;

&lt;p&gt;The agent wrote a video generation prompt but didn't reference any of the uploaded assets. The generated video had nothing to do with the reference frames sitting right there in the project. The assets existed. The agent just… forgot to use them.&lt;/p&gt;

&lt;p&gt;After the failure, I asked: "If there had been a message right before you wrote the prompt listing the available assets and how to reference them, would you have caught this?" The agent said yes.&lt;/p&gt;

&lt;p&gt;So I built it. When the server detects that preflight has passed but the active prompt contains no &lt;code&gt;@filename&lt;/code&gt; references, it injects a hint listing every available asset:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Agent wrote a prompt but didn't reference uploaded assets&lt;/span&gt;
&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pendingHints&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;available_refs&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;high&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Available refs for prompting: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;refs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`@&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; (&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;assetType&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;)`&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;, &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;targetId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;shot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;refs&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
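
&lt;p&gt;The snippet above shows the hint being pushed, not the condition that fires it. The server code in this post is TypeScript; the sketch below uses Python purely to illustrate what that detection might look like. The function name, its parameters, and the regex for what counts as an &lt;code&gt;@filename&lt;/code&gt; reference are all assumptions.&lt;/p&gt;

```python
import re

# Assumption: a reference is "@" followed by a filename-like token such as @frame.png.
REF_PATTERN = re.compile(r"@[\w.\-]+")


def available_refs_hint(prompt, preflight_passed, refs, shot_id):
    """Return a hint dict when the prompt ignores uploaded assets, else None."""
    if not preflight_passed or not refs:
        return None
    if REF_PATTERN.search(prompt):
        return None  # the prompt already points at one or more assets
    listing = ", ".join("@{} ({})".format(r["filename"], r["assetType"]) for r in refs)
    return {
        "type": "available_refs",
        "priority": "high",
        "message": "Available refs for prompting: " + listing,
        "metadata": {"targetId": shot_id, "refs": refs},
    }


refs = [{"filename": "frame.png", "assetType": "start_frame"}]
hint = available_refs_hint("Winter city street at dusk", True, refs, "shot-01A")
print(hint["message"])  # Available refs for prompting: @frame.png (start_frame)
print(available_refs_hint("Street matching @frame.png", True, refs, "shot-01A"))  # None
```

&lt;p&gt;Note that this version only checks whether the prompt contains any &lt;code&gt;@&lt;/code&gt; token at all. It does not verify that the token names an asset that actually exists, which a stricter trigger might want to do.&lt;/p&gt;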



&lt;p&gt;Same method, different failure: the agent uploaded all assets, preflight passed, but it forgot to advance the shot status from &lt;code&gt;asset_prep&lt;/code&gt; to &lt;code&gt;ready&lt;/code&gt;. Another hint, born from the same question: "would a nudge here have prevented this?"&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Preflight passed but agent forgot to advance status&lt;/span&gt;
&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pendingHints&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ready_to_advance&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;high&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Preflight passed but shot is still in asset_prep. Update status to ready.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`nl.py shot-update &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;shotId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; --status ready`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every high-value hint in the system was designed this way: agent fails → I ask "what hint would have prevented this?" → I build the trigger.&lt;/p&gt;
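
&lt;p&gt;That failure-to-trigger loop maps naturally onto a list of small check functions, one per observed failure. The following is a sketch in Python rather than the server's TypeScript, and the &lt;code&gt;state&lt;/code&gt; keys, &lt;code&gt;TRIGGERS&lt;/code&gt; list, and &lt;code&gt;collect_hints&lt;/code&gt; helper are hypothetical names, not the real implementation.&lt;/p&gt;

```python
# Assumption: state is whatever the mutation already loaded; its keys are hypothetical.
def ready_to_advance(state):
    """Fire when preflight passed but the shot is still in asset_prep."""
    if state["preflight_passed"] and state["status"] == "asset_prep":
        return {
            "type": "ready_to_advance",
            "priority": "high",
            "message": "Preflight passed but shot is still in asset_prep. Update status to ready.",
            "action": "nl.py shot-update {} --status ready".format(state["shot_id"]),
        }
    return None


# Each observed failure becomes one entry in this list.
TRIGGERS = [ready_to_advance]


def collect_hints(state):
    """Run every trigger against already-loaded state and keep the ones that fire."""
    results = (trigger(state) for trigger in TRIGGERS)
    return [hint for hint in results if hint is not None]


stuck = {"shot_id": "shot-01A", "status": "asset_prep", "preflight_passed": True}
done = {"shot_id": "shot-01A", "status": "ready", "preflight_passed": True}
print([h["type"] for h in collect_hints(stuck)])  # ['ready_to_advance']
print(collect_hints(done))  # []
```

&lt;p&gt;Adding a hint for the next failure then means writing one more function and appending it to the list, without touching the response plumbing.&lt;/p&gt;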

&lt;p&gt;Notice something about these hints: they're not written for me. They're written for the agent. The messages contain CLI commands, &lt;code&gt;@filename&lt;/code&gt; conventions, status values, all the agent's working vocabulary. This isn't a notification system for humans. It's the server talking directly to the agent in its own language.&lt;/p&gt;

&lt;p&gt;And the best part: even if the agent ignores a hint and makes the mistake anyway, the hint is already sitting in its context window. When the agent enters debug mode after the failure, it naturally recalls "there was a hint about this." The hint doesn't need to be obeyed to be useful. It just needs to exist in context.&lt;/p&gt;

&lt;p&gt;Hints travel as &lt;code&gt;extensions.agentHints&lt;/code&gt; in every GraphQL response. On the client, they route to stderr so they don't contaminate the JSON the agent is parsing:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;

&lt;span class="n"&gt;hints&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;extensions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agentHints&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;hints&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;hints&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;mark&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;medium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;low&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;~&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}[&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;priority&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  [&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;mark&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's also a pool of probabilistic hints for general best practices, but the high-value ones are all reverse-engineered from specific agent failures. They also add zero extra database cost, because they're generated from data the mutation already loaded.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson: Design hints backwards: from failure to trigger, not from architecture to feature. And write them for the agent, not for yourself.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;The architecture that emerged&lt;/h2&gt;

&lt;p&gt;Three layers, each discovered through a different failure mode:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Solves&lt;/th&gt;
&lt;th&gt;Freedom&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SKILL.md&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Wrong decisions&lt;/td&gt;
&lt;td&gt;High: what to do, when, why&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CLI scripts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Assembly errors&lt;/td&gt;
&lt;td&gt;Low: exact operations, zero guessing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;agentHints&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Forgotten context&lt;/td&gt;
&lt;td&gt;Reactive: server speaks when relevant&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Remove any one layer and the agent drifts. The spec without hints means forgotten context. Hints without the spec means no decision framework. The CLI without either means correct execution of wrong decisions.&lt;/p&gt;




&lt;p&gt;I'm building Narrative Lion, a research tool for content creators that turns the videos you study into a Playbook your AI can actually use. The agent architecture described here runs its production pipeline. Check it out at &lt;a href="https://narrativelion.com/?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=launch_blog" rel="noopener noreferrer"&gt;narrativelion.com&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Curious how others handle this: do you push guidance from the server, or keep everything in the behavior spec?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>mcp</category>
      <category>agentskills</category>
    </item>
  </channel>
</rss>
