<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: erlangb</title>
    <description>The latest articles on DEV Community by erlangb (@erlangb).</description>
    <link>https://dev.to/erlangb</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3817288%2F615db6e4-f7e0-4a7c-b3e5-5319cb2e5980.jpg</url>
      <title>DEV Community: erlangb</title>
      <link>https://dev.to/erlangb</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/erlangb"/>
    <language>en</language>
    <item>
      <title>A Movie Finder with AI Reflexion using GoLang</title>
      <dc:creator>erlangb</dc:creator>
      <pubDate>Wed, 11 Mar 2026 21:34:36 +0000</pubDate>
      <link>https://dev.to/erlangb/a-movie-finder-with-ai-reflexion-using-golang-3n0i</link>
      <guid>https://dev.to/erlangb/a-movie-finder-with-ai-reflexion-using-golang-3n0i</guid>
      <description>&lt;h2&gt;
  
  
  Introduction: The "Vibes-Based" Engineering Trap
&lt;/h2&gt;

&lt;p&gt;We’ve all been there. You ask an LLM for "underground 80s sci-fi," and it starts strong with Blade Runner (hardly underground). Then, desperate to please, it hallucinates: "Have you seen Neon Shadows (1984)?" It sounds perfect. It sounds real. It doesn’t exist.&lt;/p&gt;

&lt;p&gt;In a side project, that’s a "lol." In production, that’s a total failure.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: The Confidence Gap
&lt;/h2&gt;

&lt;p&gt;LLMs aren't stupid; they are just pathologically helpful. They prioritize being pleasant over being factual because they lack a Skepticism Layer. Most developers fall into the trap of Linear Prompting:&lt;/p&gt;

&lt;p&gt;Send request.&lt;/p&gt;

&lt;p&gt;Hope for the best.&lt;/p&gt;

&lt;p&gt;But hope is not an engineering strategy. To build reliable Agentic AI, we need to move from "sending prompts" to "building pipelines that verify."&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution: Reflexion and Orchestration
&lt;/h3&gt;

&lt;p&gt;To solve this for my "Movie Finder" use case, I didn't just write a better prompt. I implemented the &lt;strong&gt;Reflexion Pattern&lt;/strong&gt;: an architectural loop where one agent's output is treated as a "draft" that must survive a rigorous audit by a second, skeptical agent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8cgnfyjem2yj7p3pb06o.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8cgnfyjem2yj7p3pb06o.jpg" alt="loop-movie"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To bridge this gap, I used the &lt;strong&gt;EINO framework&lt;/strong&gt;. EINO (pronounced 'ay-no') is a Go-native orchestration framework designed specifically for LLM workflows. It allows you to model complex agentic logic as a graph of nodes, which was perfect for implementing the &lt;strong&gt;Reflexion Pattern&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;🛠️ Open Source &amp;amp; Local Setup&lt;br&gt;
The full source code for this project is available on GitHub. To visualize and monitor the agent's reasoning steps, I used two libraries I developed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/erlangb/agent_monitor" rel="noopener noreferrer"&gt;agent_monitor&lt;/a&gt;: The core Go project to run usecases&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/erlangb/agentmeter" rel="noopener noreferrer"&gt;agentmeter&lt;/a&gt;: a library for capturing and printing agent internals.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;"Why did I build an AI agent in &lt;strong&gt;Go&lt;/strong&gt; instead of Python?&lt;br&gt;
The honest answer: &lt;strong&gt;It's what I know.&lt;/strong&gt; But beyond familiarity, I wanted to explore the current 'state of the art' for Agentic AI in the Go ecosystem.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this article, I’ll show you how I moved beyond simple prompts to a dual-agent system. By pitting a Cinephile against a Clerk, I’ve built an adversarial loop where agents "argue" their way toward grounded truth.&lt;/p&gt;

&lt;p&gt;It’s not just about getting an answer; it’s about building a system that uses systematic skepticism to virtually eliminate hallucinations.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Concept: What is the Reflexion Pattern?
&lt;/h2&gt;

&lt;p&gt;At its core, the &lt;strong&gt;Reflexion Pattern&lt;/strong&gt; is a design pattern for LLM agents that introduces a "self-correction" loop.&lt;br&gt;
Think of a standard AI agent as a solo freelancer working without an editor. They produce work, and you get what you get. The Reflexion Pattern turns that solo freelancer into a &lt;strong&gt;team of two&lt;/strong&gt;: one who creates, and one who audits.&lt;/p&gt;
&lt;h3&gt;
  
  
  How it works (The 3-Step Dance)
&lt;/h3&gt;

&lt;p&gt;In my movie finder, the loop follows a specific cycle of &lt;strong&gt;Generation&lt;/strong&gt;, &lt;strong&gt;Critique&lt;/strong&gt;, and &lt;strong&gt;Correction&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Generation (The Draft):&lt;/strong&gt; The first agent (The Cinephile) receives the user's request and generates a response. It operates purely on its internal training data—which, as we know, can be prone to "stochastic dreaming" (hallucinations).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Critique (The Fact-Check):&lt;/strong&gt; Instead of showing the user the result, the output is passed to the second agent (The Clerk). This agent is given a specific "Skeptic" persona and, crucially, access to &lt;strong&gt;External Tools&lt;/strong&gt; (in this case, the &lt;strong&gt;Tavily Search API&lt;/strong&gt;). Its only job is to find reasons why the first agent might be wrong.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Correction (The Iteration):&lt;/strong&gt; If the Clerk finds an error, it doesn't just fail the process. It generates a &lt;strong&gt;feedback signal&lt;/strong&gt;—a structured message explaining &lt;em&gt;what&lt;/em&gt; was wrong and &lt;em&gt;why&lt;/em&gt;. This feedback is fed back into the first agent, which now has a "second chance" to get it right.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  The Architecture: Mapping the Graph
&lt;/h2&gt;

&lt;p&gt;To implement the Reflexion pattern in Go, I used EINO's Graph composition. This allows us to treat our agents as independent nodes connected by edges, including a conditional "branch" that creates our self-correction loop.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Pipeline Logic
&lt;/h3&gt;

&lt;p&gt;Here is the simplified implementation of the FindMoviesPipeline. Notice how the "Loop" isn't a complex for loop in the code, but a visual branch in the graph:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;NewFindMoviesPipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cinephile&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;CinephileAgent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;clerk&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ClerkAgent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;curator&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;CuratorChain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FindMoviesPipeline&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// 1. Initialize the Graph with a shared State&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;compose&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewGraph&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;appmodel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FindMoviesState&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;appmodel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FindMoviesState&lt;/span&gt;&lt;span class="p"&gt;]()&lt;/span&gt;

    &lt;span class="c"&gt;// 2. Add the Agent Nodes&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddLambdaNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"cinephile"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;compose&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;InvokableLambda&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cinephile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Invoke&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddLambdaNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"clerk"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;compose&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;InvokableLambda&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clerk&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Invoke&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddLambdaNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"curator"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;compose&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;InvokableLambda&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;curator&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Invoke&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c"&gt;// 3. Define the linear flow&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddEdge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;compose&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;START&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"cinephile"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddEdge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"cinephile"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"clerk"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c"&gt;// 4. The Reflexion Branch: The "Skeptic" decides where to go next&lt;/span&gt;
    &lt;span class="n"&gt;branch&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;compose&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewGraphBranch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;appmodel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FindMoviesState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="c"&gt;// If the Clerk is happy OR we've tried too many times, move to curation&lt;/span&gt;
          &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IsSatisfied&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RetryCount&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MaxRetries&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
             &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;"curator"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
          &lt;span class="c"&gt;// Otherwise, send it back to the Cinephile for correction&lt;/span&gt;
          &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;"cinephile"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
       &lt;span class="p"&gt;},&lt;/span&gt;
       &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"curator"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"cinephile"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddBranch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"clerk"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;branch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddEdge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"curator"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;compose&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;compose&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithGraphName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"find_movies_graph"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why this code matters:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The State Object: The FindMoviesState acts as the "Short-term Memory." It carries the current list of movies, the Clerk's critiques, and the RetryCount.&lt;/li&gt;
&lt;li&gt;Decoupled Logic: The cinephile doesn't know the clerk exists. It just knows it receives a state and returns a state. This makes &lt;strong&gt;testing individual agents&lt;/strong&gt; much easier.&lt;/li&gt;
&lt;li&gt;The Branch is the Brain: The NewGraphBranch function is where the Reflexion happens. It forces the system to be honest: if state.IsSatisfied is false, the data cannot reach the curator node.&lt;/li&gt;
&lt;/ul&gt;
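&lt;p&gt;To make the state object concrete, here is a sketch of what FindMoviesState might contain. Only IsSatisfied, RetryCount, and MaxRetries appear in the pipeline code above; the other field names are my assumptions about the "short-term memory":&lt;/p&gt;

```go
package main

import "fmt"

// FindMoviesState is a sketch of the shared state passed between nodes;
// the real project's field names may differ.
type FindMoviesState struct {
	Query       string   // refined user request (assumed field)
	Movies      []string // the Cinephile's current draft list (assumed field)
	Critiques   []string // the Clerk's objections from the last audit (assumed field)
	IsSatisfied bool     // set by the Clerk when every movie passes
	RetryCount  int      // incremented on every trip back to the Cinephile
	MaxRetries  int      // hard stop so the loop always terminates
}

// shouldCurate mirrors the branch condition in the pipeline above.
func (s *FindMoviesState) shouldCurate() bool {
	return s.IsSatisfied || s.RetryCount >= s.MaxRetries
}

func main() {
	s := &FindMoviesState{RetryCount: 3, MaxRetries: 3}
	fmt.Println(s.shouldCurate()) // true: retries exhausted
}
```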

&lt;h2&gt;
  
  
  The Architecture: Inside the FindMoviesUseCase
&lt;/h2&gt;

&lt;p&gt;In my FindMoviesUseCase, the data moves through four distinct stages.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Node Breakdown
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. The Refiner (The Translator)&lt;/strong&gt;&lt;br&gt;
Before any "thinking" happens, we need structure. The &lt;strong&gt;RefinerChain&lt;/strong&gt; takes the user's messy, natural language input—&lt;em&gt;"I want some weird 70s space movies that feel like David Bowie's music"&lt;/em&gt;—and converts it into a clean Go struct.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Output:&lt;/strong&gt; Structured parameters like &lt;em&gt;primary_genre&lt;/em&gt;, &lt;em&gt;secondary_genres&lt;/em&gt;, &lt;em&gt;end_year&lt;/em&gt;, &lt;em&gt;start_year&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Input: &amp;gt; "I want some weird 70s space movies that feel like David Bowie's music"

RefinerChain Output:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"primary_genre"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"science fiction"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"secondary_genres"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"weird"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"space"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"musical vibe"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"start_year"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1970&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"end_year"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1979&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"is_classic"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"original_text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"I want some weird 70s space movies that feel like David Bowie's music"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"query_info"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"weird space science fiction 1970s Bowie vibe"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;2. The Cinephile (The Creative Brain)&lt;/strong&gt;&lt;br&gt;
This is our primary Generator. It uses the refined parameters to search its internal knowledge and propose a curated list of films.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Risk:&lt;/strong&gt; This is where hallucinations live. If the LLM "remembers" a movie that doesn't exist, it will confidently include it here.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. The Clerk (The Auditor &amp;amp; Tool User)&lt;/strong&gt;&lt;br&gt;
This is the heart of the &lt;strong&gt;Reflexion Loop&lt;/strong&gt;. The Clerk is a "Skeptic" node equipped with the &lt;strong&gt;Tavily Search tool&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Process:&lt;/strong&gt; It takes every movie from the Cinephile's list and verifies it against the real world.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Decision:&lt;/strong&gt; If the movies match the user's query, it sets isSatisfied to true; otherwise, it returns a critique for every flawed movie it found. If one or more movies pass the audit, the Cinephile keeps them and adds new candidates for the Clerk to verify.&lt;/li&gt;
&lt;/ul&gt;
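&lt;p&gt;The Clerk's decision can be sketched like this. The type and function names are illustrative, not the project's actual code; only the kept/critiqued/satisfied split comes from the behaviour described above:&lt;/p&gt;

```go
package main

import "fmt"

// MovieCheck is the result of verifying one candidate against web data.
type MovieCheck struct {
	Title    string
	Verified bool
	Critique string // empty when the movie passed
}

// summarize splits an audit into kept movies and a feedback signal:
// satisfied is true only when every single movie passed verification.
func summarize(checks []MovieCheck) (kept []string, critiques []string, satisfied bool) {
	for _, c := range checks {
		if c.Verified {
			kept = append(kept, c.Title)
		} else {
			critiques = append(critiques, c.Title+": "+c.Critique)
		}
	}
	return kept, critiques, len(critiques) == 0
}

func main() {
	checks := []MovieCheck{
		{Title: "Ecco fatto (1998)", Verified: true},
		{Title: "Tutto l'amore che c'è", Verified: false, Critique: "released in 2000, outside range"},
	}
	kept, critiques, ok := summarize(checks)
	fmt.Println(kept, len(critiques), ok)
}
```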

&lt;p&gt;&lt;strong&gt;4. The Curator (The Final Editor)&lt;/strong&gt;&lt;br&gt;
Once the loop is broken (either through success or reaching the MaxRetries limit), the data hits the &lt;strong&gt;CuratorChain&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It performs a final prune: it reads the Clerk's last response and the movie list, and finalizes the result for the end user.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  The Loop in Action: A Real-World "Argument"
&lt;/h2&gt;

&lt;p&gt;To see the value of the Reflexion pattern, we have to look at how the agents interact when things go wrong. In this example, I asked the system for:&lt;br&gt;
&lt;em&gt;"Italian movies from the late 90s about the new millennium."&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Round 1: The Cinephile's "Stochastic Dreaming"
&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;CinephileAgent&lt;/strong&gt; generated three suggestions. They looked plausible, but there were hidden hallucinations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;em&gt;Ecco fatto&lt;/em&gt; (1998)&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Tutto l'amore che c'è&lt;/em&gt; (1999)&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Luna e l'altra&lt;/em&gt; (1996)&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  Round 2: The Clerk's Skepticism (Tavily Search)
&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;ClerkAgent&lt;/strong&gt; immediately triggered a series of parallel &lt;strong&gt;Tavily searches&lt;/strong&gt; to verify these titles.&lt;br&gt;
It didn't just check if the movies existed; it checked the metadata against the user's specific constraints (Year: 1997–1999). Here is the "Correction Note" it generated:&lt;br&gt;
&lt;strong&gt;Clerk:&lt;/strong&gt; &lt;em&gt;"isSatisfied: false. 'Tutto l'amore che c'è' is actually a 2000 film, not 1999. 'Luna e l'altra' is from 1996, which is outside the requested range. Replace these."&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Round 3: The Correction
&lt;/h3&gt;

&lt;p&gt;The graph routed the state back to the &lt;strong&gt;Cinephile&lt;/strong&gt;. Crucially, the Cinephile was "aware" of its previous mistakes because they were stored in the shared FindMoviesState.&lt;br&gt;
&lt;strong&gt;Cinephile:&lt;/strong&gt; &lt;em&gt;"Previous critiques to fix: Replace Tutto l'amore... and Luna e l'altra."&lt;/em&gt; It kept &lt;em&gt;Ecco fatto&lt;/em&gt; (which passed) and proposed new candidates: &lt;em&gt;Cose che non ti ho mai detto&lt;/em&gt; and &lt;em&gt;I piccoli maestri&lt;/em&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Round 4: Deep Verification
&lt;/h3&gt;

&lt;p&gt;The Clerk is a tough critic. It rejected the new suggestions too, but for deeper reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;National Identity:&lt;/strong&gt; It caught that &lt;em&gt;Cose che non ti ho mai detto&lt;/em&gt; is actually a Spanish-American film, not Italian.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thematic Alignment:&lt;/strong&gt; It caught that &lt;em&gt;I piccoli maestri&lt;/em&gt; is a WWII resistance film—technically Italian and from 1998, but it has &lt;strong&gt;nothing&lt;/strong&gt; to do with the "new millennium" theme.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  The Final Result: Deterministic Success
&lt;/h3&gt;

&lt;p&gt;Finally, the loop closed on a verified, accurate list:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Ecco fatto&lt;/strong&gt; (1998)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tutti giù per terra&lt;/strong&gt; (1997)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Without the Reflexion loop, the user would have received a list where 66% of the data was technically wrong.&lt;/p&gt;

&lt;p&gt;Below is an embedded slider where you can browse the full output.&lt;/p&gt;

&lt;p&gt;

&lt;iframe height="600" src="https://codepen.io/daniele-dangeli/embed/VYKmVRp?height=600&amp;amp;default-tab=result&amp;amp;embed-version=2"&gt;
&lt;/iframe&gt;


&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: From "Stochastic" to "Deterministic"
&lt;/h2&gt;

&lt;p&gt;LLMs are "stochastic" (probabilistic) by nature. They are built to predict the next word, not to tell the truth. By implementing the &lt;strong&gt;Reflexion Pattern&lt;/strong&gt;, we transform that probability into a more "deterministic" system. If the Clerk doesn't find a factual match on the web, the data simply does not pass.&lt;/p&gt;

&lt;h2&gt;
  
  
  Verification is the New Optimization
&lt;/h2&gt;

&lt;p&gt;Instead of spending weeks fine-tuning a model or "begging" a prompt to be accurate, we can achieve better results by giving agents &lt;strong&gt;tools&lt;/strong&gt; (like Tavily) and &lt;strong&gt;feedback&lt;/strong&gt;. The "Cinephile vs. Clerk" interaction shows that two specialized agents working in a loop can outperform a single "Generalist" agent trying to do everything at once.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Building this wasn't just about finding niche Italian movies; it was about exploring how we can trust the software we build in the age of AI. If you are a Go developer, don't wait for a "Python-equivalent" to emerge. The tools are already here.&lt;br&gt;
The next time your LLM hallucinates, don't just change the prompt. &lt;strong&gt;Change the architecture.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>go</category>
      <category>llm</category>
    </item>
    <item>
      <title>The Hidden Cost of MCP Tools: a 2.5x Token Reduction to Save 50% in Costs</title>
      <dc:creator>erlangb</dc:creator>
      <pubDate>Wed, 11 Mar 2026 00:38:27 +0000</pubDate>
      <link>https://dev.to/erlangb/the-hidden-cost-of-mcp-tools-a-25x-token-reduction-to-save-50-in-costs-3d21</link>
      <guid>https://dev.to/erlangb/the-hidden-cost-of-mcp-tools-a-25x-token-reduction-to-save-50-in-costs-3d21</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;I want to be clear: I'm not an AI guru. I'm just a developer running experiments with "agentic" programming in Go, trying to see what actually works once you move past the "Hello World" phase.&lt;/p&gt;

&lt;p&gt;After finishing this &lt;a href="https://www.coursera.org/learn/agentic-ai-with-langchain-and-langgraph" rel="noopener noreferrer"&gt;course&lt;/a&gt; about &lt;em&gt;Langchain&lt;/em&gt; and &lt;em&gt;LangGraph&lt;/em&gt;, I wanted to find a practical way to build an agent. Since most of my experience is in &lt;em&gt;Go&lt;/em&gt;, I started exploring the Go ecosystem and came across the &lt;a href="https://github.com/cloudwego/eino" rel="noopener noreferrer"&gt;EINO&lt;/a&gt; framework.&lt;/p&gt;

&lt;p&gt;Almost immediately, I hit a wall: how do you actually keep track of the steps, actions, and results in a system where more than one actor is involved? I started with the usual approach—debugging and following logs—but I quickly realized that logs weren't enough to see the full picture.&lt;/p&gt;

&lt;p&gt;To help me see what was happening under the hood, I built two small projects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/erlangb/agent_monitor" rel="noopener noreferrer"&gt;agent_monitor&lt;/a&gt;&lt;/strong&gt;: A Go observability playground for inspecting and running agentic pipelines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/erlangb/agentmeter" rel="noopener noreferrer"&gt;agentmeter&lt;/a&gt;&lt;/strong&gt;: A library specifically designed to track tokens and reasoning traces&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'm not mentioning these to show off the code; they are simply the projects I used to run these experiments.&lt;/p&gt;

&lt;p&gt;This is the first in a series of articles where I'll share what I'm learning about building agents in Go (though the logic applies to any language). In this post, we're looking at &lt;strong&gt;MCP tool optimization&lt;/strong&gt;. In the next one, I'll dive into a &lt;strong&gt;movie reflection system&lt;/strong&gt; I built to help agents double-check their own decisions and limit hallucinations.&lt;/p&gt;

&lt;h2&gt;
  
  
  The "Aha!" Moment
&lt;/h2&gt;

&lt;p&gt;While using these tools to inspect my own agents, I noticed something. Initially, I took the easy route: I injected a raw MCP client connection directly into the Agent. Since an MCP connection is designed to expose all available tools, I figured, "Let the agent have everything; it's smart enough to handle it."&lt;/p&gt;

&lt;p&gt;I was wrong. After running tests with Tavily Search and MapBox MCPs, I realised that giving an agent raw, unfiltered access to an MCP connection is usually a bad idea.&lt;/p&gt;

&lt;p&gt;If you're an expert in the field, this first post might seem trivial. But if you're just starting to approach MCP, I hope these findings save you some time. Even for a simple pipeline, you must consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A Tool Filter&lt;/strong&gt;: To control exactly which tools the agent can see for a specific task.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A Tool Overlay&lt;/strong&gt;: A custom layer that uses a "Tolerant Reader" approach to prune the tool's response before the LLM ever sees it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's dive into the Tool Overlay. Tool filtering matters too: it keeps you from overloading the agent's context with tools it doesn't need.&lt;/p&gt;
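&lt;p&gt;As a preview, here is a minimal "Tolerant Reader" overlay in Go: it decodes only the fields the agent actually needs from a raw tool response and drops everything else (raw_content, favicon, scores, timings). The types are my own sketch; only the JSON field names come from Tavily's response shape:&lt;/p&gt;

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LeanResult keeps only the fields the LLM can act on; unknown fields
// in the raw response are silently ignored by encoding/json.
type LeanResult struct {
	Title   string `json:"title"`
	URL     string `json:"url"`
	Content string `json:"content"`
}

type LeanResponse struct {
	Results []LeanResult `json:"results"`
}

// prune re-serializes the raw tool output with only the lean fields,
// so the LLM never pays tokens for metadata it cannot use.
func prune(raw []byte) (string, error) {
	var r LeanResponse
	if err := json.Unmarshal(raw, &r); err != nil {
		return "", err
	}
	out, err := json.Marshal(r)
	return string(out), err
}

func main() {
	raw := []byte(`{"results":[{"title":"Top 10","url":"https://example.com","content":"...","score":0.8,"favicon":"..."}],"response_time":"1.67"}`)
	lean, err := prune(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(lean)
}
```

&lt;p&gt;The "tolerant" part is doing the work here: the struct never fails on extra fields, so the overlay keeps working even if the MCP server adds new metadata later.&lt;/p&gt;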




&lt;h2&gt;
  
  
  The Setup: A Simple Test
&lt;/h2&gt;

&lt;p&gt;I set up an LLM travel agent with a single job: "Suggest 10 places to visit in Rome." I used the Tavily MCP search tool to get the data.&lt;/p&gt;

&lt;p&gt;I ran the experiment twice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Run 1 (The Lazy Way):&lt;/strong&gt; I let the agent use the MCP client to access the tool directly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run 2 (The Structured Approach):&lt;/strong&gt; I added a layer between the tool and the LLM to parse the response and remove the fields I didn't need.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Run 1: The Raw MCP Response
&lt;/h2&gt;

&lt;p&gt;MCP tools return verbose responses by design. Here is a snippet of what Tavily actually sends back for a single search:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"top 10 places to visit in Florence"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"results"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Top 10 Must See Places in Florence.."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.romecabs.com/blog/docs/top-10-must-see-places-rome/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"**Piazza del Duomo**, the **Gallery of the Academy**, **Uffizi** **Gallery**"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.80405265&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"raw_content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"favicon"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; 
    &lt;/span&gt;&lt;span class="err"&gt;....&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"response_time"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.67"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"auto_parameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"topic"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"travel"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"search_depth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"basic"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"usage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"credits"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"request_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"123e4567-e89b-12d3-a456-426614174111"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Reasoning &amp;amp; Results (Run 1):&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvz9evucigy43yjc2bqzv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvz9evucigy43yjc2bqzv.png" alt="reasoning full tavily body " width="800" height="389"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frnpwjs9fjupma5cnodyd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frnpwjs9fjupma5cnodyd.png" alt="result full tavily body" width="800" height="292"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The "Tolerant Reader" Overlay
&lt;/h2&gt;

&lt;p&gt;In my second run, I defined a custom tool over the same Tavily MCP connection but overrode the search function, unmarshaling only the &lt;code&gt;content&lt;/code&gt; and &lt;code&gt;score&lt;/code&gt; fields.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// TavilyResult holds only the fields we care about&lt;/span&gt;
&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;TavilyResult&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Content&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;  &lt;span class="s"&gt;`json:"content"`&lt;/span&gt;
    &lt;span class="n"&gt;Score&lt;/span&gt;   &lt;span class="kt"&gt;float64&lt;/span&gt; &lt;span class="s"&gt;`json:"score"`&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// ... inside the tool call ...&lt;/span&gt;
&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TavilySearchResponse&lt;/span&gt;
&lt;span class="c"&gt;// We Unmarshal only the essential fields&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;sonic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnmarshalString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mcpRawResponse&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;sonic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Marshal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
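&lt;p&gt;Here is a self-contained version of the same "Tolerant Reader" idea, using the standard &lt;code&gt;encoding/json&lt;/code&gt; instead of sonic (the pruning behavior is identical; the raw payload below is a shortened stand-in for the real Tavily body):&lt;/p&gt;

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Result mirrors only the fields the agent needs; every other
// field in the raw payload is silently dropped on Unmarshal.
type Result struct {
	Content string  `json:"content"`
	Score   float64 `json:"score"`
}

type SearchResponse struct {
	Results []Result `json:"results"`
}

// prune re-serializes only the whitelisted fields.
func prune(raw string) (string, error) {
	resp := new(SearchResponse)
	if err := json.Unmarshal([]byte(raw), resp); err != nil {
		return "", err
	}
	out, err := json.Marshal(resp.Results)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	raw := `{"query":"q","results":[{"title":"t","url":"u","content":"Piazza del Duomo","score":0.8,"favicon":"f"}],"request_id":"id"}`
	pruned, _ := prune(raw)
	fmt.Println(pruned) // [{"content":"Piazza del Duomo","score":0.8}]
}
```

&lt;p&gt;The "tolerance" is the point: unknown fields are ignored rather than rejected, so the overlay keeps working even when the upstream API adds new fields.&lt;/p&gt;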



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Reasoning &amp;amp; Results (Run 2):&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsofzkn53lkfoa242u6vo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsofzkn53lkfoa242u6vo.png" alt="reasoning running result structured tavily" width="800" height="356"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F55ig92d81iwtmlw0rwmx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F55ig92d81iwtmlw0rwmx.png" alt="running result structured tavily" width="800" height="293"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Analyzing the Results
&lt;/h2&gt;

&lt;p&gt;When I looked at the output, the difference was significant—especially considering this is just a single, simple interaction.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Raw MCP (&lt;code&gt;tavily_raw&lt;/code&gt;)&lt;/th&gt;
&lt;th&gt;Parsed (&lt;code&gt;tavily_parsed&lt;/code&gt;)&lt;/th&gt;
&lt;th&gt;Delta&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tool payload&lt;/td&gt;
&lt;td&gt;12,184 bytes&lt;/td&gt;
&lt;td&gt;5,115 bytes&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.4x smaller (−58%)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tokens in&lt;/td&gt;
&lt;td&gt;4,157&lt;/td&gt;
&lt;td&gt;1,446&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.9x fewer&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;$0.0102&lt;/td&gt;
&lt;td&gt;$0.0051&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;50% cheaper&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;Test: &lt;em&gt;"Suggest 10 places to visit in Florence"&lt;/em&gt; — model: gpt-4.1&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  The MCP Overhead in Numbers
&lt;/h3&gt;

&lt;p&gt;I repeated the experiment several times, and the results were consistent. The &lt;strong&gt;7,069 bytes stripped&lt;/strong&gt; per tool call are not just wasted bandwidth—they are converted directly into input tokens that the LLM must read, that you must pay for, and that must fit into its context window.&lt;/p&gt;

&lt;p&gt;The raw Tavily response carries fields the agent simply doesn't need for this task: &lt;code&gt;title&lt;/code&gt;, &lt;code&gt;url&lt;/code&gt;, &lt;code&gt;favicon&lt;/code&gt;, &lt;code&gt;raw_content&lt;/code&gt;, &lt;code&gt;request_id&lt;/code&gt;, &lt;code&gt;response_time&lt;/code&gt;, &lt;code&gt;auto_parameters&lt;/code&gt;, and &lt;code&gt;usage&lt;/code&gt;. Once you remove them, the payload drops by &lt;strong&gt;58%&lt;/strong&gt;, mapping almost perfectly to the 2.9x token reduction.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why a "Small" Saving Matters
&lt;/h3&gt;

&lt;p&gt;In a complex system, a 2.9x token reduction compounds quickly. If your agent makes 10 tool calls in a single session, you aren't just saving a few cents—you are effectively preventing your context window from exploding. By keeping the input lean, you leave more room for the actual reasoning and long-term memory the agent needs to finish the job.&lt;/p&gt;
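&lt;p&gt;A quick back-of-the-envelope check, using the measured per-call numbers from the table above, shows what ten tool calls cost you in context:&lt;/p&gt;

```go
package main

import "fmt"

func main() {
	const (
		rawTokens    = 4157 // input tokens per raw tool call (measured above)
		parsedTokens = 1446 // input tokens per parsed tool call
		calls        = 10   // tool calls in one agent session
	)
	saved := (rawTokens - parsedTokens) * calls
	fmt.Printf("tokens saved over %d calls: %d\n", calls, saved)
}
```

&lt;p&gt;That's roughly 27k tokens of context reclaimed in a single session—room that goes back to reasoning and memory instead of favicons.&lt;/p&gt;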

&lt;p&gt;Real-world systems work in loops: the agent searches, reasons, calls another tool, summarizes, and then responds. A 50% cost reduction on one call might look like pocket change, but we rarely stop at one.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;MCP servers return everything because they are built for interoperability. They don’t know if your agent is a travel bot or a data scientist, so they send the "kitchen sink" to be safe.&lt;/p&gt;

&lt;p&gt;However, as a developer, the space between that server and your Agent is your responsibility. There is a clear tradeoff here: if you prune too much data, you might limit the agent's capacity to find unexpected connections. But for most specific tasks, forcing an LLM to read favicon URLs and request_ids is just paying a "tax for noise."&lt;/p&gt;

&lt;p&gt;This becomes even more critical in an enterprise environment where you might be wrapping your own internal APIs with an MCP server. It is increasingly evident that MCP tools should be wrapped and executed in a layer outside the agent's direct context.&lt;/p&gt;
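&lt;p&gt;One way to keep that layer reusable is a generic decorator that wraps any tool call with a pruning step (the &lt;code&gt;ToolFunc&lt;/code&gt; and &lt;code&gt;Pruner&lt;/code&gt; signatures here are hypothetical, not taken from EINO or any MCP SDK):&lt;/p&gt;

```go
package main

import "fmt"

// Pruner transforms a raw tool response before the LLM sees it.
type Pruner func(raw string) (string, error)

// ToolFunc is a hypothetical signature for an agent tool call.
type ToolFunc func(input string) (string, error)

// WithOverlay wraps any tool so its output passes through a
// pruning layer outside the agent's direct context.
func WithOverlay(tool ToolFunc, prune Pruner) ToolFunc {
	return func(input string) (string, error) {
		raw, err := tool(input)
		if err != nil {
			return "", err
		}
		return prune(raw)
	}
}

func main() {
	// A toy tool and a toy pruner that strips a "raw:" prefix.
	echo := func(in string) (string, error) { return "raw:" + in, nil }
	strip := func(raw string) (string, error) { return raw[4:], nil }

	wrapped := WithOverlay(echo, strip)
	out, _ := wrapped("hello")
	fmt.Println(out) // hello
}
```

&lt;p&gt;Registering &lt;code&gt;wrapped&lt;/code&gt; instead of the bare tool means every internal API you expose through MCP gets the same treatment for free.&lt;/p&gt;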

&lt;p&gt;If you want to dive deeper into the "Tool Overload" problem, these articles were instrumental in my research:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/engineering/code-execution-with-mcp" rel="noopener noreferrer"&gt;https://www.anthropic.com/engineering/code-execution-with-mcp&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.lunar.dev/post/why-is-there-mcp-tool-overload-and-how-to-solve-it-for-your-ai-agents" rel="noopener noreferrer"&gt;https://www.lunar.dev/post/why-is-there-mcp-tool-overload-and-how-to-solve-it-for-your-ai-agents&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'm still just scratching the surface of agentic programming in Go, but this was an important lesson: don't just "plug and play" your MCP tools. Apply a layer between MCP and agents and save the tokens for the reasoning that actually matters.&lt;/p&gt;

&lt;p&gt;In the next post, I'll dive into how I built a "movie reflection system" to improve agent accuracy and reduce hallucinations.&lt;/p&gt;




&lt;h2&gt;
  
  
  Check my work
&lt;/h2&gt;

&lt;p&gt;The code for both variants is available in &lt;a href="https://github.com/erlangb/agent_monitor" rel="noopener noreferrer"&gt;agent_monitor&lt;/a&gt;.&lt;br&gt;
You can use it to run the pre-filled simple use cases, or write your own using EINO.&lt;/p&gt;

</description>
      <category>agentic</category>
      <category>ai</category>
      <category>go</category>
      <category>mcp</category>
    </item>
  </channel>
</rss>
