<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Amit Surana</title>
    <description>The latest articles on DEV Community by Amit Surana (@amitsurana).</description>
    <link>https://dev.to/amitsurana</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3683952%2Fac1fa449-922b-498a-9f36-5eccdc49c66e.png</url>
      <title>DEV Community: Amit Surana</title>
      <link>https://dev.to/amitsurana</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/amitsurana"/>
    <language>en</language>
    <item>
      <title>Beyond Keywords: Engineering a Production-Ready Agentic Search Framework in Go</title>
      <dc:creator>Amit Surana</dc:creator>
      <pubDate>Mon, 29 Dec 2025 10:00:02 +0000</pubDate>
      <link>https://dev.to/amitsurana/beyond-keywords-engineering-a-production-ready-agentic-search-framework-in-go-3j45</link>
      <guid>https://dev.to/amitsurana/beyond-keywords-engineering-a-production-ready-agentic-search-framework-in-go-3j45</guid>
      <description>&lt;p&gt;Search systems have historically been optimized for retrieval: given a query, return the most relevant documents. That model breaks down the moment user intent shifts from finding information to solving problems.&lt;/p&gt;

&lt;p&gt;Consider a query like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“How will tomorrow's weather in Seattle affect flight prices to JFK?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This isn't a search problem. It's a reasoning problem — one that requires decomposition, orchestration across multiple systems, and synthesis into a coherent answer.&lt;/p&gt;

&lt;p&gt;This is where agentic search comes in.&lt;/p&gt;

&lt;p&gt;In this article, I'll walk through how we designed and productionized an agentic search framework in Go — not as a demo, but as a real system operating under production constraints like latency, cost, concurrency, and failure modes.&lt;/p&gt;

&lt;h2&gt;
  
  
  From Search to Agentic Search
&lt;/h2&gt;

&lt;p&gt;Keyword and vector search systems excel at matching queries to documents. What they don't handle well is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-step reasoning&lt;/li&gt;
&lt;li&gt;Tool coordination&lt;/li&gt;
&lt;li&gt;Query decomposition&lt;/li&gt;
&lt;li&gt;Answer synthesis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Agentic search treats the LLM not as a text generator, but as a &lt;strong&gt;planner&lt;/strong&gt; — a component that decides what actions to take to answer a question.&lt;/p&gt;

&lt;p&gt;At a high level, an agentic system must be able to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Understand user intent&lt;/li&gt;
&lt;li&gt;Decide which tools to call&lt;/li&gt;
&lt;li&gt;Execute those tools safely&lt;/li&gt;
&lt;li&gt;Iterate when necessary&lt;/li&gt;
&lt;li&gt;Synthesize a final response&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The hard part isn't wiring an LLM to tools. The hard part is doing this predictably and economically in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  High-Level Architecture
&lt;/h2&gt;

&lt;p&gt;We structured the system around three core concerns:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Planning&lt;/strong&gt; – deciding what to do&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution&lt;/strong&gt; – running tools efficiently&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Synthesis&lt;/strong&gt; – producing the final answer&lt;/li&gt;
&lt;/ol&gt;

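&lt;p&gt;End to end, a request passes through the Flow Orchestrator, which invokes the Query Planner, runs the planned tools through the Tool Registry, and hands the results to the Response Generator.&lt;/p&gt;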

&lt;p&gt;Each stage is deliberately isolated. Reasoning does not leak into execution, and execution does not influence planning decisions directly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Flow Orchestrator: The Control Plane
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;Flow Orchestrator&lt;/strong&gt; manages the full lifecycle of a request. Its responsibilities include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Coordinating planner invocations&lt;/li&gt;
&lt;li&gt;Executing tools concurrently&lt;/li&gt;
&lt;li&gt;Handling retries, timeouts, and cancellations&lt;/li&gt;
&lt;li&gt;Streaming partial responses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of a linear pipeline, the orchestrator supports parallel execution using Go's goroutines. This becomes essential once multiple independent tools are involved.&lt;/p&gt;

&lt;h2&gt;
  
  
  Query Planner: Mandatory First Pass, Conditional Iteration
&lt;/h2&gt;

&lt;p&gt;The Query Planner is always invoked at least once.&lt;/p&gt;

&lt;h3&gt;
  
  
  First Planner Call (Always)
&lt;/h3&gt;

&lt;p&gt;On the first invocation, the planner:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Analyzes the user query&lt;/li&gt;
&lt;li&gt;Produces an initial set of tool calls&lt;/li&gt;
&lt;li&gt;Establishes a consistent reasoning baseline&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even trivial queries go through this step to maintain uniform behavior and observability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lightweight Classifier Gate
&lt;/h3&gt;

&lt;p&gt;Before invoking the planner a second time, we run a lightweight classifier model to determine whether the query is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single-step&lt;/li&gt;
&lt;li&gt;Multi-step&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This classifier is intentionally cheap and fast.&lt;/p&gt;

&lt;h3&gt;
  
  
  Second Planner Call (Only for Multi-Step Queries)
&lt;/h3&gt;

&lt;p&gt;If the query is classified as multi-step:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The planner is invoked again&lt;/li&gt;
&lt;li&gt;It receives:
&lt;ul&gt;
&lt;li&gt;The original user query&lt;/li&gt;
&lt;li&gt;Tool responses from the first execution&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;It determines:
&lt;ul&gt;
&lt;li&gt;Whether more tools are required&lt;/li&gt;
&lt;li&gt;Which tools to call next&lt;/li&gt;
&lt;li&gt;How to sequence them&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Gating the second planner call behind the classifier prevents uncontrolled planner loops, one of the most common failure modes in agentic systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool Registry: Where Reasoning Meets Reality
&lt;/h2&gt;

&lt;p&gt;Every tool implements a strict Go interface:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// ToolInterface is the interface developers implement for each tool. It uses&lt;/span&gt;
&lt;span class="c"&gt;// generics so that tool inputs and outputs are strongly typed.&lt;/span&gt;
&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;ToolInterface&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Input&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Output&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// Execute runs the tool.&lt;/span&gt;
    &lt;span class="c"&gt;//&lt;/span&gt;
    &lt;span class="c"&gt;// Parameters:&lt;/span&gt;
    &lt;span class="c"&gt;// - input: strongly typed tool request input.&lt;/span&gt;
    &lt;span class="c"&gt;//&lt;/span&gt;
    &lt;span class="c"&gt;// Returns:&lt;/span&gt;
    &lt;span class="c"&gt;// - output: strongly typed tool output.&lt;/span&gt;
    &lt;span class="c"&gt;// - toolContext: additional output data that is not used by the agent model.&lt;/span&gt;
    &lt;span class="c"&gt;// - err: structured error from the tool. In some cases the error is passed to the LLM, e.g. no_response from a tool.&lt;/span&gt;
    &lt;span class="n"&gt;Execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;requestContext&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;RequestContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="n"&gt;Input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="n"&gt;Output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;toolContext&lt;/span&gt; &lt;span class="n"&gt;ToolResponseContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c"&gt;// GetDefinition gets the tool definition sent to Large Language Model.&lt;/span&gt;
    &lt;span class="n"&gt;GetDefinition&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;ToolDefinition&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This design gives us:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Natural-language outputs for planner feedback&lt;/li&gt;
&lt;li&gt;Structured metadata for downstream use&lt;/li&gt;
&lt;li&gt;Compile-time safety&lt;/li&gt;
&lt;li&gt;Safe parallel execution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Tool Registry acts as a trust boundary. Planner outputs are treated as intent — not instructions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Parallel Tool Execution
&lt;/h2&gt;

&lt;p&gt;Planner-generated tool calls are executed concurrently whenever possible.&lt;/p&gt;

&lt;p&gt;Go's concurrency model makes this practical:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lightweight goroutines&lt;/li&gt;
&lt;li&gt;Context-based cancellation&lt;/li&gt;
&lt;li&gt;Efficient I/O-bound execution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is one of the reasons Go tends to hold up better than Python for I/O-bound tool fan-out once agentic systems move beyond prototypes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Response Generation and Streaming
&lt;/h2&gt;

&lt;p&gt;Once tools complete, responses flow into the Response Generator.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge-based queries&lt;/strong&gt; are summarized and synthesized using an LLM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Direct-answer queries&lt;/strong&gt; (weather, sports, stocks) bypass synthesis and return raw tool output.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Responses are streamed via Server-Sent Events (SSE) so users see partial results early, improving perceived latency.&lt;/p&gt;

&lt;h2&gt;
  
  
  Caching Strategy: Making Agentic Search Economical
&lt;/h2&gt;

&lt;p&gt;One production reality became clear almost immediately: &lt;strong&gt;LLM calls have real cost — in both latency and dollars.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once we began serving beta traffic, caching became mandatory. Our guiding principle was simple: &lt;strong&gt;Avoid LLM calls whenever possible.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: Semantic Cache (Full Response)
&lt;/h3&gt;

&lt;p&gt;We first check a semantic cache keyed on the user query.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cache hit → return response immediately&lt;/li&gt;
&lt;li&gt;Entire agentic flow is bypassed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This delivers the biggest latency and cost win.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 2: Planner Response Cache
&lt;/h3&gt;

&lt;p&gt;If the semantic cache misses, we check whether the planner output (tool plan) is cached.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Skips the planner LLM call&lt;/li&gt;
&lt;li&gt;Executes tools directly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Planner calls are among the most expensive and variable operations, so caching them stabilizes both latency and cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 3: Summarizer Cache
&lt;/h3&gt;

&lt;p&gt;Finally, we cache summarizer outputs.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tool results often repeat&lt;/li&gt;
&lt;li&gt;Final synthesis can be reused&lt;/li&gt;
&lt;li&gt;Reduces LLM load during traffic spikes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each cache layer short-circuits a different part of the pipeline.&lt;/p&gt;
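
&lt;p&gt;The way the three layers short-circuit the pipeline can be sketched as below. This is a simplified illustration under several assumptions: &lt;code&gt;Pipeline&lt;/code&gt; and its fields are hypothetical, plain maps stand in for real cache stores, and the "semantic" lookup is approximated by an exact match on a normalized query, whereas a real semantic cache would match on embedding similarity. Expiry and concurrency control are omitted.&lt;/p&gt;

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"strings"
)

// Pipeline holds the three cache layers and the expensive steps they guard.
type Pipeline struct {
	Responses map[string]string   // layer 1: query -> final response
	Plans     map[string][]string // layer 2: query -> tool plan
	Summaries map[string]string   // layer 3: query + tool results -> summary

	Plan      func(query string) []string                 // planner LLM call
	Execute   func(plan []string) []string                // tool execution
	Summarize func(query string, results []string) string // summarizer LLM call
}

// normalize is a crude stand-in for semantic matching.
func normalize(q string) string {
	return strings.Join(strings.Fields(strings.ToLower(q)), " ")
}

// resultsKey derives a cache key from the query and the tool results.
func resultsKey(query string, results []string) string {
	sum := sha256.Sum256([]byte(query + "\x00" + strings.Join(results, "\x00")))
	return hex.EncodeToString(sum[:])
}

// Answer consults each layer in order and falls through only on a miss.
func (p Pipeline) Answer(query string) string {
	key := normalize(query)

	// Layer 1: a hit bypasses the entire agentic flow.
	if resp, ok := p.Responses[key]; ok {
		return resp
	}

	// Layer 2: a hit skips the planner LLM call.
	plan, ok := p.Plans[key]
	if !ok {
		plan = p.Plan(query)
		p.Plans[key] = plan
	}

	results := p.Execute(plan)

	// Layer 3: a hit skips the summarizer LLM call.
	sKey := resultsKey(key, results)
	summary, ok := p.Summaries[sKey]
	if !ok {
		summary = p.Summarize(query, results)
		p.Summaries[sKey] = summary
	}

	p.Responses[key] = summary
	return summary
}
```

&lt;p&gt;Note that tools still run on a layer 2 hit, so time-sensitive data such as weather stays fresh even when the plan itself is reused. Entries in layers 1 and 3 would need short lifetimes for such queries.&lt;/p&gt;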

&lt;h2&gt;
  
  
  Lessons from Production
&lt;/h2&gt;

&lt;p&gt;A few hard-earned lessons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;LLM calls are expensive&lt;/strong&gt; — caching isn't optional at scale&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic caching&lt;/strong&gt; pays off immediately&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Planner loops&lt;/strong&gt; must be gated&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Most queries&lt;/strong&gt; are simpler than they look&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tools fail&lt;/strong&gt; — retries and fallbacks matter&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability&lt;/strong&gt; is non-negotiable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agents aren't autonomous&lt;/strong&gt; — orchestration beats autonomy&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>agentic</category>
      <category>ai</category>
      <category>go</category>
    </item>
  </channel>
</rss>
