<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Daathwi Naagh</title>
    <description>The latest articles on DEV Community by Daathwi Naagh (@daathwi).</description>
    <link>https://dev.to/daathwi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3870711%2F07bb0d78-b664-46a3-a9c1-ef58ad478db3.jpeg</url>
      <title>DEV Community: Daathwi Naagh</title>
      <link>https://dev.to/daathwi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/daathwi"/>
    <language>en</language>
    <item>
      <title>Google ADK vs LangGraph — The Definitive Comparison for Agent Builders</title>
      <dc:creator>Daathwi Naagh</dc:creator>
      <pubDate>Fri, 10 Apr 2026 01:11:39 +0000</pubDate>
      <link>https://dev.to/daathwi/google-adk-vs-langgraph-the-definitive-comparison-for-agent-builders-jjm</link>
      <guid>https://dev.to/daathwi/google-adk-vs-langgraph-the-definitive-comparison-for-agent-builders-jjm</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Choosing the wrong framework doesn't just slow you down. It shapes every architectural decision that follows. Here's how to get it right.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why This Decision Matters More Than You Think
&lt;/h2&gt;

&lt;p&gt;The agent framework you pick isn't just a tooling choice. It determines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How you model state and orchestration&lt;/li&gt;
&lt;li&gt;How you debug when things go wrong in production&lt;/li&gt;
&lt;li&gt;What cloud infrastructure you're implicitly committing to&lt;/li&gt;
&lt;li&gt;How fast your team can move from prototype to deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both &lt;strong&gt;LangGraph&lt;/strong&gt; and &lt;strong&gt;Google ADK&lt;/strong&gt; are serious, production-capable frameworks. But they have fundamentally different philosophies — and that gap matters enormously depending on what you're building.&lt;/p&gt;

&lt;p&gt;Let's go deep.&lt;/p&gt;




&lt;h2&gt;
  
  
  At a Glance
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;LangGraph&lt;/th&gt;
&lt;th&gt;Google ADK&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Released by&lt;/td&gt;
&lt;td&gt;LangChain Team&lt;/td&gt;
&lt;td&gt;Google (Google Cloud NEXT, April 2025)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Core philosophy&lt;/td&gt;
&lt;td&gt;Graph-based state machines&lt;/td&gt;
&lt;td&gt;Code-first, hierarchical agent trees&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Abstraction level&lt;/td&gt;
&lt;td&gt;Low-level, explicit control&lt;/td&gt;
&lt;td&gt;Higher-level, batteries-included&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model support&lt;/td&gt;
&lt;td&gt;Fully model-agnostic&lt;/td&gt;
&lt;td&gt;Optimized for Gemini, but model-agnostic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloud tie-in&lt;/td&gt;
&lt;td&gt;Deploy anywhere&lt;/td&gt;
&lt;td&gt;Native Vertex AI / GCP integration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Learning curve&lt;/td&gt;
&lt;td&gt;Medium–High&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;Precision, auditability, complex flows&lt;/td&gt;
&lt;td&gt;Speed, multi-agent systems, GCP environments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;LangSmith / Langfuse&lt;/td&gt;
&lt;td&gt;OpenTelemetry-native&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;State management&lt;/td&gt;
&lt;td&gt;Built-in checkpointing + time travel&lt;/td&gt;
&lt;td&gt;Session state with pluggable backends&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Streaming&lt;/td&gt;
&lt;td&gt;Per-node token streaming&lt;/td&gt;
&lt;td&gt;Bidirectional audio/video + text (Gemini Live API)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production maturity&lt;/td&gt;
&lt;td&gt;High (battle-tested)&lt;/td&gt;
&lt;td&gt;Early–Medium (growing fast)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Philosophy: Where They Diverge Fundamentally
&lt;/h2&gt;

&lt;h3&gt;
  
  
  LangGraph — "You Are the Architect"
&lt;/h3&gt;

&lt;p&gt;LangGraph is an extension of LangChain that treats your agent as a &lt;strong&gt;directed graph (or DAG)&lt;/strong&gt;. Every step, every branch, every loop — you define it explicitly.&lt;/p&gt;

&lt;p&gt;You construct &lt;strong&gt;nodes&lt;/strong&gt; (LLM calls, tool calls, custom logic) and &lt;strong&gt;edges&lt;/strong&gt; (transitions, conditions, cycles). The agent's entire execution path is a graph you designed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# LangGraph — you define the graph explicitly
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.checkpoint.memory&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MemorySaver&lt;/span&gt;

&lt;span class="n"&gt;builder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;classify&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;classify_intent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_search&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;respond&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;generate_response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;classify&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;route_by_intent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_needed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;direct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;respond&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;respond&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_entry_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;classify&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;checkpointer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;MemorySaver&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is powerful. And demanding. You own the architecture completely.&lt;/p&gt;

&lt;h3&gt;
  
  
  Google ADK — "You Define the Agents, ADK Handles the Rest"
&lt;/h3&gt;

&lt;p&gt;ADK treats agents as &lt;strong&gt;hierarchical tree structures&lt;/strong&gt;. A root agent delegates to specialized sub-agents. Orchestration is handled through pattern primitives: &lt;code&gt;SequentialAgent&lt;/code&gt;, &lt;code&gt;ParallelAgent&lt;/code&gt;, &lt;code&gt;LoopAgent&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ADK — declare agents and their roles, ADK orchestrates
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LlmAgent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SequentialAgent&lt;/span&gt;

&lt;span class="n"&gt;research_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LlmAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;researcher&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research the given topic thoroughly.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;google_search&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;writer_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LlmAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;writer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a concise summary based on research.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SequentialAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research_pipeline&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sub_agents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;research_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;writer_agent&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ADK provides the scaffolding. You define the logic, roles, and tools. The framework manages context, routing, state, and lifecycle.&lt;/p&gt;




&lt;h2&gt;
  
  
  Deep Dive: Feature by Feature
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Orchestration Model
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;LangGraph&lt;/strong&gt; uses a graph model — nodes and edges. It shines when your workflow has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complex conditional branching&lt;/li&gt;
&lt;li&gt;Loops and retries with custom exit conditions&lt;/li&gt;
&lt;li&gt;Parallel branches that must merge at specific points&lt;/li&gt;
&lt;li&gt;Precise control over which step runs when&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;ADK&lt;/strong&gt; uses a hierarchical agent tree. It shines when your workflow looks like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A root "manager" agent that delegates to specialists&lt;/li&gt;
&lt;li&gt;Tasks that can run sequentially or in parallel by design&lt;/li&gt;
&lt;li&gt;Multi-agent workflows where each agent has a clear, encapsulated role&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key difference: &lt;strong&gt;LangGraph models flow as a graph. ADK models flow as a team.&lt;/strong&gt;&lt;/p&gt;
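&lt;p&gt;To make that distinction concrete, here is a framework-free sketch in plain Python (no LangGraph or ADK imports; every function and name below is illustrative, not a real API):&lt;/p&gt;

```python
# Framework-free sketch: the same workflow expressed two ways.
# All functions and names here are illustrative, not real framework APIs.

def classify(state):  # stand-in for an LLM intent-classification call
    state["intent"] = "search_needed" if "?" in state["query"] else "direct"
    return state

def search(state):
    state["results"] = f"results for {state['query']}"
    return state

def respond(state):
    state["answer"] = state.get("results", state["query"]).upper()
    return state

# Graph style (LangGraph-like): explicit nodes plus a transition table.
NODES = {"classify": classify, "search": search, "respond": respond}
EDGES = {("classify", "search_needed"): "search",
         ("classify", "direct"): "respond",
         ("search", None): "respond"}

def run_graph(state, node="classify"):
    while node:
        state = NODES[node](state)
        # Conditional edge out of "classify"; unconditional edges elsewhere
        key = state.get("intent") if node == "classify" else None
        node = EDGES.get((node, key))
    return state

# Team style (ADK-like): a root delegates to sub-agents in order;
# sequencing logic lives in the runner, not in your code.
def run_team(state, sub_agents):
    for agent in sub_agents:
        state = agent(state)
    return state

print(run_graph({"query": "what is ADK?"})["answer"])  # -> RESULTS FOR WHAT IS ADK?
```

&lt;p&gt;The graph version owns every transition; the team version hands sequencing to the runner. That's the philosophical split in miniature.&lt;/p&gt;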




&lt;h3&gt;
  
  
  2. State Management
&lt;/h3&gt;

&lt;p&gt;This is where LangGraph has a significant technical edge for complex use cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LangGraph&lt;/strong&gt; has built-in &lt;strong&gt;checkpointing&lt;/strong&gt; — state is persisted at every node. This enables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Time travel debugging&lt;/strong&gt;: replay your agent from any prior state&lt;/li&gt;
&lt;li&gt;Resuming interrupted runs&lt;/li&gt;
&lt;li&gt;Human-in-the-loop flows (pause, wait for approval, continue)&lt;/li&gt;
&lt;li&gt;Fault tolerance in long-running workflows
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# LangGraph time-travel: rewind to any past state
&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;configurable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;thread_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;abc123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update_state&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[...]},&lt;/span&gt; &lt;span class="n"&gt;as_node&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;classify&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;ADK&lt;/strong&gt; manages state through &lt;code&gt;Session&lt;/code&gt; objects — short-term state per conversation, with pluggable backends for longer-term memory. It's cleaner for conversational flows and multi-session memory, but doesn't natively offer the time-travel / checkpoint replay that LangGraph does.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Winner for complex state&lt;/strong&gt;: LangGraph. &lt;strong&gt;Winner for conversational memory across sessions&lt;/strong&gt;: ADK.&lt;/p&gt;
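&lt;p&gt;What checkpointing buys you is easiest to see stripped of the framework. Here is a minimal sketch of the idea in plain Python (illustrative only; LangGraph's real checkpointers persist snapshots to backends like SQLite or Postgres and key them by thread ID):&lt;/p&gt;

```python
# Framework-free sketch of checkpointing: persist state after every step,
# so a run can be resumed or replayed from any prior checkpoint.
import copy

def run_with_checkpoints(steps, state):
    checkpoints = [copy.deepcopy(state)]  # checkpoint 0: initial state
    for step in steps:
        state = step(state)
        checkpoints.append(copy.deepcopy(state))  # persist after each node
    return state, checkpoints

def resume_from(checkpoints, steps, index):
    """'Time travel': restart execution from checkpoint `index`."""
    state = copy.deepcopy(checkpoints[index])
    for step in steps[index:]:
        state = step(state)
    return state

steps = [lambda s: {**s, "a": 1}, lambda s: {**s, "b": 2}]
final, cps = run_with_checkpoints(steps, {})
# Rewind to just after the first step and re-run the rest:
replayed = resume_from(cps, steps, 1)
```

&lt;p&gt;Interrupted runs, human-approval pauses, and replay debugging all fall out of this one primitive: durable per-step snapshots.&lt;/p&gt;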




&lt;h3&gt;
  
  
  3. Multi-Agent Systems
&lt;/h3&gt;

&lt;p&gt;Both frameworks support multi-agent architectures, but they approach it very differently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LangGraph&lt;/strong&gt;: You build multi-agent systems by composing graphs. One graph can invoke another as a subgraph. Communication between agents is via shared state passed through the graph. It's powerful but requires you to design the topology explicitly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ADK&lt;/strong&gt;: Multi-agent is a &lt;strong&gt;first-class primitive&lt;/strong&gt;. ADK is explicitly designed for hierarchical agent teams. Sub-agents can be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Invoked sequentially (&lt;code&gt;SequentialAgent&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Invoked in parallel (&lt;code&gt;ParallelAgent&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Looped until a condition is met (&lt;code&gt;LoopAgent&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Called as tools by a root agent (&lt;code&gt;AgentTool&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ADK also supports &lt;strong&gt;Agent2Agent (A2A) Protocol&lt;/strong&gt; — a standardized interface allowing ADK agents to call agents built in LangGraph, CrewAI, or other frameworks. This is a major interoperability win.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ADK: Run flight and hotel agents IN PARALLEL
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ParallelAgent&lt;/span&gt;

&lt;span class="n"&gt;booking_pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ParallelAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;booking_pipeline&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sub_agents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;flight_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hotel_agent&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# runs concurrently
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Winner for multi-agent-first design&lt;/strong&gt;: ADK.&lt;/p&gt;




&lt;h3&gt;
  
  
  4. Observability and Debugging
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;LangGraph&lt;/strong&gt; integrates tightly with &lt;strong&gt;LangSmith&lt;/strong&gt; (and Langfuse via callbacks). You get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Step-by-step trace of every node execution&lt;/li&gt;
&lt;li&gt;Token usage per node&lt;/li&gt;
&lt;li&gt;Visual graph replay of agent runs&lt;/li&gt;
&lt;li&gt;LangGraph Studio: visual debugging UI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;ADK&lt;/strong&gt; is built with &lt;strong&gt;OpenTelemetry&lt;/strong&gt; natively. This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Plugs into any OTel-compatible backend (Jaeger, Grafana, Datadog, etc.)&lt;/li&gt;
&lt;li&gt;One-click integrations with Langfuse and other LLM observability platforms&lt;/li&gt;
&lt;li&gt;Built-in evaluation framework for both final responses and intermediate steps&lt;/li&gt;
&lt;li&gt;Visual Web UI + CLI for local debugging&lt;/li&gt;
&lt;li&gt;When deployed on Vertex AI: Cloud Trace integration out of the box&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;LangGraph's edge&lt;/strong&gt;: LangSmith is mature and deeply integrated.&lt;br&gt;
&lt;strong&gt;ADK's edge&lt;/strong&gt;: OpenTelemetry-first avoids vendor lock-in.&lt;/p&gt;


&lt;h3&gt;
  
  
  5. Tool Ecosystem
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;LangGraph/LangChain&lt;/strong&gt; has a massive, mature ecosystem — thousands of pre-built integrations, tools, and chains built over years. It's hard to beat for breadth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ADK&lt;/strong&gt; brings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pre-built tools: Google Search, Code Execution, BigQuery, AlloyDB&lt;/li&gt;
&lt;li&gt;MCP (Model Context Protocol) tool support&lt;/li&gt;
&lt;li&gt;LangChain tools usable inside ADK (interoperability)&lt;/li&gt;
&lt;li&gt;Other agent frameworks (CrewAI, LangGraph agents) usable as tools&lt;/li&gt;
&lt;li&gt;Support for 200+ models via LiteLLM&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ADK's tool interoperability story is strong — it can consume LangChain tools, which largely closes the ecosystem gap.&lt;/p&gt;


&lt;h3&gt;
  
  
  6. Deployment
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;LangGraph&lt;/strong&gt;: Deploy anywhere. Containerize your graph and run it on any infrastructure. LangGraph Cloud (managed service) available for scale. Truly cloud-agnostic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ADK&lt;/strong&gt;: "Deploy anywhere" is the stated goal, and it works — but the native experience is GCP:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;One-command deploy&lt;/strong&gt; to Vertex AI Agent Engine&lt;/li&gt;
&lt;li&gt;Native Cloud Run, GKE support&lt;/li&gt;
&lt;li&gt;Managed sessions, auth, and tracing on Vertex AI automatically&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're on GCP, ADK's deployment story is a genuine competitive advantage. If you're on AWS, Azure, or self-hosted, LangGraph is simpler.&lt;/p&gt;


&lt;h3&gt;
  
  
  7. Streaming
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;LangGraph&lt;/strong&gt;: Per-node token streaming. Standard LLM streaming, solid and reliable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ADK&lt;/strong&gt;: Bidirectional audio and video streaming via the &lt;strong&gt;Gemini Live API&lt;/strong&gt;. This is unique — no other major framework natively supports this. For voice agents, customer support bots, or multimodal applications, ADK is in a different league here.&lt;/p&gt;


&lt;h3&gt;
  
  
  8. Developer Experience
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;LangGraph&lt;/strong&gt; feels like a &lt;strong&gt;graph DSL&lt;/strong&gt; — powerful, but you're working at a lower abstraction level. It rewards engineers who want transparency and deterministic behavior. The cost: more boilerplate, steeper learning curve, fragmented documentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ADK&lt;/strong&gt; feels like a &lt;strong&gt;full-stack Python application framework&lt;/strong&gt; — Web UI, CLI, API server, test harness, deploy pipelines, all included. It rewards engineers who want to move fast and think in terms of agents and roles rather than nodes and edges.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Honest Tradeoffs
&lt;/h2&gt;
&lt;h3&gt;
  
  
  LangGraph Strengths
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Unmatched precision and control over execution flow&lt;/li&gt;
&lt;li&gt;Best-in-class state checkpointing and time-travel debugging&lt;/li&gt;
&lt;li&gt;Mature ecosystem, battle-tested in production&lt;/li&gt;
&lt;li&gt;Truly model-agnostic and cloud-agnostic&lt;/li&gt;
&lt;li&gt;Excellent for compliance-heavy environments (every decision is auditable)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  LangGraph Weaknesses
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Verbose code for straightforward multi-agent patterns&lt;/li&gt;
&lt;li&gt;Steeper learning curve — graph thinking isn't intuitive for all teams&lt;/li&gt;
&lt;li&gt;Documentation can be fragmented&lt;/li&gt;
&lt;li&gt;No native multimodal streaming&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  ADK Strengths
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Fastest path to hierarchical multi-agent systems&lt;/li&gt;
&lt;li&gt;Built-in evaluation, Web UI, CLI — production-grade DX out of the box&lt;/li&gt;
&lt;li&gt;Native A2A protocol for cross-framework agent interoperability&lt;/li&gt;
&lt;li&gt;OpenTelemetry-native observability (no vendor lock-in)&lt;/li&gt;
&lt;li&gt;Best multimodal/streaming support of any major framework&lt;/li&gt;
&lt;li&gt;Actively backed by Google, powering internal products (Agentspace, Customer Engagement Suite)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  ADK Weaknesses
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Newer — less production battle-testing than LangGraph&lt;/li&gt;
&lt;li&gt;Tight GCP coupling makes it less ergonomic outside Google Cloud&lt;/li&gt;
&lt;li&gt;Less fine-grained control than LangGraph for complex cyclical flows&lt;/li&gt;
&lt;li&gt;Gemini optimization means other models are second-class (though supported)&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Decision Guide: When to Choose What
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Choose LangGraph when...
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;You need surgical precision over every execution step.&lt;/strong&gt;&lt;br&gt;
Compliance systems, financial workflows, healthcare automation — any domain where you must prove exactly what happened and why.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your workflow has complex, custom loops and branching logic.&lt;/strong&gt;&lt;br&gt;
Non-standard patterns that don't fit "sequential" or "parallel" — LangGraph lets you model any flow you can imagine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You're building long-running tasks that must survive interruptions.&lt;/strong&gt;&lt;br&gt;
Checkpointing + resume is a LangGraph superpower. Multi-day agent runs, workflows requiring human approval mid-execution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You're multi-cloud or cloud-agnostic.&lt;/strong&gt;&lt;br&gt;
If AWS, Azure, or self-hosted infrastructure is non-negotiable, LangGraph is the frictionless path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your team already knows LangChain.&lt;/strong&gt;&lt;br&gt;
The ecosystem familiarity is a real productivity advantage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need the widest model support without friction.&lt;/strong&gt;&lt;br&gt;
OpenAI, Anthropic, Mistral, local models — all first-class citizens.&lt;/p&gt;


&lt;h3&gt;
  
  
  Choose Google ADK when...
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;You're building on Google Cloud (Vertex AI, GCP).&lt;/strong&gt;&lt;br&gt;
One-command deployment, managed sessions, Cloud Trace — the native experience is genuinely excellent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Speed to production matters more than architectural customization.&lt;/strong&gt;&lt;br&gt;
ADK's batteries-included approach gets you from prototype to deployed agent faster than anything else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You're building hierarchical multi-agent systems.&lt;/strong&gt;&lt;br&gt;
Agent teams with clear roles and delegation are ADK's native strength.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need multimodal or voice agents.&lt;/strong&gt;&lt;br&gt;
Bidirectional audio/video streaming via Gemini Live API is uniquely available here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You want cross-framework agent interoperability via A2A.&lt;/strong&gt;&lt;br&gt;
If your org is mixing ADK agents, LangGraph agents, and CrewAI agents — the A2A protocol makes ADK the best orchestration hub.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your model choice is Gemini (or you want access to Gemini 3 Pro/Flash).&lt;/strong&gt;&lt;br&gt;
ADK and Gemini are deeply co-designed. You'll get the best performance, streaming, and tooling here.&lt;/p&gt;


&lt;h2&gt;
  
  
  Situational Cheatsheet
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Situation&lt;/th&gt;
&lt;th&gt;Recommendation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Compliance/audit-critical workflow&lt;/td&gt;
&lt;td&gt;LangGraph&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GCP-native enterprise deployment&lt;/td&gt;
&lt;td&gt;ADK&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complex custom loops and cycles&lt;/td&gt;
&lt;td&gt;LangGraph&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-agent delegation with clear roles&lt;/td&gt;
&lt;td&gt;ADK&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Long-running tasks with resume/replay&lt;/td&gt;
&lt;td&gt;LangGraph&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Voice/multimodal agents&lt;/td&gt;
&lt;td&gt;ADK&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS or Azure infrastructure&lt;/td&gt;
&lt;td&gt;LangGraph&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fast prototyping to production&lt;/td&gt;
&lt;td&gt;ADK&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;You need every model under the sun&lt;/td&gt;
&lt;td&gt;LangGraph&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google Gemini is your primary model&lt;/td&gt;
&lt;td&gt;ADK&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HITL (Human-in-the-loop) workflows&lt;/td&gt;
&lt;td&gt;LangGraph&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-framework agent interop (A2A)&lt;/td&gt;
&lt;td&gt;ADK&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Team prefers explicit flow control&lt;/td&gt;
&lt;td&gt;LangGraph&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Team prefers role-based agent design&lt;/td&gt;
&lt;td&gt;ADK&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;h2&gt;
  
  
  Can You Use Both?
&lt;/h2&gt;

&lt;p&gt;Yes — and increasingly, teams do.&lt;/p&gt;

&lt;p&gt;ADK can treat a LangGraph agent as an &lt;code&gt;AgentTool&lt;/code&gt;. LangGraph can invoke ADK-built agents from within a node by calling their exposed APIs. With MCP and A2A protocol support in ADK, the two frameworks are becoming interoperable rather than mutually exclusive.&lt;/p&gt;

&lt;p&gt;A pragmatic architecture some teams use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Root Orchestrator (ADK — hierarchical multi-agent)
├── Research Agent (ADK — Google Search, BigQuery)
├── Processing Agent (LangGraph — complex stateful loop)
│   ├── Validate Node
│   ├── Transform Node
│   └── Retry Node (with checkpointing)
└── Output Agent (ADK — Gemini streaming response)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use each where it's strongest.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Verdict
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;LangGraph&lt;/strong&gt; is the engineer's framework. It gives you the surgical control, auditability, and state management that complex production systems demand. You pay for it in learning curve and boilerplate. For compliance-heavy, custom-flow, cloud-agnostic workloads — it's the right tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ADK&lt;/strong&gt; is the product team's framework. It's fast, cohesive, and opinionated in the right ways. Multi-agent orchestration is a first-class citizen, deployment is frictionless on GCP, and the multimodal streaming story is unmatched. For hierarchical agent teams, GCP environments, and teams that want to move fast — it's compelling and only getting better.&lt;/p&gt;

&lt;p&gt;The framework you pick should match your &lt;strong&gt;workflow pattern&lt;/strong&gt;, your &lt;strong&gt;infrastructure&lt;/strong&gt;, and your team's &lt;strong&gt;mental model&lt;/strong&gt;. Neither is universally better.&lt;/p&gt;

&lt;p&gt;Pick the one that makes your specific problem easier to model — then go build.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have you shipped production agents on either? I'd be curious what failure modes you've hit in practice — that's where the real framework comparison happens.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>gcp</category>
      <category>langchain</category>
    </item>
    <item>
      <title>100s of Tools in Your Agent — Here's How to Actually Pick the Right One</title>
      <dc:creator>Daathwi Naagh</dc:creator>
      <pubDate>Fri, 10 Apr 2026 00:49:45 +0000</pubDate>
      <link>https://dev.to/daathwi/100s-of-tools-in-your-agent-heres-how-to-actually-pick-the-right-one-547i</link>
      <guid>https://dev.to/daathwi/100s-of-tools-in-your-agent-heres-how-to-actually-pick-the-right-one-547i</guid>
      <description>&lt;h2&gt;
  
  
  The Problem Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;You've built an agent. You've wired up 100+ tools. You feel good about it.&lt;/p&gt;

&lt;p&gt;Then it starts hallucinating. Picking the wrong tool. Collapsing entire workflows over a single misclassified query.&lt;/p&gt;

&lt;p&gt;The failure isn't the LLM. &lt;strong&gt;It's the architecture.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;My previous post covered a genuine use case I found for Gemma4 — and this is exactly where it fits in.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Naive Approach (and Why It Fails)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Load all tools into LLM context and let it decide.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sounds simple. It is. And it breaks at scale.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hallucinations increase with context length&lt;/li&gt;
&lt;li&gt;Bloated context = slower, more expensive calls&lt;/li&gt;
&lt;li&gt;LLM gets confused choosing between 50+ tool descriptions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Slow. Unreliable. Expensive.&lt;/p&gt;




&lt;h2&gt;
  
  
  The "Slightly Smarter" Approach (Still Broken)
&lt;/h2&gt;

&lt;p&gt;RAG over tool descriptions. Seems reasonable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User query → embedding → top 5 matches → LLM picks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sounds clean. But &lt;strong&gt;embeddings can't distinguish intent.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Similar words ≠ same meaning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;User says: &lt;em&gt;"I need iPhone"&lt;/em&gt;&lt;br&gt;
Tool: &lt;code&gt;check_product_catalog&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Embedding search may:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Miss the tool completely&lt;/li&gt;
&lt;li&gt;Retrieve irrelevant tools&lt;/li&gt;
&lt;li&gt;Break the entire downstream workflow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Wrong tool gets selected. Entire workflow collapses.&lt;/p&gt;

&lt;p&gt;The problem isn't the embedding model. It's that you're asking a single layer to carry too much responsibility.&lt;/p&gt;
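&lt;p&gt;For concreteness, here is the flat pipeline sketched in plain Python. The bag-of-words &lt;code&gt;embed&lt;/code&gt; function is a toy stand-in for a real embedding model, and the tool registry is invented; the point is that a single similarity ranking carries the entire decision:&lt;/p&gt;

```python
import math
from collections import Counter

# Toy stand-in for a real embedding model: a bag-of-words vector.
# The point is the *shape* of the flat pipeline, not embedding quality.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Invented registry: the full, unfiltered tool pool.
TOOLS = {
    "check_product_catalog": "Queries product catalog by SKU with region filtering.",
    "create_support_ticket": "Opens a support ticket for a customer issue.",
    "get_order_status": "Looks up the shipping status of an existing order.",
    "apply_discount_code": "Applies a promotional discount code to a cart.",
    "search_faq": "Searches frequently asked questions for an answer.",
    "get_store_hours": "Returns opening hours for a retail location.",
}

def flat_top_k(query: str, k: int = 5) -> list:
    q = embed(query)
    ranked = sorted(TOOLS, key=lambda name: cosine(q, embed(TOOLS[name])), reverse=True)
    return ranked[:k]

# "I need iPhone" shares no tokens with any description above, so every
# similarity is 0.0 and the "top 5" is effectively arbitrary: the failure
# mode described in the post.
print(flat_top_k("I need iPhone"))
```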




&lt;h2&gt;
  
  
  What Actually Worked: A Layered Filtering Stack
&lt;/h2&gt;

&lt;p&gt;The approach that works in production is &lt;strong&gt;layered filtering&lt;/strong&gt; — not pure semantic search, not raw LLM reasoning. Both together, in the right order.&lt;/p&gt;

&lt;p&gt;Here's the stack I use:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Intent Classification&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;gemma4:e4b&lt;/code&gt; via Ollama (9.6 GB, local)&lt;/td&gt;
&lt;td&gt;Fast, private intent routing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Semantic Search&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;nomic-embed-text&lt;/code&gt; via Ollama (274 MB)&lt;/td&gt;
&lt;td&gt;Embedding over filtered subset&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  The 5-Step Architecture
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1 — Classify Intent First
&lt;/h3&gt;

&lt;p&gt;A lightweight LLM maps the query to a &lt;strong&gt;high-level category&lt;/strong&gt; before any search happens.&lt;/p&gt;

&lt;p&gt;This eliminates entire irrelevant domains upfront. If the user is asking about orders, you never even look at inventory tools.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="s2"&gt;"I need iPhone"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;category:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;product_discovery&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
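&lt;p&gt;A runnable sketch of this step, with a keyword table standing in for the local model call (the &lt;code&gt;CATEGORIES&lt;/code&gt; map and &lt;code&gt;classify_intent&lt;/code&gt; helper are illustrative, not framework APIs):&lt;/p&gt;

```python
# Step 1 sketched with a stub in place of the local model. In the stack
# above this would be one call to the local LLM with a constrained prompt
# ("answer with exactly one category name"); a keyword table stands in
# here so the sketch runs on its own.

CATEGORIES = {
    "product_discovery": {"iphone", "laptop", "product", "stock", "available"},
    "order_management": {"order", "shipping", "refund", "return"},
}

def classify_intent(query: str) -> str:
    """Map a raw query to one high-level category before any search runs."""
    words = set(query.lower().replace("?", "").split())
    for category, keywords in CATEGORIES.items():
        if words.intersection(keywords):
            return category
    return "general"   # fall-through bucket for everything else

print(classify_intent("I need iPhone"))          # product_discovery
print(classify_intent("where is my order?"))     # order_management
```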



&lt;h3&gt;
  
  
  Step 2 — Hard Filter by Metadata
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Deterministic rules. Not embeddings.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Only tools matching the classified intent category are eligible. The search space collapses from 100+ tools to maybe 10–15.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;category:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;product_discovery&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;eligible&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;tools:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;check_catalog&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;search_products&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;get_inventory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
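&lt;p&gt;Sketched in plain Python (the registry entries are invented), the hard filter is just an equality check on a metadata tag:&lt;/p&gt;

```python
# Step 2 sketched: a deterministic metadata filter over the tool registry.
# No embeddings, no model call. Eligibility is a plain equality check on
# the category tag each tool was registered with.

TOOL_REGISTRY = [
    {"name": "check_catalog", "category": "product_discovery"},
    {"name": "search_products", "category": "product_discovery"},
    {"name": "get_inventory", "category": "product_discovery"},
    {"name": "get_order_status", "category": "order_management"},
    {"name": "issue_refund", "category": "order_management"},
]

def eligible_tools(category: str) -> list:
    """Collapse the search space to only the tools tagged with this category."""
    return [tool["name"] for tool in TOOL_REGISTRY if tool["category"] == category]

print(eligible_tools("product_discovery"))
# ['check_catalog', 'search_products', 'get_inventory']
```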



&lt;h3&gt;
  
  
  Step 3 — Semantic Search Within the Clean Subset
&lt;/h3&gt;

&lt;p&gt;Now RAG works — because it's running over a &lt;strong&gt;small, relevant set&lt;/strong&gt;, not noisy hundreds.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;nomic-embed-text&lt;/code&gt; finds the closest semantic matches within your filtered pool. False positives drop dramatically.&lt;/p&gt;
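&lt;p&gt;A minimal sketch of this step, with a bag-of-words cosine standing in for &lt;code&gt;nomic-embed-text&lt;/code&gt; (the subset and descriptions are invented); the structural point is that ranking now runs over the filtered subset only:&lt;/p&gt;

```python
import math
from collections import Counter

# Step 3 sketched: semantic ranking, but only over the already-filtered
# subset. A toy bag-of-words cosine stands in for a real embedding model;
# the candidate pool is 10-15 tools here, never 100+.

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

FILTERED_SUBSET = {
    "check_catalog": "find a product or check whether an item like iphone is available",
    "search_products": "browse and search products by keyword or model",
    "get_inventory": "check warehouse stock levels for a sku",
}

def rank_subset(query: str) -> list:
    q = embed(query)
    return sorted(FILTERED_SUBSET,
                  key=lambda n: cosine(q, embed(FILTERED_SUBSET[n])),
                  reverse=True)

print(rank_subset("I need iPhone"))
```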

&lt;h3&gt;
  
  
  Step 4 — Score and Rank
&lt;/h3&gt;

&lt;p&gt;Confidence scoring on the top candidates. Auditable. Explainable.&lt;/p&gt;

&lt;p&gt;No black box decisions. You can log exactly &lt;em&gt;why&lt;/em&gt; a tool was selected — which matters when things break in production.&lt;/p&gt;
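&lt;p&gt;One way to sketch this step: normalize similarities into comparable confidence scores and attach a human-readable reason to each candidate. The similarity numbers below are illustrative inputs, not model output, and &lt;code&gt;score_candidates&lt;/code&gt; is a hypothetical helper:&lt;/p&gt;

```python
# Step 4 sketched: turn raw similarity into a confidence score plus a
# loggable record, so every selection can be explained after the fact.

def score_candidates(query: str, similarities: dict, threshold: float = 0.2) -> list:
    total = sum(similarities.values()) or 1.0
    ranked = [
        {
            "tool": name,
            "confidence": round(sim / total, 3),     # normalized, comparable
            "reason": f"similarity {sim:.2f} for query {query!r}",
        }
        for name, sim in similarities.items()
        if sim >= threshold                          # drop weak candidates early
    ]
    ranked.sort(key=lambda c: c["confidence"], reverse=True)
    return ranked

for record in score_candidates("I need iPhone",
                               {"check_catalog": 0.71,
                                "search_products": 0.44,
                                "get_inventory": 0.08}):
    print(record)   # each line is an audit-ready decision record
```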

&lt;h3&gt;
  
  
  Step 5 — LLM Final Pick
&lt;/h3&gt;

&lt;p&gt;Send the top candidates + the original user query to the LLM.&lt;/p&gt;

&lt;p&gt;Now it's choosing between 3–5 relevant tools, not 100+. The context is clean. The decision is accurate.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;check_product_catalog&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;search_products&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;get_item_details&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"I need iPhone"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;LLM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;picks:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;check_product_catalog&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
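&lt;p&gt;A sketch of how that final prompt might be assembled. The wording and shortlist entries are invented, and the actual model call is left out; only the shape matters:&lt;/p&gt;

```python
# Step 5 sketched: the final model call sees only the shortlist plus the
# original query. Building the prompt is the cheap part; the single
# "expensive" LLM call happens after this, over 3-5 candidates.

def build_final_prompt(query: str, shortlist: list) -> str:
    lines = ["Pick exactly one tool for the user request below.", ""]
    for tool in shortlist:
        lines.append(f"- {tool['name']}: {tool['description']}")
    lines.append("")
    lines.append(f"User request: {query}")
    return "\n".join(lines)

SHORTLIST = [
    {"name": "check_product_catalog", "description": "find or look up a product"},
    {"name": "search_products", "description": "browse products by keyword"},
    {"name": "get_item_details", "description": "details for a known item"},
]

prompt = build_final_prompt("I need iPhone", SHORTLIST)
print(prompt)
# The model now chooses among 3 clean candidates instead of 100+ noisy ones.
```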






&lt;h2&gt;
  
  
  Why This Works
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Each layer does one job.&lt;/strong&gt; No single layer carries too much responsibility.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Intent classifier  → reduces domain space
Metadata filter    → reduces tool count (deterministic, fast)
Semantic search    → finds closest match in clean subset
Scoring            → adds confidence + auditability
LLM                → makes final call with minimal context
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's why it's reliable. That's why it's fast.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Part No One Talks About
&lt;/h2&gt;

&lt;p&gt;Even the best architecture fails if your &lt;strong&gt;tool descriptions are written like API docs&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Bad — written for engineers
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_product_catalog&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Queries product catalog by SKU with region filtering.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Good — written for users
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_product_catalog&lt;/span&gt;&lt;span class="p"&gt;(...):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Use this when someone wants to find a product, check if something 
    is available, look up an item by name or model, or browse what&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s 
    in stock. Works for queries like &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;do you have iPhone 15?&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; or 
    &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;show me your laptop options&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Write descriptions in the &lt;strong&gt;language your users actually speak.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Think about how someone types a message to an agent — not how an engineer names a function.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The system learns from the language you give it.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;End-to-end latency (all 5 steps)&lt;/td&gt;
&lt;td&gt;&amp;lt; 2 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool selection accuracy&lt;/td&gt;
&lt;td&gt;Significantly higher than pure RAG&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model infra&lt;/td&gt;
&lt;td&gt;Fully local, private&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bottleneck&lt;/td&gt;
&lt;td&gt;1000+ concurrent users (next problem to solve)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  The Real Lesson
&lt;/h2&gt;

&lt;p&gt;We don't need smarter models.&lt;br&gt;
We don't need infinite context windows.&lt;/p&gt;

&lt;p&gt;We need &lt;strong&gt;better system design&lt;/strong&gt; and the discipline to &lt;strong&gt;think like a user.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A layered architecture with good tool descriptions, running on lightweight local models, will outperform a bloated LLM context every time.&lt;/p&gt;

&lt;p&gt;That's the real work.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built this in production. Happy to discuss the concurrent scaling problem — that's a different beast entirely.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Found this useful? Follow for more on agent architecture and production ML systems.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agentaichallenge</category>
      <category>llm</category>
      <category>mcp</category>
    </item>
  </channel>
</rss>
