<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Zoricic</title>
    <description>The latest articles on DEV Community by Zoricic (@zoricic).</description>
    <link>https://dev.to/zoricic</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3804118%2F2d59183d-e624-4e98-8a9f-4987bd9c6474.png</url>
      <title>DEV Community: Zoricic</title>
      <link>https://dev.to/zoricic</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/zoricic"/>
    <language>en</language>
    <item>
      <title>Three Things Had to Align: The Real Story Behind the LLM Revolution</title>
      <dc:creator>Zoricic</dc:creator>
      <pubDate>Tue, 31 Mar 2026 20:39:35 +0000</pubDate>
      <link>https://dev.to/zoricic/three-things-had-to-align-the-real-story-behind-the-llm-revolution-3kmm</link>
      <guid>https://dev.to/zoricic/three-things-had-to-align-the-real-story-behind-the-llm-revolution-3kmm</guid>
      <description>&lt;p&gt;ChatGPT didn't come out of nowhere. It's the result of 60 years of dead ends, one accidental breakthrough, and three completely separate technologies all maturing at exactly the same moment.&lt;/p&gt;




&lt;h2&gt;
  
  
  Before the Revolution: Chatbots That Didn't Learn
&lt;/h2&gt;

&lt;p&gt;AI is older than most people realize. But early AI looked nothing like what we have today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ELIZA&lt;/strong&gt; (1966) was the first chatbot. It simulated conversation using pattern matching — if you typed "I feel sad," it replied "Why do you feel sad?" It didn't learn anything. It followed hand-written rules. Impressive for 1966. Useless for anything complex.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RNNs&lt;/strong&gt; emerged in the 1990s as the dominant approach for language processing. &lt;strong&gt;LSTMs&lt;/strong&gt; (Long Short-Term Memory networks), invented by Hochreiter &amp;amp; Schmidhuber in 1997, improved on RNNs specifically to address memory limitations — but both shared the same fundamental design: reading text sequentially, word by word, like looking through a keyhole. By the time either reached the end of a long sentence, the beginning was largely forgotten.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"The dog, which had been chasing the cat down the long street, was tired."
                                                               ↑
                               RNN had forgotten "dog" here ──┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Meanwhile, Google Was Running on Rules
&lt;/h3&gt;

&lt;p&gt;While researchers struggled with language models, Google was building search using a series of named algorithms — each solving a specific problem:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Algorithm&lt;/th&gt;
&lt;th&gt;Year&lt;/th&gt;
&lt;th&gt;What It Did&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;PageRank&lt;/td&gt;
&lt;td&gt;1998&lt;/td&gt;
&lt;td&gt;Ranked pages by how many other pages linked to them&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Panda&lt;/td&gt;
&lt;td&gt;2011&lt;/td&gt;
&lt;td&gt;Penalized low-quality and duplicate content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Penguin&lt;/td&gt;
&lt;td&gt;2012&lt;/td&gt;
&lt;td&gt;Penalized sites that bought fake links&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hummingbird&lt;/td&gt;
&lt;td&gt;2013&lt;/td&gt;
&lt;td&gt;First steps toward understanding concepts, not just keywords&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RankBrain&lt;/td&gt;
&lt;td&gt;2015&lt;/td&gt;
&lt;td&gt;First use of machine learning — handled never-seen-before queries&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;RankBrain was Google's first real AI in search. It used mathematical vectors to connect unknown words to known ones. But it still couldn't write a single sentence — and it still saw words as isolated units, not as parts of connected thought.&lt;/p&gt;

&lt;p&gt;The forgetting problem and the keyword problem were the same problem in different clothes. Both systems failed to capture relationships between words across distance. The fix came from a single paper.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Breakthrough: "Attention Is All You Need" (2017)
&lt;/h2&gt;

&lt;p&gt;In 2017, a team of Google researchers published a paper with a confident title: &lt;em&gt;Attention Is All You Need&lt;/em&gt;. The paper addressed &lt;strong&gt;sequence transduction&lt;/strong&gt; broadly — mapping one sequence to another — and used machine translation as the benchmark task, showing the Transformer outperforming existing approaches on English-to-German and English-to-French. What they invented was the &lt;strong&gt;Transformer architecture&lt;/strong&gt;, and it accidentally became the foundation for every major AI system since. Google Translate's adoption of Transformers was gradual and varied by language pair; the architecture didn't replace the previous approach across the board overnight.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Core Idea: Self-Attention
&lt;/h3&gt;

&lt;p&gt;Instead of reading text sequentially, the Transformer looks at the entire input at once and assigns a &lt;strong&gt;weight&lt;/strong&gt; — an importance score — to each word relative to every other word.&lt;/p&gt;

&lt;p&gt;When processing "tired" in our earlier example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"The dog, which had been chasing the cat down the long street, was tired."

  Word      Attention Weight
  ───────────────────────────
  dog        HIGH  ← subject; dogs can be tired
  chasing    MED   ← action dog performed
  cat        LOW   ← object of chasing; not tired
  street     LOW   ← location; not tired
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model builds a relationship map of the entire sentence simultaneously. Nothing is read sequentially. Nothing is forgotten mid-sentence.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why This Required New Hardware
&lt;/h3&gt;

&lt;p&gt;Self-attention is parallelizable — every word's relationship to every other word can be computed at the same time, not one after another.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;RNN:         [word1] → [word2] → [word3] → result
             Sequential. Each step waits for the previous.

Transformer: [word1] ↔ [word2] ↔ [word3]
               ↕          ↕         ↕
             All relationships computed simultaneously.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GPUs were designed to render thousands of pixels in parallel for video games. That same parallel architecture maps directly onto self-attention calculations. The match was accidental — and decisive.&lt;/p&gt;

&lt;p&gt;Training GPT-3 without modern GPU hardware wouldn't take months. It would take hundreds of years.&lt;/p&gt;




&lt;h2&gt;
  
  
  Three Things Had to Align
&lt;/h2&gt;

&lt;p&gt;Here's the part most explanations skip: if the Transformer paper had been published in 2005, nothing would have happened.&lt;/p&gt;

&lt;p&gt;Modern LLMs exist because three things converged at the same moment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│  THE ALGORITHM  │   │    THE DATA     │   │  THE HARDWARE   │
│                 │   │                 │   │                 │
│  Transformer    │ + │  Billions of    │ + │  GPU clusters   │
│  architecture   │   │  internet pages,│   │  powerful       │
│  (the recipe)   │   │  books, Wikipedia│  │  enough to run  │
│                 │   │  (ingredients)  │   │  the math       │
└─────────────────┘   └─────────────────┘   └─────────────────┘
                              ↓
                  All three align ~2017
                              ↓
                  LLM revolution begins
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The data requirement is the most underestimated factor. LLMs need billions of sentences to develop statistical understanding of language. The internet, Wikipedia, and digitized books only reached the necessary scale in the early 2010s. The algorithm existing in 2005 would have had nothing meaningful to train on.&lt;/p&gt;

&lt;p&gt;Miss any one of the three — and ChatGPT doesn't exist.&lt;/p&gt;




&lt;h2&gt;
  
  
  Two Directions from One Architecture
&lt;/h2&gt;

&lt;p&gt;Once the Transformer existed, two companies used it to solve two completely different problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  BERT — Google (2018)
&lt;/h3&gt;

&lt;p&gt;Google needed to &lt;em&gt;understand&lt;/em&gt; language, not generate it. Search requires answering: what does this query actually mean?&lt;/p&gt;

&lt;p&gt;BERT (&lt;strong&gt;B&lt;/strong&gt;idirectional &lt;strong&gt;E&lt;/strong&gt;ncoder &lt;strong&gt;R&lt;/strong&gt;epresentations from &lt;strong&gt;T&lt;/strong&gt;ransformers) trained using &lt;strong&gt;Masked Language Modeling&lt;/strong&gt;: researchers masked 15% of words in a sentence and trained the model to predict them from surrounding context.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Paris is the [MASK] of France."  →  BERT learns: "capital"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key innovation: &lt;em&gt;bidirectionality&lt;/em&gt;. Previous models read left-to-right. BERT reads the full sentence in all directions at once.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-world impact:&lt;/strong&gt; Google Search circa 2019. Before BERT, searching &lt;em&gt;"can I pick up a prescription for someone else at the pharmacy"&lt;/em&gt; matched keywords. After BERT, Google understood that "for someone else" was the semantically critical phrase.&lt;/p&gt;

&lt;h3&gt;
  
  
  GPT Series — OpenAI (2018–2020)
&lt;/h3&gt;

&lt;p&gt;OpenAI needed to &lt;em&gt;generate&lt;/em&gt; language. The decoder half of the Transformer — predicting the next word given all previous words — became the basis for GPT.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Year&lt;/th&gt;
&lt;th&gt;Milestone&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPT-1&lt;/td&gt;
&lt;td&gt;2018&lt;/td&gt;
&lt;td&gt;Proved large-scale pre-training transfers to other tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-2&lt;/td&gt;
&lt;td&gt;2019&lt;/td&gt;
&lt;td&gt;Outputs convincing enough to trigger public misuse debates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-3&lt;/td&gt;
&lt;td&gt;2020&lt;/td&gt;
&lt;td&gt;175 billion parameters — first truly massive, general-purpose model&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;BERT vs. GPT compared:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;BERT (Google)&lt;/th&gt;
&lt;th&gt;GPT (OpenAI)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Reading direction&lt;/td&gt;
&lt;td&gt;Bidirectional&lt;/td&gt;
&lt;td&gt;Left to right&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Primary goal&lt;/td&gt;
&lt;td&gt;Understanding&lt;/td&gt;
&lt;td&gt;Generation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Training signal&lt;/td&gt;
&lt;td&gt;Masked word prediction&lt;/td&gt;
&lt;td&gt;Next word prediction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Used for&lt;/td&gt;
&lt;td&gt;Search, Q&amp;amp;A, classification&lt;/td&gt;
&lt;td&gt;Chatbots, writing, code&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Neither is strictly better. They're optimized for different jobs.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Missing Piece: Why GPT-3 Wasn't ChatGPT
&lt;/h2&gt;

&lt;p&gt;GPT-3 could generate fluent text. But left to its own training objective — predict the next word — it was unreliable as an assistant. Ask it a question and it might answer it, continue the question, or list five more similar questions. It had no concept of being &lt;em&gt;helpful&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The gap between GPT-3 (2020) and ChatGPT (2022) is two techniques:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Instruction tuning&lt;/strong&gt; fine-tunes a pre-trained model on examples of instructions paired with good responses. Instead of "predict the next token in internet text," the training signal becomes "given this instruction, produce a useful response." The model learns to follow directions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RLHF (Reinforcement Learning from Human Feedback)&lt;/strong&gt; goes further. Human raters compare pairs of model outputs and rank which is better. Those rankings train a &lt;strong&gt;reward model&lt;/strong&gt; — a separate model that scores outputs. The language model is then fine-tuned using reinforcement learning to maximize that score.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Pre-trained GPT-3
        │
        ▼
Instruction tuning (supervised, labeled examples)
        │
        ▼
RLHF — human raters rank outputs → reward model → RL fine-tuning
        │
        ▼
InstructGPT / ChatGPT — helpful, harmless, honest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is why ChatGPT felt qualitatively different from GPT-3 API users' experience. The underlying model was similar in architecture. What changed was the training objective: from "predict internet text" to "be useful to a human."&lt;/p&gt;

&lt;p&gt;OpenAI published the &lt;strong&gt;InstructGPT&lt;/strong&gt; paper in early 2022 describing this approach. ChatGPT launched in November 2022 using the same method.&lt;/p&gt;




&lt;h2&gt;
  
  
  Beyond Text: Multimodal Models
&lt;/h2&gt;

&lt;p&gt;LLMs started as text-only systems. By 2024, the leading models — GPT-4o, Gemini 1.5, Claude 3 — process text, images, audio, and video within the same model.&lt;/p&gt;

&lt;p&gt;The core insight: the Transformer's attention mechanism is not specific to text. An image can be divided into patches and treated as a sequence of tokens. Audio can be converted to spectrograms and similarly tokenized. The same self-attention machinery that relates words to each other can relate image patches to words to audio frames.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────┐
│              Multimodal Input                       │
│                                                     │
│  "What is in this image?"  +  [image file]         │
│         ↓                          ↓               │
│   Text tokens               Image patches           │
│         └──────────┬───────────────┘               │
│                    ▼                                │
│           Shared Transformer                        │
│           (attention over all tokens)               │
│                    ↓                                │
│            "A dog chasing a cat"                    │
└─────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Practical implications:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A model can answer questions about a screenshot, diagram, or photograph&lt;/li&gt;
&lt;li&gt;Audio can be transcribed, translated, and responded to in a single pass&lt;/li&gt;
&lt;li&gt;Video frames become a sequence of image tokens the model can reason over&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The boundary between "language model" and "AI system" has effectively dissolved. What were separate specialized models (vision model, speech model, text model) are converging into single multimodal architectures.&lt;/p&gt;




&lt;h2&gt;
  
  
  Hardware: The Unsung Hero
&lt;/h2&gt;

&lt;p&gt;Algorithm and data get most of the coverage. Hardware rarely does — despite being the actual constraint.&lt;/p&gt;

&lt;p&gt;Microsoft built a dedicated supercomputer with 10,000 NVIDIA V100s specifically for OpenAI's research. OpenAI never disclosed exact training figures for GPT-3, but estimates put large training runs in the thousands of GPUs running for weeks. Today, systems running Gemini and GPT use clusters of H100 cards at similar scale. This is why NVIDIA became one of the most valuable companies in the world.&lt;/p&gt;

&lt;p&gt;Google took a different path, designing custom silicon specifically for Transformer math:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Hardware&lt;/th&gt;
&lt;th&gt;Used by&lt;/th&gt;
&lt;th&gt;Strength&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;NVIDIA H100&lt;/td&gt;
&lt;td&gt;OpenAI, Anthropic, Meta&lt;/td&gt;
&lt;td&gt;General-purpose, widely available&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google TPUs&lt;/td&gt;
&lt;td&gt;Google (Gemini)&lt;/td&gt;
&lt;td&gt;Purpose-built for Transformer math, more efficient per watt&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Gemini has been trained and served across multiple TPU generations (v4, v5e, v5p, and the newer Trillium/v6 for more recent models). TPUs aren't available to external developers for training — they're Google's internal advantage.&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Recommendations
&lt;/h2&gt;

&lt;p&gt;If you're building on top of LLMs today, the historical distinctions above translate into concrete choices.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choosing between understanding and generation models:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Model type&lt;/th&gt;
&lt;th&gt;Examples&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Classifying text, ranking documents, Q&amp;amp;A over a corpus&lt;/td&gt;
&lt;td&gt;Encoder (BERT-style)&lt;/td&gt;
&lt;td&gt;BERT, RoBERTa, sentence-transformers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generating text, summarizing, conversing, coding&lt;/td&gt;
&lt;td&gt;Decoder (GPT-style)&lt;/td&gt;
&lt;td&gt;GPT-4, Claude, Gemini&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Most new product work in 2024+&lt;/td&gt;
&lt;td&gt;Multimodal API&lt;/td&gt;
&lt;td&gt;GPT-4o, Gemini 1.5 Pro, Claude 3&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  The Full Timeline
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1966        ELIZA — first chatbot (rules only, no learning)
1990s–1997  RNNs (1990s) + LSTMs (1997) — sequential processing, forgetting problem
1998        PageRank — Google's original ranking algorithm
2013        Hummingbird — first concept-based search
2015        RankBrain — first ML in Google Search
2017        "Attention Is All You Need" — Transformer invented
2018        BERT (Google) + GPT-1 (OpenAI) — the split path begins
2019        BERT deployed in Google Search
2020        GPT-3 — 175B parameters, first truly massive LLM
2022        InstructGPT — instruction tuning + RLHF; ChatGPT launches November 2022
2024–2025   Multimodal models — GPT-4o, Gemini 1.5, Claude 3 unify text, image, audio, video
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;The Transformer's attention mechanism solved the forgetting problem that had bottlenecked language AI for decades — by computing relationships across the entire input simultaneously rather than word by word. From that single architectural change, both understanding models (BERT) and generation models (GPT) were built. But the algorithm alone wasn't the revolution. It was the algorithm converging with enough training data and hardware powerful enough to run the math — and none of that aligned until ~2017. Even then, GPT-3 in 2020 wasn't ChatGPT: the final piece was RLHF, which replaced "predict internet text" with "be useful to a human" as the training objective. "AI" and "LLM" are not synonyms. Neither are "pre-trained model" and "assistant." The distinctions matter when evaluating what any given system can actually do.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/1706.03762" rel="noopener noreferrer"&gt;Attention Is All You Need&lt;/a&gt; — the original 2017 Transformer paper&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/1810.04805" rel="noopener noreferrer"&gt;BERT: Pre-training of Deep Bidirectional Transformers&lt;/a&gt; — Google's 2018 paper&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/2005.14165" rel="noopener noreferrer"&gt;Language Models are Few-Shot Learners&lt;/a&gt; — the GPT-3 paper&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/2203.02155" rel="noopener noreferrer"&gt;Training language models to follow instructions with human feedback&lt;/a&gt; — the InstructGPT/RLHF paper&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://jalammar.github.io/illustrated-transformer/" rel="noopener noreferrer"&gt;The Illustrated Transformer&lt;/a&gt; — best visual walkthrough of attention mechanisms, no math required&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>llm</category>
    </item>
    <item>
      <title>Sub-Agent Architectures: Patterns, Trade-offs, and a Kotlin Implementation</title>
      <dc:creator>Zoricic</dc:creator>
      <pubDate>Sun, 22 Mar 2026 20:04:28 +0000</pubDate>
      <link>https://dev.to/zoricic/sub-agent-architectures-patterns-trade-offs-and-a-kotlin-implementation-13dh</link>
      <guid>https://dev.to/zoricic/sub-agent-architectures-patterns-trade-offs-and-a-kotlin-implementation-13dh</guid>
      <description>&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Single-agent systems hit a wall fast. Give one agent too many tools and a complex task, and you get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context bloat&lt;/strong&gt; — every intermediate result accumulates in the conversation history&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool confusion&lt;/strong&gt; — a model with 20 tools tends to make worse decisions than one with 5 (tool-use accuracy degrades with scale; see the &lt;a href="https://gorilla.cs.berkeley.edu/" rel="noopener noreferrer"&gt;Gorilla LLM benchmark&lt;/a&gt; for empirical evidence)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No isolation&lt;/strong&gt; — a failed step can corrupt the entire conversation state&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No parallelism&lt;/strong&gt; — sequential execution even when tasks are independent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Larger context windows reduce but don't eliminate these problems. A 1M-token context still degrades with irrelevant history, and tool confusion is a model behavior issue independent of window size. Architecture is the more scalable solution — but it's a trade-off, not a free lunch. Sub-agents add latency, coordination complexity, and cost.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is a Sub-Agent?
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;sub-agent&lt;/strong&gt; is a separate LLM call with its own isolated conversation context, invoked by a parent agent to handle a specific, self-contained task. "Separate instance" is a common shorthand but can mislead — it's typically the same model, same API key, same underlying compute. The isolation is logical, not physical. Every sub-agent call is another round of LLM inference costs.&lt;/p&gt;

&lt;p&gt;What a sub-agent gets depends on how you design it. At minimum:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A system prompt tailored to its job&lt;/li&gt;
&lt;li&gt;A lifecycle — it runs, returns a result, and is discarded&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Beyond that, context isolation is a &lt;strong&gt;design variable&lt;/strong&gt;, not a defining property of the pattern. In AIgent, sub-agents start with a fresh conversation history and never see the parent's history. In AutoGen, agents share a group chat history by default. In LangGraph, nodes can read and write shared state. Whether a sub-agent has isolated context or shared context depends entirely on the framework and implementation choices.&lt;/p&gt;

&lt;p&gt;This article describes AIgent's approach — isolated context, restricted tool sets — as one point in that design space, not the only valid one.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────────────────────────────────────────┐
│                  Parent Agent                    │
│  • Full tool set       • Persistent history      │
│  • User-facing         • Skill injection         │
└────────────────────┬─────────────────────────────┘
                     │
              [LLM model decides]
              to call spawn_agent
              tool with task + tools
                     │
                     ▼
          ┌──────────────────────┐
          │     Sub-Agent        │
          │  • Limited tools     │
          │  • Isolated context* │
          │  • Focused task      │
          └──────────┬───────────┘
                     │ returns result string
                     ▼
┌──────────────────────────────────────────────────┐
│           Parent Agent (continues)               │
└──────────────────────────────────────────────────┘

* Context isolation is a design choice — see note below.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  When Did Sub-Agents Emerge?
&lt;/h2&gt;

&lt;p&gt;Sub-agents didn't appear overnight. They evolved through distinct phases:&lt;/p&gt;

&lt;h3&gt;
  
  
  2022 — ReAct and Tool Use
&lt;/h3&gt;

&lt;p&gt;Tool-augmented LMs existed before 2022. What the &lt;strong&gt;ReAct&lt;/strong&gt; paper (Yao et al., 2022) contributed was empirical evidence that &lt;em&gt;interleaving&lt;/em&gt; reasoning traces with actions outperforms either alone — it beat action-only approaches (like WebGPT) and reasoning-only approaches (like chain-of-thought) on question-answering and fact-verification tasks. The interleaving structure — think, act, observe, think again — became the blueprint for agent loops that followed. Still single-agent, but foundational.&lt;/p&gt;

&lt;h3&gt;
  
  
  2023 — Autonomous Agent Experiments
&lt;/h3&gt;

&lt;p&gt;Projects like &lt;strong&gt;AutoGPT&lt;/strong&gt; and &lt;strong&gt;BabyAGI&lt;/strong&gt; showed what happens when you give a model the ability to spawn tasks and run them recursively. They were chaotic and unreliable, but they proved the concept: agents could delegate to themselves. The &lt;strong&gt;LangChain&lt;/strong&gt; and &lt;strong&gt;LlamaIndex&lt;/strong&gt; ecosystems formalized tool-using agents and chains, making sub-agent patterns accessible to developers.&lt;/p&gt;

&lt;h3&gt;
  
  
  2024 — Production-Grade Orchestration
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;CrewAI&lt;/strong&gt;, &lt;strong&gt;AutoGen&lt;/strong&gt;, and &lt;strong&gt;LangGraph&lt;/strong&gt; moved the conversation from "can this work?" to "how do you build it reliably?" Role-based agent teams, graph-based execution, and structured handoffs between agents became first-class patterns. The focus shifted from raw autonomy to controlled delegation.&lt;/p&gt;

&lt;h3&gt;
  
  
  By 2025 — Native SDK Support
&lt;/h3&gt;

&lt;p&gt;API providers began shipping sub-agent patterns as first-class features. Anthropic's &lt;strong&gt;Claude Agent SDK&lt;/strong&gt; and Google's &lt;strong&gt;Gemini multi-agent APIs&lt;/strong&gt; made spawning isolated sub-agents a few lines of code rather than a framework to build yourself.&lt;/p&gt;




&lt;h2&gt;
  
  
  Orchestration Patterns
&lt;/h2&gt;

&lt;p&gt;These are structural decisions about how agents are created, what work they do, and how results flow back. They're distinct from capability mechanisms (covered in the next section), which are about what information an agent has access to.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. ReAct Loop (Single Agent)
&lt;/h3&gt;

&lt;p&gt;The building block. The model reasons, acts, observes, and repeats until done.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User input → [Think] → [Tool Call] → [Observe result] → [Think] → ... → Final answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every agent loop in every framework is a variation of this. In production, it's typically bounded by a max-iterations guard to prevent infinite loops:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User input → [Think] → [Tool Call] → [Observe] → ... (max N iterations) → Final answer
                                                              ↓
                                                    [Forced termination if N exceeded]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simple to implement&lt;/li&gt;
&lt;li&gt;Easy to trace and debug&lt;/li&gt;
&lt;li&gt;No coordination overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Context grows linearly with each tool call&lt;/li&gt;
&lt;li&gt;All tools visible at once — decision quality degrades with scale&lt;/li&gt;
&lt;li&gt;No parallelism&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  2. Hierarchical Delegation (Orchestrator + Sub-Agents)
&lt;/h3&gt;

&lt;p&gt;A parent agent breaks work into tasks and delegates each to a specialized sub-agent with a restricted tool set and fresh context.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────┐
│                   Orchestrator                      │
│   Tools: [spawn_agent, search_web, get_time, ...]   │
└───────────┬──────────────────┬──────────────────────┘
            │                  │
            ▼                  ▼
   ┌──────────────┐   ┌──────────────────┐
   │  Sub-Agent A │   │   Sub-Agent B    │
   │  [search_web]│   │ [push_to_github] │
   └──────────────┘   └──────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key insight is &lt;strong&gt;tool isolation&lt;/strong&gt;: sub-agents only get the tools relevant to their task. This improves decision quality and limits blast radius.&lt;/p&gt;

&lt;p&gt;Context isolation — whether sub-agents see the parent's history or share state — is a design variable. Parallelism is structurally possible (sub-agents with isolated state have no ordering dependency), but requires explicit async wiring; a blocking implementation runs them sequentially.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tool scoping improves decision quality per sub-task&lt;/li&gt;
&lt;li&gt;Parallelism is possible when sub-tasks are independent and isolated&lt;/li&gt;
&lt;li&gt;Limits damage from a misbehaving sub-agent (if contexts are isolated)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each sub-agent is a full LLM call — latency and cost multiply with every delegation&lt;/li&gt;
&lt;li&gt;Silent failures: a sub-agent that returns an empty or nonsensical result is indistinguishable from a valid response at the string boundary. Single-agent loops fail loudly (the model keeps trying); hierarchical delegation fails quietly (the parent receives garbage and reasons from it)&lt;/li&gt;
&lt;li&gt;No mechanism for sub-agents to ask for clarification; ambiguous tasks produce hallucinated results&lt;/li&gt;
&lt;li&gt;Coordination logic lives in the parent prompt — hard to test explicitly&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  3. Plan &amp;amp; Execute
&lt;/h3&gt;

&lt;p&gt;The agent first produces a full plan (a list of steps), then executes each step — potentially spawning sub-agents per step. Planning and execution are separated.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[User input] → [Planning Agent] → [Step 1, Step 2, Step 3]
                                         │
                          ┌──────────────┼──────────────┐
                          ▼              ▼              ▼
                     [Executor 1]  [Executor 2]  [Executor 3]
                          └──────────────┼──────────────┘
                                         ▼
                                   [Synthesizer]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Predictable execution path&lt;/li&gt;
&lt;li&gt;Easy to inspect and approve before running&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rigid — the plan can't adapt to what it discovers mid-execution&lt;/li&gt;
&lt;li&gt;Upfront planning requires good foresight from the model&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  4. Role-Based Multi-Agent Teams
&lt;/h3&gt;

&lt;p&gt;Multiple agents with distinct roles coordinate via shared memory or message passing. Common roles: Researcher, Writer, Critic, Executor.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────┐    findings    ┌──────────┐    draft    ┌──────────┐
│Researcher│ ─────────────▶ │  Writer  │ ──────────▶ │  Critic  │
└──────────┘                └──────────┘             └──────────┘
                                                           │
                                                    feedback│
                                                           ▼
                                                    ┌──────────┐
                                                    │  Writer  │
                                                    └──────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Used heavily in CrewAI and AutoGen. Great for creative or review-heavy pipelines; complex to coordinate reliably.&lt;/p&gt;




&lt;h2&gt;
  
  
  Capability Mechanisms
&lt;/h2&gt;

&lt;p&gt;These are distinct from orchestration patterns. They don't change how agents are spawned or how work is divided — they change what knowledge or behavior an agent has access to within a single context.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mechanism&lt;/th&gt;
&lt;th&gt;How It Works&lt;/th&gt;
&lt;th&gt;When to Use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Skill injection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Relevant instructions prepended to system prompt&lt;/td&gt;
&lt;td&gt;Conditional behavior without spawning overhead&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RAG&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Retrieved document chunks injected into the message&lt;/td&gt;
&lt;td&gt;Domain knowledge too large for static prompts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Memory&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Previous conversation facts loaded into context&lt;/td&gt;
&lt;td&gt;Personalization, continuity across sessions&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Skill-Based Injection
&lt;/h3&gt;

&lt;p&gt;The parent agent dynamically loads context ("skills") into its system prompt based on the user's request. No new LLM instance — just different instructions per request.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "Remind me to call John tomorrow at 9am"
        │
        ▼
┌──────────────────────────────────────────────────┐
│  selectRelevant("remind", allSkills)             │
│  → matches "reminders" skill                     │
│  → injects into system prompt                    │
└──────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In AIgent, &lt;code&gt;selectRelevant&lt;/code&gt; is word intersection — no embeddings, no LLM call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/main/kotlin/SkillLoader.kt&lt;/span&gt;
&lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;selectRelevant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;all&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Skill&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;):&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Skill&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;words&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lowercase&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Regex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"\\W+"&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;length&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="p"&gt;}.&lt;/span&gt;&lt;span class="nf"&gt;toSet&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;matched&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;all&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;skill&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;skillWords&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skill&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="s"&gt;" "&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="n"&gt;skill&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;lowercase&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Regex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"\\W+"&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;toSet&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;intersect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skillWords&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;isNotEmpty&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;matched&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ifEmpty&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;all&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;  &lt;span class="c1"&gt;// fallback: inject ALL skills if nothing matches&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It splits both the user input and each skill's name+description into lowercase words (≥3 chars) and checks for any overlap. Fast and cheap — no extra API calls. The fallback on the last line is significant: if nothing matches, every skill gets injected. With two skills that's fine; with twenty it's a prompt full of unrelated instructions.&lt;/p&gt;

&lt;p&gt;This is what "lightweight" means here. Alternative approaches — embedding similarity or a classifier LLM call — are more accurate but add latency and cost before the main response even starts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zero extra API calls — no latency overhead&lt;/li&gt;
&lt;li&gt;Easy to add new behaviors: drop a &lt;code&gt;.md&lt;/code&gt; file in &lt;code&gt;skills/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Skills can reference specific tools, guiding the model without hard-coding logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Word intersection is brittle — "Set an alert" won't match a skill named "reminders" unless the words overlap&lt;/li&gt;
&lt;li&gt;Wrong skill injection is hard to detect: the model follows injected guidance even when irrelevant&lt;/li&gt;
&lt;li&gt;Fallback injects all skills when nothing matches — prompt can balloon unexpectedly&lt;/li&gt;
&lt;li&gt;No isolation — a buggy skill affects the whole agent, not just a sub-task&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Architecture Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Context Isolation&lt;/th&gt;
&lt;th&gt;Tool Scoping&lt;/th&gt;
&lt;th&gt;Complexity&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single ReAct loop&lt;/td&gt;
&lt;td&gt;Orchestration&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Simple, focused tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hierarchical delegation&lt;/td&gt;
&lt;td&gt;Orchestration&lt;/td&gt;
&lt;td&gt;Design variable²&lt;/td&gt;
&lt;td&gt;Per sub-agent&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Multi-step workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Plan &amp;amp; Execute&lt;/td&gt;
&lt;td&gt;Orchestration&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;Per step&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Predictable pipelines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Role-based teams&lt;/td&gt;
&lt;td&gt;Orchestration&lt;/td&gt;
&lt;td&gt;Shared state&lt;/td&gt;
&lt;td&gt;Per role&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Creative, review workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Skill injection&lt;/td&gt;
&lt;td&gt;Capability mechanism&lt;/td&gt;
&lt;td&gt;Static (prompt-level)¹&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Conditional behavior&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RAG&lt;/td&gt;
&lt;td&gt;Capability mechanism&lt;/td&gt;
&lt;td&gt;Static (prompt-level)&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Large knowledge bases&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;¹ Skill injection changes what context is active at prompt-construction time. The agent still runs in a single LLM context — there is no new instance or conversation boundary.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;² Context isolation in hierarchical delegation is a design variable. AIgent uses isolated contexts per sub-agent. AutoGen shares a group chat history by default. LangGraph nodes can read and write shared state.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Recommendations
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Use a single ReAct loop when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The task needs fewer than ~5 tool calls&lt;/li&gt;
&lt;li&gt;All tools are clearly relevant to the task&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use skill injection when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The agent needs different behavioral modes&lt;/li&gt;
&lt;li&gt;You want guidance without spinning up new LLM instances&lt;/li&gt;
&lt;li&gt;Capabilities are well-defined but not always relevant&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use sub-agents when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A sub-task is self-contained&lt;/li&gt;
&lt;li&gt;You want a model to operate with a focused tool set&lt;/li&gt;
&lt;li&gt;Tasks are independent and you're willing to implement async dispatch for parallelism&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Prevent recursive spawning explicitly.&lt;/strong&gt; Don't rely on the model to know not to spawn sub-agents of sub-agents. Enforce it structurally — pass a &lt;code&gt;baseTools&lt;/code&gt; list (without the spawn tool) to sub-agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Guard your ReAct loops.&lt;/strong&gt; Add a max-iterations limit. Without it, a model that gets stuck will keep calling tools until you run out of tokens or budget.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Plan for sub-agent failures.&lt;/strong&gt; The string &lt;code&gt;"(no response)"&lt;/code&gt; is indistinguishable from a valid empty result. Use a structured return type that separates success from error, and make the parent handle failure explicitly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Case Study: AIgent — A Kotlin Implementation
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/filip505/AIgent" rel="noopener noreferrer"&gt;AIgent&lt;/a&gt; implements the hierarchical delegation pattern on top of Gemini's function calling API, with a skills system for capability injection. It's a working project — the code below is from the actual repo, not a simplified demo.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Tool Interface
&lt;/h3&gt;

&lt;p&gt;Every capability — from web search to spawning agents — implements the same contract:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/main/kotlin/tools/AgentTool.kt&lt;/span&gt;
&lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;AgentTool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;declaration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;FunctionDeclaration&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;,&lt;/span&gt; &lt;span class="n"&gt;chatId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Long&lt;/span&gt;&lt;span class="p"&gt;?):&lt;/span&gt; &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;FunctionDeclaration&lt;/code&gt; is what gets passed to Gemini's function calling API. &lt;code&gt;handle()&lt;/code&gt; is what runs when the model calls the tool. Adding a new capability means implementing two things and nothing else.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Spawn Agent Tool
&lt;/h3&gt;

&lt;p&gt;The parent agent delegates by calling &lt;code&gt;spawn_agent&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/main/kotlin/tools/SpawnAgentTool.kt&lt;/span&gt;
&lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;,&lt;/span&gt; &lt;span class="n"&gt;chatId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Long&lt;/span&gt;&lt;span class="p"&gt;?):&lt;/span&gt; &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;task&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"task"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;mapOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"error"&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="s"&gt;"Missing task"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;toolNames&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"tools"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;mapOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"error"&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="s"&gt;"Missing tools"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;selectedTools&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;availableTools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;declaration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;name&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;orElse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;toolNames&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;result&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SubAgentService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;selectedTools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;mapOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"result"&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One scoping detail: &lt;code&gt;availableTools&lt;/code&gt; is the parent-defined pool. But &lt;code&gt;toolNames&lt;/code&gt; comes from &lt;code&gt;args["tools"]&lt;/code&gt; — meaning the &lt;strong&gt;model&lt;/strong&gt; picks which tools to request from that pool. The parent sets hard limits; the sub-agent selects within them. A hallucinating model can't exceed the pool, but it could request the wrong subset. For security-sensitive tools, validate &lt;code&gt;toolNames&lt;/code&gt; against an explicit allowlist.&lt;/p&gt;

&lt;p&gt;This &lt;code&gt;run()&lt;/code&gt; call is synchronous and blocking — sub-agents execute sequentially in the current implementation. The architecture supports parallelism (sub-agents share no state), but it requires explicit async wiring. With Kotlin coroutines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Parallel dispatch — not in current implementation, but straightforward to add&lt;/span&gt;
&lt;span class="k"&gt;suspend&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;spawnParallel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Pair&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;AgentTool&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&amp;gt;&amp;gt;,&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;coroutineScope&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
        &lt;span class="nf"&gt;async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Dispatchers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;IO&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nc"&gt;SubAgentService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}.&lt;/span&gt;&lt;span class="nf"&gt;awaitAll&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each &lt;code&gt;async&lt;/code&gt; block runs on the IO dispatcher — network-bound LLM calls benefit immediately. &lt;code&gt;awaitAll()&lt;/code&gt; collects results once all sub-agents finish.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Sub-Agent Service
&lt;/h3&gt;

&lt;p&gt;Each sub-agent runs its own ReAct loop in complete isolation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/main/kotlin/SubAgentService.kt&lt;/span&gt;
&lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;config&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GenerateContentConfig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;systemInstruction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromParts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="s"&gt;"You are a focused sub-agent. Complete the given task using the available tools. Be concise and return only the result."&lt;/span&gt;
        &lt;span class="p"&gt;)))&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;listOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;functionDeclarations&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;declaration&lt;/span&gt; &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;history&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mutableListOf&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;()&lt;/span&gt;
    &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;role&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;listOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;))).&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="py"&gt;response&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generateContent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"gemini-2.5-flash"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;functionCalls&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;isNotEmpty&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;responseParts&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mutableListOf&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;()&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fc&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;functionCalls&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;!!&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;tool&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;declaration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;name&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;orElse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="n"&gt;fc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;name&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;orElse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;result&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;args&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;orElse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;emptyMap&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="nf"&gt;mapOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"error"&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="s"&gt;"Unknown function: ${fc.name().orElse("")}"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;responseParts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromFunctionResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;name&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;orElse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;role&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;role&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;responseParts&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generateContent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"gemini-2.5-flash"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="s"&gt;"(no response)"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The system prompt — "Be concise and return only the result" — works well for self-contained tasks where the expected output is clear. The trade-off: the sub-agent has no way to ask for clarification. If the task is ambiguous, it will either hallucinate a plausible answer or return an empty result. The narrow prompt is a deliberate choice to keep sub-agents focused, but it means the parent must write unambiguous task descriptions. A task like "research the company" will fail silently; "find the founding year and current CEO of Stripe" will not.&lt;/p&gt;

&lt;p&gt;Notice what's structurally absent in AIgent's implementation: no parent history passed in, no access to tools outside the assigned set, no ability to spawn further sub-agents. This is a design choice — other frameworks (AutoGen, LangGraph) make different ones.&lt;/p&gt;

&lt;h3&gt;
  
  
  Preventing Recursive Spawning
&lt;/h3&gt;

&lt;p&gt;This design decision prevents one specific class of runaway agent bug — sub-agents spawning further sub-agents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/main/kotlin/AgentService.kt&lt;/span&gt;

&lt;span class="c1"&gt;// baseTools does NOT include SpawnAgentTool&lt;/span&gt;
&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;baseTools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;AgentTool&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scheduler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="n"&gt;githubDocs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt;
    &lt;span class="nf"&gt;listOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;CurrentTimeTool&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nc"&gt;GoogleSearchTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;// Only the parent agent gets spawn_agent&lt;/span&gt;
&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;AgentTool&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;baseTools&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;listOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;SpawnAgentTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;baseTools&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sub-agents receive &lt;code&gt;baseTools&lt;/code&gt;. &lt;code&gt;SpawnAgentTool&lt;/code&gt; is only in &lt;code&gt;tools&lt;/code&gt; — the parent's list. This is enforced structurally, not by trusting the model.&lt;/p&gt;

&lt;p&gt;What it does &lt;em&gt;not&lt;/em&gt; prevent: a parent that misinterprets a sub-agent's result and spawns it again in a loop, or a sub-agent whose own ReAct &lt;code&gt;while&lt;/code&gt; loop runs indefinitely. &lt;code&gt;SubAgentService.run()&lt;/code&gt; has no max-iteration guard — a model that keeps generating function calls will keep looping until it hits the token limit or the API times out. A production implementation should add an iteration counter and a hard cutoff.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dynamic Skill Injection
&lt;/h3&gt;

&lt;p&gt;Before each response, the agent injects relevant skills into its system prompt based on keyword matching:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/main/kotlin/AgentService.kt&lt;/span&gt;
&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;buildConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;GenerateContentConfig&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;relevantSkills&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;skillLoader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;selectRelevant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;allSkills&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;systemPrompt&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;buildString&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;soul&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relevantSkills&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isNotEmpty&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"\n\n# Active Skills\n"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;relevantSkills&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;skill&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"\n## ${skill.name}\n${skill.content}\n"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;GenerateContentConfig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;systemInstruction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromParts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;listOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;functionDeclarations&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;declaration&lt;/span&gt; &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Skills are markdown files with YAML frontmatter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;reminders&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Schedule and manage reminders, alerts, and time-based notifications&lt;/span&gt;
&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;schedule_reminder&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;list_reminders&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;get_current_time&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

Always call &lt;span class="sb"&gt;`get_current_time`&lt;/span&gt; first before scheduling a reminder...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Zero extra API calls, zero latency overhead. The brittle point is &lt;code&gt;selectRelevant&lt;/code&gt;: it matches on overlapping words between the input and the skill name/description. "Set an alert for Monday" won't match a skill named "reminders" unless "alert" or "Monday" appears in the skill's metadata.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Sub-agents solve three real problems: context that grows without bound, tools that dilute decision quality, and tasks that could run in parallel but don't. The trade-offs are real too: every sub-agent is an additional LLM call with its own latency and cost, coordination logic is hard to test, and a sub-agent that fails silently is worse than one that fails loudly.&lt;/p&gt;

&lt;p&gt;The patterns that matter most in practice: isolate tools per sub-task, prevent recursive spawning structurally, write unambiguous task descriptions, and handle errors explicitly at the boundary where sub-agents return results. The architecture is sound; the failure modes live in the implementation details.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/filip505/AIgent" rel="noopener noreferrer"&gt;AIgent on GitHub&lt;/a&gt; — The Kotlin agent that powers the case study in this article&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/2210.03629" rel="noopener noreferrer"&gt;ReAct: Synergizing Reasoning and Acting in Language Models&lt;/a&gt; — The original 2022 paper&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://gorilla.cs.berkeley.edu/" rel="noopener noreferrer"&gt;Gorilla LLM Benchmark&lt;/a&gt; — Empirical data on tool-use accuracy at scale&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://langchain-ai.github.io/langgraph/" rel="noopener noreferrer"&gt;LangGraph Documentation&lt;/a&gt; — Graph-based agent orchestration&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://microsoft.github.io/autogen/" rel="noopener noreferrer"&gt;AutoGen Multi-Agent Conversation Framework&lt;/a&gt; — Microsoft's role-based agent framework&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.anthropic.com/en/docs/agents" rel="noopener noreferrer"&gt;Anthropic Agent SDK&lt;/a&gt; — Native sub-agent support in Claude&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>architecture</category>
      <category>kotlin</category>
      <category>llm</category>
    </item>
    <item>
      <title>You Built a Kotlin AI Agent. Should You Migrate to Koog?</title>
      <dc:creator>Zoricic</dc:creator>
      <pubDate>Thu, 05 Mar 2026 16:18:13 +0000</pubDate>
      <link>https://dev.to/zoricic/you-built-a-kotlin-ai-agent-should-you-migrate-to-koog-3bdi</link>
      <guid>https://dev.to/zoricic/you-built-a-kotlin-ai-agent-should-you-migrate-to-koog-3bdi</guid>
      <description>&lt;h1&gt;
  
  
  JetBrains Koog vs Google GenAI SDK — Should You Switch?
&lt;/h1&gt;

&lt;p&gt;You've built a working AI agent with Google GenAI. Now JetBrains releases Koog, a Kotlin-first AI framework. Should you migrate? Probably not — here's why.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;See also: &lt;a href="https://dev.to/zrcic/sqlite-as-a-vector-database-yes-really-4cm4"&gt;SQLite Embeddings for Local Agents&lt;/a&gt; for the RAG implementation referenced in this article.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;When building AI agents in Kotlin, you have two main options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Google GenAI SDK&lt;/strong&gt; — Direct API wrapper, full control, manual tool loop&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JetBrains Koog&lt;/strong&gt; — Higher-level framework, Kotlin DSL, built-in agent patterns&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Both work. The question is: what do you gain and lose by switching?&lt;/p&gt;

&lt;h2&gt;
  
  
  What Koog Offers
&lt;/h2&gt;

&lt;p&gt;Koog is JetBrains' attempt to bring Kotlin-native ergonomics to AI agent development. It wraps multiple LLM providers (Gemini, OpenAI, Anthropic) behind a unified DSL.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cleaner Tool Definitions
&lt;/h3&gt;

&lt;p&gt;With Google GenAI, tool definitions are verbose:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;currentTimeDeclaration&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FunctionDeclaration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"get_current_time"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;description&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Get the current date and time"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Known&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OBJECT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;emptyMap&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;())&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Koog simplifies this with a DSL approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;getCurrentTime&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"get_current_time"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Get the current date and time"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;TimeResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LocalDateTime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;formatted&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LocalDateTime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Automatic Tool Execution Loop
&lt;/h3&gt;

&lt;p&gt;With Google GenAI, you write the loop yourself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;functionCalls&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;isNotEmpty&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;functionCalls&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;functionCalls&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;!!&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;responseParts&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mutableListOf&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;()&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fc&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;functionCalls&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;result&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;handleFunction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;name&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;fc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;args&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="n"&gt;responseParts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromFunctionResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;functionResponseContent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generateContent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Koog handles this internally — you just define tools and let the framework orchestrate.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-Provider Support
&lt;/h3&gt;

&lt;p&gt;Koog abstracts the LLM provider:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Conceptual example — verify against current Koog docs&lt;/span&gt;
&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;agent&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AIAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;executor&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;simpleGeminiExecutor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;toolRegistry&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;buildToolRegistry&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;getCurrentTime&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Switching from Gemini to OpenAI means swapping the executor, not rewriting tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Lose
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Fine-Grained Control
&lt;/h3&gt;

&lt;p&gt;Your manual loop lets you inject logic between iterations. For example, adding RAG context before each user message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;relevantChunks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;message&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relevantChunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isNotEmpty&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;context&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;relevantChunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;joinToString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"\n\n"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="s"&gt;"Context:\n$context\n\nUser: $input"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;input&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With Koog, you'd need to work within its abstraction — possible, but less direct.&lt;/p&gt;

&lt;h3&gt;
  
  
  Maturity
&lt;/h3&gt;

&lt;p&gt;Google GenAI SDK is battle-tested. Koog shipped in 2024 and is still evolving. Documentation gaps and edge cases are expected.&lt;/p&gt;

&lt;h3&gt;
  
  
  Simplicity
&lt;/h3&gt;

&lt;p&gt;Your current stack is lean. Adding Koog introduces another layer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Current:                With Koog:
┌──────────────────┐    ┌──────────────────┐
│    Your Code     │    │    Your Code     │
├──────────────────┤    ├──────────────────┤
│  Google GenAI    │    │      Koog        │
└──────────────────┘    ├──────────────────┤
                        │  Provider SDK    │
                        └──────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;More layers = more things that can break.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Related Decision: Structured Output
&lt;/h2&gt;

&lt;p&gt;While evaluating frameworks, another question often comes up: should you use &lt;strong&gt;structured output&lt;/strong&gt;?&lt;/p&gt;

&lt;p&gt;Structured output forces the LLM to respond in a specific JSON schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;config&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GenerateContentConfig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;responseMimeType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"application/json"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;responseSchema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Known&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OBJECT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;mapOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="s"&gt;"intent"&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Known&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;STRING&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="s"&gt;"confidence"&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Known&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;NUMBER&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Use it when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Extracting specific fields from user input&lt;/li&gt;
&lt;li&gt;Building APIs that need predictable JSON responses&lt;/li&gt;
&lt;li&gt;Downstream processing requires consistent format&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Skip it when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Building conversational bots (like Telegram)&lt;/li&gt;
&lt;li&gt;Free-form responses are the goal&lt;/li&gt;
&lt;li&gt;You want the model to format naturally&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a chat agent, free-form text is usually what you want.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Switch to Koog
&lt;/h2&gt;

&lt;p&gt;Consider Koog if:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Koog Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Supporting multiple LLM providers&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Adding 10+ tools with complex schemas&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Using advanced patterns (subagents, planning)&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Current code is becoming hard to maintain&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Just want cleaner syntax&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Don't switch just for syntactic sugar. Your 30-line tool loop isn't the bottleneck.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Recommendations
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Stay with Google GenAI if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your agent works and you have full control&lt;/li&gt;
&lt;li&gt;You're only using Gemini&lt;/li&gt;
&lt;li&gt;RAG or custom logic is tightly integrated into your flow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Consider Koog if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're starting a new project with multi-provider requirements&lt;/li&gt;
&lt;li&gt;You need complex agent patterns out of the box&lt;/li&gt;
&lt;li&gt;You value Kotlin DSL ergonomics over explicit control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Hybrid approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keep Google GenAI as the backend&lt;/li&gt;
&lt;li&gt;Extract your tool definitions into a more DSL-like pattern&lt;/li&gt;
&lt;li&gt;Get some ergonomic wins without a full framework migration&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;JetBrains Koog trades control for convenience. If you've already built a working agent with Google GenAI and manual tool orchestration, migration isn't worth it unless you need multi-provider support or advanced agent patterns. Your explicit loop gives you flexibility that abstractions hide. For new projects with complex requirements, Koog is worth evaluating — but for a functional Telegram bot with RAG, keep what works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/JetBrains/koog" rel="noopener noreferrer"&gt;JetBrains Koog GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://central.sonatype.com/artifact/com.google.genai/google-genai" rel="noopener noreferrer"&gt;Google GenAI SDK (Maven)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ai.google.dev/gemini-api/docs/function-calling" rel="noopener noreferrer"&gt;Gemini Function Calling Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ai.google.dev/gemini-api/docs/structured-output" rel="noopener noreferrer"&gt;Gemini Structured Output Docs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>kotlin</category>
      <category>tooling</category>
    </item>
    <item>
      <title>SQLite as a Vector Database — Yes, Really</title>
      <dc:creator>Zoricic</dc:creator>
      <pubDate>Wed, 04 Mar 2026 19:10:49 +0000</pubDate>
      <link>https://dev.to/zoricic/sqlite-as-a-vector-database-yes-really-4cm4</link>
      <guid>https://dev.to/zoricic/sqlite-as-a-vector-database-yes-really-4cm4</guid>
      <description>&lt;p&gt;Do you really need a vector database? For local AI agents, SQLite handles embeddings just fine — and it's already on your machine.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This article explores a practical approach for local-first AI tools. For context on MCP transports and how agents communicate, see &lt;a href="https://dev.to/zrcic/understanding-mcp-server-transports-stdio-sse-and-http-streamable-5b1p"&gt;Understanding MCP Server Transports&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Problem with "Production-Grade" Vector Databases
&lt;/h2&gt;

&lt;p&gt;When building AI agents that use embeddings (for RAG, semantic search, or memory), the default advice is: "Use a vector database like Pinecone, Weaviate, or Chroma."&lt;/p&gt;

&lt;p&gt;But for local agents — tools running on your machine, personal assistants, or development prototypes — this advice creates unnecessary complexity:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What you need:                    What you're told to set up:
┌─────────────────────┐          ┌─────────────────────────────┐
│ Store ~10K vectors  │          │ Docker container            │
│ Query occasionally  │    →     │ Separate database server    │
│ Works offline       │          │ API keys and auth           │
│ Easy to backup      │          │ Cloud dependencies          │
└─────────────────────┘          └─────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's a mismatch between the problem and the solution.&lt;/p&gt;

&lt;h2&gt;
  
  
  SQLite as an Embedding Store
&lt;/h2&gt;

&lt;p&gt;SQLite is a serverless, file-based database that ships with most operating systems. It turns out it's perfectly capable of storing and querying embeddings for local use cases.&lt;/p&gt;

&lt;h3&gt;
  
  
  Basic Approach: Store Embeddings as BLOBs
&lt;/h3&gt;

&lt;p&gt;The simplest approach stores embeddings as binary data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="nb"&gt;BLOB&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="n"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="nb"&gt;DATETIME&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.nio.ByteBuffer&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.nio.ByteOrder&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.sql.Connection&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;kotlinx.serialization.json.Json&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;kotlinx.serialization.encodeToString&lt;/span&gt;

&lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;storeEmbedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Connection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;FloatArray&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;?&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;embeddingBlob&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ByteBuffer&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;allocate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ByteOrder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LITTLE_ENDIAN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nf"&gt;putFloat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prepareStatement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"INSERT INTO embeddings (content, embedding, metadata) VALUES (?, ?, ?)"&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;stmt&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
        &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setBytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embeddingBlob&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;let&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nc"&gt;Json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encodeToString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;executeUpdate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;getEmbedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ByteArray&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;FloatArray&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;buffer&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ByteBuffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wrap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ByteOrder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LITTLE_ENDIAN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;FloatArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="p"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getFloat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Similarity Search: Brute Force Works
&lt;/h3&gt;

&lt;p&gt;For local agents with thousands (not millions) of vectors, brute-force cosine similarity is fast enough:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;data class&lt;/span&gt; &lt;span class="nc"&gt;SearchResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;?,&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;similarity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Double&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;searchSimilar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Connection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;queryEmbedding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;FloatArray&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;topK&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Int&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;SearchResult&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;results&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mutableListOf&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;SearchResult&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;()&lt;/span&gt;

    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prepareStatement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"SELECT id, content, embedding, metadata FROM embeddings"&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;stmt&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
        &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;executeQuery&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;rs&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
            &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;storedVec&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getEmbedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getBytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"embedding"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
                &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;similarity&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cosineSimilarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;queryEmbedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;storedVec&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="nc"&gt;SearchResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                        &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                        &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"metadata"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                        &lt;span class="n"&gt;similarity&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;similarity&lt;/span&gt;
                    &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sortedByDescending&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;similarity&lt;/span&gt; &lt;span class="p"&gt;}.&lt;/span&gt;&lt;span class="nf"&gt;take&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topK&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;cosineSimilarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;FloatArray&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;FloatArray&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;Double&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;dot&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toList&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;sumOf&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toDouble&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;normA&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;kotlin&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sumOf&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;it&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toDouble&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;normB&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;kotlin&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sumOf&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;it&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toDouble&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;dot&lt;/span&gt; &lt;span class="p"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;normA&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;normB&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Performance Reality Check
&lt;/h3&gt;

&lt;p&gt;How fast is brute-force search? Faster than you'd expect:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Vectors&lt;/th&gt;
&lt;th&gt;Dimensions&lt;/th&gt;
&lt;th&gt;Search Time&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1,000&lt;/td&gt;
&lt;td&gt;1536&lt;/td&gt;
&lt;td&gt;~5ms&lt;/td&gt;
&lt;td&gt;~6 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10,000&lt;/td&gt;
&lt;td&gt;1536&lt;/td&gt;
&lt;td&gt;~50ms&lt;/td&gt;
&lt;td&gt;~60 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100,000&lt;/td&gt;
&lt;td&gt;1536&lt;/td&gt;
&lt;td&gt;~500ms&lt;/td&gt;
&lt;td&gt;~600 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For a local agent querying a personal knowledge base, 50ms is imperceptible. You don't need approximate nearest neighbors (ANN) algorithms until you're well past 100K vectors.&lt;/p&gt;

&lt;h2&gt;
  
  
  SQLite-VSS: When You Need More Speed
&lt;/h2&gt;

&lt;p&gt;For larger datasets, &lt;strong&gt;sqlite-vss&lt;/strong&gt; adds vector search capabilities to SQLite using Facebook's FAISS library:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Load the extension&lt;/span&gt;
&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;load&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;vector0&lt;/span&gt;
&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;load&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;vss0&lt;/span&gt;

&lt;span class="c1"&gt;-- Create a virtual table for vector search&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;VIRTUAL&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;vss_embeddings&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;vss0&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;-- OpenAI embedding dimensions&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Insert vectors&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;vss_embeddings&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rowid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Search with approximate nearest neighbors&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;rowid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;vss_embeddings&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;vss_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you sub-millisecond searches even at 1M+ vectors, while keeping everything in a single SQLite file.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Example: Local-First AI Tools
&lt;/h2&gt;

&lt;p&gt;Many local-first AI tools use SQLite for codebase indexing and embeddings storage. The pattern is common — a single database file holds everything the agent needs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌───────────────────────────────────────────────────────┐
│                    Local AI Agent                     │
├───────────────────────────────────────────────────────┤
│                                                       │
│  ┌─────────────┐   ┌─────────────┐   ┌────────────┐  │
│  │  Embeddings │   │    Chat     │   │   Config   │  │
│  │  (vectors)  │   │   History   │   │  &amp;amp; State   │  │
│  └──────┬──────┘   └──────┬──────┘   └─────┬──────┘  │
│         │                 │                │         │
│         └─────────────────┼────────────────┘         │
│                           │                          │
│                  ┌────────▼────────┐                 │
│                  │    SQLite DB    │                 │
│                  │  (single file)  │                 │
│                  └─────────────────┘                 │
│                           │                          │
│                  ~/.agent/data.db                    │
└───────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything lives in one file. Backup? Copy the file. Reset? Delete it. Move to another machine? Take it with you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Advantages of SQLite for Embeddings
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Zero Infrastructure
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Pinecone setup:&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;pinecone-client
&lt;span class="c"&gt;# Create account, get API key, configure environment...&lt;/span&gt;

&lt;span class="c"&gt;# SQLite setup:&lt;/span&gt;
&lt;span class="c"&gt;# Nothing. It's already there.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No servers, no Docker, no accounts, no API keys. Your agent works offline, on airplanes, and in restricted networks.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Single-File Simplicity
&lt;/h3&gt;

&lt;p&gt;Your entire knowledge base is one &lt;code&gt;.db&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/.my-agent/
├── config.json
└── knowledge.db    ← Everything: embeddings, chat history, metadata
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes debugging trivial. You can open the database with any SQLite client and inspect exactly what's stored.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Atomic Transactions
&lt;/h3&gt;

&lt;p&gt;SQLite gives you ACID transactions for free:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;updateDocument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Connection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;docId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;newContent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;newEmbedding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;FloatArray&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;autoCommit&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prepareStatement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"DELETE FROM embeddings WHERE doc_id = ?"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;stmt&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
            &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;docId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;executeUpdate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="nf"&gt;storeEmbedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;newContent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;newEmbedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;mapOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"doc_id"&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;docId&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rollback&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;autoCommit&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="c1"&gt;// Either both succeed or both fail&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No partial updates, no orphaned embeddings, no consistency issues.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Rich Querying
&lt;/h3&gt;

&lt;p&gt;Combine vector search with SQL's full power. Pre-filter with SQL, then compute similarity in your application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;searchRecentNonArchived&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Connection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;queryEmbedding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;FloatArray&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;topK&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Int&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;SearchResult&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;candidates&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mutableListOf&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;SearchResult&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;// SQL handles filtering — only fetch what matches&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prepareStatement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"""
        SELECT id, content, embedding, metadata
        FROM embeddings
        WHERE created_at &amp;gt; datetime('now', '-7 days')
          AND json_extract(metadata, '$.archived') != true
    """&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;stmt&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
        &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;executeQuery&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;rs&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
            &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;storedVec&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getEmbedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getBytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"embedding"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
                &lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="nc"&gt;SearchResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                        &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                        &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"metadata"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                        &lt;span class="n"&gt;similarity&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cosineSimilarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;queryEmbedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;storedVec&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sortedByDescending&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;similarity&lt;/span&gt; &lt;span class="p"&gt;}.&lt;/span&gt;&lt;span class="nf"&gt;take&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topK&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Try doing that with a pure vector database — most don't support SQL-level filtering before vector search.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Portable and Debuggable
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Inspect your embeddings&lt;/span&gt;
sqlite3 knowledge.db &lt;span class="s2"&gt;"SELECT COUNT(*) FROM embeddings"&lt;/span&gt;

&lt;span class="c"&gt;# Backup before experimenting&lt;/span&gt;
&lt;span class="nb"&gt;cp &lt;/span&gt;knowledge.db knowledge.db.backup

&lt;span class="c"&gt;# Share with a colleague&lt;/span&gt;
scp knowledge.db user@other-machine:~/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Disadvantages and Limitations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. No Native Vector Operations
&lt;/h3&gt;

&lt;p&gt;SQLite doesn't understand vectors natively. You're either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Computing similarity in application code (fine for &amp;lt;100K vectors)&lt;/li&gt;
&lt;li&gt;Using sqlite-vss extension (adds complexity)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="c1"&gt;// This works, but it's not "native"&lt;/span&gt;
&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;similarities&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
    &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="nf"&gt;cosineSimilarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;getEmbedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Memory Constraints
&lt;/h3&gt;

&lt;p&gt;Brute-force search loads all vectors into memory. At 1M vectors with 1536 dimensions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1,000,000 × 1,536 × 4 bytes = ~6 GB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's too much for most laptops. You'll need sqlite-vss or a different approach.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. No Built-in Sharding
&lt;/h3&gt;

&lt;p&gt;SQLite is single-file by design. If you need to scale across machines, you're on your own:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SQLite:
┌─────────┐
│ One DB  │  ← Can't split across servers
└─────────┘

Distributed vector DB:
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Shard 1 │ │ Shard 2 │ │ Shard 3 │  ← Built-in distribution
└─────────┘ └─────────┘ └─────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Write Concurrency
&lt;/h3&gt;

&lt;p&gt;SQLite uses file-level locking. Multiple processes writing simultaneously will block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Process A: Writing embeddings...&lt;/span&gt;
&lt;span class="c1"&gt;// Process B: Waiting for lock...&lt;/span&gt;
&lt;span class="c1"&gt;// Process C: Waiting for lock...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For single-user local agents, this is rarely a problem. For multi-user services, it's a showstopper.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. No Managed Updates or Filtering Indexes
&lt;/h3&gt;

&lt;p&gt;Dedicated vector databases offer features like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatic re-indexing when embeddings change&lt;/li&gt;
&lt;li&gt;Pre-filtering indexes for metadata queries&lt;/li&gt;
&lt;li&gt;Quantization for memory efficiency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With SQLite, you implement these yourself or go without.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Use SQLite for Embeddings
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;SQLite?&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Personal knowledge base&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Thousands of docs, single user&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local coding assistant&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Codebase fits in memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Development prototype&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fast iteration, no setup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chat memory for agent&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Small dataset, needs persistence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production RAG service&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;No&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Need scale, concurrency, managed ops&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-tenant application&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;No&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Need isolation, distribution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real-time search at scale&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;No&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Need millisecond latency at 10M+ vectors&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  When NOT to Use SQLite
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Don't use SQLite when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have more than ~500K embeddings&lt;/li&gt;
&lt;li&gt;Multiple services need to write concurrently&lt;/li&gt;
&lt;li&gt;You need sub-10ms search latency at scale&lt;/li&gt;
&lt;li&gt;Your application is multi-tenant&lt;/li&gt;
&lt;li&gt;You're building a service that needs to scale horizontally&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In these cases, use a proper vector database: Pinecone for managed simplicity, Weaviate for open-source flexibility, or pgvector if you're already on PostgreSQL.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Implementation Tips
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Index Your Metadata
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_metadata_type&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json_extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'$.type'&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_created_at&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pre-filter with SQL before computing similarities.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Batch Your Inserts
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;storeEmbeddingsBatch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Connection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Triple&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;FloatArray&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;?&amp;gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prepareStatement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"INSERT INTO embeddings (content, embedding, metadata) VALUES (?, ?, ?)"&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;stmt&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
        &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
            &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;blob&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ByteBuffer&lt;/span&gt;
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;allocate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ByteOrder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LITTLE_ENDIAN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nf"&gt;putFloat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setBytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;let&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nc"&gt;Json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encodeToString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addBatch&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;executeBatch&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Use WAL Mode for Better Concurrency
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;conn&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DriverManager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getConnection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"jdbc:sqlite:knowledge.db"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;createStatement&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;stmt&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
        &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"PRAGMA journal_mode=WAL"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;// Write-ahead logging&lt;/span&gt;
        &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"PRAGMA synchronous=NORMAL"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// Faster writes, still safe&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Consider Dimensionality Reduction
&lt;/h3&gt;

&lt;p&gt;If memory is tight, reduce embedding dimensions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Using Smile library for dimensionality reduction&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;smile.projection.PCA&lt;/span&gt;

&lt;span class="c1"&gt;// Reduce 1536 → 512 dimensions (loses some accuracy)&lt;/span&gt;
&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;pca&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PCA&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;originalEmbeddings&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;getProjection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;reducedEmbeddings&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;originalEmbeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;pca&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;project&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;SQLite is a legitimate choice for embedding storage in local AI agents. It's not a hack or a compromise — it's the right tool for single-user, offline-capable, moderate-scale use cases.&lt;/p&gt;

&lt;p&gt;The key insight: &lt;strong&gt;most local agents don't need distributed vector databases&lt;/strong&gt;. They need simple, reliable, file-based storage that works without infrastructure. SQLite delivers exactly that.&lt;/p&gt;

&lt;p&gt;Start with plain SQLite and brute-force search. If you outgrow it (you probably won't for local use), add sqlite-vss. Only reach for Pinecone or Weaviate when you're building a service, not a tool.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.sqlite.org/docs.html" rel="noopener noreferrer"&gt;SQLite Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/asg017/sqlite-vss" rel="noopener noreferrer"&gt;sqlite-vss&lt;/a&gt; — Vector similarity search for SQLite&lt;/li&gt;
&lt;li&gt;&lt;a href="https://platform.openai.com/docs/guides/embeddings" rel="noopener noreferrer"&gt;OpenAI Embeddings Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/facebookresearch/faiss" rel="noopener noreferrer"&gt;FAISS&lt;/a&gt; — The library behind sqlite-vss&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/pgvector/pgvector" rel="noopener noreferrer"&gt;pgvector&lt;/a&gt; — PostgreSQL alternative if you need more power&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.trychroma.com/" rel="noopener noreferrer"&gt;Chroma&lt;/a&gt; — Embedded vector database (SQLite under the hood!)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>database</category>
      <category>rag</category>
      <category>sql</category>
    </item>
    <item>
      <title>Understanding MCP Server Transports: STDIO, SSE, and HTTP Streamable</title>
      <dc:creator>Zoricic</dc:creator>
      <pubDate>Tue, 03 Mar 2026 21:10:12 +0000</pubDate>
      <link>https://dev.to/zoricic/understanding-mcp-server-transports-stdio-sse-and-http-streamable-5b1p</link>
      <guid>https://dev.to/zoricic/understanding-mcp-server-transports-stdio-sse-and-http-streamable-5b1p</guid>
      <description>&lt;p&gt;If you've been exploring Claude Code or building AI tools, you've probably heard about MCP (Model Context Protocol). But when it comes to setting up your own MCP server, one question often comes up: &lt;strong&gt;which transport should I use?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In this article, I'll break down the three main MCP transports—STDIO, SSE, and HTTP Streamable—explain when each was introduced, what problems they solve, and help you choose the right one for your project.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is MCP?
&lt;/h2&gt;

&lt;p&gt;MCP (Model Context Protocol) is an open protocol that lets AI assistants like Claude connect to external tools and data sources. Think of it as a standardized way for AI to "talk" to your code.&lt;/p&gt;

&lt;p&gt;An MCP server exposes &lt;strong&gt;tools&lt;/strong&gt; (functions the AI can call) and &lt;strong&gt;resources&lt;/strong&gt; (data the AI can access). The &lt;strong&gt;transport&lt;/strong&gt; is simply how the client (like Claude Code) communicates with your server.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three Transports at a Glance
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Transport&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Network&lt;/th&gt;
&lt;th&gt;Complexity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;STDIO&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Local tools, scripts&lt;/td&gt;
&lt;td&gt;None (subprocess)&lt;/td&gt;
&lt;td&gt;Simple&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SSE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Web environments, legacy&lt;/td&gt;
&lt;td&gt;HTTP&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;HTTP Streamable&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Production, remote servers&lt;/td&gt;
&lt;td&gt;HTTP&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Let's dive into each one.&lt;/p&gt;




&lt;h2&gt;
  
  
  Transport 1: STDIO (Standard Input/Output)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is it?
&lt;/h3&gt;

&lt;p&gt;STDIO is the simplest transport. Your MCP server runs as a &lt;strong&gt;subprocess&lt;/strong&gt;, and communication happens through stdin (input) and stdout (output)—the same streams you use when piping commands in a terminal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code  --stdin--&amp;gt;  Your Server
             &amp;lt;-stdout--
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  When was it introduced?
&lt;/h3&gt;

&lt;p&gt;STDIO has been part of MCP since the protocol's initial release in &lt;strong&gt;November 2024&lt;/strong&gt;. It's the original transport for local tools and remains the most common choice for Claude Code integrations.&lt;/p&gt;

&lt;h3&gt;
  
  
  What problems does it solve?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No network setup&lt;/strong&gt;: No ports, no HTTP, no CORS issues&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simple deployment&lt;/strong&gt;: Just run a script&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt;: Runs locally with your permissions&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When to use STDIO
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Building local tools for Claude Code&lt;/li&gt;
&lt;li&gt;Quick prototypes and experiments&lt;/li&gt;
&lt;li&gt;Tools that access local files or system resources&lt;/li&gt;
&lt;li&gt;When you don't need network access&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Code Example
&lt;/h3&gt;

&lt;p&gt;Here's a minimal STDIO MCP server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="cp"&gt;#!/usr/bin/env node
&lt;/span&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Server&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@modelcontextprotocol/sdk/server/index.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;StdioServerTransport&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@modelcontextprotocol/sdk/server/stdio.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;CallToolRequestSchema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ListToolsRequestSchema&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@modelcontextprotocol/sdk/types.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;my-mcp-server&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;1.0.0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Define tools&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;greet&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Greet someone by name&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Name to greet&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;name&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="c1"&gt;// Handle tool listing&lt;/span&gt;
&lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setRequestHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ListToolsRequestSchema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;tools&lt;/span&gt; &lt;span class="p"&gt;}));&lt;/span&gt;

&lt;span class="c1"&gt;// Handle tool calls&lt;/span&gt;
&lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setRequestHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;CallToolRequestSchema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;greet&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Hello, &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;!`&lt;/span&gt; &lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Unknown tool&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt; &lt;span class="na"&gt;isError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Start server&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;transport&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;StdioServerTransport&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;transport&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Configuration for Claude Code
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"my-server"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"node"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/path/to/server.js"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pros and Cons
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zero network configuration&lt;/li&gt;
&lt;li&gt;Simple to implement and debug&lt;/li&gt;
&lt;li&gt;Secure by default (local only)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Local only—can't share with others&lt;/li&gt;
&lt;li&gt;One client per server instance&lt;/li&gt;
&lt;li&gt;Requires the server script on the local machine&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Transport 2: SSE (Server-Sent Events)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is it?
&lt;/h3&gt;

&lt;p&gt;SSE transport uses HTTP with Server-Sent Events for real-time communication:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Client → Server&lt;/strong&gt;: HTTP POST requests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Server → Client&lt;/strong&gt;: SSE event stream (long-lived connection)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client  --POST /messages--&amp;gt;  Server
        &amp;lt;----SSE stream----
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  When was it introduced?
&lt;/h3&gt;

&lt;p&gt;SSE transport was introduced in the MCP specification &lt;strong&gt;2024-11-05&lt;/strong&gt; as the first HTTP-based transport, enabling web-based integrations and remote access. However, it has since been &lt;strong&gt;deprecated&lt;/strong&gt; in favor of HTTP Streamable as of specification version &lt;strong&gt;2025-03-26&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  What problems does it solve?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Real-time updates&lt;/strong&gt;: Server can push messages without polling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTTP-friendly&lt;/strong&gt;: Works through firewalls and proxies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web compatibility&lt;/strong&gt;: Works in browsers&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When to use SSE
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Legacy systems that already use SSE&lt;/li&gt;
&lt;li&gt;When you specifically need the SSE pattern&lt;/li&gt;
&lt;li&gt;Web applications with existing SSE infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: For new projects, consider HTTP Streamable instead—it's the modern recommended approach.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="cp"&gt;#!/usr/bin/env node
&lt;/span&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;McpServer&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@modelcontextprotocol/sdk/server/mcp.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;SSEServerTransport&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@modelcontextprotocol/sdk/server/sse.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;express&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;express&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;express&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;transports&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;createServer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;McpServer&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;my-mcp-server-sse&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;1.0.0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;greet&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Greet someone&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Name to greet&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Hello, &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;!`&lt;/span&gt; &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;}));&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// SSE endpoint - client connects here&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/sse&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;transport&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SSEServerTransport&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/messages&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createServer&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="nx"&gt;transports&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;transport&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;transport&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;close&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;transports&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;transport&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;transport&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Messages endpoint - client sends requests here&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/messages&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sessionId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;transports&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;transport&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;handlePostMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;No session&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3002&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;SSE server on port 3002&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pros and Cons
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Real-time server push&lt;/li&gt;
&lt;li&gt;Works over HTTP (firewall-friendly)&lt;/li&gt;
&lt;li&gt;Good browser support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More complex than STDIO&lt;/li&gt;
&lt;li&gt;Requires maintaining long-lived connections&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deprecated&lt;/strong&gt; in favor of HTTP Streamable&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Transport 3: HTTP Streamable (Recommended)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is it?
&lt;/h3&gt;

&lt;p&gt;HTTP Streamable is the modern, recommended transport for remote MCP servers. It uses standard HTTP requests with optional streaming:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Client → Server&lt;/strong&gt;: HTTP POST requests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Server → Client&lt;/strong&gt;: HTTP responses (can be streamed)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client  --POST /mcp--&amp;gt;  Server
        &amp;lt;--Response---
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  When was it introduced?
&lt;/h3&gt;

&lt;p&gt;HTTP Streamable was introduced in the MCP specification version &lt;strong&gt;2025-03-26&lt;/strong&gt; as the recommended approach for remote servers. It replaces SSE as the primary HTTP transport and represents the evolution of MCP's HTTP transport capabilities.&lt;/p&gt;

&lt;h3&gt;
  
  
  What problems does it solve?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Scalability: Stateless architecture for horizontal scaling&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Unlike SSE which requires maintaining persistent connections, HTTP Streamable can operate in a fully stateless manner. This means you can deploy multiple instances of your MCP server behind a load balancer, and any instance can handle any request. When traffic spikes, simply spin up more server instances—no need to worry about connection affinity or sticky sessions.&lt;/p&gt;

&lt;p&gt;In practice, this looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                    ┌─ MCP Server Instance 1
Client ─► Load ─────┼─ MCP Server Instance 2
          Balancer  └─ MCP Server Instance 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each request is independent, making auto-scaling straightforward with Kubernetes, AWS ECS, or any container orchestration platform.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Authentication: Native HTTP auth support&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;HTTP Streamable uses standard HTTP, which means you get the entire HTTP authentication ecosystem for free:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bearer tokens&lt;/strong&gt;: Pass JWT tokens in the &lt;code&gt;Authorization: Bearer &amp;lt;token&amp;gt;&lt;/code&gt; header&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API keys&lt;/strong&gt;: Use custom headers like &lt;code&gt;X-API-Key&lt;/code&gt; for simple integrations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OAuth 2.0&lt;/strong&gt;: Implement full OAuth flows for enterprise applications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;mTLS&lt;/strong&gt;: Use client certificates for machine-to-machine authentication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a significant advantage over STDIO (which has no authentication) and SSE (where auth can be awkward to implement). Your existing auth middleware—whether it's Passport.js, Auth0, or a custom solution—works out of the box.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-client: One server, many users&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With HTTP Streamable, a single server deployment can handle requests from hundreds or thousands of different clients simultaneously. Each request carries its own session context, so there's no confusion about which client is which.&lt;/p&gt;

&lt;p&gt;Consider a team scenario:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Developer A uses the MCP server from their laptop&lt;/li&gt;
&lt;li&gt;Developer B uses it from their desktop&lt;/li&gt;
&lt;li&gt;A CI/CD pipeline calls it for automated tasks&lt;/li&gt;
&lt;li&gt;A web dashboard makes calls for analytics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All these clients hit the same server endpoint, each with their own authentication credentials and session state. The server doesn't need to maintain separate long-lived connections for each client.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure: Works with your existing HTTP stack&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;HTTP Streamable doesn't require special infrastructure—it's just HTTP. This means you can leverage all the battle-tested tools you already know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Load balancers&lt;/strong&gt;: AWS ALB, nginx, HAProxy all work without special configuration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API gateways&lt;/strong&gt;: Kong, AWS API Gateway, or Cloudflare can add rate limiting, caching, and monitoring&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reverse proxies&lt;/strong&gt;: Standard nginx or Traefik configs apply&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CDNs&lt;/strong&gt;: Edge caching for read-heavy operations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring&lt;/strong&gt;: Prometheus, Datadog, New Relic—any HTTP metrics tool works&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logging&lt;/strong&gt;: Standard access logs capture every request&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don't need to learn new tools or convince your ops team to adopt unfamiliar technology. HTTP Streamable fits into existing infrastructure patterns.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to use HTTP Streamable
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Production deployments&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When your MCP server moves from your laptop to serving real users, HTTP Streamable is the right choice. It provides the reliability, observability, and operational patterns that production systems require. You can deploy with confidence knowing that standard deployment practices—blue-green deployments, canary releases, health checks—all work as expected.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Remote servers (not on the same machine as the client)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If your MCP server needs to run somewhere other than the user's local machine, HTTP Streamable is designed for this. Whether your server lives in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A cloud VM (AWS EC2, Google Compute, DigitalOcean)&lt;/li&gt;
&lt;li&gt;A container platform (Kubernetes, ECS, Cloud Run)&lt;/li&gt;
&lt;li&gt;A serverless function (with some adaptations)&lt;/li&gt;
&lt;li&gt;A different machine on your local network&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;HTTP provides the network transport you need. Unlike STDIO which requires the server to be a local subprocess, HTTP Streamable works across any network boundary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When you need authentication&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you need to answer "who is making this request?", use HTTP Streamable. Common scenarios include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Paid APIs&lt;/strong&gt;: Verify users have valid subscriptions before processing requests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise deployments&lt;/strong&gt;: Integrate with corporate SSO (SAML, OIDC)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate limiting per user&lt;/strong&gt;: Track and limit usage by authenticated identity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit logging&lt;/strong&gt;: Record who did what for compliance requirements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feature flags&lt;/strong&gt;: Enable different capabilities for different user tiers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Multi-user applications&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Building an MCP server that multiple people will use? HTTP Streamable handles this naturally. Each user authenticates independently, and the server can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maintain per-user quotas and limits&lt;/li&gt;
&lt;li&gt;Store user-specific preferences or context&lt;/li&gt;
&lt;li&gt;Isolate data between users for security&lt;/li&gt;
&lt;li&gt;Scale to handle concurrent users without connection management headaches&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cloud-hosted MCP servers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you're deploying to any cloud platform, HTTP Streamable is the natural fit. Cloud providers are built around HTTP:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Serverless&lt;/strong&gt;: AWS Lambda, Google Cloud Functions, and Azure Functions all trigger on HTTP requests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Container services&lt;/strong&gt;: Cloud Run, App Runner, and Fargate expect HTTP workloads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PaaS platforms&lt;/strong&gt;: Heroku, Railway, and Render deploy HTTP apps seamlessly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You'll also benefit from cloud-native features like automatic HTTPS, DDoS protection, and global distribution—all without extra configuration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="cp"&gt;#!/usr/bin/env node
&lt;/span&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;McpServer&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@modelcontextprotocol/sdk/server/mcp.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;StreamableHTTPServerTransport&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@modelcontextprotocol/sdk/server/streamableHttp.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;express&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;crypto&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;express&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;express&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sessions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;createServer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;McpServer&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;my-mcp-server-http&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;1.0.0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;greet&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Greet someone&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Name to greet&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Hello, &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;!`&lt;/span&gt; &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;}));&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleMcp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sessionId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;mcp-session-id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;sessions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;session&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;method&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;transport&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;StreamableHTTPServerTransport&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;sessionIdGenerator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randomUUID&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="na"&gt;onsessioninitialized&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;sessions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;transport&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createServer&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nx"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;transport&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;transport&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;transport&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;handleRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;No session&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/mcp&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;handleMcp&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/mcp&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;handleMcp&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/mcp&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;sessions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;mcp-session-id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;end&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3001&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;HTTP server on port 3001&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Configuration for Claude Code
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"my-server-http"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:3001/mcp"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pros and Cons
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Production-ready&lt;/li&gt;
&lt;li&gt;Supports authentication&lt;/li&gt;
&lt;li&gt;Scalable (can be stateless)&lt;/li&gt;
&lt;li&gt;Works with standard HTTP infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More setup than STDIO&lt;/li&gt;
&lt;li&gt;Requires network access&lt;/li&gt;
&lt;li&gt;Need to handle session management&lt;/li&gt;
&lt;/ul&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Want to understand the technical differences between SSE and HTTP Streamable?&lt;/strong&gt; Read the companion article: &lt;a href="https://dev.to/zrcic/deep-dive-sse-vs-http-streamable-whats-the-difference-1829"&gt;Deep Dive: SSE vs HTTP Streamable — What's the Difference?&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Comparison: Which Transport Should You Choose?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;STDIO&lt;/th&gt;
&lt;th&gt;SSE&lt;/th&gt;
&lt;th&gt;HTTP Streamable&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Building a local tool?&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Need remote access?&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Need authentication?&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Possible&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multiple clients?&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production deployment?&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Possible&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Simplest setup?&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Decision Flowchart
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Is this for local use only?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Yes → Use &lt;strong&gt;STDIO&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;No → Continue...&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Do you need production features (auth, scaling)?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Yes → Use &lt;strong&gt;HTTP Streamable&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;No → Continue...&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Do you have existing SSE infrastructure you must integrate with?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Yes → Use &lt;strong&gt;SSE&lt;/strong&gt; (but plan to migrate)&lt;/li&gt;
&lt;li&gt;No → Use &lt;strong&gt;HTTP Streamable&lt;/strong&gt; (it's the modern default)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Getting Started with Templates
&lt;/h2&gt;

&lt;p&gt;I've created a template repository with all three transports: &lt;a href="https://github.com/filip505/my-mcp-server" rel="noopener noreferrer"&gt;my-mcp-server&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my-mcp-server/
├── stdio/     # STDIO transport (local tools)
├── sse/       # SSE transport (Server-Sent Events)
└── http/      # HTTP Streamable transport (recommended for remote)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Quick Start
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone the repo&lt;/span&gt;
git clone https://github.com/fzoricic/my-mcp-server.git
&lt;span class="nb"&gt;cd &lt;/span&gt;my-mcp-server

&lt;span class="c"&gt;# Try STDIO (local)&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;stdio &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm start

&lt;span class="c"&gt;# Or try HTTP Streamable (remote)&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;http &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each template includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Working server with a &lt;code&gt;greet&lt;/code&gt; tool&lt;/li&gt;
&lt;li&gt;Clear comments explaining each part&lt;/li&gt;
&lt;li&gt;README with setup instructions&lt;/li&gt;
&lt;li&gt;Instructions for adding your own tools&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;MCP transports might seem confusing at first, but the choice is usually straightforward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Local tools?&lt;/strong&gt; → STDIO&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remote/production?&lt;/strong&gt; → HTTP Streamable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Legacy SSE system?&lt;/strong&gt; → SSE (but plan migration)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Start with STDIO for experimentation—it's the simplest. When you're ready to deploy remotely or need authentication, upgrade to HTTP Streamable.&lt;/p&gt;

&lt;p&gt;Happy building!&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://modelcontextprotocol.io" rel="noopener noreferrer"&gt;MCP Official Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://modelcontextprotocol.io/docs/learn/architecture" rel="noopener noreferrer"&gt;MCP Architecture Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://modelcontextprotocol.io/docs/changelog" rel="noopener noreferrer"&gt;MCP Specification Changelog&lt;/a&gt; — Track transport changes across versions&lt;/li&gt;
&lt;li&gt;&lt;a href="https://modelcontextprotocol.io/docs/develop/build-server" rel="noopener noreferrer"&gt;Building MCP Servers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.npmjs.com/package/@modelcontextprotocol/sdk" rel="noopener noreferrer"&gt;MCP SDK on npm&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>Deep Dive: SSE vs HTTP Streamable — What's the Difference?</title>
      <dc:creator>Zoricic</dc:creator>
      <pubDate>Tue, 03 Mar 2026 21:08:41 +0000</pubDate>
      <link>https://dev.to/zoricic/deep-dive-sse-vs-http-streamable-whats-the-difference-1829</link>
      <guid>https://dev.to/zoricic/deep-dive-sse-vs-http-streamable-whats-the-difference-1829</guid>
      <description>&lt;p&gt;You might be wondering: "Both SSE and HTTP Streamable work over HTTP, so why can't SSE do everything HTTP Streamable does?" Great question. The answer lies in how they manage connections and state.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This is a companion article to &lt;a href="https://dev.to/zrcic/understanding-mcp-server-transports-stdio-sse-and-http-streamable-5b1p"&gt;Understanding MCP Server Transports: STDIO, SSE, and HTTP Streamable&lt;/a&gt;. If you're new to MCP transports, start there first.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  How SSE Works Under the Hood
&lt;/h2&gt;

&lt;p&gt;SSE uses a &lt;strong&gt;persistent, long-lived connection&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Client opens connection    GET /sse HTTP/1.1
                              Accept: text/event-stream

2. Server keeps it open       HTTP/1.1 200 OK
                              Content-Type: text/event-stream

                              data: {"event": "connected"}

                              data: {"event": "update"}

                              ... connection stays open ...

3. Client sends requests      POST /messages?sessionId=abc123
   on separate channel        {"method": "callTool", ...}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Connection-based&lt;/strong&gt;: Server maintains an open connection for each client&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stateful by nature&lt;/strong&gt;: Server must track each connection in memory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Two channels&lt;/strong&gt;: SSE stream for server→client, POST requests for client→server&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session affinity required&lt;/strong&gt;: The client's POST requests must reach the same server instance holding their SSE connection&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How HTTP Streamable Works Under the Hood
&lt;/h2&gt;

&lt;p&gt;HTTP Streamable uses &lt;strong&gt;standard request/response with optional streaming&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Client sends request       POST /mcp HTTP/1.1
                              Content-Type: application/json
                              Mcp-Session-Id: abc123

                              {"method": "callTool", ...}

2. Server responds            HTTP/1.1 200 OK
   (can stream if needed)     Content-Type: application/json

                              {"result": ...}

3. Connection closes          (or streams multiple responses)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Request/response based&lt;/strong&gt;: Each interaction is a complete HTTP request&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stateless capable&lt;/strong&gt;: Server doesn't need to hold connections open&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single channel&lt;/strong&gt;: Everything goes through the same endpoint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No affinity required&lt;/strong&gt;: Any server instance can handle any request&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why SSE Can't Match HTTP Streamable's Capabilities
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. The Persistent Connection Problem
&lt;/h3&gt;

&lt;p&gt;SSE requires the server to maintain an open connection for every client. This creates real problems at scale:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SSE with 1,000 clients:
┌─────────────────────────────────────┐
│           SSE Server                │
│  ┌─────┐ ┌─────┐ ┌─────┐           │
│  │Conn1│ │Conn2│ │...  │ │Conn1000││
│  └─────┘ └─────┘ └─────┘           │
│  Memory: ~1KB × 1000 = 1MB+        │
│  File descriptors: 1000            │
│  Cannot simply restart server      │
└─────────────────────────────────────┘

HTTP Streamable with 1,000 clients:
┌─────────────────────────────────────┐
│      HTTP Streamable Server         │
│                                     │
│  No persistent connections          │
│  Memory: only during requests       │
│  Can restart/deploy anytime         │
└─────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With SSE, if your server restarts (for a deployment, crash, or scaling event), all 1,000 clients lose their connections and must reconnect. With HTTP Streamable, clients just send their next request — they don't even notice.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The Load Balancing Problem
&lt;/h3&gt;

&lt;p&gt;With SSE, you need "sticky sessions" — the load balancer must route each client to the same server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SSE Load Balancing (Complex):
                         ┌─ Server A (holds Client 1's connection)
Client 1 ─► Load ────────┤
            Balancer     │  Must remember: Client 1 → Server A
            (sticky)     │                 Client 2 → Server B
Client 2 ─►─────────────┼─ Server B (holds Client 2's connection)

If Server A dies, Client 1 loses connection and must reconnect!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HTTP Streamable Load Balancing (Simple):
                         ┌─ Server A
Client 1 ─► Load ────────┤
            Balancer     │  Any request → Any server
            (stateless)  │
Client 2 ─►─────────────┼─ Server B

If Server A dies, next request just goes to Server B!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. The Authentication Problem
&lt;/h3&gt;

&lt;p&gt;SSE authentication is awkward because the EventSource API (used in browsers) doesn't support custom headers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// SSE: Can't set Authorization header easily&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;eventSource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;EventSource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/sse&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// No header support!&lt;/span&gt;

&lt;span class="c1"&gt;// Workarounds are hacky:&lt;/span&gt;
&lt;span class="c1"&gt;// - Pass token in URL: /sse?token=xyz (visible in logs, insecure)&lt;/span&gt;
&lt;span class="c1"&gt;// - Use cookies (but what about non-browser clients?)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;HTTP Streamable uses standard HTTP requests where auth is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// HTTP Streamable: Standard auth headers&lt;/span&gt;
&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/mcp&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Authorization&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Bearer xyz&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Clean and standard&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({...})&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. The Infrastructure Problem
&lt;/h3&gt;

&lt;p&gt;Many infrastructure tools assume short-lived HTTP connections:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Infrastructure&lt;/th&gt;
&lt;th&gt;SSE Support&lt;/th&gt;
&lt;th&gt;HTTP Streamable Support&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AWS Lambda&lt;/td&gt;
&lt;td&gt;❌ Times out&lt;/td&gt;
&lt;td&gt;✅ Perfect fit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloud Run&lt;/td&gt;
&lt;td&gt;⚠️ Needs config&lt;/td&gt;
&lt;td&gt;✅ Works by default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloudflare&lt;/td&gt;
&lt;td&gt;⚠️ Connection limits&lt;/td&gt;
&lt;td&gt;✅ Standard HTTP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API Gateways&lt;/td&gt;
&lt;td&gt;⚠️ Often problematic&lt;/td&gt;
&lt;td&gt;✅ Native support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Corporate proxies&lt;/td&gt;
&lt;td&gt;⚠️ May kill long connections&lt;/td&gt;
&lt;td&gt;✅ No issues&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  When SSE Still Makes Sense
&lt;/h2&gt;

&lt;p&gt;Despite these limitations, SSE isn't obsolete. Use it when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You need true push notifications&lt;/strong&gt;: SSE excels when the server needs to push updates unprompted (though HTTP Streamable can also stream)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Browser-native simplicity&lt;/strong&gt;: EventSource API is dead simple for basic use cases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Existing SSE infrastructure&lt;/strong&gt;: If you've already built around SSE, migration may not be worth it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single-server deployments&lt;/strong&gt;: The scaling issues don't matter if you only have one server&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;SSE and HTTP Streamable both use HTTP, but they represent fundamentally different architectural patterns:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;SSE&lt;/th&gt;
&lt;th&gt;HTTP Streamable&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Connection model&lt;/td&gt;
&lt;td&gt;Persistent&lt;/td&gt;
&lt;td&gt;Request/Response&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Server state&lt;/td&gt;
&lt;td&gt;Must maintain connections&lt;/td&gt;
&lt;td&gt;Can be stateless&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scaling model&lt;/td&gt;
&lt;td&gt;Vertical (bigger servers)&lt;/td&gt;
&lt;td&gt;Horizontal (more servers)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Load balancing&lt;/td&gt;
&lt;td&gt;Sticky sessions required&lt;/td&gt;
&lt;td&gt;Stateless routing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auth support&lt;/td&gt;
&lt;td&gt;Awkward&lt;/td&gt;
&lt;td&gt;Native HTTP auth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infrastructure fit&lt;/td&gt;
&lt;td&gt;Specialized&lt;/td&gt;
&lt;td&gt;Universal&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;SSE = Keep a phone line open&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;HTTP Streamable = Send text messages&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Both communicate, but one requires maintaining an open line while the other sends discrete messages that can be routed anywhere.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Should You Choose?
&lt;/h2&gt;

&lt;p&gt;For most new MCP projects, &lt;strong&gt;HTTP Streamable is the better choice&lt;/strong&gt;. It's more flexible, scales better, and works with modern cloud infrastructure out of the box.&lt;/p&gt;

&lt;p&gt;Use SSE only if you have a specific reason—like existing infrastructure or a genuine need for server-initiated push without polling.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/yourhandle/understanding-mcp-server-transports-stdio-sse-and-http-streamable"&gt;Understanding MCP Server Transports&lt;/a&gt; — Overview of all three transports&lt;/li&gt;
&lt;li&gt;&lt;a href="https://modelcontextprotocol.io" rel="noopener noreferrer"&gt;MCP Official Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/filip505/my-mcp-server" rel="noopener noreferrer"&gt;MCP Template Repository&lt;/a&gt; — Working examples of all transports&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>api</category>
      <category>mcp</category>
      <category>networking</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
