<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: WonderLab</title>
    <description>The latest articles on DEV Community by WonderLab (@wonderlab).</description>
    <link>https://dev.to/wonderlab</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3797373%2F25beba30-d8d4-4d2e-9ec6-170356089350.jpg</url>
      <title>DEV Community: WonderLab</title>
      <link>https://dev.to/wonderlab</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/wonderlab"/>
    <language>en</language>
    <item>
      <title>Agent Series (2): ReAct — The Most Important Agent Reasoning Paradigm</title>
      <dc:creator>WonderLab</dc:creator>
      <pubDate>Sat, 23 May 2026 00:40:05 +0000</pubDate>
      <link>https://dev.to/wonderlab/agent-series-2-react-the-most-important-agent-reasoning-paradigm-2b7k</link>
      <guid>https://dev.to/wonderlab/agent-series-2-react-the-most-important-agent-reasoning-paradigm-2b7k</guid>
      <description>&lt;h2&gt;
  
  
  You Think Your Agent Is "Thinking." It's Actually Just Predicting Tokens.
&lt;/h2&gt;

&lt;p&gt;Here's a scenario that happens more often than you'd think.&lt;/p&gt;

&lt;p&gt;You ask an Agent to write a competitive analysis report. It confidently outputs three professional-looking pages — complete with data, conclusions, and strategic recommendations.&lt;/p&gt;

&lt;p&gt;There's just one problem: every number comes from its training data, which may be a year old. It didn't search. It didn't verify. It just generated text that sounds authoritative.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That's not thinking. That's fluent hallucination.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Chain-of-Thought (CoT) has the same fundamental problem. CoT prompting tells the model to "reason step by step" before answering, and it genuinely does improve accuracy on many tasks. But the model is still reasoning entirely within language space. It can generate a very coherent chain of thought that leads to a completely wrong answer — because its only information source is training data.&lt;/p&gt;

&lt;p&gt;ReAct was built to solve this.&lt;/p&gt;




&lt;h2&gt;
  
  
  ReAct: Reasoning + Acting, Interleaved
&lt;/h2&gt;

&lt;p&gt;In 2022, researchers from Princeton and Google published &lt;a href="https://arxiv.org/abs/2210.03629" rel="noopener noreferrer"&gt;ReAct: Synergizing Reasoning and Acting in Language Models&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The core idea is elegantly simple: &lt;strong&gt;let the model alternate between reasoning and acting, rather than reasoning first then acting, or acting without reasoning.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The concrete form is a three-part loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Thought  →  Action  →  Observation
   ↑                         │
   └─────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Thought&lt;/strong&gt;: What the model is "thinking" — current analysis, what to do next, why&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Action&lt;/strong&gt;: The actual tool call and parameters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observation&lt;/strong&gt;: The real result returned by the tool&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The critical mechanism: &lt;strong&gt;Observation is fed back into the model as new context&lt;/strong&gt;, allowing it to reason based on actual results. This creates the "think → act → observe → think again" loop.&lt;/p&gt;

&lt;p&gt;This one loop fixes CoT's core flaw: &lt;strong&gt;the model is no longer reasoning in isolation. It can interact with the real world and update its reasoning based on real feedback.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  A Concrete Example: Watching an Agent "Think"
&lt;/h2&gt;

&lt;p&gt;I built a complete ReAct Agent demo using LangGraph + GLM-4-Flash with two tools: &lt;code&gt;calculator&lt;/code&gt; (safe math evaluator) and &lt;code&gt;web_search&lt;/code&gt; (Bing search).&lt;/p&gt;

&lt;p&gt;Code: &lt;a href="https://github.com/chendongqi/llm-in-action/tree/main/agent-01-react-agent" rel="noopener noreferrer"&gt;agent-01-react-agent/react_agent.py&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's an actual execution trace — Demo 3: search for the areas of Beijing and Shanghai, then calculate the difference.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;════════════════════════════════════════════════════════════
  Demo 3 ▸ Multi-Round Search (Same Tool, Multiple Calls)
════════════════════════════════════════════════════════════

[User Question]
  First search for Beijing's area, then Shanghai's area,
  then calculate how much larger Beijing is in km².
────────────────────────────────────────────────────────────

[Step 1] THOUGHT → ACTION
  Action  : web_search(query='北京面积 平方公里')

  Observation : • Beijing area: Total area 16,410.54 km²...
────────────────────────────────────────────────────────────

[Step 2] THOUGHT → ACTION
  Action  : web_search(query='上海面积 平方公里')

  Observation : • Shanghai area: Land area approximately 6,340.5 km²...
────────────────────────────────────────────────────────────

[Step 3] THOUGHT → ACTION
  Action  : calculator(expression='16410.54 - 6340.5')

  Observation : 10070.04
────────────────────────────────────────────────────────────

[Final Answer]
  Beijing's area is approximately 16,410.54 km², Shanghai's is
  approximately 6,340.5 km². Beijing is about 10,070.04 km² larger.
════════════════════════════════════════════════════════════
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice what happened here:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The Agent &lt;strong&gt;decided on its own&lt;/strong&gt; to search Beijing first, then Shanghai, then calculate — no hardcoded execution order&lt;/li&gt;
&lt;li&gt;Each search result (Observation) was read by the model and used to determine the next step&lt;/li&gt;
&lt;li&gt;The final calculation used real numbers extracted from real searches&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is ReAct's value: &lt;strong&gt;the execution path is planned dynamically at runtime, not hardcoded by the developer in advance.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  ReAct vs. Chain-of-Thought: A Direct Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Chain-of-Thought&lt;/th&gt;
&lt;th&gt;ReAct&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Information source&lt;/td&gt;
&lt;td&gt;Training data only&lt;/td&gt;
&lt;td&gt;Training data + tool results&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Execution path&lt;/td&gt;
&lt;td&gt;Reasoning in language space&lt;/td&gt;
&lt;td&gt;Think → real action → observe results&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Can access real-time data&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✓ (via tools)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Can execute computation/code&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✓ (via tools)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reasoning verifiable&lt;/td&gt;
&lt;td&gt;Hard to verify&lt;/td&gt;
&lt;td&gt;Each Observation is a real result&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Risk of side effects&lt;/td&gt;
&lt;td&gt;Low (no actions)&lt;/td&gt;
&lt;td&gt;High (requires safety boundaries)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;One sentence summary: &lt;strong&gt;CoT makes the model think clearly. ReAct makes it think while doing.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Building a ReAct Agent with LangGraph
&lt;/h2&gt;

&lt;p&gt;Here's the core implementation. The code uses LangGraph's &lt;code&gt;create_react_agent&lt;/code&gt; — one of the cleanest ReAct implementations available.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Safe Calculator Tool
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;operator&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;

&lt;span class="n"&gt;_SAFE_OPS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="n"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sub&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="n"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Mult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mul&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Div&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="n"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;truediv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Pow&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="n"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;pow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Mod&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="n"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mod&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;USub&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;neg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_eval_ast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AST&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Constant&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BinOp&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;op_fn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_SAFE_OPS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;op_fn&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unsupported operator: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;op_fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;_eval_ast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;left&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;_eval_ast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;right&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnaryOp&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;op_fn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_SAFE_OPS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;op_fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;_eval_ast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;operand&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unsupported AST node: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expression&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Evaluate a math expression. Supports + - * / ** % and parentheses.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;tree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expression&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eval&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_eval_ast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="nf"&gt;except &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;SyntaxError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;ZeroDivisionError&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Calculation error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;br&gt;
&lt;strong&gt;Why not just use &lt;code&gt;eval()&lt;/code&gt;?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;eval("__import__('os').system('rm -rf /')")&lt;/code&gt; — that line will execute a deletion on your machine. Tools are the Agent's "hands." Once an attacker manipulates the LLM through prompt injection, &lt;code&gt;eval()&lt;/code&gt; becomes a direct path to your system.&lt;/p&gt;

&lt;p&gt;AST parsing only allows math operation nodes — everything else is rejected. This is the foundational principle of safe tool design.&lt;br&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Web Search Tool
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bs4&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;urllib.parse&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;quote&lt;/span&gt;

&lt;span class="n"&gt;_BING_HEADERS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User-Agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Accept-Language&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;en-US,en;q=0.9&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;web_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search the web and return the 3 most relevant snippets.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.bing.com/search?q=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;quote&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;amp;setlang=zh-CN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;_BING_HEADERS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;soup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BeautifulSoup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;html.parser&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;snippets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;li&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;li&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;class_&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;b_algo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[:&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="n"&gt;h2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;li&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;h2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;h2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;h2&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
            &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;li&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;p&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;snippets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;• &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;snippets&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;snippets&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No results found.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RequestException&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Building the Agent
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;
&lt;span class="c1"&gt;# LangGraph V1.0 moved create_react_agent to chat_agent_executor submodule
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.prebuilt.chat_agent_executor&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_react_agent&lt;/span&gt;

&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://open.bigmodel.cn/api/paas/v4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LLM_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;glm-4-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_react_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;calculator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;web_search&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How much larger is Beijing than Shanghai in km²? Search and calculate.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]},&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recursion_limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three core lines: define tools → bind LLM → run. LangGraph handles all the message routing, tool call dispatch, result injection, and loop control under the hood.&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
&lt;strong&gt;The correct import path for &lt;code&gt;create_react_agent&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;LangGraph V1.0 moved this function to &lt;code&gt;langgraph.prebuilt.chat_agent_executor&lt;/code&gt;. Importing from &lt;code&gt;langgraph.prebuilt&lt;/code&gt; triggers a &lt;code&gt;LangGraphDeprecatedSinceV10&lt;/code&gt; warning. Use the new path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ✅ Recommended
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.prebuilt.chat_agent_executor&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_react_agent&lt;/span&gt;

&lt;span class="c1"&gt;# ⚠️ Triggers deprecation warning
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.prebuilt&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_react_agent&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;








&lt;h2&gt;
  
  
  How the Message Flow Actually Works
&lt;/h2&gt;

&lt;p&gt;To truly understand ReAct, you need to see the underlying message sequence. Here's what the LLM receives at the start of each cycle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Context passed to LLM at round N:
┌─────────────────────────────────────────────────────┐
│ [System]  You are an assistant with these tools:    │
│           calculator, web_search                    │
│                                                     │
│ [Human]   Question: How much larger is Beijing?     │
│                                                     │
│ [AI]      (tool call) web_search("Beijing area")   │  ← Round 1 Action
│ [Tool]    Beijing area: 16,410 km²                 │  ← Round 1 Observation
│                                                     │
│ [AI]      (tool call) web_search("Shanghai area")  │  ← Round 2 Action
│ [Tool]    Shanghai area: 6,340 km²                 │  ← Round 2 Observation
│                                                     │
│ ← LLM decides what to do next here →               │
└─────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each cycle, the entire history is passed to the LLM. The model "sees" all previous thoughts and observations, then decides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Continue calling tools (more information needed)&lt;/li&gt;
&lt;li&gt;Stop and deliver a final answer (enough information gathered)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why it's called a loop — &lt;strong&gt;the model itself is the loop's termination condition.&lt;/strong&gt; It decides when to stop.&lt;/p&gt;




&lt;h2&gt;
  
  
  When Things Go Wrong: Failure Modes and Guards
&lt;/h2&gt;

&lt;p&gt;The same "decide when to stop" design that makes ReAct powerful also introduces a risk: &lt;strong&gt;if the model misjudges, the loop never terminates.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Common runaway scenarios:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 1: Tool keeps failing, model keeps retrying&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Action: web_search("vague ambiguous query")
Observation: No results found
Thought: Let me try different keywords
Action: web_search("different keywords")
Observation: No results found
Thought: Maybe one more variation...
(infinite loop)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Scenario 2: Model misunderstands the task and pursues the wrong direction&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Thought: I need the exact value of X
Action: calculator("...")
Observation: Approximate result
Thought: Not precise enough, I need more decimal places
Action: calculator("...")
(infinite pursuit of "precision")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Scenario 3: Tools form a circular dependency&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Thought: I need to know A before I can look up B
Action: search(A)
Observation: Requires knowing B first
Thought: I need to know B before I can look up A
(circular dependency)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;LangGraph's &lt;code&gt;recursion_limit&lt;/code&gt; parameter is the hard safety net:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)]},&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recursion_limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;# Force-stop after 5 steps
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the step count exceeds the limit, LangGraph raises &lt;code&gt;GraphRecursionError&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[recursion_limit triggered]
  Exception type: GraphRecursionError
  Message: Recursion limit of 5 reached without hitting a stop condition...

→ Conclusion: Always set a reasonable recursion_limit in production (15~25 recommended)
→ Too low: legitimate tasks get cut off; Too high: runaway Agent burns massive tokens
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;br&gt;
&lt;strong&gt;How to set &lt;code&gt;recursion_limit&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simple tasks (single tool call): 5–8 steps is enough&lt;/li&gt;
&lt;li&gt;Medium tasks (multi-tool, multi-step): 10–15 steps&lt;/li&gt;
&lt;li&gt;Complex research tasks: 20–25 steps&lt;/li&gt;
&lt;li&gt;Tasks requiring 30+ steps should reconsider architecture — you may need multi-Agent collaboration (covered in a later article)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The rule of thumb: &lt;strong&gt;set it to roughly 2× the number of steps a successful execution needs.&lt;/strong&gt; Room to breathe, but a real ceiling.&lt;br&gt;
&lt;/p&gt;




&lt;h2&gt;
  
  
  Five Demo Scenarios: From Simple to Complex
&lt;/h2&gt;

&lt;p&gt;The complete code includes 5 progressive demos covering the main ReAct usage patterns:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Demo 1: Pure Calculation (single tool, single step)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Question: Calculate (1024 * 768) + (1920 * 1080)
Steps: calculator('(1024 * 768) + (1920 * 1080)') → 2860032
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Validates the basic tool-calling pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Demo 2: Search + Calculate (multi-tool, multi-step)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Question: What year were Python and JavaScript first released? Calculate the difference.
Steps: web_search("Python release year") → web_search("JavaScript release year") → calculator
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Shows the Agent autonomously orchestrating different tools in the right order.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Demo 3: Multi-round Search (same tool, multiple calls)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Question: How much larger is Beijing than Shanghai in km²?
Steps: web_search("Beijing area") → web_search("Shanghai area") → calculator → 10070.04
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Shows the Agent deciding what to search second based on what it found first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Demo 4: No Tools Needed (direct answer)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Question: Explain the ReAct paradigm in one sentence.
Steps: No tool calls — direct answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Shows the Agent knowing when &lt;strong&gt;not&lt;/strong&gt; to call tools. This matters as much as knowing when to call them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Demo 5: Trigger &lt;code&gt;recursion_limit&lt;/code&gt; (safety net demo)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Question: Search Python/Java/C release years, calculate the sum (~10 steps needed)
Limit: recursion_limit=5
Result: GraphRecursionError (correctly triggered)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Production safety mechanism verification.&lt;/p&gt;




&lt;h2&gt;
  
  
  An Interesting Observation: Agents Can "Luck Into" Correct Answers
&lt;/h2&gt;

&lt;p&gt;Demo 2 produced a result worth documenting carefully.&lt;/p&gt;

&lt;p&gt;The Agent searched for JavaScript's release year. The Bing snippet it received came from an article published in 2023 that mentioned Python's 1991 origin. The model appears to have confused "2023" (article publication date) with JavaScript's release year. The calculation step ran &lt;code&gt;2023 - 1991 = 32&lt;/code&gt;, returning 32.&lt;/p&gt;

&lt;p&gt;But the final answer was correct: "Python was released in 1991, JavaScript in 1995 — a 4-year difference."&lt;/p&gt;

&lt;p&gt;The model overrode its (incorrect) calculation result with its internal training knowledge and delivered the right answer.&lt;/p&gt;

&lt;p&gt;This reveals a subtle property of ReAct: &lt;strong&gt;an Agent's reasoning chain and its final answer can be decoupled.&lt;/strong&gt; The model may make errors during tool calls, then "self-correct" in the final answer generation using built-in knowledge.&lt;/p&gt;

&lt;p&gt;As an outcome, this is fine — you got the right answer. From an engineering perspective, it's a problem. &lt;strong&gt;If you need traceable, verifiable conclusions, "it happened to be correct" isn't sufficient.&lt;/strong&gt; This is one of the challenges that Harness Engineering addresses (covered in a later article in this series).&lt;/p&gt;




&lt;h2&gt;
  
  
  Trace Visualization: Making Agent Reasoning Observable
&lt;/h2&gt;

&lt;p&gt;A common production pain point: when something goes wrong, you don't know which step failed, because only the final answer is visible by default.&lt;/p&gt;

&lt;p&gt;Good practice: print the full Thought/Action/Observation sequence as a readable Trace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.messages&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AIMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ToolMessage&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;print_trace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[USER] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AIMessage&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;repr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;args&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
                    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[ACTION] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[FINAL ANSWER] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ToolMessage&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;obs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[OBSERVATION] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;obs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;br&gt;
&lt;strong&gt;GLM-4-Flash content field pollution&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When using GLM-4-Flash, you may occasionally see raw JSON in &lt;code&gt;AIMessage.content&lt;/code&gt; — something like &lt;code&gt;{"index": 0, "delta": ...}&lt;/code&gt;. This is the model leaking internal streaming delta data into the content field.&lt;/p&gt;

&lt;p&gt;Fix: detect when content starts with &lt;code&gt;{&lt;/code&gt; or &lt;code&gt;[&lt;/code&gt; and can be parsed by &lt;code&gt;json.loads()&lt;/code&gt;, then discard it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_clean_thought&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;stripped&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;stripped&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;stripped&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stripped&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;  &lt;span class="c1"&gt;# leaked JSON, discard
&lt;/span&gt;        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSONDecodeError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;pass&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The complete demo code already includes this handling.&lt;br&gt;
&lt;/p&gt;




&lt;h2&gt;
  
  
  The Limitations of ReAct
&lt;/h2&gt;

&lt;p&gt;ReAct is powerful, but it's not a silver bullet. Knowing its limits helps you use it correctly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Context window fills up fast&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each cycle packs the entire history into context. Step count grows, token consumption spikes. Complex tasks (20+ steps) may fail on models with limited context windows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Tool descriptions drive everything — write them well&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;ReAct relies entirely on the LLM understanding tool documentation to decide which tool to call and with what parameters. Vague docstrings lead to wrong tool selection. Tool descriptions are the invisible API of a ReAct system — treat them like API documentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. No global planning capability&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Standard ReAct is greedy: each step only looks at the current state to decide the next move, with no "plan the whole thing first, then execute" capability. For tasks requiring long-horizon planning (like writing an entire codebase), this can get stuck in local optima. This is what the Plan-and-Solve paradigm addresses (Article 3 in this series).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Poor fault tolerance for tool failures&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When a tool returns an error, the model has to infer the next step from the error message alone. There's no predefined retry strategy or fallback logic. This needs to be handled at the tool design level and the Harness layer.&lt;/p&gt;




&lt;h2&gt;
  
  
  Interview Prep: Articulate How Your Agent "Thinks"
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Common question: How does your Agent decide its next action?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Many candidates answer "it calls tools." But what the interviewer actually wants to hear is: &lt;strong&gt;who decides which tool to call, and when does it stop?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A clear answer framework:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"We use the ReAct paradigm. The core is a Thought → Action → Observation loop. At each step, the LLM looks at the full context — user question plus all previous Observations — and decides the next Action. The tool runs, its result is injected as a ToolMessage, and the model reasons again.&lt;/p&gt;

&lt;p&gt;The loop terminates when the LLM judges it has enough information and stops calling tools, generating the final answer directly.&lt;/p&gt;

&lt;p&gt;To prevent runaway loops, we set &lt;code&gt;recursion_limit&lt;/code&gt; (typically 15–25). When it's exceeded, we catch the exception and fall back to a degraded response. We also log the full Trace — every Action and Observation — so we can replay the entire reasoning chain when debugging."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Key differentiators: &lt;strong&gt;mentioning Trace observability and &lt;code&gt;recursion_limit&lt;/code&gt; shows you've thought beyond demos and considered production stability.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Three things from this article:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ReAct = Reasoning + Acting, interleaved&lt;/strong&gt;: The Thought → Action → Observation loop lets Agents update their reasoning based on real-world feedback. The fundamental difference from CoT: actions produce real results that feed back into the reasoning process.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tool design is ReAct's invisible interface&lt;/strong&gt;: Docstring quality directly determines how accurately the LLM selects tools. Safe implementation (AST instead of eval) determines whether the system boundary holds.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;recursion_limit&lt;/code&gt; is a required production setting&lt;/strong&gt;: The model decides when to stop — that's inherently risky. &lt;code&gt;recursion_limit&lt;/code&gt; is the last line of defense. Recommended value: roughly 2× the steps needed for successful completion.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;strong&gt;Next up&lt;/strong&gt;: Agent Series Article 3 — &lt;strong&gt;Plan-and-Solve: When ReAct Isn't Enough, How Agents Plan Before Acting&lt;/strong&gt;. We'll see where ReAct's greedy strategy hits its ceiling on complex tasks, and how introducing an explicit planning layer breaks through it.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Yao et al., &lt;a href="https://arxiv.org/abs/2210.03629" rel="noopener noreferrer"&gt;ReAct: Synergizing Reasoning and Acting in Language Models&lt;/a&gt;, ICLR 2023&lt;/li&gt;
&lt;li&gt;&lt;a href="https://langchain-ai.github.io/langgraph/" rel="noopener noreferrer"&gt;LangGraph Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/datawhalechina/Hello-Agents" rel="noopener noreferrer"&gt;hello-agents Open Tutorial&lt;/a&gt; (Chapter 4)&lt;/li&gt;
&lt;li&gt;Demo code for this article: &lt;a href="https://github.com/chendongqi/llm-in-action/tree/main/agent-01-react-agent" rel="noopener noreferrer"&gt;agent-01-react-agent&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Welcome to visit my &lt;a href="https://home.wonlab.top" rel="noopener noreferrer"&gt;personal homepage&lt;/a&gt; for more useful knowledge and interesting products&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>llm</category>
      <category>langchain</category>
      <category>ai</category>
    </item>
    <item>
      <title>Open Source Project (No.73): Sub2API - All-in-One Claude/OpenAI/Gemini Subscription-to-API Relay</title>
      <dc:creator>WonderLab</dc:creator>
      <pubDate>Sat, 23 May 2026 00:37:58 +0000</pubDate>
      <link>https://dev.to/wonderlab/open-source-project-no73-sub2api-all-in-one-claudeopenaigemini-subscription-to-api-relay-235n</link>
      <guid>https://dev.to/wonderlab/open-source-project-no73-sub2api-all-in-one-claudeopenaigemini-subscription-to-api-relay-235n</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;"Fluidize your AI subscription quotas and maximize the value of every penny."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is the No.73 article in the "One Open Source Project per Day" series. Today, we are exploring &lt;strong&gt;Sub2API&lt;/strong&gt; (also known as CRS2).&lt;/p&gt;

&lt;p&gt;With the rise of native AI power-tools like Claude Code and GitHub Copilot, many developers find themselves with multiple AI subscriptions (such as Claude Pro or OpenAI Plus). However, these subscriptions often come with usage limits or idle quotas. How can you consolidate these scattered subscription resources and distribute costs efficiently across different tools and users? &lt;strong&gt;Sub2API&lt;/strong&gt; provides a perfect open-source solution.&lt;/p&gt;

&lt;p&gt;It is more than just a simple forwarder; it is a full-featured API proxy platform that handles the entire pipeline — from account management and quota distribution to automated billing and built-in payments. It is particularly well-suited for team sharing, "carpooling" (shared-cost usage), or individual multi-account integration.&lt;/p&gt;

&lt;h3&gt;
  
  
  What You Will Learn
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The core positioning of Sub2API and the pain points it solves.&lt;/li&gt;
&lt;li&gt;Supported mainstream models and subscription types.&lt;/li&gt;
&lt;li&gt;Core features: Multi-account management, intelligent scheduling, and Token-level billing.&lt;/li&gt;
&lt;li&gt;Rapid deployment methods: Script installation and Docker Compose.&lt;/li&gt;
&lt;li&gt;How to use Sub2API to build your own AI API relay service.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Basic understanding of AI model APIs (OpenAI, Claude, Gemini, etc.).&lt;/li&gt;
&lt;li&gt;Fundamental Linux command-line experience.&lt;/li&gt;
&lt;li&gt;Familiarity with Docker or containerized deployment concepts.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Project Background
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Project Overview
&lt;/h3&gt;

&lt;p&gt;Sub2API is an AI API gateway platform developed in Go. Its core logic is to "pool" AI subscriptions from various channels (including OAuth-authenticated accounts, Session Keys, or standard API Keys).&lt;/p&gt;

&lt;p&gt;With Sub2API, you can:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Aggregate Resources&lt;/strong&gt;: Plug in multiple Claude Pro or OpenAI accounts and output a single, unified standard API.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Share Costs&lt;/strong&gt;: Supports a "carpooling" mechanism with a built-in billing system to charge by usage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Seamless Integration&lt;/strong&gt;: The generated APIs work flawlessly with native tools like Claude Code, OpenClaw, and more, without complex client-side configuration.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Author/Team
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Maintained by&lt;/strong&gt;: &lt;a href="https://github.com/Wei-Shaw" rel="noopener noreferrer"&gt;Wei-Shaw&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ecosystem&lt;/strong&gt;: The project enjoys active community support, including a mobile admin console (sub2api-mobile) and other surrounding tools.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Project Data
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;📄 &lt;strong&gt;Core Repository&lt;/strong&gt;: &lt;a href="https://github.com/Wei-Shaw/sub2api" rel="noopener noreferrer"&gt;Wei-Shaw/sub2api&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🛠️ &lt;strong&gt;Tech Stack&lt;/strong&gt;: Go (Gin, Ent), Vue 3, PostgreSQL, Redis&lt;/li&gt;
&lt;li&gt;📄 &lt;strong&gt;License&lt;/strong&gt;: LGPL-3.0&lt;/li&gt;
&lt;li&gt;📈 &lt;strong&gt;Stats&lt;/strong&gt;: Over 22k Stars on GitHub (Note: May include historical repository data or high community interest).&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Main Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Core Utility
&lt;/h3&gt;

&lt;p&gt;Sub2API solves the "resource island" problem in AI usage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Upstream Account Pool (Claude, OpenAI, Gemini)
      ↓ Integration
Sub2API Platform Layer (Auth, Billing, Load Balancing, Session Persistence)
      ↓ Unified Distribution
Downstream Applications (IDEs, Chat clients, Scripts)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key Features
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Account Management&lt;/strong&gt;: Supports various upstream account types and automatically handles session persistence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Precise Billing&lt;/strong&gt;: Token-level usage tracking and cost calculation with customizable rates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smart Scheduling&lt;/strong&gt;: Supports Sticky Sessions and load balancing to ensure continuity in long conversations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-in Payment System&lt;/strong&gt;: Native support for Alipay, WeChat Pay, Stripe, etc., allowing users to top up autonomously.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concurrency &amp;amp; Rate Limiting&lt;/strong&gt;: Configure per-user and per-account limits to protect your resources.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Admin Dashboard&lt;/strong&gt;: Provides an intuitive Web UI for real-time monitoring and management.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Quick Start
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Method 1: One-Click Script Installation (Recommended)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Recommended for clean Ubuntu/Debian systems:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-sSL&lt;/span&gt; https://raw.githubusercontent.com/Wei-Shaw/sub2api/main/deploy/install.sh | &lt;span class="nb"&gt;sudo &lt;/span&gt;bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Notes&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requires PostgreSQL 15+ and Redis 7+ to be pre-installed.&lt;/li&gt;
&lt;li&gt;The script installs the binary to &lt;code&gt;/opt/sub2api&lt;/code&gt; and creates a systemd service.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Method 2: Docker Compose Deployment&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create deployment directory&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; sub2api-deploy &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;sub2api-deploy

&lt;span class="c"&gt;# Download and run deployment preparation script&lt;/span&gt;
curl &lt;span class="nt"&gt;-sSL&lt;/span&gt; https://raw.githubusercontent.com/Wei-Shaw/sub2api/main/deploy/docker-deploy.sh | bash

&lt;span class="c"&gt;# Spin up services&lt;/span&gt;
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once installed, access the admin dashboard at &lt;code&gt;http://YOUR_SERVER_IP:8080&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Detailed Analysis
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Architecture: Why is this more than just a Reverse Proxy?
&lt;/h3&gt;

&lt;p&gt;The design priority of Sub2API is &lt;strong&gt;"Account State Management."&lt;/strong&gt; Traditional reverse proxy tools (like Nginx) lack the ability to understand application-layer sessions.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Sticky Sessions&lt;/strong&gt;:
For tools like Claude Code that require context continuity, Sub2API uses the &lt;code&gt;session_id&lt;/code&gt; in the Header to lock a request to a specific upstream account, ensuring conversations aren't interrupted by account switching.&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: If using Nginx as a reverse proxy, ensure &lt;code&gt;underscores_in_headers on;&lt;/code&gt; is enabled to support session headers.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pooling Logic&lt;/strong&gt;:&lt;br&gt;
The system abstracts multiple accounts into a single "resource pool." When one account hits a Rate Limit, the scheduler automatically routes traffic away from it, maximizing uptime.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Integrated Ecosystem&lt;/strong&gt;:&lt;br&gt;
While most relay tools require an external payment gateway, Sub2API’s built-in integration significantly reduces the operational complexity for small teams or community-led "carpools."&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Technical Stack
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Backend&lt;/strong&gt;: Go ensures high concurrency handling and ease of deployment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database&lt;/strong&gt;: PostgreSQL handles complex relationships and billing records.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache&lt;/strong&gt;: Redis manages rate limiting and real-time state synchronization.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Address and Resources
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Official Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;🌟 &lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/Wei-Shaw/sub2api" rel="noopener noreferrer"&gt;Wei-Shaw/sub2api&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📖 &lt;strong&gt;Documentation&lt;/strong&gt;: Detailed guides on deployment, payment configuration, and API usage are available in the repository.&lt;/li&gt;
&lt;li&gt;🖥️ &lt;strong&gt;Demo&lt;/strong&gt;: &lt;a href="https://demo.sub2api.org/" rel="noopener noreferrer"&gt;https://demo.sub2api.org/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Target Audience
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Carpool Leads&lt;/strong&gt;: Organizers looking to split the cost of Claude/OpenAI Plus.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Developers / Power Users&lt;/strong&gt;: Individuals wanting to consolidate multiple account quotas for native CLI tools.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Internal Enterprise Teams&lt;/strong&gt;: Teams needing to distribute and audit AI resource usage internally.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Summary and Outlook
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Key Takeaways
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Clear Focus&lt;/strong&gt;: Specialized in converting subscription-based quotas into standard API services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;All-in-One&lt;/strong&gt;: A closed-loop for management, scheduling, billing, and payments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment Friendly&lt;/strong&gt;: Multiple options including script-based and Docker-based setups.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reliable Performance&lt;/strong&gt;: Built on a solid Go/PostgreSQL/Redis foundation suitable for medium-to-large distribution.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  One-Sentence Review
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Sub2API is currently the most tightly integrated open-source solution combining "resource aggregation" with a "commercial model," making it a powerful tool for achieving "subscription freedom."&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;em&gt;Visit my &lt;a href="https://home.wonlab.top" rel="noopener noreferrer"&gt;homepage&lt;/a&gt; for more useful insights and interesting products.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>apigateway</category>
      <category>claude</category>
    </item>
    <item>
      <title>One Open Source Project a Day (No. 72): Andrej Karpathy Skills — Fix Four Chronic LLM Coding Problems With a Single CLAUDE.md</title>
      <dc:creator>WonderLab</dc:creator>
      <pubDate>Fri, 22 May 2026 07:00:16 +0000</pubDate>
      <link>https://dev.to/wonderlab/one-open-source-project-a-day-no-72-andrej-karpathy-skills-fix-four-chronic-llm-coding-4afo</link>
      <guid>https://dev.to/wonderlab/one-open-source-project-a-day-no-72-andrej-karpathy-skills-fix-four-chronic-llm-coding-4afo</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;"LLMs excel at looping until they meet specific goals — so provide success criteria rather than imperative instructions."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is the NO.72 article in the "One Open Source Project a Day" series. Today we are exploring &lt;strong&gt;andrej-karpathy-skills&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This project is unusual: its core is not a tool, framework, or library — &lt;strong&gt;it is a single CLAUDE.md file&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The story starts with Andrej Karpathy posting on X after heavy Claude Code usage, documenting failure patterns he observed in LLM coding: diving into implementation without clarification, engineering simple problems into complex solutions, making unrequested changes to adjacent code.&lt;/p&gt;

&lt;p&gt;The multica-ai team distilled those observations into four actionable behavioral principles and packaged them into a CLAUDE.md — drop it in a project, and Claude Code changes how it behaves. It also ships in Claude Code plugin and Cursor rules formats, covering the two main AI coding tools.&lt;/p&gt;

&lt;p&gt;The project answers a frequently overlooked question: &lt;strong&gt;rather than teaching an LLM exactly what to do, teach it how to think&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  What You Will Learn
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The four LLM coding failure modes Karpathy identified&lt;/li&gt;
&lt;li&gt;The content and real-world examples behind each of the four principles&lt;/li&gt;
&lt;li&gt;Three installation methods: standalone CLAUDE.md / Claude Code plugin / Cursor rules&lt;/li&gt;
&lt;li&gt;Why "give success criteria" is more effective than "give step-by-step instructions"&lt;/li&gt;
&lt;li&gt;How to verify the guidelines are actually working&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Familiarity with Claude Code, Cursor, or similar AI coding tools&lt;/li&gt;
&lt;li&gt;Enough hands-on coding experience to recognize LLM coding pain points&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Project Background
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Project Introduction
&lt;/h3&gt;

&lt;p&gt;andrej-karpathy-skills is fundamentally a behavioral configuration file. Its design philosophy stems from one key insight:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;LLM coding problems are often not about capability — they are about unconstrained behavior.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The model is capable of writing simple code, but nothing tells it "don't write complex code." It is capable of asking for clarification first, but there is no pressure making it do so. It knows it shouldn't touch unrelated code, but the habit of "while I'm at it" is hard to suppress.&lt;/p&gt;

&lt;p&gt;This CLAUDE.md file makes those constraints explicit, injecting them into every conversation as context.&lt;/p&gt;

&lt;p&gt;The project ships in three formats to cover different workflows and tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  Author/Team
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Maintainer&lt;/strong&gt;: &lt;a href="https://github.com/multica-ai" rel="noopener noreferrer"&gt;multica-ai&lt;/a&gt; (Multica AI team)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inspiration&lt;/strong&gt;: Andrej Karpathy's observations shared on X about LLM coding usage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Original author&lt;/strong&gt;: The CLAUDE.md content was originally compiled by &lt;a href="https://github.com/forrestchang" rel="noopener noreferrer"&gt;forrestchang&lt;/a&gt;; multica-ai extended it into a full plugin ecosystem&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;About Andrej Karpathy: Co-founder of OpenAI, former Tesla AI Director, now independent researcher. Known for nanoGPT, &lt;em&gt;Neural Networks: Zero to Hero&lt;/em&gt;, and other educational projects widely followed in the AI community. His practical feedback on AI tools carries significant weight.&lt;/p&gt;

&lt;h3&gt;
  
  
  Project Stats
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;📄 Core file: &lt;code&gt;CLAUDE.md&lt;/code&gt; (behavioral guidelines)&lt;/li&gt;
&lt;li&gt;🔌 Claude Code plugin + Cursor rules&lt;/li&gt;
&lt;li&gt;📖 Includes: &lt;code&gt;EXAMPLES.md&lt;/code&gt; (contrast examples for each principle)&lt;/li&gt;
&lt;li&gt;📄 License: MIT&lt;/li&gt;
&lt;li&gt;🌐 Repository: &lt;a href="https://github.com/multica-ai/andrej-karpathy-skills" rel="noopener noreferrer"&gt;multica-ai/andrej-karpathy-skills&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Main Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Core Utility
&lt;/h3&gt;

&lt;p&gt;This CLAUDE.md directly targets the four most common LLM coding failure modes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Common LLM coding failures
  ├── Silent assumptions (dive into code without clarifying)
  ├── Over-engineering (turn simple problems into complex ones)
  ├── Scope creep (touch unrelated code while "in the area")
  └── Vague execution (no verifiable definition of done)
        ↓ CLAUDE.md injects four behavioral principles
  Changed behavior
  ├── Think Before Coding
  ├── Simplicity First
  ├── Surgical Changes
  └── Goal-Driven Execution
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Quick Start
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Method 1: Claude Code plugin (recommended, global)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/plugin marketplace add forrestchang/andrej-karpathy-skills
/plugin &lt;span class="nb"&gt;install &lt;/span&gt;andrej-karpathy-skills@karpathy-skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Applies to all projects after install — no per-project setup needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Method 2: Per-project CLAUDE.md&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;your-project
curl &lt;span class="nt"&gt;-o&lt;/span&gt; CLAUDE.md https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your project already has a CLAUDE.md, merge the content manually.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Method 3: Cursor rules&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The repo ships &lt;code&gt;.cursor/rules/karpathy-guidelines.mdc&lt;/code&gt; with &lt;code&gt;alwaysApply: true&lt;/code&gt;. Copy it into your project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; .cursor/rules
&lt;span class="nb"&gt;cp &lt;/span&gt;path/to/karpathy-guidelines.mdc .cursor/rules/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify it's active: Cursor → Settings → Rules — the guideline should appear in the list.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Signs the guidelines are working&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude asks clarifying questions before implementing, rather than diving in&lt;/li&gt;
&lt;li&gt;PR diffs are cleaner — only the requested changes appear&lt;/li&gt;
&lt;li&gt;No more "while I was there, I also improved..." in commit messages&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Four Principles
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Principle 1: Think Before Coding
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The problem it targets&lt;/strong&gt;: When faced with ambiguous requirements, LLMs tend to make implicit assumptions and execute silently, rather than surfacing the uncertainty.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrong behavior example&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: Add a feature to export user data
LLM: [immediately starts writing code]
     → Assumes CSV format, assumes all fields, assumes synchronous download
     → But the user actually needed a background job + specific fields + JSON format
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Correct behavior&lt;/strong&gt; (the principle requires surfacing uncertainty first):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: Add a feature to export user data
LLM: Before I start, a few things to clarify:
     1. Scope: Export all users or a filtered subset?
     2. Format: CSV / JSON / Excel?
     3. Delivery: Browser download, background job via email, or API endpoint?
     4. Fields: Which fields to include? Any sensitive data restrictions?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Another canonical example&lt;/strong&gt; — "Make search faster" has three completely different valid meanings:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Interpretation&lt;/th&gt;
&lt;th&gt;Actual Need&lt;/th&gt;
&lt;th&gt;Solution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Response time&lt;/td&gt;
&lt;td&gt;API returns slowly&lt;/td&gt;
&lt;td&gt;Add indexes, caching, optimize queries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Throughput&lt;/td&gt;
&lt;td&gt;High concurrency&lt;/td&gt;
&lt;td&gt;Horizontal scaling, queuing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Perceived UX speed&lt;/td&gt;
&lt;td&gt;User feels it is slow&lt;/td&gt;
&lt;td&gt;Preloading, skeleton screens, instant feedback&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All three require fundamentally different approaches. None can be "defaulted to."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the principle requires&lt;/strong&gt;: When multiple reasonable interpretations exist, present all of them and let the user choose. When genuinely confused, stop and say "I'm not sure how to handle this" rather than pushing through.&lt;/p&gt;




&lt;h3&gt;
  
  
  Principle 2: Simplicity First
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The problem it targets&lt;/strong&gt;: LLMs have a strong over-engineering tendency — introducing abstractions, frameworks, and "flexibility" before complexity is actually needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrong behavior example&lt;/strong&gt; — discount calculation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# User request: implement a simple discount calculation
&lt;/span&gt;
&lt;span class="c1"&gt;# ❌ LLM's "solution" (10x the code needed):
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DiscountStrategy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ABC&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nd"&gt;@abstractmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PercentageDiscount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DiscountStrategy&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DiscountConfig&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DiscountCalculator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;strategy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DiscountStrategy&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cart&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Cart&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="c1"&gt;# ...plus factory class, config class, registry...
&lt;/span&gt;
&lt;span class="c1"&gt;# ✅ What was actually needed (one function):
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;apply_discount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;discount_pct&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;discount_pct&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Another example&lt;/strong&gt; — "Save user preferences":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;❌ LLM implemented:
   - A caching layer with expiry (nobody asked)
   - Input validation (no bad data has appeared yet)
   - Conflict merging logic (nobody hit this problem)
   - A change notification system (nobody mentioned it)

✅ What was actually needed:
   - One function that writes preferences to the database
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The benchmark the principle provides&lt;/strong&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Would a senior engineer look at this and say it's overcomplicated?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If 200 lines could be 50, rewrite it as 50.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core maxim&lt;/strong&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Good code is code that solves today's problem simply, not tomorrow's problem prematurely."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Premature complexity is not just wasteful — it makes code harder to understand, introduces more bugs, and slows development, even when it follows recognized design patterns.&lt;/p&gt;




&lt;h3&gt;
  
  
  Principle 3: Surgical Changes
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The problem it targets&lt;/strong&gt;: LLMs do "drive-by refactoring" — while fixing a bug, they also update quote styles, add type annotations, rename variables, and reorganize imports.&lt;/p&gt;

&lt;p&gt;This behavior feels helpful but has two serious problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Makes diffs hard to review&lt;/strong&gt;: Reviewers cannot distinguish which changes are bug fixes from which are "while I was there" improvements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Introduces unexpected regressions&lt;/strong&gt;: Every unrequested change is a potential risk point&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The right approach&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Original code (has a bug, and some "imperfections")
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_total&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# single quotes
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;                 &lt;span class="c1"&gt;# no type annotation
&lt;/span&gt;
&lt;span class="c1"&gt;# ❌ LLM's "comprehensive improvement":
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_total&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# added type annotation
&lt;/span&gt;    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Calculate total price of items.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;          &lt;span class="c1"&gt;# added docstring
&lt;/span&gt;    &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;                             &lt;span class="c1"&gt;# changed variable type
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# changed to double quotes ("style consistency")
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ Only fix the bug (suppose the bug is empty items):
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_total&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;               &lt;span class="c1"&gt;# only this one line added
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# original style preserved
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Specific requirements of the principle&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every changed line must trace directly to the user's request&lt;/li&gt;
&lt;li&gt;Match existing code style even if you prefer a different one&lt;/li&gt;
&lt;li&gt;Do not improve code you happen to pass through unless explicitly asked&lt;/li&gt;
&lt;li&gt;Only clean up unused imports/variables that &lt;strong&gt;your changes&lt;/strong&gt; created — leave pre-existing dead code alone&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Principle 4: Goal-Driven Execution
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The problem it targets&lt;/strong&gt;: Given a vague task, LLMs produce plans that look comprehensive but lack verifiable outcomes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A vague plan example&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Task: "Refactor the auth module"

❌ Vague plan:
   1. Review existing code
   2. Identify problems
   3. Improve structure
   4. Run tests
   → Not a single step has a clear definition of "done"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The principle requires converting tasks to verifiable goals&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Task: "Fix the login bug"

✅ Goal-driven plan:
   Step 1: Write a failing test that reproduces the bug
           Checkpoint: Test actually fails on current code
   Step 2: Implement the fix
           Checkpoint: The test now passes
   Step 3: Run the full test suite
           Checkpoint: No new regressions
   → Every step has a clear, objective definition of "complete"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Another example&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Task: "Refactor the auth module"

✅ Concretized:
   1. All existing tests pass (record baseline)
   2. Extract TokenService (checkpoint: standalone unit tests pass)
   3. Refactor AuthController to use TokenService (checkpoint: integration tests pass)
   4. All original tests still pass (no regressions)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Karpathy's core insight&lt;/strong&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;LLMs excel at "looping until they meet specific goals" — so providing success criteria is more effective than providing imperative instructions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Imperative instructions ("do A, then B, then C") leave the LLM without guidance when something goes wrong. Declarative goals ("this test must pass," "this interface must be callable") let the LLM choose its own path while giving it a clear completion criterion.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why a File, Not a Tool?
&lt;/h2&gt;

&lt;p&gt;The design choice of this project is worth reflecting on. Faced with LLM coding behavior problems, many solutions are possible:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build an agent framework to constrain behavior&lt;/li&gt;
&lt;li&gt;Develop post-processing tools to detect and correct issues&lt;/li&gt;
&lt;li&gt;Fine-tune a better model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;andrej-karpathy-skills chose &lt;strong&gt;the simplest one&lt;/strong&gt;: a text file, placed in the project, which the LLM reads and follows itself.&lt;/p&gt;

&lt;p&gt;This choice is itself the best demonstration of "Simplicity First" — minimum mechanism, today's problem solved. And a text file has one advantage no tool can match: &lt;strong&gt;it can be read, understood, and modified by anyone at any time&lt;/strong&gt;, with no black box.&lt;/p&gt;




&lt;h2&gt;
  
  
  Project Links &amp;amp; Resources
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Official Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;🌟 &lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/multica-ai/andrej-karpathy-skills" rel="noopener noreferrer"&gt;https://github.com/multica-ai/andrej-karpathy-skills&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📄 &lt;strong&gt;Direct CLAUDE.md download&lt;/strong&gt;: Available via &lt;code&gt;curl&lt;/code&gt; (see Quick Start above)&lt;/li&gt;
&lt;li&gt;📖 &lt;strong&gt;Examples&lt;/strong&gt;: &lt;code&gt;EXAMPLES.md&lt;/code&gt; in the repo (contrast examples for each principle — recommended reading)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Target Audience
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Daily Claude Code / Cursor users&lt;/strong&gt;: Who want to reduce LLM over-engineering and unnecessary code changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Team engineering productivity leads&lt;/strong&gt;: Looking to integrate behavioral standards into a shared CLAUDE.md and standardize AI-assisted coding across a team&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Developers who care about reviewable PRs&lt;/strong&gt;: Who are tired of LLM-generated "super diffs" and want clean, focused pull requests containing only the requested changes&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Key Takeaways
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Origin&lt;/strong&gt;: Directly distilled from Karpathy's first-hand observations of LLM coding failure modes — grounded in real usage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Four principles&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Think Before Coding&lt;/strong&gt;: Make implicit assumptions explicit questions rather than silently picking one&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simplicity First&lt;/strong&gt;: Write the minimum code to solve today's problem, don't pre-build "flexibility"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Surgical Changes&lt;/strong&gt;: Every changed line traces to the request; no drive-by refactoring&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Goal-Driven Execution&lt;/strong&gt;: Provide success criteria, not step-by-step instructions&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Three install formats&lt;/strong&gt;: CLAUDE.md (per-project) / Claude Code plugin (global) / Cursor rules&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Core philosophy&lt;/strong&gt;: LLMs excel at looping toward goals — give them goals, not procedures&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-demonstrating&lt;/strong&gt;: The simplest possible solution (one file) to the problem it addresses — Simplicity First, embodied&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  One-Line Review
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;andrej-karpathy-skills does something deceptively small but far-reaching: compresses the engineering wisdom of "how to use LLMs well for coding" into a single text file anyone can read, understand, and drop into any project — and that file itself is the best proof of the simple-first philosophy it advocates.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;em&gt;Find more useful knowledge and interesting products on my &lt;a href="https://home.wonlab.top" rel="noopener noreferrer"&gt;Homepage&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>claude</category>
      <category>karpathy</category>
    </item>
    <item>
      <title>Building Reliable AI Agents: Harness Engineering and Multi-Agent Architecture in Practice</title>
      <dc:creator>WonderLab</dc:creator>
      <pubDate>Fri, 22 May 2026 06:58:31 +0000</pubDate>
      <link>https://dev.to/wonderlab/building-reliable-ai-agents-harness-engineering-and-multi-agent-architecture-in-practice-3jbn</link>
      <guid>https://dev.to/wonderlab/building-reliable-ai-agents-harness-engineering-and-multi-agent-architecture-in-practice-3jbn</guid>
      <description>&lt;h2&gt;
  
  
  The Problems We Actually Ran Into
&lt;/h2&gt;

&lt;p&gt;If you've built an AI-assisted analysis tool, you've probably hit these two walls:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wall #1: Inconsistent output quality.&lt;/strong&gt; The longer the task chain, the more the AI drifts — its language stays precise, its tone stays confident, but the conclusions don't hold up. Ask it "are you sure?" and it'll double down with even more conviction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wall #2: Token costs keep climbing.&lt;/strong&gt; The more history you accumulate, the more you have to re-feed the model on every new session. Token consumption grows linearly. Analysis quality doesn't follow.&lt;/p&gt;

&lt;p&gt;These are real problems we encountered building a CarPlay bug analysis tool. AI performance on multi-project, long-chain bug analysis was wildly inconsistent. When we introduced a multi-agent architecture to fix that, token consumption and runtime shot up instead.&lt;/p&gt;

&lt;p&gt;Those two problems pushed us to find a systematic engineering solution.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 1: Harness Engineering — Putting a Leash on the Model
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem: Non-determinism Is an Agent's Original Sin
&lt;/h3&gt;

&lt;p&gt;LLMs are inherently probabilistic. In a single-turn conversation, that randomness is what makes them creative. In a long-chain task, it's what makes them dangerous.&lt;/p&gt;

&lt;p&gt;A typical failure scenario: you ask an Agent to complete a 10-step development task. Steps 1–7 go fine. On step 8, the Agent drifts slightly. Step 9 builds on the drift. By step 10, you receive a result that looks complete but is entirely off-target. And you almost don't notice — because the Agent wrote a very convincing summary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When AI starts executing autonomously across many steps, the central engineering challenge becomes: how do you supervise it and course-correct before the damage is done?&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution: Agent = Model + Harness
&lt;/h3&gt;

&lt;p&gt;In February 2026, Martin Fowler's team (author: Birgitta Böckeler) introduced the concept of &lt;strong&gt;Harness Engineering&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent = Model + Harness
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Harness&lt;/strong&gt; is everything in an Agent that isn't the model itself — prompts, tool definitions, rules, context management, validation mechanisms, feedback loops. All of it is the Harness.&lt;/p&gt;

&lt;p&gt;This definition sounds unremarkable at first. But it carries an important shift in thinking: &lt;strong&gt;to improve Agent reliability, don't swap in a better model — design a better Harness.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A Harness has two components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Guides (feedforward):&lt;/strong&gt; Give the AI the right inputs before it acts — clear instructions, relevant context, structured task descriptions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sensors (feedback):&lt;/strong&gt; Validate the AI's outputs after it acts — independent validators, quality evaluators, anomaly detectors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LangChain's experiment makes this concrete: &lt;strong&gt;without changing the underlying model, using Harness Engineering alone, their Agent's benchmark ranking jumped from outside the top 30 to top 5.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Goal: Fix Problems Before They Reach Human Eyes
&lt;/h3&gt;

&lt;p&gt;One sentence captures what Harness Engineering is for:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;To make AI Coding Agents work with less human supervision, you need to systematically build an "external control framework" — the Harness. It's composed of feedforward Guides and feedback Sensors, with the goal of &lt;strong&gt;automatically correcting problems before they ever reach a human reviewer.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Part 2: Two Failure Modes That Break Single Agents
&lt;/h2&gt;

&lt;p&gt;With the framework established, let's look at exactly where single agents fail. Anthropic's article &lt;em&gt;Harness Design for Long-Running Application Development&lt;/em&gt; identifies two core failure modes:&lt;/p&gt;

&lt;h3&gt;
  
  
  Failure Mode 1: Context Anxiety
&lt;/h3&gt;

&lt;p&gt;As a task gets long and the Agent approaches its context window limit, it starts to "panic" and rush to finish — marking incomplete work as done, writing vague analyses with false certainty.&lt;/p&gt;

&lt;p&gt;This isn't a bug. It's the model using "confident closure" as a coping mechanism for "uncertain continuation."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Don't use Compact (context compression). Use &lt;strong&gt;Context Reset&lt;/strong&gt; — completely clear the context, start a new Agent with a structured handoff document, and let the new Agent take over.&lt;/p&gt;

&lt;h3&gt;
  
  
  Failure Mode 2: Self-Evaluation Breakdown
&lt;/h3&gt;

&lt;p&gt;Ask an Agent to evaluate its own output and it becomes pathologically optimistic — it'll give itself high marks even when the work is poor, because it's evaluating using the same cognitive framework that produced the output.&lt;/p&gt;

&lt;p&gt;It's like asking someone to take an exam and grade it themselves. Almost guaranteed to score well.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Introduce an &lt;strong&gt;independent Evaluator Agent&lt;/strong&gt;, deliberately prompted to be critical and skeptical, with no shared context with the Generator.&lt;/p&gt;

&lt;p&gt;These two discoveries directly motivate the first and most fundamental multi-agent pattern.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 3: Multi-Agent Architecture — Five Coordination Patterns
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Some teams choose a pattern based on how sophisticated it sounds, not on whether it fits the problem at hand. Start with the &lt;strong&gt;simplest pattern that might work&lt;/strong&gt;, see where it struggles, then evolve from there.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Pattern 1: Generator-Validator
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Tasks where output quality is critical and evaluation criteria can be stated explicitly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt; Generator produces output → Validator evaluates against explicit criteria → if rejected, Generator gets specific feedback → loop until accepted or max iterations reached.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Generator → Output → Validator ──pass──→ Done
                          │
                     fail + feedback
                          │
                          └─→ Generator (next round)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Typical use cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Code generation (Generator writes code, Validator writes and runs tests)&lt;/li&gt;
&lt;li&gt;Customer support replies (Validator checks accuracy, tone, completeness)&lt;/li&gt;
&lt;li&gt;Compliance review (Validator checks output against rules line by line)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Critical caveat:&lt;/strong&gt; The Validator must have &lt;strong&gt;specific, explicit criteria&lt;/strong&gt; — not "check if it's good." A Validator without concrete standards will just rubber-stamp the Generator's output. Also set a maximum iteration count with a fallback strategy to prevent infinite oscillation.&lt;/p&gt;




&lt;h3&gt;
  
  
  Pattern 2: Orchestrator-Subagent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Tasks that decompose cleanly into independent subtasks with minimal interdependencies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt; Orchestrator handles global planning and task delegation. Subagents each own a specific responsibility and report results back. Orchestrator integrates and produces final output.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Orchestrator ──delegate──→ Subagent A (security check)
             ──delegate──→ Subagent B (code style)
             ──delegate──→ Subagent C (test coverage)
                   ←──── collect all results ────
                   → integrate into final report
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude Code uses this pattern: the main Agent handles the primary workflow while dispatching subagents in the background to search codebases or investigate independent questions — &lt;strong&gt;keeping the Orchestrator's context focused on the main task&lt;/strong&gt; while parallel work happens elsewhere.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limitation:&lt;/strong&gt; The Orchestrator is an information bottleneck. Information discovered by one subagent that's relevant to another must route through the Orchestrator. Key details get lost or over-summarized after a few hops.&lt;/p&gt;




&lt;h3&gt;
  
  
  Pattern 3: Agent Team
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Tasks that decompose into long-running, independent subtasks where each worker benefits from accumulating domain context over time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt; Coordinator distributes work via a shared queue. Multiple Workers each pick up tasks, run autonomously through multi-step work, and signal on completion. Unlike Pattern 2, Workers &lt;strong&gt;persist across tasks&lt;/strong&gt; — they keep accumulating context rather than starting fresh each time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Coordinator ──→ Task queue
Worker A ←── pick up task ──→ complete ──→ signal
Worker B ←── pick up task ──→ complete ──→ signal
Worker C ←── pick up task ──→ complete ──→ signal
                                ↓
              Coordinator collects → integration tests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Typical use case:&lt;/strong&gt; Migrating a large codebase from one framework to another, with each Worker independently migrating one service.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limitation:&lt;/strong&gt; Independence is a prerequisite. Workers can't easily share intermediate findings. Careful task partitioning and conflict resolution mechanisms are required.&lt;/p&gt;




&lt;h3&gt;
  
  
  Pattern 4: Message Bus
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Event-driven pipelines where the agent ecosystem is expected to keep growing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt; Agents communicate through publish/subscribe events, decoupled from each other. A router delivers matching messages. New agents can join by subscribing to topics without modifying existing connections.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Alert sources → Triage Agent → Router
                                ├──→ Network Investigation Agent
                                ├──→ Identity Analysis Agent
                                └──→ Context Enrichment Agent
                                            ↓
                                Response Coordination Agent → Actions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Best fit when:&lt;/strong&gt; Work is triggered by events rather than a predetermined sequence, and teams need to develop and deploy individual agents independently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limitation:&lt;/strong&gt; The longer the event chain, the harder debugging becomes. A misrouted message causes silent failures — the system doesn't crash, it just doesn't process the event.&lt;/p&gt;




&lt;h3&gt;
  
  
  Pattern 5: Shared State
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Collaborative tasks where agents need to build on each other's discoveries, without a central coordinator.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt; Agents run autonomously, coordinating through a shared persistent store (database, filesystem, or document). No central Orchestrator. Each agent reads what others have written, takes action based on those findings, and writes its own discoveries back.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent A (academic literature) ─┐
Agent B (industry reports)    ─┤──→ Shared knowledge store
Agent C (patent filings)      ─┘    ↑ agents read each other's findings
                                    → iteratively deepen research
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Limitation:&lt;/strong&gt; Without explicit coordination, agents may duplicate work. The hardest failure mode is the &lt;strong&gt;reactive loop&lt;/strong&gt;: Agent A writes a finding → Agent B responds → Agent A reacts again — burning tokens indefinitely. &lt;strong&gt;Termination conditions must be first-class citizens:&lt;/strong&gt; time budgets, convergence thresholds (no new findings after N cycles), or a dedicated "am I done?" judge agent.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 4: Solving the "Goldfish Memory" Problem
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem: The Context Tax
&lt;/h3&gt;

&lt;p&gt;Claude Code has a fundamental limitation: it's &lt;strong&gt;stateless&lt;/strong&gt;. Close the conversation window, memory resets.&lt;/p&gt;

&lt;p&gt;Every new session, you have to re-explain everything — project architecture, past decisions, coding style preferences, bugs you've already ruled out. You're paying a &lt;strong&gt;context tax&lt;/strong&gt; on every session just to get the AI back up to speed.&lt;/p&gt;

&lt;p&gt;Worse: this repeated loading is &lt;strong&gt;billed by token&lt;/strong&gt;. You're paying for compute that produces zero new value.&lt;/p&gt;

&lt;h3&gt;
  
  
  Solution 1: Claude Code's Native Memory (CLAUDE.md + Auto Memory)
&lt;/h3&gt;

&lt;p&gt;Claude Code offers two mechanisms for carrying knowledge across sessions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;CLAUDE.md Files&lt;/th&gt;
&lt;th&gt;Auto Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Written by&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;You&lt;/td&gt;
&lt;td&gt;Claude automatically&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Contains&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Instructions and rules&lt;/td&gt;
&lt;td&gt;Learned patterns and preferences&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best for&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Coding standards, architecture, workflows&lt;/td&gt;
&lt;td&gt;Build commands, debugging insights, behavior preferences&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Loaded each session&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;First 200 lines&lt;/td&gt;
&lt;td&gt;First 200 lines&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;CLAUDE.md placement&lt;/strong&gt; determines scope, from most to least specific:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;.claude/CLAUDE.md&lt;/code&gt; (project-level, shared via version control)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;~/.claude/CLAUDE.md&lt;/code&gt; (user-level, applies across all projects)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For larger projects, split rules into &lt;code&gt;.claude/rules/&lt;/code&gt; with one file per topic. Rules can also be &lt;strong&gt;path-scoped&lt;/strong&gt; using YAML frontmatter — only loaded when Claude is working on matching files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;src/api/**/*.ts"&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="c1"&gt;# API Development Rules&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;All endpoints must include input validation&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Use the standard error response format&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Path-scoped rules reduce context noise — the relevant rules load when relevant, and stay out of the way otherwise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Auto Memory&lt;/strong&gt; stores Claude's self-generated notes at &lt;code&gt;~/.claude/projects/&amp;lt;project&amp;gt;/memory/&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;memory/&lt;/span&gt;
&lt;span class="s"&gt;├── MEMORY.md&lt;/span&gt;          &lt;span class="c1"&gt;# Index file, loaded every session&lt;/span&gt;
&lt;span class="s"&gt;├── debugging.md&lt;/span&gt;       &lt;span class="c1"&gt;# Detailed debugging notes&lt;/span&gt;
&lt;span class="s"&gt;├── api-conventions.md&lt;/span&gt; &lt;span class="c1"&gt;# API design decisions&lt;/span&gt;
&lt;span class="s"&gt;└── ...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run &lt;code&gt;/init&lt;/code&gt; to auto-generate a starter CLAUDE.md. Claude will update Auto Memory over time based on your corrections and preferences.&lt;/p&gt;

&lt;h3&gt;
  
  
  Solution 2: claude-mem — The Community's Context Tax Workaround
&lt;/h3&gt;

&lt;p&gt;The native solution is reactive (you correct → Claude updates). The open-source project &lt;a href="https://github.com/thedotmack/claude-mem" rel="noopener noreferrer"&gt;claude-mem&lt;/a&gt; is more aggressive:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx claude-mem &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Core mechanism:&lt;/strong&gt; Attach a local memory store outside Claude Code. Hooks intercept every tool call, compress the interaction into a summary stored in SQLite, and on the next session inject only the semantically relevant history — not everything.&lt;/p&gt;

&lt;p&gt;Data flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Tool call → PostToolUse Hook captures
         → Claude API call (Observer role)
         → Compresses to XML-format observation
         → Stored in SQLite + Chroma vector DB
         ↓
New session → SessionStart
         → Query last 50 observations + 10 summaries
         → Inject into Claude context
         ↓
User submits prompt → UserPromptSubmit
         → Semantic search for top 5 relevant observations
         → Precision-inject relevant history
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Reported result from the project itself: &lt;strong&gt;95% token reduction&lt;/strong&gt; — 6 observations consumed 2,911 tokens to deliver work that would have taken 56,291 tokens with full context re-loading.&lt;/p&gt;

&lt;p&gt;The Observer uses a structured prompt to produce parseable XML:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;observation&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;type&amp;gt;&lt;/span&gt;bugfix&lt;span class="nt"&gt;&amp;lt;/type&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;title&amp;gt;&lt;/span&gt;CarPlay startup disconnect root cause identified&lt;span class="nt"&gt;&amp;lt;/title&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;narrative&amp;gt;&lt;/span&gt;Root cause was IOKit initialization timing. Fix was to...&lt;span class="nt"&gt;&amp;lt;/narrative&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;facts&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;fact&amp;gt;&lt;/span&gt;Disconnect occurs 200ms after kIOMessageServiceIsTerminated event&lt;span class="nt"&gt;&amp;lt;/fact&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;fact&amp;gt;&lt;/span&gt;CarPlay framework begins handshake before driver init completes&lt;span class="nt"&gt;&amp;lt;/fact&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/facts&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/observation&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Part 5: Teaching the AI Your Habits — Continuous Learning
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem: AI Has No Muscle Memory
&lt;/h3&gt;

&lt;p&gt;You've used Claude Code for three months. It still doesn't know your coding style. Every new session, it's a new employee who knows nothing about you — doesn't know you prefer functional over OOP, doesn't remember your project's quirky conventions, has no record of how you solved a similar problem last week.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution: Hooks + Instinct System
&lt;/h3&gt;

&lt;p&gt;This architecture comes from the &lt;strong&gt;claude-code-everything&lt;/strong&gt; project and consists of two independent subsystems:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Subsystem A: Memory Persistence
  → Answers "what did I do last session?" 
  → Short-term memory, restores work state across sessions

Subsystem B: Instinct Learning (Continuous Learning)
  → Answers "what are the user's habits?"
  → Long-term learning, accumulates behavioral preferences
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both are triggered via Hooks and inject their results into Claude's context at session start.&lt;/p&gt;

&lt;h4&gt;
  
  
  What Are Hooks?
&lt;/h4&gt;

&lt;p&gt;Hooks are &lt;strong&gt;event-driven triggers&lt;/strong&gt; that fire before and after Claude Code tool calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User request → Claude selects tool → PreToolUse hook → Tool executes → PostToolUse hook
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Hook Type&lt;/th&gt;
&lt;th&gt;Fires When&lt;/th&gt;
&lt;th&gt;Key Input&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;PreToolUse&lt;/td&gt;
&lt;td&gt;Before tool execution&lt;/td&gt;
&lt;td&gt;tool_name, tool_input&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PostToolUse&lt;/td&gt;
&lt;td&gt;After tool completes&lt;/td&gt;
&lt;td&gt;tool_name, tool_input, tool_output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stop&lt;/td&gt;
&lt;td&gt;After each Claude response&lt;/td&gt;
&lt;td&gt;transcript_path (full session JSONL)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SessionStart&lt;/td&gt;
&lt;td&gt;Session begins&lt;/td&gt;
&lt;td&gt;session_id, cwd (can inject additionalContext)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SessionEnd&lt;/td&gt;
&lt;td&gt;Session ends&lt;/td&gt;
&lt;td&gt;session_id&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;PreToolUse hooks can control whether the tool runs: exit 0 continues, exit 2 aborts and surfaces the error to Claude.&lt;/p&gt;

&lt;h4&gt;
  
  
  What Does an Instinct Look Like?
&lt;/h4&gt;

&lt;p&gt;An Instinct is a single &lt;strong&gt;atomic behavioral preference&lt;/strong&gt; stored as a YAML file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;grep-before-edit&lt;/span&gt;
&lt;span class="na"&gt;trigger&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;when&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;modifying&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;existing&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;code"&lt;/span&gt;
&lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.7&lt;/span&gt;
&lt;span class="na"&gt;domain&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;workflow&lt;/span&gt;
&lt;span class="na"&gt;scope&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;project&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="c1"&gt;# Grep Before Edit&lt;/span&gt;

&lt;span class="c1"&gt;## Action&lt;/span&gt;
&lt;span class="s"&gt;Use Grep to locate code before Edit to confirm exact location.&lt;/span&gt;

&lt;span class="c1"&gt;## Evidence&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Observed 8 times across sessions&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Grep → Read → Edit sequence repeated consistently&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Last observed&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2026-04-16&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;confidence&lt;/code&gt;: 0.3–0.9, controls whether the instinct gets injected (threshold ≥ 0.7)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;scope&lt;/code&gt;: &lt;code&gt;project&lt;/code&gt; (this project only) or &lt;code&gt;global&lt;/code&gt; (all projects)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;trigger + action&lt;/code&gt;: what Claude actually sees when this instinct is active&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  The Full Learning Pipeline
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;Tool call&lt;/span&gt;
  &lt;span class="s"&gt;→ observe.sh (async, non-blocking)&lt;/span&gt;
      &lt;span class="s"&gt;→ append to observations.jsonl&lt;/span&gt;
      &lt;span class="s"&gt;→ increment counter, send SIGUSR1 every 20 calls&lt;/span&gt;

&lt;span class="s"&gt;observer-loop.sh (background daemon)&lt;/span&gt;
  &lt;span class="s"&gt;→ receives SIGUSR1&lt;/span&gt;
  &lt;span class="s"&gt;→ take last 500 observations&lt;/span&gt;
  &lt;span class="s"&gt;→ spawn Claude Haiku analysis (claude --model haiku --print)&lt;/span&gt;
      &lt;span class="s"&gt;→ Haiku identifies behavioral patterns&lt;/span&gt;
      &lt;span class="s"&gt;→ writes instinct YAML by rule&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;
          &lt;span class="s"&gt;3–5 occurrences  → confidence &lt;/span&gt;&lt;span class="m"&gt;0.5&lt;/span&gt;
          &lt;span class="s"&gt;6–10 occurrences → confidence &lt;/span&gt;&lt;span class="m"&gt;0.7&lt;/span&gt;
          &lt;span class="s"&gt;11+ occurrences  → confidence &lt;/span&gt;&lt;span class="m"&gt;0.85&lt;/span&gt;
  &lt;span class="s"&gt;→ archive analyzed observations&lt;/span&gt;

&lt;span class="s"&gt;New session → SessionStart&lt;/span&gt;
  &lt;span class="s"&gt;→ session-start.js reads instinct YAML files&lt;/span&gt;
  &lt;span class="s"&gt;→ filter confidence ≥ 0.7, take top &lt;/span&gt;&lt;span class="m"&gt;6&lt;/span&gt;
  &lt;span class="na"&gt;→ inject as additionalContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Active&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;instincts:&lt;/span&gt;
&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;[project&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;70%]&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Use&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Grep&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;locate&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;before&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Edit&lt;/span&gt;
&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;[global&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;85%]&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Grep&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;before&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Edit,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Read&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;before&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Write"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using Haiku instead of Sonnet for analysis is a deliberate cost decision — pattern recognition doesn't need the most powerful model, and this process fires every 20 tool calls.&lt;/p&gt;




&lt;h2&gt;
  
  
  Putting It Together
&lt;/h2&gt;

&lt;p&gt;These four mechanisms address four distinct layers of AI Agent engineering:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Solution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Stability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AI drifts off-track on long tasks&lt;/td&gt;
&lt;td&gt;Harness Engineering (Guides + Sensors)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reliability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single agent failure modes, self-evaluation blindspot&lt;/td&gt;
&lt;td&gt;Multi-agent architecture (match pattern to problem)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Continuity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Every session starts from zero&lt;/td&gt;
&lt;td&gt;CLAUDE.md + Auto Memory + claude-mem&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Growth&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AI can't accumulate behavioral habits&lt;/td&gt;
&lt;td&gt;Hooks + Instinct continuous learning&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The direction of travel is clear: from "re-teach the AI every session" to "the more you use it, the more it knows you." None of these solutions are theoretical novelties. They're engineering practices that emerged from real projects hitting real walls — and finding ways through them.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article is based on the CarPlay bug analysis tool development experience from the Connected Car team. Shared for learning and discussion.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>harness</category>
      <category>multiagent</category>
      <category>claude</category>
    </item>
    <item>
      <title>RAG Series (24): Code RAG — Teaching AI to Understand Your Codebase</title>
      <dc:creator>WonderLab</dc:creator>
      <pubDate>Thu, 21 May 2026 12:52:10 +0000</pubDate>
      <link>https://dev.to/wonderlab/rag-series-24-code-rag-teaching-ai-to-understand-your-codebase-1jm6</link>
      <guid>https://dev.to/wonderlab/rag-series-24-code-rag-teaching-ai-to-understand-your-codebase-1jm6</guid>
      <description>&lt;h2&gt;
  
  
  The Difference Between Code and Documents
&lt;/h2&gt;

&lt;p&gt;Split a Python file into 1000-character chunks with &lt;code&gt;RecursiveCharacterTextSplitter&lt;/code&gt;, embed them, run vector search — this is the most common "code RAG" implementation. The problem is that it treats code as text:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;evaluate_rag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;questions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;answers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;contexts&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Evaluate RAG system quality&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="err"&gt;（&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt; &lt;span class="n"&gt;lines&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="err"&gt;）&lt;/span&gt;&lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Character-based chunking will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Split functions in half (first half in chunk A, second half in chunk B)&lt;/li&gt;
&lt;li&gt;Lose function boundary information (this IS &lt;code&gt;evaluate_rag&lt;/code&gt;, not random text)&lt;/li&gt;
&lt;li&gt;Ignore call relationships (what this function calls, who calls it)&lt;/li&gt;
&lt;li&gt;Destroy structural hierarchy (this is a method of &lt;code&gt;RAGPipeline&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Code carries three layers of information: &lt;strong&gt;semantics&lt;/strong&gt; (what it does), &lt;strong&gt;structure&lt;/strong&gt; (function/class/module boundaries), &lt;strong&gt;call relationships&lt;/strong&gt; (who calls whom). Good code RAG models all three.&lt;/p&gt;

&lt;p&gt;This article uses &lt;a href="https://github.com/chendongqi/llm-in-action" rel="noopener noreferrer"&gt;llm-in-action&lt;/a&gt; as the target and builds a code RAG system capable of answering "how is this function used?" and "show me all call chains through this function."&lt;/p&gt;




&lt;h2&gt;
  
  
  Parse Code with AST, Not Character Offsets
&lt;/h2&gt;

&lt;p&gt;Python's &lt;code&gt;ast&lt;/code&gt; module parses source files into syntax trees. A function definition is a node (&lt;code&gt;ast.FunctionDef&lt;/code&gt;) with its exact start line, end line, and decorator list. Chunking at AST boundaries guarantees splits at function edges:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;_FuncExtractor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NodeVisitor&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rel_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_lines&lt;/span&gt;       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;splitlines&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_rel_path&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rel_path&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_class_stack&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;units&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;        &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;CodeUnit&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;visit_ClassDef&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ClassDef&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Track current class so methods know their parent_class
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_class_stack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generic_visit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_class_stack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_visit_func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Extract source by line number, not character offset
&lt;/span&gt;        &lt;span class="n"&gt;src&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_lines&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lineno&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;end_lineno&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;unit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CodeUnit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;         &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;kind&lt;/span&gt;         &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;method&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_class_stack&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nb"&gt;file&lt;/span&gt;         &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_rel_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;start_line&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lineno&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;end_line&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;end_lineno&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;source&lt;/span&gt;       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;docstring&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_docstring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;parent_class&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_class_stack&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_class_stack&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;calls&lt;/span&gt;        &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_extract_calls&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;units&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;unit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generic_visit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;visit_FunctionDef&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_visit_func&lt;/span&gt;
    &lt;span class="n"&gt;visit_AsyncFunctionDef&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_visit_func&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Call relationships are extracted from &lt;code&gt;ast.Call&lt;/code&gt; nodes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_extract_calls&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;child&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;walk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;child&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Call&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;child&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;child&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;           &lt;span class="c1"&gt;# direct call: foo()
&lt;/span&gt;            &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;child&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Attribute&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;child&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;          &lt;span class="c1"&gt;# attribute call: self.foo()
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Extraction results on llm-in-action
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Scanned: /mnt/hdd/Database/03_Projects/LLM/llm-in-action
Time: 0.13 seconds

Python files:   22
Functions:      188 (top-level)
Methods:         37 (class methods)
Total units:    225
Article dirs:    18
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;0.13 seconds to scan the entire codebase. AST parsing doesn't execute code, so there are zero side effects.&lt;/p&gt;




&lt;h2&gt;
  
  
  Call Graph: Understanding Who Calls Whom
&lt;/h2&gt;

&lt;p&gt;Once function call relationships are extracted, build a bidirectional adjacency map — queryable in both directions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CallGraph&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;units&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;CodeUnit&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;callees&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# caller → called
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;callers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# callee → caller
&lt;/span&gt;
        &lt;span class="n"&gt;known&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;units&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;units&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;callee&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;callee&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;known&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;           &lt;span class="c1"&gt;# intra-repo edges only
&lt;/span&gt;                    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;callees&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;callee&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;callers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;callee&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;downstream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;All functions transitively called by name (BFS).&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_bfs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;callees&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;upstream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;All functions that transitively call name (BFS).&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_bfs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;callers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;shortest_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Shortest call path from start → end.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;deque&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;deque&lt;/span&gt;&lt;span class="p"&gt;([[&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;
        &lt;span class="n"&gt;visited&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;popleft&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;nxt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;callees&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;()):&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;nxt&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;visited&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;visited&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nxt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;nxt&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Call graph analysis results
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Call graph statistics:
  Functions with outgoing edges:  78  (they call others)
  Functions with incoming edges:  92  (they are called)
  Total edges:                   168
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Most-called functions&lt;/strong&gt; (the codebase's core utilities):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;get               ← called from 48 places  (cache reads throughout all articles)
set               ← called from 10 places  (cache writes)
split_documents   ← called from  5 places  (shared chunking helper)
build_embeddings  ← called from  4 places
query             ← called from  4 places
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;get&lt;/code&gt; appearing 48 times reflects Python duck typing — cache &lt;code&gt;.get()&lt;/code&gt; calls across &lt;code&gt;SemanticCache&lt;/code&gt;, &lt;code&gt;InMemoryCache&lt;/code&gt;, and similar types all collapse to the same name in static analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Functions with the most outgoing calls&lt;/strong&gt; (orchestrators):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;main                → 54 direct calls
build_self_rag_graph →  6 direct calls
build_index          →  5 direct calls
build_ragas_dataset  →  5 direct calls
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;main&lt;/code&gt; calling 54 functions is the signature of an entry point — it orchestrates the full pipeline by calling every sub-step.&lt;/p&gt;

&lt;h3&gt;
  
  
  Call chain traversal
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;build_self_rag_graph&lt;/code&gt; (14-self-rag/self_rag.py) full downstream:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;build_self_rag_graph
  ├── make_retrieve_node
  ├── make_filter_node
  ├── make_decide_node
  ├── make_support_node
  ├── make_rag_generate_node
  └── make_direct_generate_node
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is exactly the Self-RAG StateGraph builder pattern: one factory function assembles all graph nodes, each node is an independent small function. The call graph makes this structure immediately visible.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;build_index&lt;/code&gt; (08-ragas-eval/rag_pipeline.py) downstream chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;build_index
  → load_documents
  → build_llm
  → build_embeddings
  → split_documents
  → get  (cache)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A canonical RAG initialization sequence: load docs → build LLM → build embeddings → chunk → cache.&lt;/p&gt;




&lt;h2&gt;
  
  
  Vector Store: Semantic Code Search
&lt;/h2&gt;

&lt;p&gt;Code vectorization has one engineering constraint: function source can be long (50–200 lines), but embedding APIs commonly have a 512-token limit.&lt;/p&gt;

&lt;p&gt;Solution: &lt;strong&gt;separate the retrieval unit from the Q&amp;amp;A context&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Embedding content&lt;/strong&gt;: function name + docstring (short, semantically precise, fits in token budget)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metadata&lt;/strong&gt;: complete source code (stored in Chroma's metadata field, read at Q&amp;amp;A time for LLM context)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;sig_line&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;splitlines&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;embed_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;full_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;docstring&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;sig_line&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;page_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embed_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;         &lt;span class="c1"&gt;# vectorized — used for retrieval
&lt;/span&gt;    &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;        &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;        &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start_line&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;start_line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source_code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;   &lt;span class="c1"&gt;# not vectorized — used for LLM context
&lt;/span&gt;    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At Q&amp;amp;A time, retrieval finds relevant functions, then the full source is read from metadata:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;docs&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;---&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source_code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="p"&gt;)[:&lt;/span&gt;&lt;span class="mi"&gt;600&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Semantic search results
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Query: "RAGAS evaluation metrics calculation"
  0.488  RAGPipeline.build_index   (08-ragas-eval/rag_pipeline.py:95)
  0.476  create_ragas_embeddings   (08-ragas-eval/evaluate.py:50)
  0.467  RAGPipeline.query         (08-ragas-eval/rag_pipeline.py:141)

Query: "rate limiting and access control in enterprise RAG"
  0.504  RAGPipeline.__init__      (08-ragas-eval/rag_pipeline.py:78)
  0.497  RateLimiter.__init__      (20-enterprise-rag/enterprise_rag.py:118)

Query: "incremental document indexing with record manager"
  0.296  generate_testset          (08-ragas-eval/generate_qa.py:51)

Query: "conversational history aware retriever"
  0.400  make_ds                   (18-conversational-rag/conversational_rag.py:428)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;RAGAS and enterprise RAG rate limiting queries found the right files. Incremental update didn't — because the functions in &lt;code&gt;19-incremental-update/&lt;/code&gt; don't mention "record manager" in their docstrings, only in their source code bodies. This is the core limitation of docstring-only embedding: &lt;strong&gt;search quality is bounded by docstring quality&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Choosing a Code Embedding Model
&lt;/h2&gt;

&lt;p&gt;General-purpose text embedding models (BGE, text-embedding-3) are "adequate but not great" for code. They can retrieve by docstring, but don't understand that &lt;code&gt;for i in range(n): acc += arr[i]&lt;/code&gt; is an accumulation.&lt;/p&gt;

&lt;p&gt;Specialized code embedding models:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Characteristics&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;microsoft/codebert-base&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Code + documentation dual-tower; understands variable names and signatures&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Salesforce/codet5-base&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Generative model; suited for code completion + retrieval&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;nomic-ai/nomic-embed-text-v1.5&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;General model with strong code performance; 8192-token limit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;voyage-code-2&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Voyage AI's code-specialized model; among the best available&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Recommended: if token limits aren't a concern (e.g., &lt;code&gt;nomic-embed-text-v1.5&lt;/code&gt; supports 8192 tokens), embed the complete function source directly — no need to split docstrings from source.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Complete Code RAG Pipeline
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Build a code RAG system
&lt;/span&gt;
&lt;span class="c1"&gt;# 1. AST extraction: all functions and methods
&lt;/span&gt;&lt;span class="n"&gt;units&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_repo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;repo_dir&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Call graph: bidirectional adjacency
&lt;/span&gt;&lt;span class="n"&gt;cg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CallGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;units&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Vector store: docstrings for retrieval, source_code in metadata for Q&amp;amp;A
&lt;/span&gt;&lt;span class="n"&gt;vs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_vectorstore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;units&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Three query modes
&lt;/span&gt;
&lt;span class="c1"&gt;# A: Semantic search — find functions by meaning
&lt;/span&gt;&lt;span class="n"&gt;hits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embedding caching&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# B: Call chain — given a function name, find all upstream/downstream
&lt;/span&gt;&lt;span class="n"&gt;callers&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upstream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;build_embeddings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# → who calls it
&lt;/span&gt;&lt;span class="n"&gt;callees&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;downstream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;main&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;              &lt;span class="c1"&gt;# → what it calls
&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;shortest_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;main&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# → how main reaches get
&lt;/span&gt;
&lt;span class="c1"&gt;# C: LLM Q&amp;amp;A — retrieve context, generate answer
&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;llm_code_qa&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How is incremental update implemented?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Results Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Python files&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code units extracted&lt;/td&gt;
&lt;td&gt;225 (188 functions + 37 methods)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AST parse time&lt;/td&gt;
&lt;td&gt;0.13 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Call graph edges&lt;/td&gt;
&lt;td&gt;168&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vectorization time&lt;/td&gt;
&lt;td&gt;5.8 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Most-called function&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;get&lt;/code&gt; (48 places)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Widest caller&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;main&lt;/code&gt; (54 direct calls)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Full Code
&lt;/h2&gt;

&lt;p&gt;Complete code is open-sourced at:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/chendongqi/llm-in-action/tree/main/24-code-rag" rel="noopener noreferrer"&gt;https://github.com/chendongqi/llm-in-action/tree/main/24-code-rag&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Key file:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;code_rag.py&lt;/code&gt; — AST extraction, call graph, vectorization, search, report&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;How to run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/chendongqi/llm-in-action
&lt;span class="nb"&gt;cd &lt;/span&gt;24-code-rag
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
python code_rag.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The core difference between code RAG and document RAG:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Document RAG&lt;/th&gt;
&lt;th&gt;Code RAG&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Chunk unit&lt;/td&gt;
&lt;td&gt;Fixed-size text blocks&lt;/td&gt;
&lt;td&gt;Functions/methods (AST boundaries)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Structure&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Class hierarchy, module hierarchy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Call relationships&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Call graph (bidirectional)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embedding content&lt;/td&gt;
&lt;td&gt;Full text&lt;/td&gt;
&lt;td&gt;Docstring + signature (or full source)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Query types&lt;/td&gt;
&lt;td&gt;Semantic search&lt;/td&gt;
&lt;td&gt;Semantic search + call chain traversal&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Three key tradeoffs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;AST vs text chunking&lt;/strong&gt;: AST cuts at function boundaries and preserves complete units. Text chunking is faster but destroys structure. For production code RAG, use AST — there's no reason not to.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docstring vs full source embedding&lt;/strong&gt;: Under token constraints, embed docstrings (short and semantically focused) — but quality depends on docstring completeness. With a long-context embedding model, embed the full source directly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Call graph vs pure vector retrieval&lt;/strong&gt;: Vector retrieval finds semantically similar functions; the call graph answers "what does X call?" and "who uses X?" — they're complementary, not interchangeable.&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;This is the final article in the RAG series. Twenty-four articles covering the complete path from "what is RAG?" to "how do you teach AI to understand a codebase?" All code is open-sourced at &lt;a href="https://github.com/chendongqi/llm-in-action" rel="noopener noreferrer"&gt;llm-in-action&lt;/a&gt; — every article has a runnable demo and a real benchmark report.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.python.org/3/library/ast.html" rel="noopener noreferrer"&gt;Python ast module documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2002.08155" rel="noopener noreferrer"&gt;CodeBERT: A Pre-Trained Model for Programming and Natural Languages&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2004.12832" rel="noopener noreferrer"&gt;ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://python.langchain.com/docs/use_cases/code_understanding/" rel="noopener noreferrer"&gt;LangChain Code Understanding&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>llm</category>
    </item>
    <item>
      <title>Agent Series (1): What Is an Agent — It's Not Just an LLM That Can Call Tools</title>
      <dc:creator>WonderLab</dc:creator>
      <pubDate>Thu, 21 May 2026 07:43:56 +0000</pubDate>
      <link>https://dev.to/wonderlab/agent-series-1-what-is-an-agent-its-not-just-an-llm-that-can-call-tools-4plh</link>
      <guid>https://dev.to/wonderlab/agent-series-1-what-is-an-agent-its-not-just-an-llm-that-can-call-tools-4plh</guid>
      <description>&lt;h2&gt;
  
  
  You Think You're Using an Agent. You're Not.
&lt;/h2&gt;

&lt;p&gt;In 2023, "AI Agent" became a buzzword overnight. Every company claimed they built an Agent. Every product slapped the Agent label on it.&lt;/p&gt;

&lt;p&gt;But ask them: &lt;strong&gt;What's the fundamental difference between your Agent and a regular LLM call?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most people go quiet for three seconds, then say: "...it can call tools."&lt;/p&gt;

&lt;p&gt;Is that wrong? No. But it's missing the point. It's like answering "what's the difference between a car and a bicycle" with "a car has four wheels" — technically correct, but you forgot to mention the engine.&lt;/p&gt;

&lt;p&gt;This article has one goal: &lt;strong&gt;help you understand what an Agent actually is&lt;/strong&gt; — and why it's fundamentally different from an LLM or a Chatbot. Get this right, and you'll make better technical decisions instead of wrapping an LLM API call and calling it "our Agent system."&lt;/p&gt;




&lt;h2&gt;
  
  
  Start With a Scenario
&lt;/h2&gt;

&lt;p&gt;Say you want to build an AI tool that analyzes competitors for users. The user types a company name, and the tool generates a competitive analysis report.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option A: Direct LLM call&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User input: Analyze Notion's competitors
↓
LLM generates report directly
↓
Output (based on training data, potentially outdated)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Option B: Chatbot&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User input: Analyze Notion's competitors
↓
LLM generates reply, remembers conversation history
User follow-up: Focus on pricing strategy
↓
LLM continues with context
↓
Multi-turn conversation, still based on training data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Option C: Agent&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User input: Analyze Notion's competitors
↓
Agent thinks: I need fresh data, let me search first
↓
Calls search tool → gets latest competitor info
↓
Agent thinks: I should compare pricing, let me calculate
↓
Calls calculation tool → gets result
↓
Agent thinks: I have enough information now
↓
Outputs report (grounded in real-time data, with sources)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See the difference? &lt;strong&gt;An Agent actively thinks "what do I need to do" and autonomously decides the next action.&lt;/strong&gt; That's the core — not whether it can call tools, but &lt;strong&gt;who decides which tool to call and when&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Three Concepts, Three Levels
&lt;/h2&gt;

&lt;h3&gt;
  
  
  LLM: A "Brain" with Language Ability
&lt;/h3&gt;

&lt;p&gt;A Large Language Model is fundamentally a &lt;strong&gt;function&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Input: text (prompt)
Output: predicted next token (repeated until done)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Its capabilities come from statistical patterns learned from massive amounts of text. It understands language, it can reason — but it has &lt;strong&gt;no memory, no perception, no action capability&lt;/strong&gt;. Every call is stateless. It has no idea what you talked about last time.&lt;/p&gt;

&lt;p&gt;A standalone LLM is like a brilliant scholar who only answers questions: deeply knowledgeable, but locked in a room with no windows, unaware of what's happening outside, unable to proactively do anything for you.&lt;/p&gt;

&lt;h3&gt;
  
  
  Chatbot: An LLM with Memory
&lt;/h3&gt;

&lt;p&gt;Chatbot = LLM + &lt;strong&gt;conversation history management&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It solves one simple problem: making the LLM "remember" what was said in this conversation. The implementation is also simple — prepend conversation history to every prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Pseudocode: the core logic of a Chatbot
&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;user_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_user_input&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# send full history to LLM
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The limitation of a Chatbot: &lt;strong&gt;it can converse, but it can't act&lt;/strong&gt;. It can't proactively look up information, call APIs, or run code — it can only answer based on what it already knows.&lt;/p&gt;

&lt;p&gt;If the LLM is the brilliant scholar, the Chatbot is that scholar with a phone — you can finally have a conversation, but they're still in their room.&lt;/p&gt;

&lt;h3&gt;
  
  
  Agent: An Autonomous Actor
&lt;/h3&gt;

&lt;p&gt;An Agent adds two critical capabilities on top of a Chatbot: &lt;strong&gt;tool use&lt;/strong&gt; and &lt;strong&gt;autonomous decision-making loop&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But here's the key point that's often misunderstood: &lt;strong&gt;the tools themselves aren't what makes something an Agent. What matters is who decides which tool to use and when.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Chatbot with tools&lt;/strong&gt; = you tell it "check the weather," it calls the weather API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent&lt;/strong&gt; = it decides on its own that "to answer this question, I need to check the weather," then proactively calls it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the difference between &lt;strong&gt;passive response&lt;/strong&gt; and &lt;strong&gt;active planning&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Four Elements of an Agent
&lt;/h2&gt;

&lt;p&gt;The clearest framework for understanding an Agent is to break it into four components. This framework draws from cognitive science research on intelligent behavior and represents the mainstream engineering understanding today.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────┐
│                        Agent                            │
│                                                         │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐  │
│  │  Perception │    │   Memory    │    │   Action    │  │
│  │             │    │             │    │             │  │
│  │ · User msgs │    │ · Chat hist │    │ · Call tools│  │
│  │ · Tool results    · Tool results    · Run code   │  │
│  │ · Environment│   · External KB│    │ · Call APIs │  │
│  └──────┬──────┘    └──────┬──────┘    └──────▲──────┘  │
│         │                  │                  │         │
│         └──────────┬───────┘                  │         │
│                    ▼                           │         │
│             ┌─────────────┐                   │         │
│             │  Reasoning  │───────────────────┘         │
│             │             │                             │
│             │ · Plan steps│                             │
│             │ · Choose tool                             │
│             │ · Decide done                             │
│             └─────────────┘                             │
└─────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Perception&lt;/strong&gt;: What the Agent can "see." At minimum, user input. More advanced: tool return values, database query results, screenshots, file contents. Perception defines the Agent's awareness — what it can't see, it can't act on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory&lt;/strong&gt;: What the Agent can "remember." This operates at several levels: current conversation history (short-term memory), past experiences stored in vector databases (long-term memory), and static external knowledge bases (semantic memory). We'll dedicate a full article to memory systems later in this series.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reasoning&lt;/strong&gt;: The Agent's "brain," and the most essential difference from a Chatbot. The LLM here acts as a &lt;strong&gt;controller&lt;/strong&gt;, not a "question answerer." Its job: decompose the task, plan the steps, choose which tool to use next, decide when the task is complete.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Action&lt;/strong&gt;: What the Agent can "do." Tool calls are the most common action — search, query a database, send an email, execute code. The range of actions defines the Agent's capability boundary — more tools means more tasks it can handle, but also higher risk of things going wrong (this is what Harness Engineering, a later topic, addresses).&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
These four elements are the foundation for understanding Agent architecture, and the central thread running through the rest of this series:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Perception + Memory → Article 6: &lt;em&gt;Memory Management&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Reasoning → Article 2: &lt;em&gt;The ReAct Paradigm&lt;/em&gt;, Article 3: &lt;em&gt;Plan-and-Solve&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Action → Article 4: &lt;em&gt;Tool Calling&lt;/em&gt;, Article 5: &lt;em&gt;Intent Recognition&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Two Agent Paradigms: Assembly Line vs. Expedition Guide
&lt;/h2&gt;

&lt;p&gt;Real-world Agent systems break into two camps based on who controls the execution flow:&lt;/p&gt;
&lt;h3&gt;
  
  
  Workflow-Driven Agent
&lt;/h3&gt;

&lt;p&gt;Representative tools: Dify, n8n, Coze, Zapier AI&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core idea&lt;/strong&gt;: The developer draws the flowchart; LLM is one node in it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Flowchart (defined by developer in advance):
Receive user question
    ↓
[LLM node] Classify question type
    ↓
If "billing question"  → [Tool node] Query billing system
If "complaint"         → [Tool node] Create support ticket
    ↓
[LLM node] Generate final reply
    ↓
Send to user
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The execution path is &lt;strong&gt;pre-designed by the developer&lt;/strong&gt;. The LLM handles natural language understanding and generation, but the "what happens next" logic is hardcoded in the flowchart.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strengths&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Behavior is predictable; every path can be fully tested before launch&lt;/li&gt;
&lt;li&gt;Easy to debug when something goes wrong (broken node is obvious)&lt;/li&gt;
&lt;li&gt;Doesn't require the LLM to understand complex task planning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customer service bots (question types are fixed, processes are known)&lt;/li&gt;
&lt;li&gt;Approval flow automation (steps are fixed, conditions are clear)&lt;/li&gt;
&lt;li&gt;Form processing, data ETL (structured, predictable)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  AI Native Agent
&lt;/h3&gt;

&lt;p&gt;Representative frameworks: LangGraph, AutoGen, CrewAI&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core idea&lt;/strong&gt;: LLM is the control center and decides what to do.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User question arrives
    ↓
[LLM reasoning]: I need current data, should search first
    ↓
[Calls search tool] → results returned
    ↓
[LLM reasoning]: A number in the results needs verification
    ↓
[Calls calculation tool] → result returned
    ↓
[LLM reasoning]: I have enough to answer now
    ↓
Final answer output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every "what to do next" step is dynamically decided by the LLM at runtime. Nobody hardcoded the flow. This is the essence of AI Native Agent: &lt;strong&gt;the LLM isn't a tool — the LLM is the conductor&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strengths&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Handles open-ended tasks with unclear boundaries&lt;/li&gt;
&lt;li&gt;Adapts strategy based on intermediate results&lt;/li&gt;
&lt;li&gt;Suited for problems requiring multi-step reasoning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open-ended research (user questions are diverse, impossible to enumerate)&lt;/li&gt;
&lt;li&gt;Automated bug fixing (requires dynamic decisions based on code analysis)&lt;/li&gt;
&lt;li&gt;Complex data analysis (needs multiple rounds of retrieval and computation)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  An Analogy to Remember
&lt;/h3&gt;

&lt;p&gt;Imagine planning a trip:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Workflow-Driven Agent&lt;/strong&gt; = high-speed rail. Fixed tracks, fixed stops, fixed departure times. Highly efficient, never gets lost — but can only go where the rails go.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI Native Agent&lt;/strong&gt; = an experienced travel guide. You say "I want somewhere with historical character," and they ask a few questions, check reviews in real time, adjust the itinerary based on today's weather, and handle "the attraction is temporarily closed" on the fly. Flexible — but they might also take you on a detour.&lt;/p&gt;




&lt;h2&gt;
  
  
  When to Use an Agent vs. a Plain LLM Call
&lt;/h2&gt;

&lt;p&gt;This is the most important engineering judgment you'll make — and the most common place for over-engineering.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The trap many fall into&lt;/strong&gt;: Agent sounds sophisticated, so people reach for it regardless of the problem. But Agents have costs — longer response times, higher token consumption, more complex debugging.&lt;/p&gt;

&lt;p&gt;Use this decision tree:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Does your task need an Agent?
│
├─ Are the task steps fixed and enumerable?
│   └─ Yes → use LLM + fixed Prompt, or Workflow-Driven Agent
│
├─ Does the task only need a single LLM call (no tools)?
│   └─ Yes → call the LLM API directly, no Agent needed
│
├─ Does the task need to decide the next step based on intermediate results?
│   └─ Yes → needs an Agent
│
├─ Does the task have more than 3 interdependent steps?
│   └─ Yes → needs an Agent
│
└─ Does the task need to handle situations you can't predict in advance?
    └─ Yes → needs an Agent (and specifically AI Native Agent)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Real examples:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Recommended Approach&lt;/th&gt;
&lt;th&gt;Reason&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Article summarization&lt;/td&gt;
&lt;td&gt;Direct LLM call&lt;/td&gt;
&lt;td&gt;Single call, fixed prompt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FAQ chatbot&lt;/td&gt;
&lt;td&gt;Chatbot&lt;/td&gt;
&lt;td&gt;Multi-turn needed, no tools required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Customer service routing&lt;/td&gt;
&lt;td&gt;Workflow-Driven Agent&lt;/td&gt;
&lt;td&gt;Fixed flow, enumerable cases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Automated bug analysis &amp;amp; fix&lt;/td&gt;
&lt;td&gt;AI Native Agent&lt;/td&gt;
&lt;td&gt;Dynamic decisions based on code analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Competitive research report&lt;/td&gt;
&lt;td&gt;AI Native Agent&lt;/td&gt;
&lt;td&gt;Open-ended, needs multi-round search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code review&lt;/td&gt;
&lt;td&gt;AI Native Agent&lt;/td&gt;
&lt;td&gt;Dynamic, depends on code structure&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
&lt;strong&gt;Don't use an Agent just because you can&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If your task can be solved with a well-crafted Prompt, use the Prompt. The added complexity of an Agent (harder to debug, higher latency, higher cost) is only worth it when the task genuinely requires dynamic decision-making.&lt;/p&gt;

&lt;p&gt;Anthropic's official guidance says it plainly: &lt;em&gt;"LLMs should only be used as autonomous agents when autonomy and flexibility genuinely provide value — otherwise, direct API calls are more reliable and predictable."&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;




&lt;h2&gt;
  
  
  What a Minimal AI Native Agent Looks Like
&lt;/h2&gt;

&lt;p&gt;Enough theory — here's real code. Below is a minimal ReAct Agent built with LangGraph (this is the most foundational AI Native Agent paradigm; the next article covers it in depth):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Dependencies: pip install langchain-anthropic langgraph
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_anthropic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatAnthropic&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.prebuilt&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_react_agent&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Define tools (the Agent's "hands")
&lt;/span&gt;&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_web&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search the web for current information&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# In real use, connect to a real search API (e.g., Tavily)
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search results: latest information about &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expression&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Evaluate a mathematical expression&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expression&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Note: don't use eval in production
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Calculation error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Create the Agent (LLM is the control center)
&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatAnthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;search_web&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;calculate&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_react_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Run
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is Apple&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s current market cap? How much more is that than $1 trillion?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running this code, the Agent will automatically:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Decide it needs to search for Apple's market cap&lt;/li&gt;
&lt;li&gt;Call &lt;code&gt;search_web&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;See the result, decide it needs to compute the difference&lt;/li&gt;
&lt;li&gt;Call &lt;code&gt;calculate&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Combine the results into a final answer&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Nobody told it to search first and then calculate.&lt;/strong&gt; It planned that on its own. That's how an AI Native Agent works.&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
&lt;strong&gt;Notes on the code above&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;eval(expression)&lt;/code&gt; has security implications in production; replace with a safe math library (e.g., &lt;code&gt;numexpr&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;A real search tool requires connecting to a search API like Tavily or SerpAPI&lt;/li&gt;
&lt;li&gt;The model &lt;code&gt;claude-sonnet-4-6&lt;/code&gt; is the recommended version as of this article's writing (May 2026); adjust as needed
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  How to Explain This in an Interview
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Common interview question: Is your system an Agent or a Workflow? What's the difference?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Many candidates stumble here because they've never seriously considered which one they actually built.&lt;/p&gt;

&lt;p&gt;A clear response framework:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Our system uses an AI Native Agent architecture, and the core distinction is &lt;strong&gt;who controls the execution flow&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In a Workflow-Driven approach, the developer pre-defines all possible paths, and the LLM is just one processing node — it's more predictable and well-suited for fixed-step scenarios.&lt;/p&gt;

&lt;p&gt;We chose AI Native Agent because our tasks (like automated bug analysis) have unclear boundaries — the code might span multiple modules, and we need to dynamically decide what to retrieve next based on each intermediate analysis result. A Workflow-Driven approach can't enumerate all possible code scenarios.&lt;/p&gt;

&lt;p&gt;Of course, the more autonomous the Agent, the higher the risk. That's why we added execution boundary controls (Harness Engineering) to ensure it never performs operations beyond its authorized scope."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The key to this answer: &lt;strong&gt;don't just say "I used an Agent." Explain why you chose it and show that you're aware of the trade-offs.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Three things from this article:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The hierarchy of LLM, Chatbot, and Agent&lt;/strong&gt;: LLM is the brain, Chatbot is the brain with memory, Agent is a complete system that can autonomously plan and act. The core difference isn't "can it call tools" — it's "who decides when to call which tool."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The four elements of an Agent&lt;/strong&gt;: Perception (what it sees), Memory (what it remembers), Reasoning (what it plans), Action (what it executes). The LLM plays the role of "conductor," not "executor."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The selection logic between the two paradigms&lt;/strong&gt;: Workflow-Driven suits fixed, predictable tasks; AI Native Agent suits open-ended tasks requiring dynamic decision-making. Don't use an Agent because it sounds impressive — use the right tool for the job.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;strong&gt;Next up&lt;/strong&gt;: Agent Series Article 2 — &lt;strong&gt;ReAct: The Most Important Reasoning Paradigm for Agents&lt;/strong&gt;. We'll dig into the Thought → Action → Observation loop, explore "what the Agent is thinking," and explain why Chain-of-Thought alone isn't enough.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/datawhalechina/Hello-Agents" rel="noopener noreferrer"&gt;hello-agents Open Tutorial&lt;/a&gt; (Chapter 1 and Chapter 4)&lt;/li&gt;
&lt;li&gt;Anthropic, &lt;em&gt;Building Effective Agents&lt;/em&gt;, 2024&lt;/li&gt;
&lt;li&gt;OpenAI, &lt;em&gt;A Practical Guide to Building Agents&lt;/em&gt;, 2025&lt;/li&gt;
&lt;li&gt;LangGraph Documentation: &lt;a href="https://langchain-ai.github.io/langgraph/" rel="noopener noreferrer"&gt;langchain-ai.github.io/langgraph&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;This is the first article in the Agent Engineering series. If you're just starting out with Agents, start here and read in order. Questions or feedback? Leave a comment below.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>llm</category>
      <category>ai</category>
      <category>langchain</category>
    </item>
    <item>
      <title>One Open Source Project a Day (No. 71): CodeGraph — Pre-Index Your Codebase for AI Agents, Save 35% Cost and 70% Tool Calls</title>
      <dc:creator>WonderLab</dc:creator>
      <pubDate>Thu, 21 May 2026 01:51:49 +0000</pubDate>
      <link>https://dev.to/wonderlab/one-open-source-project-a-day-no-71-codegraph-pre-index-your-codebase-for-ai-agents-save-35-50f3</link>
      <guid>https://dev.to/wonderlab/one-open-source-project-a-day-no-71-codegraph-pre-index-your-codebase-for-ai-agents-save-35-50f3</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;"~35% cheaper · ~70% fewer tool calls · 100% local"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is the No.71 article in the "One Open Source Project a Day" series. Today we are exploring &lt;strong&gt;CodeGraph&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Start with a scenario: you ask Claude Code "How is AuthService being called?" Without any assistance, Claude's approach is: glob-scan directories, run multiple greps, read several files — then finally answer. The whole process might trigger 10–15 tool calls and consume hundreds of thousands of tokens.&lt;/p&gt;

&lt;p&gt;CodeGraph's insight is to &lt;strong&gt;front-load this work&lt;/strong&gt;: before you start, it has already parsed your codebase with tree-sitter into a semantic graph stored in a local SQLite database, then exposes 8 query tools to AI agents via MCP. When the agent needs to understand code, a single &lt;code&gt;codegraph_context&lt;/code&gt; call returns entry points, related symbols, and code snippets — &lt;strong&gt;no file reading required&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;9.6k Stars, 588 Forks. Benchmarks across 7 real open-source projects: average 35% cost savings, 70% fewer tool calls, 49% speed improvement. On VS Code's large TypeScript repository, one architecture Q&amp;amp;A dropped from 1.4M tokens to 393k — cost from $0.64 to $0.42.&lt;/p&gt;

&lt;h3&gt;
  
  
  What You Will Learn
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;CodeGraph's four-stage pipeline: Extract → Store → Resolve → Auto-Sync&lt;/li&gt;
&lt;li&gt;The 8 MCP tools and when to use each&lt;/li&gt;
&lt;li&gt;A detailed breakdown of benchmark results across 7 projects: why do larger codebases benefit more?&lt;/li&gt;
&lt;li&gt;How 19-language support and 13-framework route recognition work&lt;/li&gt;
&lt;li&gt;Complete setup walkthrough from installation to Claude Code integration&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;codegraph affected&lt;/code&gt;: using dependency tracing for smart CI test selection&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Familiarity with Claude Code, Cursor, or similar AI coding tools&lt;/li&gt;
&lt;li&gt;Basic understanding of MCP (Model Context Protocol)&lt;/li&gt;
&lt;li&gt;Node.js experience&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Project Background
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Project Introduction
&lt;/h3&gt;

&lt;p&gt;CodeGraph is a &lt;strong&gt;local semantic code knowledge graph&lt;/strong&gt; tool designed specifically to improve AI coding agent efficiency. Its core insight:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;AI agents spend a massive amount of tokens and time in the "discovery phase" — scanning directories, searching for symbols, reading files — rather than on the actual reasoning and generation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;CodeGraph's solution is to &lt;strong&gt;outsource the discovery phase to a pre-built index&lt;/strong&gt;: before you start working, the index is already ready, letting AI agents pull structured code knowledge directly instead of exploring the file system from scratch.&lt;/p&gt;

&lt;p&gt;The technology choices are pragmatic: tree-sitter for AST parsing (mature, multi-language, high-performance), SQLite FTS5 for full-text search (zero external dependencies, fully local), and native OS file events for live sync (FSEvents/inotify/ReadDirectoryChangesW).&lt;/p&gt;

&lt;h3&gt;
  
  
  Author/Team
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Author&lt;/strong&gt;: Colby McHenry (GitHub: colbymchenry)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Repository&lt;/strong&gt;: &lt;a href="https://github.com/colbymchenry/codegraph" rel="noopener noreferrer"&gt;colbymchenry/codegraph&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distribution&lt;/strong&gt;: npm package &lt;code&gt;@colbymchenry/codegraph&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Project Stats
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;⭐ GitHub Stars: &lt;strong&gt;9,600+&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;🍴 Forks: &lt;strong&gt;588&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;📦 npm package: &lt;code&gt;@colbymchenry/codegraph&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;🔧 Runtime: Node.js 20–24&lt;/li&gt;
&lt;li&gt;💻 Platforms: Windows, macOS, Linux&lt;/li&gt;
&lt;li&gt;📄 License: MIT&lt;/li&gt;
&lt;li&gt;🌐 Repository: &lt;a href="https://github.com/colbymchenry/codegraph" rel="noopener noreferrer"&gt;colbymchenry/codegraph&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Main Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Core Utility
&lt;/h3&gt;

&lt;p&gt;CodeGraph inserts a pre-built index layer between AI agents and codebases:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Codebase (TypeScript / Python / Go / ...)
        ↓ tree-sitter parsing
  Semantic graph (symbols + relationships + call chains)
        ↓ stored in SQLite FTS5
  Local knowledge base
        ↓ exposed via MCP
  AI coding agents (Claude Code / Cursor / Codex CLI / OpenCode)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Without CodeGraph&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;User: &lt;span class="s2"&gt;"How is AuthService being called?"&lt;/span&gt;
→ Agent: glob&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"src/**/*.ts"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;         &lt;span class="c"&gt;# Tool call 1&lt;/span&gt;
→ Agent: &lt;span class="nb"&gt;grep&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"AuthService"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;         &lt;span class="c"&gt;# Tool call 2&lt;/span&gt;
→ Agent: &lt;span class="nb"&gt;read&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"auth.service.ts"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;     &lt;span class="c"&gt;# Tool call 3&lt;/span&gt;
→ Agent: &lt;span class="nb"&gt;grep&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"import.*Auth"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;        &lt;span class="c"&gt;# Tool call 4&lt;/span&gt;
→ Agent: &lt;span class="nb"&gt;read&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"user.controller.ts"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;  &lt;span class="c"&gt;# Tool call 5&lt;/span&gt;
→ Agent: &lt;span class="nb"&gt;read&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"app.module.ts"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;       &lt;span class="c"&gt;# Tool call 6&lt;/span&gt;
... 10–15 total tool calls, massive token consumption
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;With CodeGraph&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;User: &lt;span class="s2"&gt;"How is AuthService being called?"&lt;/span&gt;
→ Agent: codegraph_callers&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"AuthService"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;   &lt;span class="c"&gt;# Tool call 1&lt;/span&gt;
→ Returns: full &lt;span class="nb"&gt;caller &lt;/span&gt;list + call sites + code snippets
→ Agent answers directly, no file reading needed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Quick Start
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;One-command install (recommended)&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Run the interactive installer — auto-detects installed AI agents and configures them&lt;/span&gt;
npx @colbymchenry/codegraph

&lt;span class="c"&gt;# Initialize in your project (-i for interactive)&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;your-project
codegraph init &lt;span class="nt"&gt;-i&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Non-interactive install (CI environments)&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Auto-detect all installed agents, global install&lt;/span&gt;
codegraph &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--yes&lt;/span&gt;

&lt;span class="c"&gt;# Target specific agents&lt;/span&gt;
codegraph &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;cursor,claude &lt;span class="nt"&gt;--yes&lt;/span&gt;

&lt;span class="c"&gt;# Project-local install&lt;/span&gt;
codegraph &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;auto &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;local&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Manual Claude Code configuration&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @colbymchenry/codegraph
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add to &lt;code&gt;~/.claude.json&lt;/code&gt; (or project-level &lt;code&gt;.claude.json&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"codegraph"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"stdio"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"codegraph"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"serve"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"--mcp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Verify installation&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codegraph status          &lt;span class="c"&gt;# Check index status and stats&lt;/span&gt;
codegraph query &lt;span class="s2"&gt;"UserService"&lt;/span&gt;  &lt;span class="c"&gt;# Test symbol search&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The 8 MCP Tools
&lt;/h3&gt;

&lt;p&gt;The complete toolset CodeGraph exposes to AI agents:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Typical Invocation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;codegraph_search&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Find symbols by name&lt;/td&gt;
&lt;td&gt;"Find all functions called authenticate"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;codegraph_context&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Build code context for a task&lt;/td&gt;
&lt;td&gt;"What code is relevant to the login flow?"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;codegraph_callers&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Find what calls a function&lt;/td&gt;
&lt;td&gt;"What calls AuthService?"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;codegraph_callees&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Find what a function calls&lt;/td&gt;
&lt;td&gt;"What does processPayment call internally?"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;codegraph_impact&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Analyze change impact radius&lt;/td&gt;
&lt;td&gt;"What breaks if I change this function?"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;codegraph_node&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Get details about a specific symbol&lt;/td&gt;
&lt;td&gt;"Show me UserController's full signature"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;codegraph_files&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Get indexed file structure&lt;/td&gt;
&lt;td&gt;"What is the overall project structure?"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;codegraph_status&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Check index health and stats&lt;/td&gt;
&lt;td&gt;"How many symbols are indexed? Last sync?"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;codegraph_context&lt;/code&gt; is the most important tool&lt;/strong&gt; — it doesn't just return search results; it intelligently assembles a comprehensive context package for a given task, including entry points, related symbols, and code snippets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Command-line equivalent&lt;/span&gt;
codegraph context &lt;span class="s2"&gt;"fix user login bug"&lt;/span&gt;
&lt;span class="c"&gt;# → Automatically finds login-related functions, call chains, and relevant files&lt;/span&gt;
&lt;span class="c"&gt;#   packaged into context Claude can consume directly&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Project Advantages
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;CodeGraph&lt;/th&gt;
&lt;th&gt;Native AI Agent (no assist)&lt;/th&gt;
&lt;th&gt;Other code indexers&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tool call count&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~70% fewer&lt;/td&gt;
&lt;td&gt;High (re-scans each task)&lt;/td&gt;
&lt;td&gt;Partial reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Token usage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~59% fewer&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Partial reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data privacy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;100% local&lt;/td&gt;
&lt;td&gt;Depends on agent&lt;/td&gt;
&lt;td&gt;Most require uploads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Real-time sync&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Native OS file events&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Usually polling or manual&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Language support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;19+ languages&lt;/td&gt;
&lt;td&gt;Depends on agent&lt;/td&gt;
&lt;td&gt;Usually 3–5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Framework route detection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;13 frameworks&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Rare&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Installation complexity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;One npx command&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Usually requires server&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Detailed Analysis
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. The Four-Stage Pipeline
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Stage 1: Extraction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;tree-sitter parses source files into ASTs, extracting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Symbols&lt;/strong&gt;: functions, classes, methods, interfaces, variable definitions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Relationships&lt;/strong&gt;: function calls, module imports, class inheritance, interface implementations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;tree-sitter's key advantage: it is a &lt;strong&gt;fault-tolerant parser&lt;/strong&gt; — it can extract partial structure even when code has syntax errors. This is critical for indexing files that are actively being edited.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 2: Storage&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;All data lands in a local SQLite database using the FTS5 (Full-Text Search 5) extension:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Symbols table (simplified)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;VIRTUAL&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;symbols&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;fts5&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;          &lt;span class="c1"&gt;-- Symbol name&lt;/span&gt;
  &lt;span class="n"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;          &lt;span class="c1"&gt;-- function/class/method/...&lt;/span&gt;
  &lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     &lt;span class="c1"&gt;-- Source file&lt;/span&gt;
  &lt;span class="n"&gt;line_start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;-- Starting line&lt;/span&gt;
  &lt;span class="n"&gt;signature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     &lt;span class="c1"&gt;-- Function signature&lt;/span&gt;
  &lt;span class="n"&gt;docstring&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     &lt;span class="c1"&gt;-- Documentation comment&lt;/span&gt;
  &lt;span class="n"&gt;code_snippet&lt;/span&gt;   &lt;span class="c1"&gt;-- Code excerpt&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Relationships table&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;from_id&lt;/span&gt;  &lt;span class="nb"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;-- Caller symbol ID&lt;/span&gt;
  &lt;span class="n"&gt;to_id&lt;/span&gt;    &lt;span class="nb"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;-- Callee symbol ID&lt;/span&gt;
  &lt;span class="n"&gt;kind&lt;/span&gt;     &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     &lt;span class="c1"&gt;-- calls/imports/inherits/implements&lt;/span&gt;
  &lt;span class="n"&gt;file&lt;/span&gt;     &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;line&lt;/span&gt;     &lt;span class="nb"&gt;INTEGER&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Stage 3: Resolution&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The critical step: resolving abstract "called something named X" into concrete "called the definition in file Y at line Z."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;Source&lt;/span&gt; &lt;span class="nx"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;AuthService&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./auth.service&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
             &lt;span class="p"&gt;...&lt;/span&gt;
             &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;authService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;login&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="err"&gt;↓&lt;/span&gt; &lt;span class="nx"&gt;resolution&lt;/span&gt;
&lt;span class="nx"&gt;Graph&lt;/span&gt; &lt;span class="nx"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;UserController&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;login&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="nx"&gt;AuthService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;login &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
             &lt;span class="nx"&gt;UserController&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="nc"&gt;AuthService &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;imports&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Stage 4: Auto-Sync&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Uses native OS file events (not polling!) to detect changes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;macOS: &lt;code&gt;FSEvents&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Linux: &lt;code&gt;inotify&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Windows: &lt;code&gt;ReadDirectoryChangesW&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A &lt;strong&gt;2-second debounce&lt;/strong&gt; prevents triggering mass rebuilds when files change rapidly — it waits for changes to settle before doing incremental updates.&lt;/p&gt;




&lt;h3&gt;
  
  
  2. Benchmark Deep Dive
&lt;/h3&gt;

&lt;p&gt;Test conditions: Claude Code (headless, Opus 4.7) answering architecture questions. Each result is the median of 4 runs on the same question, across 7 real open-source repositories.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Project        Language       Size            Cost ↓  Token ↓  Speed ↑  Tool Calls ↓
──────────────────────────────────────────────────────────────────────────────────────
VS Code        TypeScript     ~10k files      35%     73%      41%      72%
Excalidraw     TypeScript     ~600 files      47%     73%      60%      86%
Django         Python         ~2.7k files     34%     64%      59%      81%
Tokio          Rust           ~700 files      52%     81%      63%      89%
OkHttp         Java           ~640 files      17%     41%      36%      64%
Gin            Go             ~150 files      22%     23%      34%      19%
Alamofire      Swift          ~100 files      38%     59%      51%      77%
──────────────────────────────────────────────────────────────────────────────────────
Average                                       35%     59%      49%      70%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Patterns worth noting&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tokio (Rust, 700 files) sees the biggest gains&lt;/strong&gt; (81% token reduction, 89% fewer tool calls): Rust's type system is complex — agents originally needed extensive file exploration to understand trait implementations and generic relationships. CodeGraph's pre-built relationships make this dramatically cheaper.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Gin (Go, 150 files) sees the smallest gains&lt;/strong&gt; (23% token reduction, 19% fewer tool calls): Small Go projects have simple file structures. Agents can already navigate them efficiently, so CodeGraph's marginal value is lower.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;VS Code's absolute numbers are the most striking&lt;/strong&gt;: the same question costs $0.64 (1.4M tokens) without CodeGraph, $0.42 (393k tokens) with it. A single task saves $0.22.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Takeaway&lt;/strong&gt;: &lt;strong&gt;The larger the codebase, the more complex the dependencies, and the richer the language's type system, the greater CodeGraph's benefit&lt;/strong&gt;. For developers using Claude Code heavily on large projects, the ROI is clear.&lt;/p&gt;




&lt;h3&gt;
  
  
  3. 19 Languages + 13 Framework Route Detection
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Language support&lt;/strong&gt; (via tree-sitter grammars):&lt;/p&gt;

&lt;p&gt;TypeScript, JavaScript, Python, Go, Rust, Java, C#, PHP, Ruby, C, C++, Swift, Kotlin, Dart, Svelte, Vue, Liquid, Pascal/Delphi, Scala&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Framework route detection&lt;/strong&gt; is a differentiating feature — CodeGraph doesn't just recognize symbols, it understands the mapping between URL routes and their handler functions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Django
&lt;/span&gt;&lt;span class="n"&gt;urlpatterns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nf"&gt;path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;users/&amp;lt;int:pk&amp;gt;/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;UserDetailView&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_view&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# → CodeGraph knows GET /users/{id}/ maps to UserDetailView
&lt;/span&gt;
&lt;span class="c1"&gt;# FastAPI
&lt;/span&gt;&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/items/{item_id}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;read_item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;span class="c1"&gt;# → CodeGraph knows GET /items/{id} maps to read_item()
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The 13 supported frameworks: Django, Flask, FastAPI, Express, NestJS, Laravel, Rails, Spring, Gin/chi/gorilla/mux, Axum/actix/Rocket, ASP.NET, Vapor, React Router/SvelteKit.&lt;/p&gt;

&lt;p&gt;This means AI agents can ask "Where is the handler for &lt;code&gt;/api/users/:id&lt;/code&gt;?" and get a precise answer, without needing to scan routing config files.&lt;/p&gt;




&lt;h3&gt;
  
  
  4. &lt;code&gt;codegraph affected&lt;/code&gt; — Smart CI Test Selection
&lt;/h3&gt;

&lt;p&gt;An underappreciated feature: by tracing import dependencies, it identifies which test files are actually affected by changed source files.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# CI scenario: only run tests affected by this change&lt;/span&gt;
git diff &lt;span class="nt"&gt;--name-only&lt;/span&gt; | codegraph affected &lt;span class="nt"&gt;--stdin&lt;/span&gt;

&lt;span class="c"&gt;# Manually specify changed files&lt;/span&gt;
codegraph affected src/auth.ts

&lt;span class="c"&gt;# With filter (only e2e tests)&lt;/span&gt;
codegraph affected src/auth.ts &lt;span class="nt"&gt;--filter&lt;/span&gt; &lt;span class="s2"&gt;"e2e/*"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;How it works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Changed: src/auth.ts
  ↓ CodeGraph queries the dependency graph
  Direct importers: user.service.ts, auth.controller.ts
  Indirect importers: app.module.ts, integration.test.ts
  ↓ Filter to &lt;span class="nb"&gt;test &lt;/span&gt;files only
  Affected tests: auth.spec.ts, user.service.spec.ts, integration.test.ts
  ↓ Output
  &lt;span class="o"&gt;[&lt;/span&gt;these files] ← run only these, not the full &lt;span class="nb"&gt;test &lt;/span&gt;suite
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On large projects, this can compress CI test time from tens of minutes to a few minutes.&lt;/p&gt;




&lt;h3&gt;
  
  
  5. Configuration and Performance Notes
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Project config file&lt;/strong&gt; (&lt;code&gt;.codegraph/config.json&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"languages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"typescript"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"javascript"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"exclude"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"node_modules/**"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dist/**"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"build/**"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"*.min.js"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"maxFileSize"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1048576&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"extractDocstrings"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"trackCallSites"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;SQLite backend selection&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;CodeGraph ships with two SQLite backends:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Native &lt;code&gt;better-sqlite3&lt;/code&gt;&lt;/strong&gt; (default, recommended): High performance, supports concurrent reads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WASM fallback&lt;/strong&gt;: Better compatibility, but 5–10x slower than native, and concurrent operations may produce &lt;code&gt;database is locked&lt;/code&gt; errors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you encounter performance issues or lock errors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Rebuild the native module&lt;/span&gt;
npm rebuild better-sqlite3

&lt;span class="c"&gt;# Check which backend is active&lt;/span&gt;
codegraph status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  CLI Reference
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codegraph                        &lt;span class="c"&gt;# Run interactive installer&lt;/span&gt;
codegraph init &lt;span class="o"&gt;[&lt;/span&gt;path]            &lt;span class="c"&gt;# Initialize in a project&lt;/span&gt;
codegraph uninit &lt;span class="o"&gt;[&lt;/span&gt;path]          &lt;span class="c"&gt;# Remove CodeGraph from a project&lt;/span&gt;
codegraph index &lt;span class="o"&gt;[&lt;/span&gt;path]           &lt;span class="c"&gt;# Full index (--force to rebuild)&lt;/span&gt;
codegraph &lt;span class="nb"&gt;sync&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;path]            &lt;span class="c"&gt;# Incremental update&lt;/span&gt;
codegraph status &lt;span class="o"&gt;[&lt;/span&gt;path]          &lt;span class="c"&gt;# Show statistics&lt;/span&gt;
codegraph query &amp;lt;search&amp;gt;         &lt;span class="c"&gt;# Search symbols&lt;/span&gt;
codegraph files &lt;span class="o"&gt;[&lt;/span&gt;path]           &lt;span class="c"&gt;# Show file structure&lt;/span&gt;
codegraph context &amp;lt;task&amp;gt;         &lt;span class="c"&gt;# Build AI context for a task&lt;/span&gt;
codegraph affected &lt;span class="o"&gt;[&lt;/span&gt;files]       &lt;span class="c"&gt;# Find affected test files&lt;/span&gt;
codegraph serve &lt;span class="nt"&gt;--mcp&lt;/span&gt;            &lt;span class="c"&gt;# Start MCP server&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Library API&lt;/strong&gt; (embed CodeGraph in your own tools):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;CodeGraph&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@colbymchenry/codegraph&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;CodeGraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/path/to/project&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Full index with progress callbacks&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;cg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;indexAll&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;onProgress&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;phase&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;total&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Search symbols&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;searchNodes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;UserService&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Get call chain&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;callers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getCallers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Build AI context&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;cg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;buildContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;fix login bug&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;maxNodes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;includeCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Impact radius analysis (depth 2)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;impact&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getImpactRadius&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;cg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;watch&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;   &lt;span class="c1"&gt;// Start file watching for auto-sync&lt;/span&gt;
&lt;span class="nx"&gt;cg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;   &lt;span class="c1"&gt;// Clean up resources&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Project Links &amp;amp; Resources
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Official Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;🌟 &lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/colbymchenry/codegraph" rel="noopener noreferrer"&gt;https://github.com/colbymchenry/codegraph&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📦 &lt;strong&gt;npm&lt;/strong&gt;: &lt;a href="https://www.npmjs.com/package/@colbymchenry/codegraph" rel="noopener noreferrer"&gt;@colbymchenry/codegraph&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;⚡ &lt;strong&gt;Quick install&lt;/strong&gt;: &lt;code&gt;npx @colbymchenry/codegraph&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Target Audience
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Heavy Claude Code / Cursor users&lt;/strong&gt;: Working on large projects and looking to reduce cost and improve response speed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Large TypeScript/Rust/Python project developers&lt;/strong&gt;: Codebases large enough that AI agent file-scanning overhead is noticeable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI/CD engineers&lt;/strong&gt;: Using &lt;code&gt;codegraph affected&lt;/code&gt; for smart test selection to eliminate unnecessary full test runs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Toolchain developers&lt;/strong&gt;: Embedding code semantic analysis into their own tools via the Library API&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Key Takeaways
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Core value&lt;/strong&gt;: Inserts a pre-built semantic index between AI agents and codebases — average 35% cost savings, 70% fewer tool calls, 49% speed improvement&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Technology choices&lt;/strong&gt;: tree-sitter (AST parsing) + SQLite FTS5 (full-text search) + native OS file events (live sync) — zero external service dependencies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;8 MCP tools&lt;/strong&gt;: &lt;code&gt;codegraph_context&lt;/code&gt; is the most critical — one call returns a complete context package for the task at hand&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;19 languages + 13 framework route detection&lt;/strong&gt; covering mainstream development stacks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;codegraph affected&lt;/code&gt;&lt;/strong&gt;: dependency-traced smart test selection, a CI acceleration tool&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gains scale with codebase size&lt;/strong&gt;: Tokio (Rust, 700 files) reaches 89% fewer tool calls; small Go projects see ~19%&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  One-Line Review
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;CodeGraph does something deceptively simple yet extremely practical: it converts the code discovery work that AI agents redo on every task into a reusable local index — not a feature addition, but a workflow architecture optimization.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;em&gt;Find more useful knowledge and interesting products on my &lt;a href="https://home.wonlab.top" rel="noopener noreferrer"&gt;Homepage&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>claude</category>
      <category>mcp</category>
      <category>sqlite</category>
    </item>
    <item>
      <title>RAG Series (23): Multimodal RAG — Images and Tables Can Be Retrieved Too</title>
      <dc:creator>WonderLab</dc:creator>
      <pubDate>Wed, 20 May 2026 12:24:18 +0000</pubDate>
      <link>https://dev.to/wonderlab/rag-series-23-multimodal-rag-images-and-tables-can-be-retrieved-too-3gj5</link>
      <guid>https://dev.to/wonderlab/rag-series-23-multimodal-rag-images-and-tables-can-be-retrieved-too-3gj5</guid>
      <description>&lt;h2&gt;
  
  
  What Text RAG Can't See
&lt;/h2&gt;

&lt;p&gt;Upload an annual report PDF. It contains revenue trend charts, product comparison tables, architecture diagrams. What does traditional RAG do?&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A PDF parser extracts text&lt;/li&gt;
&lt;li&gt;Text is chunked, embedded, stored in the vector store&lt;/li&gt;
&lt;li&gt;User asks: "What was the revenue growth in Q3?"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The problem: &lt;strong&gt;the revenue chart is an image.&lt;/strong&gt; The PDF parser extracts its alt text (usually empty) or filename. The numbers are in the image, not the text. RAG will never find them.&lt;/p&gt;

&lt;p&gt;Tables are slightly better, but still problematic: parsers often flatten tables into lines of text, destroying the row/column structure and garbling the semantics.&lt;/p&gt;

&lt;p&gt;This is a real business pain point. Roughly 30–50% of the information in real-world documents exists in non-plain-text form.&lt;/p&gt;




&lt;h2&gt;
  
  
  Three Approaches
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Approach 1: Extract and Textualize
&lt;/h3&gt;

&lt;p&gt;The most direct and most mature approach: convert images and tables into text descriptions, then run standard text RAG.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Images: use a Vision Language Model (VLM) to generate descriptions&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;describe_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;image_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b64encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data:image/png;base64,&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;image_data&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Describe this image in detail, including all numbers, labels, trends, and key information. If this is a chart, list all data points.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Tables: use &lt;code&gt;pdfplumber&lt;/code&gt; to preserve structure, convert to Markdown&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pdfplumber&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_tables_as_markdown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;tables_md&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;pdfplumber&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pdf&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;page_num&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pages&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;table&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extract_tables&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;continue&lt;/span&gt;
                &lt;span class="n"&gt;header&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;
                &lt;span class="n"&gt;md&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;| &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; | &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; |&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="n"&gt;md&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;| &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; | &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;---&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; |&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;md&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;| &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; | &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; |&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="n"&gt;tables_md&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Page &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;page_num&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; table]&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tables_md&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Integrate into the RAG pipeline:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.documents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Document&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="c1"&gt;# 1. Extract plain text
&lt;/span&gt;    &lt;span class="n"&gt;text_chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_text_chunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;pdf_path&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text_chunks&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Extract images → VLM descriptions
&lt;/span&gt;    &lt;span class="n"&gt;images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_images_from_pdf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;img_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;page_num&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;describe_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;pdf_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;page&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;page_num&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;img_path&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# 3. Extract tables → Markdown
&lt;/span&gt;    &lt;span class="n"&gt;tables&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_tables_as_markdown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;table_md&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tables&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;table_md&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;table&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;pdf_path&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Strengths&lt;/strong&gt;: Compatible with all existing text RAG infrastructure; no changes to the vector store.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Limitations&lt;/strong&gt;: VLM captioning adds cost and latency; description quality directly affects retrieval quality; OCR is sensitive to scan quality.&lt;/p&gt;


&lt;h3&gt;
  
  
  Approach 2: CLIP Multimodal Embeddings
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Principle&lt;/strong&gt;: CLIP (Contrastive Language–Image Pre-training, OpenAI 2021) projects both text and images into the &lt;strong&gt;same vector space&lt;/strong&gt;. The embedding of the phrase "revenue trend chart" will be close to the embedding of an actual revenue trend chart image.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_experimental.open_clip&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenCLIPEmbeddings&lt;/span&gt;

&lt;span class="n"&gt;clip_embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenCLIPEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ViT-H-14&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;checkpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;laion2b_s32b_b79k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Embed text
&lt;/span&gt;&lt;span class="n"&gt;text_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;clip_embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Q3 revenue trend&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Embed image
&lt;/span&gt;&lt;span class="n"&gt;image_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;clip_embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed_image&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path/to/chart.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Both are in the same vector space — similarity is meaningful
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dot&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;numpy.linalg&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;norm&lt;/span&gt;
&lt;span class="n"&gt;similarity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image_embedding&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;norm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;norm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_embedding&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Similarity: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;similarity&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# typically &amp;gt; 0.3 for semantically related pairs
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Build a mixed text+image vector store:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;

&lt;span class="c1"&gt;# Images stored with their CLIP embeddings
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;img_path&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;image_paths&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;img_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;clip_embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed_image&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;img_path&lt;/span&gt;&lt;span class="p"&gt;])[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;doc_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;image_vectorstore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_texts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[IMAGE]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;img_embedding&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;metadatas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;img_path&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="n"&gt;ids&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Dual-path retrieval at query time:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;multimodal_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Text retrieval
&lt;/span&gt;    &lt;span class="n"&gt;text_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text_vectorstore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Image retrieval (via CLIP's text encoder)
&lt;/span&gt;    &lt;span class="n"&gt;query_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;clip_embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;image_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;image_vectorstore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search_by_vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;text_results&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;image_results&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Strengths&lt;/strong&gt;: Images don't need pre-captioning; retrieval operates on visual content directly.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Limitations&lt;/strong&gt;: CLIP performs well on natural photographs but poorly on professional charts and graphs — those require understanding numerical relationships, not just visual recognition.&lt;/p&gt;


&lt;h3&gt;
  
  
  Approach 3: ColPali (The 2024 Breakthrough)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Background&lt;/strong&gt;: Traditional document RAG follows this pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PDF → extract text/images → textualize → embed → retrieve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every step loses information or introduces noise. ColPali (Google Research, 2024) took a different approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PDF → screenshot each page → vision language model → page-level embeddings → retrieve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Process each PDF page directly as an image. Bypass text extraction entirely.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Key components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Backbone&lt;/strong&gt;: PaliGemma 3B (Google's vision language model)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Late Interaction&lt;/strong&gt; (from ColBERT): each page is divided into 1,030 patches; each patch gets its own embedding; queries generate token-level embeddings; retrieval scores via fine-grained patch × token similarity, then aggregates&lt;/li&gt;
&lt;li&gt;The result: ColPali can pinpoint which part of a page answers a question
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Using the byaldi library (Python interface for ColPali)
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;byaldi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RAGMultiModalModel&lt;/span&gt;

&lt;span class="c1"&gt;# Load ColPali
&lt;/span&gt;&lt;span class="n"&gt;RAG&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;RAGMultiModalModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vidore/colpali-v1.2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Index a PDF directory (screenshots each page, generates patch embeddings)
&lt;/span&gt;&lt;span class="n"&gt;RAG&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;input_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./financial_reports/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;index_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reports_index&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;store_collection_with_index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# save original images for answer generation
&lt;/span&gt;    &lt;span class="n"&gt;overwrite&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Retrieve (returns the most relevant pages)
&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;RAG&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Q3 revenue quarter-over-quarter growth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;File: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;doc_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, Page: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;page_num&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, Score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Generate an answer from the retrieved page image:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;answer_with_page_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;page_image_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_image_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;img_b64&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b64encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data:image/png;base64,&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;img_b64&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Based on this page, answer: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The full ColPali flow:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User question → ColPali retrieves most relevant pages → extract page images → send to VLM → generate answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Strengths:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Handles charts, formulas, and mixed layouts natively — no OCR required&lt;/li&gt;
&lt;li&gt;Page-level understanding preserves visual layout&lt;/li&gt;
&lt;li&gt;Significantly outperforms traditional methods on visually dense documents (research papers, financial reports)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Limitations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Heavy model (PaliGemma 3B); retrieval latency higher than vector lookup&lt;/li&gt;
&lt;li&gt;Requires NVIDIA GPU; not suitable for CPU-only deployments&lt;/li&gt;
&lt;li&gt;Long index-build time (each page requires a forward pass)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Dedicated Table Handling
&lt;/h2&gt;

&lt;p&gt;Tables are different from images — they have structured semantics and deserve specialized treatment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Method 1: Preserve Markdown structure
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;table_to_markdown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;table&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;header&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;md&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;| &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; | &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; |&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;md&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;| &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; | &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:---:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; |&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:]:&lt;/span&gt;
        &lt;span class="n"&gt;md&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;| &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; | &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; |&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;md&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Good LLMs can reason across rows and columns in Markdown format.&lt;/p&gt;

&lt;h3&gt;
  
  
  Method 2: Summary for retrieval + full table for generation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;index_table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table_md&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;table_metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Use LLM to generate a retrieval-friendly summary
&lt;/span&gt;    &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize the key information in this table in one sentence (under 50 words):&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;table_md&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Store summary as the retrieval unit, full table in metadata
&lt;/span&gt;    &lt;span class="n"&gt;vectorstore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_texts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;metadatas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;table_metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;full_table&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;table_md&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Retrieve by summary; send the full table Markdown to the LLM for answer generation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Method 3: Structured extraction → natural language
&lt;/h3&gt;

&lt;p&gt;For high-value tables (financials, product specs), extract as structured data then convert to natural language:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Table → JSON
&lt;/span&gt;&lt;span class="n"&gt;table_json&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;columns&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Quarter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Revenue ($B)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;QoQ Growth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rows&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Quarter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Q1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Revenue ($B)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;12.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;QoQ Growth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;+5.2%&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Quarter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Q2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Revenue ($B)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;14.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;QoQ Growth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;+14.6%&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Quarter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Q3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Revenue ($B)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;13.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;QoQ Growth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-2.1%&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# JSON → natural language (better for semantic retrieval)
&lt;/span&gt;&lt;span class="n"&gt;nl_description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Quarterly revenue data: Q1 $12.3B, Q2 $14.1B (up 14.6% QoQ), &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Q3 $13.8B (down 2.1% QoQ).&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Natural language is more retrieval-friendly and can be directly quoted in the LLM's answer.&lt;/p&gt;




&lt;h2&gt;
  
  
  Which Approach to Choose
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Extract + Textualize&lt;/th&gt;
&lt;th&gt;CLIP Multimodal&lt;/th&gt;
&lt;th&gt;ColPali&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Document types&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;All&lt;/td&gt;
&lt;td&gt;Image-heavy&lt;/td&gt;
&lt;td&gt;Visually dense (reports, academic PDFs)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Infrastructure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Standard text RAG&lt;/td&gt;
&lt;td&gt;Requires CLIP&lt;/td&gt;
&lt;td&gt;Requires GPU, heavy model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Chart understanding&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Depends on VLM caption quality&lt;/td&gt;
&lt;td&gt;Weak (charts ≠ natural photos)&lt;/td&gt;
&lt;td&gt;Strong (page-level understanding)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Update cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;High (re-indexing is expensive)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Engineering complexity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;VLM captioning fees&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Model inference cost&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Practical recommendations for most scenarios:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Scenario                              Recommended approach
──────────────────────────────────────────────────────────────
Standard enterprise docs (few images)  Text RAG, OCR or ignore images
Product docs (architecture diagrams)   Extract + GPT-4V caption
Financial/research reports (charts)    ColPali
E-commerce image search                CLIP
Quick knowledge base prototype         Extract + textualize (simplest)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  A Complete Multimodal RAG Pipeline
&lt;/h2&gt;

&lt;p&gt;Combining the approaches into a unified pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;enum&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Enum&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DocElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Enum&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;TEXT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;IMAGE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;TABLE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;table&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MultimodalRAGPipeline&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text_embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;clip_embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_emb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text_embeddings&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clip_emb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;clip_embeddings&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vectorstore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Chroma&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding_function&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;text_embeddings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pdf_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;elements&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_all_elements&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# text / images / tables
&lt;/span&gt;        &lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;elem&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;elements&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;elem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;DocElement&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;elem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;
            &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;elem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;DocElement&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IMAGE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;caption&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_generate_caption&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;elem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;caption&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;elem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;elem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;DocElement&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TABLE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;table_to_markdown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;elem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;table&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vectorstore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_generate_caption&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;describe_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# calls GPT-4V
&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vectorstore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;context_parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;images_to_show&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;context_parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Image description] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;images_to_show&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;context_parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Answer based on the following:&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;---&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context_parts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;images_to_show&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Multimodal RAG is fundamentally about &lt;strong&gt;converting non-text information into a retrievable form&lt;/strong&gt;, then returning the original content to the LLM at answer-generation time. Three approaches:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Extract and textualize&lt;/strong&gt;: most mature, engineering-simple, but dependent on OCR/VLM quality — suitable for most scenarios&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CLIP multimodal embeddings&lt;/strong&gt;: unified vector space for text and images; good for natural photograph retrieval; limited on professional charts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ColPali&lt;/strong&gt;: direct visual page processing; best results for chart-heavy documents; requires GPU and higher engineering investment&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Tables are often simpler than images: preserve Markdown structure + generate a retrieval summary, and standard text RAG handles them well.&lt;/p&gt;

&lt;p&gt;Next (and final) in this series: &lt;strong&gt;Code RAG&lt;/strong&gt; — helping AI understand your codebase, including AST-based splitting, code embedding models, and representing call graphs with knowledge graphs.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2407.01449" rel="noopener noreferrer"&gt;ColPali: Efficient Document Retrieval with Vision Language Models (2024)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2103.00020" rel="noopener noreferrer"&gt;CLIP: Learning Transferable Visual Models From Natural Language Supervision (Radford et al., 2021)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/AnswerDotAI/byaldi" rel="noopener noreferrer"&gt;byaldi: Python interface for ColPali&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://python.langchain.com/docs/how_to/multimodal_inputs/" rel="noopener noreferrer"&gt;LangChain Multi-modal RAG Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/jsvine/pdfplumber" rel="noopener noreferrer"&gt;pdfplumber: Structured PDF extraction&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>rag</category>
      <category>multimodal</category>
      <category>vision</category>
      <category>llm</category>
    </item>
    <item>
      <title>One Open Source Project a Day (No. 70): Claude Plugins Official — A Complete Tour of Anthropic's Official Claude Code Plugin Ecosystem</title>
      <dc:creator>WonderLab</dc:creator>
      <pubDate>Wed, 20 May 2026 01:59:45 +0000</pubDate>
      <link>https://dev.to/wonderlab/one-open-source-project-a-day-no-70-claude-plugins-official-a-complete-tour-of-anthropics-4lgo</link>
      <guid>https://dev.to/wonderlab/one-open-source-project-a-day-no-70-claude-plugins-official-a-complete-tour-of-anthropics-4lgo</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;"Plugins extend Claude Code with commands, agents, skills, hooks, and MCP servers."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is the NO.70 article in the "One Open Source Project a Day" series. Today we are exploring &lt;strong&gt;claude-plugins-official&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This is Anthropic's officially maintained plugin registry on GitHub. If you have been using Claude Code lately, you have almost certainly seen the &lt;code&gt;/plugin install&lt;/code&gt; command — this repository is the official plugin source behind that command.&lt;/p&gt;

&lt;p&gt;20.2k Stars, 2.5k Forks, 669 open issues. Behind these numbers is a rapidly maturing ecosystem: Anthropic engineers have contributed 30+ plugins covering LSP support for 12 languages, a multi-agent PR review toolkit, Git workflow automation, and code quality analysis. 15 external partners — including GitHub, Firebase, Linear, and Terraform — have already joined.&lt;/p&gt;

&lt;p&gt;But this article is not just about "what plugins exist." What deserves equal attention is the &lt;strong&gt;design of the plugin specification itself&lt;/strong&gt;: a single plugin.json file, three extension mechanisms (Skills/Commands/MCP), and two trigger modes (user-invoked vs. model-invoked). Understanding this architecture means understanding the boundaries and possibilities of Claude Code extensibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  What You Will Learn
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The complete plugin directory structure of &lt;code&gt;claude-plugins-official&lt;/code&gt; (internal vs. external plugins)&lt;/li&gt;
&lt;li&gt;The Claude Code plugin specification: from &lt;code&gt;plugin.json&lt;/code&gt; to Skills/Commands/MCP&lt;/li&gt;
&lt;li&gt;Deep dives into five key plugins: &lt;code&gt;pr-review-toolkit&lt;/code&gt;, &lt;code&gt;agent-sdk-dev&lt;/code&gt;, &lt;code&gt;code-review&lt;/code&gt;, &lt;code&gt;hookify&lt;/code&gt;, &lt;code&gt;commit-commands&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;How to build a spec-compliant Claude Code plugin from scratch&lt;/li&gt;
&lt;li&gt;The current state of the external partner plugin ecosystem&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Familiarity with Claude Code (basic slash command usage)&lt;/li&gt;
&lt;li&gt;A basic understanding of MCP (Model Context Protocol)&lt;/li&gt;
&lt;li&gt;Interest in building your own Claude Code extensions&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Project Background
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Project Introduction
&lt;/h3&gt;

&lt;p&gt;claude-plugins-official is Anthropic's official plugin registry, serving the Claude Code plugin ecosystem. It plays two roles simultaneously:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Plugin registry&lt;/strong&gt;: Users install plugins via &lt;code&gt;/plugin install &amp;lt;name&amp;gt;@claude-plugins-official&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Development specification reference&lt;/strong&gt;: &lt;code&gt;plugins/example-plugin&lt;/code&gt; is the official complete reference implementation, demonstrating all extension mechanisms&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The repository is divided into two main directories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;/plugins&lt;/code&gt;&lt;/strong&gt;: Developed by Anthropic engineers, covering LSP, development workflows, code quality, output styles, and more&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;/external_plugins&lt;/code&gt;&lt;/strong&gt;: Submitted by partners and the community, subject to quality and security review&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Author/Team Introduction
&lt;/h3&gt;

&lt;p&gt;This is a multi-contributor Anthropic internal project, with different engineers leading each plugin:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;pr-review-toolkit&lt;/strong&gt;: Daisy (&lt;a href="mailto:daisy@anthropic.com"&gt;daisy@anthropic.com&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;agent-sdk-dev&lt;/strong&gt;: Ashwin Bhat (&lt;a href="mailto:ashwin@anthropic.com"&gt;ashwin@anthropic.com&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;code-review&lt;/strong&gt;: Boris Cherny (&lt;a href="mailto:boris@anthropic.com"&gt;boris@anthropic.com&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;frontend-design&lt;/strong&gt;: Prithvi Rajasekaran + Alexander Bricken&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;commit-commands&lt;/strong&gt;: Anthropic team&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Project Stats
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;⭐ GitHub Stars: &lt;strong&gt;20,200+&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;🍴 Forks: &lt;strong&gt;2,500+&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;👁️ Watchers: &lt;strong&gt;147&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;🐛 Open Issues: &lt;strong&gt;669&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;💻 Languages: Python (31.6%), TypeScript (28.9%), HTML (19.5%), Shell (13.0%), JavaScript (7.0%)&lt;/li&gt;
&lt;li&gt;🏷️ Topics: &lt;code&gt;skills&lt;/code&gt;, &lt;code&gt;mcp&lt;/code&gt;, &lt;code&gt;claude-code&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;🌐 Repository: &lt;a href="https://github.com/anthropics/claude-plugins-official" rel="noopener noreferrer"&gt;anthropics/claude-plugins-official&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📖 Docs: &lt;a href="https://code.claude.com/docs/en/plugins" rel="noopener noreferrer"&gt;code.claude.com/docs/en/plugins&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Main Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Core Utility
&lt;/h3&gt;

&lt;p&gt;claude-plugins-official provides a standardized way to extend Claude Code's capabilities:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Native Claude Code capabilities
        ↓
  /plugin install &amp;lt;name&amp;gt;@claude-plugins-official
        ↓
  Extended Claude Code
  ├── New Slash Commands (user-invoked)
  ├── New Skills (model-auto-triggered)
  ├── New Agents (specialized task delegates)
  └── New MCP Tools (external service integrations)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Quick Start
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Installing plugins (CLI)&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# PR review toolkit&lt;/span&gt;
/plugin &lt;span class="nb"&gt;install &lt;/span&gt;pr-review-toolkit@claude-plugins-official

&lt;span class="c"&gt;# Git commit command suite&lt;/span&gt;
/plugin &lt;span class="nb"&gt;install &lt;/span&gt;commit-commands@claude-plugins-official

&lt;span class="c"&gt;# Agent SDK development tools&lt;/span&gt;
/plugin &lt;span class="nb"&gt;install &lt;/span&gt;agent-sdk-dev@claude-plugins-official

&lt;span class="c"&gt;# Code review plugin&lt;/span&gt;
/plugin &lt;span class="nb"&gt;install &lt;/span&gt;code-review@claude-plugins-official

&lt;span class="c"&gt;# Hook management tool&lt;/span&gt;
/plugin &lt;span class="nb"&gt;install &lt;/span&gt;hookify@claude-plugins-official
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Installing plugins (UI)&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;In Claude Code, type: /plugin
→ Click "Discover"
→ Browse and install desired plugins
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Installing external partner plugins&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# GitHub integration&lt;/span&gt;
/plugin &lt;span class="nb"&gt;install &lt;/span&gt;github@claude-plugins-official

&lt;span class="c"&gt;# Linear project management&lt;/span&gt;
/plugin &lt;span class="nb"&gt;install &lt;/span&gt;linear@claude-plugins-official

&lt;span class="c"&gt;# Firebase integration&lt;/span&gt;
/plugin &lt;span class="nb"&gt;install &lt;/span&gt;firebase@claude-plugins-official

&lt;span class="c"&gt;# Terraform infrastructure&lt;/span&gt;
/plugin &lt;span class="nb"&gt;install &lt;/span&gt;terraform@claude-plugins-official
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Plugin Directory Overview
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Internal Plugins (/plugins)&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Plugins&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LSP Language Support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;clangd-lsp, csharp-lsp, gopls-lsp, jdtls-lsp, kotlin-lsp, lua-lsp, php-lsp, pyright-lsp, ruby-lsp, rust-analyzer-lsp, swift-lsp, typescript-lsp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Development Workflows&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;agent-sdk-dev, claude-code-setup, commit-commands, feature-dev, mcp-server-dev, plugin-dev, pr-review-toolkit, ralph-loop&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Code Quality&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;code-modernization, code-review, code-simplifier, security-guidance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Output Styles&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;explanatory-output-style, learning-output-style&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Utility Tools&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;claude-md-management, cwc-makers, frontend-design, hookify, math-olympiad, session-report, skill-creator&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reference&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;example-plugin&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;External Partner Plugins (/external_plugins)&lt;/strong&gt;: asana, context7, discord, fakechat, firebase, github, gitlab, greptile, imessage, laravel-boost, linear, playwright, serena, telegram, terraform&lt;/p&gt;




&lt;h2&gt;
  
  
  Five Key Plugins: Deep Dives
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Plugin 1: pr-review-toolkit — Six Parallel PR Review Agents
&lt;/h3&gt;

&lt;p&gt;This is arguably the most elegantly designed plugin in the entire directory: &lt;strong&gt;6 specialized agents running in parallel, reviewing the same Pull Request from different angles simultaneously&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PR submitted
  ↓
comment-analyzer      ← Comment accuracy, documentation completeness, comment rot
pr-test-analyzer      ← Test coverage quality, edge cases, behavioral vs. line coverage
silent-failure-hunter ← Silent failures, empty catch blocks, missing error logging
type-design-analyzer  ← Type encapsulation, invariants (rated 1–10 on 4 dimensions)
code-reviewer         ← CLAUDE.md compliance, style, bugs (0–100 score)
code-simplifier       ← Readability, unnecessary complexity, redundant abstractions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Natural language triggering&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Are the tests thorough?"                  → triggers pr-test-analyzer
"Check the error handling in the API client" → triggers silent-failure-hunter
"Is this documentation accurate?"           → triggers comment-analyzer
"Simplify this code"                        → triggers code-simplifier
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Full PR review (trigger all agents at once)&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"I'm ready to create this PR. Please:
1. Review test coverage
2. Check for silent failures
3. Verify code comments are accurate
4. Review any new types
5. General code review"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Recommended workflow&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Write code → code-reviewer
Fix issues → silent-failure-hunter (if error handling changed)
Add tests  → pr-test-analyzer
Document   → comment-analyzer
Polish     → code-simplifier
→ Create PR
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Plugin 2: agent-sdk-dev — Agent SDK Project Scaffolding
&lt;/h3&gt;

&lt;p&gt;This plugin compresses "building a Claude Agent SDK project from scratch" into a single command.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/new-sdk-app&lt;/code&gt; command&lt;/strong&gt; (interactive project creation):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/new-sdk-app my-agent-project
&lt;span class="c"&gt;# Interactive prompts:&lt;/span&gt;
&lt;span class="c"&gt;# 1. Language: TypeScript or Python?&lt;/span&gt;
&lt;span class="c"&gt;# 2. Agent type: coding / business / custom&lt;/span&gt;
&lt;span class="c"&gt;# 3. Starting point: minimal / basic / specific example&lt;/span&gt;
&lt;span class="c"&gt;# 4. Package manager: npm/yarn/pnpm or pip/poetry&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What it does automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Checks and installs the latest SDK version&lt;/li&gt;
&lt;li&gt;Creates project file structure, &lt;code&gt;.env.example&lt;/code&gt;, &lt;code&gt;.gitignore&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Runs type checking (TS) or syntax validation (Python)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatically runs the appropriate Verifier agent&lt;/strong&gt; (validates the project against SDK best practices)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Two Verifier Agents&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Python project verification&lt;/span&gt;
&lt;span class="s2"&gt;"Verify my Python Agent SDK application"&lt;/span&gt;
→ Checks: SDK installation, requirements.txt/pyproject.toml,
           SDK usage patterns, agent init/config, .env security, error handling

&lt;span class="c"&gt;# TypeScript project verification&lt;/span&gt;
&lt;span class="s2"&gt;"Verify my TypeScript Agent SDK application"&lt;/span&gt;
→ Checks: SDK installation, tsconfig.json, &lt;span class="nb"&gt;type &lt;/span&gt;safety/imports,
           agent init/config, .env security, error handling
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Verifier output format&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Overall Status: PASS / PASS WITH WARNINGS / FAIL
- Critical Issues (blocking functionality)
- Warnings (suboptimal patterns)
- Passed Checks
- Recommendations (with SDK documentation links)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Plugin 3: code-review — Confidence-Filtered 4-Agent Code Review
&lt;/h3&gt;

&lt;p&gt;This plugin addresses the most common problem with AI code review: &lt;strong&gt;too many false positives&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Pre-checks — Skip closed, draft, or trivial PRs
2. Collect CLAUDE.md guideline files from the repo
3. Summarize PR changes
4. Launch 4 parallel agents:
   - Agent #1 &amp;amp; #2 → CLAUDE.md compliance checks
   - Agent #3 → Bug detection (changed code only)
   - Agent #4 → Git blame/history context analysis
5. Score each issue 0–100 for confidence
6. Filter out issues below threshold (default: 80)
7. Post review comment with high-confidence issues only
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Confidence scale&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Not confident, likely false positive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;Somewhat confident, might be real&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;Moderately confident, real but minor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;75&lt;/td&gt;
&lt;td&gt;Highly confident, real and important&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;Absolutely certain, must fix&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;What gets filtered out&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pre-existing issues not introduced by this PR&lt;/li&gt;
&lt;li&gt;Code that looks buggy but isn't&lt;/li&gt;
&lt;li&gt;Issues that linters will catch&lt;/li&gt;
&lt;li&gt;Items with lint-ignore comments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Configuring the confidence threshold&lt;/strong&gt; (in &lt;code&gt;commands/code-review.md&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="c"&gt;# Change 80 to your preferred threshold&lt;/span&gt;
&lt;span class="err"&gt;Filter&lt;/span&gt; &lt;span class="err"&gt;out&lt;/span&gt; &lt;span class="err"&gt;any&lt;/span&gt; &lt;span class="err"&gt;issues&lt;/span&gt; &lt;span class="err"&gt;with&lt;/span&gt; &lt;span class="err"&gt;a&lt;/span&gt; &lt;span class="err"&gt;score&lt;/span&gt; &lt;span class="err"&gt;less&lt;/span&gt; &lt;span class="err"&gt;than&lt;/span&gt; &lt;span class="err"&gt;80.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Plugin 4: hookify — Create Claude Code Hooks in Plain English
&lt;/h3&gt;

&lt;p&gt;Claude Code's Hooks feature (triggering custom logic before/after tool executions) requires manually editing complex JSON configuration. hookify eliminates that friction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core commands&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a rule from plain English&lt;/span&gt;
/hookify Warn me when I use &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; commands

&lt;span class="c"&gt;# Analyze recent conversation, auto-suggest behaviors to block&lt;/span&gt;
/hookify

&lt;span class="c"&gt;# List all rules&lt;/span&gt;
/hookify:list

&lt;span class="c"&gt;# Enable/disable rules interactively&lt;/span&gt;
/hookify:configure
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rules take effect &lt;strong&gt;immediately&lt;/strong&gt; — no restart required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule file format&lt;/strong&gt; (stored as &lt;code&gt;.claude/hookify.&amp;lt;name&amp;gt;.local.md&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;block-dangerous-rm&lt;/span&gt;
&lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;event&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bash&lt;/span&gt;
&lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rm\s+-rf&lt;/span&gt;
&lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;block&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

⚠️ &lt;span class="gs"&gt;**Dangerous rm command detected!**&lt;/span&gt;

Please verify the path and ensure you have backups.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Advanced rules (multiple conditions)&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warn-sensitive-files&lt;/span&gt;
&lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;event&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;file&lt;/span&gt;
&lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warn&lt;/span&gt;
&lt;span class="na"&gt;conditions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;field&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;file_path&lt;/span&gt;
    &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;regex_match&lt;/span&gt;
    &lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;\.env$|credentials|secrets&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;field&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;new_text&lt;/span&gt;
    &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;contains&lt;/span&gt;
    &lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;KEY&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

🔐 &lt;span class="gs"&gt;**Sensitive file edit detected!**&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Event types and when they fire&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;event&lt;/th&gt;
&lt;th&gt;Fires when&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;bash&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Before a Bash command executes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;file&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;On file read/write operations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;prompt&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;When the user submits a prompt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;stop&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;When Claude is about to stop responding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;all&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;All of the above&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Practical rule examples&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Block destructive commands&lt;/span&gt;
&lt;span class="na"&gt;event&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bash&lt;/span&gt;
&lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rm\s+-rf|dd\s+if=|mkfs|format&lt;/span&gt;
&lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;block&lt;/span&gt;

&lt;span class="c1"&gt;# Warn about debug code&lt;/span&gt;
&lt;span class="na"&gt;event&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;file&lt;/span&gt;
&lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;console\.log\(|debugger;&lt;/span&gt;
&lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warn&lt;/span&gt;

&lt;span class="c1"&gt;# Require tests before stopping&lt;/span&gt;
&lt;span class="na"&gt;event&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;stop&lt;/span&gt;
&lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;block&lt;/span&gt;
&lt;span class="na"&gt;conditions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;field&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;transcript&lt;/span&gt;
    &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;not_contains&lt;/span&gt;
    &lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm test|pytest|cargo test&lt;/span&gt;

&lt;span class="c1"&gt;# Prevent hardcoded API keys in TypeScript&lt;/span&gt;
&lt;span class="na"&gt;event&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;file&lt;/span&gt;
&lt;span class="na"&gt;conditions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;field&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;file_path&lt;/span&gt;
    &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;regex_match&lt;/span&gt;
    &lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;\.tsx?$&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;field&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;new_text&lt;/span&gt;
    &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;regex_match&lt;/span&gt;
    &lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;(API_KEY|SECRET|TOKEN)\s*=\s*["']&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Plugin 5: commit-commands — Three-Command Git Workflow
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Auto-generate commit message and commit&lt;/span&gt;
/commit

&lt;span class="c"&gt;# Full one-command workflow: commit → push → create PR&lt;/span&gt;
/commit-push-pr

&lt;span class="c"&gt;# Clean up local branches whose remotes have been deleted&lt;/span&gt;
/clean_gone
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What &lt;code&gt;/commit&lt;/code&gt; does:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Analyzes staged and unstaged changes&lt;/li&gt;
&lt;li&gt;Reviews recent commit history to match repo style&lt;/li&gt;
&lt;li&gt;Stages files and creates a commit message with Claude Code attribution&lt;/li&gt;
&lt;li&gt;Automatically skips sensitive files (&lt;code&gt;.env&lt;/code&gt;, &lt;code&gt;credentials.json&lt;/code&gt;, etc.)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;What &lt;code&gt;/commit-push-pr&lt;/code&gt; does:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Creates a new branch if currently on &lt;code&gt;main&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Commits the changes&lt;/li&gt;
&lt;li&gt;Pushes to &lt;code&gt;origin&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Creates a PR via GitHub CLI (includes Summary + test checklist)&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Plugin Specification Deep Dive
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Standard Plugin Directory Structure
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;plugin-name/&lt;/span&gt;
&lt;span class="s"&gt;├── .claude-plugin/&lt;/span&gt;
&lt;span class="s"&gt;│   └── plugin.json&lt;/span&gt;      &lt;span class="c1"&gt;# Plugin metadata (required)&lt;/span&gt;
&lt;span class="s"&gt;├── .mcp.json&lt;/span&gt;            &lt;span class="c1"&gt;# MCP server config (optional)&lt;/span&gt;
&lt;span class="s"&gt;├── skills/&lt;/span&gt;              &lt;span class="c1"&gt;# Skill definitions (preferred)&lt;/span&gt;
&lt;span class="s"&gt;│   ├── my-skill/&lt;/span&gt;
&lt;span class="s"&gt;│   │   └── SKILL.md&lt;/span&gt;     &lt;span class="c1"&gt;# Model-auto-triggered skill&lt;/span&gt;
&lt;span class="s"&gt;│   └── my-command/&lt;/span&gt;
&lt;span class="s"&gt;│       └── SKILL.md&lt;/span&gt;     &lt;span class="c1"&gt;# User-invoked slash command (also uses SKILL.md)&lt;/span&gt;
&lt;span class="s"&gt;├── commands/&lt;/span&gt;            &lt;span class="c1"&gt;# Slash commands (legacy format)&lt;/span&gt;
&lt;span class="s"&gt;│   └── my-command.md&lt;/span&gt;
&lt;span class="s"&gt;├── agents/&lt;/span&gt;              &lt;span class="c1"&gt;# Agent definitions&lt;/span&gt;
&lt;span class="s"&gt;└── README.md&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The plugin.json Metadata Format
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"my-plugin"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"A description of what this plugin does"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"author"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Your Name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"you@example.com"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deliberately minimal by design: only three fields — no version, no dependency declarations. A plugin's capabilities are determined by its directory contents, not its metadata.&lt;/p&gt;

&lt;h3&gt;
  
  
  Three Extension Mechanisms Compared
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Mechanism 1: Skills (Recommended for new plugins)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Model-auto-triggered (context-based)&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;security-review&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Automatically trigger security review when security-related code is detected&lt;/span&gt;
&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1.0.0&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="c1"&gt;# User-invoked (becomes /skill-name command)&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-command&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Short description shown in /help&lt;/span&gt;
&lt;span class="na"&gt;argument-hint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;arg1&amp;gt; [optional-arg]&lt;/span&gt;
&lt;span class="na"&gt;allowed-tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Read&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;Glob&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;Grep&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Mechanism 2: Commands (Legacy format, functionally equivalent)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# commands/my-command.md
---
# YAML frontmatter
---
# Command content (same as SKILL.md)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Mechanism 3: MCP Servers (External service integration)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;.mcp.json&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"my-service"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://api.myservice.com/mcp"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Key Difference Between Two Skill Trigger Modes
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Trigger Type&lt;/th&gt;
&lt;th&gt;Activation&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;User-invoked&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;User types &lt;code&gt;/skill-name&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Explicit one-time operations (commit, review)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model-auto-triggered&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Claude judges from task context&lt;/td&gt;
&lt;td&gt;Continuous context enhancement (output styles, security checks)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This distinction is critical: the &lt;code&gt;frontend-design&lt;/code&gt; plugin is a perfect example of model-auto-triggered — the user simply says "create a dashboard" and Claude automatically applies that plugin's design principles, no explicit invocation required.&lt;/p&gt;




&lt;h2&gt;
  
  
  External Plugin Submission Process
&lt;/h2&gt;

&lt;p&gt;To get your own plugin listed in the official directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Ensure your plugin meets quality and security standards
   ↓
2. Visit https://clau.de/plugin-directory-submission
   ↓
3. Fill out the plugin submission form
   ↓
4. Anthropic review (quality + security)
   ↓
5. Merged into /external_plugins directory
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Review criteria&lt;/strong&gt; (inferred from documentation):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Plugin must have a complete README and plugin.json&lt;/li&gt;
&lt;li&gt;MCP servers, if used, must be clearly documented&lt;/li&gt;
&lt;li&gt;No malicious code or unauthorized data collection&lt;/li&gt;
&lt;li&gt;Must provide real value, not duplicate existing functionality&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Project Links &amp;amp; Resources
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Official Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;🌟 &lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/anthropics/claude-plugins-official" rel="noopener noreferrer"&gt;https://github.com/anthropics/claude-plugins-official&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📖 &lt;strong&gt;Plugin Development Docs&lt;/strong&gt;: &lt;a href="https://code.claude.com/docs/en/plugins" rel="noopener noreferrer"&gt;code.claude.com/docs/en/plugins&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📝 &lt;strong&gt;External Plugin Submission&lt;/strong&gt;: &lt;a href="https://clau.de/plugin-directory-submission" rel="noopener noreferrer"&gt;clau.de/plugin-directory-submission&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🔧 &lt;strong&gt;Reference Implementation&lt;/strong&gt;: &lt;code&gt;/plugins/example-plugin&lt;/code&gt; (best starting point for plugin development)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Target Audience
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Power Claude Code users&lt;/strong&gt;: Looking to extend their workflow capabilities with official plugins&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plugin developers&lt;/strong&gt;: Learning how to build spec-compliant Claude Code plugins&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Toolchain engineers&lt;/strong&gt;: Connecting their own services to the Claude Code ecosystem via MCP servers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Engineering productivity leads&lt;/strong&gt;: Building custom team plugins and standardizing development workflows&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Key Takeaways
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Plugin ecosystem&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;30+ internal plugins&lt;/strong&gt;: Covering LSP (12 languages), PR review, Git workflows, code quality, output styles, and other core development scenarios&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;15 external plugins&lt;/strong&gt;: GitHub, Firebase, Linear, Terraform, and other mainstream tools already integrated&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unified installation&lt;/strong&gt;: One command handles everything — &lt;code&gt;/plugin install &amp;lt;name&amp;gt;@claude-plugins-official&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Plugin specification&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Minimal metadata&lt;/strong&gt;: &lt;code&gt;plugin.json&lt;/code&gt; needs only name, description, and author&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Three extension mechanisms&lt;/strong&gt;: Skills (recommended) / Commands (legacy) / MCP Servers (external services)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Two trigger modes&lt;/strong&gt;: User-invoked vs. model-auto-triggered based on context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security awareness built in&lt;/strong&gt;: Official docs explicitly note that Anthropic does not vouch for third-party MCP servers — trust verification is the user's responsibility&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Core value of the five key plugins&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pr-review-toolkit&lt;/code&gt;: 6 parallel agents → multi-angle coverage, eliminating single-reviewer blind spots&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;agent-sdk-dev&lt;/code&gt;: One-command scaffolding + automatic Verifier → lower barrier to entry for Agent SDK&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;code-review&lt;/code&gt;: Confidence filtering (default threshold 80) → fewer false positives, focus on real issues&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;hookify&lt;/code&gt;: Natural language hook creation → no complex JSON config files&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;commit-commands&lt;/code&gt;: &lt;code&gt;/commit-push-pr&lt;/code&gt; → full pipeline from code to PR in one command&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  One-Line Review
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;claude-plugins-official is not just a plugin directory — it is Anthropic's public answer to the question "how should AI tools be extended": minimize metadata, maximize compositional flexibility, let Skills appear in the right context automatically rather than forcing users to memorize commands.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;em&gt;Find more useful knowledge and interesting products on my &lt;a href="https://home.wonlab.top" rel="noopener noreferrer"&gt;Homepage&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>claude</category>
      <category>mcp</category>
      <category>agentskills</category>
    </item>
    <item>
      <title>RAG Series (22): Long Context vs RAG — Do We Even Need RAG?</title>
      <dc:creator>WonderLab</dc:creator>
      <pubDate>Tue, 19 May 2026 02:02:34 +0000</pubDate>
      <link>https://dev.to/wonderlab/rag-series-22-long-context-vs-rag-do-we-even-need-rag-5a8j</link>
      <guid>https://dev.to/wonderlab/rag-series-22-long-context-vs-rag-do-we-even-need-rag-5a8j</guid>
      <description>&lt;h2&gt;
  
  
  A Question Worth Taking Seriously
&lt;/h2&gt;

&lt;p&gt;Gemini 1.5 Pro supports 1 million token context. Claude 3.5 handles 200K tokens. GPT-4 Turbo handles 128K. A small novel fits in context. Some people ask: is RAG still necessary?&lt;/p&gt;

&lt;p&gt;The question deserves a real answer, because it hides a genuine engineering decision: &lt;strong&gt;for a production system, should I use RAG or long context?&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;Large language model context windows (2024–2025):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Context Window&lt;/th&gt;
&lt;th&gt;Approximate text&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 1.5 Pro&lt;/td&gt;
&lt;td&gt;1,000,000 tokens&lt;/td&gt;
&lt;td&gt;~750,000 words, ~1500 pages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude 3.5 Sonnet&lt;/td&gt;
&lt;td&gt;200,000 tokens&lt;/td&gt;
&lt;td&gt;~150,000 words, ~300 pages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4 Turbo&lt;/td&gt;
&lt;td&gt;128,000 tokens&lt;/td&gt;
&lt;td&gt;~96,000 words, ~190 pages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4o&lt;/td&gt;
&lt;td&gt;128,000 tokens&lt;/td&gt;
&lt;td&gt;~96,000 words, ~190 pages&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This looks like a lot. But how much content does a real knowledge base have?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A mid-sized company's internal documentation: thousands of documents, millions of words&lt;/li&gt;
&lt;li&gt;A large codebase: tens of thousands of files, billions of tokens&lt;/li&gt;
&lt;li&gt;A news or research database: millions of articles&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of these exceed any model's context window. That is the hard ceiling on long context.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Cost of Long Context
&lt;/h2&gt;

&lt;p&gt;"Bigger window" doesn't mean "free." Every request processes every token, and the cost is real.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost 1: Money
&lt;/h3&gt;

&lt;p&gt;Rough estimates at late 2024 pricing (input tokens):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Price per 1M tokens&lt;/th&gt;
&lt;th&gt;1M token request&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 1.5 Pro&lt;/td&gt;
&lt;td&gt;$1.25&lt;/td&gt;
&lt;td&gt;$1.25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude 3.5 Sonnet&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4 Turbo&lt;/td&gt;
&lt;td&gt;$10.00&lt;/td&gt;
&lt;td&gt;$10.00&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Compare to RAG:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retrieval phase: Embedding API only (&amp;lt; $0.001)&lt;/li&gt;
&lt;li&gt;Generation phase: 2,000–5,000 tokens of retrieved context + question (&amp;lt; $0.05)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;RAG can cost 20–200× less than long context for the same question.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At 1,000 user queries per day against an enterprise knowledge base:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Long context (1M tokens): ~$1,250/day&lt;/li&gt;
&lt;li&gt;RAG (3K token context): ~$3–15/day&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cost 2: Latency
&lt;/h3&gt;

&lt;p&gt;More tokens = slower response. Time to first token (TTFT) grows roughly linearly with input length:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;100K token input → TTFT ~2–5 seconds
1M token input   → TTFT ~15–30 seconds (varies by model and infrastructure)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A conversational application where the user waits 30 seconds before any output is largely unusable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost 3: Lost in the Middle
&lt;/h3&gt;

&lt;p&gt;A 2023 Stanford paper "Lost in the Middle" (Liu et al.) found that when relevant information appears in the middle of a long context, LLM recall drops significantly. Information at the beginning or end performs best; information in the middle performs worst.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Position vs. recall (approximate trend):
Beginning (0–10%)    ████████████████ high
Middle (40–60%)      ██████           low
End (90–100%)        ████████████     higher
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Stuffing 100 documents into context does not guarantee the model finds the one at position 50.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Cost of RAG
&lt;/h2&gt;

&lt;p&gt;RAG isn't free either.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost 1: Imperfect Retrieval
&lt;/h3&gt;

&lt;p&gt;Vector search is approximate matching — it makes mistakes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;False negatives&lt;/strong&gt;: relevant documents not retrieved. The user's question is semantically distant from the relevant passage; it falls outside the top-k.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;False positives&lt;/strong&gt;: irrelevant documents retrieved. The LLM receives noise, which can cause confusion or hallucination.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is exactly the problem that earlier articles in this series addressed: hybrid retrieval, Rerank, HyDE — all of these are patches for retrieval imperfection.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost 2: Chunking Breaks Context
&lt;/h3&gt;

&lt;p&gt;Chunking splits documents into fragments. Related information can end up in different chunks. A 10-page research report whose conclusion depends on an assumption from page 3 may be split such that only the conclusion chunk is retrieved — the LLM gets the conclusion without the premise.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost 3: System Complexity
&lt;/h3&gt;

&lt;p&gt;RAG is an engineering system: vector store + embedding model + retrieval pipeline + update mechanism + evaluation framework. Compared to "send the document to the LLM," it has significantly higher maintenance cost.&lt;/p&gt;




&lt;h2&gt;
  
  
  Five-Dimension Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Long Context&lt;/th&gt;
&lt;th&gt;RAG&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Document volume ceiling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~10–100 docs (limited by window and cost)&lt;/td&gt;
&lt;td&gt;Unlimited (vector store scales)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High (all tokens billed every request)&lt;/td&gt;
&lt;td&gt;Low (only relevant fragments)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Latency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High (large inputs are slow)&lt;/td&gt;
&lt;td&gt;Low (small inputs are fast)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Recall completeness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Perfect (everything is present)&lt;/td&gt;
&lt;td&gt;Incomplete (depends on retrieval quality)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Knowledge updates&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Requires resending all content&lt;/td&gt;
&lt;td&gt;Only update changed documents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Engineering complexity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low (direct API call)&lt;/td&gt;
&lt;td&gt;High (retrieval pipeline to maintain)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Single-document understanding&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Strong (cross-document reasoning)&lt;/td&gt;
&lt;td&gt;Weaker (affected by chunking)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Neither approach wins on all dimensions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Decision Framework: Which One?
&lt;/h2&gt;

&lt;p&gt;Four dimensions to locate your scenario:&lt;/p&gt;

&lt;h3&gt;
  
  
  Dimension 1: Document Volume
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt; 50 docs, total &amp;lt; 100K tokens     → consider long context
50–1000 docs                       → evaluate cost, decide
&amp;gt; 1000 docs, or total &amp;gt; 1M tokens  → RAG
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Dimension 2: Update Frequency
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Static content (monthly updates or less)   → long context acceptable
Dynamic content (daily/hourly updates)     → RAG (incremental indexing is cheap)
Real-time data                             → RAG (or direct API integration)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Dimension 3: Query Volume
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;One-time analysis (research, report generation)   → long context
Low-frequency queries (&amp;lt; 100/day)                 → either works
High-frequency queries (&amp;gt; 1000/day)               → RAG (cost differences compound)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Dimension 4: Latency Requirements
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Interactive Q&amp;amp;A (&amp;lt; 3 second response)   → RAG
Report generation, offline analysis     → long context acceptable
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Summary Decision Table
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Use case                       Docs    Updates   Queries   Recommendation
──────────────────────────────────────────────────────────────────────────
Legal contract review (single) small   none      once      Long context
Enterprise knowledge base Q&amp;amp;A  large   frequent  high      RAG
PDF financial report analysis  medium  none      once      Long context
Product documentation chatbot  large   moderate  high      RAG
Codebase understanding         huge    frequent  high      RAG
Meeting notes summary (single) small   none      once      Long context
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Hybrid Strategy: Use Both
&lt;/h2&gt;

&lt;p&gt;Long context and RAG are not mutually exclusive. Sometimes the best choice is a combination.&lt;/p&gt;

&lt;h3&gt;
  
  
  Strategy 1: RAG selects documents, long context reads in full
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Step 1: use RAG to find the 3 most relevant documents
&lt;/span&gt;&lt;span class="n"&gt;relevant_docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# top-3 documents
&lt;/span&gt;
&lt;span class="c1"&gt;# Step 2: send full documents (not chunks) to the LLM
&lt;/span&gt;&lt;span class="n"&gt;full_docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;load_full_doc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;relevant_docs&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;full_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;full_docs&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Step 3: LLM answers based on complete documents
&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Answer based on the following documents:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;full_context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Good fit for&lt;/strong&gt;: large document sets (can't send all), but each document requires complex cross-passage reasoning.&lt;/p&gt;

&lt;h3&gt;
  
  
  Strategy 2: Coarse-grained RAG with large chunks
&lt;/h3&gt;

&lt;p&gt;Traditional RAG uses 512–1024 token chunks. With larger windows, you can use 3,000–10,000 token chunks — preserving much more context while still doing retrieval filtering.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Split with larger chunks (preserve more context)
&lt;/span&gt;&lt;span class="n"&gt;splitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# traditional 512 → now 4000 is reasonable
&lt;/span&gt;    &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Retrieve fewer chunks since each is larger
&lt;/span&gt;&lt;span class="n"&gt;retriever&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vectorstore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_retriever&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="c1"&gt;# 3 × 4000 = 12,000 tokens: precise and context-rich
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Strategy 3: Summary cache + precise retrieval
&lt;/h3&gt;

&lt;p&gt;For large document libraries, use the LLM to generate a structured summary for each document; retrieve summaries; load the original passage on demand.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Pre-processing: generate summaries (one-time)
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;all_documents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize this document&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s key points in 3 sentences:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;summary_doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;original&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;summary_vectorstore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_documents&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;summary_doc&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Query time: retrieve summaries, load original passages
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query_with_summary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;summaries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;summary_vectorstore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;relevant_chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nf"&gt;extract_relevant_passage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;original&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;summaries&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;build_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;relevant_chunks&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What Actually Changed
&lt;/h2&gt;

&lt;p&gt;The rise of large context windows genuinely shifted some decisions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenarios where RAG was once necessary but now may not be:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Understanding documents under 50 pages (just stuff it in — simpler)&lt;/li&gt;
&lt;li&gt;One-time document analysis tasks (not worth building a RAG system)&lt;/li&gt;
&lt;li&gt;Prototype validation (fast idea testing, no need for production-grade RAG)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Scenarios where RAG is still necessary&lt;/strong&gt; (most production systems):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Knowledge bases with &amp;gt; 1,000 documents&lt;/li&gt;
&lt;li&gt;Frequently updated content&lt;/li&gt;
&lt;li&gt;High concurrency, cost-sensitive deployments&lt;/li&gt;
&lt;li&gt;Attribution requirements (RAG natively knows which document an answer came from)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Large context windows made "skip RAG for simple cases" a reasonable choice. They didn't make RAG obsolete — they made RAG's use case clearer: &lt;strong&gt;when document volume, update frequency, or cost makes "full context" impractical, RAG is irreplaceable.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Long Context&lt;/th&gt;
&lt;th&gt;RAG&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Core strength&lt;/td&gt;
&lt;td&gt;Complete context, cross-document reasoning&lt;/td&gt;
&lt;td&gt;Scalable, low cost, real-time updates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Core limitation&lt;/td&gt;
&lt;td&gt;High cost, high latency, hard document ceiling&lt;/td&gt;
&lt;td&gt;Imperfect retrieval, engineering complexity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;Small-scale, one-time deep analysis&lt;/td&gt;
&lt;td&gt;Large-scale production systems&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trend&lt;/td&gt;
&lt;td&gt;Windows keep growing, costs keep falling&lt;/td&gt;
&lt;td&gt;Retrieval quality keeps improving&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These are not competitors — they're complementary tools. Understanding the true cost of each, and choosing the right one, is engineering judgment.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2307.03172" rel="noopener noreferrer"&gt;Lost in the Middle: How Language Models Use Long Contexts (Liu et al., 2023)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2403.05530" rel="noopener noreferrer"&gt;Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://python.langchain.com/docs/tutorials/rag/" rel="noopener noreferrer"&gt;LangChain Long Context RAG Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>rag</category>
      <category>llm</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>RAG Series (21): Performance Optimization — Faster and Cheaper</title>
      <dc:creator>WonderLab</dc:creator>
      <pubDate>Tue, 19 May 2026 02:01:42 +0000</pubDate>
      <link>https://dev.to/wonderlab/rag-series-21-performance-optimization-faster-and-cheaper-1eg6</link>
      <guid>https://dev.to/wonderlab/rag-series-21-performance-optimization-faster-and-cheaper-1eg6</guid>
      <description>&lt;h2&gt;
  
  
  The Cost Structure of RAG
&lt;/h2&gt;

&lt;p&gt;What happens in a single RAG request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. embed(question)          → 1 Embedding API call
2. vectorstore.search()     → vector store retrieval (local, fast)
3. llm.generate(context)    → 1 LLM API call
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At minimum 2 API calls per request. At scale, these compound quickly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Latency&lt;/strong&gt;: LLM calls typically 1–10 seconds; Embedding calls 0.1–0.5 seconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt;: token-based billing means identical questions pay the same price every time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The four optimizations each target a different point in this chain:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Optimization&lt;/th&gt;
&lt;th&gt;Where&lt;/th&gt;
&lt;th&gt;What it saves&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LLM response cache&lt;/td&gt;
&lt;td&gt;LLM call&lt;/td&gt;
&lt;td&gt;Skip LLM entirely, 0ms response&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embedding cache&lt;/td&gt;
&lt;td&gt;Embedding call&lt;/td&gt;
&lt;td&gt;No re-embedding for identical text&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Semantic Cache&lt;/td&gt;
&lt;td&gt;LLM call&lt;/td&gt;
&lt;td&gt;Reuse answers for similar questions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Async batch Embedding&lt;/td&gt;
&lt;td&gt;Embedding call&lt;/td&gt;
&lt;td&gt;N serial round-trips → 1 concurrent call&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Optimization 1: LLM Response Cache
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Principle&lt;/strong&gt;: A given (prompt, model, temperature) combination always produces a deterministic LLM call. Cache the result on the first call; return it directly on subsequent identical calls — no network request at all.&lt;/p&gt;

&lt;p&gt;LangChain exposes this as a global switch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.globals&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;set_llm_cache&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.cache&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;InMemoryCache&lt;/span&gt;

&lt;span class="nf"&gt;set_llm_cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;InMemoryCache&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;   &lt;span class="c1"&gt;# one line, affects all LLM calls
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For persistence across restarts, swap in SQLite:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.cache&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SQLiteCache&lt;/span&gt;
&lt;span class="nf"&gt;set_llm_cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;SQLiteCache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;database_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.llm_cache.db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Results
&lt;/h3&gt;

&lt;p&gt;3 questions, each asked twice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;Q: What are the four core metrics in RAGAS?
  Cache miss:  1743ms   Cache hit:   0.7ms   Speedup: 2441×

Q: What are the common vector database options?
  Cache miss:  3675ms   Cache hit:   0.9ms   Speedup: 4126×

Q: What is Rerank?
  Cache miss:  9753ms   Cache hit:   0.9ms   Speedup: 10993×

Average: miss=5057ms  hit=0.8ms  speedup=6068×
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Hit latency is 0.8ms&lt;/strong&gt; — that's dictionary lookup time, not network latency. On a cache hit, zero network requests are made.&lt;/p&gt;

&lt;p&gt;6000× sounds exaggerated, but this is what "in-memory dict vs. network API call" actually looks like.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Good fit for&lt;/strong&gt;: FAQ-style Q&amp;amp;A, report generation (user clicks "regenerate" repeatedly), popular questions asked by many users.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limitation&lt;/strong&gt;: Exact prompt match only. A rephrased question is a cache miss.&lt;/p&gt;




&lt;h2&gt;
  
  
  Optimization 2: Embedding Cache
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Principle&lt;/strong&gt;: The embedding vector for a given text is deterministic (same model + same text = same vector). &lt;code&gt;CacheBackedEmbeddings&lt;/code&gt; wraps a base embeddings object with a ByteStore layer — embed once, serialize and store, read from cache thereafter.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_classic.embeddings&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CacheBackedEmbeddings&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_classic.storage&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;InMemoryByteStore&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LocalFileStore&lt;/span&gt;

&lt;span class="c1"&gt;# In-memory (lost on restart)
&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;InMemoryByteStore&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# File-based (persistent across restarts)
# store = LocalFileStore("./embedding_cache/")
&lt;/span&gt;
&lt;span class="n"&gt;cached_embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CacheBackedEmbeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_bytes_store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;underlying_embeddings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;base_embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;document_embedding_cache&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;EMB_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# isolates cache by model name
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# API identical to regular embeddings
&lt;/span&gt;&lt;span class="n"&gt;vectorstore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Chroma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cached_embeddings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;namespace=EMB_MODEL&lt;/code&gt; matters: if you switch embedding models, the old cached vectors have a different dimension and distribution. Namespacing by model name prevents the new model from reading stale vectors.&lt;/p&gt;

&lt;h3&gt;
  
  
  Results
&lt;/h3&gt;

&lt;p&gt;8 texts, three passes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;First index (8 texts, all new):
  285ms   1 API call   8 texts sent

Repeat index (8 texts, all cached):
  5.7ms   0 API calls  0 texts sent

Knowledge base update (6 unchanged + 2 new):
  63.5ms  1 API call   2 texts sent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The third row is the point&lt;/strong&gt;: on a knowledge base update, the 6 unchanged documents are served from cache. Only the 2 new documents trigger an API call. This pairs naturally with the Indexing API from the previous article — content hash tracking identifies which documents need re-indexing; Embedding cache ensures identical content is never re-embedded.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Good fit for&lt;/strong&gt;: knowledge bases with a large stable core and occasional updates. The more documents, the lower the update frequency, the bigger the benefit.&lt;/p&gt;




&lt;h2&gt;
  
  
  Optimization 3: Semantic Cache
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Principle&lt;/strong&gt;: LLM response cache requires an exact prompt match. Semantic Cache goes further: store historical (question, answer) pairs as vectors; when a new question arrives, run a nearest-neighbor search; if a sufficiently similar historical question is found, return its answer directly — skipping both retrieval and LLM.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"What metrics does the RAGAS framework have?"  → miss → LLM generates → stored
"Describe the four core RAGAS metrics"         → vector search → finds above
                                               → similarity ≥ threshold → return cached answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SemanticCache&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.85&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_store&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Chroma&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;semantic_cache&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_answers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;          &lt;span class="c1"&gt;# cache_id → answer
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search_with_relevance_scores&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_answers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;cache_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_texts&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;metadatas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cache_id&lt;/span&gt;&lt;span class="p"&gt;}])&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_answers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;cache_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Results: Threshold Calibration Is the Hard Part
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Threshold: 0.85

RAGAS group:
  Original:  "What metrics does RAGAS have?"              → miss (3782ms)
  Paraphrase: "Describe the four core RAGAS metrics"      → miss (3298ms) ← expected HIT
  Different:  "How should I choose a vector database?"    → miss (2509ms) ← correct miss

Rerank group:
  Original:  "What role does Rerank play in RAG?"         → miss (11602ms)
  Paraphrase: "Why do RAG systems need re-ranking?"       → miss (3834ms) ← expected HIT
  Different:  "What is hybrid retrieval?"                 → miss (12578ms) ← correct miss

Total hit rate: 0/6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The paraphrases didn't hit the cache. This is not a code bug — &lt;strong&gt;threshold 0.85 is too high for these paraphrase pairs&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Why: &lt;code&gt;bge-large-zh-v1.5&lt;/code&gt; cosine similarity between these pairs likely falls in the 0.80–0.84 range, just below the threshold. Semantic similarity ≠ high cosine similarity. The mapping depends on the embedding model's representation space and training data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The correct approach&lt;/strong&gt;: calibrate before setting a threshold. Measure the similarity distribution on your actual question samples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Calibration: measure similarity on known similar pairs and known-different pairs
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dot&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;numpy.linalg&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;norm&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;cosine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;norm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;norm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;similar_pairs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What RAGAS metrics are there?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;List the RAGAS evaluation metrics&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How to choose a vector DB?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Which vector database should I use?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;dissimilar_pairs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What RAGAS metrics are there?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How to choose a vector DB?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;q1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;q2&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;similar_pairs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;v1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;q1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;v2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;q2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Similar:    &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;cosine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;q1&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; / &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;q2&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;q1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;q2&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;dissimilar_pairs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;v1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;q1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;v2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;q2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Dissimilar: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;cosine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;q1&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; / &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;q2&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Set threshold between the two distributions
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Find a threshold that separates the two distributions. For Chinese Q&amp;amp;A with bge models, 0.80–0.85 is a common starting range — but you must validate on your own data before deploying.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The real value of Semantic Cache&lt;/strong&gt;: high-volume FAQ systems where users ask the same questions in many different ways (customer service bots, documentation assistants). Potential for large LLM call reduction. But the value is entirely dependent on threshold calibration — it's not a drop-in default.&lt;/p&gt;




&lt;h2&gt;
  
  
  Optimization 4: Async Batch Embedding
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Principle&lt;/strong&gt;: Embedding N texts sequentially = N network round-trips. Embedding N texts in a single batch call = 1 network round-trip, processed in parallel server-side.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAIEmbeddings&lt;/span&gt;

&lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAIEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;

&lt;span class="c1"&gt;# Sequential (slow): one API call per text
&lt;/span&gt;&lt;span class="n"&gt;sequential&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Async batch (fast): one API call for all texts
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;embed_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aembed_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;embed_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Results
&lt;/h3&gt;

&lt;p&gt;12 texts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Sequential (one by one):    830ms
Async batch (one call):     289ms
Speedup:                    2.87×
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same vectors, 11 fewer network round-trips. Vector agreement &amp;gt; 0.9999 cosine similarity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where to apply in the RAG pipeline&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Batch indexing at build time
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;index_documents_async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
    &lt;span class="n"&gt;texts&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;vectors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aembed_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# bulk write to vector store
&lt;/span&gt;    &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="c1"&gt;# Concurrent user queries in the service layer
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_batch_queries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;questions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
    &lt;span class="n"&gt;vectors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aembed_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;questions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ainvoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;questions&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The more documents, the bigger the gain. Batch documents in chunks of 50–100 during index builds; expect 3–5× speedup over sequential, depending on network latency.&lt;/p&gt;




&lt;h2&gt;
  
  
  Combining All Four Optimizations
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# 1. LLM cache (global, always on)
&lt;/span&gt;&lt;span class="nf"&gt;set_llm_cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;SQLiteCache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.llm_cache.db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Embedding cache (wrap the base embeddings)
&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LocalFileStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./embedding_cache/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CacheBackedEmbeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_bytes_store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;underlying_embeddings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;base_embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;document_embedding_cache&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;EMB_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Semantic Cache (check before full pipeline)
&lt;/span&gt;&lt;span class="n"&gt;semantic_cache&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SemanticCache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;YOUR_CALIBRATED_THRESHOLD&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;cached&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;semantic_cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cached&lt;/span&gt;
    &lt;span class="n"&gt;docs&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
    &lt;span class="n"&gt;semantic_cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Async for bulk operations
&lt;/span&gt;&lt;span class="n"&gt;vectors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aembed_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All four are orthogonal and stackable. Highest-ROI combination: &lt;strong&gt;LLM cache + Embedding cache&lt;/strong&gt; — near-zero implementation cost, should be on by default. &lt;strong&gt;Semantic Cache&lt;/strong&gt; requires calibration but delivers large savings once tuned. &lt;strong&gt;Async batch&lt;/strong&gt; is specifically valuable at index-build time and under high concurrency.&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=====================================================================
  Optimization Results Summary
=====================================================================

  Optimization             Before          After         Savings
  ─────────────────────────────────────────────────────────────
  LLM response cache       5057ms          0.8ms         99.98%  ✓ strongly recommended
  Embedding cache (rebuild) 285ms          5.7ms         98%     ✓ strongly recommended
  Embedding cache (update)  8 API calls    2 API calls   75%     ✓ strongly recommended
  Semantic Cache (t=0.85)   functional     needs calibr. —       ⚠ calibrate first
  Async batch Embedding     830ms          289ms         65%     ✓ recommended at scale
=====================================================================
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Full Code
&lt;/h2&gt;

&lt;p&gt;Complete code is open-sourced at:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/chendongqi/llm-in-action/tree/main/21-rag-performance" rel="noopener noreferrer"&gt;https://github.com/chendongqi/llm-in-action/tree/main/21-rag-performance&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Key file:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;rag_performance.py&lt;/code&gt; — all four benchmarks with report generation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;How to run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/chendongqi/llm-in-action
&lt;span class="nb"&gt;cd &lt;/span&gt;21-rag-performance
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
python rag_performance.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;This article implemented and measured four RAG performance optimizations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;LLM response cache&lt;/strong&gt;: cheapest and highest impact — one line of code, repeated questions go from 5057ms to 0.8ms (6000× speedup)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedding cache&lt;/strong&gt;: identical text never re-embedded; knowledge base updates only embed changed content (8 calls → 2 calls)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic Cache&lt;/strong&gt;: conceptually correct, but threshold 0.85 produced 0/6 hits in this experiment — threshold calibration is non-optional; measure similarity distribution on real data before setting any value&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Async batch Embedding&lt;/strong&gt;: 2.87× speedup for 12 texts; benefit grows with document count&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The first three optimizations attack the same root problem: &lt;strong&gt;repeated computation is waste&lt;/strong&gt;. The same work shouldn't cost twice. The fourth attacks a different problem: &lt;strong&gt;serial waiting is unnecessary&lt;/strong&gt;. Work that can be parallelized shouldn't be queued.&lt;/p&gt;

&lt;p&gt;Different problems, same goal: making RAG viable in production.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://python.langchain.com/docs/how_to/llm_caching/" rel="noopener noreferrer"&gt;LangChain LLM Caching Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://python.langchain.com/docs/how_to/caching_embeddings/" rel="noopener noreferrer"&gt;CacheBackedEmbeddings Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://python.langchain.com/docs/concepts/async/" rel="noopener noreferrer"&gt;LangChain Async API&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>llm</category>
      <category>performance</category>
    </item>
    <item>
      <title>RAG Series (20): Enterprise RAG Architecture</title>
      <dc:creator>WonderLab</dc:creator>
      <pubDate>Tue, 19 May 2026 02:00:40 +0000</pubDate>
      <link>https://dev.to/wonderlab/rag-series-20-enterprise-rag-architecture-4765</link>
      <guid>https://dev.to/wonderlab/rag-series-20-enterprise-rag-architecture-4765</guid>
      <description>&lt;h2&gt;
  
  
  The Gap Between Demo and Production
&lt;/h2&gt;

&lt;p&gt;Every article in this series has shared one architectural assumption: a single vector store, accessible to everyone, returning any document to any user.&lt;/p&gt;

&lt;p&gt;That works in a demo. In an enterprise environment, it breaks immediately:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Company A's documents can be retrieved by Company B's users&lt;/li&gt;
&lt;li&gt;Financial data can be pulled by any employee&lt;/li&gt;
&lt;li&gt;HR policies visible to contractors&lt;/li&gt;
&lt;li&gt;One user hammers the API and takes down the service for everyone else&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Production enterprise RAG needs three layers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Incoming request
  ↓ rate limit check   — is this user still within quota?
  ↓ cache lookup       — has this question been answered before?
  ↓ tenant routing     — which knowledge base?
  ↓ permission filter  — within that KB, what can this user see?
  ↓ retrieve + generate — answer from authorized content only
  ↓ cache write        — store for next time
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This article implements each layer.&lt;/p&gt;




&lt;h2&gt;
  
  
  Layer 1: Multi-Tenancy
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Strategy: one Qdrant Collection per tenant&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each customer or department gets its own Qdrant Collection. Collections are physically isolated — you can't search &lt;code&gt;acme_corp&lt;/code&gt;'s content by querying &lt;code&gt;globex_corp&lt;/code&gt;, because the two collections are entirely separate vector spaces.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_qdrant&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;QdrantVectorStore&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;qdrant_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;QdrantClient&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;qdrant_client.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Distance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;VectorParams&lt;/span&gt;

&lt;span class="n"&gt;qdrant_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;QdrantClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:memory:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# production: host="qdrant-server"
&lt;/span&gt;
&lt;span class="n"&gt;tenant_stores&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;QdrantVectorStore&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;TENANT_DOCS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;qdrant_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;vectors_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;VectorParams&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Distance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;COSINE&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;QdrantVectorStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;qdrant_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;tenant_stores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Routing is trivial — the request carries a &lt;code&gt;tenant_id&lt;/code&gt;, the service selects the matching store:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_retriever&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tenant_stores&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unknown tenant: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tenant_stores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="c1"&gt;# permission filter added in Layer 2
&lt;/span&gt;    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why not a shared Collection with a &lt;code&gt;tenant_id&lt;/code&gt; metadata filter?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It works technically, but carries a risk: a filter bug means Tenant A's data leaks to Tenant B. There's no hard boundary. Collection-level isolation also makes teardown clean — removing a tenant means dropping their Collection, with no residue.&lt;/p&gt;

&lt;p&gt;For soft isolation (departments within one company), metadata filtering is fine. For hard isolation (different customers), separate Collections are safer.&lt;/p&gt;




&lt;h2&gt;
  
  
  Layer 2: Access Control
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Strategy: documents carry &lt;code&gt;access_level&lt;/code&gt;; retrieval injects a Qdrant filter&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each document declares its access level in metadata:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Annual bonus: S-tier 3 months, A-tier 2 months...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hr-policy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;access_level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hr_only&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Robot control system: EtherCAT bus, latency &amp;lt;1ms...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;robot-spec&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;access_level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;engineering_only&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Roles map to the access levels they can see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;ROLE_PERMISSIONS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;admin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;public&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;engineering_only&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hr_only&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;finance_only&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;engineer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;public&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;engineering_only&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hr&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;       &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;public&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hr_only&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;finance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;public&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;finance_only&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;employee&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;public&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At retrieval time, the role's allowed levels become a Qdrant &lt;code&gt;MatchAny&lt;/code&gt; filter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;qdrant_client.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Filter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FieldCondition&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MatchAny&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_retriever&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;levels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ROLE_PERMISSIONS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;public&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="n"&gt;access_filter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;must&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="nc"&gt;FieldCondition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;metadata.access_level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;MatchAny&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;any&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;levels&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tenant_stores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;as_retriever&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;search_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;access_filter&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;This filter executes at the vector database layer, not the application layer.&lt;/strong&gt; Unauthorized documents never leave the database — they aren't returned to the application, so there's nothing to leak.&lt;/p&gt;




&lt;h2&gt;
  
  
  Layer 3: Caching
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Strategy: &lt;code&gt;(tenant_id, role, question)&lt;/code&gt; as cache key, TTL 300 seconds&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CacheEntry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;QueryCache&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ttl_seconds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_store&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CacheEntry&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_ttl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ttl_seconds&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;entry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_ttl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_store&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;())]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CacheEntry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Including &lt;code&gt;role&lt;/code&gt; in the cache key matters: an engineer and an HR manager asking the same question get different contexts (different documents pass the permission filter), so they may get different answers. Cache entries are not cross-role reusable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Layer 4: Rate Limiting
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Strategy: sliding window, 5 requests per user per minute&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RateLimiter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_requests&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;window_seconds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_max&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;max_requests&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_window&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;window_seconds&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_log&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;allow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_log&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_log&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                               &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_window&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_log&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_max&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_log&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sliding window vs. fixed window: a fixed window allows bursting at boundaries — a user can send 5 requests at second 59 and 5 more at second 61, sending 10 in 60 seconds. A sliding window enforces the limit across any 60-second interval.&lt;/p&gt;




&lt;h2&gt;
  
  
  Experiment Results
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario A: Normal Retrieval
&lt;/h3&gt;

&lt;p&gt;Engineer alice queries company info and technical docs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Q: What type of company is ACME Corp?
A: ACME Corp is a smart manufacturing company.
Sources: [company-intro, robot-spec]   ← public + engineering docs, correct
elapsed: 995ms

Q: What communication protocol does ACME's robot system use?
A: ACME Corp's robot control system uses the EtherCAT real-time bus.
Sources: [company-intro, robot-spec]   ← engineering doc correctly retrieved
elapsed: 1709ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scenario B: Permission Filtering
&lt;/h3&gt;

&lt;p&gt;The key thing to read here is the &lt;code&gt;sources&lt;/code&gt; array — not whether &lt;code&gt;docs_retrieved &amp;gt; 0&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[B1] Engineer alice asks about annual bonus (hr_only doc):
  Sources: [company-intro, robot-spec]   ← hr-policy is NOT in sources
  A: The reference material does not contain information about the bonus policy.

[B2] HR bob asks about net profit (finance_only doc):
  Sources: [company-intro, hr-policy]    ← financial-report is NOT in sources
  A: The reference material does not contain ACME's 2025 net profit.

[B3] HR bob asks about annual leave (hr_only doc):
  Sources: [company-intro, hr-policy]    ← hr-policy correctly appears
  A: Year 1: 12 days. Each additional year: +2 days. Maximum: 20 days.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What access control actually looks like in practice&lt;/strong&gt;: &lt;code&gt;hr-policy&lt;/code&gt; never appears in alice's sources list; &lt;code&gt;financial-report&lt;/code&gt; never appears in bob's sources list. The Qdrant filter intercepts these documents at the database layer. The LLM never receives them, so it correctly responds that the information isn't available.&lt;/p&gt;

&lt;p&gt;This is the right behavior: users still get documents they &lt;em&gt;can&lt;/em&gt; access (public + their role-specific docs); only the restricted documents are absent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario C: Tenant Isolation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[C1] Globex user charlie asks about ACME Corp's headcount:
  Tenant: globex_corp
  Sources: [products, company-intro]   ← these are Globex's own docs
  A: The reference material does not contain ACME's employee count.

[C2] Globex user queries their own product lines:
  Sources: [company-intro, products]   ← Globex docs correctly returned
  A: GlexCloud, GlexAnalytics, GlexAI...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Charlie is querying the &lt;code&gt;globex_corp&lt;/code&gt; Collection for ACME Corp information. Of course nothing comes back — ACME's content doesn't physically exist in Globex's Collection.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario D: Cache Hit
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="err"&gt;First&lt;/span&gt; &lt;span class="err"&gt;request&lt;/span&gt; &lt;span class="err"&gt;(Scenario&lt;/span&gt; &lt;span class="err"&gt;A1):&lt;/span&gt; &lt;span class="err"&gt;995ms,&lt;/span&gt; &lt;span class="py"&gt;cache_hit&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;false&lt;/span&gt;
&lt;span class="err"&gt;Same&lt;/span&gt; &lt;span class="err"&gt;question&lt;/span&gt; &lt;span class="py"&gt;repeated&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;        &lt;span class="s"&gt;0ms, cache_hit=true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;0ms means the repeated request skipped both retrieval and LLM generation entirely. For frequently repeated questions — company policy, common workflows, product FAQs — caching compounds quickly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario E: Rate Limiting
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5 req / 60s / user&lt;/span&gt;

&lt;span class="err"&gt;Request&lt;/span&gt; &lt;span class="py"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;allowed&lt;/span&gt;
&lt;span class="err"&gt;Request&lt;/span&gt; &lt;span class="py"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;allowed&lt;/span&gt;
&lt;span class="err"&gt;Request&lt;/span&gt; &lt;span class="py"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;allowed&lt;/span&gt;
&lt;span class="err"&gt;Request&lt;/span&gt; &lt;span class="py"&gt;4&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;allowed&lt;/span&gt;
&lt;span class="err"&gt;Request&lt;/span&gt; &lt;span class="py"&gt;5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;allowed&lt;/span&gt;
&lt;span class="err"&gt;Request&lt;/span&gt; &lt;span class="py"&gt;6&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;RATE LIMITED   ← limit enforced&lt;/span&gt;
&lt;span class="err"&gt;Request&lt;/span&gt; &lt;span class="py"&gt;7&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;RATE LIMITED&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The rate limiter correctly allowed 5 and blocked 2 out of 7 requests.&lt;/p&gt;




&lt;h2&gt;
  
  
  FastAPI Service Layer
&lt;/h2&gt;

&lt;p&gt;The four layers above are wired together in a single &lt;code&gt;query()&lt;/code&gt; function, then exposed via FastAPI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;HTTPException&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Header&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Enterprise RAG Service&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;QueryRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query_endpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;QueryRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;x_user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Header&lt;/span&gt;&lt;span class="p"&gt;(...),&lt;/span&gt;     &lt;span class="c1"&gt;# user identity from request header
&lt;/span&gt;    &lt;span class="n"&gt;x_user_role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Header&lt;/span&gt;&lt;span class="p"&gt;(...),&lt;/span&gt;   &lt;span class="c1"&gt;# user role from request header
&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x_user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x_user_role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rate_limited&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;HTTPException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;detail&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Too many requests&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sources&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_hit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache_hit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start with: &lt;code&gt;uvicorn enterprise_rag:app --host 0.0.0.0 --port 8080&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;In production, &lt;code&gt;x_user_id&lt;/code&gt; and &lt;code&gt;x_user_role&lt;/code&gt; should come from JWT token decoding, not raw client headers.&lt;/p&gt;




&lt;h2&gt;
  
  
  Production Upgrade Path
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Demo Implementation&lt;/th&gt;
&lt;th&gt;Production Replacement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Qdrant&lt;/td&gt;
&lt;td&gt;&lt;code&gt;:memory:&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Dedicated server, &lt;code&gt;host="qdrant-server"&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cache&lt;/td&gt;
&lt;td&gt;In-process dict&lt;/td&gt;
&lt;td&gt;Redis (distributed, persistent, TTL native)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rate limiter&lt;/td&gt;
&lt;td&gt;In-process counter&lt;/td&gt;
&lt;td&gt;Redis + sliding-window Lua script (safe across instances)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User identity&lt;/td&gt;
&lt;td&gt;Raw Header&lt;/td&gt;
&lt;td&gt;JWT token decode + signature verification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Logging&lt;/td&gt;
&lt;td&gt;print()&lt;/td&gt;
&lt;td&gt;Structured logs + alerting on LLM call volume / latency / errors&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Full Code
&lt;/h2&gt;

&lt;p&gt;Complete code is open-sourced at:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/chendongqi/llm-in-action/tree/main/20-enterprise-rag" rel="noopener noreferrer"&gt;https://github.com/chendongqi/llm-in-action/tree/main/20-enterprise-rag&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Key file:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;enterprise_rag.py&lt;/code&gt; — full implementation: multi-tenancy + access control + cache + rate limiting + FastAPI + scenario verification&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;How to run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/chendongqi/llm-in-action
&lt;span class="nb"&gt;cd &lt;/span&gt;20-enterprise-rag
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
python enterprise_rag.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;This article implemented a four-layer enterprise RAG architecture. Key findings:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Collection-level tenant isolation&lt;/strong&gt; — separate Qdrant Collections per tenant provide a physical boundary; metadata filtering alone offers no hard guarantee&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Permissions enforced at the DB layer&lt;/strong&gt; — Qdrant's &lt;code&gt;MatchAny&lt;/code&gt; filter means restricted documents never leave the database; there's nothing for the application to leak&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache key must include role&lt;/strong&gt; — same question, different role → different context → potentially different answer; cross-role cache reuse produces wrong results&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sliding window beats fixed window&lt;/strong&gt; — eliminates boundary bursting; any 60-second interval is bounded, not just aligned windows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Access control is about absence&lt;/strong&gt; — users see the documents they're allowed to see; restricted documents simply don't appear in sources; the LLM correctly reports "no information available" for what it never received&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The gap between a RAG demo and a RAG production system is mostly engineering, not algorithms.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://qdrant.tech/documentation/concepts/filtering/" rel="noopener noreferrer"&gt;Qdrant Filtering Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://python.langchain.com/docs/integrations/vectorstores/qdrant/" rel="noopener noreferrer"&gt;langchain-qdrant Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://fastapi.tiangolo.com/" rel="noopener noreferrer"&gt;FastAPI Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>rag</category>
      <category>ragas</category>
      <category>ai</category>
      <category>qdrant</category>
    </item>
  </channel>
</rss>
