<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nishant prakash</title>
    <description>The latest articles on DEV Community by Nishant prakash (@nishant_prakash_780f5d541).</description>
    <link>https://dev.to/nishant_prakash_780f5d541</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3596122%2Fbf82630b-dedc-4453-8918-a3cc87df539f.jpg</url>
      <title>DEV Community: Nishant prakash</title>
      <link>https://dev.to/nishant_prakash_780f5d541</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nishant_prakash_780f5d541"/>
    <language>en</language>
    <item>
      <title>I Thought TOON Was Hype. Then I Tested It…</title>
      <dc:creator>Nishant prakash</dc:creator>
      <pubDate>Sat, 22 Nov 2025 19:03:09 +0000</pubDate>
      <link>https://dev.to/nishant_prakash_780f5d541/i-thought-toon-was-hype-then-i-tested-it-3bad</link>
      <guid>https://dev.to/nishant_prakash_780f5d541/i-thought-toon-was-hype-then-i-tested-it-3bad</guid>
      <description>&lt;p&gt;If you’re even slightly deep into the world of agentic AI, you’ve probably seen it: blog after blog mentioning something called &lt;strong&gt;TOON&lt;/strong&gt;. The posts were short. Some offered impressive claims like “cuts JSON token usage in half.” Others just mentioned it as a rising serialization format for LLMs, especially in agent-based workflows.&lt;/p&gt;

&lt;p&gt;Curious but unsatisfied, I decided to take a different route. I didn’t just want to read about TOON, I wanted to build with it. I wanted to &lt;strong&gt;see what made it special&lt;/strong&gt;, how it compares to formats like JSON and CSV, and where it actually shines. And of course, I wanted to share what I found in a way that makes it easier for you to understand too.&lt;/p&gt;

&lt;p&gt;This blog is that story.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6wy86wjzz81qt2ebkl4o.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6wy86wjzz81qt2ebkl4o.gif" alt="Why waste token" width="480" height="400"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Agentic AI in One Paragraph
&lt;/h2&gt;

&lt;p&gt;If you’re working with agentic AI frameworks like &lt;strong&gt;CrewAI&lt;/strong&gt;, &lt;strong&gt;LangGraph&lt;/strong&gt;, or &lt;strong&gt;AutoGen&lt;/strong&gt;, you already know the basics: agents powered by LLMs collaborate, passing structured information between each other to complete tasks. Think of each agent like a focused worker, one might gather data, another might summarize it, and another might generate recommendations.&lt;/p&gt;

&lt;p&gt;But here’s the catch: &lt;strong&gt;every time they exchange data&lt;/strong&gt;, it costs tokens. And in LLMs, tokens mean time, money, and performance limits.&lt;/p&gt;

&lt;p&gt;That’s where TOON comes in.&lt;/p&gt;




&lt;h2&gt;
  
  
  So... What Is TOON, Really?
&lt;/h2&gt;

&lt;p&gt;TOON stands for &lt;strong&gt;Token-Oriented Object Notation&lt;/strong&gt;. It’s a new serialization format that’s designed specifically for LLM communication. Like JSON, it can represent nested, structured data. But unlike JSON, it avoids all the extra syntax, no curly braces, no repeated field names, and no need for quotes around every string.&lt;/p&gt;

&lt;p&gt;Here’s a side-by-side comparison to make that clearer:&lt;/p&gt;

&lt;h3&gt;
  
  
  A Simple JSON:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Alice"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;85&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bob"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;92&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Same in TOON:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data[2]{id,name,score}:
  1,Alice,85
  2,Bob,92
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;TOON is compact, readable, and tailor-made for LLMs. But I wanted to know: &lt;strong&gt;how much better is it, really?&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The POC I Built
&lt;/h2&gt;

&lt;p&gt;Before diving into results, here are the exact two scripts used for benchmarking. These are available publicly so readers can clone and run them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Test 1 (Complex Nested JSON)&lt;/strong&gt;: &lt;a href="https://github.com/trickste/toontoon/blob/main/toon_test_1.py" rel="noopener noreferrer"&gt;https://github.com/trickste/toontoon/blob/main/toon_test_1.py&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test 2 (Flat JSON)&lt;/strong&gt;: &lt;a href="https://github.com/trickste/toontoon/blob/main/toon_test_2.py" rel="noopener noreferrer"&gt;https://github.com/trickste/toontoon/blob/main/toon_test_2.py&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These scripts serialize the same dataset into different formats and measure how many tokens each representation consumes when passed through an LLM.&lt;/p&gt;

&lt;p&gt;To really understand TOON, I built a &lt;strong&gt;two-agent proof of concept&lt;/strong&gt; powered by &lt;strong&gt;Ollama (LLaMA 3.1)&lt;/strong&gt; running locally. The idea was simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agent 1&lt;/strong&gt; (Data Generator): produces structured data in Python&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent 2&lt;/strong&gt; (LLM Analyzer): consumes that data in various formats and explains it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But here’s the twist: I fed the data into Agent 2 using &lt;strong&gt;six different serialization formats&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JSON&lt;/li&gt;
&lt;li&gt;JSON (Compact)&lt;/li&gt;
&lt;li&gt;YAML&lt;/li&gt;
&lt;li&gt;XML&lt;/li&gt;
&lt;li&gt;CSV&lt;/li&gt;
&lt;li&gt;TOON&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And for each one, I measured:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How many &lt;strong&gt;tokens&lt;/strong&gt; the format used&lt;/li&gt;
&lt;li&gt;Whether the LLM could interpret it clearly&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  How the Data Looks Across Formats
&lt;/h2&gt;

&lt;p&gt;Below are simplified examples to help you visualize how &lt;strong&gt;flat&lt;/strong&gt; and &lt;strong&gt;nested&lt;/strong&gt; JSON convert into TOON.&lt;/p&gt;




&lt;h3&gt;
  
  
  Flat JSON Example
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;JSON&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Alice"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;85&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bob"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;92&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Charlie"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;78&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;TOON&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data[3]{id,name,score}:
  1,Alice,85
  2,Bob,92
  3,Charlie,78
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Complex Nested JSON Example
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;JSON&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Alice"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;85&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"contact"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"alice@example.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"phone"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"123-456"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"math"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"science"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"projects"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AI Lab"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"year"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2022&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Robotics"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"year"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2023&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bob"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;92&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"contact"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bob@example.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"phone"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"789-012"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"physics"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"projects"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Quantum Lab"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"year"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2021&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;TOON&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data[2]{id,name,score,contact,tags,projects}:
  1,Alice,85,{'email': 'alice@example.com', 'phone': '123-456'},['math','science'],[{'title':'AI Lab','year':2022},{'title':'Robotics','year':2023}]
  2,Bob,92,{'email': 'bob@example.com', 'phone': '789-012'},['physics'],[{'title':'Quantum Lab','year':2021}]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where things get interesting. You can &lt;em&gt;see&lt;/em&gt; immediately that for simple rows, TOON is incredibly compact... but as soon as nested structures appear, TOON begins embedding Python-like lists and dictionaries inside its rows.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Results: Token Counts by Format
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Test 1: Complex Nested Data (contacts, tags, projects)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;th&gt;Token Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CSV&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;58&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TOON&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;118&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JSON Compact&lt;/td&gt;
&lt;td&gt;102&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;YAML&lt;/td&gt;
&lt;td&gt;135&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;XML&lt;/td&gt;
&lt;td&gt;178&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JSON&lt;/td&gt;
&lt;td&gt;203&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Test 2: Simple Flat Data (id, name, score)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;th&gt;Token Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CSV&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TOON&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;33&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JSON Compact&lt;/td&gt;
&lt;td&gt;38&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;YAML&lt;/td&gt;
&lt;td&gt;51&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;XML&lt;/td&gt;
&lt;td&gt;71&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JSON&lt;/td&gt;
&lt;td&gt;77&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🧠 Key Takeaway: CSV Isn’t Always Enough
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Wait... CSV is cheaper than TOON in both cases?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Yes, but there’s a catch.&lt;/p&gt;

&lt;p&gt;CSV is great for flat data. But as soon as your data includes &lt;strong&gt;nested objects&lt;/strong&gt;, &lt;strong&gt;lists&lt;/strong&gt;, or &lt;strong&gt;hierarchies&lt;/strong&gt;, CSV becomes... lossy. You end up flattening objects into strings. You lose type safety. You lose clarity.&lt;/p&gt;

&lt;p&gt;In contrast, &lt;strong&gt;TOON maintains full structure&lt;/strong&gt;, while being much leaner than JSON.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;JSON&lt;/strong&gt; repeats every key&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;XML&lt;/strong&gt; doubles them with open and close tags&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;YAML&lt;/strong&gt; saves space but isn’t built for tabular rows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TOON&lt;/strong&gt; gives you schema + data, with nearly CSV-level efficiency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s why TOON is rising fast in agentic tools. It’s &lt;strong&gt;not about beating CSV&lt;/strong&gt;, but about &lt;strong&gt;replacing JSON&lt;/strong&gt; in places where token count actually matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned (And What You Can Try)
&lt;/h2&gt;

&lt;p&gt;This POC taught me far more than just how different serialization formats compare. It revealed how LLMs &lt;em&gt;actually behave&lt;/em&gt; when they are fed structured data, and why some formats may be popular even when they don’t always win the token race.&lt;/p&gt;

&lt;p&gt;Here are the real lessons that stood out.&lt;/p&gt;

&lt;h3&gt;
  
  
  TOON shines when structure matters
&lt;/h3&gt;

&lt;p&gt;If your data has any level of hierarchy, TOON keeps the meaning intact while still being much more compact than JSON. It lets you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;preserve nested objects&lt;/li&gt;
&lt;li&gt;keep column-like readability&lt;/li&gt;
&lt;li&gt;maintain type hints&lt;/li&gt;
&lt;li&gt;avoid repeated keys&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This balance explains why TOON is becoming a popular choice in agentic systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  CSV gives the fewest tokens, but at a real cost
&lt;/h3&gt;

&lt;p&gt;CSV had the lowest token counts in both tests. That part is true.&lt;/p&gt;

&lt;p&gt;But CSV comes with limitations that make it hard to use in multi-agent AI workflows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it cannot represent nested structures&lt;/li&gt;
&lt;li&gt;lists and objects get flattened into strings&lt;/li&gt;
&lt;li&gt;type information is lost&lt;/li&gt;
&lt;li&gt;you cannot reliably reconstruct the original JSON&lt;/li&gt;
&lt;li&gt;LLMs sometimes misinterpret flattened mixed content&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So CSV wins in raw token count, but loses the moment your agents need anything more than simple rows.&lt;/p&gt;

&lt;h3&gt;
  
  
  JSON Compact is a practical middle ground
&lt;/h3&gt;

&lt;p&gt;Compact JSON performed far better than formatted JSON. It:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;keeps full structure&lt;/li&gt;
&lt;li&gt;removes all whitespace&lt;/li&gt;
&lt;li&gt;tokenizes efficiently&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is still more verbose than TOON for flat data, but more predictable for nested data.&lt;/p&gt;

&lt;h3&gt;
  
  
  TOON is optimized for LLM clarity, not just token savings
&lt;/h3&gt;

&lt;p&gt;When LLMs read TOON, the schema line gives them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;number of items&lt;/li&gt;
&lt;li&gt;field names&lt;/li&gt;
&lt;li&gt;expected structure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This helps models interpret the data more reliably than CSV or XML.&lt;/p&gt;

&lt;p&gt;In real agent workflows, clarity often beats a small token savings.&lt;/p&gt;

&lt;h3&gt;
  
  
  Nested data changes everything
&lt;/h3&gt;

&lt;p&gt;In the nested test, TOON did not produce the smallest token count because the inline lists and dicts added overhead. This showed a valuable insight:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;TOON works best for uniform row-like structures. For deep nesting, compact JSON may be more efficient.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is important because it means serialization choices should match your data shape, not simply rely on trends.&lt;/p&gt;

&lt;h3&gt;
  
  
  The real takeaway
&lt;/h3&gt;

&lt;p&gt;Each format has strengths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CSV&lt;/strong&gt;: smallest tokens, weakest structure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TOON&lt;/strong&gt;: great structure-to-token ratio, very LLM friendly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JSON Compact&lt;/strong&gt;: predictable, solid for nested data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;YAML&lt;/strong&gt;: readable, moderate token usage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;XML&lt;/strong&gt;: expressive but token-heavy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the biggest thing we learned:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;TOON is becoming popular not because it always produces the smallest token count, but because it gives you structure, compactness, and LLM-friendly clarity in one place.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If your agents need to share structured data and reason about it reliably, TOON is a strong contender.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tools I Used
&lt;/h2&gt;

&lt;p&gt;Here’s the stack I used for this POC:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Python 3.10+&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ollama&lt;/strong&gt; running &lt;strong&gt;LLaMA3:latest&lt;/strong&gt; locally&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;tiktoken&lt;/strong&gt; for token counting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;python-toon&lt;/strong&gt; (fallback inline encoder included)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CSV / JSON / YAML / XML&lt;/strong&gt; modules&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agents are simple but modular, you can replace Agent 1 with an API call or Agent 2 with a more sophisticated analysis.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;This POC showed that serialization shapes how agents think. TOON isn’t always the smallest, but it delivers a strong balance of structure and clarity for agentic workflows. Test different formats with your own data and see what your LLM responds to best.&lt;/p&gt;

&lt;p&gt;Thanks for reading!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>python</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I Was Done Getting Answers - So I Built RAG That Asks Questions Too</title>
      <dc:creator>Nishant prakash</dc:creator>
      <pubDate>Wed, 12 Nov 2025 18:12:44 +0000</pubDate>
      <link>https://dev.to/nishant_prakash_780f5d541/i-was-done-getting-answers-so-i-built-rag-that-asks-questions-too-1c2e</link>
      <guid>https://dev.to/nishant_prakash_780f5d541/i-was-done-getting-answers-so-i-built-rag-that-asks-questions-too-1c2e</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“What if a RAG system could not only fetch information but also reason about it, critique itself, and write a report, all autonomously?”&lt;/em&gt;&lt;br&gt;
That question sent me down a rabbit hole that ended with &lt;strong&gt;Data-Inspector&lt;/strong&gt;, a proof-of-concept &lt;strong&gt;Agentic RAG&lt;/strong&gt; pipeline built with Ollama, LangChain, Tavily, and Streamlit.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8cdjseraqpstwz83wplc.jpeg" alt="Rag was not enough" width="800" height="578"&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  The Spark — From RAG to &lt;em&gt;Agentic&lt;/em&gt; RAG
&lt;/h2&gt;

&lt;p&gt;Traditional RAG systems are brilliant at &lt;em&gt;retrieval and response&lt;/em&gt;, but not at &lt;em&gt;reasoning or reflection.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;They typically:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Retrieve documents relevant to a query.&lt;/li&gt;
&lt;li&gt;Feed them to a large language model (LLM).&lt;/li&gt;
&lt;li&gt;Generate an answer that sounds confident, even when it’s wrong.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The model reads, but it doesn’t &lt;em&gt;think.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;So I began wondering: what if we could &lt;strong&gt;assign roles&lt;/strong&gt; inside the RAG flow?&lt;br&gt;
One agent fetches data, another summarizes, another synthesizes, another critiques, like a research team working in harmony.&lt;/p&gt;

&lt;p&gt;That’s how &lt;strong&gt;Data-Inspector&lt;/strong&gt; was born, a system that doesn’t just “search and answer,” but “reads, reasons, and reviews.”&lt;/p&gt;


&lt;h2&gt;
  
  
  What Exactly &lt;em&gt;Is&lt;/em&gt; Agentic RAG?
&lt;/h2&gt;

&lt;p&gt;Before diving into code, let’s unpack what this buzzword really means.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agentic RAG (Retrieval-Augmented Generation)&lt;/strong&gt; is an evolution of the classic RAG pipeline.&lt;br&gt;
While traditional RAG enhances an LLM with external knowledge, &lt;strong&gt;Agentic RAG gives that process a mind of its own.&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  From Static Pipelines to Autonomous Reasoners
&lt;/h3&gt;

&lt;p&gt;In standard RAG, you have a single, linear pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Retrieve → Generate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It’s powerful but static, there’s no reflection, no iteration, and no specialization.&lt;/p&gt;

&lt;p&gt;Agentic RAG transforms this static chain into a &lt;strong&gt;network of intelligent roles&lt;/strong&gt;, each responsible for one cognitive task:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Retrieve → Understand → Synthesize → Critique → Generate → (Loop back if needed)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every role acts as an &lt;em&gt;agent&lt;/em&gt;, capable of reasoning over its inputs, producing structured outputs, and handing them off to the next stage.&lt;/p&gt;




&lt;h3&gt;
  
  
  The Key Principles Behind Agentic RAG
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt; &lt;strong&gt;Role-based Autonomy&lt;/strong&gt;
Each agent (retriever, summarizer, critic, etc.) has a clearly defined job and communicates via structured data (JSON, markdown).&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;This modularity allows independent improvement of each skill, like retraining just your summarizer agent for better factual grounding.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt; &lt;strong&gt;Reflection Loops&lt;/strong&gt;
Agentic systems don’t stop at the first output. They evaluate and refine.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;This is what turns a “talkative assistant” into a “thoughtful collaborator.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dynamic Knowledge Access&lt;/strong&gt;&lt;br&gt;
Instead of relying only on a static vector database, agentic systems can trigger live searches, query APIs, or even plan multi-step reasoning chains.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Transparency &amp;amp; Explainability&lt;/strong&gt;&lt;br&gt;
Each stage produces interpretable intermediate artifacts, summaries, reviews, critique logs, making the system auditable and debuggable.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Common Architectures of Agentic RAG
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Architecture Type&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Example Use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Planner–Executor Loop&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A planning agent decomposes a task, executors handle retrieval and summarization.&lt;/td&gt;
&lt;td&gt;Workflow orchestration in research assistants.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Critic–Refiner Loop&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The system critiques its own output and regenerates it.&lt;/td&gt;
&lt;td&gt;Self-RAG, Self-Refine, Reflexion.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multi-Agent Collaboration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multiple specialized agents work in a pipeline, passing structured outputs downstream.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Data-Inspector&lt;/strong&gt;😛&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The approach I took, &lt;strong&gt;multi-agent collaboration&lt;/strong&gt;, felt the most natural.&lt;br&gt;
Each Python class became a self-contained professional: retriever, summarizer, synthesizer, and critic, all orchestrated by a pipeline.&lt;/p&gt;


&lt;h2&gt;
  
  
  Architecture Overview — A RAG System with Personality
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Data-Inspector/
├── agents/
│   ├── retriever.py       # Retrieval
│   ├── summarizer.py      # Summarization
│   ├── synthesizer.py     # Knowledge fusion
│   └── critic.py          # Review / Reflection
├── rag/
│   ├── chunker.py         # Document processing
│   └── vectorstore.py     # Vector memory (optional)
├── pipeline.py            # Agentic orchestration
└── ui_streamlit.py        # Interactive interface
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Each component acts like a neuron in a cognitive system, independent yet collaborative.&lt;/p&gt;


&lt;h2&gt;
  
  
  Retrieval — Learning to Find Relevant Knowledge
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;Retriever&lt;/strong&gt; is powered by the Tavily API. It’s the system’s scout, locating relevant information for the query.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WebRetriever&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_urls&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_sources&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])[:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_sources&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Unlike traditional RAG’s static embeddings, this retrieves &lt;strong&gt;live knowledge&lt;/strong&gt;, keeping the system temporally aware and factually updated.&lt;/p&gt;




&lt;h2&gt;
  
  
  Chunking — Learning to Read Like a Human
&lt;/h2&gt;

&lt;p&gt;HTML pages are noisy. The &lt;code&gt;chunker.py&lt;/code&gt; module cleans and splits them into coherent text segments.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;prepare_chunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_html&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;cleaned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;clean_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_html&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;chunk_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Breaking long text into overlapping chunks lets the summarizer think locally while preserving context globally, just like a human scanning through paragraphs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Summarization — Turning Reading Into Understanding
&lt;/h2&gt;

&lt;p&gt;Each chunk passes through the &lt;strong&gt;SummarizerAgent&lt;/strong&gt;, guided by a structured system prompt.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;SYSTEM_SUMMARIZER&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
You are a precise technical summarizer...
Return JSON with: key_points[], methods, evidence[], limitations[]
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sample output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"RAG vs Fine-Tuning"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"key_points"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"RAG adapts faster"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Fine-tuning offers deeper control"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"limitations"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Depends on retrieval quality"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All agents speak in JSON, a shared language that prevents context drift and ensures machine-readable collaboration.&lt;/p&gt;




&lt;h2&gt;
  
  
  Synthesis — Connecting the Dots
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;SynthesisAgent&lt;/strong&gt; merges multiple summaries into a unified comparative analysis.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;synthesize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;summaries&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;bulletized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;key_points&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;summaries&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;System:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;User: Query: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;bulletized&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the model evolves from “reader” to “analyst,” forming relationships between insights and organizing them logically.&lt;/p&gt;




&lt;h2&gt;
  
  
  Critique — Giving the System a Conscience
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;CriticAgent&lt;/strong&gt; inspects the synthesized narrative and calls out weak logic or missing perspectives.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;review&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;synthesis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;summaries&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;System:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;User: Query: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;SYNTHESIS:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;synthesis&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"missing_perspectives"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Data bias"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"weak_arguments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Unsupported claims about fine-tuning benefits"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"overall_risk"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"medium"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This reflective loop transforms a basic RAG pipeline into a &lt;strong&gt;self-aware reasoning system.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Report Generation — From Thought to Thesis
&lt;/h2&gt;

&lt;p&gt;Finally, all insights are compiled into a Markdown report via &lt;code&gt;pipeline.py&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;report_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
System:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;SYSTEM_REPORT&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

User: Query: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
SYNTHESIS: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;synthesis&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
CRITIC REVIEW: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;review&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
Write final report in Markdown.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="n"&gt;report_md&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;report_llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;report_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result reads like an academic mini-paper:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Executive summary&lt;/li&gt;
&lt;li&gt;Comparative analysis&lt;/li&gt;
&lt;li&gt;Decision framework&lt;/li&gt;
&lt;li&gt;Risks and gaps&lt;/li&gt;
&lt;li&gt;References&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system doesn’t just compute, it &lt;em&gt;articulates&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Agentic RAG Outperforms Traditional RAG
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Traditional RAG&lt;/th&gt;
&lt;th&gt;Agentic RAG (Data-Inspector)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Architecture&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single linear chain&lt;/td&gt;
&lt;td&gt;Multi-agent collaboration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Learning Behavior&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Retrieval + generation only&lt;/td&gt;
&lt;td&gt;Retrieval + reasoning + reflection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Error Handling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None — one-shot generation&lt;/td&gt;
&lt;td&gt;Built-in self-critique loop&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Explainability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Opaque output&lt;/td&gt;
&lt;td&gt;Transparent intermediate JSONs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Adaptability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Static embeddings&lt;/td&gt;
&lt;td&gt;Dynamic web retrieval + modular agents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Output Depth&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fluent but shallow&lt;/td&gt;
&lt;td&gt;Analytical, reference-backed synthesis&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Agentic RAG = Traditional RAG + Cognition.&lt;/strong&gt;&lt;br&gt;
It elevates retrieval-augmented generation into &lt;em&gt;reason-augmented generation.&lt;/em&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prompts are contracts.&lt;/strong&gt;
Each agent must have a clear, bounded responsibility, otherwise, outputs collapse into noise.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Autonomy is discipline disguised as freedom.&lt;/strong&gt;
Structured interaction enables creativity without chaos.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Critique breeds truth.&lt;/strong&gt;
The CriticAgent was the breakthrough, the moment the system began questioning itself, quality skyrocketed.&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Looking Ahead
&lt;/h2&gt;

&lt;p&gt;Agentic RAG hints at a future where models won’t just &lt;em&gt;generate answers&lt;/em&gt; but will &lt;em&gt;collaborate intelligently.&lt;/em&gt;&lt;br&gt;
When Data-Inspector finished its first report, it didn’t feel like I’d run code, it felt like I’d led a discussion with a team of invisible colleagues.&lt;/p&gt;


&lt;h2&gt;
  
  
  Explore the Project
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/trickste/data-inspector" rel="noopener noreferrer"&gt;Data-Inspector — Agentic RAG Demo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Run it locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
streamlit run app/ui_streamlit.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Final Reflection
&lt;/h2&gt;

&lt;p&gt;What began as a question: &lt;em&gt;“Can RAG think critically?”&lt;/em&gt; -&amp;gt; evolved into an experiment in &lt;strong&gt;digital reasoning.&lt;/strong&gt;&lt;br&gt;
And maybe that’s the trajectory AI will take next:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;from systems that &lt;em&gt;answer questions&lt;/em&gt; to systems that &lt;em&gt;question their own answers.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>python</category>
      <category>programming</category>
    </item>
    <item>
      <title>No OpenAI API? No Problem. Build RAG Locally with Ollama and FastAPI.</title>
      <dc:creator>Nishant prakash</dc:creator>
      <pubDate>Thu, 06 Nov 2025 14:09:22 +0000</pubDate>
      <link>https://dev.to/nishant_prakash_780f5d541/no-openai-api-no-problem-build-rag-locally-with-ollama-and-fastapi-5g3c</link>
      <guid>https://dev.to/nishant_prakash_780f5d541/no-openai-api-no-problem-build-rag-locally-with-ollama-and-fastapi-5g3c</guid>
      <description>&lt;p&gt;I built a &lt;strong&gt;fully local Retrieval-Augmented Generation (RAG)&lt;/strong&gt; system that lets a &lt;strong&gt;Llama 3 model&lt;/strong&gt; answer questions about my own PDFs and Markdown files, no cloud APIs, no external servers, all running on my machine.&lt;/p&gt;

&lt;p&gt;It’s powered by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Streamlit&lt;/strong&gt; for the frontend&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FastAPI&lt;/strong&gt; for the backend&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ChromaDB&lt;/strong&gt; for vector storage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ollama&lt;/strong&gt; to run Llama 3 locally&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system ingests documents, chunks and embeds them, retrieves relevant parts for a query, and feeds them into Llama 3 to generate grounded answers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqz1umdeesz0cgu9jcvsr.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqz1umdeesz0cgu9jcvsr.jpg" alt="You might need some help" width="800" height="999"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 Introduction – Why Go Local?
&lt;/h2&gt;

&lt;p&gt;It started with a simple frustration, I had a bunch of private PDFs and notes I wanted to query like ChatGPT, but without sending anything to the cloud.&lt;br&gt;
LLMs are powerful, but they &lt;strong&gt;don’t know your documents&lt;/strong&gt;. RAG changes that, it gives the model a “working memory” by feeding it relevant chunks of your data each time you ask a question.&lt;/p&gt;

&lt;p&gt;So, I decided to build my own mini-ChatGPT for personal docs, everything self-hosted, modular, and transparent.&lt;br&gt;
The goals were simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Upload docs → Ask questions → Get answers &lt;strong&gt;with citations&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Stay &lt;strong&gt;offline&lt;/strong&gt; and &lt;strong&gt;private&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Learn the &lt;strong&gt;moving parts&lt;/strong&gt; of a RAG pipeline by hand, not just through frameworks like LangChain.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Project Repository&lt;/strong&gt;: &lt;a href="https://github.com/trickste/raga" rel="noopener noreferrer"&gt;github.com/trickste/raga&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  🏗 Architecture Overview
&lt;/h2&gt;

&lt;p&gt;The setup runs in two clear phases: &lt;strong&gt;ingestion&lt;/strong&gt; and &lt;strong&gt;querying&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyf9sek1j7itbue2j7g2b.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyf9sek1j7itbue2j7g2b.jpg" alt="RAG Flow Diagram" width="721" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Streamlit UI&lt;/strong&gt; lets users upload documents and ask questions interactively.&lt;br&gt;
The &lt;strong&gt;FastAPI backend&lt;/strong&gt; handles everything else, text extraction, embedding, search, and invoking the LLM.&lt;/p&gt;

&lt;p&gt;This separation makes it easy to debug, extend, and even swap components later (e.g., replace Chroma with Pinecone, or Llama 3 with Mistral).&lt;/p&gt;


&lt;h2&gt;
  
  
  Code Walkthrough
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. Ingestion – Reading and Chunking Documents
&lt;/h3&gt;

&lt;p&gt;The ingestion pipeline starts by reading PDFs or Markdown files and turning them into clean text chunks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_bytes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_text_from_pdf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# via PyMuPDF
&lt;/span&gt;    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endswith&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.markdown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_text_from_markdown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;file_bytes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ignore&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;chunk_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;      &lt;span class="c1"&gt;# split into ~500-char chunks
&lt;/span&gt;    &lt;span class="n"&gt;num_added&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;add_texts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;num_added&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Chunking was trickier than expected.&lt;br&gt;
Too long, and the model forgets details. Too short, and it loses context.&lt;br&gt;
After experimenting, I settled around &lt;strong&gt;300–500 words per chunk&lt;/strong&gt; with slight overlaps to maintain continuity.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;CHUNK_SIZE&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;OVERLAP&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;CHUNK_SIZE&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That overlap (~10–15%) turned out to be a small tweak that made a big difference in retrieval quality.&lt;/p&gt;




&lt;h3&gt;
  
  
  2. Vector Storage – Semantic Search with ChromaDB
&lt;/h3&gt;

&lt;p&gt;Each chunk is embedded using &lt;strong&gt;SentenceTransformers (all-MiniLM-L6-v2)&lt;/strong&gt; and stored in a local ChromaDB collection.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;embedding_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SentenceTransformer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all-MiniLM-L6-v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chromadb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PersistentClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./chroma_db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;collection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_or_create_collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;langrag_docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_texts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedding_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;normalize_embeddings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;metadatas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;ids&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nb"&gt;hex&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each document is now represented as a &lt;strong&gt;vector&lt;/strong&gt; in a 384-dimensional space.&lt;br&gt;
When a query comes in, we embed the question and retrieve the &lt;strong&gt;top-k most similar chunks&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_texts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;user_query&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;n_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;retrieved_chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where the “retrieval” in Retrieval-Augmented Generation happens.&lt;/p&gt;




&lt;h3&gt;
  
  
  3. Querying – Feeding the Context to the LLM
&lt;/h3&gt;

&lt;p&gt;Once we have the most relevant chunks, we build a &lt;strong&gt;grounded prompt&lt;/strong&gt; and send it to &lt;strong&gt;Llama 3&lt;/strong&gt; via &lt;strong&gt;Ollama&lt;/strong&gt;’s local API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_answer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a document QA assistant. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Answer strictly using only the CONTEXT below.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CONTEXT:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;QUESTION: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3:latest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434/api/chat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This prompt discipline was &lt;strong&gt;critical&lt;/strong&gt;.&lt;br&gt;
If you don’t tell the model to stick to the context, it will happily hallucinate.&lt;br&gt;
By enforcing “If it’s not in the context, say you don’t know,” we drastically improved trustworthiness.&lt;/p&gt;


&lt;h3&gt;
  
  
  4. The FastAPI Backend – Tying It All Together
&lt;/h3&gt;

&lt;p&gt;FastAPI glues it all: ingestion, querying, and LLM invocation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/add_docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_docs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;UploadFile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;File&lt;/span&gt;&lt;span class="p"&gt;(...)):&lt;/span&gt;
    &lt;span class="n"&gt;file_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;num_chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;process_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Added &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;num_chunks&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; chunks from &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query_docs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;QueryRequest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_texts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;n_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_answer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This clean separation of endpoints made debugging painless.&lt;br&gt;
Every step logs info: embeddings added, chunks retrieved, distances scored, and LLM latency.&lt;/p&gt;


&lt;h3&gt;
  
  
  5. Streamlit Frontend – A Minimal, Interactive UI
&lt;/h3&gt;

&lt;p&gt;The frontend makes it fun, drag and drop a file, type a question, and watch it respond.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;file_uploader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Upload PDF or Markdown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text_input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ask a question about your documents:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;button&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Get Answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;API_URL&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;**Answer:**&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It’s only ~30 lines of Streamlit, but it transforms the project into a usable app.&lt;/p&gt;




&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Chunking Is an Art
&lt;/h3&gt;

&lt;p&gt;Small, overlapping chunks worked better than large ones.&lt;br&gt;
Breaking text at &lt;strong&gt;semantic boundaries&lt;/strong&gt; (paragraphs or sections) gave cleaner retrievals.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Quality of Retrieval Beats Quantity
&lt;/h3&gt;

&lt;p&gt;Feeding too many chunks diluted answers.&lt;br&gt;
3 relevant chunks &amp;gt; 10 vague ones.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Prompt Grounding Changes Everything
&lt;/h3&gt;

&lt;p&gt;Explicitly instructing the LLM not to make things up was the single most effective fix for hallucination.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Local Models Are Ready for Prime Time
&lt;/h3&gt;

&lt;p&gt;Running &lt;strong&gt;Llama 3 via Ollama&lt;/strong&gt; felt just as smooth as using an API, but faster, cheaper, and private.&lt;br&gt;
And yes, no API keys or rate limits, WOOHOO !!!&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Observe Everything
&lt;/h3&gt;

&lt;p&gt;Logging every stage, chunk sizes, retrieval scores, final prompts, made debugging feel scientific rather than guesswork.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion – My Own “ChatGPT for PDFs”
&lt;/h2&gt;

&lt;p&gt;At the end of this build, I had a working &lt;strong&gt;Local RAG Assistant&lt;/strong&gt;, a tiny offline system that could read, index, and reason about my documents.&lt;br&gt;
It runs entirely on my laptop, keeps my data private, and helped me deeply understand &lt;strong&gt;how modern LLM pipelines actually work&lt;/strong&gt; under the hood.&lt;/p&gt;

&lt;p&gt;There’s plenty of room to grow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add &lt;strong&gt;source citations&lt;/strong&gt; in answers.&lt;/li&gt;
&lt;li&gt;Support more file types (Word, HTML, etc.).&lt;/li&gt;
&lt;li&gt;Experiment with &lt;strong&gt;different embedding models&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Add caching or user authentication for a real-world app.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But most importantly, it taught me how retrieval, embeddings, and prompt engineering combine to make language models truly useful.&lt;/p&gt;

&lt;p&gt;If you’ve been thinking about building something similar, &lt;strong&gt;just start&lt;/strong&gt;.&lt;br&gt;
It’s incredibly rewarding to see your own files talk back intelligently.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Happy building and may your vectors always find their nearest neighbors.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>openai</category>
      <category>ai</category>
      <category>mlops</category>
      <category>python</category>
    </item>
    <item>
      <title>Teaching AI to Take Initiative – Building a Self-Thinking App with LangGraph and Ollama</title>
      <dc:creator>Nishant prakash</dc:creator>
      <pubDate>Tue, 04 Nov 2025 16:38:38 +0000</pubDate>
      <link>https://dev.to/nishant_prakash_780f5d541/teaching-ai-to-take-initiative-building-a-self-thinking-app-with-langgraph-and-ollama-2h9j</link>
      <guid>https://dev.to/nishant_prakash_780f5d541/teaching-ai-to-take-initiative-building-a-self-thinking-app-with-langgraph-and-ollama-2h9j</guid>
      <description>&lt;h2&gt;
  
  
  Introduction – When AI Becomes a Recruiter
&lt;/h2&gt;

&lt;p&gt;Imagine uploading a job description and a candidate’s resume, and an AI instantly evaluates the match, gives ATS scores, writes tailored feedback, and even generates interview questions.&lt;br&gt;
No spreadsheets. No manual parsing. Just structured, explainable intelligence.&lt;/p&gt;

&lt;p&gt;That’s what I set out to build an &lt;strong&gt;Agentic AI Resume Evaluator&lt;/strong&gt; powered by &lt;strong&gt;LangGraph&lt;/strong&gt;, &lt;strong&gt;Ollama&lt;/strong&gt;, and &lt;strong&gt;FastAPI&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Project Repository:&lt;/strong&gt; &lt;a href="https://github.com/trickste/ntrvur" rel="noopener noreferrer"&gt;github.com/trickste/ntrvur&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this blog, I’ll walk through how I stitched together an end-to-end agentic system that autonomously reasons, uses tools, and collaborates with sub-agents to produce actionable hiring insights.&lt;/p&gt;

&lt;p&gt;By the end, you’ll see how agentic architectures, local LLMs, and good old Python orchestration come together to make AI do work, not just talk.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpeh9cjkx4ydjj3bm0ina.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpeh9cjkx4ydjj3bm0ina.jpg" alt="Agent?Really?" width="800" height="882"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  What Is Agentic AI, Really?
&lt;/h2&gt;

&lt;p&gt;Agentic AI means giving AI the ability to think, decide, and act toward a goal, much like a human agent.&lt;br&gt;
Instead of “just predicting text,” the AI plans its steps:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;“Extract candidate details → Compute ATS score → Evaluate → Review → Merge → Return final report.”
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That requires:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tools&lt;/strong&gt; – for specific actions (e.g., ATS computation, text extraction).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Memory &amp;amp; State&lt;/strong&gt; – to carry information between steps.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reasoning Loop&lt;/strong&gt; – to know when to invoke which tool.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s where frameworks like LangChain and its orchestration layer LangGraph shine, they let you turn an LLM into a multi-step problem solver.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Tech Stack – Tools Behind the Magic
&lt;/h2&gt;

&lt;p&gt;Before diving into code, let's introduce the key players in this adventure:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LangGraph&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Builds and executes the agent workflow as a directed graph. Each node performs a task (ATS, experience extraction, etc.).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LangChain (Community)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Provides integrations and structured tool definitions (like &lt;code&gt;StructuredTool.from_function&lt;/code&gt;).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ollama&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Runs local LLMs (like &lt;code&gt;llama3&lt;/code&gt; or &lt;code&gt;phi3&lt;/code&gt;) offline, via &lt;code&gt;ollama.Client&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FastAPI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Wraps everything in a clean REST API (&lt;code&gt;/api/evaluate&lt;/code&gt;) for users to upload JDs &amp;amp; resumes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;scikit-learn&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Handles ATS scoring via TF-IDF cosine similarity.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PyMuPDF (fitz)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Extracts text from PDF resumes reliably.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;With this stack, I had the ingredients for an AI that could think (LLM via Ollama), remember/learn (via LangChain’s memory or retrieval components), act (use tools or retrieve info, potentially via MCP or LangChain tools), and be deployed (FastAPI serving a LangGraph-managed workflow). Now it was time to design the brain of the operation – the agent’s architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Designing an Agentic Workflow (Architecture)
&lt;/h2&gt;

&lt;p&gt;The system runs like a small team of AI specialists working in sync. Once a JD and resume are uploaded, FastAPI triggers a LangGraph workflow, a series of smart nodes that extract the candidate’s name, compute the ATS score, and infer years of experience. A local Ollama LLM then wraps these findings into structured JSON.&lt;/p&gt;

&lt;p&gt;From there, the Evaluator Agent analyzes the match and drafts interview questions, while the Reviewer Agent critiques and improves them. A final Synthesizer merges both perspectives into one coherent report, a clean orchestration of reasoning, review, and refinement.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2xyg61n1gcgcemxthcd.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2xyg61n1gcgcemxthcd.jpg" alt="Flow Architecture" width="701" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each part operates independently but shares structured state through LangGraph.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let’s unpack the pieces.
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Agentic Evaluation with LangGraph
&lt;/h3&gt;

&lt;p&gt;LangGraph was the backbone of my orchestration.&lt;br&gt;
I defined a &lt;code&gt;StateGraph&lt;/code&gt; with typed state fields using &lt;strong&gt;Pydantic&lt;/strong&gt;, and created sequential nodes that act like tools in a workflow.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# agentic_evaluator.py
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;EvaluationState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;jd_text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;resume_text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;ATS_SCORE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;YEARS_OF_EXPERIENCE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;CANDIDATE_NAME&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Nodes as Actions
&lt;/h4&gt;

&lt;p&gt;Each node modifies and returns the updated state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ats_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;EvaluationState&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;ats_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jd_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;resume_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[ATS] → &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_copy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;experience_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;EvaluationState&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;experience_extractor_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jd_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Experience] → &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_copy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, a small &lt;strong&gt;LLM finalizer&lt;/strong&gt; via Ollama neatly packages all extracted values as JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;llm_finalize_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;EvaluationState&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOllama&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ATS_SCORE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ATS_SCORE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YEARS_OF_EXPERIENCE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;YEARS_OF_EXPERIENCE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CANDIDATE_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CANDIDATE_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Given this data, return valid JSON:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;LangGraph made it effortless to visualize this reasoning loop as a flowchart.&lt;br&gt;
Each node is deterministic, while the last one invokes reasoning, the true “thinking” step.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 2: Evaluator Agent – LLM as the Hiring Analyst
&lt;/h3&gt;

&lt;p&gt;Once the structured MCP (Model Context Protocol)-like output was ready, I passed it into an &lt;strong&gt;Evaluator Agent&lt;/strong&gt;, an Ollama-powered LLM acting as a hiring analyst.&lt;/p&gt;

&lt;p&gt;It uses templated system and user prompts stored as Markdown:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;app/prompts/evaluator_system.md
app/prompts/evaluator_user.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The system prompt strictly enforces a JSON schema with ATS score, resume feedback, and 10–15 interview questions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OllamaLLM&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This agent reads the JD and resume, scores the match, and auto-generates a structured evaluation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Reviewer Agent – The Senior Interviewer
&lt;/h3&gt;

&lt;p&gt;Next, the &lt;strong&gt;Reviewer Agent&lt;/strong&gt; steps in, another Ollama instance prompted with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The JD and resume text&lt;/li&gt;
&lt;li&gt;Candidate experience level&lt;/li&gt;
&lt;li&gt;The evaluator’s generated questions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Its role: review the quality of those questions, detect missing areas, and produce a refined list.&lt;br&gt;
This agent embodies &lt;strong&gt;human-in-the-loop reasoning&lt;/strong&gt;, automated quality control.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;review_json&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_reviewer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;jd_text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;jd_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;resume_text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;resume_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;questions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;questions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;years_of_experience&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;years_of_experience&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output includes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"QUESTION_REVIEW"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"missing_topics"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"CI/CD"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Cloud Costing"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"improvement_suggestions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Add a behavioral question"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"UPDATED_QUESTIONS"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Explain blue/green deployments?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"How do you manage secrets in CI/CD?"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Synthesizer – The Final Merge
&lt;/h3&gt;

&lt;p&gt;Finally, I wrote a &lt;code&gt;merge_evaluator_and_reviewer&lt;/code&gt; function to combine both agents’ results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;merge_evaluator_and_reviewer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;evaluator_json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reviewer_json&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;candidate_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;iter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;evaluator_json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;evaluator_json&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;candidate_name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;updated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;candidate_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ATS_SCORE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ATS_SCORE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RESUME_FEEDBACK&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RESUME_FEEDBACK&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FINAL_QUESTIONS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromkeys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INTERVIEW_QUESTIONS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;reviewer_json&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UPDATED_QUESTIONS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="p"&gt;)),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;QUESTIONS_REVIEW_SUMMARY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;reviewer_json&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;QUESTION_REVIEW&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;updated&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result is a &lt;strong&gt;rich, structured payload&lt;/strong&gt; ready for dashboards or downstream workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: FastAPI – Making It Real
&lt;/h3&gt;

&lt;p&gt;To expose the system, I used FastAPI for a clean HTTP interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@router.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/evaluate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;EvaluateResponse&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jd_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;UploadFile&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resume_pdf&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;UploadFile&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;jd_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;jd_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;resume_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;resume_pdf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;resume_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_text_from_pdf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resume_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;mcp_output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_agentic_evaluation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jd_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resume_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;eval_json&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_evaluator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jd_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resume_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mcp_output&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CANDIDATE_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;review_json&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_reviewer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jd_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resume_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;eval_json&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INTERVIEW_QUESTIONS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;mcp_output&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YEARS_OF_EXPERIENCE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;merge_evaluator_and_reviewer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;eval_json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;review_json&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The beauty? Upload a JD and a resume, and within seconds you get:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"payload"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"John Doe"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"ATS_SCORE"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;84.6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"RESUME_FEEDBACK"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"Python"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Strong"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"Docker"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Moderate"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"FINAL_QUESTIONS"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Explain container orchestration?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Describe CI/CD best practices?"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What I Learned
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. &lt;strong&gt;Agentic AI is orchestration, not magic.&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Breaking down tasks into nodes and state transitions makes the system explainable and debuggable. LangGraph turns “AI reasoning” into visual, modular code.&lt;/p&gt;

&lt;h4&gt;
  
  
  2. &lt;strong&gt;Local LLMs are surprisingly capable.&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Running &lt;code&gt;llama3&lt;/code&gt; and &lt;code&gt;phi3&lt;/code&gt; locally through Ollama gave me full control no latency, no API keys. And switching models was one-line config.&lt;/p&gt;

&lt;h4&gt;
  
  
  3. &lt;strong&gt;Strict JSON prompts save hours.&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Enforcing schema at the prompt level eliminated parsing headaches. (Still, I built &lt;code&gt;coerce_json()&lt;/code&gt; to auto-clean malformed outputs.)&lt;/p&gt;

&lt;h4&gt;
  
  
  4. &lt;strong&gt;Agents need reviewers too.&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Having a second LLM review the first’s work brought real reliability a taste of collaborative multi-agent systems.&lt;/p&gt;

&lt;h4&gt;
  
  
  5. &lt;strong&gt;FastAPI makes deployment feel native.&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Integrating async routes and file uploads made serving AI models feel as easy as any microservice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion – Agentic AI for Real-World Automation
&lt;/h3&gt;

&lt;p&gt;This project taught me that &lt;strong&gt;Agentic AI isn’t just theory&lt;/strong&gt;, it’s a practical blueprint for intelligent systems.&lt;br&gt;
By combining &lt;strong&gt;LangGraph&lt;/strong&gt; (structured reasoning), &lt;strong&gt;Ollama&lt;/strong&gt; (local intelligence), and &lt;strong&gt;FastAPI&lt;/strong&gt; (deployment glue), I built a hiring-assistant AI that’s fast, explainable, and private.&lt;/p&gt;

&lt;p&gt;But more than that, it changed how I think about AI apps, not as “chatbots,” but as &lt;strong&gt;collaborative systems of agents&lt;/strong&gt;, each with a role, a goal, and a reasoning process.&lt;/p&gt;

&lt;p&gt;If you’re looking to dive deeper, here are the docs that shaped my journey:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.langchain.com/langgraph" rel="noopener noreferrer"&gt;LangGraph Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ollama.ai" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; – Run local LLMs easily&lt;/li&gt;
&lt;li&gt;&lt;a href="https://python.langchain.com/docs" rel="noopener noreferrer"&gt;LangChain Tools and Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://fastapi.tiangolo.com" rel="noopener noreferrer"&gt;FastAPI&lt;/a&gt; – Lightning-fast web APIs&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://modelcontextprotocol.io" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt; – Emerging standard for LLM-tool integration&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Repo Snapshot
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;app/
├── core/config.py
├── routers/evaluate.py
├── services/
│   ├── agentic_evaluator.py
│   ├── evaluator_agent.py
│   ├── reviewer_agent.py
│   ├── synthesizer_agent.py
│   └── ollama_client.py
├── prompts/
│   ├── evaluator_system.md
│   ├── evaluator_user.md
│   ├── reviewer_system.md
│   └── reviewer_user.md
└── utils/json_safety.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Final Thought:&lt;/strong&gt;&lt;br&gt;
Agentic AI is not about creating a single genius model, it’s about creating a &lt;em&gt;team&lt;/em&gt; of smart, cooperative components.&lt;br&gt;
And once they start working together you realize, automation can actually &lt;em&gt;think&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Happy building, fellow tinkerers!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>python</category>
      <category>learning</category>
    </item>
  </channel>
</rss>
