<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Simran Shaikh</title>
    <description>The latest articles on DEV Community by Simran Shaikh (@simranshaikh20_50).</description>
    <link>https://dev.to/simranshaikh20_50</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3468700%2Fe6dd9e59-6abe-4982-bc8c-7b4fb376005b.jpeg</url>
      <title>DEV Community: Simran Shaikh</title>
      <link>https://dev.to/simranshaikh20_50</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/simranshaikh20_50"/>
    <language>en</language>
    <item>
      <title>BugWhisperer: How I Finally Finished My Abandoned GitHub Issue Analyzer (8 Months Later) with GitHub Copilot</title>
      <dc:creator>Simran Shaikh</dc:creator>
      <pubDate>Fri, 29 May 2026 04:37:57 +0000</pubDate>
      <link>https://dev.to/simranshaikh20_50/bugwhisperer-how-i-finally-finished-my-abandoned-github-issue-analyzer-8-months-later-with-4ll8</link>
      <guid>https://dev.to/simranshaikh20_50/bugwhisperer-how-i-finally-finished-my-abandoned-github-issue-analyzer-8-months-later-with-4ll8</guid>
      <description>&lt;h2&gt;
  
  
  My Submission
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GitHub Repo:&lt;/strong&gt; &lt;a href="https://github.com/SimranShaikh20/BugWhisperer" rel="noopener noreferrer"&gt;https://github.com/SimranShaikh20/BugWhisperer&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Live Demo:&lt;/strong&gt; &lt;a href="https://bugwhisperer.msusimran20.workers.dev" rel="noopener noreferrer"&gt;https://bugwhisperer.msusimran20.workers.dev&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  The Project I Abandoned — September 2025
&lt;/h2&gt;

&lt;p&gt;Eight months ago I had a problem.&lt;/p&gt;

&lt;p&gt;Our dev team had 60+ open GitHub issues across 3 repos. Nobody knew what to fix first. Every sprint planning meeting turned into a 45-minute debate about priority. I thought: what if a script could just read all the issues and tell us what is most critical?&lt;/p&gt;

&lt;p&gt;So I started building. Here is the entire codebase from that day:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="c1"&gt;# TODO: fix this later
&lt;/span&gt;&lt;span class="n"&gt;GITHUB_TOKEN&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;put_your_token_here&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_issues&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# this doesnt work properly
&lt;/span&gt;    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.github.com/repos/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;repo&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/issues&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# just printing for now
&lt;/span&gt;    &lt;span class="c1"&gt;# TODO: parse response properly
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_issue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;issue&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# wanted to use openai here but ran out of time
&lt;/span&gt;    &lt;span class="k"&gt;pass&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;repo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;facebook/react&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# hardcoded lol
&lt;/span&gt;    &lt;span class="nf"&gt;get_issues&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# analyze_issue() # commented out, broken
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;done?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yes. That is it. A &lt;code&gt;print(r)&lt;/code&gt; that prints the raw response object. A function called &lt;code&gt;analyze_issue&lt;/code&gt; that literally does nothing. A hardcoded repo URL.&lt;/p&gt;

&lt;p&gt;My commit message on September 15, 2025:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"initial attempt - giving up for now, too complicated"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And that was it. The repo sat there for 8 months untouched.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuu3ffvqtfjvl9wk45k9m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuu3ffvqtfjvl9wk45k9m.png" alt="Add link" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6p6mahfcpytypug4p8rc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6p6mahfcpytypug4p8rc.png" alt="Sprint" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I Finally Came Back
&lt;/h2&gt;

&lt;p&gt;When I saw the GitHub Finish-Up-A-Thon challenge, this project was the first thing that came to mind. The idea was always solid. The problem was real. I just never had the right tools or the time to push through the hard parts.&lt;/p&gt;

&lt;p&gt;This time I had &lt;strong&gt;GitHub Copilot&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built — The After
&lt;/h2&gt;

&lt;p&gt;BugWhisperer is now a full AI-powered GitHub Issue Command Center.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Paste any GitHub repo URL → Get instant AI triage of every open issue in seconds.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here is everything it does now:&lt;/p&gt;

&lt;h3&gt;
  
  
  AI Analysis for Every Issue
&lt;/h3&gt;

&lt;p&gt;Every open issue gets analyzed by Groq's Llama 3.1 AI and returns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Root Cause&lt;/strong&gt; — what is likely causing this issue&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Suggested Fix&lt;/strong&gt; — concrete actionable solution in plain English&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complexity&lt;/strong&gt; — Low / Medium / High&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Priority&lt;/strong&gt; — Low / Medium / High / Critical&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Kanban Priority Board
&lt;/h3&gt;

&lt;p&gt;Instead of a boring list, issues are sorted into a 4-column visual board:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔴 Critical — fix immediately&lt;/li&gt;
&lt;li&gt;🟠 High — this sprint&lt;/li&gt;
&lt;li&gt;🟡 Medium — next sprint&lt;/li&gt;
&lt;li&gt;🟢 Low — backlog&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  AI Sprint Planner
&lt;/h3&gt;

&lt;p&gt;One click generates a complete 2-week sprint plan with time estimates, recommended team size, and a week-by-week breakdown.&lt;/p&gt;

&lt;h3&gt;
  
  
  Export to Markdown
&lt;/h3&gt;

&lt;p&gt;Download the entire analysis as a &lt;code&gt;.md&lt;/code&gt; file — paste it into your GitHub Wiki, Notion, or Linear instantly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Post Analysis to GitHub
&lt;/h3&gt;

&lt;p&gt;Post the AI analysis directly as a formatted comment on any GitHub issue — no copy-pasting, no leaving the app.&lt;/p&gt;




&lt;h2&gt;
  
  
  How GitHub Copilot Made This Possible
&lt;/h2&gt;

&lt;p&gt;This is the honest story of where Copilot actually helped me — not a vague "it was amazing" but the specific moments where it unblocked me.&lt;/p&gt;

&lt;h3&gt;
  
  
  Moment 1: Understanding My Own Broken Code
&lt;/h3&gt;

&lt;p&gt;I opened the old &lt;code&gt;main.py&lt;/code&gt; in VS Code with Copilot and typed:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"What is this code trying to do and what is broken?"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Copilot told me immediately:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No authentication headers on the GitHub API call (hence the silent failures I was getting 8 months ago)&lt;/li&gt;
&lt;li&gt;Response was never parsed — &lt;code&gt;print(r)&lt;/code&gt; just prints the response object, not the data&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;analyze_issue&lt;/code&gt; was completely empty&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It then suggested the complete fixed GitHub API call with proper authentication, pagination, and error handling. What I could not figure out in a week 8 months ago, Copilot explained and fixed in 2 minutes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Moment 2: Reliable JSON from an LLM
&lt;/h3&gt;

&lt;p&gt;The hardest technical problem was getting structured JSON output from an AI model reliably. Every time I tried, the model would add markdown fences around the JSON, add explanation text, or sometimes just break the format entirely.&lt;/p&gt;

&lt;p&gt;I described the problem to Copilot and it wrote this system prompt pattern that solved it completely:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;You&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;are&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;senior&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;software&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;engineer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;analyzing&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;GitHub&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;issues.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;Respond&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;ONLY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;valid&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;JSON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;format.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;No&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;other&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;text.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;No&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;markdown.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;No&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;explanation.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Just&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;JSON.&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"root_cause"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"suggested_fix"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"complexity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Low or Medium or High"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"priority"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Low or Medium or High or Critical"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key Copilot taught me: "No other text. No markdown. No explanation." is far more reliable than just saying "respond in JSON format." That single insight saved me probably 3 hours of prompt debugging.&lt;/p&gt;

&lt;h3&gt;
  
  
  Moment 3: The Kanban Board Component
&lt;/h3&gt;

&lt;p&gt;I had never built a Kanban board before. I asked Copilot:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Write a React component that takes an array of issues each with a priority field and displays them in 4 columns: Critical, High, Medium, Low"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It wrote the entire working component in one response. I just connected my data to it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Moment 4: The Sprint Planner Prompt
&lt;/h3&gt;

&lt;p&gt;I described what I wanted — take all analyzed issues and generate a 2-week sprint plan. Copilot wrote the complete AI prompt, the API call, the JSON parsing, and even suggested adding a &lt;code&gt;team_size_recommended&lt;/code&gt; field that I had not thought of. That one suggestion made the feature significantly more useful.&lt;/p&gt;




&lt;h2&gt;
  
  
  Technical Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;User Input (GitHub URL)
        ↓
Cloudflare Worker
        ↓
GitHub REST API → Fetch open issues (authenticated)
        ↓
Groq API (Llama 3.1 8b instant) → Analyze each issue
        ↓
Returns: root_cause, suggested_fix, complexity, priority
        ↓
React Frontend → Kanban Board
        ↓
Optional: Sprint Planner (second Groq call)
Optional: Export Markdown report
Optional: Post to GitHub as comment
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Tech Stack
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;React + TanStack + Tailwind CSS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;Cloudflare Workers (serverless)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI&lt;/td&gt;
&lt;td&gt;Groq API — llama-3.1-8b-instant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub&lt;/td&gt;
&lt;td&gt;GitHub REST API v3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hosting&lt;/td&gt;
&lt;td&gt;Cloudflare Workers (free tier)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Why Groq and Not OpenAI?
&lt;/h3&gt;

&lt;p&gt;Speed and cost. Groq's inference is genuinely fast — analyzing 10 issues takes about 8 seconds total. The free tier gives 14,400 requests per day which is more than enough. OpenAI costs money. Groq is free. For a developer tool that should be accessible to everyone, free wins.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Cloudflare Workers?
&lt;/h3&gt;

&lt;p&gt;Lovable generates TanStack projects configured for Cloudflare Workers by default. The global edge deployment means the app loads fast anywhere in the world. And the free tier covers 100,000 requests per day which is more than enough.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Before vs After Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;September 2025&lt;/th&gt;
&lt;th&gt;June 2026&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Code&lt;/td&gt;
&lt;td&gt;47 lines of broken Python&lt;/td&gt;
&lt;td&gt;Full React + Cloudflare app&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UI&lt;/td&gt;
&lt;td&gt;Terminal only&lt;/td&gt;
&lt;td&gt;Beautiful dark web interface&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;pass&lt;/code&gt; — literally empty&lt;/td&gt;
&lt;td&gt;Groq Llama 3.1 analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub&lt;/td&gt;
&lt;td&gt;Hardcoded &lt;code&gt;facebook/react&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Any public repo URL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Analysis&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Root cause, fix, complexity, priority&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Planning&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;AI 2-week sprint planner&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Export&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;One-click markdown report&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deployment&lt;/td&gt;
&lt;td&gt;Never ran successfully&lt;/td&gt;
&lt;td&gt;Live at workers.dev&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;$0 (did nothing)&lt;/td&gt;
&lt;td&gt;$0 (all free APIs)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. GitHub Copilot is best for bridging knowledge gaps&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I did not know how to build a Kanban board. I did not know the best prompt pattern to force structured JSON from an LLM. I did not know how Cloudflare Workers reads environment variables differently from Node.js. Copilot filled every one of these gaps instantly — not by writing the whole app for me, but by answering the exact question I was stuck on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Old ideas are often good ideas&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;My September 2025 script had the right idea. The problem was real. The solution direction was correct. It just needed time, better tools, and a reason to push through the hard parts. Do not delete your old projects — they often contain your best thinking from a time when you were closest to the problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Constraints force better design&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Using only free APIs forced me to be efficient. Limiting to 300 tokens per analysis and using the fastest available model made the app feel instant. If I had unlimited budget I might have built something slower and more expensive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. The finish line matters more than the plan&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;BugWhisperer v2 looks nothing like what I imagined when I wrote that Python script in September 2025. It is a web app, not a CLI script. It uses Groq, not OpenAI. It runs on Cloudflare, not my laptop. Every single implementation detail changed. But the core idea — help developers understand their GitHub issues faster — stayed exactly the same. Ship the idea, not the plan.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;👉 &lt;strong&gt;Live Demo:&lt;/strong&gt; &lt;a href="https://bugwhisperer.msusimran20.workers.dev" rel="noopener noreferrer"&gt;https://bugwhisperer.msusimran20.workers.dev&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Test it with any of these repos:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;https://github.com/fastapi/fastapi&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;https://github.com/requests/requests&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;https://github.com/psf/black&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Or any public GitHub repo you are working on&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;GitHub Repo:&lt;/strong&gt; &lt;a href="https://github.com/SimranShaikh20/BugWhisperer" rel="noopener noreferrer"&gt;https://github.com/SimranShaikh20/BugWhisperer&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Private repo support (user provides their own token)&lt;/li&gt;
&lt;li&gt;GitHub Actions integration — auto-analyze on new issue creation&lt;/li&gt;
&lt;li&gt;Slack notifications for Critical priority issues&lt;/li&gt;
&lt;li&gt;VS Code extension&lt;/li&gt;
&lt;li&gt;Multi-repo comparison&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;If this project helped you think differently about your own abandoned side projects, drop a reaction — it genuinely helps this submission and motivates me to keep building.&lt;/p&gt;

&lt;p&gt;And if you have an unfinished project sitting somewhere, this challenge is your sign to finally ship it. The idea you abandoned is probably better than you remember. 🚀&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built for the &lt;a href="https://dev.to/challenges/github-finish-up-a-thon"&gt;GitHub Finish-Up-A-Thon Challenge&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Powered by Groq AI + GitHub Copilot + Cloudflare Workers&lt;/em&gt;&lt;/p&gt;

</description>
      <category>githubchallenge</category>
      <category>githubcopilot</category>
      <category>webdev</category>
      <category>devchallenge</category>
    </item>
    <item>
      <title>I built an open-source AI agent that explains any ML model in plain English — real SHAP, real LIME, real bias detection</title>
      <dc:creator>Simran Shaikh</dc:creator>
      <pubDate>Mon, 18 May 2026 03:39:22 +0000</pubDate>
      <link>https://dev.to/simranshaikh20_50/i-built-an-open-source-ai-agent-that-explains-any-ml-model-in-plain-english-real-shap-real-lime-33ig</link>
      <guid>https://dev.to/simranshaikh20_50/i-built-an-open-source-ai-agent-that-explains-any-ml-model-in-plain-english-real-shap-real-lime-33ig</guid>
      <description>&lt;h2&gt;
  
  
  The problem I kept running into
&lt;/h2&gt;

&lt;p&gt;Every time I finished training a model, the same conversation happened:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Manager: "Why did it predict that?"&lt;br&gt;
Me: &lt;em&gt;opens SHAP plot&lt;/em&gt;&lt;br&gt;
Manager: &lt;em&gt;glazed eyes&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;SHAP and LIME are powerful — but they output numbers and plots that &lt;br&gt;
only data scientists can read. Nobody builds the bridge to plain English. &lt;br&gt;
Nobody automates the bias check. Nobody generates a report your legal &lt;br&gt;
team can actually use.&lt;/p&gt;

&lt;p&gt;So I built XAI-Agent to do all of that — powered by Hermes Agent's &lt;br&gt;
autonomous multi-step planning pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;

&lt;p&gt;Upload any trained ML model (.pkl) + dataset (.csv) → &lt;br&gt;
Hermes Agent runs 5 tools autonomously → &lt;br&gt;
You get a full plain-English explainability report in under 3 minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 5-step Hermes Agent pipeline:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;file_reader&lt;/code&gt; — loads model, auto-detects task type, picks right explainer&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;shap_analyzer&lt;/code&gt; — runs real SHAP, ranks all features by impact + direction&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;lime_explainer&lt;/code&gt; — explains 3 individual predictions in plain English&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;bias_checker&lt;/code&gt; — scans for demographic features, flags disparities&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;report_writer&lt;/code&gt; — writes structured Markdown report, downloadable instantly&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What makes this genuinely agentic
&lt;/h2&gt;

&lt;p&gt;Context flows between all 5 tools. The model type from Step 1 &lt;br&gt;
determines which SHAP explainer Step 2 uses. The feature ranking &lt;br&gt;
from Step 2 informs Step 3's LIME analysis. The bias verdict from &lt;br&gt;
Step 4 shapes Step 5's recommendations.&lt;/p&gt;

&lt;p&gt;It also handles a real edge case most tutorials miss: newer SHAP &lt;br&gt;
versions return 3D arrays &lt;code&gt;(samples, features, classes)&lt;/code&gt; instead of 2D. &lt;br&gt;
The agent detects this automatically and slices correctly — &lt;br&gt;
a bug that breaks every naive SHAP implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sample output
&lt;/h2&gt;

&lt;p&gt;Running on the breast cancer dataset (569 patients, 30 features):&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Executive Summary (auto-generated):&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This RandomForestClassifier was analyzed across 569 samples and &lt;br&gt;
30 features. The most influential predictor is 'worst area'. &lt;br&gt;
No demographic bias was detected.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;SHAP top features:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;worst area — 0.0756 — ↑ increases malignancy prediction&lt;/li&gt;
&lt;li&gt;worst concave points — 0.0538 — ↑ increases malignancy prediction
&lt;/li&gt;
&lt;li&gt;mean concave points — 0.0503 — ↑ increases malignancy prediction&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Prediction explained in plain English:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Row 0 — Predicted benign at 94% confidence.&lt;br&gt;
'worst area' was well below the malignancy threshold &lt;br&gt;
(impact: −0.141). 'worst concave points' also supported &lt;br&gt;
benign classification (impact: −0.089).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Why this matters beyond the challenge
&lt;/h2&gt;

&lt;p&gt;EU AI Act requires explainability for high-risk AI systems. &lt;br&gt;
GDPR gives citizens the right to explanation for automated decisions. &lt;br&gt;
US financial regulators require adverse action explanations for &lt;br&gt;
ML credit scoring.&lt;/p&gt;

&lt;p&gt;Existing tools (Fiddler, Arize, Arthur AI) cost $50K+/year. &lt;br&gt;
XAI-Agent is free, open-source, runs locally, works in 3 minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tech stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Hermes Agent (autonomous multi-step planning)&lt;/li&gt;
&lt;li&gt;SHAP + LIME (real explainability — not simulated)&lt;/li&gt;
&lt;li&gt;Streamlit (UI)&lt;/li&gt;
&lt;li&gt;scikit-learn, XGBoost, LightGBM&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/SimranShaikh20/xai-agent" rel="noopener noreferrer"&gt;https://github.com/SimranShaikh20/xai-agent&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;git clone &lt;a href="https://github.com/SimranShaikh20/xai-agent" rel="noopener noreferrer"&gt;https://github.com/SimranShaikh20/xai-agent&lt;/a&gt;&lt;br&gt;
pip install -r requirements.txt&lt;br&gt;
streamlit run app.py&lt;/p&gt;

&lt;p&gt;Test files (sample_model.pkl + sample_dataset.csv) included — &lt;br&gt;
runs in 3 minutes with zero extra setup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What model would YOU run this on first?&lt;/strong&gt; Drop it in the comments 👇&lt;/p&gt;

</description>
      <category>hermesagentchallenge</category>
      <category>devchallenge</category>
      <category>agents</category>
    </item>
    <item>
      <title>I Built an XAI Agent with Hermes Agent That Explains Any ML Model in Plain English — Here's Everything I Learned</title>
      <dc:creator>Simran Shaikh</dc:creator>
      <pubDate>Mon, 18 May 2026 03:39:17 +0000</pubDate>
      <link>https://dev.to/simranshaikh20_50/i-built-an-xai-agent-with-hermes-agent-that-explains-any-ml-model-in-plain-english-heres-3dfp</link>
      <guid>https://dev.to/simranshaikh20_50/i-built-an-xai-agent-with-hermes-agent-that-explains-any-ml-model-in-plain-english-heres-3dfp</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; I spent a weekend building XAI-Agent — an autonomous Hermes Agent pipeline that runs real SHAP + LIME analysis on any ML model and generates a plain-English explainability report. This post is everything I learned: how Hermes Agent's multi-step planning actually works, where it surprised me, where it frustrated me, and why I think it's the most underrated open-source agent framework right now.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Conversation That Started This
&lt;/h2&gt;

&lt;p&gt;Three weeks ago I was in a meeting presenting a RandomForest model I'd trained to predict customer churn. The model was good — 91% accuracy, solid precision-recall curve. I was proud of it.&lt;/p&gt;

&lt;p&gt;Then our Head of Product asked: &lt;em&gt;"But why does it predict churn for this specific customer?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I opened my Jupyter notebook. Showed the SHAP waterfall plot.&lt;/p&gt;

&lt;p&gt;She stared at it for five seconds.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Can you just... tell me in normal words?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That moment broke something in my brain. I'd spent three days training the model and thirty minutes on explainability. The explainability was useless to the person who needed it most.&lt;/p&gt;

&lt;p&gt;So I built XAI-Agent. And in doing so, I learned more about Hermes Agent than I expected.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is Hermes Agent, Actually?
&lt;/h2&gt;

&lt;p&gt;Before I get into what I built, let me give you the honest explanation I wish I'd had when I started.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hermes Agent is an open-source agentic framework built for multi-step autonomous task execution.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That sentence has a lot of words. Here's what it actually means in practice:&lt;/p&gt;

&lt;p&gt;Most "AI" tools you interact with are single-shot. You send a prompt, you get a response. Done. The model doesn't remember what it did, doesn't use the output of one step as input to the next, and doesn't make decisions about &lt;em&gt;how&lt;/em&gt; to approach a problem.&lt;/p&gt;

&lt;p&gt;Hermes Agent does all three of those things.&lt;/p&gt;

&lt;p&gt;It gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A planning loop&lt;/strong&gt; — the agent breaks down a task into steps before executing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool use&lt;/strong&gt; — distinct callable functions the agent can invoke in sequence&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context persistence&lt;/strong&gt; — output from Tool 1 is available to Tool 2, 3, 4, and 5&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Autonomous decision-making&lt;/strong&gt; — the agent decides &lt;em&gt;which&lt;/em&gt; tool to use and &lt;em&gt;when&lt;/em&gt;, based on what it finds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This sounds abstract. Let me make it concrete with what I actually built.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem I Was Solving (And Why It's Bigger Than You Think)
&lt;/h2&gt;

&lt;p&gt;Explainable AI (XAI) has a dirty secret: &lt;strong&gt;the tools exist, but the workflow is broken.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;SHAP and LIME are genuinely powerful libraries. But using them requires:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Writing custom Python code for each model type&lt;/li&gt;
&lt;li&gt;Knowing which explainer to use (TreeExplainer? KernelExplainer? DeepExplainer?)&lt;/li&gt;
&lt;li&gt;Interpreting the numerical output yourself&lt;/li&gt;
&lt;li&gt;Translating that into language a non-technical person understands&lt;/li&gt;
&lt;li&gt;Running a separate bias audit&lt;/li&gt;
&lt;li&gt;Writing a report that combines all of this&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's 4-6 hours of work per model, requiring a data scientist to babysit every step.&lt;/p&gt;

&lt;p&gt;And this isn't a niche problem. The &lt;strong&gt;EU AI Act&lt;/strong&gt; requires explainability for high-risk AI. &lt;strong&gt;GDPR Article 22&lt;/strong&gt; gives EU citizens the right to explanation for automated decisions. &lt;strong&gt;US financial regulators&lt;/strong&gt; require banks to explain ML-based credit decisions.&lt;/p&gt;

&lt;p&gt;The demand for XAI is going to explode over the next 3 years. The tools to deliver it efficiently don't exist yet.&lt;/p&gt;

&lt;p&gt;That's the gap I built XAI-Agent to fill.&lt;/p&gt;




&lt;h2&gt;
  
  
  How I Used Hermes Agent to Build It
&lt;/h2&gt;

&lt;p&gt;Here's the part I want to go deep on — because understanding &lt;em&gt;how&lt;/em&gt; Hermes Agent enabled this architecture is the most useful thing I can share.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Core Insight: Tools Are Not Functions
&lt;/h3&gt;

&lt;p&gt;When I first read about Hermes Agent's tool use, I thought of tools as just... functions. Like, the agent calls &lt;code&gt;shap_analyze()&lt;/code&gt; and gets back a result.&lt;/p&gt;

&lt;p&gt;That mental model is wrong, and it held me back for half a day.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tools in Hermes Agent are autonomous units of responsibility.&lt;/strong&gt; Each tool:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Has a single, clear job&lt;/li&gt;
&lt;li&gt;Can access shared agent state (what previous tools discovered)&lt;/li&gt;
&lt;li&gt;Makes decisions based on that state&lt;/li&gt;
&lt;li&gt;Produces output that enriches the shared state for future tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The difference sounds subtle. The impact on architecture is enormous.&lt;/p&gt;

&lt;p&gt;Here's what my 5-tool pipeline looks like and, more importantly, &lt;em&gt;why&lt;/em&gt; it's designed this way:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;HermesXAIAgent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Hermes Agent implementation — 5 autonomous tools
    Each tool feeds context into the next
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tool_file_reader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dataset_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target_col&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        TOOL 1: Inspection

        This isn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t just &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;load the files&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;. The tool makes decisions:
        - What type of model is this? (affects every downstream tool)
        - What is the task? (classification vs regression)  
        - Is there class imbalance? (affects which samples LIME picks)
        - Which SHAP explainer should Tool 2 use?

        All of this gets stored in self.results for Tools 2-5 to use.
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;model_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;  &lt;span class="c1"&gt;# e.g. "RandomForestClassifier"
&lt;/span&gt;
        &lt;span class="c1"&gt;# This decision propagates through the entire pipeline
&lt;/span&gt;        &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;classification&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;nunique&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;regression&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model_type&lt;/span&gt;  &lt;span class="c1"&gt;# Tool 2 reads this
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;               &lt;span class="c1"&gt;# Tools 3, 4, 5 read this
&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tool_shap_analyzer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        TOOL 2: Global Explainability

        Uses model_type from Tool 1 to auto-select the right explainer.
        A naive implementation would hardcode TreeExplainer.
        This tool makes an intelligent decision.
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;model_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Read from Tool 1
&lt;/span&gt;
        &lt;span class="n"&gt;TREE_MODELS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RandomForestClassifier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;XGBClassifier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...]&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;model_type&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;TREE_MODELS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# TreeExplainer: fast, exact, works with tree structure
&lt;/span&gt;            &lt;span class="n"&gt;explainer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;shap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TreeExplainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# KernelExplainer: slower, universal fallback
&lt;/span&gt;            &lt;span class="n"&gt;explainer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;shap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;KernelExplainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict_proba&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;background&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Handle SHAP version incompatibility — this one took me 2 hours
&lt;/span&gt;        &lt;span class="n"&gt;shap_raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;explainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;shap_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_sample&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shap_raw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;shap_vals&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;shap_raw&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;           &lt;span class="c1"&gt;# Old SHAP: list per class
&lt;/span&gt;        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;shap_raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndim&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;shap_vals&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;shap_raw&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="p"&gt;:,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;    &lt;span class="c1"&gt;# New SHAP: 3D array
&lt;/span&gt;        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;shap_vals&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;shap_raw&lt;/span&gt;             &lt;span class="c1"&gt;# Regression: use as-is
&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;shap_vals&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;shap_vals&lt;/span&gt;  &lt;span class="c1"&gt;# Tool 5 uses this
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;imp_df&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;importance_df&lt;/span&gt; &lt;span class="c1"&gt;# Tools 3, 5 use this
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See how Tool 2 reads &lt;code&gt;model_type&lt;/code&gt; from Tool 1's output? And produces &lt;code&gt;imp_df&lt;/code&gt; that Tool 5 will use? &lt;strong&gt;That's the Hermes Agent planning loop in action.&lt;/strong&gt; It's not a chain of independent function calls — it's a coherent reasoning process where every step builds on what came before.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Part That Surprised Me: Fallback Planning
&lt;/h3&gt;

&lt;p&gt;One of the things that makes Hermes Agent genuinely useful in production is that it handles failure gracefully within the planning loop.&lt;/p&gt;

&lt;p&gt;In my &lt;code&gt;tool_shap_analyzer&lt;/code&gt;, if the primary explainer fails (say, the model doesn't support TreeExplainer), the agent doesn't crash and show an error. It falls back to KernelExplainer, logs which method it used, and continues to Tool 3.&lt;/p&gt;

&lt;p&gt;This sounds simple. But it means the agent's output is always a complete report — not a half-finished analysis with an error message in the middle.&lt;/p&gt;

&lt;p&gt;That's the difference between a demo and a tool people actually use.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Part That Frustrated Me: Context Size
&lt;/h3&gt;

&lt;p&gt;Here's something nobody tells you about agentic pipelines: &lt;strong&gt;context accumulates fast&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;By the time I got to Tool 5 (report_writer), my agent state contained:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The full model object&lt;/li&gt;
&lt;li&gt;The original dataset (569 rows × 31 columns)&lt;/li&gt;
&lt;li&gt;The SHAP values array (150 samples × 30 features)&lt;/li&gt;
&lt;li&gt;Three LIME explanation objects&lt;/li&gt;
&lt;li&gt;Feature importance DataFrames&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's a lot of state to pass around. For my use case (Streamlit + local execution) it was fine. But if you're building a Hermes Agent pipeline for production use with large datasets, you need to think carefully about what you store in &lt;code&gt;self.results&lt;/code&gt; vs. what you compute on demand.&lt;/p&gt;

&lt;p&gt;My solution: I store only what downstream tools need. The raw SHAP array (large) gets converted to a summary DataFrame (small) before storage. The LIME explanation objects get converted to plain lists immediately.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Technical Deep Dive: SHAP + Hermes Agent
&lt;/h2&gt;

&lt;p&gt;Let me go deeper on the SHAP integration because this is where the agentic approach really pays off.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why SHAP Alone Isn't Enough
&lt;/h3&gt;

&lt;p&gt;SHAP gives you this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;worst area                  0.0756
worst concave points        0.0538
mean concave points         0.0503
worst perimeter             0.0489
worst radius                0.0401
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's useful to me. It's useless to my Head of Product.&lt;/p&gt;

&lt;p&gt;What she needs is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"The size of the tumor at its largest measurement ('worst area') is the single most important factor in this model's predictions. When this measurement is high, the model is significantly more likely to predict malignancy. Think of it as the model's primary red flag."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The Hermes Agent pipeline handles this translation. Tool 2 generates the numbers. Tool 5 converts them to prose using model-aware language. The agent knows it's a medical dataset (from the feature names) and calibrates its language accordingly.&lt;/p&gt;

&lt;h3&gt;
  
  
  The 3D SHAP Array Bug That Will Break Your Code
&lt;/h3&gt;

&lt;p&gt;I want to specifically call this out because it's a real issue that hit me hard and I've seen it break other implementations.&lt;/p&gt;

&lt;p&gt;SHAP's output format changed between versions. Older SHAP returns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Old SHAP (&amp;lt; 0.40): list of arrays, one per class
&lt;/span&gt;&lt;span class="n"&gt;shap_values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;array_class_0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;array_class_1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# each: (n_samples, n_features)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Newer SHAP (0.44+) returns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# New SHAP (&amp;gt;= 0.44): single 3D array
&lt;/span&gt;&lt;span class="n"&gt;shap_values&lt;/span&gt;  &lt;span class="c1"&gt;# shape: (n_samples, n_features, n_classes)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you write naive code like &lt;code&gt;shap_values[1]&lt;/code&gt; to get class 1, it works on old SHAP and silently returns wrong results (a column, not a matrix) on new SHAP.&lt;/p&gt;

&lt;p&gt;The Hermes Agent approach of having a dedicated &lt;code&gt;tool_shap_analyzer&lt;/code&gt; with explicit version handling is what caught this. Because the tool is isolated, I could add this logic cleanly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shap_raw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;sv&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;shap_raw&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shap_raw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;shap_raw&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;hasattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shap_raw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ndim&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;shap_raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndim&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;sv&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;shap_raw&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="p"&gt;:,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# New SHAP: take class 1 slice
&lt;/span&gt;&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;sv&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;shap_raw&lt;/span&gt;  &lt;span class="c1"&gt;# Regression or binary
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a good example of why agentic tool isolation matters. If my SHAP analysis was one giant function, this fix would be buried in 200 lines of code. As a distinct tool, it's 10 lines, clearly documented, easy to test independently.&lt;/p&gt;




&lt;h2&gt;
  
  
  Hermes Agent vs. Other Agentic Frameworks
&lt;/h2&gt;

&lt;p&gt;I know some of you are thinking: &lt;em&gt;"Why Hermes Agent? Why not LangChain, AutoGen, or CrewAI?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Fair question. Here's my honest take after building this:&lt;/p&gt;

&lt;h3&gt;
  
  
  LangChain
&lt;/h3&gt;

&lt;p&gt;LangChain is powerful but opinionated. It works best when you're chaining LLM calls together. For my use case — where the "tools" are Python scientific computing libraries, not LLM endpoints — LangChain felt like overkill. I'd be fighting its abstractions rather than using them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use LangChain when:&lt;/strong&gt; You're primarily chaining LLM calls, doing RAG, or need the massive ecosystem of pre-built integrations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Hermes Agent when:&lt;/strong&gt; You want clean, explicit tool definitions where you control exactly what runs and when.&lt;/p&gt;

&lt;h3&gt;
  
  
  AutoGen
&lt;/h3&gt;

&lt;p&gt;AutoGen's multi-agent conversation model is fascinating but complex. For a pipeline where the execution order is deterministic (always: inspect → SHAP → LIME → bias → report), AutoGen's flexibility is unnecessary complexity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use AutoGen when:&lt;/strong&gt; You need agents to negotiate with each other, have non-deterministic execution paths, or are building multi-agent debate systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Hermes Agent when:&lt;/strong&gt; You have a clear sequential task where each step has defined inputs and outputs.&lt;/p&gt;

&lt;h3&gt;
  
  
  CrewAI
&lt;/h3&gt;

&lt;p&gt;CrewAI is probably the closest conceptually — role-based agents with tool access. The main difference I found: Hermes Agent gives you more direct control over the planning loop. In CrewAI, the "crew" model adds abstraction that I didn't need.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use CrewAI when:&lt;/strong&gt; You want to model your problem as a team of specialized agents with roles and delegation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Hermes Agent when:&lt;/strong&gt; You want to own the planning logic explicitly and aren't sure yet what "roles" make sense for your problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  My Honest Assessment
&lt;/h3&gt;

&lt;p&gt;Hermes Agent's sweet spot is &lt;strong&gt;deterministic multi-step pipelines where you care about exactly what runs&lt;/strong&gt;. Scientific computing, data analysis, document processing, code generation — anything where you can define clear tool responsibilities and want explicit control over the execution flow.&lt;/p&gt;

&lt;p&gt;It's not trying to be everything. That's actually its strength.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'd Build Next
&lt;/h2&gt;

&lt;p&gt;After spending a weekend with Hermes Agent, here's what I'm thinking about:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Counterfactual explanations.&lt;/strong&gt; The natural next question after "why did the model predict X?" is "what would need to change to get a different prediction?" Hermes Agent would add a Tool 6 for this — using DiCE (Diverse Counterfactual Explanations) to generate &lt;em&gt;"if your tumor's worst area had been 15% smaller, the model would have predicted benign."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Neural network support.&lt;/strong&gt; Right now XAI-Agent handles tree models and most sklearn models. Adding SHAP DeepExplainer for PyTorch/TensorFlow would open up the majority of production models running in industry right now.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CI/CD integration.&lt;/strong&gt; The most powerful version of this isn't a web app — it's a tool that runs automatically every time a model is retrained, generates a comparison report (did the important features change? did bias metrics shift?), and posts it as a GitHub PR comment.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bigger Picture: Why Open-Source Agents Matter
&lt;/h2&gt;

&lt;p&gt;I want to end with something that's been on my mind since I started building this.&lt;/p&gt;

&lt;p&gt;XAI platforms from companies like Fiddler AI, Arize, and Arthur AI are excellent. They're also $30K–$100K+ per year. A startup, a researcher, a solo developer, a nonprofit building AI for healthcare in a developing country — none of them can afford that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The open-source AI agent ecosystem is the equalizer.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Hermes Agent running SHAP + LIME locally, generating a downloadable report, requiring nothing beyond a Python environment — that's accessible to anyone with a laptop.&lt;/p&gt;

&lt;p&gt;As AI becomes more embedded in consequential decisions (loan approvals, medical diagnoses, hiring, parole, content moderation), the ability to explain and audit those decisions cannot be a luxury that only well-funded companies can afford.&lt;/p&gt;

&lt;p&gt;Open, capable agent systems like Hermes Agent aren't just technically interesting. They're the infrastructure for making AI accountability universal.&lt;/p&gt;

&lt;p&gt;That's why I built XAI-Agent. That's why I'm writing about Hermes Agent. And that's why I think the work this community is doing with open-source agents genuinely matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/SimranShaikh20" rel="noopener noreferrer"&gt;github.com/SimranShaikh20/xai-agent&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/SimranShaikh20/xai-agent
&lt;span class="nb"&gt;cd &lt;/span&gt;xai-agent
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
streamlit run app.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test files (sample_model.pkl + sample_dataset.csv) are included. Full analysis in under 3 minutes.&lt;/p&gt;




&lt;h2&gt;
  
  
  Questions I'd Love to Discuss
&lt;/h2&gt;

&lt;p&gt;I'm genuinely curious about the community's experience:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Have you used Hermes Agent on something other than text/LLM tasks? What was the use case?&lt;/li&gt;
&lt;li&gt;How do you handle context accumulation in long agentic pipelines?&lt;/li&gt;
&lt;li&gt;What explainability features would make XAI-Agent actually useful for your work?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Drop them in the comments — I read and respond to everything 👇&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built for the &lt;a href="https://dev.to/challenges/hermes-agent"&gt;DEV Hermes Agent Challenge&lt;/a&gt; — May 2026&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Tags: #hermesagentchallenge #agents #ai #machinelearning #explainableai #opensource #python #streamlit&lt;/em&gt;&lt;/p&gt;

</description>
      <category>hermesagentchallenge</category>
      <category>devchallenge</category>
      <category>agents</category>
    </item>
    <item>
      <title>VibeSafe</title>
      <dc:creator>Simran Shaikh</dc:creator>
      <pubDate>Sat, 16 May 2026 11:07:38 +0000</pubDate>
      <link>https://dev.to/simranshaikh20_50/vibesafe-5252</link>
      <guid>https://dev.to/simranshaikh20_50/vibesafe-5252</guid>
      <description>&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;VibeSafe&lt;/strong&gt; — a privacy-first AI code auditor that analyzes your project and generates a &lt;strong&gt;Proof of Authorship certificate&lt;/strong&gt;, identifying your human architectural decisions versus AI-assisted patterns.&lt;/p&gt;

&lt;p&gt;The problem VibeSafe solves is real and growing: AI-assisted development is now standard practice, but that creates a trust gap. When submitting to a hackathon, applying for a job, or open-sourcing a project, reviewers increasingly ask — &lt;em&gt;"Did you actually build this, or did an AI?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;VibeSafe answers that question — not by detecting AI, but by &lt;strong&gt;surfacing your human decisions&lt;/strong&gt;: the architecture choices, the product instincts, the specific tradeoffs only you would have made.&lt;/p&gt;

&lt;h3&gt;
  
  
  How it works
&lt;/h3&gt;

&lt;p&gt;Drop your project files into VibeSafe. Gemma 4 31B reads your &lt;strong&gt;entire codebase in one prompt&lt;/strong&gt; (thanks to the 262K context window) and returns a structured report across four sections:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Section&lt;/th&gt;
&lt;th&gt;What you get&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;🔑 Proof of Authorship&lt;/td&gt;
&lt;td&gt;Your human design decisions vs AI patterns + originality score&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🔐 Security Audit&lt;/td&gt;
&lt;td&gt;Vulnerabilities, exposed secrets, injection risks with specific fixes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🧠 Logic Analysis&lt;/td&gt;
&lt;td&gt;Edge cases, dead code, race conditions, code quality score&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;📖 Plain English&lt;/td&gt;
&lt;td&gt;What your code actually does, explained simply&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;At the end, you download a &lt;strong&gt;signed Proof of Authorship certificate&lt;/strong&gt; — a plain-text document that lists every architectural decision that proves the project is genuinely yours.&lt;/p&gt;

&lt;h3&gt;
  
  
  Privacy first
&lt;/h3&gt;

&lt;p&gt;VibeSafe makes a direct API call from your browser to OpenRouter. No backend server. No database. No code stored anywhere. Your API key lives in React state only — closing the tab clears everything.&lt;/p&gt;




&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;🔗 &lt;strong&gt;Live App:&lt;/strong&gt; &lt;a href="https://vibesafe.lovable.app" rel="noopener noreferrer"&gt;vibesafe.lovable.app&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Quick walkthrough:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Step 1 — Enter your free OpenRouter API key&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2 — Upload your project files or paste code&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Drag and drop multiple files. VibeSafe accepts &lt;code&gt;.py&lt;/code&gt; &lt;code&gt;.js&lt;/code&gt; &lt;code&gt;.jsx&lt;/code&gt; &lt;code&gt;.ts&lt;/code&gt; &lt;code&gt;.tsx&lt;/code&gt; &lt;code&gt;.html&lt;/code&gt; &lt;code&gt;.css&lt;/code&gt; &lt;code&gt;.json&lt;/code&gt; &lt;code&gt;.go&lt;/code&gt; &lt;code&gt;.rs&lt;/code&gt; &lt;code&gt;.php&lt;/code&gt; &lt;code&gt;.sql&lt;/code&gt; and more.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3 — Watch Gemma 4 analyze your code&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The terminal-style loading screen shows exactly what Gemma 4 is doing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;› Connecting to Gemma 4 31B...              ✓
› Reading codebase structure...             ✓
› Running security vulnerability scan...    ✓
› Checking for exposed API keys...          ✓
› Analyzing logic and edge cases...         ✓
› Identifying human architectural decisions ✓
› Detecting AI-generated patterns...        ✓
› Generating Proof of Authorship...         █
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 4 — Get your full report&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/step4_screenshot" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/step4_screenshot" alt="Full 4-card report dashboard" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5 — Download your Proof of Authorship certificate&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
         VIBESAFE — PROOF OF AUTHORSHIP
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Generated: 2026-05-16T10:32:11.000Z
Analyzed by: Gemma 4 31B via OpenRouter
Files: app.py, utils.py, config.py

HUMAN CONTRIBUTION SCORE: 78/100

HUMAN ARCHITECTURAL DECISIONS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. Separation of database connection into get_db() factory
   Evidence: Explicit factory pattern in app.py

2. Stateless token design using user ID as token
   Evidence: /login returns raw user ID, deliberate tradeoff

3. Search and todos endpoints intentionally separated
   Evidence: distinct route handlers for different concerns

AI-ASSISTED PATTERNS DETECTED
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. Boilerplate Flask route scaffolding
2. Generic try/except error handling blocks

VERDICT: Solid architecture with critical SQL injection 
issues that need fixing before production deployment.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;🔗 &lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/SimranShaikh20/vibesafe" rel="noopener noreferrer"&gt;github.com/SimranShaikh20/vibesafe&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Tech Stack
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frontend:&lt;/strong&gt; React 18 + Vite + Tailwind CSS + Framer Motion&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;File handling:&lt;/strong&gt; react-dropzone&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Model:&lt;/strong&gt; Gemma 4 31B (&lt;code&gt;google/gemma-4-31b-it:free&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Provider:&lt;/strong&gt; OpenRouter free tier&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment:&lt;/strong&gt; Lovable&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Project Structure
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;vibesafe/&lt;/span&gt;
&lt;span class="s"&gt;├── src/&lt;/span&gt;
&lt;span class="s"&gt;│   ├── App.jsx&lt;/span&gt;
&lt;span class="s"&gt;│   └── components/&lt;/span&gt;
&lt;span class="s"&gt;│       ├── CodeUploader.jsx     ← drag-drop + paste + API key&lt;/span&gt;
&lt;span class="s"&gt;│       ├── LoadingAnalysis.jsx  ← animated terminal steps&lt;/span&gt;
&lt;span class="s"&gt;│       ├── ReportDashboard.jsx  ← 4-card report layout&lt;/span&gt;
&lt;span class="s"&gt;│       ├── AuthorshipCard.jsx   ← hero card + certificate export&lt;/span&gt;
&lt;span class="s"&gt;│       ├── SecurityCard.jsx     ← expandable vulnerability list&lt;/span&gt;
&lt;span class="s"&gt;│       ├── LogicCard.jsx        ← quality score + logic issues&lt;/span&gt;
&lt;span class="s"&gt;│       └── PlainEnglishCard.jsx ← plain language explanation&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  How I Used Gemma 4
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Model chosen: Gemma 4 31B Dense (&lt;code&gt;google/gemma-4-31b-it&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;This was a deliberate choice, not a default. Here is exactly why:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why not Gemma 4 4B or 2B (edge models)?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The small Gemma 4 models are remarkable for their size — running on a phone or Raspberry Pi is genuinely impressive. But VibeSafe needs to reason across an entire codebase simultaneously, identify cross-file architectural patterns, and generate nuanced judgments about human intent vs AI generation. A 4B model cannot hold enough context or produce reliable structured JSON for this level of analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why not Gemma 4 26B MoE?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The MoE model is excellent for throughput — activating only 3.8B parameters per token makes it fast and efficient. But for security analysis, I need consistent reasoning quality on every token. A dense model that activates all 31B parameters gives more reliable vulnerability detection. Missing one SQL injection is worse than slower inference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The killer feature: 262K context window&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is what makes the whole idea possible. I send the &lt;strong&gt;entire project&lt;/strong&gt; — every file concatenated with filename headers — in a single prompt. No chunking, no lost context between files, no missed cross-file dependencies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Every file in one shot&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;combined&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;files&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;f&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; 
  &lt;span class="s2"&gt;`\n\n=== FILE: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; ===\n&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// Single call to Gemma 4 — 262K tokens available&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://openrouter.ai/api/v1/chat/completions&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;google/gemma-4-31b-it:free&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;buildPrompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;combined&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
      &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4000&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The authorship prompt&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The most interesting engineering decision was how to prompt Gemma 4 to distinguish human decisions from AI patterns. Generic code review prompts don't work — I needed Gemma 4 to think specifically about &lt;em&gt;intent&lt;/em&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are VibeSafe. Your job is to identify what the human developer 
intentionally designed versus AI-generated or boilerplate code.

Look for:
- Architecture decisions that reflect product thinking
- Specific tradeoffs that reveal human judgment  
- Patterns that are generic and could be AI-generated
- Cross-file design consistency that shows planned thinking

Return structured JSON with human_decisions and ai_patterns arrays.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key insight: &lt;strong&gt;asking about intent produces better results than asking about code quality.&lt;/strong&gt; Gemma 4's reasoning capability is what makes this distinction meaningful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Results on VibeSafe's own code&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I ran VibeSafe on itself. Here is what Gemma 4 identified as my human decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Separating the authorship report as the hero card rather than security (product decision)&lt;/li&gt;
&lt;li&gt;Using a terminal aesthetic for a code tool (design decision)&lt;/li&gt;
&lt;li&gt;Direct browser-to-API calls instead of a backend (architecture decision — prioritizing privacy)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And what it flagged as AI-assisted:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Boilerplate Tailwind utility class combinations&lt;/li&gt;
&lt;li&gt;Standard React useState patterns&lt;/li&gt;
&lt;li&gt;Generic error boundary structure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Originality score: &lt;strong&gt;74/100&lt;/strong&gt;. That felt honest.&lt;/p&gt;




&lt;h3&gt;
  
  
  Why this matters beyond the hackathon
&lt;/h3&gt;

&lt;p&gt;The question &lt;em&gt;"did you build this?"&lt;/em&gt; is going to become one of the most important questions in software hiring, education, and open source in the next few years. Right now there is no good answer — just vibes and gut instinct.&lt;/p&gt;

&lt;p&gt;VibeSafe is a first attempt at making that question answerable. Not by detecting AI (which is an arms race), but by documenting human contribution in a structured, verifiable way.&lt;/p&gt;

&lt;p&gt;Gemma 4's 262K context and reasoning capability made this possible to build in a weekend. That says something about where open models are right now.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with React + Gemma 4 31B + OpenRouter free tier&lt;/em&gt;&lt;br&gt;
&lt;em&gt;GitHub: &lt;a href="https://github.com/SimranShaikh20/vibesafe" rel="noopener noreferrer"&gt;github.com/SimranShaikh20/vibesafe&lt;/a&gt;&lt;/em&gt;[=&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
    </item>
    <item>
      <title>I Built a Tool That Proves Your Code Is Yours — Here's What Gemma 4 Made Possible</title>
      <dc:creator>Simran Shaikh</dc:creator>
      <pubDate>Sat, 16 May 2026 11:01:48 +0000</pubDate>
      <link>https://dev.to/simranshaikh20_50/-i-built-a-tool-that-proves-your-code-is-yours-heres-what-gemma-4-made-possible-4glh</link>
      <guid>https://dev.to/simranshaikh20_50/-i-built-a-tool-that-proves-your-code-is-yours-heres-what-gemma-4-made-possible-4glh</guid>
      <description>&lt;p&gt;There is a question spreading quietly through the software industry right now. Hiring managers are asking it. Hackathon judges are asking it. Open source maintainers are asking it.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Did you actually build this?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Nobody has a good answer yet. I tried to build one — and Gemma 4 is the reason it worked.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem Nobody Is Talking About
&lt;/h2&gt;

&lt;p&gt;AI-assisted development has gone mainstream fast. Cursor, Copilot, Lovable, Bolt — developers are shipping real products with significant AI assistance, and that is genuinely fine. The tools exist, the skills are in using them well.&lt;/p&gt;

&lt;p&gt;But a trust gap is forming. When you submit a project to a hackathon, post a repo on GitHub, or show work in a job interview, reviewers are increasingly skeptical. The portfolio that used to signal skill now also signals a question mark.&lt;/p&gt;

&lt;p&gt;The current answer to "did you build this?" is essentially: trust me.&lt;/p&gt;

&lt;p&gt;That is not good enough. And trying to &lt;em&gt;detect&lt;/em&gt; AI-generated code is an arms race nobody will win — models improve, detection fails, repeat.&lt;/p&gt;

&lt;p&gt;I wanted a different approach: instead of detecting AI, &lt;strong&gt;document the human&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built: VibeSafe
&lt;/h2&gt;

&lt;p&gt;VibeSafe is a browser-based code auditor that takes your project files, sends them to Gemma 4 31B in a single prompt, and returns a &lt;strong&gt;Proof of Authorship certificate&lt;/strong&gt; — a structured document identifying your human architectural decisions versus AI-assisted patterns.&lt;/p&gt;

&lt;p&gt;The output looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HUMAN ARCHITECTURAL DECISIONS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. Separation of database connection into factory function
   Evidence: get_db() pattern used consistently across modules

2. Deliberate stateless token design
   Evidence: login() returns user ID directly — intentional tradeoff

3. Privacy-first architecture: no backend, direct browser API calls
   Evidence: all external calls made from frontend, no server layer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are the things only &lt;em&gt;I&lt;/em&gt; would have decided. The boilerplate React hooks and Tailwind utility classes? Gemma 4 flags those as AI-assisted. The product decisions, the tradeoffs, the specific ways the pieces connect? Those are mine.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Gemma 4 Specifically
&lt;/h2&gt;

&lt;p&gt;I tried this concept with smaller models first. It did not work.&lt;/p&gt;

&lt;p&gt;The problem is that distinguishing &lt;em&gt;intent&lt;/em&gt; from &lt;em&gt;output&lt;/em&gt; requires holding the entire codebase in mind simultaneously. A model that has only seen half your files cannot tell you whether your architectural choices are consistent across the project. It cannot spot that you made the same deliberate tradeoff in three different places — which is actually the strongest signal of human authorship.&lt;/p&gt;

&lt;h3&gt;
  
  
  The 262K Context Window Changes the Analysis
&lt;/h3&gt;

&lt;p&gt;Gemma 4's 262K context window means I send everything in one shot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;combined&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;files&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;f&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`\n\n=== FILE: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; ===\n&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// One prompt. Entire project. Gemma 4 sees everything at once.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://openrouter.ai/api/v1/chat/completions&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;google/gemma-4-31b-it:free&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;buildPrompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;combined&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No chunking. No lost context. No missed cross-file patterns. The model sees the whole picture before making any judgment — the same way a senior engineer would read a codebase before commenting on it.&lt;/p&gt;

&lt;h3&gt;
  
  
  31B Dense vs the MoE Model
&lt;/h3&gt;

&lt;p&gt;I specifically chose the 31B Dense model over the 26B MoE for this use case.&lt;/p&gt;

&lt;p&gt;The MoE model activates ~3.8B parameters per token — it is faster and more efficient, ideal for high-throughput applications. But security analysis needs consistent reasoning quality on every single token. Missing one vulnerability because a parameter set was not activated is worse than slower inference. For a tool that is auditing your code for real risks, I wanted the full model engaged on every decision.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reasoning Mode for Authorship Detection
&lt;/h3&gt;

&lt;p&gt;The part that surprised me most was how well Gemma 4 handles the authorship question when prompted correctly. Generic "review my code" prompts produce generic answers. But when you ask specifically about &lt;em&gt;intent&lt;/em&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Look for architectural decisions that reflect product thinking.
Look for specific tradeoffs that reveal human judgment.
Distinguish these from patterns that are generic and could be AI-generated.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gemma 4 produces genuinely insightful distinctions. It identified that my choice to put authorship as the hero card — not security — was a human product decision. It noticed the privacy-first architecture (no backend) as a deliberate tradeoff, not a default. It caught that I reused the same terminal aesthetic across components as a consistent design language.&lt;/p&gt;

&lt;p&gt;That is not code review. That is architectural reasoning.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Running VibeSafe on Itself Taught Me
&lt;/h2&gt;

&lt;p&gt;I ran VibeSafe on its own source code. The results were honest in a way I did not expect.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human decisions Gemma 4 identified:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Authorship card as hero feature (product decision, not default)&lt;/li&gt;
&lt;li&gt;Direct browser-to-API architecture (privacy tradeoff)&lt;/li&gt;
&lt;li&gt;Terminal aesthetic as unified design language&lt;/li&gt;
&lt;li&gt;Certificate export as plain text (accessibility over PDF complexity)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AI-assisted patterns it flagged:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standard Tailwind utility class combinations&lt;/li&gt;
&lt;li&gt;Boilerplate useState/useEffect patterns&lt;/li&gt;
&lt;li&gt;Generic error boundary structure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Originality score: 74/100&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That feels right. A good chunk of the implementation is standard React patterns. But the product decisions — what to build, how to frame it, what matters to the user — those are mine. 74 out of 100 captures that honestly.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Means for Developers Right Now
&lt;/h2&gt;

&lt;p&gt;Open models at Gemma 4's capability level running on free infrastructure changes what individual developers can build.&lt;/p&gt;

&lt;p&gt;Six months ago, this analysis would have required:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A paid API with expensive per-token costs&lt;/li&gt;
&lt;li&gt;A backend to handle large context requests&lt;/li&gt;
&lt;li&gt;Chunking logic to split codebases into pieces&lt;/li&gt;
&lt;li&gt;Multiple round-trips losing context between calls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now it is a single &lt;code&gt;fetch()&lt;/code&gt; call from a React component. Free. 262K tokens. Full model. No backend.&lt;/p&gt;

&lt;p&gt;The barrier between "idea" and "working product" for AI-powered developer tools has dropped significantly. VibeSafe went from concept to working demo in a weekend — not because the engineering is simple, but because Gemma 4 handles the hard part.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;The "did you build this?" problem is not going away. It is going to intensify as models improve and AI-assisted development becomes more capable.&lt;/p&gt;

&lt;p&gt;But I think the framing of the question is wrong. The interesting question is not "how much did AI write?" — it is "what did the human decide?"&lt;/p&gt;

&lt;p&gt;Architecture. Product instincts. Tradeoffs. The specific shape of a solution. These things are still fundamentally human, even when the implementation is AI-assisted. They are also what actually matter in a developer.&lt;/p&gt;

&lt;p&gt;VibeSafe is a first attempt at making those decisions visible and documentable. Gemma 4's reasoning capability and context window are what made it possible to build something that actually captures them.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;🔗 &lt;strong&gt;Live app:&lt;/strong&gt; &lt;a href="https://vibe-proof-code.lovable.app/" rel="noopener noreferrer"&gt;https://vibe-proof-code.lovable.app/&lt;/a&gt;&lt;br&gt;
🔗 &lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/SimranShaikh20/vibesafe" rel="noopener noreferrer"&gt;github.com/SimranShaikh20/vibesafe&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You need a free OpenRouter API key from &lt;a href="https://openrouter.ai/keys" rel="noopener noreferrer"&gt;openrouter.ai/keys&lt;/a&gt; — no credit card, takes 30 seconds.&lt;/p&gt;

&lt;p&gt;Run it on your own project. See what Gemma 4 says about what you built.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;VibeSafe · Powered by Gemma 4 31B · OpenRouter free tier · Built for the vibe coding era&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
    </item>
    <item>
      <title>DukanBot: I Flipped OpenClaw Inside-Out to Run WhatsApp for 12 Million Kirana Stores</title>
      <dc:creator>Simran Shaikh</dc:creator>
      <pubDate>Fri, 24 Apr 2026 10:53:24 +0000</pubDate>
      <link>https://dev.to/simranshaikh20_50/dukanbot-i-flipped-openclaw-inside-out-to-run-whatsapp-for-12-million-kirana-stores-3956</link>
      <guid>https://dev.to/simranshaikh20_50/dukanbot-i-flipped-openclaw-inside-out-to-run-whatsapp-for-12-million-kirana-stores-3956</guid>
      <description>&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;There are &lt;strong&gt;12 million kirana stores&lt;/strong&gt; in India.&lt;/p&gt;

&lt;p&gt;Every single one of them runs on WhatsApp. Orders come in on WhatsApp. Confirmations go out on WhatsApp. Payment reminders are typed manually — at midnight, after a full day standing behind a counter.&lt;/p&gt;

&lt;p&gt;My neighbour Sharma Ji runs one of these stores. He writes every "your order is confirmed 🙏" message by hand. When customers don't pay, he has to remember to follow up. When he forgets — which happens — he loses money. Not because he's a bad businessman. Because he has no system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DukanBot&lt;/strong&gt; is that system.&lt;/p&gt;

&lt;p&gt;It's a complete order management dashboard for kirana stores where the store owner clicks once — and an OpenClaw AI agent running on Groq's LLaMA 3.3 70B sends the WhatsApp message, handles the customer's reply, and logs everything to a real database.&lt;/p&gt;

&lt;p&gt;No manual typing. No forgotten follow-ups. No lost money.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Live demo:&lt;/strong&gt; [&lt;a href="https://dukan-bot.netlify.app/%5Dhttps://dukan-bot.netlify.app/" rel="noopener noreferrer"&gt;https://dukan-bot.netlify.app/]https://dukan-bot.netlify.app/&lt;/a&gt;)&lt;br&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/SimranShaikh/dukanbot" rel="noopener noreferrer"&gt;github.com/SimranShaikh/dukanbot&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz411qtp9asipvjth048n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz411qtp9asipvjth048n.png" alt="DukanBot Dashboard showing live order stats" width="800" height="284"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  How I Used OpenClaw
&lt;/h2&gt;

&lt;p&gt;Here's the thing that makes DukanBot different from every other submission:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I didn't use OpenClaw as a chatbot. I used it as a messaging engine triggered by a web dashboard.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most people connect OpenClaw as a chat interface — you talk to it, it responds. I flipped this completely. In DukanBot, the store owner never touches OpenClaw at all. They just use the dashboard. OpenClaw runs silently in the background, receiving webhooks from the frontend and firing WhatsApp messages to customers.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Architecture
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Store owner clicks "Send Confirmation" in DukanBot dashboard
              ↓
Dashboard POSTs JSON to OpenClaw webhook (localhost:18789/webhook)
              ↓
OpenClaw's DukanBot skill receives the payload
              ↓
Groq LLaMA 3.3 70B formats the message with context
              ↓
OpenClaw sends WhatsApp to customer via connected channel
              ↓
Customer gets: "Hello Rahul! Your order DKN-023 from
               Sharma Kirana worth ₹340 is confirmed 🙏"
              ↓
Dashboard shows "Sent via OpenClaw 🦞 ✓" toast
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This pattern — &lt;strong&gt;web app as the UI, OpenClaw as the execution layer&lt;/strong&gt; — is something I haven't seen in any other submission. It treats OpenClaw like a microservice, not a personal assistant.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Webhook Payload
&lt;/h3&gt;

&lt;p&gt;When the store owner clicks &lt;strong&gt;Send Confirmation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Hello Rahul! Your order DKN-023 from Sharma Kirana Store worth ₹340 has been confirmed. Thank you! 🙏"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"to"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"+919876543210"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"confirmation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"order_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DKN-023"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"store_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sharma Kirana Store"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"upi_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sharma@upi"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the store owner clicks &lt;strong&gt;Send Payment Reminder:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Hello Rahul, aapka ₹340 ka payment DKN-023 ke liye pending hai. Please pay on UPI: sharma@upi. Thank you 🙏"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"to"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"+919876543210"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"reminder"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"order_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DKN-023"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the reminder is in &lt;strong&gt;Hinglish&lt;/strong&gt; (Hindi + English). That's intentional — kirana store customers in India communicate in Hinglish, not formal English. OpenClaw's SKILL.md handles the tone.&lt;/p&gt;

&lt;h3&gt;
  
  
  The SKILL.md — The Real Brain
&lt;/h3&gt;

&lt;p&gt;This is the complete skill file that powers DukanBot. No code — pure markdown:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dukanbot&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Kirana store WhatsApp assistant. Sends order confirmations&lt;/span&gt;
&lt;span class="s"&gt;and payment reminders. Handles customer replies automatically.&lt;/span&gt;
&lt;span class="s"&gt;Powered by Groq LLaMA 3.3 70B Versatile.&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gh"&gt;# DukanBot — Kirana Store WhatsApp AI&lt;/span&gt;

You are DukanBot, a WhatsApp assistant for Indian kirana stores.
You run on Groq's ultra-fast LLaMA 3.3 70B model via OpenClaw.

&lt;span class="gu"&gt;## When webhook type is "confirmation":&lt;/span&gt;
Send WhatsApp to the "to" number using the "message" field.
Add a warm closing: "Aapka business humara garv hai 🙏"
Log: [timestamp] CONFIRMATION sent to [number] for [order_id]

&lt;span class="gu"&gt;## When webhook type is "reminder":&lt;/span&gt;
Send WhatsApp to the "to" number using the "message" field.
Keep tone polite but clear — small business relationships matter.
Add: "Koi problem ho toh batayein — hum help karenge 🙏"
Log: [timestamp] REMINDER sent to [number] for [order_id]

&lt;span class="gu"&gt;## When customers reply on WhatsApp:&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; "paid" / "done" / "ho gaya" → "Shukriya! Payment received ✅🙏"
&lt;span class="p"&gt;-&lt;/span&gt; "when" / "kab" / "ready" → "Aapka order prepare ho raha hai.
   Notification milegi jaldi!"
&lt;span class="p"&gt;-&lt;/span&gt; "cancel" / "nahi chahiye" → Forward to store owner immediately:
   "⚠️ Customer [name] wants to cancel order [id]. Please respond."
&lt;span class="p"&gt;-&lt;/span&gt; anything else → Summarize and forward to store owner

&lt;span class="gu"&gt;## Tone Rules:&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Always Hinglish (mix of Hindi and English naturally)
&lt;span class="p"&gt;-&lt;/span&gt; Never rude, never rushed — ye relationship business hai
&lt;span class="p"&gt;-&lt;/span&gt; Use 🙏 for greetings and thanks
&lt;span class="p"&gt;-&lt;/span&gt; Keep messages under 3 sentences
&lt;span class="p"&gt;-&lt;/span&gt; Always include store name

&lt;span class="gu"&gt;## What you are:&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Framework: OpenClaw (open-source personal AI)
&lt;span class="p"&gt;-&lt;/span&gt; Model: Groq LLaMA 3.3 70B Versatile (~200ms response)
&lt;span class="p"&gt;-&lt;/span&gt; Channel: WhatsApp
&lt;span class="p"&gt;-&lt;/span&gt; Purpose: Automate the boring parts so store owners focus on people
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why Groq (Not Claude or GPT)
&lt;/h3&gt;

&lt;p&gt;This was a deliberate technical choice.&lt;/p&gt;

&lt;p&gt;A kirana store owner clicking "Send Reminder" expects instant feedback. If the button spins for 3 seconds, they think it broke. I benchmarked three providers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Avg Response&lt;/th&gt;
&lt;th&gt;Free Tier&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Groq LLaMA 3.3 70B&lt;/td&gt;
&lt;td&gt;~180ms&lt;/td&gt;
&lt;td&gt;✅ Generous&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;This project&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Haiku&lt;/td&gt;
&lt;td&gt;~800ms&lt;/td&gt;
&lt;td&gt;❌ Paid&lt;/td&gt;
&lt;td&gt;Complex reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI GPT-4o-mini&lt;/td&gt;
&lt;td&gt;~600ms&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;General use&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Groq's LPU hardware is purpose-built for inference. For a WhatsApp message that's 2-3 sentences, 180ms feels instant. The store owner sees the success toast before they've finished reading the button label.&lt;/p&gt;

&lt;p&gt;Config in &lt;code&gt;openclaw.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"GROQ_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gsk_..."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"defaults"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"primary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"groq/llama-3.3-70b-versatile"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"fallbacks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"groq/llama-3.1-8b-instant"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"channels"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"telegram"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"accounts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"main"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"token"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"YOUR_TELEGRAM_BOT_TOKEN"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  OpenClaw Status — Built Into the UI
&lt;/h3&gt;

&lt;p&gt;The navbar shows a live OpenClaw connection indicator:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🟢 Green dot = webhook URL configured, connection verified&lt;/li&gt;
&lt;li&gt;🔴 Red dot = webhook URL missing → clicking it goes to Settings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If not connected, the Quick Reply panel shows a yellow banner:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"OpenClaw not connected. Go to Settings → OpenClaw to set up."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The entire OpenClaw install flow — Node.js, &lt;code&gt;npm install -g openclaw@latest&lt;/code&gt;, model selection, SKILL.md download, webhook URL — is a dedicated tab inside DukanBot. The store owner never needs to read an external doc.&lt;/p&gt;

&lt;h3&gt;
  
  
  OpenClaw Powers 4 Distinct Flows
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Flow&lt;/th&gt;
&lt;th&gt;Trigger&lt;/th&gt;
&lt;th&gt;What OpenClaw Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Order confirmation&lt;/td&gt;
&lt;td&gt;Store owner clicks button&lt;/td&gt;
&lt;td&gt;Sends formatted WhatsApp to customer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Payment reminder&lt;/td&gt;
&lt;td&gt;Store owner clicks button or "Send All"&lt;/td&gt;
&lt;td&gt;Sends Hinglish reminder with UPI details&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Customer reply: paid&lt;/td&gt;
&lt;td&gt;Customer texts "ho gaya"&lt;/td&gt;
&lt;td&gt;Replies "Shukriya ✅" automatically&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Customer reply: unknown&lt;/td&gt;
&lt;td&gt;Any other message&lt;/td&gt;
&lt;td&gt;Summarizes + forwards to store owner&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  Screenshots
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Dashboard — all data from Supabase, zero hardcoded values&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc5a9jnqrs7qoh3xkgdjz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc5a9jnqrs7qoh3xkgdjz.png" alt="Dashboard with live stats" width="800" height="544"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenClaw Setup tab — install guide built into the app&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffy8xi3vyxryxehmzbnz1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffy8xi3vyxryxehmzbnz1.png" alt="OpenClaw setup page inside DukanBot" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Orders page — real CRUD, filter by status&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6193f81cuwdm8easoxs3.png" alt="Orders management page" width="635" height="773"&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. OpenClaw is more powerful as an outbound engine than a chatbot
&lt;/h3&gt;

&lt;p&gt;The obvious use case for OpenClaw is "chat with your AI." The less obvious use case — which I think is actually more powerful — is using it as a &lt;strong&gt;triggered action engine&lt;/strong&gt; for existing web apps. Your web app handles the UI. OpenClaw handles the messy, stateful parts: sending messages, handling replies, logging.&lt;/p&gt;

&lt;p&gt;This separation of concerns is clean. The dashboard developer doesn't need to understand WhatsApp's API quirks. The OpenClaw skill handles that. The skill author doesn't need to understand Supabase schema. The dashboard handles that.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. SKILL.md is Markdown with superpowers
&lt;/h3&gt;

&lt;p&gt;I expected to write JavaScript to handle different webhook types (confirmation vs reminder vs customer reply). Instead, I described the behavior in plain English inside SKILL.md — and it worked reliably across hundreds of test messages. The instruction "Never rude, never rushed — ye relationship business hai" actually produced measurably warmer replies than "be professional."&lt;/p&gt;

&lt;p&gt;Writing a good SKILL.md feels more like writing a really clear job description than writing code. That's a genuine paradigm shift.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Supabase RLS is not optional for multi-user apps
&lt;/h3&gt;

&lt;p&gt;I tested without Row Level Security first. Every store owner saw every other store's orders. Adding &lt;code&gt;auth.uid() = user_id&lt;/code&gt; policies to both tables fixed it in one SQL command. This is the kind of thing that's easy to skip when prototyping but disastrous in production. Now it's the first thing I set up.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Cultural specificity is a feature, not a constraint
&lt;/h3&gt;

&lt;p&gt;"Hinglish messages" and "₹ formatting" and "DD/MM/YYYY" and "floating WhatsApp button" aren't localisation details. They're the product. A kirana store owner who opens an app and sees "Hello Sharma Ji 🙏" instead of "Hello John" trusts that app differently. Building for a specific real person — not a generic user — makes every product decision easier and better.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. The hardest part was scope
&lt;/h3&gt;

&lt;p&gt;The temptation was to add inventory tracking, Google Sheets sync, customer loyalty points, voice notes. I cut all of it. The constraint "what would Sharma Ji actually use on a Tuesday afternoon" kept the scope tight and the product coherent. Ship the useful thing first.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Clone the repo&lt;/span&gt;
git clone https://github.com/SimranShaikh/dukanbot.git
&lt;span class="nb"&gt;cd &lt;/span&gt;dukanbot &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm &lt;span class="nb"&gt;install&lt;/span&gt;

&lt;span class="c"&gt;# 2. Set up Supabase (see README for full SQL)&lt;/span&gt;
&lt;span class="c"&gt;# Add your VITE_SUPABASE_URL and VITE_SUPABASE_ANON_KEY to .env&lt;/span&gt;

&lt;span class="c"&gt;# 3. Install OpenClaw with Groq&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; openclaw@latest
&lt;span class="nv"&gt;GROQ_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gsk_your_key openclaw onboard &lt;span class="nt"&gt;--install-daemon&lt;/span&gt;

&lt;span class="c"&gt;# 4. Install the DukanBot skill&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ~/.openclaw/workspace/skills/dukanbot
&lt;span class="nb"&gt;cp &lt;/span&gt;skills/dukanbot/SKILL.md ~/.openclaw/workspace/skills/dukanbot/
openclaw gateway restart

&lt;span class="c"&gt;# 5. Open DukanBot → Settings → paste http://localhost:18789/webhook&lt;/span&gt;
&lt;span class="c"&gt;# 6. Click "Test Connection" → green ✓ → you're live&lt;/span&gt;

npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Full setup guide, SQL schema, and SKILL.md at the GitHub repo above.&lt;/p&gt;




&lt;h2&gt;
  
  
  ClawCon Michigan
&lt;/h2&gt;

&lt;p&gt;I didn't attend ClawCon Michigan — I'm a final-year CS student in India and the geography didn't work out this time. But building DukanBot made me realise something: the most interesting OpenClaw builds aren't coming from Silicon Valley. They're coming from people who have a specific, local, unglamorous problem that no VC-funded startup will ever solve. A Telegram bot for kirana store payment reminders. An SMS agent for farmers. An auto-reply skill for a one-person tailoring shop.&lt;/p&gt;

&lt;p&gt;That's the version of personal AI I'm excited about. And from what I've read about ClawCon Michigan, it seems like the community there gets this too. Would love to be there in person next year.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built by Simran Shaikh — Final Year BE CS, The Maharaja Sayajirao University of Baroda&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Stack: OpenClaw + Groq LLaMA 3.3 70B + Supabase + React (Lovable.dev)&lt;/em&gt;&lt;br&gt;
&lt;em&gt;GitHub: &lt;a href="https://github.com/SimranShaikh/dukanbot" rel="noopener noreferrer"&gt;github.com/SimranShaikh/dukanbot&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>openclawchallenge</category>
      <category>openclaw</category>
      <category>ai</category>
    </item>
    <item>
      <title>I Built a Multi-Step AI Agent in One Day with Google ADK — Here's What Nobody Tells You</title>
      <dc:creator>Simran Shaikh</dc:creator>
      <pubDate>Thu, 23 Apr 2026 09:28:41 +0000</pubDate>
      <link>https://dev.to/simranshaikh20_50/i-built-a-multi-step-ai-agent-in-one-day-with-google-adk-heres-what-nobody-tells-you-3m2n</link>
      <guid>https://dev.to/simranshaikh20_50/i-built-a-multi-step-ai-agent-in-one-day-with-google-adk-heres-what-nobody-tells-you-3m2n</guid>
      <description>&lt;p&gt;I'm a final-year computer science student. I spend most of my days training deep learning models on image datasets, debugging tensor shape errors at 2am, and convincing myself that 67% accuracy is "a solid baseline." &lt;/p&gt;

&lt;p&gt;I do not, normally, build AI agents.&lt;/p&gt;

&lt;p&gt;But when Google Cloud NEXT '26 dropped last week and I saw the announcements around ADK 2.0 and the new Gemini Enterprise Agent Platform, I got genuinely curious. Not marketing-brochure curious — actually curious. Because the thing they kept saying was: &lt;em&gt;"You can now build multi-step autonomous agents that coordinate with each other."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That sounded either really powerful or really overhyped. I wanted to find out which.&lt;/p&gt;

&lt;p&gt;So I spent a day building something with it. This is what actually happened.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Even Is ADK?
&lt;/h2&gt;

&lt;p&gt;Before I get into the friction, a quick explainer for anyone who hasn't seen the announcements.&lt;/p&gt;

&lt;p&gt;ADK — Agent Development Kit — is Google's open-source Python framework for building AI agents. Not chatbots. &lt;em&gt;Agents&lt;/em&gt; — programs that take a goal, break it into steps, use tools, and figure out how to get things done autonomously.&lt;/p&gt;

&lt;p&gt;The ADK 2.0 alpha (released March 2026) brought in graph-based workflows, collaborative multi-agent support, and native Vertex AI integration. The stable version (1.x) already supports multi-agent coordination and tool use. That's what I ended up using, and I'll explain why in a moment.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Decided to Build
&lt;/h2&gt;

&lt;p&gt;I wanted to build a &lt;strong&gt;Research Assistant Agent&lt;/strong&gt; — you give it a topic, it searches the web, structures the findings, and suggests what to explore next.&lt;/p&gt;

&lt;p&gt;The twist: instead of one agent doing everything, I'd build it as a &lt;strong&gt;multi-agent pipeline&lt;/strong&gt; with specialist sub-agents, the way ADK is actually designed to be used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;web_searcher&lt;/code&gt; → hits Google Search, returns raw findings
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;analyst_summarizer&lt;/code&gt; → structures those findings for developers
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;research_coordinator&lt;/code&gt; → orchestrates both, delivers the final answer
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Simple enough concept. Let's talk about what happened when I actually tried to set it up.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Setup: Where Things Got Interesting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1 — Getting the API Key
&lt;/h3&gt;

&lt;p&gt;Go to &lt;a href="https://aistudio.google.com" rel="noopener noreferrer"&gt;aistudio.google.com&lt;/a&gt;, sign in, click "Get API Key." This part was genuinely smooth. Took maybe 3 minutes. Free tier gives you enough to build and experiment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2 — Installing ADK
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;google-adk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Simple. Worked on the first try. The install is clean and the dependencies are sensible.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3 — Creating the Project Structure
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;adk create research_agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gave me a folder with &lt;code&gt;agent.py&lt;/code&gt;, &lt;code&gt;.env&lt;/code&gt;, and &lt;code&gt;__init__.py&lt;/code&gt; already stubbed out. That's a nice touch — you're not hunting for the right structure.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;research_agent/
    agent.py
    .env
    __init__.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4 — The Part Where I Hit a Wall
&lt;/h3&gt;

&lt;p&gt;I was excited by ADK 2.0 after reading about the new workflow engine, so I tried installing it first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;google-adk &lt;span class="nt"&gt;--pre&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here's the honest thing nobody's blog post tells you: &lt;strong&gt;ADK 2.0 is a proper alpha.&lt;/strong&gt; The docs say it. The PyPI page says it. But you don't fully feel it until you're staring at import errors because the API surface has breaking changes from 1.x.&lt;/p&gt;

&lt;p&gt;I spent about 40 minutes trying to make 2.0 work before I made the practical call: the stable 1.31.x release already supports multi-agent orchestration. The thing I wanted to build was fully doable without the alpha. So I went back to stable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;google-adk  &lt;span class="c"&gt;# without --pre&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Lesson learned:&lt;/strong&gt; ADK 2.0 is genuinely exciting for what it brings (graph-based workflows, better debugging tooling, stateful multi-step support), but right now it's for people who want to be on the bleeding edge and don't mind patching things. If you want to &lt;em&gt;build and ship something this week&lt;/em&gt;, use 1.x.&lt;/p&gt;




&lt;h2&gt;
  
  
  Building the Agent
&lt;/h2&gt;

&lt;p&gt;Here's the full code. I'll walk you through each piece.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Project Setup
&lt;/h3&gt;

&lt;p&gt;Add your API key to &lt;code&gt;.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;GOOGLE_GENAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;your_key_here&lt;/span&gt;
&lt;span class="py"&gt;GOOGLE_GENAI_USE_VERTEXAI&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;FALSE&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;GOOGLE_GENAI_USE_VERTEXAI=FALSE&lt;/code&gt; means you're using AI Studio (free), not Vertex AI (Google Cloud). Keep it false unless you've set up a Cloud project.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Sub-Agents
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;google_search&lt;/span&gt;

&lt;span class="c1"&gt;# Sub-Agent 1: Does the actual web searching
&lt;/span&gt;&lt;span class="n"&gt;searcher_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;web_searcher&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.0-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Searches the web for up-to-date information on a given topic.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    You are a web research specialist. Your only job is to search for 
    accurate, recent information on the topic given to you.
    Always use the google_search tool — never answer from memory alone.
    Prioritize sources from 2025-2026.
    Return a clear summary of what you found, including source context.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;google_search&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Sub-Agent 2: Turns raw findings into structured output
&lt;/span&gt;&lt;span class="n"&gt;summarizer_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analyst_summarizer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.0-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Structures raw research into clear developer-friendly summaries.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    You are a technical writer for a developer audience.
    Structure your response as:

    ## Key Findings
    [3-4 bullet points of the most important facts]

    ## The Most Surprising Thing
    [One insight that might be unexpected]

    ## What to Watch Out For
    [Caveats, limitations, or gotchas]

    ## 3 Follow-Up Questions
    [Specific questions a developer might want to explore next]

    Keep the tone honest, direct, and useful. No marketing fluff.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One thing I noticed: the &lt;code&gt;description&lt;/code&gt; field matters more than I expected. The coordinator uses it to decide &lt;em&gt;which&lt;/em&gt; sub-agent to delegate to. If your description is vague, the orchestration gets confused. Lesson from 30 minutes of head-scratching.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Coordinator
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;root_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research_coordinator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.0-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Coordinates multi-step research by delegating to specialist sub-agents.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    You coordinate a two-step research pipeline:
    Step 1 — Delegate to web_searcher to find current information.
    Step 2 — Pass those findings to analyst_summarizer to structure them.
    Step 3 — Present the final structured output to the user.
    Always complete both steps before responding. Do not skip the search step.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;searcher_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;summarizer_agent&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That &lt;code&gt;agents=[...]&lt;/code&gt; parameter is doing the heavy lifting here. You're giving the coordinator a roster of sub-agents it can delegate to. It decides when to call which one based on the task and their descriptions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Running It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;adk web research_agent/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;code&gt;http://localhost:8000&lt;/code&gt; and you get a chat interface with full event inspection — you can see every step the agent takes, every tool call, every sub-agent handoff. For a framework aimed at developers, this is actually thoughtful UX.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Actually Asked It
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Test 1: Something I Already Knew the Answer To
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"What is Google ADK and what was announced at Cloud NEXT '26?"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The agent searched, found the NEXT '26 announcements, and structured them cleanly. The output was accurate. It correctly identified ADK 2.0's graph-based workflows and the Gemini Enterprise Agent Platform rebrand. It cited things from 2026, not 2023. &lt;/p&gt;

&lt;h3&gt;
  
  
  Test 2: Something Niche
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"What's the current state of AI agents in manufacturing quality control?"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is where it got more interesting. The search results were mixed — some solid, some generic. The summarizer was honest about the limitations of what it found. It flagged one follow-up question I hadn't considered: whether outcome-based pricing (one of NEXT '26's announcements) changes the economics of running vision AI at manufacturing scale. I hadn't thought about that angle.&lt;/p&gt;

&lt;h3&gt;
  
  
  Test 3: Pushing It
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"What's the A2A protocol and why does it matter for a student building their first AI project?"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Best output of the three. The framing of "for a student" changed the register of the summary — it explained the A2A protocol in practical terms (agents from different companies can talk to each other without custom integration code) rather than enterprise-speak. The follow-up questions were specific and genuinely useful.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Genuinely Impressed Me
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The multi-agent handoff is seamless.&lt;/strong&gt; I expected some clunkiness at the boundary between searcher and summarizer. There wasn't any. The coordinator passes context cleanly, and the summarizer clearly received structured findings rather than raw text. I don't know exactly what's happening under the hood, but the output quality was noticeably better than a single-agent approach I tested alongside it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The web UI for debugging.&lt;/strong&gt; Being able to see the full event trace — which agent ran, what tool it called, what it returned — is not a small thing. When something goes wrong (and it will), you can actually see where. This is the kind of tooling that makes the difference between framework adoption and abandonment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;google_search&lt;/code&gt; as a first-class tool.&lt;/strong&gt; You import it, you add it to &lt;code&gt;tools=[]&lt;/code&gt;, and it works. No API key management, no rate limit configuration to figure out upfront. For getting started, that's exactly right.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'd Push Back On
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;ADK 2.0 alpha is not ready for a tutorial.&lt;/strong&gt; I understand why Google announced it at NEXT '26 — the graph-based workflow engine is a genuine step forward in how you structure complex agents. But the breaking changes from 1.x, combined with sparse alpha docs, mean the announcement is ahead of the developer experience right now. If your use case needs stateful multi-step workflows or the new debugging tooling, keep watching it. If you need to build something today, use 1.x.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The instruction prompt is load-bearing.&lt;/strong&gt; The quality of your agent's output is almost entirely determined by how clearly you write the &lt;code&gt;instruction&lt;/code&gt; field. ADK doesn't abstract that away — it amplifies it. A vague instruction gives you a vague agent. I rewrote mine three times before the summarizer stopped adding unnecessary corporate-sounding hedges to its output. That's not a framework problem, but it's worth knowing going in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory across sessions is still your problem.&lt;/strong&gt; Each conversation starts fresh. If you want stateful agents that remember context across sessions, you need to wire that up yourself. ADK 2.0's improvements here are in the roadmap, but they're not fully baked yet.&lt;/p&gt;




&lt;h2&gt;
  
  
  My Honest Verdict
&lt;/h2&gt;

&lt;p&gt;ADK is the right direction. The multi-agent pattern it encourages — specialists coordinated by a root agent — produces noticeably better results than stuffing everything into one massive system prompt. The tooling is clean, the Google Search integration works, and the web UI for inspection is genuinely developer-friendly.&lt;/p&gt;

&lt;p&gt;For a first-year student or someone new to agents: &lt;strong&gt;start here.&lt;/strong&gt; You'll be running something real in under two hours.&lt;/p&gt;

&lt;p&gt;For someone wanting to use the ADK 2.0 graph workflows specifically: &lt;strong&gt;give it another month or two.&lt;/strong&gt; The alpha is progressing fast, but it's not ready to be the foundation of a tutorial you'll publish and stand behind.&lt;/p&gt;

&lt;p&gt;The most interesting thing NEXT '26 signalled to me isn't any single announcement — it's the pattern. Google is betting that the future of cloud AI isn't individual models you call via API, but &lt;em&gt;coordinated networks of specialist agents&lt;/em&gt; running on managed infrastructure. ADK is their framework for that future. Whether you agree with the bet or not, it's worth understanding how it works.&lt;/p&gt;




&lt;h2&gt;
  
  
  Get the Code
&lt;/h2&gt;

&lt;p&gt;The full project is on GitHub: &lt;strong&gt;&lt;a href="https://github.com/SimranShaikh20/Research-Assistant-Agent" rel="noopener noreferrer"&gt;https://github.com/SimranShaikh20/Research-Assistant-Agent&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Research-Assistant-Agent/
├── Research-Assistant-Agent/
│   ├── agent.py        ← all the agent logic
│   └── __init__.py
├── .env.example        ← copy this to .env, add your key
├── .gitignore
└── README.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To run it yourself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/SimranShaikh20/Research-Assistant-Agent
&lt;span class="nb"&gt;cd &lt;/span&gt;Research-Assistant-Agent
python &lt;span class="nt"&gt;-m&lt;/span&gt; venv venv
venv&lt;span class="se"&gt;\S&lt;/span&gt;cripts&lt;span class="se"&gt;\a&lt;/span&gt;ctivate        &lt;span class="c"&gt;# Windows&lt;/span&gt;
&lt;span class="c"&gt;# source venv/bin/activate   # Mac/Linux&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;google-adk
&lt;span class="c"&gt;# copy .env.example to .env, add your Gemini API key&lt;/span&gt;
adk web research_agent/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://google.github.io/adk-docs/" rel="noopener noreferrer"&gt;ADK Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://google.github.io/adk-docs/2.0/" rel="noopener noreferrer"&gt;ADK 2.0 Alpha docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aistudio.google.com" rel="noopener noreferrer"&gt;Google AI Studio — free API key&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/google/adk-samples" rel="noopener noreferrer"&gt;ADK Samples on GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/blog/topics/google-cloud-next/welcome-to-google-cloud-next26" rel="noopener noreferrer"&gt;Google Cloud NEXT '26 Announcements&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Built for the &lt;a href="https://dev.to/challenges/googlecloudnext"&gt;Google Cloud NEXT '26 Writing Challenge&lt;/a&gt; on DEV Community. I'm a final-year BE Computer Science student at The Maharaja Sayajirao University of Baroda, where my major project is an AI-based defect detection system — so building agents like this is a bit of a departure from my usual ResNet-50 territory. Turned out to be worth the detour.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>cloudnextchallenge</category>
      <category>googlecloud</category>
      <category>ai</category>
      <category>agents</category>
    </item>
    <item>
      <title>🌍 Earth's Last Letter</title>
      <dc:creator>Simran Shaikh</dc:creator>
      <pubDate>Mon, 20 Apr 2026 04:02:23 +0000</pubDate>
      <link>https://dev.to/simranshaikh20_50/earths-last-letter-pch</link>
      <guid>https://dev.to/simranshaikh20_50/earths-last-letter-pch</guid>
      <description>&lt;h1&gt;
  
  
  The Planet Writes You a Personal Letter Using Real Climate Data + Gemini AI
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;This is a submission for &lt;a href="https://dev.to/challenges/weekend-2026-04-16"&gt;Weekend Challenge: Earth Day Edition&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Earth's Last Letter&lt;/strong&gt; is an AI-powered web app where Earth — the planet itself, 4.5 billion years old — writes you a deeply personal, poetic letter based on the city you grew up in and the year you were born.&lt;/p&gt;

&lt;p&gt;You enter two things: your city and your birth year.&lt;/p&gt;

&lt;p&gt;Earth remembers the rest.&lt;/p&gt;

&lt;p&gt;Every letter is completely unique. The app pulls &lt;strong&gt;real historical CO₂ data&lt;/strong&gt; from NOAA's Mauna Loa Observatory measurements, your &lt;strong&gt;exact city coordinates&lt;/strong&gt; via Open-Meteo's geocoding API, and feeds all of it into a carefully engineered Google Gemini prompt that generates a 400-word letter written from Earth's perspective — not as a scientist, not as an activist, but as a grieving mother writing to a child she loves.&lt;/p&gt;

&lt;p&gt;The letter tells you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What your city's air smelled like the season you were born (CO₂ was measurably lower)&lt;/li&gt;
&lt;li&gt;What Earth remembers about your childhood years in that specific place&lt;/li&gt;
&lt;li&gt;One specific change happening near your city right now — not generic warming stats, but Mumbai's retreating monsoon patterns, London's disappearing frost days, Delhi's unprecedented heat islands&lt;/li&gt;
&lt;li&gt;What your city might feel like in 2050 if the trajectory holds&lt;/li&gt;
&lt;li&gt;One intimate, local action only someone from your city could take&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It does not say "reduce your carbon footprint." It does not say "go green." It sounds like a letter left under a stone in a forest.&lt;/p&gt;

&lt;p&gt;Here's a sample output for &lt;strong&gt;Mumbai, 1995&lt;/strong&gt;:&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Dear child of 1995,&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;You arrived in Mumbai during the first weeks of June, when the southwest monsoon was still two weeks away and the city held its breath in that particular amber heat only those who know her understand. The air carried 360 parts per million of carbon then — still too much, but enough that the Arabian Sea breeze coming off Marine Drive in the evenings felt clean in a way you may not remember but your lungs do...&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;...But I am still here. And so are you.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;With all the time I have left,&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Earth&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;The UI is built to match — dark navy background, aged parchment letter card, typewriter reveal animation, city postmark stamp, and rotating loading messages ("Earth is remembering your city...", "Searching through 4.5 billion years of memory...").&lt;/p&gt;

&lt;p&gt;One click copies the letter. One click shares to X. Every interaction feels like touching something ancient.&lt;/p&gt;




&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;🔗 &lt;strong&gt;Live App → &lt;a href="https://earth-last-letter.netlify.app/" rel="noopener noreferrer"&gt;earths-last-letter.netlify.app&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Try it with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mumbai + 1995&lt;/strong&gt; — monsoon memories, Arabian Sea heat, coastal flooding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;London + 1988&lt;/strong&gt; — disappearing frost days, Thames flooding risk&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Delhi + 1990&lt;/strong&gt; — Yamuna river, unprecedented heat island, air quality history&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;New York + 2000&lt;/strong&gt; — hurricane patterns, Hudson River, coastal erosion&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;🔑 You'll need a free Gemini API key from &lt;a href="https://aistudio.google.com/app/apikey" rel="noopener noreferrer"&gt;aistudio.google.com&lt;/a&gt; — takes 30 seconds to get.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="" class="article-body-image-wrapper"&gt;&lt;img alt="App screenshot showing dark UI with parchment letter card"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/SimranShaikh20" rel="noopener noreferrer"&gt;
        SimranShaikh20
      &lt;/a&gt; / &lt;a href="https://github.com/SimranShaikh20/Earths-Last-Lettter" rel="noopener noreferrer"&gt;
        Earths-Last-Lettter
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;🌍 Earth's Last Letter&lt;/h1&gt;
&lt;/div&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"The planet writes you a personal letter — from your birth year to 2050."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;What Is This?&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Earth's Last Letter&lt;/strong&gt; is an AI-powered web app where Earth — the planet itself — writes you a deeply personal, poetic letter based on the year you were born and the city you grew up in.&lt;/p&gt;
&lt;p&gt;Every letter is unique. Every letter is grounded in &lt;strong&gt;real climate data&lt;/strong&gt;. Every letter is written as if Earth is an ailing parent writing to a child it loves but is slowly losing the strength to sustain.&lt;/p&gt;
&lt;p&gt;You enter your city and birth year. Earth remembers the rest.&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;✨ Features&lt;/h2&gt;
&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;🌿 &lt;strong&gt;Hyper-personalized letters&lt;/strong&gt; — Every output is unique to your exact city + birth year combination. No two letters are the same.&lt;/li&gt;
&lt;li&gt;📊 &lt;strong&gt;Real climate data&lt;/strong&gt; — CO₂ levels at your birth year (Mauna Loa historical data), current CO₂ levels, temperature anomaly…&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/SimranShaikh20/Earths-Last-Lettter" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;The full repo includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complete React + Vite frontend&lt;/li&gt;
&lt;li&gt;Gemini API integration with structured prompt&lt;/li&gt;
&lt;li&gt;Open-Meteo geocoding and climate data&lt;/li&gt;
&lt;li&gt;Vercel deployment config&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  How I Built It
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Core Insight — Why This Approach
&lt;/h3&gt;

&lt;p&gt;Every climate app I've seen shows you graphs. Numbers. Percentages. The problem isn't that people don't have information — it's that information doesn't move people. Stories do.&lt;/p&gt;

&lt;p&gt;I wanted to build something that made climate data &lt;em&gt;feel personal&lt;/em&gt; rather than abstract. The insight was simple: if you know exactly what the CO₂ level was the year someone was born, you can tell them something true and specific about how the air has changed in &lt;em&gt;their own lifetime&lt;/em&gt;. That's not a statistic. That's their life.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tech Stack
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;React + Vite&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Styling&lt;/td&gt;
&lt;td&gt;Tailwind CSS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI Generation&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Google Gemini 2.0 Flash&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Climate Data&lt;/td&gt;
&lt;td&gt;Open-Meteo API (free)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CO₂ Data&lt;/td&gt;
&lt;td&gt;NOAA Mauna Loa historical readings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Geocoding&lt;/td&gt;
&lt;td&gt;Open-Meteo Geocoding API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deployment&lt;/td&gt;
&lt;td&gt;Vercel&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Step 1 — Getting Real Data
&lt;/h3&gt;

&lt;p&gt;I pull two real data sources before touching Gemini at all:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;City coordinates&lt;/strong&gt; via Open-Meteo's free geocoding API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="s2"&gt;`https://geocoding-api.open-meteo.com/v1/search?name=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;city&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;amp;count=1`&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;latitude&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;longitude&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;country&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Historical CO₂ levels&lt;/strong&gt; from NOAA's Mauna Loa Observatory measurements, hardcoded from real published data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;co2Map&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="mi"&gt;1950&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;310&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1955&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;313&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1960&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;317&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1965&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;320&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1970&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;325&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="mi"&gt;1975&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;331&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1980&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;338&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1985&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;345&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1990&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;354&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1995&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;360&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;369&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2005&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;379&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2010&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;389&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2015&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2020&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;412&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="mi"&gt;2024&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;422&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2025&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;424&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;birthCO2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;co2Map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;birthYear&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;co2Rise&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;424&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;birthCO2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For someone born in 1995 in Mumbai, the app now knows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CO₂ at birth: 360 ppm&lt;/li&gt;
&lt;li&gt;CO₂ today: 424 ppm&lt;/li&gt;
&lt;li&gt;Rise in their lifetime: +64 ppm&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's real data. That goes into the prompt.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2 — The Gemini Prompt (the hard part)
&lt;/h3&gt;

&lt;p&gt;This is where most of the work is. Getting Gemini to write something that sounds like &lt;em&gt;literature&lt;/em&gt; rather than an AI response required a very specific prompt structure.&lt;/p&gt;

&lt;p&gt;The key decisions I made:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Strict paragraph structure with word counts&lt;/strong&gt;&lt;br&gt;
Instead of asking for "a letter," I specify exactly 6 paragraphs with word counts for each (60, 70, 80, 70, 60, 40 words). This produces consistent, well-paced output every time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Banned vocabulary list&lt;/strong&gt;&lt;br&gt;
The prompt explicitly forbids: "carbon footprint", "going green", "save the planet", "climate change" (as a phrase), "sustainability". These words have been so overused in environmental messaging that they've become invisible. Banning them forces Gemini to find fresh, sensory language.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. City-specific climate instructions&lt;/strong&gt;&lt;br&gt;
The prompt tells Gemini to reference ONE specific local phenomenon based on the city type — glaciers if mountain, sea level if coastal, heat island if major metro, monsoon patterns if South Asian city. This makes every letter feel locally grounded.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Voice constraint — the most important one&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;YOUR VOICE: You are not angry. You are not lecturing. 
You are a mother watching her child grow up while she grows sick. 
Ancient patience, deep love, quiet grief.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This single voice instruction transforms the output from an environmental essay into something that reads like correspondence.&lt;/p&gt;

&lt;p&gt;Here's the Gemini API call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="s2"&gt;`https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;}]&lt;/span&gt; &lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;letter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3 — The UI
&lt;/h3&gt;

&lt;p&gt;The design is intentionally counter to modern app aesthetics. While everyone builds clean, minimal, light-mode dashboards, this app goes dark, ancient, and warm.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The parchment card&lt;/strong&gt; uses a warm cream background (&lt;code&gt;#f5f0e8&lt;/code&gt;), dark brown text (&lt;code&gt;#2c1810&lt;/code&gt;), Georgia serif font at 1.9 line height, and subtle box-shadow to create a paper texture without any actual images.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The typewriter animation&lt;/strong&gt; reveals the letter character by character. This was a deliberate choice — it forces users to &lt;em&gt;read&lt;/em&gt; rather than skim, and creates the feeling that Earth is writing to you in real time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Loading states&lt;/strong&gt; rotate through 4 messages every 2 seconds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Earth is remembering your city..."&lt;/li&gt;
&lt;li&gt;"Searching through 4.5 billion years of memory..."&lt;/li&gt;
&lt;li&gt;"Weaving real climate data into your letter..."&lt;/li&gt;
&lt;li&gt;"Almost ready — Earth writes slowly, carefully..."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't just UX filler. They're part of the narrative.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I Learned
&lt;/h3&gt;

&lt;p&gt;The biggest lesson: &lt;strong&gt;prompt engineering is product design&lt;/strong&gt;. The quality of the letter is entirely a function of how precisely I constrained Gemini's output. Every word count, every banned phrase, every voice instruction translates directly into a better user experience. The AI isn't doing creative work — it's executing a very specific creative brief.&lt;/p&gt;

&lt;p&gt;The second lesson: &lt;strong&gt;real data beats fake data every time&lt;/strong&gt;. Knowing that someone born in 1990 breathed air with 354 ppm CO₂ — and that we're at 424 ppm today — makes the letter hit differently than if I'd just prompted Gemini to "say something emotional about climate change." The specificity is what creates the emotional resonance.&lt;/p&gt;




&lt;h2&gt;
  
  
  Prize Categories
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;🏆 Best Use of Google Gemini&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Google Gemini 2.0 Flash is the core of this project — not as a chatbot, but as a &lt;strong&gt;structured narrative engine&lt;/strong&gt;. The app feeds real API data (city coordinates, historical CO₂ levels, climate context) into a precisely engineered prompt that produces a consistently high-quality, emotionally grounded 400-word letter every single time.&lt;/p&gt;

&lt;p&gt;The innovation isn't just &lt;em&gt;using&lt;/em&gt; Gemini — it's the architecture around it: real data in → structured prompt → constrained creative output → literature-quality result. This is a meaningfully different use case than most AI integrations, which treat language models as question-answering systems. Here, Gemini is a writer following a very specific brief.&lt;/p&gt;

&lt;p&gt;The prompt includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Paragraph-level word count constraints&lt;/li&gt;
&lt;li&gt;Banned vocabulary list (forces fresh language)&lt;/li&gt;
&lt;li&gt;City-specific climate instruction rules&lt;/li&gt;
&lt;li&gt;Strict voice and tone specification&lt;/li&gt;
&lt;li&gt;Mandatory real data integration in every paragraph&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is that the app produces output that users genuinely share, screenshot, and send to family members — which is the real test of whether an AI generation is working.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built over one weekend for Earth Day 2026. Every letter is different. Every letter is true.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Try yours at &lt;a href="https://earth-last-letter.netlify.app/" rel="noopener noreferrer"&gt;https://earth-last-letter.netlify.app/&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>weekendchallenge</category>
    </item>
    <item>
      <title>🚀 FreelanceOS</title>
      <dc:creator>Simran Shaikh</dc:creator>
      <pubDate>Sun, 08 Mar 2026 11:47:50 +0000</pubDate>
      <link>https://dev.to/simranshaikh20_50/freelanceos-ai-powered-freelancer-operating-system-15ll</link>
      <guid>https://dev.to/simranshaikh20_50/freelanceos-ai-powered-freelancer-operating-system-15ll</guid>
      <description>&lt;h1&gt;
  
  
  🚀 FreelanceOS — AI-Powered Operating System for Freelancers
&lt;/h1&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;FreelanceOS is a complete AI-powered operating system for freelancers and solopreneurs, built entirely on &lt;strong&gt;Notion MCP + Google Gemini AI&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Freelancers waste &lt;strong&gt;5–10 hours every week&lt;/strong&gt; on admin work that doesn't pay — writing contracts, creating invoices, sending client update emails, and chasing unpaid payments. FreelanceOS eliminates all of that.&lt;/p&gt;

&lt;p&gt;You type a few words. FreelanceOS generates a &lt;strong&gt;professional AI-written contract, invoice, or client email — and saves it directly into your Notion workspace automatically.&lt;/strong&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  The Problem It Solves
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Admin Task&lt;/th&gt;
&lt;th&gt;Time Wasted Per Week&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Writing freelance contracts&lt;/td&gt;
&lt;td&gt;1–2 hours&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Creating &amp;amp; formatting invoices&lt;/td&gt;
&lt;td&gt;30–60 mins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Writing client update emails&lt;/td&gt;
&lt;td&gt;20–30 mins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tracking unpaid invoices&lt;/td&gt;
&lt;td&gt;Hours per month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Managing clients &amp;amp; projects across tools&lt;/td&gt;
&lt;td&gt;Daily friction&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;FreelanceOS collapses all of this into &lt;strong&gt;one AI-powered Notion workspace&lt;/strong&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  ✨ Core Features
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;📊 AI Dashboard&lt;/strong&gt;&lt;br&gt;
Pulls live data from all 5 Notion databases and feeds it to Gemini AI, which analyzes your portfolio and gives you personalized business insights — total revenue potential, overdue projects, workload balance, and 3 actionable recommendations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7w27133wvm11z9igs22q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7w27133wvm11z9igs22q.png" alt="AI Dashboard" width="800" height="624"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;📄 AI Contract Generator&lt;/strong&gt;&lt;br&gt;
Enter client name, project description, budget, and deadline. FreelanceOS generates a complete professional freelance contract with scope, payment terms, revision policy, ownership rights, and termination clause — saved instantly to your Notion Contracts database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fllukr7nvlvj2sz78cgka.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fllukr7nvlvj2sz78cgka.png" alt="Contract Generator" width="800" height="473"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;🧾 AI Invoice Generator&lt;/strong&gt;&lt;br&gt;
Enter client name, amount, and work done. FreelanceOS generates a professional itemized invoice with payment instructions and due dates — saved to your Notion Invoices database as "Unpaid" and tracked automatically.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyq672fcw7ruqr9fq2wgc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyq672fcw7ruqr9fq2wgc.png" alt="Invoice Generator" width="800" height="628"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;👥 Client &amp;amp; Project Management&lt;/strong&gt;&lt;br&gt;
Full CRUD operations on Clients and Projects — all stored and managed through Notion MCP.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fak8oikv7s92qhfej9jys.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fak8oikv7s92qhfej9jys.png" alt="Add User" width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F85pcgjsvabn42oc6u4ty.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F85pcgjsvabn42oc6u4ty.png" alt="Add Project" width="800" height="545"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;🚪 Clean Exit&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4max4eovmyw9vq7exef.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4max4eovmyw9vq7exef.png" alt="Exit Screen" width="800" height="663"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h3&gt;
  
  
  🗺️ System Architecture
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Input (CLI)
      │
      ▼
FreelanceOS (Python)
      │
      ├──▶ Google Gemini AI ──▶ AI-Generated Content
      │                               │
      └──▶ Notion MCP API ◀───────────┘
                │
                ▼
        Notion Workspace
    ┌──────────────────────┐
    │  Clients   Projects  │
    │  Invoices  Contracts │
    │  Expenses            │
    └──────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Show us the code
&lt;/h2&gt;

&lt;p&gt;🔗 &lt;strong&gt;GitHub Repository:&lt;/strong&gt; &lt;a href="https://github.com/SimranShaikh20/FreelanceOS" rel="noopener noreferrer"&gt;github.com/SimranShaikh20/FreelanceOS&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Project Structure
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;freelance-os/
│
├── main.py                 ← Entry point
├── notion_helper.py        ← All Notion MCP API calls
├── ai_helper.py            ← All Gemini AI calls
├── requirements.txt
├── .env.example
│
└── features/
    ├── dashboard.py        ← AI-powered insights
    ├── clients.py          ← Client management
    ├── projects.py         ← Project tracking
    ├── contracts.py        ← AI contract generation
    ├── invoices.py         ← AI invoice generation
    └── emails.py           ← AI email generation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Key Code Snippets
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;AI Contract Generation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_contract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project_desc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;budget&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;deadline&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Write a professional freelance contract:
    Client: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;client_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
    Project: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;project_desc&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
    Budget: $&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;budget&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
    Deadline: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;deadline&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
    Include: scope, payment terms, revision policy,
    ownership rights, termination clause
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;generate_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Saving to Notion MCP:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_contract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project_desc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;budget&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;db_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CONTRACTS_DB_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;database_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;db_id&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Contract - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;client_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}}]},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Client&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rich_text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;client_name&lt;/span&gt;&lt;span class="p"&gt;}}]},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Budget&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;budget&lt;/span&gt;&lt;span class="p"&gt;)},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rich_text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;]}}]},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;multi_select&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Draft&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;notion_post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;AI Dashboard Insights:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_project_summary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;projects&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;project_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: $&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;budget&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; 
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;projects&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Analyze these freelance projects:
    &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;project_list&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
    Give: revenue potential, attention needed,
    workload assessment, 3 recommendations.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;generate_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  How I Used Notion MCP
&lt;/h2&gt;

&lt;p&gt;Notion MCP is not just a storage layer in FreelanceOS — &lt;strong&gt;it IS the operating system.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Integration
&lt;/h3&gt;

&lt;p&gt;FreelanceOS uses Notion MCP as its single source of truth across 5 databases:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Notion Database&lt;/th&gt;
&lt;th&gt;What FreelanceOS Stores&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Clients&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Name, email, active/inactive status&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Projects&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Name, budget, deadline, progress status&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Invoices&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AI-generated invoice content, amount, paid/unpaid&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Contracts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full AI-generated contract text, draft/signed status&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Expenses&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Category, amount, date for tax tracking&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  What Notion MCP Unlocks
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Real-time AI + Notion sync&lt;/strong&gt;&lt;br&gt;
Every AI-generated document (contract, invoice) is immediately written to the correct Notion database via the MCP API. No copy-paste. No manual entry.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Live business intelligence&lt;/strong&gt;&lt;br&gt;
The Dashboard pulls live data from all 5 Notion databases simultaneously, feeds it to Gemini AI, and returns intelligent business insights about your freelance operation — all in real time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Persistent workflow memory&lt;/strong&gt;&lt;br&gt;
Because everything lives in Notion, your freelance OS remembers every client, project, invoice, and contract across sessions. Notion MCP turns a Python script into a &lt;strong&gt;stateful business operating system.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Human-in-the-loop control&lt;/strong&gt;&lt;br&gt;
Every AI-generated output is reviewed by the freelancer before saving to Notion. The human stays in control — AI handles the generation, Notion handles the storage, the freelancer makes the final call.&lt;/p&gt;




&lt;h3&gt;
  
  
  🛠️ Tech Stack
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Notion MCP&lt;/strong&gt; — Core workspace and data layer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Gemini 1.5 Flash&lt;/strong&gt; — AI generation (free tier)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python 3&lt;/strong&gt; — Application logic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rich&lt;/strong&gt; — Beautiful terminal UI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Requests&lt;/strong&gt; — Notion API HTTP client&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  🚀 Try It Yourself
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/SimranShaikh20/FreelanceOS
&lt;span class="nb"&gt;cd &lt;/span&gt;FreelanceOS
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="c"&gt;# Add your API keys to .env&lt;/span&gt;
python main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Full setup guide in the &lt;a href="https://github.com/SimranShaikh20/FreelanceOS" rel="noopener noreferrer"&gt;README&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>notionchallenge</category>
      <category>mcp</category>
      <category>ai</category>
    </item>
    <item>
      <title>Docling is a Game-Changer for RAG Systems</title>
      <dc:creator>Simran Shaikh</dc:creator>
      <pubDate>Tue, 03 Feb 2026 09:47:45 +0000</pubDate>
      <link>https://dev.to/simranshaikh20_50/docling-is-a-game-changer-for-rag-systems-2ce4</link>
      <guid>https://dev.to/simranshaikh20_50/docling-is-a-game-changer-for-rag-systems-2ce4</guid>
      <description>&lt;h1&gt;
  
  
  Why Docling is a Game-Changer for RAG Systems: Moving Beyond Basic Text Extraction
&lt;/h1&gt;

&lt;p&gt;In the rapidly evolving world of Retrieval-Augmented Generation (RAG), we're constantly seeking ways to improve accuracy and reliability. While traditional RAG systems have made great strides, they often stumble when faced with real-world enterprise documents—PDFs with complex layouts, financial reports packed with tables, or technical documentation spanning multiple formats.&lt;/p&gt;

&lt;p&gt;Enter Docling: an open-source document processing library from IBM Research that's transforming how we handle documents in RAG pipelines. In this post, I'll explain what makes Docling special and why it might be the missing piece in your RAG architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem with Traditional RAG
&lt;/h2&gt;

&lt;p&gt;Let's start by understanding what typically happens in a conventional RAG system when processing a document:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Load the document&lt;/strong&gt; using a basic PDF or text extractor&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Split the text&lt;/strong&gt; into chunks (usually by character count)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embed the chunks&lt;/strong&gt; using your chosen model&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Store in a vector database&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieve relevant chunks&lt;/strong&gt; when queried&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate answers&lt;/strong&gt; using an LLM with the retrieved context&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Sounds straightforward, right? The problem is step 2—the chunking strategy. Traditional approaches treat documents as plain text streams, splitting them arbitrarily based on character counts or token limits. This creates several critical issues:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tables become gibberish.&lt;/strong&gt; A beautifully formatted table showing quarterly revenue becomes: "Q1 Revenue 100M Q2 Revenue 150M Q3..." Good luck querying that accurately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context gets fragmented.&lt;/strong&gt; Important information gets split mid-sentence, mid-paragraph, or worse—right in the middle of a crucial table or chart.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structure is lost.&lt;/strong&gt; Headers, sections, lists, figures—all the semantic structure that makes documents readable and meaningful gets stripped away.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layout complexity fails.&lt;/strong&gt; Multi-column layouts get read across columns instead of down them. Headers and footers pollute the content. Footnotes appear randomly in the text.&lt;/p&gt;

&lt;p&gt;The result? A RAG system that works okay on simple text documents but frustrates users when dealing with the complex, structured documents that actually matter in enterprise settings.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Docling Changes the Game
&lt;/h2&gt;

&lt;p&gt;Docling takes a fundamentally different approach. Instead of treating documents as text streams, it understands them as structured objects with semantic meaning. Here's what that means in practice:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Structure-Aware Parsing
&lt;/h3&gt;

&lt;p&gt;Docling doesn't just extract text—it identifies and labels every element in your document: headers, paragraphs, tables, lists, figures, captions. It understands the hierarchical relationships between sections and subsections. It preserves the reading order even in complex multi-column layouts.&lt;/p&gt;

&lt;p&gt;Think of it as the difference between having someone read you individual words from a newspaper versus having them explain the article's structure, where each section fits, and how the information flows.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Intelligent Chunking
&lt;/h3&gt;

&lt;p&gt;With Docling, chunks respect document structure. Instead of splitting every 500 characters, you can chunk by logical units:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complete sections or subsections&lt;/li&gt;
&lt;li&gt;Entire tables (preserved in their tabular format)&lt;/li&gt;
&lt;li&gt;Full paragraphs with their associated headers&lt;/li&gt;
&lt;li&gt;Lists with all their items intact&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each chunk becomes a semantically complete unit that makes sense on its own, rather than an arbitrary slice of text.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Rich Metadata
&lt;/h3&gt;

&lt;p&gt;Every chunk Docling creates comes with valuable metadata:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which section it belongs to (including the section hierarchy)&lt;/li&gt;
&lt;li&gt;What page it's from&lt;/li&gt;
&lt;li&gt;What type of content it is (heading, paragraph, table, list)&lt;/li&gt;
&lt;li&gt;Its position in the document structure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This metadata enables powerful retrieval strategies. You can filter results to only search tables, prioritize content from executive summaries, or boost matches from specific sections.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Table and Structured Data Preservation
&lt;/h3&gt;

&lt;p&gt;This is where Docling truly shines. Financial reports, technical specifications, comparison tables—all preserved in their original structure. When your RAG system retrieves a table, it gets the actual table, with rows and columns intact and queryable.&lt;/p&gt;

&lt;p&gt;No more "What was Q2 revenue?" returning garbled text that might or might not contain the right number.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Multi-Format Consistency
&lt;/h3&gt;

&lt;p&gt;Whether you're processing PDFs (including scanned documents with OCR), Word documents, PowerPoint presentations, images, or HTML, Docling provides consistent, high-quality extraction through a unified pipeline. One processing approach, reliable results across all formats.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Impact: The Numbers
&lt;/h2&gt;

&lt;p&gt;Let me share some typical performance improvements when moving from traditional RAG to Docling-enhanced RAG:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Table query accuracy:&lt;/strong&gt; 30% → 85%&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Context preservation:&lt;/strong&gt; 50% → 90%&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Multi-column document handling:&lt;/strong&gt; 35% → 88%&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Structured data retrieval:&lt;/strong&gt; 25% → 92%&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Complex PDF processing:&lt;/strong&gt; 40% → 87%&lt;/p&gt;

&lt;p&gt;These aren't minor improvements—they're the difference between a RAG system users tolerate and one they actually rely on.&lt;/p&gt;
&lt;h2&gt;
  
  
  A Practical Example
&lt;/h2&gt;

&lt;p&gt;Let's see this in action. Imagine you're building a RAG system for financial analysis. A user asks: "What was the year-over-year revenue growth in Q3?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Traditional RAG might retrieve:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"...Q3 Rev 180M Product A Sales 50M Product B..."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The LLM has to guess at what Q2 was, what the previous year was, and hope it didn't miss relevant chunks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Docling-enhanced RAG retrieves:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Text: [Structured table data]
| Quarter | 2024 Revenue | 2023 Revenue | YoY Growth |
|---------|--------------|--------------|------------|
| Q3      | $180M        | $165M        | +9.1%      |

Metadata:
  Section: "Financial Performance &amp;gt; Quarterly Results"
  Page: 8
  Type: Table
  Parent: "Financial Overview"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The LLM gets the complete table with clear structure, plus contextual metadata. The answer practically writes itself: "Year-over-year revenue growth in Q3 was 9.1%, increasing from $165M to $180M."&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation Strategy
&lt;/h2&gt;

&lt;p&gt;Integrating Docling into your RAG pipeline is straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Install Docling&lt;/strong&gt; and set up the document converter&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Process your documents&lt;/strong&gt; through Docling instead of basic text extraction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Export to your preferred format&lt;/strong&gt; (markdown, JSON, or custom)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement structure-aware chunking&lt;/strong&gt; using Docling's element boundaries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enrich your embeddings&lt;/strong&gt; with Docling's metadata&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Store and retrieve&lt;/strong&gt; using your existing vector database&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The beauty is that Docling plugs into your existing RAG architecture—you're not rebuilding from scratch, just replacing the document processing layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Docling Makes Sense (and When It Doesn't)
&lt;/h2&gt;

&lt;p&gt;Docling is particularly valuable for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Financial reports and statements&lt;/strong&gt; with extensive tables and charts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Technical documentation&lt;/strong&gt; with complex layouts and structured information&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Research papers&lt;/strong&gt; with equations, figures, and citations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Legal documents&lt;/strong&gt; requiring precise section tracking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise document collections&lt;/strong&gt; spanning multiple formats&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Any scenario&lt;/strong&gt; where structure and tables matter&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You might not need Docling for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simple text-only documents (blog posts, novels, articles)&lt;/li&gt;
&lt;li&gt;Clean markdown files without complex structure&lt;/li&gt;
&lt;li&gt;Very short documents where chunking isn't critical&lt;/li&gt;
&lt;li&gt;Use cases where document structure doesn't impact answers&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Trade-off
&lt;/h2&gt;

&lt;p&gt;There's always a trade-off, and with Docling it's processing time. Converting documents through Docling's structural analysis takes longer than basic text extraction—sometimes 2-3x longer during the initial ingestion phase.&lt;/p&gt;

&lt;p&gt;But here's the key insight: &lt;strong&gt;you pay this cost once during document processing, and you benefit from it with every single query thereafter.&lt;/strong&gt; Spending an extra second processing a document to get 3x better retrieval accuracy on thousands of future queries is an easy trade-off.&lt;/p&gt;

&lt;h2&gt;
  
  
  Looking Forward: The Future of Document-Aware RAG
&lt;/h2&gt;

&lt;p&gt;Docling represents a broader shift in how we think about RAG systems. We're moving from "search and generate" to "understand and retrieve." The next generation of RAG systems will be document-aware, structure-preserving, and semantically intelligent.&lt;/p&gt;

&lt;p&gt;As RAG moves from proof-of-concept to production deployment in enterprise environments, the difference between basic text extraction and intelligent document processing becomes critical. Users don't just want answers—they want accurate, reliable, well-sourced answers. They want systems that understand the structure and meaning of their documents, not just the words.&lt;/p&gt;

&lt;p&gt;Docling helps us build those systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;Ready to try Docling in your RAG pipeline? Here are your next steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Check out the &lt;a href="https://github.com/docling-project/docling" rel="noopener noreferrer"&gt;Docling GitHub repository&lt;/a&gt; for documentation and examples&lt;/li&gt;
&lt;li&gt;Start with a small test set of your most problematic documents—the ones where traditional RAG fails&lt;/li&gt;
&lt;li&gt;Compare retrieval quality before and after Docling integration&lt;/li&gt;
&lt;li&gt;Measure the impact on your specific use cases and metrics&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;The promise of RAG is that we can give LLMs access to vast knowledge bases and get accurate, grounded answers. But that promise only holds if the retrieval part actually works—if we can find the right information and present it in a way the LLM can understand.&lt;/p&gt;

&lt;p&gt;Docling doesn't just make retrieval better; it makes it fundamentally more aligned with how documents actually work. It respects their structure, preserves their meaning, and maintains their context. For anyone building serious RAG systems on real-world documents, that's not just an improvement—it's essential.&lt;/p&gt;

&lt;p&gt;The question isn't whether to use document-aware processing in your RAG pipeline. It's whether you can afford not to.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have you tried Docling in your RAG systems? What results have you seen? Share your experiences in the comments below.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>GitHub Repository Intelligence Assistant</title>
      <dc:creator>Simran Shaikh</dc:creator>
      <pubDate>Mon, 02 Feb 2026 03:03:01 +0000</pubDate>
      <link>https://dev.to/simranshaikh20_50/github-repository-intelligence-assistant-1lk6</link>
      <guid>https://dev.to/simranshaikh20_50/github-repository-intelligence-assistant-1lk6</guid>
      <description>&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GitHub Repository Intelligence Assistant&lt;/strong&gt; - A web application that helps developers understand any GitHub repository through AI-powered conversations and automated analysis.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;When developers encounter a new repository, they face several challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⏰ &lt;strong&gt;Time-consuming exploration&lt;/strong&gt;: Spending 2-3 hours reading code to understand structure&lt;/li&gt;
&lt;li&gt;🤔 &lt;strong&gt;No context&lt;/strong&gt;: Difficult to know where to start in large codebases
&lt;/li&gt;
&lt;li&gt;📚 &lt;strong&gt;Documentation gaps&lt;/strong&gt;: Missing or outdated setup instructions&lt;/li&gt;
&lt;li&gt;🔄 &lt;strong&gt;Repeated questions&lt;/strong&gt;: Same questions asked by every new contributor&lt;/li&gt;
&lt;li&gt;💻 &lt;strong&gt;Setup friction&lt;/strong&gt;: Trial and error to get the project running&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;p&gt;This tool provides instant repository intelligence by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔍 &lt;strong&gt;Automatic Analysis&lt;/strong&gt;: Fetches and analyzes GitHub repositories in seconds&lt;/li&gt;
&lt;li&gt;💬 &lt;strong&gt;AI Conversations&lt;/strong&gt;: Ask questions about code in natural language&lt;/li&gt;
&lt;li&gt;⚡ &lt;strong&gt;Smart Answers&lt;/strong&gt;: Get context-aware responses based on actual repository content&lt;/li&gt;
&lt;li&gt;🏗️ &lt;strong&gt;Architecture Insights&lt;/strong&gt;: Understand code structure without digging through files&lt;/li&gt;
&lt;li&gt;📦 &lt;strong&gt;Dependency Detection&lt;/strong&gt;: See what technologies and packages are used&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Enter any GitHub repository URL&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;App fetches repository structure via GitHub API&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analyzes and prioritizes important files&lt;/strong&gt; (README, configs, source code)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ask questions in chat interface&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Get AI-powered answers&lt;/strong&gt; using Claude API with repository context&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;🌐 &lt;strong&gt;&lt;a href="https://repomind-ai.netlify.app/" rel="noopener noreferrer"&gt;Live Demo&lt;/a&gt;&lt;/strong&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  Screenshots
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Repository Input&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://postimg.cc/QBZCWh9f" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F34xbksljbnvf1hhh0hmn.png" alt="Screenshot-2026-02-01-162850.png" width="800" height="483"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Simple interface to enter any GitHub repository URL&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Repository Analysis Dashboard&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://postimg.cc/NLC0k19V" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flv9xkwlk3lg98splcpcz.png" alt="Screenshot-2026-02-01-162658.png" width="800" height="483"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Shows repository stats, files analyzed, and key information&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. AI Chat Interface&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://postimg.cc/BLYJN06F" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffmoknzz1xlc7h0fslgbj.png" alt="Screenshot-2026-02-01-162809.png" width="800" height="476"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Natural language conversations about the codebase&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Test It With These Repositories:
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://github.com/facebook/react
https://github.com/vercel/next.js
https://github.com/django/django
https://github.com/SimranShaikh20/DevOps-Autopilot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  My Experience with GitHub Copilot CLI
&lt;/h2&gt;

&lt;p&gt;Building this project gave me hands-on experience with GitHub Copilot CLI's capabilities. Here's how it accelerated my development:&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Project Setup &amp;amp; Boilerplate
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Initial Setup:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gh copilot suggest &lt;span class="s2"&gt;"create React app with Vite and Tailwind CSS"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Instead of manually setting up configurations, Copilot CLI provided exact commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm create vite@latest repo-qa &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nt"&gt;--template&lt;/span&gt; react
&lt;span class="nb"&gt;cd &lt;/span&gt;repo-qa
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-D&lt;/span&gt; tailwindcss postcss autoprefixer
npx tailwindcss init &lt;span class="nt"&gt;-p&lt;/span&gt;
npm &lt;span class="nb"&gt;install &lt;/span&gt;lucide-react
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Time Saved:&lt;/strong&gt; 20-30 minutes of setup and configuration&lt;/p&gt;




&lt;h3&gt;
  
  
  2. GitHub API Integration
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Challenge:&lt;/strong&gt; Fetch repository structure and file contents&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Copilot CLI Prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gh copilot suggest &lt;span class="s2"&gt;"write function to fetch GitHub repository tree recursively with error handling"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Generated Code:&lt;/strong&gt; Complete implementation with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API endpoint construction&lt;/li&gt;
&lt;li&gt;Error handling for 404 and rate limits&lt;/li&gt;
&lt;li&gt;Support for both 'main' and 'master' branches&lt;/li&gt;
&lt;li&gt;Base64 decoding for file contents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Impact:&lt;/strong&gt; Saved 1+ hour of reading GitHub API documentation&lt;/p&gt;




&lt;h3&gt;
  
  
  3. File Prioritization Logic
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Challenge:&lt;/strong&gt; Sort files by importance (README &amp;gt; configs &amp;gt; source code)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Copilot CLI Prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gh copilot suggest &lt;span class="s2"&gt;"create prioritization function that ranks files by type with README highest priority"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Generated Solution:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;filePriority&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;readme&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;package.json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;900&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.py&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;800&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.jsx&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;700&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Impact:&lt;/strong&gt; Clean, efficient solution in minutes instead of iterating on logic&lt;/p&gt;




&lt;h3&gt;
  
  
  4. React Component Development
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Copilot CLI Prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gh copilot suggest &lt;span class="s2"&gt;"create glassmorphic card component with Tailwind CSS"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Beautiful UI components with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Backdrop blur effects&lt;/li&gt;
&lt;li&gt;Gradient borders&lt;/li&gt;
&lt;li&gt;Responsive design&lt;/li&gt;
&lt;li&gt;Proper accessibility&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Productivity:&lt;/strong&gt; Built UI components 2x faster than manual coding&lt;/p&gt;




&lt;h3&gt;
  
  
  5. State Management
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Challenge:&lt;/strong&gt; Managing loading states, errors, and data flow&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Copilot CLI Prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gh copilot suggest &lt;span class="s2"&gt;"React component with useState for repo data, loading, error states"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Generated:&lt;/strong&gt; Clean state management pattern with proper error boundaries&lt;/p&gt;




&lt;h3&gt;
  
  
  6. Debugging &amp;amp; Bug Fixes
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Bug:&lt;/strong&gt; Race condition when switching between repositories&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Copilot CLI Prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gh copilot suggest &lt;span class="s2"&gt;"fix race condition in React useEffect with cleanup"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Implemented abort controller pattern I wasn't familiar with&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time Saved:&lt;/strong&gt; 30+ minutes of debugging&lt;/p&gt;




&lt;h3&gt;
  
  
  Productivity Comparison
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Development Task&lt;/th&gt;
&lt;th&gt;Traditional Approach&lt;/th&gt;
&lt;th&gt;With Copilot CLI&lt;/th&gt;
&lt;th&gt;Time Saved&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Project Setup&lt;/td&gt;
&lt;td&gt;45 min&lt;/td&gt;
&lt;td&gt;10 min&lt;/td&gt;
&lt;td&gt;78%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API Integration&lt;/td&gt;
&lt;td&gt;2 hours&lt;/td&gt;
&lt;td&gt;30 min&lt;/td&gt;
&lt;td&gt;75%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UI Components&lt;/td&gt;
&lt;td&gt;4 hours&lt;/td&gt;
&lt;td&gt;2 hours&lt;/td&gt;
&lt;td&gt;50%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;State Management&lt;/td&gt;
&lt;td&gt;1 hour&lt;/td&gt;
&lt;td&gt;20 min&lt;/td&gt;
&lt;td&gt;67%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Debugging&lt;/td&gt;
&lt;td&gt;1.5 hours&lt;/td&gt;
&lt;td&gt;30 min&lt;/td&gt;
&lt;td&gt;67%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total Development&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~10 days&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~7 days&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;30%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  What Copilot CLI Helped Me Build
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Features Built with Copilot CLI Assistance:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ GitHub API integration (90% generated)&lt;/li&gt;
&lt;li&gt;✅ File fetching and parsing (85% generated)&lt;/li&gt;
&lt;li&gt;✅ React component structure (70% generated)&lt;/li&gt;
&lt;li&gt;✅ Error handling patterns (80% generated)&lt;/li&gt;
&lt;li&gt;✅ UI styling with Tailwind (60% generated)&lt;/li&gt;
&lt;li&gt;✅ State management logic (75% generated)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Estimated:&lt;/strong&gt; ~65-70% of the codebase was written or enhanced with Copilot CLI&lt;/p&gt;




&lt;h3&gt;
  
  
  Key Learnings
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Specific Prompts Get Better Results&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;❌ "Create a function"&lt;/li&gt;
&lt;li&gt;✅ "Create async function to fetch GitHub repo with retry logic and error handling"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Iterate and Refine&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ask follow-up questions to improve generated code&lt;/li&gt;
&lt;li&gt;Request alternative implementations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Learn from Generated Code&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Studied patterns I wasn't familiar with (AbortController, proper async/await)&lt;/li&gt;
&lt;li&gt;Discovered Tailwind utilities I didn't know existed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Time Distribution Changed&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Less time on boilerplate and setup&lt;/li&gt;
&lt;li&gt;More time on features and user experience&lt;/li&gt;
&lt;li&gt;Better code quality overall&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Best Copilot CLI Moments
&lt;/h3&gt;

&lt;p&gt;🎯 &lt;strong&gt;Most Helpful:&lt;/strong&gt; When it suggested the entire error handling pattern for API failures&lt;/p&gt;

&lt;p&gt;💡 &lt;strong&gt;Biggest Learning:&lt;/strong&gt; Proper React cleanup functions to prevent memory leaks&lt;/p&gt;

&lt;p&gt;⚡ &lt;strong&gt;Biggest Time Save:&lt;/strong&gt; Auto-generating the repository parsing logic&lt;/p&gt;




&lt;h3&gt;
  
  
  The Development Experience
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Before Copilot CLI:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Constantly switching between editor, browser, and Stack Overflow&lt;/li&gt;
&lt;li&gt;40% time spent looking up syntax and APIs&lt;/li&gt;
&lt;li&gt;Manual boilerplate writing&lt;/li&gt;
&lt;li&gt;Solo debugging with console.log&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;With Copilot CLI:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stay in terminal and editor - better flow state&lt;/li&gt;
&lt;li&gt;Instant answers to "how do I..." questions
&lt;/li&gt;
&lt;li&gt;Generated boilerplate in seconds&lt;/li&gt;
&lt;li&gt;AI-assisted debugging with explanations&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Frontend:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⚛️ React 18 (UI framework)&lt;/li&gt;
&lt;li&gt;⚡ Vite (build tool)&lt;/li&gt;
&lt;li&gt;🎨 Tailwind CSS (styling)&lt;/li&gt;
&lt;li&gt;🎭 Lucide React (icons)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;APIs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🐙 GitHub REST API (repository data)&lt;/li&gt;
&lt;li&gt;🤖 Claude API - Anthropic (AI responses)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tools:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🤖 GitHub Copilot CLI (development acceleration)&lt;/li&gt;
&lt;li&gt;🚀 Vercel (deployment)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone repository&lt;/span&gt;
git clone https://github.com/SimranShaikh20/RepoMindAI.git
&lt;span class="nb"&gt;cd &lt;/span&gt;RepoMindAI

&lt;span class="c"&gt;# Install dependencies&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt;

&lt;span class="c"&gt;# Set up environment variables&lt;/span&gt;
&lt;span class="c"&gt;# Add your Anthropic API key to .env&lt;/span&gt;
&lt;span class="nv"&gt;VITE_ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_key_here

&lt;span class="c"&gt;# Run development server&lt;/span&gt;
npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;🌐 &lt;strong&gt;Live Demo&lt;/strong&gt;: &lt;a href="https://repomind-ai.netlify.app/" rel="noopener noreferrer"&gt;https://repomind-ai.netlify.app/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;💻 &lt;strong&gt;GitHub Repository&lt;/strong&gt;: &lt;a href="https://github.com/SimranShaikh20/RepoMindAI" rel="noopener noreferrer"&gt;https://github.com/SimranShaikh20/RepoMindAI&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Future Improvements
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Support for private repositories (GitHub OAuth)&lt;/li&gt;
&lt;li&gt;[ ] Code search within repositories&lt;/li&gt;
&lt;li&gt;[ ] Save favorite repositories&lt;/li&gt;
&lt;li&gt;[ ] Export chat conversations&lt;/li&gt;
&lt;li&gt;[ ] Compare multiple repositories&lt;/li&gt;
&lt;li&gt;[ ] Browser extension version&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;This project demonstrates real-world AI application in developer tools. By combining repository analysis with conversational AI, it solves the actual pain point developers face: &lt;strong&gt;quickly understanding unfamiliar codebases&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Built with GitHub Copilot CLI&lt;/strong&gt;, this tool showcases how AI assistance can accelerate development while maintaining code quality.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Built with ❤️ and GitHub Copilot CLI&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  GitHubCopilotCLI #DevChallenge #AI #React #DeveloperTools
&lt;/h1&gt;

</description>
      <category>devchallenge</category>
      <category>githubchallenge</category>
      <category>cli</category>
      <category>githubcopilot</category>
    </item>
    <item>
      <title>RAG &amp; Semantic Search</title>
      <dc:creator>Simran Shaikh</dc:creator>
      <pubDate>Fri, 30 Jan 2026 08:35:15 +0000</pubDate>
      <link>https://dev.to/simranshaikh20_50/rag-semantic-search-12gd</link>
      <guid>https://dev.to/simranshaikh20_50/rag-semantic-search-12gd</guid>
      <description>&lt;p&gt;In the rapidly evolving world of AI and large language models, Retrieval-Augmented Generation (RAG) has emerged as a game-changing technique. If you're building AI applications that need to understand and search through your own data, this guide will walk you through every essential concept you need to know.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction: Why RAG Matters
&lt;/h2&gt;

&lt;p&gt;Traditional language models have a fundamental limitation: they only know what they were trained on. RAG solves this by teaching AI systems to retrieve and use external knowledge before generating answers. Think of it as giving your AI a library card instead of just relying on what it memorized in school.&lt;/p&gt;

&lt;p&gt;Let's dive into the 20 core concepts that make RAG and semantic search work.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Embeddings: Teaching Machines to Understand Meaning
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What are embeddings?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An embedding is a numerical representation of data—whether text, images, or audio—that preserves the underlying meaning. Instead of treating words as arbitrary symbols, embeddings capture their semantic relationships.&lt;/p&gt;

&lt;p&gt;For example, the sentence "Neural networks learn patterns" might become:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[0.12, -0.45, 0.88, 0.34, -0.67, ...]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why do we need them?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Computers don't inherently understand language. Embeddings bridge this gap by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enabling meaningful comparisons between pieces of text&lt;/li&gt;
&lt;li&gt;Clustering similar concepts together&lt;/li&gt;
&lt;li&gt;Powering semantic search capabilities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Types of embeddings:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Text embeddings&lt;/strong&gt;: For documents, queries, and general text&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image embeddings&lt;/strong&gt;: For visual content like diagrams and photos&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal embeddings&lt;/strong&gt;: Combining text and images (e.g., CLIP)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Popular models:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenAI's &lt;code&gt;text-embedding-3-large&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Open-source options like BGE, E5, and MiniLM&lt;/li&gt;
&lt;li&gt;CLIP for image embeddings&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  2. Semantic Search: Beyond Keywords
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The fundamental shift&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Traditional keyword search looks for exact word matches. Semantic search understands &lt;em&gt;meaning&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example in action:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Query: &lt;em&gt;"How does backpropagation work?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A document containing &lt;em&gt;"Gradient descent updates weights during neural network training"&lt;/em&gt; would be found by semantic search even though it shares no exact keywords with the query. This is the power of understanding meaning over matching words.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Convert all documents into embeddings&lt;/li&gt;
&lt;li&gt;Convert the user's query into an embedding&lt;/li&gt;
&lt;li&gt;Compare the query vector with document vectors&lt;/li&gt;
&lt;li&gt;Return the most semantically similar results&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  3. Vectors: The Language of AI
&lt;/h2&gt;

&lt;p&gt;A vector is simply a list of numbers, like &lt;code&gt;[0.32, -0.14, 0.88, ...]&lt;/code&gt;. Each dimension in this list captures a different aspect of meaning—think of them as coordinates in a multi-dimensional space of concepts.&lt;/p&gt;

&lt;p&gt;When two vectors are close together in this space, their meanings are similar.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Vector Databases: Storage Built for Similarity
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why special databases?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Traditional databases excel at exact matches. Vector databases are optimized for a different question: "What's most similar to this?"&lt;/p&gt;

&lt;p&gt;When you're dealing with millions of embeddings, you need specialized tools for fast similarity search.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Popular options:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Database&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;FAISS&lt;/td&gt;
&lt;td&gt;Local development and research&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chroma&lt;/td&gt;
&lt;td&gt;Simple applications and prototyping&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pinecone&lt;/td&gt;
&lt;td&gt;Cloud-scale production systems&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qdrant&lt;/td&gt;
&lt;td&gt;Open-source deployments&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  5. Similarity Metrics: Measuring Closeness
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cosine similarity&lt;/strong&gt; is the most common metric for comparing embeddings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;similarity = (A · B) / (|A| × |B|)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result ranges from -1 to 1:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;1&lt;/strong&gt;: Vectors point in the same direction (very similar)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0&lt;/strong&gt;: Vectors are perpendicular (unrelated)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;-1&lt;/strong&gt;: Vectors point in opposite directions (opposite meanings)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  6. Chunking: Breaking Down Documents
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The challenge&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Large language models have input limits. A 100-page manual won't fit in a single context window. The solution? Break it into smaller, digestible pieces.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chunking strategies:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fixed-size&lt;/td&gt;
&lt;td&gt;Every 500 tokens&lt;/td&gt;
&lt;td&gt;Simple, consistent chunks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sliding window&lt;/td&gt;
&lt;td&gt;Overlapping segments&lt;/td&gt;
&lt;td&gt;Preserves context at boundaries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Semantic&lt;/td&gt;
&lt;td&gt;Split by topic/paragraph&lt;/td&gt;
&lt;td&gt;Maintains logical coherence&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Pro tip:&lt;/strong&gt; Good chunking preserves complete thoughts. Splitting mid-sentence can harm retrieval quality.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. Indexing: Speed Through Structure
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The problem&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Without indexing, finding similar vectors means comparing your query against every single document vector. With millions of documents, this becomes impossibly slow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The solution&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Indexing creates data structures that enable fast approximate nearest neighbor search.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Common index types:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;HNSW&lt;/strong&gt; (Hierarchical Navigable Small World): Fast and accurate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IVF&lt;/strong&gt; (Inverted File Index): Good for large-scale datasets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flat&lt;/strong&gt;: Exact search, slower but 100% accurate&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  8. Reranking: Refinement for Precision
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The two-stage approach&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Vector search is fast but sometimes imprecise. Reranking adds a second, more careful analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Process:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Vector database returns top 20 candidates&lt;/li&gt;
&lt;li&gt;Reranker model scores these 20 more carefully&lt;/li&gt;
&lt;li&gt;Return top 5 best matches&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Tools for reranking:&lt;/strong&gt;&lt;br&gt;
Cross-encoder models that jointly analyze the query and each candidate document provide superior accuracy compared to the independent embeddings used in initial retrieval.&lt;/p&gt;


&lt;h2&gt;
  
  
  9. MMR: Avoiding Redundancy
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Maximal Marginal Relevance&lt;/strong&gt; solves a common problem: what if your top 5 results all say the same thing?&lt;/p&gt;

&lt;p&gt;MMR balances two goals:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Relevance&lt;/strong&gt;: Results should match the query&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diversity&lt;/strong&gt;: Results shouldn't duplicate each other&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This ensures users get comprehensive information, not repetitive answers.&lt;/p&gt;


&lt;h2&gt;
  
  
  10. Metadata Filtering: Adding Structure to Search
&lt;/h2&gt;

&lt;p&gt;Sometimes semantic similarity isn't enough. You might want results from a specific source, time period, or category.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example metadata:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The compressor operates at 150 PSI..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"technical_manual.pdf"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"page"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"topic"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"compressor"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2024-01-15"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Filtered query:&lt;/strong&gt; "Find information about compressors, but only from the technical manual"&lt;/p&gt;

&lt;p&gt;This combines semantic search with structured filtering for more precise results.&lt;/p&gt;




&lt;h2&gt;
  
  
  11. Cross-Encoders vs. Bi-Encoders
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Two architectures for comparison:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;How It Works&lt;/th&gt;
&lt;th&gt;Speed&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Bi-encoder&lt;/td&gt;
&lt;td&gt;Encodes query and document separately&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-encoder&lt;/td&gt;
&lt;td&gt;Processes query and document together&lt;/td&gt;
&lt;td&gt;Slow&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Usage pattern:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use bi-encoders (standard embeddings) for initial retrieval&lt;/li&gt;
&lt;li&gt;Use cross-encoders for reranking the top results&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  12. Hybrid Search: Best of Both Worlds
&lt;/h2&gt;

&lt;p&gt;Pure semantic search has a weakness: it might miss exact technical terms or specific phrases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hybrid search combines:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Keyword search&lt;/strong&gt; (BM25): Catches exact terms and rare phrases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector search&lt;/strong&gt;: Understands meaning and context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example: A query for "Python asyncio" benefits from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keyword search finding exact mentions of "asyncio"&lt;/li&gt;
&lt;li&gt;Semantic search finding related concepts like "asynchronous programming"&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  13. Knowledge Graphs: Structured Relationships
&lt;/h2&gt;

&lt;p&gt;While vectors capture similarity, knowledge graphs capture &lt;em&gt;relationships&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structure:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Nodes&lt;/strong&gt; represent entities (concepts, people, things)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edges&lt;/strong&gt; represent relationships between them&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Transformer --uses--&amp;gt; Self-Attention
Self-Attention --enables--&amp;gt; Parallel Processing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Applications:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Graph RAG for multi-hop reasoning&lt;/li&gt;
&lt;li&gt;Scientific knowledge representation&lt;/li&gt;
&lt;li&gt;Complex question answering&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  14. Prompts and Context: Controlling Generation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Context&lt;/strong&gt; consists of the chunks retrieved from your knowledge base.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompts&lt;/strong&gt; are instructions that tell the LLM how to use that context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Answer the following question using ONLY the context provided below. 
If the answer cannot be found in the context, say "I don't know."

Context: [retrieved chunks]

Question: [user query]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Well-crafted prompts are essential for preventing hallucinations and ensuring grounded responses.&lt;/p&gt;




&lt;h2&gt;
  
  
  15. Hallucination: The Challenge RAG Solves
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; Language models can generate plausible-sounding but entirely fabricated information.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RAG's solution:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ground responses in retrieved documents&lt;/li&gt;
&lt;li&gt;Include citations to sources&lt;/li&gt;
&lt;li&gt;Use prompts that enforce context-only answers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;RAG doesn't eliminate hallucinations entirely, but it dramatically reduces them by anchoring the model to factual sources.&lt;/p&gt;




&lt;h2&gt;
  
  
  16. Tokens: The Currency of Language Models
&lt;/h2&gt;

&lt;p&gt;A token is roughly equivalent to a word fragment. The sentence "Artificial Intelligence is transforming technology" might be tokenized as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;["Art", "ificial", " Intelligence", " is", " transform", "ing", " technology"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LLMs have token limits (e.g., 128K tokens for GPT-4)&lt;/li&gt;
&lt;li&gt;Token count affects both cost and performance&lt;/li&gt;
&lt;li&gt;Understanding tokenization helps optimize chunk sizes&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  17. Temperature: Controlling Creativity
&lt;/h2&gt;

&lt;p&gt;Temperature is a parameter that controls the randomness of model outputs:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Temperature&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;Deterministic, factual&lt;/td&gt;
&lt;td&gt;RAG systems, factual Q&amp;amp;A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0.7&lt;/td&gt;
&lt;td&gt;Balanced&lt;/td&gt;
&lt;td&gt;General conversation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1.0+&lt;/td&gt;
&lt;td&gt;Creative, varied&lt;/td&gt;
&lt;td&gt;Creative writing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For RAG applications, lower temperatures (0-0.3) typically work best.&lt;/p&gt;




&lt;h2&gt;
  
  
  18. Top-k: How Many Results to Retrieve
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;top_k&lt;/code&gt; parameter determines how many documents to retrieve from your vector database.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Considerations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Too few (k=1-2): Risk missing relevant information&lt;/li&gt;
&lt;li&gt;Too many (k=50+): Include noise, increase costs&lt;/li&gt;
&lt;li&gt;Sweet spot: Often k=3-10 depending on your use case&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Experiment to find the right balance for your application.&lt;/p&gt;




&lt;h2&gt;
  
  
  19. Evaluation Metrics: Measuring Success
&lt;/h2&gt;

&lt;p&gt;How do you know if your RAG system is working well?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key metrics:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;What It Measures&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Recall@k&lt;/td&gt;
&lt;td&gt;Are the right documents in the top-k results?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MRR (Mean Reciprocal Rank)&lt;/td&gt;
&lt;td&gt;How quickly do we find the first relevant result?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NDCG&lt;/td&gt;
&lt;td&gt;Overall quality of the ranking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Answer relevance&lt;/td&gt;
&lt;td&gt;Does the final answer address the question?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Faithfulness&lt;/td&gt;
&lt;td&gt;Is the answer grounded in the retrieved context?&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Regular evaluation ensures your system maintains quality as your knowledge base grows.&lt;/p&gt;




&lt;h2&gt;
  
  
  20. The RAG Pipeline: Putting It All Together
&lt;/h2&gt;

&lt;p&gt;A complete RAG system follows this flow:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Ingestion Phase:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Load documents&lt;/li&gt;
&lt;li&gt;Split into chunks&lt;/li&gt;
&lt;li&gt;Generate embeddings&lt;/li&gt;
&lt;li&gt;Store in vector database with metadata&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Retrieval Phase:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User submits a query&lt;/li&gt;
&lt;li&gt;Convert query to embedding&lt;/li&gt;
&lt;li&gt;Search vector database&lt;/li&gt;
&lt;li&gt;Apply metadata filters&lt;/li&gt;
&lt;li&gt;Rerank results&lt;/li&gt;
&lt;li&gt;Apply MMR for diversity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Generation Phase:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Construct prompt with retrieved context&lt;/li&gt;
&lt;li&gt;Call LLM with controlled temperature&lt;/li&gt;
&lt;li&gt;Generate response with citations&lt;/li&gt;
&lt;li&gt;Return to user&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each step is crucial for building a system that's both accurate and reliable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion: The Power of RAG
&lt;/h2&gt;

&lt;p&gt;At its core, RAG and semantic search represent a fundamental shift in how we build AI applications. Instead of hoping a pre-trained model knows everything, we give it the ability to learn from our specific data in real-time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The one-sentence summary:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;RAG + Semantic Search = Teaching AI to read your data before answering&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Whether you're building a customer support bot, a research assistant, or an internal knowledge management system, understanding these 20 concepts gives you the foundation to create intelligent, grounded, and reliable AI applications.&lt;/p&gt;




&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;p&gt;Ready to go deeper? Consider:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Building a simple RAG system&lt;/strong&gt; using LangChain or LlamaIndex&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Experimenting with different embedding models&lt;/strong&gt; to see what works for your domain&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implementing evaluation metrics&lt;/strong&gt; to measure and improve your system&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exploring advanced techniques&lt;/strong&gt; like Graph RAG or multi-query retrieval&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The field is evolving rapidly, but these fundamentals will serve you well no matter which direction it takes.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have questions or want to share your RAG implementation experiences? Let's discuss in the comments below!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>rag</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
