<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ayush Shekhar</title>
    <description>The latest articles on DEV Community by Ayush Shekhar (@ayushh0110).</description>
    <link>https://dev.to/ayushh0110</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3898782%2Fad3df0e7-5f5e-45e7-93c4-495fd4566407.jpeg</url>
      <title>DEV Community: Ayush Shekhar</title>
      <link>https://dev.to/ayushh0110</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ayushh0110"/>
    <language>en</language>
    <item>
      <title>I didn't win the gemma 4 challenge but here's what i actually learned.</title>
      <dc:creator>Ayush Shekhar</dc:creator>
      <pubDate>Fri, 19 Jun 2026 16:06:46 +0000</pubDate>
      <link>https://dev.to/ayushh0110/i-didnt-win-the-gemma-4-challenge-heres-what-i-actually-learned-3fk1</link>
      <guid>https://dev.to/ayushh0110/i-didnt-win-the-gemma-4-challenge-heres-what-i-actually-learned-3fk1</guid>
      <description>&lt;p&gt;Results came out today. ScreenMind didn't make it.&lt;/p&gt;

&lt;p&gt;i'll be honest, i refreshed that page a few times hoping i misread it. i didn't. so okay, that's that.&lt;/p&gt;

&lt;p&gt;then i did the thing you're supposed to do and actually read the winners instead of sulking. and i'm glad i did, because it answered a question i didn't know i had.&lt;/p&gt;

&lt;h3&gt;
  
  
  What won
&lt;/h3&gt;

&lt;p&gt;go look at them, they're good:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LIKAS, an offline disaster app for the philippines that keeps working when the cell towers go down&lt;/li&gt;
&lt;li&gt;AccessLens, basically eyes for blind and low-vision users, on-device&lt;/li&gt;
&lt;li&gt;a local postgres triage co-pilot someone built because HIPAA means they literally can't paste their db panic into chatgpt at 3am&lt;/li&gt;
&lt;li&gt;a phone agent that runs fully offline, peer to peer, so no prompt ever hits a server&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;every single one of these is a "here's a person who was stuck, here's how i unstuck them" story.&lt;/p&gt;

&lt;h3&gt;
  
  
  What i built
&lt;/h3&gt;

&lt;p&gt;ScreenMind is a privacy-first take on microsoft recall. it watches your screen, understands what you're doing with gemma 4, and lets you search and chat with your own history later. "what did aachii say on discord" and it pulls up the actual message. &lt;br&gt;
The engineering, i'm proud of it. it runs all three of gemma 4's modalities, vision audio and reasoning, on a 4gb gtx 1650. that constraint was the whole game. to make continuous analysis survivable on a card that small i had to build:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a perceptual-hash cache so it doesn't re-analyze the same vs code window fifty times&lt;/li&gt;
&lt;li&gt;a chat that can cancel an in-flight analysis mid-inference and grab the GPU back in about a second, so you never wait a minute for a background job to finish&lt;/li&gt;
&lt;li&gt;meeting transcription straight off gemma's audio encoder, no whisper bolted on&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;honestly it's the most involved thing i've built.&lt;/p&gt;

&lt;h3&gt;
  
  
  The part that stung in a useful way
&lt;/h3&gt;

&lt;p&gt;none of that is why the winners won.&lt;/p&gt;

&lt;p&gt;i pitched ScreenMind as "recall but private." that's a feature comparison. it's me, an engineer, going "look how it works." the winners went "look who this helps." and that wins. every time, apparently.&lt;/p&gt;

&lt;p&gt;i think i knew this in some abstract way but seeing it laid out next to my own submission made it concrete. the depth of the engineering didn't lose to better engineering. it lost to a better reason for the thing to exist.&lt;br&gt;
that's a genuinely useful thing to get wrong once, early, on a challenge and not on something that mattered more.&lt;/p&gt;

&lt;h3&gt;
  
  
  So
&lt;/h3&gt;

&lt;p&gt;i'm not shelving ScreenMind, it's open source and i'm still building on it: &lt;a href="https://github.com/ayushh0110/ScreenMind" rel="noopener noreferrer"&gt;https://github.com/ayushh0110/ScreenMind&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;but the next thing i build, i'm starting from the person, not the spec. figure out who's actually stuck before i write a line of code.&lt;/p&gt;

&lt;p&gt;congrats to everyone who won, and honestly to everyone who shipped something at all. that part's harder than it looks. On to the next one.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemma</category>
      <category>gemmachallenge</category>
      <category>devchallenge</category>
    </item>
    <item>
      <title>I Built a Privacy-First Alternative to Microsoft Recall — Using All 3 Gemma 4 Modalities</title>
      <dc:creator>Ayush Shekhar</dc:creator>
      <pubDate>Sat, 23 May 2026 10:43:57 +0000</pubDate>
      <link>https://dev.to/ayushh0110/i-built-a-privacy-first-alternative-to-microsoft-recall-using-all-3-gemma-4-modalities-26bb</link>
      <guid>https://dev.to/ayushh0110/i-built-a-privacy-first-alternative-to-microsoft-recall-using-all-3-gemma-4-modalities-26bb</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Build with Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I kept losing things. Not files — context. "What was that error I saw 2 hours ago?" "What did that Slack message say before I scrolled past it?" "What was I even working on before lunch?"&lt;/p&gt;

&lt;p&gt;Microsoft Recall tried to solve this but stored everything in plaintext with telemetry phoning home. The idea is genuinely useful — the execution was terrible. So I built my own.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ScreenMind&lt;/strong&gt; is an open-source screen activity journal that runs entirely on your machine. It captures your screen, analyzes every screenshot with Gemma 4's vision, and builds a searchable, chat-able AI memory of your digital life.&lt;/p&gt;

&lt;p&gt;After two weeks of daily driving it, the thing I didn't expect: you stop screenshotting Slack messages "just in case." You stop bookmarking stuff you know you'll forget. You just trust your computer to remember. The chat is the real feature — "What did Alex say on Discord?" pulls up the actual message. "What was I working on at 3pm?" shows you. It's grep for your visual memory.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbe6tzu6a06ro1g4cow5t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbe6tzu6a06ro1g4cow5t.png" alt="Timeline view — AI-analyzed screen activity feed" width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agents&lt;/th&gt;
&lt;th&gt;Chat with your memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvpfs2snj3zzfdyy7xkh6.png" alt="Agents" width="799" height="379"&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn1wxscl5zk0aof2s4gvb.png" alt="Chat" width="800" height="384"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;But it goes way beyond a Recall clone. Here's what it actually does:&lt;/p&gt;

&lt;h3&gt;
  
  
  Smart Capture
&lt;/h3&gt;

&lt;p&gt;Doesn't blindly screenshot every 30 seconds. Uses perceptual hashing to detect when your screen &lt;em&gt;actually&lt;/em&gt; changes. Cursor blinks and clock updates get ignored. Real content changes get captured. This alone significantly reduces unnecessary inference calls.&lt;/p&gt;

&lt;h3&gt;
  
  
  Gemma 4 Vision Analysis
&lt;/h3&gt;

&lt;p&gt;Every screenshot goes through Gemma 4 with OCR text as context. It figures out what app you're using, what you're doing, categorizes the activity, detects mood, and writes a detailed scene description. Not just "user is in Chrome" — more like "user is reading a pull request review on GitHub for the auth-middleware refactor."&lt;/p&gt;

&lt;h3&gt;
  
  
  Spatial Layout Detection
&lt;/h3&gt;

&lt;p&gt;OCR boxes get classified into screen regions (sidebar, chat area, toolbar, profile panel) using coordinate-based parsing. Text gets organized by section so when you search or chat, you get structured context — not a wall of raw OCR.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hybrid Search
&lt;/h3&gt;

&lt;p&gt;Semantic search (MiniLM embeddings + cosine similarity) combined with SQLite FTS5 keyword search. Ask "debugging the auth module" and it finds screenshots by meaning, not just exact word matches. Results show OCR text highlighted directly on the screenshot.&lt;/p&gt;
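&lt;p&gt;One way to picture the hybrid ranking is as a weighted mix of a cosine-similarity score and a keyword-hit flag. This is a minimal sketch, not ScreenMind's actual scoring: &lt;code&gt;hybrid_rank&lt;/code&gt;, the &lt;code&gt;semantic_weight&lt;/code&gt; of 0.7, and the binary keyword score are all assumptions made for illustration, and the embeddings are plain lists standing in for MiniLM vectors.&lt;/p&gt;

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_rank(query_vec, docs, keyword_hits, semantic_weight=0.7):
    """Rank doc ids by a weighted mix of semantic and keyword scores.

    docs: {doc_id: embedding}. keyword_hits: set of ids that the keyword
    search (FTS5 in the real system) returned. The weight is hypothetical.
    """
    scores = {}
    for doc_id, vec in docs.items():
        semantic = cosine_similarity(query_vec, vec)
        keyword = 1.0 if doc_id in keyword_hits else 0.0
        scores[doc_id] = semantic_weight * semantic + (1 - semantic_weight) * keyword
    return sorted(scores, key=scores.get, reverse=True)
```

&lt;p&gt;With this mix, a screenshot that only matches by meaning can still outrank one that only matches a keyword, which is the behavior described above.&lt;/p&gt;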

&lt;h3&gt;
  
  
  Chat With Your Screen History
&lt;/h3&gt;

&lt;p&gt;Ask "what should I reply to that Discord message?" and it pulls up the relevant screenshot, reads the organized text, and answers. Ask "did I get any email from Zerodha?" and it finds your inbox screenshot and tells you. It's RAG over your actual life, not documents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Voice Memos
&lt;/h3&gt;

&lt;p&gt;Hold Ctrl+Shift+V, speak, release. Gemma 4's native audio encoder transcribes it. A screenshot is captured alongside so you have visual context with every memo.&lt;/p&gt;

&lt;h3&gt;
  
  
  Meeting Transcription
&lt;/h3&gt;

&lt;p&gt;Auto-detects when you're in Zoom, Teams, Discord, or Meet. Records audio, transcribes in 15-second chunks using Gemma's audio encoder, then runs map-reduce summarization. Outputs structured summaries with topics, decisions, and action items.&lt;/p&gt;

&lt;h3&gt;
  
  
  Agent Platform
&lt;/h3&gt;

&lt;p&gt;Build custom automations by writing a markdown file in plain English:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Daily Focus Report&lt;/span&gt;
&lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;every 6h&lt;/span&gt;
&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;timeline, apps, mood&lt;/span&gt;
&lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;local, obsidian&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
Analyze my screen activity and generate a focus report:
&lt;span class="p"&gt;-&lt;/span&gt; How many hours of deep work vs shallow work?
&lt;span class="p"&gt;-&lt;/span&gt; What were my main distractions?
&lt;span class="p"&gt;-&lt;/span&gt; Give me a focus score out of 10.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Drop it in a folder. It runs automatically. For developers who want more control, there's a full Python SDK with state persistence and GPU-safe LLM access.&lt;/p&gt;
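&lt;p&gt;To make the file format concrete, here is a rough sketch of how an agent file like the one above could be split into metadata and prompt. &lt;code&gt;parse_agent_file&lt;/code&gt; is a hypothetical helper, not the project's loader, and it assumes the file always starts with a &lt;code&gt;---&lt;/code&gt; delimited header of simple &lt;code&gt;key: value&lt;/code&gt; lines.&lt;/p&gt;

```python
def parse_agent_file(text: str):
    """Split an agent file into (metadata dict, prompt string).

    Assumes the text begins with a '---' ... '---' header block.
    """
    _, header, prompt = text.split("---", 2)
    meta = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    # Comma-separated fields become lists of names.
    for key in ("data", "output"):
        if key in meta:
            meta[key] = [item.strip() for item in meta[key].split(",")]
    return meta, prompt.strip()
```

&lt;p&gt;Everything after the second delimiter is treated as the plain-English prompt and passed along unchanged.&lt;/p&gt;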
&lt;h3&gt;
  
  
  MCP Server
&lt;/h3&gt;

&lt;p&gt;Exposes your screen history to Claude Desktop, Cursor, and VS Code via Model Context Protocol. 8 tools, including search, recent activity, time-range queries, daily summaries, meeting transcripts, and instant capture.&lt;/p&gt;
&lt;h3&gt;
  
  
  Privacy
&lt;/h3&gt;

&lt;p&gt;Auto-redacts credit cards, SSNs, API keys, and passwords from captured text before storage. AES encryption at rest. Dashboard PIN lock. App blocklist. Incognito mode. Nothing ever leaves your machine.&lt;/p&gt;
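&lt;p&gt;As a rough illustration of redaction before storage, the sketch below masks 16-digit card-like numbers and US-style SSNs with regular expressions. The patterns and the &lt;code&gt;redact&lt;/code&gt; helper are assumptions for this example only; they are much narrower than what a real redactor needs and do not cover API keys or passwords.&lt;/p&gt;

```python
import re

# Illustrative patterns only; real-world redaction needs broader coverage.
_REDACTIONS = [
    (re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"), "[REDACTED_CARD]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED_SSN]"),
]


def redact(text: str) -> str:
    """Replace card-like and SSN-like digit runs before text is stored."""
    for pattern, placeholder in _REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text
```

&lt;p&gt;Running this on OCR text before it reaches the database means the raw digits are never written to disk.&lt;/p&gt;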
&lt;h3&gt;
  
  
  Plus
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Analytics dashboard&lt;/strong&gt; — category breakdown, top apps, hourly heatmap, meeting stats&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Day Rewind&lt;/strong&gt; — timelapse playback of your entire day with play/pause/scrub controls&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrations&lt;/strong&gt; — Obsidian vault sync, Notion export, webhooks (Slack, Discord, IFTTT)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/CxkkBT_EvPw"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;
&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/ayushh0110" rel="noopener noreferrer"&gt;
        ayushh0110
      &lt;/a&gt; / &lt;a href="https://github.com/ayushh0110/ScreenMind" rel="noopener noreferrer"&gt;
        ScreenMind
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
       AI-powered screen memory: captures, analyzes, and lets you search/chat your screen history. Powered by Gemma 4. 100% local, 100% private.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;&lt;div&gt;
&lt;br&gt;
&lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/853212f5e82c3a9cb6462c61e4761aa9e91f9b57c8d39e6375ae61ea2c8bd8ff/68747470733a2f2f696d672e736869656c64732e696f2f62616467652ff09fa7a05f53637265656e4d696e642d596f75725f41495f4d656d6f72792d3842354346363f7374796c653d666f722d7468652d6261646765266c6162656c436f6c6f723d306130653161"&gt;&lt;img src="https://camo.githubusercontent.com/853212f5e82c3a9cb6462c61e4761aa9e91f9b57c8d39e6375ae61ea2c8bd8ff/68747470733a2f2f696d672e736869656c64732e696f2f62616467652ff09fa7a05f53637265656e4d696e642d596f75725f41495f4d656d6f72792d3842354346363f7374796c653d666f722d7468652d6261646765266c6162656c436f6c6f723d306130653161" alt="ScreenMind" height="40"&gt;&lt;/a&gt;

&lt;p&gt;&lt;strong&gt;Captures your screen → Analyzes with Gemma 4 → Builds a searchable AI memory&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;100% local. 100% private. Zero cloud dependencies.&lt;/strong&gt;&lt;/p&gt;
&lt;br&gt;
&lt;p&gt;&lt;a href="https://github.com/ayushh0110/ScreenMind/actions/workflows/ci.yml" rel="noopener noreferrer"&gt;&lt;img src="https://github.com/ayushh0110/ScreenMind/actions/workflows/ci.yml/badge.svg" alt="CI"&gt;&lt;/a&gt;
&lt;a href="https://codecov.io/gh/ayushh0110/ScreenMind" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/d55b37a3661c600ec242530fc6c895a07d84e520f42ce697e12e14ac9b6c043e/68747470733a2f2f636f6465636f762e696f2f67682f617975736868303131302f53637265656e4d696e642f67726170682f62616467652e737667" alt="codecov"&gt;&lt;/a&gt;
&lt;a href="https://python.org" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/b43cbee196e104f1912e1e1f08745aac72ee904fe95aa463d7b246cc2ccfe691/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f507974686f6e2d332e31302b2d3337373641423f7374796c653d666c61742d737175617265266c6f676f3d707974686f6e266c6f676f436f6c6f723d7768697465" alt="Python 3.10+"&gt;&lt;/a&gt;
&lt;a href="https://ai.google.dev/gemma" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/844da5300027994562f93c3bb8374b9e8f4dead8dcbc4e86665b35e8ecc81aac/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f47656d6d615f342d4532425f566973696f6e2b417564696f2d3842354346363f7374796c653d666c61742d737175617265266c6f676f3d676f6f676c65266c6f676f436f6c6f723d7768697465" alt="Gemma 4 E2B"&gt;&lt;/a&gt;
&lt;a href="https://github.com/ggerganov/llama.cpp" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/93311f1629279cbfa441a20d30148fe3a19e03c32d9dafddda1d652cf3e2d189/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f6c6c616d612e6370702d4c6f63616c5f496e666572656e63652d3333333f7374796c653d666c61742d737175617265" alt="llama.cpp"&gt;&lt;/a&gt;
&lt;a href="https://github.com/ayushh0110/ScreenMind/LICENSE" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/e5a871941420f5f2b2a1031c619420263e0f19160cc870adc79a65940bd828f4/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4c6963656e73652d4d49542d3130423938313f7374796c653d666c61742d737175617265" alt="License MIT"&gt;&lt;/a&gt;
&lt;a href="https://github.com/ayushh0110/ScreenMind/MCP_SETUP.md" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/1d0888d7bd62299ed228e7e2f4813cab76614ea70bde8f0f4efbb69744bd97b0/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4d43502d436c617564655f2537435f437572736f725f2537435f5653436f64652d4635394530423f7374796c653d666c61742d737175617265" alt="MCP Ready"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;br&gt;
&lt;p&gt;&lt;a href="https://github.com/ayushh0110/ScreenMind#-features" rel="noopener noreferrer"&gt;&lt;strong&gt;Features&lt;/strong&gt;&lt;/a&gt; · &lt;a href="https://github.com/ayushh0110/ScreenMind#-how-gemma-4-is-used" rel="noopener noreferrer"&gt;&lt;strong&gt;Gemma 4 Deep Dive&lt;/strong&gt;&lt;/a&gt; · &lt;a href="https://github.com/ayushh0110/ScreenMind#-quick-start" rel="noopener noreferrer"&gt;&lt;strong&gt;Quick Start&lt;/strong&gt;&lt;/a&gt; · &lt;a href="https://github.com/ayushh0110/ScreenMind#-architecture" rel="noopener noreferrer"&gt;&lt;strong&gt;Architecture&lt;/strong&gt;&lt;/a&gt; · &lt;a href="https://github.com/ayushh0110/ScreenMind#-agent-platform" rel="noopener noreferrer"&gt;&lt;strong&gt;Agent Platform&lt;/strong&gt;&lt;/a&gt; · &lt;a href="https://github.com/ayushh0110/ScreenMind#-mcp-server-claude--cursor--vs-code" rel="noopener noreferrer"&gt;&lt;strong&gt;MCP&lt;/strong&gt;&lt;/a&gt; · &lt;a href="https://github.com/ayushh0110/ScreenMind#-api-reference" rel="noopener noreferrer"&gt;&lt;strong&gt;API&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;br&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/ayushh0110/ScreenMind/docs/screenshots/agents.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fayushh0110%2FScreenMind%2FHEAD%2Fdocs%2Fscreenshots%2Fagents.png" alt="Timeline — AI-analyzed screen activity feed"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agents&lt;/th&gt;
&lt;th&gt;Chat with your memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a rel="noopener noreferrer" href="https://github.com/ayushh0110/ScreenMind/docs/screenshots/timeline.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fayushh0110%2FScreenMind%2FHEAD%2Fdocs%2Fscreenshots%2Ftimeline.png" alt="Agents"&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a rel="noopener noreferrer" href="https://github.com/ayushh0110/ScreenMind/docs/screenshots/chat.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fayushh0110%2FScreenMind%2FHEAD%2Fdocs%2Fscreenshots%2Fchat.png" alt="Chat"&gt;&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Microsoft showed the world wants screen-aware AI with Recall.&lt;/strong&gt; But Recall stores data in plaintext, sends telemetry, and was met with massive privacy backlash. ScreenMind is the open-source, privacy-first alternative — every screenshot analyzed, every insight generated, every search result — all computed locally using Gemma 4's multimodal capabilities.&lt;/p&gt;
&lt;p&gt;It's not just a screen recorder. It's an &lt;strong&gt;AI memory&lt;/strong&gt; you can talk to, search through, and build automations on top of.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;✨ Features&lt;/h2&gt;
&lt;/div&gt;

&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;🧠 Core Intelligence&lt;/h3&gt;
&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;📸 Smart Capture&lt;/strong&gt; — Content-change detection, not a fixed timer. Captures when your screen &lt;em&gt;actually&lt;/em&gt; changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🔬 Gemma 4 Vision Analysis&lt;/strong&gt; — Every screenshot analyzed: app detection, activity categorization, mood…&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/ayushh0110/ScreenMind" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;

&lt;h2&gt;
  
  
  How I Used Gemma 4
&lt;/h2&gt;

&lt;p&gt;This is where it gets interesting. The hard part wasn't capturing screenshots — it was making continuous AI analysis sustainable on a single consumer GPU that also needs to answer questions in real-time.&lt;/p&gt;
&lt;h3&gt;
  
  
  Why Gemma 4 E2B — and why nothing else works
&lt;/h3&gt;

&lt;p&gt;I chose the Gemma 4 family — and it's not a preference, it's an architectural requirement. E2B is the default for 4GB GPUs, E4B for users with more headroom. Let me explain why no other model family works here.&lt;/p&gt;

&lt;p&gt;ScreenMind runs continuously in the background. It needs to analyze a screenshot every 30-40 seconds, transcribe voice memos on demand, power a chat interface, and run agent prompts — all on a single consumer GPU. These constraints eliminate everything else:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Constraint&lt;/th&gt;
&lt;th&gt;What it eliminates&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Must run continuously in background on 4GB VRAM&lt;/td&gt;
&lt;td&gt;Rules out 12B+ models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Must understand screenshots natively&lt;/td&gt;
&lt;td&gt;Rules out text-only models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Must transcribe audio natively&lt;/td&gt;
&lt;td&gt;Rules out models without audio encoder&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Must stay 100% local&lt;/td&gt;
&lt;td&gt;Rules out cloud APIs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Must be fast enough for 30-40s capture cycle&lt;/td&gt;
&lt;td&gt;E2B does it in 12-76s on a 4GB GTX 1650 (faster on bigger GPUs)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Gemma 4 E2B is the only model that checks all five boxes. One model in VRAM instead of two or three. Runs through llama.cpp with Q4 quantization on my GTX 1650.&lt;/p&gt;
&lt;h3&gt;
  
  
  The GPU scheduling problem
&lt;/h3&gt;

&lt;p&gt;If you send every screenshot to a vision model, your GPU is permanently busy with background work. Nothing left when you actually want to chat. That's useless.&lt;/p&gt;

&lt;p&gt;I built three systems to solve this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Perceptual hash caching (3 tiers)&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Screen change&lt;/th&gt;
&lt;th&gt;What happens&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Identical (pHash diff ≤3)&lt;/td&gt;
&lt;td&gt;Skip everything, copy from cache&lt;/td&gt;
&lt;td&gt;0ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Minor change (diff ≤9)&lt;/td&gt;
&lt;td&gt;Re-run OCR only, reuse Gemma analysis&lt;/td&gt;
&lt;td&gt;3-10s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real change (diff 10+)&lt;/td&gt;
&lt;td&gt;Full pipeline&lt;/td&gt;
&lt;td&gt;12-76s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The trick: cache staleness is per-app. Discord/Slack screenshots expire faster than VS Code. Chat moves fast, code doesn't.&lt;/p&gt;
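&lt;p&gt;The tier decision can be sketched in a few lines. The diff thresholds (3 and 9) come from the table above; the per-app expiry values, the function names, and the choice to treat a stale entry as a full re-analysis are assumptions for illustration, not the real configuration.&lt;/p&gt;

```python
# Thresholds come from the table above; the TTL values are made-up placeholders.
IDENTICAL_MAX_DIFF = 3
MINOR_MAX_DIFF = 9
CACHE_TTL_SECONDS = {"discord": 60, "slack": 60, "vscode": 600}  # hypothetical
DEFAULT_TTL_SECONDS = 300  # hypothetical


def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bits that differ between two integer perceptual hashes."""
    return bin(hash_a ^ hash_b).count("1")


def choose_tier(prev_hash: int, new_hash: int, app: str, cache_age_s: float) -> str:
    """Return 'cached', 'ocr_only', or 'full' for a new screenshot."""
    ttl = CACHE_TTL_SECONDS.get(app, DEFAULT_TTL_SECONDS)
    if cache_age_s > ttl:
        return "full"  # assumed policy: a stale entry is re-analyzed from scratch
    diff = hamming_distance(prev_hash, new_hash)
    if diff > MINOR_MAX_DIFF:
        return "full"
    if diff > IDENTICAL_MAX_DIFF:
        return "ocr_only"
    return "cached"
```

&lt;p&gt;For example, &lt;code&gt;choose_tier(0, 0b1111, "vscode", 10)&lt;/code&gt; sees a 4-bit difference on a fresh entry and picks the OCR-only path.&lt;/p&gt;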

&lt;p&gt;&lt;strong&gt;2. Chat-first GPU priority&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you ask a question, it kills whatever background analysis is running — closes the HTTP client, llama-server frees the slot in &amp;lt;1s. The cancelled analysis gets re-queued at the front, not the back. Users never wait for background work to finish.&lt;/p&gt;
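&lt;p&gt;The re-queue-at-the-front behavior is easy to model with a double-ended queue. This is a toy sketch under my own naming (&lt;code&gt;AnalysisScheduler&lt;/code&gt;, &lt;code&gt;preempt_for_chat&lt;/code&gt;); it only tracks job ids and leaves out the actual HTTP cancellation against llama-server.&lt;/p&gt;

```python
from collections import deque


class AnalysisScheduler:
    """Toy scheduler: chat preempts background analysis, which is retried first."""

    def __init__(self):
        self.pending = deque()   # screenshot ids waiting for analysis
        self.running = None      # id currently on the GPU, if any

    def submit(self, screenshot_id):
        self.pending.append(screenshot_id)

    def start_next(self):
        """Move the next pending job onto the GPU and return the running id."""
        if self.running is None and self.pending:
            self.running = self.pending.popleft()
        return self.running

    def preempt_for_chat(self):
        """Cancel the in-flight job and put it back at the FRONT of the queue."""
        cancelled = self.running
        if cancelled is not None:
            # A real implementation would also abort the in-flight request here.
            self.pending.appendleft(cancelled)
            self.running = None
        return cancelled
```

&lt;p&gt;Because the cancelled job goes back with &lt;code&gt;appendleft&lt;/code&gt;, it is the first thing picked up again once the chat request finishes, so no screenshot gets starved by repeated questions.&lt;/p&gt;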

&lt;p&gt;&lt;strong&gt;3. Three analysis modes using Gemma's thinking&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fast (~12s)&lt;/strong&gt; — pre-fill &lt;code&gt;&amp;lt;think&amp;gt;\n&amp;lt;/think&amp;gt;&lt;/code&gt; in the assistant message to skip reasoning entirely. Good enough for app detection and basic categorization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Balanced (~40s)&lt;/strong&gt; — natural thinking. Better scene descriptions and activity understanding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accurate (~76s)&lt;/strong&gt; — full thinking + spatial layout detection in one call. Used for complex screens and meetings&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Quick note on these numbers: they're all from my GTX 1650. 4GB VRAM, which is about as low as you can go for multimodal local AI. The model doesn't fully fit in 4GB so it's spilling to CPU, and that spill is where most of the time goes.&lt;/p&gt;

&lt;p&gt;On a card where E2B actually fits (8GB and up, something like a 3060) the heavier modes get roughly 4-6x faster. Accurate goes from ~76s to somewhere around 10-15s. Fast mode barely moves though. Once you strip out the thinking you're not waiting on Gemma anymore, you're waiting on OCR and embeddings, and right now those run on CPU. I could push them onto the GPU later, which would help Fast mode, but haven't yet.&lt;/p&gt;

&lt;p&gt;Figured I'd be upfront that this is the floor, not the ceiling.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  All three modalities in one product
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Vision&lt;/strong&gt; — Every screenshot gets sent to Gemma 4 with OCR context. The prompt asks for structured JSON: app name, activity category, summary, detailed context, mood, confidence, scene description, and layout regions. The structured output means search and chat get organized data, not a blob of text.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Audio&lt;/strong&gt; — Gemma 4 E2B has a native audio encoder. I use it for voice memo transcription and meeting transcription. No Whisper, no separate ASR model. One model handles everything. For meetings, audio gets chunked into 15-second segments, each transcribed by Gemma, then a final Gemma call does map-reduce summarization.&lt;/p&gt;
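&lt;p&gt;The chunk-then-summarize flow looks roughly like this. It is a sketch under assumptions: &lt;code&gt;transcribe&lt;/code&gt; and &lt;code&gt;summarize&lt;/code&gt; are caller-supplied callables standing in for the Gemma calls, and audio is modeled as a flat sequence of samples.&lt;/p&gt;

```python
def chunk_audio(samples, sample_rate, chunk_seconds=15):
    """Split a sequence of samples into consecutive chunk_seconds-long pieces."""
    step = sample_rate * chunk_seconds
    return [samples[i:i + step] for i in range(0, len(samples), step)]


def summarize_meeting(samples, sample_rate, transcribe, summarize):
    """Map: transcribe each chunk. Reduce: summarize the joined transcript.

    transcribe and summarize stand in for model calls; their signatures
    are assumptions of this sketch, not the project's API.
    """
    transcripts = [transcribe(chunk) for chunk in chunk_audio(samples, sample_rate)]
    return summarize(" ".join(transcripts))
```

&lt;p&gt;The last chunk is simply shorter than 15 seconds when the recording length is not an exact multiple.&lt;/p&gt;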

&lt;p&gt;&lt;strong&gt;Reasoning&lt;/strong&gt; — Daily summaries use &lt;code&gt;think=True&lt;/code&gt; for deep reasoning over a full day's activities. Chat uses Gemma to answer questions grounded in screen context with retrieved screenshots. Agents feed screen data into Gemma prompts for custom analysis.&lt;/p&gt;

&lt;h3&gt;
  
  
  The full pipeline
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Screenshot → EasyOCR (text extraction) → Gemma 4 E2B (understanding) → MiniLM (embeddings) → SQLite + FTS5
                ↑ OCR text fed as context to Gemma
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three models and a full-text index working together, with Gemma 4 as the brain. OCR extracts what's written. Gemma understands what you're doing. MiniLM enables semantic search. FTS5 handles instant keyword lookup. Each component does what it's best at.&lt;/p&gt;

&lt;p&gt;The perceptual-hash cache significantly cuts Gemma inference calls during typical usage. Combined with the three analysis modes, ScreenMind runs comfortably on my GTX 1650 with 4GB VRAM as a daily driver.&lt;/p&gt;

&lt;p&gt;I've been using this daily for two weeks. The chat feature is genuinely addictive — being able to ask "what was I working on before lunch?" or "what did that email say?" and getting an actual answer from your own screen history changes how you think about your computer. It makes me wonder — does personal AI get fundamentally more useful once it has persistent context about what you're actually doing? Would love to hear what you think.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
      <category>ai</category>
    </item>
    <item>
      <title>From Heuristics to Fine-Tuning: Teaching a Model to Use Tools</title>
      <dc:creator>Ayush Shekhar</dc:creator>
      <pubDate>Sun, 26 Apr 2026 13:28:09 +0000</pubDate>
      <link>https://dev.to/ayushh0110/from-heuristics-to-fine-tuning-teaching-a-model-to-use-tools-1c9g</link>
      <guid>https://dev.to/ayushh0110/from-heuristics-to-fine-tuning-teaching-a-model-to-use-tools-1c9g</guid>
      <description>&lt;p&gt;&lt;em&gt;How I replaced 200 lines of regex with a fine-tuned 7B model — and why it was worth it.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;I built an &lt;a href="https://github.com/ayushh0110/autonomous-agent" rel="noopener noreferrer"&gt;autonomous AI agent&lt;/a&gt; with 9 tools: web search, calculator, weather, Wikipedia, translation, and more. The first question every request must answer is deceptively simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Which tool should I use?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;My first solution was a heuristic classifier — a function called &lt;code&gt;classify_query()&lt;/code&gt; that uses regex patterns to detect intent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# 200+ lines of patterns like this:
&lt;/span&gt;&lt;span class="n"&gt;_SEARCH_INDICATORS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\b(latest|current|news|today|recent|who won|score|price|&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stock|update|happening|trending|release|launched)\b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IGNORECASE&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;_KNOWLEDGE_INDICATORS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\b(explain|what is|how does|define|difference between|&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;why do|concept of|overview|meaning of|works)\b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IGNORECASE&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It worked. About 75% of the time.&lt;/p&gt;

&lt;p&gt;The remaining 25% was a graveyard of edge cases: "say hello in Japanese" (needs &lt;code&gt;translate&lt;/code&gt;, matched nothing), "what's 15% of 2850" (needs &lt;code&gt;calculator&lt;/code&gt;, matched &lt;code&gt;what's&lt;/code&gt; → routed to search), "compare React vs Vue" (needs autonomous executor, matched &lt;code&gt;compare&lt;/code&gt; → routed to direct answer).&lt;/p&gt;

&lt;p&gt;Every fix introduced new regressions. &lt;strong&gt;Regex-based routing doesn't scale.&lt;/strong&gt;&lt;/p&gt;
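
&lt;p&gt;To make the failure mode concrete, here is a minimal sketch that reuses the two patterns shown above. The &lt;code&gt;route&lt;/code&gt; function and its "search first, then knowledge" precedence are assumptions for illustration, not the agent's actual router, and the second query is reworded to "what is" so that it hits the excerpted pattern.&lt;/p&gt;

```python
import re

# The two patterns from the excerpt above.
SEARCH = re.compile(
    r"\b(latest|current|news|today|recent|who won|score|price|"
    r"stock|update|happening|trending|release|launched)\b", re.IGNORECASE
)
KNOWLEDGE = re.compile(
    r"\b(explain|what is|how does|define|difference between|"
    r"why do|concept of|overview|meaning of|works)\b", re.IGNORECASE
)


def route(query):
    """Hypothetical two-way router: search first, then knowledge, else no match."""
    if SEARCH.search(query):
        return "search"
    if KNOWLEDGE.search(query):
        return "direct_answer"
    return None


print(route("say hello in Japanese"))      # no keyword matches at all
print(route("what is 15% of 2850"))        # "what is" matches; a calculator is never considered
print(route("latest price of the stock"))  # a search keyword matches
```

&lt;p&gt;Neither pattern knows that a calculator or a translator exists, so every new keyword added for one tool risks stealing matches from another.&lt;/p&gt;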




&lt;h2&gt;
  
  
  The Idea
&lt;/h2&gt;

&lt;p&gt;What if the model itself could learn the routing? Not a giant foundation model — a small, fast 7B model fine-tuned specifically for this task. The hypothesis:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A QLoRA-adapted 7B model trained on 1K high-quality tool-call traces should outperform hand-crafted regex, with comparable latency.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This became &lt;a href="https://github.com/ayushh0110/toolforge" rel="noopener noreferrer"&gt;&lt;strong&gt;ToolForge&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1: Generating Training Data (The Hard Part)
&lt;/h2&gt;

&lt;p&gt;I had 9 tools but no labeled dataset. Creating one manually would take weeks. Instead, I used &lt;strong&gt;teacher distillation&lt;/strong&gt; — using a stronger model (Gemini 2.5 Flash) to generate high-quality training examples.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Distillation Pipeline
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User queries (generated) → Gemini 2.5 Flash → Structured tool-call traces → Filtered dataset
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The trick was &lt;strong&gt;diversity&lt;/strong&gt;. I needed queries covering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single-tool requests ("What's the weather in Tokyo?")&lt;/li&gt;
&lt;li&gt;Multi-tool chains ("What's the weather in Tokyo and convert 25°C to Fahrenheit?")&lt;/li&gt;
&lt;li&gt;No-tool queries ("Explain recursion")&lt;/li&gt;
&lt;li&gt;Ambiguous queries ("Tell me about Python" — search or direct answer?)&lt;/li&gt;
&lt;li&gt;Edge cases ("sqrt of 44567" — calculator, not search)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I built a &lt;code&gt;ClientPool&lt;/code&gt; that rotates across 6 free-tier Gemini API keys to avoid rate limits:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ClientPool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Round-robin pool of (key, model) slots for maximum throughput.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;next_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Pick the slot that has rested the longest
&lt;/span&gt;        &lt;span class="n"&gt;best&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_slots&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_used&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;best&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_used&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_min_gap&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_min_gap&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;elapsed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
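        &lt;span class="n"&gt;best&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_used&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Mark the slot as used so the rotation advances
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;best&lt;/span&gt;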
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After filtering for quality (valid JSON, correct schema, no hallucinated tools), I had &lt;strong&gt;1,173 clean examples&lt;/strong&gt; — enough for fine-tuning.&lt;/p&gt;
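
&lt;p&gt;The three filter checks (valid JSON, correct schema, no hallucinated tools) can be sketched as follows. The trace shape, a JSON object with a &lt;code&gt;tool&lt;/code&gt; name and an &lt;code&gt;arguments&lt;/code&gt; object, is an assumption for illustration; the tool names are the ones listed in this post's dataset table.&lt;/p&gt;

```python
import json

# Tool labels from the dataset table in this post.
KNOWN_TOOLS = {
    "web_search", "calculator", "weather", "translate", "wikipedia",
    "no_tool", "dictionary", "datetime", "unit_converter",
}


def is_clean(raw):
    """Keep a teacher trace only if it parses, fits the assumed schema, and names a real tool."""
    try:
        trace = json.loads(raw)
    except json.JSONDecodeError:
        return False                      # not valid JSON
    if not isinstance(trace, dict):
        return False                      # wrong top-level shape
    tool = trace.get("tool")
    if not isinstance(tool, str) or tool not in KNOWN_TOOLS:
        return False                      # missing or hallucinated tool
    return isinstance(trace.get("arguments"), dict)


raw_traces = [
    '{"tool": "weather", "arguments": {"city": "Tokyo"}}',
    '{"tool": "stock_ticker", "arguments": {}}',   # hallucinated tool
    '{"tool": "weather", "arguments": ',           # truncated JSON
]
kept = [t for t in raw_traces if is_clean(t)]
print(len(kept), "of", len(raw_traces), "traces kept")
```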

&lt;h3&gt;
  
  
  Dataset Distribution
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;web_search&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;287&lt;/td&gt;
&lt;td&gt;24%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;calculator&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;156&lt;/td&gt;
&lt;td&gt;13%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;weather&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;143&lt;/td&gt;
&lt;td&gt;12%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;translate&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;132&lt;/td&gt;
&lt;td&gt;11%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;wikipedia&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;128&lt;/td&gt;
&lt;td&gt;11%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;no_tool&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;119&lt;/td&gt;
&lt;td&gt;10%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dictionary&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;78&lt;/td&gt;
&lt;td&gt;7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;datetime&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;68&lt;/td&gt;
&lt;td&gt;6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;unit_converter&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;62&lt;/td&gt;
&lt;td&gt;5%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The distribution is intentionally skewed toward &lt;code&gt;web_search&lt;/code&gt; — mirroring real-world query patterns.&lt;/p&gt;
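
&lt;p&gt;As a quick sanity check, the counts in the table can be summed and converted back to percentages. This is plain arithmetic on the numbers above, nothing from the actual pipeline.&lt;/p&gt;

```python
# Per-tool counts copied from the table above.
counts = {
    "web_search": 287, "calculator": 156, "weather": 143,
    "translate": 132, "wikipedia": 128, "no_tool": 119,
    "dictionary": 78, "datetime": 68, "unit_converter": 62,
}

total = sum(counts.values())
shares = {tool: round(100 * n / total) for tool, n in counts.items()}

print(total)                 # 1173
print(shares["web_search"])  # 24
```

&lt;p&gt;The largest class is a little over four times the size of the smallest, which is worth keeping in mind when reading per-tool accuracy.&lt;/p&gt;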




&lt;h2&gt;
  
  
  Step 2: Training with QLoRA
&lt;/h2&gt;

&lt;p&gt;I trained on a Kaggle T4 GPU (free tier). The key insight: &lt;strong&gt;you don't need an A100 for fine-tuning.&lt;/strong&gt; QLoRA with 4-bit NF4 quantization fits a 7B model in ~6GB VRAM.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;bnb_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BitsAndBytesConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;load_in_4bit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;bnb_4bit_quant_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nf4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;bnb_4bit_compute_dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bfloat16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;bnb_4bit_use_double_quant&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Double quantization saves ~0.4GB
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;lora_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LoraConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                    &lt;span class="c1"&gt;# LoRA rank
&lt;/span&gt;    &lt;span class="n"&gt;lora_alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;          &lt;span class="c1"&gt;# Scaling factor (alpha/r = 2)
&lt;/span&gt;    &lt;span class="n"&gt;target_modules&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;q_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;k_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;o_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                     &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gate_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;up_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;down_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;lora_dropout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why these choices?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;r=64&lt;/strong&gt;: Higher rank = more trainable parameters = more capacity to learn tool routing patterns. I tested r=16 (underfits) and r=64 (about 3 points better in the ablation below).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;All attention + MLP layers&lt;/strong&gt;: Tool routing requires understanding query intent (attention) AND mapping it to structured output (MLP). Targeting only attention heads wasn't enough.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;alpha=128 (2×r)&lt;/strong&gt;: A common heuristic. LoRA scales the adapter update by alpha/r, so setting alpha to 2×r keeps that scale at 2 whichever rank is used.&lt;/li&gt;
&lt;/ul&gt;
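
&lt;p&gt;To put "higher rank = more parameters" into numbers: LoRA adds two small matrices per adapted weight, for a total of &lt;code&gt;r * (d_in + d_out)&lt;/code&gt; extra parameters each. The sketch below counts them for the seven target modules. The layer shapes are my assumption of the Qwen2.5-7B configuration and should be checked against the model's &lt;code&gt;config.json&lt;/code&gt;.&lt;/p&gt;

```python
# Assumed Qwen2.5-7B shapes (verify against the model's config.json).
HIDDEN, KV_DIM, MLP_DIM, NUM_LAYERS = 3584, 512, 18944, 28

# (input width, output width) of each targeted projection in one decoder layer.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}


def lora_param_count(rank):
    """LoRA adds A (rank x d_in) and B (d_out x rank) per adapted matrix."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in TARGETS.values())
    return per_layer * NUM_LAYERS


for rank in (16, 64):
    print(rank, lora_param_count(rank))
```

&lt;p&gt;Because the count is linear in rank, r=64 trains exactly four times as many adapter parameters as r=16 under these assumed shapes.&lt;/p&gt;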




&lt;h2&gt;
  
  
  Step 3: The Ablation Study
&lt;/h2&gt;

&lt;p&gt;This is where the project goes from "I fine-tuned a model" to "I systematically evaluated design choices." I ran 4 experiments:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Run&lt;/th&gt;
&lt;th&gt;Base Model&lt;/th&gt;
&lt;th&gt;LoRA Rank&lt;/th&gt;
&lt;th&gt;LR&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Mistral-7B-Instruct-v0.3&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;2e-4&lt;/td&gt;
&lt;td&gt;78.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Mistral-7B-Instruct-v0.3&lt;/td&gt;
&lt;td&gt;64&lt;/td&gt;
&lt;td&gt;2e-4&lt;/td&gt;
&lt;td&gt;81.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Qwen2.5-7B-Instruct&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;2e-4&lt;/td&gt;
&lt;td&gt;83.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Qwen2.5-7B-Instruct&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;64&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2e-4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;86.2%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All tracked on &lt;a href="https://wandb.ai/shekharayush56-cognizant/toolforge" rel="noopener noreferrer"&gt;Weights &amp;amp; Biases&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Findings
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Qwen &amp;gt; Mistral for tool routing (+4.5 to +4.7 pp)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Qwen2.5-7B-Instruct has stronger structured output capabilities out of the box. Its chat template naturally handles tool-call JSON, while Mistral required more prompt engineering to produce valid output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. r=64 &amp;gt; r=16 for both models (+3.1 to +3.3 pp)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The routing task isn't trivial — the model needs to learn mappings between natural language patterns and 9 discrete tool categories plus argument extraction. r=16 underfits.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Eval loss converges by epoch 2&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;All runs showed minimal improvement after epoch 2, with some showing slight overfitting in epoch 3. &lt;code&gt;load_best_model_at_end=True&lt;/code&gt; was essential.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 4: Integration
&lt;/h2&gt;

&lt;p&gt;The integration into the autonomous agent was designed as a &lt;strong&gt;feature flag&lt;/strong&gt; — zero behavior change in production unless explicitly enabled:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# In executor.py
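&lt;/span&gt;&lt;span class="n"&gt;decision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;  &lt;span class="c1"&gt;# Stays None when ToolForge is unavailable or fails
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;is_toolforge_available&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;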
    &lt;span class="n"&gt;decision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;toolforge_classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory_hits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;has_memory&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;router_source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;toolforge&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;decision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;classify_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory_hits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;has_memory&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# heuristic fallback
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;toolforge_classify()&lt;/code&gt; function:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Loads the LoRA adapter lazily on first query&lt;/li&gt;
&lt;li&gt;Runs inference with greedy decoding (deterministic routing)&lt;/li&gt;
&lt;li&gt;Parses the model's tool-call output&lt;/li&gt;
&lt;li&gt;Maps specific tools to the agent's decision types (&lt;code&gt;web_search&lt;/code&gt; → &lt;code&gt;needs_search&lt;/code&gt;, no tool → &lt;code&gt;direct_answer&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Returns &lt;code&gt;None&lt;/code&gt; on any failure → heuristic takes over&lt;/li&gt;
&lt;/ol&gt;
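
&lt;p&gt;Steps 3 to 5 can be sketched roughly like this. The JSON output shape and the &lt;code&gt;parse_decision&lt;/code&gt; name are hypothetical; only the two mappings named above come from this post, so every other tool falls through to &lt;code&gt;None&lt;/code&gt; here.&lt;/p&gt;

```python
import json

# Only these two mappings are stated in the post; the rest are left unmapped in this sketch.
DECISION_MAP = {"web_search": "needs_search", "no_tool": "direct_answer"}


def parse_decision(model_output):
    """Return a decision string, or None so the heuristic fallback takes over."""
    try:
        call = json.loads(model_output)
        return DECISION_MAP.get(call["tool"])
    except (json.JSONDecodeError, KeyError, TypeError):
        # Malformed JSON, a missing "tool" key, or an unexpected shape.
        return None


print(parse_decision('{"tool": "web_search", "arguments": {"query": "gemma 4"}}'))
print(parse_decision('{"tool": "no_tool"}'))
print(parse_decision("I think you should search the web"))
```

&lt;p&gt;Returning &lt;code&gt;None&lt;/code&gt; instead of raising keeps the caller simple: one &lt;code&gt;is None&lt;/code&gt; check covers every failure path.&lt;/p&gt;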

&lt;p&gt;This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Production (HF Spaces, CPU)&lt;/strong&gt;: heuristic runs as before&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPU-enabled environments&lt;/strong&gt;: ToolForge model handles routing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The code is always visible&lt;/strong&gt;: interviewers can see the integration pattern&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Heuristic (Regex)&lt;/th&gt;
&lt;th&gt;ToolForge (QLoRA)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Overall Accuracy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~75%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;86.2%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Approach&lt;/td&gt;
&lt;td&gt;200 lines of regex&lt;/td&gt;
&lt;td&gt;Fine-tuned Qwen2.5-7B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latency&lt;/td&gt;
&lt;td&gt;0ms (regex)&lt;/td&gt;
&lt;td&gt;~200ms (GPU)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Handles edge cases&lt;/td&gt;
&lt;td&gt;❌ Constant regressions&lt;/td&gt;
&lt;td&gt;✅ Learned from data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maintenance cost&lt;/td&gt;
&lt;td&gt;High (new regex per bug)&lt;/td&gt;
&lt;td&gt;Low (retrain on new data)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The roughly 11-point accuracy improvement (86.2% vs ~75%, about 15% in relative terms) isn't just a number. It means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Say hello in Japanese" → correctly routes to &lt;code&gt;translate&lt;/code&gt; (was: missed entirely)&lt;/li&gt;
&lt;li&gt;"sqrt(44567)" → correctly routes to &lt;code&gt;calculator&lt;/code&gt; (was: routed to search)&lt;/li&gt;
&lt;li&gt;"Compare React vs Vue for 2026" → correctly routes to &lt;code&gt;autonomous_task&lt;/code&gt; (was: partial match → direct answer)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Update: Honest Base-vs-Fine-Tuned Evaluation
&lt;/h2&gt;

&lt;p&gt;The 86.2% above is measured on a held-out split of the &lt;em&gt;same&lt;/em&gt; synthetic data the Gemini teacher labeled, so it's partly &lt;strong&gt;circular&lt;/strong&gt; (the teacher grades the test set it helped create). It's a fair &lt;em&gt;internal&lt;/em&gt; comparison of hyperparameters, but not an unbiased estimate of routing quality.&lt;/p&gt;

&lt;p&gt;To check what fine-tuning &lt;em&gt;actually&lt;/em&gt; added, I built a separate &lt;strong&gt;hand-written, non-circular test set&lt;/strong&gt; (36 realistic, indirectly phrased queries, hand-labeled, no teacher) and ran &lt;strong&gt;base Qwen2.5-7B vs the fine-tuned adapter&lt;/strong&gt; on identical inputs, with &lt;strong&gt;format-agnostic grading&lt;/strong&gt; so the base model isn't penalized for not using my trained output format.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Routing accuracy&lt;/th&gt;
&lt;th&gt;Strict-format&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Base Qwen2.5-7B&lt;/td&gt;
&lt;td&gt;75.0%&lt;/td&gt;
&lt;td&gt;75.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fine-tuned (QLoRA r=64)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;83.3%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;83.3%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gain&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+8.3 pp&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+8.3 pp&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two things stand out:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The gain is routing, not formatting.&lt;/strong&gt; Strict and lenient scores are identical for both models: base Qwen already emits clean tool-call formats. Fine-tuning improved &lt;em&gt;which tool gets picked&lt;/em&gt;, not how the call is written.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;An honest tradeoff:&lt;/strong&gt; the fine-tune slightly over-triggers tools on no-tool chit-chat (e.g. "what is 2 plus 2"). Teaching a model to reach for tools costs some restraint, a precision/recall tradeoff worth naming rather than hiding.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
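
&lt;p&gt;With only 36 queries, 75.0% and 83.3% correspond to 27 and 30 correct answers, so the gap is three queries. A rough way to see how much uncertainty that leaves is a 95% Wilson score interval per model. This is a sketch I am adding for context, not part of the original evaluation, and it treats each score as an independent proportion, which is a simplification because both models ran on the same queries.&lt;/p&gt;

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


base = wilson_interval(27, 36)        # base Qwen2.5-7B: 27 of 36 correct
fine_tuned = wilson_interval(30, 36)  # fine-tuned adapter: 30 of 36 correct

print("base:      ", [round(x, 3) for x in base])
print("fine-tuned:", [round(x, 3) for x in fine_tuned])
```

&lt;p&gt;The two intervals overlap substantially, so the +8.3 pp gain is better read as encouraging than as conclusive until the hand-labeled set grows.&lt;/p&gt;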

&lt;h2&gt;
  
  
  What I'd Do Differently
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;More data&lt;/strong&gt;: 1.1K examples is enough for proof-of-concept, but 5K+ would likely push accuracy above 90%. The distillation pipeline can scale — I just ran out of free API quota.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Argument extraction evaluation&lt;/strong&gt;: I evaluated tool &lt;em&gt;selection&lt;/em&gt; accuracy but didn't formally measure argument &lt;em&gt;extraction&lt;/em&gt; quality (e.g., did the model extract "Tokyo" from "weather in Tokyo?"). The traces show it works, but a proper F1 metric would be stronger.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;GGUF quantization for CPU inference&lt;/strong&gt;: The current serving path requires GPU. Converting to GGUF and using llama.cpp would enable CPU inference at ~1-2s latency — viable for production on free-tier hosting.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
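
&lt;p&gt;For the second point, one possible shape for an argument-extraction metric is an F1 score over exact-match (name, value) pairs. The function below is a hypothetical sketch, not something in the ToolForge repo, and it assumes flat arguments with string values.&lt;/p&gt;

```python
def argument_f1(predicted, expected):
    """F1 over exact-match (name, value) argument pairs for a single tool call."""
    pred_pairs = set(predicted.items())
    gold_pairs = set(expected.items())
    if not pred_pairs and not gold_pairs:
        return 1.0  # nothing to extract and nothing extracted
    true_pos = len(pred_pairs.intersection(gold_pairs))
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_pairs)
    recall = true_pos / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)


# "weather in Tokyo?" with the city extracted correctly
print(argument_f1({"city": "Tokyo"}, {"city": "Tokyo"}))
# same query, but the model also invented an extra argument
print(argument_f1({"city": "Tokyo", "units": "metric"}, {"city": "Tokyo"}))
```

&lt;p&gt;Exact match is strict: "Tokyo" and "tokyo" would count as a miss, so some normalization before comparison would probably be needed in practice.&lt;/p&gt;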




&lt;h2&gt;
  
  
  The Story
&lt;/h2&gt;

&lt;p&gt;This project isn't about fine-tuning. Fine-tuning is a technique — anyone can run &lt;code&gt;SFTTrainer&lt;/code&gt;. The story is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;I built an agent&lt;/strong&gt; with hand-crafted routing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I measured where it failed&lt;/strong&gt; (75% accuracy, constant regex regressions)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I generated training data&lt;/strong&gt; using teacher distillation from my own pipeline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I trained and compared models&lt;/strong&gt; with systematic ablation studies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I proved it works&lt;/strong&gt; with quantitative evaluation (86.2% on the held-out synthetic split; 83.3% vs 75.0% for the base model on a hand-labeled set)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I integrated it&lt;/strong&gt; as a production-ready feature flag&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's not a tutorial project. That's the ML engineering loop — identify problem → collect data → train → evaluate → deploy.&lt;/p&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ToolForge repo&lt;/strong&gt;: &lt;a href="https://github.com/ayushh0110/toolforge" rel="noopener noreferrer"&gt;github.com/ayushh0110/toolforge&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Autonomous Agent&lt;/strong&gt;: &lt;a href="https://github.com/ayushh0110/autonomous-agent" rel="noopener noreferrer"&gt;github.com/ayushh0110/autonomous-agent&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;W&amp;amp;B Dashboard&lt;/strong&gt;: &lt;a href="https://wandb.ai/shekharayush56-cognizant/toolforge" rel="noopener noreferrer"&gt;wandb.ai/shekharayush56-cognizant/toolforge&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live Agent Demo&lt;/strong&gt;: &lt;a href="https://autonomous-agent-one.vercel.app" rel="noopener noreferrer"&gt;autonomous-agent-one.vercel.app&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Built by &lt;a href="https://github.com/ayushh0110" rel="noopener noreferrer"&gt;Ayush Shekhar&lt;/a&gt;. If you're working on tool-use fine-tuning, I'd love to hear what approach you're taking — reach out on &lt;a href="https://linkedin.com/in/ayush-shekhar" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>opensource</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
