<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sreejit Pradhan</title>
    <description>The latest articles on DEV Community by Sreejit Pradhan (@sreejit_).</description>
    <link>https://dev.to/sreejit_</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3904430%2Fbd74576f-cef8-4620-a63e-8a001f1e9d6c.png</url>
      <title>DEV Community: Sreejit Pradhan</title>
      <link>https://dev.to/sreejit_</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sreejit_"/>
    <language>en</language>
    <item>
      <title>My AI Kept Hallucinating Career Paths. I Abandoned the Project. GitHub Copilot Helped Me Fix What Was Actually Broken.</title>
      <dc:creator>Sreejit Pradhan</dc:creator>
      <pubDate>Thu, 28 May 2026 14:34:18 +0000</pubDate>
      <link>https://dev.to/sreejit_/my-ai-kept-hallucinating-career-paths-i-abandoned-the-project-github-copilot-helped-me-fix-what-7c5</link>
      <guid>https://dev.to/sreejit_/my-ai-kept-hallucinating-career-paths-i-abandoned-the-project-github-copilot-helped-me-fix-what-7c5</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/github-2026-05-21"&gt;GitHub Finish-Up-A-Thon Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Every developer has a graveyard repo. You know the one. It lives in a pinned tab you stopped opening. The commit history stops mid-sentence. The README has a section called "Roadmap" that you wrote with too much ambition and too little sleep.&lt;/p&gt;

&lt;p&gt;Mine was PathForge AI.&lt;/p&gt;

&lt;p&gt;The idea was real: a career intelligence engine for students in India and Southeast Asia who don't have a guidance counselor, can't afford consultants, and are one bad decision away from a degree that doesn't match their goals, grades, or budget. You enter your marks, your dream career, your financial reality — PathForge gives you three ranked career paths, real institutions, scholarship intelligence, and what it called a "brutal honesty" score.&lt;/p&gt;

&lt;p&gt;Good idea. And then the engine started hallucinating.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem Nobody Likes Talking About with AI Career Tools
&lt;/h2&gt;

&lt;p&gt;The institution matching engine was confidently wrong. Not occasionally wrong, but &lt;em&gt;structurally&lt;/em&gt; wrong. It would return a medical college in a fee tier the student couldn't afford, recommend a stream whose subjects overlapped only 12% with the student's actual grades, and suggest scholarships that had been discontinued. The probability scores looked precise ("78.4% fit"), but the math underneath was guessing.&lt;/p&gt;

&lt;p&gt;This is the specific kind of brokenness that makes you close the laptop.&lt;/p&gt;

&lt;p&gt;It wasn't a bug I could debug with a stack trace. It was an architecture problem. The AI reasoning layer had no anchoring system — no structured parameters to constrain what "good match" actually meant. The model was doing freeform pattern matching on career data and calling it intelligence. It wasn't. It was vibes with decimal points.&lt;/p&gt;

&lt;p&gt;I had three commits in four months. The last one was: &lt;em&gt;"fix: remove hallucinated university from results (again)".&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That's when I stopped.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Was Actually in the Graveyard
&lt;/h2&gt;

&lt;p&gt;Here's what PathForge looked like before I came back to it:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stack:&lt;/strong&gt; Next.js 16, TypeScript, NVIDIA NIM (Llama-3.1-70b-Instruct), Prisma + Supabase, Clerk auth, Zustand for state.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What worked:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The 6-step onboarding wizard&lt;/li&gt;
&lt;li&gt;Basic auth flow via Clerk&lt;/li&gt;
&lt;li&gt;UI and design system (ember/dark forge aesthetic — still proud of this)&lt;/li&gt;
&lt;li&gt;NVIDIA NIM integration was live&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What was broken:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The institution matching engine — hallucinating ~40% of recommendations&lt;/li&gt;
&lt;li&gt;No real scoring logic, just prompts asking the model to "rank these paths"&lt;/li&gt;
&lt;li&gt;No parameter constraints on what constituted a valid match&lt;/li&gt;
&lt;li&gt;No penalty system for budget mismatches or stream misalignment&lt;/li&gt;
&lt;li&gt;The "Reality Check Engine" was aspirational text in a README, not code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The gap between what the README promised and what the code delivered was significant. I knew it. That's partly why I stopped — finishing it felt dishonest without fixing the core thing first.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Comeback: What I Actually Fixed
&lt;/h2&gt;

&lt;p&gt;The Finish-Up-A-Thon was the forcing function I needed. Here's what changed:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The Multi-Factor Probability Calculator — Now With Actual Math
&lt;/h3&gt;

&lt;p&gt;The old version asked the LLM to produce a probability score. That's the problem. LLMs don't do probability. They do &lt;em&gt;plausible-sounding&lt;/em&gt; probability.&lt;/p&gt;

&lt;p&gt;The new scoring engine is deterministic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Multi-Factor Probability Calculator&lt;/span&gt;
&lt;span class="c1"&gt;// marks fit (40%) + stream fit (30%) + budget fit (20%) + base score (10%)&lt;/span&gt;
&lt;span class="c1"&gt;// + trend bonus/penalty adjustments&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;calculateCareerScore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;studentProfile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;StudentProfile&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;careerPath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;CareerPath&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;marksFit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;calculateMarksFit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;studentProfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;grades&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;careerPath&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;requiredMarks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.40&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;streamFit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;calculateStreamFit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;studentProfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;careerPath&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;preferredStreams&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.30&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;budgetFit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;calculateBudgetFit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;studentProfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;budget&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;careerPath&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;estimatedCost&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.20&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;baseScore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;trendBonus&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getTrendBonus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;careerPath&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;marketDemand&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;penalties&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;applyPenalties&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;studentProfile&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;careerPath&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;marksFit&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;streamFit&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;budgetFit&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;baseScore&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;trendBonus&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;penalties&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
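&lt;p&gt;To make the weighting concrete, here is a small self-contained sketch that plugs example sub-scores into the same 40/30/20/10 split. It assumes each fit helper already returns a value from 0 to 1 and that the trend bonus and penalties use the same 0-to-1 scale; the &lt;code&gt;FitParts&lt;/code&gt; type, the &lt;code&gt;weightedScore&lt;/code&gt; name, and the one-decimal rounding are hypothetical, not taken from the PathForge repo. It also clamps at 0, since penalties larger than the positive terms would otherwise yield a negative score.&lt;/p&gt;

```typescript
// Hypothetical sub-scores; each fit value is assumed to be normalized to 0..1.
interface FitParts {
  marksFit: number;   // grades vs. the path's required marks
  streamFit: number;  // stream compatibility (0 = no match, 1 = ideal)
  budgetFit: number;  // affordability of the path
  trendBonus: number; // market-demand adjustment on the same scale, e.g. 0.05
  penalties: number;  // summed penalties on the same scale, e.g. 0.1
}

// Same split as the calculator above: 40% marks, 30% stream, 20% budget, 10% base.
const WEIGHTS = { marks: 0.4, stream: 0.3, budget: 0.2, base: 0.1 };

function weightedScore(p: FitParts): number {
  const raw =
    p.marksFit * WEIGHTS.marks +
    p.streamFit * WEIGHTS.stream +
    p.budgetFit * WEIGHTS.budget +
    WEIGHTS.base +
    p.trendBonus -
    p.penalties;
  // Clamp to the 0..100 range so large penalties or bonuses stay on the scale.
  const clamped = Math.max(0, Math.min(100, raw * 100));
  // Round to one decimal place for display, e.g. "78.4% fit".
  return Math.round(clamped * 10) / 10;
}

// 0.8*0.4 + 1.0*0.3 + 0.5*0.2 + 0.1 + 0.05 - 0 = 0.87
console.log(weightedScore({ marksFit: 0.8, streamFit: 1.0, budgetFit: 0.5, trendBonus: 0.05, penalties: 0 })); // 87
```

&lt;p&gt;With strong marks, an ideal stream, half budget fit, and a small trend bonus, the weighted sum is 0.32 + 0.30 + 0.10 + 0.10 + 0.05 = 0.87, so the function returns 87.&lt;/p&gt;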



&lt;p&gt;The LLM now receives &lt;em&gt;this score&lt;/em&gt; and uses it as a hard anchor. It can't recommend a path with a 34% fit as a primary option. The math comes first. The language comes second.&lt;/p&gt;
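&lt;p&gt;One way to make the score a hard anchor rather than a suggestion is to check the model's answer after it comes back. The sketch below assumes the model is asked to return structured JSON with a &lt;code&gt;primaryPathId&lt;/code&gt; field; that field name, the &lt;code&gt;resolvePrimary&lt;/code&gt; helper, the &lt;code&gt;PRIMARY_MIN_SCORE&lt;/code&gt; value of 60, and the fallback rule are all hypothetical illustrations, not PathForge's actual code.&lt;/p&gt;

```typescript
// Hypothetical shape of the model's structured reply.
interface ModelReply {
  primaryPathId: string;
  narrative: string;
}

// Hypothetical floor for anything presented as the primary recommendation.
const PRIMARY_MIN_SCORE = 60;

// Deterministic scores computed before the prompt was built, keyed by path id.
type ScoreTable = { [pathId: string]: number };

// Keep the model's pick only if it is a known candidate above the floor;
// otherwise fall back to the highest-scoring candidate.
function resolvePrimary(reply: ModelReply, scores: ScoreTable): string {
  const modelScore = scores[reply.primaryPathId];
  if (modelScore !== undefined) {
    if (modelScore >= PRIMARY_MIN_SCORE) return reply.primaryPathId;
  }
  const ranked = Object.entries(scores).sort((a, b) => b[1] - a[1]);
  if (ranked.length === 0) throw new Error("no candidates to choose from");
  return ranked[0][0];
}
```

&lt;p&gt;An unknown id (for example, an invented university) or a low-scoring pick is silently replaced by the top deterministic candidate, so the model controls the wording but not the ranking.&lt;/p&gt;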

&lt;h3&gt;
  
  
  2. The Matchmaking System — Parameter Constraints That Actually Constrain
&lt;/h3&gt;

&lt;p&gt;The hallucination problem wasn't the model. The model was doing what models do: generating plausible text. The problem was that I was asking it to do constraint satisfaction without giving it constraints.&lt;/p&gt;

&lt;p&gt;The new matchmaking system defines explicit parameters before the AI ever sees a student's profile:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Budget ceiling enforcement&lt;/strong&gt;: if an institution's fee exceeds the student's stated budget by more than 15%, it gets filtered before the prompt is built&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stream compatibility matrix&lt;/strong&gt;: a lookup table mapping board streams to career family compatibility scores — not inferred, hardcoded&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scholarship pre-filtering&lt;/strong&gt;: institutions are matched against the live scholarship database &lt;em&gt;before&lt;/em&gt; being passed to the model, not after&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimum threshold gates&lt;/strong&gt;: a career path below 45% combined fit score never reaches the output, regardless of what the model wants to surface
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Parameter constraint layer — runs BEFORE the AI prompt&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;buildConstrainedCandidateSet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;StudentProfile&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;allPaths&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;CareerPath&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;CareerPath&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;allPaths&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;estimatedCost&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="nx"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;budget&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;1.15&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;streamCompatibilityMatrix&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;family&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;calculateCareerScore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;45&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;calculateCareerScore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;calculateCareerScore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Top 6 candidates passed to AI for narrative generation&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AI now narrates. It no longer decides. That's the fix.&lt;/p&gt;
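&lt;p&gt;The filter-and-sort chain above calls &lt;code&gt;calculateCareerScore&lt;/code&gt; again inside every sort comparison. A variant that scores each path once, then filters and ranks on the cached number, returns the same candidates (assuming the scorer is pure) with less work. The simplified &lt;code&gt;CareerPath&lt;/code&gt; shape, the &lt;code&gt;ScoredPath&lt;/code&gt; type, and &lt;code&gt;topCandidates&lt;/code&gt; are hypothetical stand-ins; the scorer is passed in as a parameter so the sketch runs on its own, and the budget and stream filters would still run before it.&lt;/p&gt;

```typescript
// Simplified stand-in for the project's CareerPath type (hypothetical fields).
interface CareerPath {
  id: string;
  estimatedCost: number; // yearly cost, same unit as the student's budget
}

interface ScoredPath {
  path: CareerPath;
  score: number; // 0..100
}

// Score every candidate exactly once, then filter and rank on the cached value.
function topCandidates(
  paths: CareerPath[],
  scoreFn: (path: CareerPath) => number,
  minScore = 45,
  limit = 6,
): ScoredPath[] {
  return paths
    .map((path) => ({ path, score: scoreFn(path) }))
    .filter((entry) => entry.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

// Usage with a toy scorer in which cheaper paths score higher.
const demo = topCandidates(
  [
    { id: "a", estimatedCost: 300000 },
    { id: "b", estimatedCost: 100000 },
    { id: "c", estimatedCost: 900000 },
  ],
  (p) => 100 - p.estimatedCost / 10000,
);
console.log(demo.map((d) => d.path.id)); // ids in rank order: b, a
```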

&lt;h3&gt;
  
  
  3. The Reality Check Engine — Actually Built This Time
&lt;/h3&gt;

&lt;p&gt;The README mentioned this feature. The code did not have it. Now it does.&lt;/p&gt;

&lt;p&gt;The Reality Check Engine generates specific flags, not generic warnings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Budget gap flag: &lt;em&gt;"Your stated budget (₹4L/year) is ₹2.8L below the average cost of top Engineering colleges in your shortlist. Here are 3 institutions within range."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Salary arbitrage flag: &lt;em&gt;"Your target career (Data Science) pays a median ₹8.2L in Year 3. Your backup path (Actuarial Science) pays ₹11.4L in Year 3 with a 23% lower admission bar."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Survival odds flag: &lt;em&gt;"4,200 students applied to your top-choice stream last year. 340 were admitted. Your profile puts you in the 61st percentile of that applicant pool."&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;
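&lt;p&gt;The budget gap flag is an example of a check that needs no model at all. Below is a hypothetical sketch of how such a flag could be computed from a shortlist; the &lt;code&gt;Institution&lt;/code&gt; shape, the &lt;code&gt;toLakhs&lt;/code&gt; formatter, and the message wording are assumptions for illustration, not the engine's actual implementation.&lt;/p&gt;

```typescript
interface Institution {
  name: string;
  yearlyFee: number; // in rupees
}

// Format rupees as lakhs with one decimal, e.g. 400000 -> "₹4.0L".
function toLakhs(rupees: number): string {
  return `₹${(rupees / 100000).toFixed(1)}L`;
}

// Returns a budget-gap message, or null when the budget covers the average fee.
function budgetGapFlag(budget: number, shortlist: Institution[]): string | null {
  if (shortlist.length === 0) return null;
  const avg = shortlist.reduce((sum, i) => sum + i.yearlyFee, 0) / shortlist.length;
  if (budget >= avg) return null;
  // Up to three shortlisted institutions the student can actually afford.
  const affordable = shortlist.filter((i) => budget >= i.yearlyFee).slice(0, 3);
  const suffix =
    affordable.length === 0
      ? "None of the shortlisted institutions are within range."
      : `Within range: ${affordable.map((i) => i.name).join(", ")}.`;
  return `Your stated budget (${toLakhs(budget)}/year) is ${toLakhs(avg - budget)} below the average cost of your shortlist. ${suffix}`;
}
```

&lt;p&gt;With a ₹4L budget and a three-college shortlist averaging ₹6.8L, this reports a ₹2.8L gap and names the one college within range.&lt;/p&gt;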

&lt;p&gt;These are not motivational statements. They are information.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Persistent Career Memory
&lt;/h3&gt;

&lt;p&gt;Previously: your data lived only in &lt;code&gt;localStorage&lt;/code&gt; and was wiped whenever you cleared your browser data.&lt;/p&gt;

&lt;p&gt;Now: Clerk + Supabase gives you persistent career profiles across devices. Your history is yours, and it's actually stored.&lt;/p&gt;




&lt;h2&gt;
  
  
  How GitHub Copilot Fit Into This
&lt;/h2&gt;

&lt;p&gt;I want to be honest about how I used it, because the honest version is more useful than "Copilot wrote my app."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where Copilot actually changed the work:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Clarifying what I was actually building.&lt;/em&gt; When I came back to the codebase after months away, I used Copilot Chat to explain my own code back to me. I'd highlight a function and ask: "What does this actually do and what are its failure modes?" That sounds embarrassing. It's also just accurate. It's faster than rereading 400 lines of cold TypeScript with no context.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The stream compatibility matrix.&lt;/em&gt; I had the concept but not the structure. I asked Copilot: &lt;em&gt;"I need a lookup table that maps Indian board exam streams to career family compatibility scores between 0 and 1. What schema would you use for this?"&lt;/em&gt; It gave me a direction. I rewrote it substantially — the values are mine, the institution data is mine — but the schema idea saved me an hour of second-guessing.&lt;/p&gt;
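&lt;p&gt;A nested lookup table is one straightforward way to express that schema. The stream names, career families, and numbers below are placeholders for illustration only (the real values are the author's own), and returning 0 for an unknown stream or family is an assumption that also keeps a missing entry from throwing at lookup time.&lt;/p&gt;

```typescript
// Rows are board streams, columns are career families, values are 0..1.
type CompatibilityMatrix = { [stream: string]: { [family: string]: number } };

// Placeholder data for illustration; not PathForge's actual values.
const streamCompatibilityMatrix: CompatibilityMatrix = {
  "science-pcm": { engineering: 1.0, medicine: 0.2, finance: 0.6, design: 0.5 },
  "science-pcb": { engineering: 0.3, medicine: 1.0, finance: 0.4, design: 0.4 },
  commerce: { engineering: 0.1, medicine: 0.0, finance: 1.0, design: 0.5 },
  humanities: { engineering: 0.0, medicine: 0.0, finance: 0.5, design: 0.8 },
};

// Hardcoded lookup, not inferred; unknown streams or families score 0.
function streamFit(stream: string, family: string): number {
  const row = streamCompatibilityMatrix[stream];
  if (row === undefined) return 0;
  return row[family] ?? 0;
}
```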

&lt;p&gt;&lt;em&gt;Bug detection on the constraint layer.&lt;/em&gt; The budget ceiling calculation had an off-by-one logic issue where students who &lt;em&gt;exactly&lt;/em&gt; matched the budget threshold were being filtered out instead of included. I'd been looking at it for 20 minutes. I pasted the function into Copilot and asked it to review for edge cases. It caught it in about 30 seconds.&lt;/p&gt;
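&lt;p&gt;The buggy line itself isn't shown, but this class of boundary bug is easy to reintroduce. Writing the keep-condition with a strict comparison drops the student whose fee sits exactly on the 15% line, and multiplying by a decimal such as &lt;code&gt;1.15&lt;/code&gt; can land a hair below the exact threshold, because 1.15 has no exact binary representation. A sketch that sidesteps both, assuming fees and budgets are whole rupees and using a hypothetical &lt;code&gt;withinBudget&lt;/code&gt; helper, compares in integer arithmetic with an inclusive check:&lt;/p&gt;

```typescript
// Tolerance as a whole-number percentage, so the comparison stays in integers.
const BUDGET_TOLERANCE_PERCENT = 15;

// True when fee is at most budget * (1 + tolerance), boundary included.
// Assumes both values are whole rupees, so both products are exact integers.
function withinBudget(fee: number, budget: number): boolean {
  return budget * (100 + BUDGET_TOLERANCE_PERCENT) >= fee * 100;
}
```

&lt;p&gt;Here &lt;code&gt;withinBudget(460000, 400000)&lt;/code&gt; returns &lt;code&gt;true&lt;/code&gt;: a fee exactly 15% over a ₹4L budget is kept, while one rupee more is filtered out.&lt;/p&gt;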

&lt;p&gt;&lt;em&gt;Code review on the scoring function.&lt;/em&gt; Before I was confident the math was right, I asked Copilot to check whether my weighted scoring formula would behave unexpectedly at edge values (marks = 0, budget = maximum, stream = no match). It flagged a division-by-zero risk I'd missed in the marks fit calculation.&lt;/p&gt;
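&lt;p&gt;Since &lt;code&gt;calculateMarksFit&lt;/code&gt; isn't shown, the following is only a guess at its shape: a ratio of the student's marks to the path's required marks, with the denominator guarded so a path that lists no requirement cannot divide by zero. The two-number signature (percentages) and the choice to treat "no requirement" as a full fit are assumptions.&lt;/p&gt;

```typescript
// Hypothetical marks-fit helper: both inputs are percentages (0..100).
// Returns a value in [0, 1].
function calculateMarksFit(studentMarks: number, requiredMarks: number): number {
  // Guard the division: a requirement of 0, a negative value, or NaN is
  // treated as "no requirement" (full fit) instead of dividing by zero.
  if (!(requiredMarks > 0)) return 1;
  // Negative or NaN student marks count as 0.
  const marks = studentMarks > 0 ? studentMarks : 0;
  // Meeting or exceeding the requirement caps at 1; falling short scales down.
  return Math.min(1, marks / requiredMarks);
}
```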

&lt;p&gt;&lt;strong&gt;Where Copilot couldn't help:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The actual domain logic didn't come from Copilot: what score a "Science stream" student should get when applying to an Arts career, what the budget threshold multiplier should be, how to weight market demand trends. It doesn't know that JEE Advanced has 150,000 serious applicants for 16,000 seats, or that a budget of ₹3L/year in India eliminates roughly 70% of private engineering colleges. That knowledge is local and specific. I had to supply it.&lt;/p&gt;

&lt;p&gt;Copilot is a very good tool for the craft of code. It's not a substitute for knowing what you're building.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where I'd Push Back
&lt;/h2&gt;

&lt;p&gt;The constraint-based approach I built has a ceiling. It's good at filtering out bad matches. It's less good at surfacing &lt;em&gt;surprising&lt;/em&gt; good matches — the career path a student wouldn't have thought to consider but actually fits them well.&lt;/p&gt;

&lt;p&gt;The old hallucinating engine was wrong 40% of the time, but the remaining 60% occasionally included genuinely creative suggestions the deterministic system wouldn't generate. There's a version of PathForge that uses the constraint layer as a floor and the AI as an exploration layer on top. I haven't built that yet.&lt;/p&gt;
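&lt;p&gt;A rough sketch of that unbuilt "floor plus exploration" idea: model-suggested paths are kept only if they exist in the known catalog and clear the same deterministic floor, and they are labeled as exploratory instead of being mixed into the ranked list. Everything here (&lt;code&gt;RankedPath&lt;/code&gt;, &lt;code&gt;mergeWithExploration&lt;/code&gt;, the &lt;code&gt;exploratory&lt;/code&gt; flag, the limit of two extras) is hypothetical.&lt;/p&gt;

```typescript
interface RankedPath {
  id: string;
  score: number;
  exploratory: boolean; // true when the path came from the model, not the ranker
}

// Append model suggestions that pass the same floor as the deterministic picks.
function mergeWithExploration(
  ranked: RankedPath[],                         // output of the constraint layer
  suggestedIds: string[],                       // ids proposed by the model
  scoreOf: (id: string) => number | undefined, // undefined = not in the catalog
  floor = 45,
  maxExploratory = 2,
): RankedPath[] {
  const seen = new Set(ranked.map((r) => r.id));
  const extras: RankedPath[] = [];
  for (const id of suggestedIds) {
    if (extras.length >= maxExploratory) break;
    if (seen.has(id)) continue;
    const score = scoreOf(id);
    // Unknown ids (possible hallucinations) and below-floor paths are dropped.
    if (score === undefined) continue;
    if (floor > score) continue;
    seen.add(id);
    extras.push({ id, score, exploratory: true });
  }
  return [...ranked, ...extras];
}
```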

&lt;p&gt;Also: the institution database is India-first. Southeast Asia support is planned but thin. The scholarship data needs regular updates to stay accurate. These aren't excuses — they're the honest version of the product right now.&lt;/p&gt;




&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/ogMaverick12/pathforge-ai" rel="noopener noreferrer"&gt;github.com/ogMaverick12/pathforge-ai&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The 6-step wizard takes about 90 seconds. Enter your stream, marks, dream career, and budget. What comes out is three ranked career paths with real institutions, probability scores built on actual math, and a Reality Check section that doesn't soften the numbers.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Part That Surprised Me
&lt;/h2&gt;

&lt;p&gt;Coming back to this project was harder than starting it.&lt;/p&gt;

&lt;p&gt;Starting a project is pure possibility. Coming back to one you abandoned means confronting the gap between what you said you'd build and what you actually did. There's a specific discomfort in reading your own old comments — &lt;em&gt;"// TODO: fix hallucination issue"&lt;/em&gt; — and knowing you left that there for months.&lt;/p&gt;

&lt;p&gt;The Finish-Up-A-Thon forced the question: is this worth finishing, or is this a project I'm attached to for the wrong reasons?&lt;/p&gt;

&lt;p&gt;PathForge is worth finishing because the problem is real. Students in India making career decisions on incomplete information, with no structured support, making choices they can't easily undo — that's not an abstract use case. The hallucinating engine was embarrassing. The deterministic scoring system isn't embarrassing.&lt;/p&gt;

&lt;p&gt;That's the difference between the version I abandoned and the version I'm shipping.&lt;/p&gt;




&lt;p&gt;What's the project in your graveyard that's worth coming back to? Drop the repo below — especially curious about anyone else who's hit the "AI is confidently wrong" wall and had to build structure around it to fix it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built by Vi-Bit Technologies. ⚡ Solving problems smarter, faster, and better.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>githubchallenge</category>
      <category>githubcopilot</category>
      <category>nextjs</category>
    </item>
    <item>
      <title>I Built a Local Interview Coach That Learns From Every Submission With Hermes Agent.</title>
      <dc:creator>Sreejit Pradhan</dc:creator>
      <pubDate>Tue, 26 May 2026 17:02:23 +0000</pubDate>
      <link>https://dev.to/sreejit_/i-built-a-local-interview-coach-that-learns-from-every-submission-with-hermes-agent-1jja</link>
      <guid>https://dev.to/sreejit_/i-built-a-local-interview-coach-that-learns-from-every-submission-with-hermes-agent-1jja</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/hermes-agent-2026-05-15"&gt;Hermes Agent Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; PrepPilot is a local interview coach powered by Hermes Agent. It remembers how you solve problems, reviews your code in a full browser workspace, updates your profile after every submission, and visibly evolves its own skill files. Everything runs locally (Next.js, FastAPI, SQLite), with 208 original shared problems plus private custom problems, and I verified Hermes running from WSL Ubuntu on my machine.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;Every developer I know has the same complaint about interview prep.&lt;/p&gt;

&lt;p&gt;LeetCode does not know you. It does not know that you can solve array problems half asleep but freeze when a graph problem uses recursion. It does not know that a visual explanation helps you more than a formal proof. It does not remember that you keep missing the same edge case.&lt;/p&gt;

&lt;p&gt;PrepPilot is my attempt to fix that.&lt;/p&gt;

&lt;p&gt;It is a local-first interview coach where each learner creates a local email/password profile, solves problems in a full dashboard workspace, gets reviewed by Hermes, rates the feedback, and watches the coaching model adapt. One email means one profile. Each profile has separate sessions, custom problems, assessment progress, stats, and Hermes memory.&lt;/p&gt;

&lt;p&gt;The important bit: Hermes is not a decorative wrapper around the app. Hermes is the core intelligence layer. The dashboard, problem bank, custom problems, and session history all flow into the same FastAPI submission pipeline, and that pipeline calls Hermes skills.&lt;/p&gt;

&lt;p&gt;That was the line I did not want to fake. If the app claims to learn, the learning path should be visible in the product and stored in the local system, not hidden behind a demo prompt.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Working App
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Local Profiles
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvu48hchvpzk9ysalad2k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvu48hchvpzk9ysalad2k.png" alt=" " width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the event build, I dropped the deployment/auth complexity. No Google OAuth. No GitHub OAuth. No external auth wall. You run it locally and create as many local profiles as you want, one per email.&lt;/p&gt;

&lt;p&gt;Telegram remains available as an optional shortcut that identifies you by Telegram username, but the main path is local email/password profiles, because that is what makes sense when the database lives on your own machine.&lt;/p&gt;
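&lt;p&gt;"One email, one profile" implies the address is normalized before it is used as a key; otherwise &lt;code&gt;Ada@Example.com&lt;/code&gt; and &lt;code&gt;ada@example.com&lt;/code&gt; become two profiles. Below is a minimal in-memory sketch of that rule. The real app stores profiles in SQLite behind FastAPI, password handling is left out here, lowercasing the whole address is a common simplification, and &lt;code&gt;ProfileRegistry&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```typescript
interface LocalProfile {
  email: string; // normalized form, used as the unique key
  displayName: string;
  createdAt: Date;
}

// Trim surrounding whitespace and lowercase the address before using it as a key.
function normalizeEmail(raw: string): string {
  return raw.trim().toLowerCase();
}

// Hypothetical in-memory stand-in for the profiles table.
class ProfileRegistry {
  private profiles: { [email: string]: LocalProfile } = {};

  create(rawEmail: string, displayName: string): LocalProfile {
    const email = normalizeEmail(rawEmail);
    if (!email.includes("@")) throw new Error(`invalid email: ${rawEmail}`);
    if (this.profiles[email] !== undefined) {
      throw new Error(`a profile already exists for ${email}`);
    }
    const profile = { email, displayName, createdAt: new Date() };
    this.profiles[email] = profile;
    return profile;
  }

  find(rawEmail: string): LocalProfile | undefined {
    return this.profiles[normalizeEmail(rawEmail)];
  }
}
```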

&lt;h3&gt;
  
  
  Dashboard And Assessment
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6mfh4l5mjni9er6jbtjw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6mfh4l5mjni9er6jbtjw.png" alt=" " width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The dashboard is where Hermes becomes visible:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;current assessment progress&lt;/li&gt;
&lt;li&gt;score trend&lt;/li&gt;
&lt;li&gt;topic map&lt;/li&gt;
&lt;li&gt;active problem&lt;/li&gt;
&lt;li&gt;skill evolution timeline&lt;/li&gt;
&lt;li&gt;local profile data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;New users go through a 3-problem calibration assessment. Hermes reviews those submissions and assigns a starting level like &lt;code&gt;foundation&lt;/code&gt;, &lt;code&gt;interview-ready&lt;/code&gt;, or &lt;code&gt;advanced&lt;/code&gt;.&lt;/p&gt;
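&lt;p&gt;The cut-offs used for calibration aren't described, and Hermes may weigh more than raw scores. If the starting level were derived from scores alone, it could look like the sketch below; the thresholds (an average of 80 or more maps to &lt;code&gt;advanced&lt;/code&gt;, 50 to 79 to &lt;code&gt;interview-ready&lt;/code&gt;, anything lower to &lt;code&gt;foundation&lt;/code&gt;) and the &lt;code&gt;assignStartingLevel&lt;/code&gt; name are purely illustrative.&lt;/p&gt;

```typescript
type Level = "foundation" | "interview-ready" | "advanced";

// Hypothetical cut-offs on a 0..100 average review score.
const ADVANCED_FROM = 80;
const INTERVIEW_READY_FROM = 50;

// Map the three scored calibration submissions to a starting level.
function assignStartingLevel(calibrationScores: number[]): Level {
  if (calibrationScores.length !== 3) {
    throw new Error("calibration expects exactly 3 scored submissions");
  }
  const avg = calibrationScores.reduce((sum, s) => sum + s, 0) / calibrationScores.length;
  if (avg >= ADVANCED_FROM) return "advanced";
  if (avg >= INTERVIEW_READY_FROM) return "interview-ready";
  return "foundation";
}
```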

&lt;h3&gt;
  
  
  Expanded Problem Bank
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F83rmjtk2lz7uuhoys2jy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F83rmjtk2lz7uuhoys2jy.png" alt=" " width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The app now ships with &lt;strong&gt;208 original shared problems&lt;/strong&gt;, not scraped statements. The bank covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;classic DSA patterns&lt;/li&gt;
&lt;li&gt;GSoC and open-source contribution tasks&lt;/li&gt;
&lt;li&gt;GSSoC-style community tasks&lt;/li&gt;
&lt;li&gt;Unstop-style hiring challenges&lt;/li&gt;
&lt;li&gt;backend, frontend, full stack, and web platform problems&lt;/li&gt;
&lt;li&gt;data/Kaggle-style tasks&lt;/li&gt;
&lt;li&gt;AI/ML and RAG tasks&lt;/li&gt;
&lt;li&gt;database, security, and system design basics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Users can also create private custom problems. Those stay attached to that local profile only.&lt;/p&gt;

&lt;h3&gt;
  
  
  Problem Card To Solve Flow
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ggkru8dqpnt56pq8d1r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ggkru8dqpnt56pq8d1r.png" alt=" " width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Clicking a problem is now a real flow, not a dead card. The card opens with the problem statement, examples, constraints, hints, status, and a &lt;code&gt;Solve&lt;/code&gt; button. That button creates or reuses a PrepPilot session and sends the learner into the full &lt;code&gt;/solve&lt;/code&gt; workspace.&lt;/p&gt;
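&lt;p&gt;"Creates or reuses a session" amounts to an idempotent get-or-create keyed on the learner and the problem. A hypothetical in-memory version is sketched below; in PrepPilot this lives behind the FastAPI backend and SQLite, and the &lt;code&gt;SessionStore&lt;/code&gt; name, the key format, and the rule that only still-open sessions are reused are assumptions.&lt;/p&gt;

```typescript
interface PracticeSession {
  id: string;
  profileEmail: string;
  problemId: string;
  status: "open" | "submitted";
}

// Hypothetical in-memory stand-in for the sessions table.
class SessionStore {
  private sessions: { [key: string]: PracticeSession } = {};
  private nextId = 1;

  // Reuse the learner's open session for this problem, or start a new one.
  getOrCreate(profileEmail: string, problemId: string): PracticeSession {
    const key = `${profileEmail}::${problemId}`;
    const existing = this.sessions[key];
    if (existing !== undefined) {
      if (existing.status === "open") return existing;
    }
    const session: PracticeSession = {
      id: `s${this.nextId++}`,
      profileEmail,
      problemId,
      status: "open",
    };
    this.sessions[key] = session;
    return session;
  }
}
```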

&lt;h3&gt;
  
  
  Full Solve Workspace
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5oblexitxfv6n6vsl9kc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5oblexitxfv6n6vsl9kc.png" alt=" " width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The small modal is gone. Solving now happens in &lt;code&gt;/solve&lt;/code&gt;, a full-page workspace that fills the app like a serious practice tab:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Problem Selector tab&lt;/li&gt;
&lt;li&gt;Solve tab&lt;/li&gt;
&lt;li&gt;problem statement, examples, constraints, hints&lt;/li&gt;
&lt;li&gt;Monaco editor&lt;/li&gt;
&lt;li&gt;language selector&lt;/li&gt;
&lt;li&gt;submit button&lt;/li&gt;
&lt;li&gt;Hermes review output&lt;/li&gt;
&lt;li&gt;score breakdown&lt;/li&gt;
&lt;li&gt;1-5 helpfulness rating&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Hermes Review
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fku5cfcuiku6b4z5uifb8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fku5cfcuiku6b4z5uifb8.png" alt=" " width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When a solution is submitted, Hermes stores the code, language, time taken, score, score breakdown, status, and feedback. Then it updates topic stats and the coaching profile.&lt;/p&gt;

&lt;p&gt;After the learner rates the feedback, Hermes can update the skill file itself. That is the loop I care about most: feedback does not disappear into a black box. It changes the coach.&lt;/p&gt;
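&lt;p&gt;When a rating is enough to trigger a skill-file update isn't spelled out. One plausible rule, sketched here with hypothetical names and numbers, keeps a rolling window of the latest helpfulness ratings for each skill and flags the skill for revision once the window is full and its average falls under a threshold.&lt;/p&gt;

```typescript
// Rolling window of 1-5 helpfulness ratings for one Hermes skill (hypothetical).
interface SkillFeedback {
  skill: string;
  ratings: number[];
}

const WINDOW = 5;       // how many recent ratings to consider
const REVISE_BELOW = 3; // average rating under which a revision is triggered

// Record a rating and report whether the skill should be revised.
function recordRating(fb: SkillFeedback, rating: number): boolean {
  if (!Number.isInteger(rating) || rating > 5 || 1 > rating) {
    throw new Error(`rating must be an integer from 1 to 5, got ${rating}`);
  }
  fb.ratings.push(rating);
  if (fb.ratings.length > WINDOW) fb.ratings.shift(); // keep only the latest WINDOW
  if (WINDOW > fb.ratings.length) return false;       // not enough signal yet
  const avg = fb.ratings.reduce((sum, r) => sum + r, 0) / fb.ratings.length;
  return REVISE_BELOW > avg;
}
```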

&lt;h3&gt;
  
  
  Custom Problems
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F79hihywubappocfp506q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F79hihywubappocfp506q.png" alt=" " width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If a learner wants to bring their own problem, they can. Custom problems use the exact same pipeline as seeded problems:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;custom problem -&amp;gt; session -&amp;gt; submit code -&amp;gt; Hermes review -&amp;gt; stats/profile update -&amp;gt; feedback rating -&amp;gt; skill evolution check&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;That keeps the system honest. There is not one "real" path and one demo path.&lt;/p&gt;

&lt;h3&gt;
  
  
  Local Hermes Status
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsnt2jv5khegd187gkaa8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsnt2jv5khegd187gkaa8.png" alt=" " width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Settings page shows whether the local foundation is actually connected:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FastAPI backend&lt;/li&gt;
&lt;li&gt;SQLite memory&lt;/li&gt;
&lt;li&gt;loaded Hermes skill versions&lt;/li&gt;
&lt;li&gt;WSL Ubuntu Hermes CLI&lt;/li&gt;
&lt;li&gt;heuristic or optional external inference mode&lt;/li&gt;
&lt;li&gt;Telegram configured or off&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On my machine, Hermes is installed in WSL Ubuntu:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/home/sreej/.local/bin/hermes
Hermes Agent v0.12.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That status is surfaced in the app itself through &lt;code&gt;GET /api/v1/hermes/status&lt;/code&gt;.&lt;/p&gt;
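
&lt;p&gt;You can query the same endpoint from a script. A small standard-library sketch, assuming the backend listens on &lt;code&gt;127.0.0.1:8000&lt;/code&gt; as in the Quick Start command. The response fields aren't documented here, so it prints the payload as-is.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import urllib.error
import urllib.request

# Host and port match the uvicorn command in the Quick Start section.
STATUS_URL = "http://127.0.0.1:8000/api/v1/hermes/status"


def fetch_hermes_status(url=STATUS_URL, timeout=3.0):
    """Return the decoded JSON status, or None if the backend can't be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, TimeoutError, ConnectionError, ValueError):
        # URLError also covers HTTPError (4xx/5xx); ValueError covers a non-JSON body.
        return None


if __name__ == "__main__":
    status = fetch_hermes_status()
    if status is None:
        print("Backend not reachable; is uvicorn running?")
    else:
        print(json.dumps(status, indent=2))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;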




&lt;h2&gt;
  
  
  How Hermes Is Used
&lt;/h2&gt;

&lt;p&gt;PrepPilot has three core Hermes skills.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;problem_selector.md&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This chooses what you should solve next. It looks at weak topics, recent scores, difficulty, assessment state, and problem freshness. It can pick from the 208 shared problems or from custom problems owned by the profile.&lt;/p&gt;
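
&lt;p&gt;To make "weak topics, difficulty, and freshness" concrete, here is one way a selector could rank candidates. The weights (0.5/0.3/0.2), the 30-day freshness cap, and the &lt;code&gt;Problem&lt;/code&gt; fields are assumptions for illustration. The actual &lt;code&gt;problem_selector.md&lt;/code&gt; is a Hermes skill file, not this function.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass


@dataclass
class Problem:
    id: int
    topic: str
    difficulty: int        # 1 = easy, 2 = medium, 3 = hard
    days_since_seen: int   # use a large number for never-attempted problems


def pick_next(problems, topic_avg, target_difficulty, exclude_ids=frozenset()):
    """Pick the highest-priority problem that isn't excluded.

    topic_avg maps each topic to the learner's recent average score (0-100).
    """

    def priority(p):
        # Weak topics score high; topics with no history count as middling (50).
        weakness = (100 - topic_avg.get(p.topic, 50)) / 100
        # 1.0 at the target difficulty, 0.0 two levels away.
        fit = 1 - abs(p.difficulty - target_difficulty) / 2
        # Problems not seen for 30+ days are treated as fully fresh.
        freshness = min(p.days_since_seen, 30) / 30
        return 0.5 * weakness + 0.3 * fit + 0.2 * freshness

    candidates = [p for p in problems if p.id not in exclude_ids]
    return max(candidates, key=priority, default=None)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;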

&lt;h3&gt;
  
  
  &lt;code&gt;solution_reviewer.md&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This reviews your code and scores it out of 100 points across four criteria:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;correctness: 40 points&lt;/li&gt;
&lt;li&gt;complexity: 30 points&lt;/li&gt;
&lt;li&gt;edge cases: 20 points&lt;/li&gt;
&lt;li&gt;style: 10 points&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The feedback is personalized by the profile Hermes has built about you.&lt;/p&gt;
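
&lt;p&gt;Because the four maxima add up to 100, the final score is a plain sum. A minimal sketch, assuming the reviewer rates each criterion as a fraction between 0 and 1 (it could just as well emit points directly):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Maximum points per criterion, as listed above (they sum to 100).
RUBRIC = {"correctness": 40, "complexity": 30, "edge_cases": 20, "style": 10}


def total_score(fractions):
    """Turn per-criterion ratings in [0, 1] into (total out of 100, breakdown)."""
    missing = RUBRIC.keys() - fractions.keys()
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    breakdown = {
        # Clamp each rating into [0, 1] before scaling to that criterion's maximum.
        name: round(max_points * min(max(fractions[name], 0.0), 1.0))
        for name, max_points in RUBRIC.items()
    }
    return sum(breakdown.values()), breakdown
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, ratings of 1.0, 0.5, 0.75, and 1.0 give 40 + 15 + 15 + 10 = 80 points.&lt;/p&gt;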

&lt;h3&gt;
  
  
  &lt;code&gt;coaching_profiler.md&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This is the memory layer. It watches repeated submissions and updates the learner profile:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;recurring mistakes&lt;/li&gt;
&lt;li&gt;explanation style preference&lt;/li&gt;
&lt;li&gt;topic strengths&lt;/li&gt;
&lt;li&gt;weak areas&lt;/li&gt;
&lt;li&gt;pacing&lt;/li&gt;
&lt;li&gt;calibration level&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That profile feeds back into both the problem selector and the reviewer.&lt;/p&gt;
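
&lt;p&gt;The "recurring mistakes" part of that profile can be sketched as a rolling count. This assumes each review emits a list of mistake tags; names like &lt;code&gt;off_by_one&lt;/code&gt; are hypothetical. A mistake counts as recurring once it appears in 3 of the last 10 reviews, and both numbers are arbitrary.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import Counter


def update_profile(profile, mistake_tags, window=10, threshold=3):
    """Record one review's mistake tags and recompute the recurring-mistake list.

    profile is a plain dict; profile["history"] keeps the tag sets of recent reviews.
    """
    history = profile.setdefault("history", [])
    history.append(sorted(set(mistake_tags)))  # count each tag once per review
    del history[:-window]                      # keep only the last `window` reviews
    counts = Counter(tag for tags in history for tag in tags)
    # A count can't exceed len(history), so this keeps tags seen at least `threshold` times.
    profile["recurring_mistakes"] = sorted(
        tag for tag, n in counts.items() if n in range(threshold, len(history) + 1)
    )
    return profile
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;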




&lt;h2&gt;
  
  
  The Learning Loop
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User solves a problem
  -&amp;gt; FastAPI session submission endpoint
  -&amp;gt; Hermes solution_reviewer
  -&amp;gt; score and feedback stored
  -&amp;gt; topic stats update
  -&amp;gt; coaching profile update
  -&amp;gt; user rates helpfulness
  -&amp;gt; Hermes checks whether a skill should improve
  -&amp;gt; new skill version appears in Skill Evolution
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The whole point is compounding context. A generic prep platform starts fresh every time. PrepPilot should feel like it remembers what happened last week.&lt;/p&gt;
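
&lt;p&gt;The last two steps of that loop, rating and deciding whether a skill should improve, could be as simple as the rule below. It is a hypothetical stand-in, not the check Hermes runs. A skill gets flagged once its last three helpfulness ratings are all 1 or 2. The window then resets, so one bad streak doesn't trigger repeated rewrites.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import defaultdict, deque

UNHELPFUL = {1, 2}  # bottom of the 1-5 helpfulness scale


class SkillFeedbackTracker:
    """Decide when a skill file should be revised, based on recent ratings."""

    def __init__(self, window=3):
        self.window = window
        # One bounded queue of recent ratings per skill name.
        self.recent = defaultdict(lambda: deque(maxlen=window))

    def rate(self, skill, rating):
        """Record a rating; return True when the skill should be revised."""
        if rating not in range(1, 6):
            raise ValueError("rating must be from 1 to 5")
        ratings = self.recent[skill]
        ratings.append(rating)
        needs_revision = len(ratings) == self.window and all(
            r in UNHELPFUL for r in ratings
        )
        if needs_revision:
            ratings.clear()  # start a fresh window after flagging
        return needs_revision
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;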




&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Local Browser
  -&amp;gt; Next.js dashboard and /solve workspace
  -&amp;gt; FastAPI backend
  -&amp;gt; Hermes core layer
       - problem_selector
       - solution_reviewer
       - coaching_profiler
       - skill evolution
  -&amp;gt; SQLite memory
  -&amp;gt; WSL Ubuntu Hermes CLI status check
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Telegram bot code exists, but for this event build I am intentionally keeping the demo local. That removes deployment risk, auth-provider setup, webhooks, and cloud database problems. The local product is the thing I want judges to run and feel.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Backend&lt;/span&gt;
py &lt;span class="nt"&gt;-3&lt;/span&gt;.10 &lt;span class="nt"&gt;-m&lt;/span&gt; venv backend/.venv
backend/.venv/Scripts/python.exe &lt;span class="nt"&gt;-m&lt;/span&gt; pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; backend/requirements.txt
backend/.venv/Scripts/python.exe &lt;span class="nt"&gt;-m&lt;/span&gt; backend.seed_problems
backend/.venv/Scripts/python.exe &lt;span class="nt"&gt;-m&lt;/span&gt; uvicorn backend.main:app &lt;span class="nt"&gt;--host&lt;/span&gt; 127.0.0.1 &lt;span class="nt"&gt;--port&lt;/span&gt; 8000

&lt;span class="c"&gt;# Frontend (in a second terminal)&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;frontend
npm &lt;span class="nb"&gt;install
&lt;/span&gt;npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then open:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://127.0.0.1:3000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a local profile, solve three calibration problems, and then check Settings to see the local Hermes connection.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Tested Locally
&lt;/h2&gt;

&lt;p&gt;For the final event build I tested:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;local email/password profile creation&lt;/li&gt;
&lt;li&gt;duplicate email protection&lt;/li&gt;
&lt;li&gt;two profiles with separate progress&lt;/li&gt;
&lt;li&gt;expanded 208-problem bank&lt;/li&gt;
&lt;li&gt;problem card -&amp;gt; &lt;code&gt;/solve&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;code submission -&amp;gt; Hermes review&lt;/li&gt;
&lt;li&gt;resubmission after Hermes review&lt;/li&gt;
&lt;li&gt;feedback rating -&amp;gt; profile/skill path&lt;/li&gt;
&lt;li&gt;private custom problem creation and solving&lt;/li&gt;
&lt;li&gt;Settings Hermes status with WSL Ubuntu Hermes CLI&lt;/li&gt;
&lt;li&gt;no OpenRouter key required&lt;/li&gt;
&lt;li&gt;Telegram off unless explicitly configured&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Where I Would Push Back
&lt;/h2&gt;

&lt;p&gt;The learning loop takes time to become impressive. On day one, the selector is mostly a thoughtful rule engine. Around sessions 8-10, the interesting behavior starts: difficulty changes, explanation style shifts, and problem choice begins to reflect the mistakes you keep making.&lt;/p&gt;

&lt;p&gt;That is also why I wanted the skill files visible. If the app claims it is learning, the learner should be able to inspect what changed.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Would Build Next
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Mock interview mode:&lt;/strong&gt; a timed two-problem round where Hermes behaves like an interviewer and writes a post-interview assessment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Company/program tracks:&lt;/strong&gt; GSoC, GSSoC, Unstop, backend interviews, frontend interviews, data roles, and system design foundations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deployment later:&lt;/strong&gt; I deliberately removed deployment from this build so the event demo is stable. Later, the same FastAPI backend can be packaged with a persistent database and webhook setup.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Take
&lt;/h2&gt;

&lt;p&gt;There are a hundred interview prep tools. Most of them give you more problems.&lt;/p&gt;

&lt;p&gt;PrepPilot tries to give you a coach that notices patterns.&lt;/p&gt;

&lt;p&gt;It runs locally. It stores your history locally. It lets you solve real problems in a real editor. It reviews your code through Hermes. It updates its model of you. And then the next problem is not random anymore.&lt;/p&gt;

&lt;p&gt;That is the part I built this for.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;code&gt;https://github.com/ogMaverick12/preppilot&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Hermes Agent: &lt;code&gt;https://hermes-agent.nousresearch.com&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Built on, and with, Hermes Agent by Nous Research. Local runtime: FastAPI, SQLite, Next.js 14, WSL Ubuntu Hermes CLI. Event build: local-only, no cloud auth required, no external inference key required.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>hermesagentchallenge</category>
      <category>devchallenge</category>
      <category>agents</category>
      <category>ai</category>
    </item>
    <item>
      <title>Everyone's Talking About Gemini 3.5 Flash. The Real Story at Google I/O 2026 Was a Skill File.</title>
      <dc:creator>Sreejit Pradhan</dc:creator>
      <pubDate>Sun, 24 May 2026 08:13:34 +0000</pubDate>
      <link>https://dev.to/sreejit_/everyones-talking-about-gemini-35-flash-the-real-story-at-google-io-2026-was-a-skill-file-4f3c</link>
      <guid>https://dev.to/sreejit_/everyones-talking-about-gemini-35-flash-the-real-story-at-google-io-2026-was-a-skill-file-4f3c</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-io-writing-2026-05-19"&gt;Google I/O Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Everyone walked away from Google I/O 2026 talking about Gemini 3.5 Flash benchmarks. Veo 3. Gemini Omni doing multimodal physics. The usual keynote sugar rush. Good stuff. Expected.&lt;/p&gt;

&lt;p&gt;But if you want to understand &lt;em&gt;why&lt;/em&gt; this I/O actually changes how developers build — not in theory, in production, this week — you need to look at something that got maybe four sentences in the developer keynote.&lt;/p&gt;

&lt;p&gt;A markdown file called &lt;code&gt;SKILL.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I didn't read about this. I ran it. Here's what actually happened.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Antigravity CLI Actually Creates (Not What the Slides Said)
&lt;/h2&gt;

&lt;p&gt;Every I/O recap I've read describes AGENTS.md as the agent configuration primitive. Clean. Simple. One file.&lt;/p&gt;

&lt;p&gt;That's not quite right. Here's what &lt;code&gt;/agents&lt;/code&gt; shows in a fresh Antigravity CLI 1.0.2 session on a real project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Create New Agents
  Workspace: C:/Users/sreej/Downloads/Projects/SoilSense AI/.agents/agents/{agent_name}/agent.json
  Global:    C:\Users\sreej\.gemini\antigravity-cli\agents\{agent_name}\agent.json

▼ Available Agents
  • /default   Default agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Agent definitions are &lt;strong&gt;JSON&lt;/strong&gt;, not markdown. The markdown lives one level down — in skills:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Skills  
129 skills

Create new skills
  Workspace: ~/Downloads/Projects/SoilSense AI/.agents/skills/{skill_name}/SKILL.md
  Global:    ~/.gemini/antigravity-cli/skills/{skill_name}/SKILL.md
  Shared:    ~/.gemini/skills/{skill_name}/SKILL.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So the actual structure is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;your-project/
└── .agents/
    ├── agents/
    │   └── {agent_name}/
    │       └── agent.json      ← agent behavior (JSON)
    └── skills/
        └── {skill_name}/
            └── SKILL.md        ← reusable capabilities (markdown)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
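
&lt;p&gt;That layout is easy to inspect from a script. A minimal sketch that relies only on the two marker files shown above (&lt;code&gt;agent.json&lt;/code&gt; and &lt;code&gt;SKILL.md&lt;/code&gt;). It doesn't parse either file, since their contents aren't covered here.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from pathlib import Path


def scan_agents_dir(project_root):
    """List agent and skill names under project_root/.agents, per the tree above."""
    base = Path(project_root) / ".agents"

    def names(subdir, marker):
        folder = base / subdir
        if not folder.is_dir():
            return []
        # A folder counts only if it contains its marker file.
        return sorted(d.name for d in folder.iterdir() if (d / marker).is_file())

    return {
        "agents": names("agents", "agent.json"),
        "skills": names("skills", "SKILL.md"),
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;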



&lt;p&gt;And Antigravity ships with &lt;strong&gt;129 built-in skills&lt;/strong&gt; already — everything from &lt;code&gt;agency-agentic-search-optimizer&lt;/code&gt; to &lt;code&gt;agency-code-reviewer&lt;/code&gt;. You're not starting from zero. You're extending a library.&lt;/p&gt;

&lt;p&gt;That's not a minor correction. That's a different mental model from what the keynote implied.&lt;/p&gt;




&lt;h2&gt;
  
  
  I Tested It on a Real Project
&lt;/h2&gt;

&lt;p&gt;I ran this on SoilSense AI — a Capacitor/Android app with an existing codebase, git history, and a &lt;code&gt;src/&lt;/code&gt; directory full of React components. Not a demo project. A real one.&lt;/p&gt;

&lt;p&gt;One prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;create a skill for SoilSense AI that reviews any new component 
for accessibility issues before committing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What followed was not autocomplete. The agent:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Read &lt;code&gt;package.json&lt;/code&gt; to understand the stack&lt;/li&gt;
&lt;li&gt;Scanned &lt;code&gt;src/&lt;/code&gt;, &lt;code&gt;src/lib/&lt;/code&gt;, &lt;code&gt;docs/PROJECT_STRUCTURE.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Checked &lt;code&gt;ListPermissions&lt;/code&gt; — confirmed read/write access&lt;/li&gt;
&lt;li&gt;Ran &lt;code&gt;git status&lt;/code&gt; to understand current state&lt;/li&gt;
&lt;li&gt;Proposed a hybrid approach and asked for approval before proceeding&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The plan it proposed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A global AI agent skill (&lt;code&gt;soilsense-accessibility-reviewer&lt;/code&gt;) — a &lt;code&gt;SKILL.md&lt;/code&gt; that instructs the agent to audit git-staged components using LLM-level reasoning&lt;/li&gt;
&lt;li&gt;A standalone Python checker (&lt;code&gt;check-a11y.py&lt;/code&gt;) for static WCAG rule enforcement&lt;/li&gt;
&lt;li&gt;A &lt;code&gt;pre-commit&lt;/code&gt; git hook that blocks commits containing critical violations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I typed &lt;code&gt;proceed&lt;/code&gt;. Here's what it built:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Create(~/.gemini/config/skills/soilsense-accessibility-reviewer/SKILL.md)
Create(~/.gemini/config/skills/soilsense-accessibility-reviewer/scripts/check_a11y.py)
Create(SoilSense AI/scripts/check-a11y.py)
Create(SoilSense AI/.git/hooks/pre-commit)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, without me asking, it created a mock broken component with intentional violations, staged it, and ran the hook against that component to verify the setup.&lt;/p&gt;

&lt;p&gt;Result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;5 Critical issues detected:
  - Missing alt tags
  - Custom clickable divs lacking tabIndex/onKeyDown handlers  
  - Empty button
  - Unlabeled form inputs

3 Warnings:
  - Redundant alt terms
  - Positive tabIndex anti-patterns
  - Unlabelled decorative SVG/Lucide icons

→ Commit blocked. Fix critical issues or use --no-verify to bypass.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It caught real violations, blocked the commit, displayed results in a console table, then cleaned up the mock component and reset git state. The pre-commit hook is now active in the SoilSense AI repo.&lt;/p&gt;
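
&lt;p&gt;The post doesn't show the generated &lt;code&gt;check-a11y.py&lt;/code&gt;. Here is a minimal sketch of how a static checker of that kind could work, using only Python's standard &lt;code&gt;html.parser&lt;/code&gt;. The three rules are an assumed subset of the issues listed above. Parsing JSX as HTML is a heuristic, and an arrow function inside an attribute will confuse it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import subprocess
import sys
from html.parser import HTMLParser


class A11yScanner(HTMLParser):
    """Heuristic scan of JSX markup. JSX isn't HTML, so treat results as hints."""

    def __init__(self):
        super().__init__()
        self.critical, self.warnings = [], []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)          # attribute names arrive lowercased: onClick is "onclick"
        line = self.getpos()[0]  # line where this tag starts
        if tag == "img" and "alt" not in a:
            self.critical.append(f"line {line}: img without alt")
        if tag == "div" and "onclick" in a and not ("tabindex" in a and "onkeydown" in a):
            self.critical.append(f"line {line}: clickable div without tabIndex/onKeyDown")
        tabindex = (a.get("tabindex") or "").strip("{}")  # JSX writes tabIndex={2}
        if tabindex.isdigit() and int(tabindex) != 0:
            self.warnings.append(f"line {line}: positive tabIndex")


def scan(source):
    scanner = A11yScanner()
    scanner.feed(source)
    scanner.close()
    return scanner.critical, scanner.warnings


def staged_components():
    """Added, copied, or modified .jsx/.tsx files in the git index."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [p for p in out.splitlines() if p.endswith((".jsx", ".tsx"))]


def main():
    blocked = False
    for path in staged_components():
        # ":path" reads the staged version, not the working-tree copy.
        staged = subprocess.run(["git", "show", f":{path}"],
                                capture_output=True, text=True, check=True).stdout
        critical, warnings = scan(staged)
        for msg in critical:
            print(f"CRITICAL {path} {msg}")
        for msg in warnings:
            print(f"WARNING  {path} {msg}")
        blocked = blocked or bool(critical)
    if blocked:
        print("Commit blocked. Fix critical issues or use --no-verify to bypass.")
    return 1 if blocked else 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;To use it as the hook, save it as an executable &lt;code&gt;.git/hooks/pre-commit&lt;/code&gt; with a Python shebang and end it with &lt;code&gt;sys.exit(main())&lt;/code&gt;. Git aborts the commit whenever the hook exits non-zero, and &lt;code&gt;git commit --no-verify&lt;/code&gt; skips it.&lt;/p&gt;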

&lt;p&gt;One prompt. No orchestration code. No config files written by hand.&lt;/p&gt;

&lt;p&gt;That's the thing nobody is explaining in I/O coverage: the skill file didn't just change what the agent &lt;em&gt;knows&lt;/em&gt; — it changed what the agent &lt;em&gt;does to your repository&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Gemini CLI Retirement Nobody Is Explaining Clearly
&lt;/h2&gt;

&lt;p&gt;Here's the detail buried in the Antigravity 2.0 announcement: &lt;strong&gt;Gemini CLI shuts down for consumer tiers on June 18, 2026.&lt;/strong&gt; That's not optional. Free tier, AI Pro, AI Ultra — same message for all.&lt;/p&gt;

&lt;p&gt;What you're migrating to:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Gemini CLI&lt;/th&gt;
&lt;th&gt;Antigravity CLI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Node.js runtime&lt;/td&gt;
&lt;td&gt;Go binary — zero runtime dependencies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;GEMINI.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;AGENTS.md&lt;/code&gt; / &lt;code&gt;agent.json&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;.gemini/skills/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.agents/skills/{name}/SKILL.md&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini models only&lt;/td&gt;
&lt;td&gt;Gemini 3.5 Flash + Claude + GPT-OSS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chat-first&lt;/td&gt;
&lt;td&gt;Agent orchestration-first&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open source&lt;/td&gt;
&lt;td&gt;Closed software&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
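
&lt;p&gt;If you're moving a project over before the cutoff, you can script the two file-layout rows of that table. A minimal sketch under two assumptions. First, each Gemini CLI skill is a folder under &lt;code&gt;.gemini/skills/&lt;/code&gt; containing a &lt;code&gt;SKILL.md&lt;/code&gt;. Second, a &lt;code&gt;GEMINI.md&lt;/code&gt; is a reasonable starting point for &lt;code&gt;AGENTS.md&lt;/code&gt;. The script only copies, never deletes, and defaults to a dry run so you can review the plan first.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import shutil
from pathlib import Path


def migrate_project(root, dry_run=True):
    """Plan (and optionally perform) copies into the Antigravity layout.

    Returns a list of (source, destination) pairs. Existing targets are never overwritten.
    """
    root = Path(root)
    actions = []
    gemini_md, agents_md = root / "GEMINI.md", root / "AGENTS.md"
    if gemini_md.is_file() and not agents_md.exists():
        actions.append((gemini_md, agents_md))
    old_skills = root / ".gemini" / "skills"
    if old_skills.is_dir():
        for skill in sorted(old_skills.iterdir()):
            target = root / ".agents" / "skills" / skill.name
            # Only folders that look like skills, and only if not migrated yet.
            if (skill / "SKILL.md").is_file() and not target.exists():
                actions.append((skill, target))
    if not dry_run:
        for src, dst in actions:
            dst.parent.mkdir(parents=True, exist_ok=True)
            if src.is_dir():
                shutil.copytree(src, dst)  # copies SKILL.md plus any scripts
            else:
                shutil.copy2(src, dst)
    return actions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;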

&lt;p&gt;The multi-model routing is worth pausing on. Antigravity CLI supports Claude and GPT-OSS models through the same interface — you're not locked to Gemini at the CLI layer. The Managed Agents API is Gemini 3.5 Flash specifically, but locally you have model choice.&lt;/p&gt;

&lt;p&gt;The last row is the one I keep thinking about. Gemini CLI was open source, with a large community of contributors, forks, and extensions built around it. Antigravity is closed. Google is moving developer tooling into its monetization stack and calling it an upgrade. That's accurate. It's also incomplete.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the 129 Built-In Skills Actually Signal
&lt;/h2&gt;

&lt;p&gt;When &lt;code&gt;/skills&lt;/code&gt; showed 129 built-in skills, I scrolled through them. A few that caught my eye:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;agency-agentic-search-optimizer&lt;/code&gt; — audits whether AI agents can actually accomplish tasks on your site (WebMCP readiness)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;agency-ai-data-remediation-engineer&lt;/code&gt; — self-healing data pipelines using air-gapped local SLMs&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;agency-autonomous-optimization-architect&lt;/code&gt; — shadow-tests APIs for performance while enforcing financial constraints&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;agency-codebase-onboarding-engineer&lt;/code&gt; — helps new engineers understand unfamiliar codebases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't autocomplete improvements. They're &lt;em&gt;behaviors&lt;/em&gt; — things the agent will do autonomously when invoked. The skill file is the instruction set. The agent is the executor.&lt;/p&gt;

&lt;p&gt;The accessibility reviewer I built for SoilSense AI is now skill number 130. It lives at &lt;code&gt;~/.gemini/config/skills/soilsense-accessibility-reviewer/SKILL.md&lt;/code&gt;. Because that is a global path, every future Antigravity session on this machine can invoke it, not just sessions in SoilSense AI.&lt;/p&gt;

&lt;p&gt;That's the primitive. Not a feature. A composable unit of agent behavior that can live in version control alongside your code.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where I'd Push Back
&lt;/h2&gt;

&lt;p&gt;A few things I'm not ready to be hyped about.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The closed-source problem is real.&lt;/strong&gt; Gemini CLI being open source meant the community could audit the tool that had file system access to their codebases. Antigravity is closed. The pre-commit hook it created runs code from &lt;code&gt;~/.gemini/config/skills/&lt;/code&gt;, a path whose contents Google controls at install time. For personal projects, fine. For anything enterprise, you need answers about what the agent runtime can and can't do with your code before you commit to it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;proceed&lt;/code&gt; is doing a lot of work.&lt;/strong&gt; The agent asked for approval before executing. I typed &lt;code&gt;proceed&lt;/code&gt; without reading the full implementation plan. It created files in four locations, modified git hooks, and ran &lt;code&gt;git commit&lt;/code&gt; against a real repository. The workflow assumes you'll review the plan carefully. In practice, under deadline pressure, most developers won't. That's a governance problem, not a technical one — but it's the kind of thing that causes incidents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skill scope creep is easy.&lt;/strong&gt; The accessibility reviewer skill is global — it lives in &lt;code&gt;~/.gemini/config/skills/&lt;/code&gt;, not in the SoilSense AI project directory. That means it's available in &lt;em&gt;every&lt;/em&gt; Antigravity session across &lt;em&gt;every&lt;/em&gt; project on this machine. That's convenient. It's also how you end up with 60 global skills that conflict with each other in ways that are hard to debug. Antigravity's skill priority system (Workspace → Global → Shared) handles this, but you have to know it exists.&lt;/p&gt;
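
&lt;p&gt;Here's a sketch of what that priority order means in practice. The first scope that has a given skill name wins, and copies in lower scopes are shadowed. The three directories are the ones &lt;code&gt;/skills&lt;/code&gt; printed earlier. However, the reviewer skill from this session actually landed in &lt;code&gt;~/.gemini/config/skills/&lt;/code&gt;, which isn't one of them. Treat this as an approximation of Antigravity's lookup, not its real implementation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from pathlib import Path


def skill_roots(project_root, home=None):
    """Skill directories in assumed priority order: Workspace, Global, Shared."""
    home = Path(home) if home else Path.home()
    return [
        ("workspace", Path(project_root) / ".agents" / "skills"),
        ("global", home / ".gemini" / "antigravity-cli" / "skills"),
        ("shared", home / ".gemini" / "skills"),
    ]


def resolve_skill(name, project_root, home=None):
    """Return (winning_scope, path, shadowed_scopes) for a skill name, or None."""
    hits = [
        (scope, root / name / "SKILL.md")
        for scope, root in skill_roots(project_root, home)
        if (root / name / "SKILL.md").is_file()
    ]
    if not hits:
        return None
    winner_scope, winner_path = hits[0]
    # Any later hits are the copies that the winner hides.
    return winner_scope, winner_path, [scope for scope, _ in hits[1:]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Reporting the shadowed scopes is the useful part when debugging conflicts. It tells you which copies of a skill exist but never run.&lt;/p&gt;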




&lt;h2&gt;
  
  
  Getting Started (Windows, since that's what I actually used)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Download from https://antigravity.google.com/download&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c"&gt;# Or via winget (if available in your region)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;winget&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;install&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Google.AntigravityCLI&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# Navigate to your project&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"C:\Users\you\Projects\your-project"&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# Launch&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;agy&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# Inside the shell — explore what's available&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;/skills&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c"&gt;# See 129 built-in skills + any you've created&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;/agents&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c"&gt;# See available agents (just /default to start)&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# Create your first skill with a plain English prompt&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c"&gt;# Example: "create a skill that enforces our API response schema before any PR"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start with &lt;code&gt;/skills&lt;/code&gt; before writing anything. There's a good chance what you want already exists in the 129 built-ins. The skill creator workflow (plain English → agent builds SKILL.md + supporting scripts + tests) is the fastest path to something that actually runs.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Take
&lt;/h2&gt;

&lt;p&gt;Google didn't ship a better autocomplete at I/O 2026. They shipped a runtime for agent behavior — and gave you a text file as the configuration interface.&lt;/p&gt;

&lt;p&gt;One prompt to Antigravity CLI created a WCAG accessibility reviewer, a Python static analysis engine, a git pre-commit hook, and a self-verification test — for a real Android/Capacitor project I'm actually building. The commit hook is active right now. It will block the next accessibility violation before it hits the repo.&lt;/p&gt;

&lt;p&gt;The Gemini 3.5 Flash benchmarks will be obsolete in six months. A skill file that enforces your team's standards on every commit — that compounds.&lt;/p&gt;

&lt;p&gt;The platform is impressive. The 130th skill is what makes it real.&lt;/p&gt;

&lt;p&gt;What would you build as your first custom skill — a linter rule, a PR description generator, or something specific to your stack? Especially curious if anyone has gotten workspace-scoped skills working alongside the global ones without conflicts.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>googleiochallenge</category>
      <category>ai</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Kimi WebBridge just gave AI agents hands inside your browser — and kept your data local</title>
      <dc:creator>Sreejit Pradhan</dc:creator>
      <pubDate>Tue, 19 May 2026 16:49:57 +0000</pubDate>
      <link>https://dev.to/sreejit_/kimi-webbridge-just-gave-ai-agents-hands-inside-your-browser-and-kept-your-data-local-b76</link>
      <guid>https://dev.to/sreejit_/kimi-webbridge-just-gave-ai-agents-hands-inside-your-browser-and-kept-your-data-local-b76</guid>
      <description>&lt;p&gt;Most AI browser automation tools pipe your sessions through their cloud. Kimi WebBridge doesn't. That's the entire point.&lt;/p&gt;

&lt;h2&gt;
  
  
  What dropped
&lt;/h2&gt;

&lt;p&gt;On May 15, 2026, Moonshot AI shipped &lt;a href="https://kimi.com/features/webbridge" rel="noopener noreferrer"&gt;Kimi WebBridge&lt;/a&gt; — a Chrome/Edge extension paired with a local background service that lets AI agents operate your browser the way you would. Click, scroll, type, fill forms, extract data, take screenshots. All of it.&lt;/p&gt;

&lt;p&gt;The key architectural decision: everything runs through &lt;strong&gt;Chrome DevTools Protocol&lt;/strong&gt; on your machine. Your cookies, your logged-in sessions, your bank dashboard, your internal tools — the agent can touch all of it and Moonshot never sees a byte of it. It's not a sandboxed headless browser. It's your actual Chrome window, with all your existing logins intact.&lt;/p&gt;
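
&lt;p&gt;WebBridge's own wire protocol isn't described here, but the Chrome DevTools Protocol it runs through is public. The sketch below illustrates that underlying layer and is not WebBridge's code. Start Chrome with &lt;code&gt;--remote-debugging-port=9222&lt;/code&gt;; recent versions also require a non-default &lt;code&gt;--user-data-dir&lt;/code&gt;. Chrome then lists its targets at the &lt;code&gt;/json/list&lt;/code&gt; HTTP endpoint, and each entry carries a &lt;code&gt;webSocketDebuggerUrl&lt;/code&gt; for sending commands.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import urllib.error
import urllib.request


def pages_only(targets):
    """Keep ordinary tabs; /json/list also reports service workers, iframes, etc."""
    return [(t.get("title", ""), t.get("url", "")) for t in targets if t.get("type") == "page"]


def list_open_tabs(port=9222, timeout=2.0):
    """Return (title, url) for each tab, or [] if no debuggable browser is listening."""
    try:
        with urllib.request.urlopen(f"http://127.0.0.1:{port}/json/list", timeout=timeout) as resp:
            return pages_only(json.load(resp))
    except (urllib.error.URLError, TimeoutError, ConnectionError, ValueError):
        return []
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;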

&lt;h2&gt;
  
  
  The model behind it
&lt;/h2&gt;

&lt;p&gt;WebBridge runs on the Kimi K2 family. If you haven't been tracking Moonshot AI, here's a quick picture:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Spec&lt;/th&gt;
&lt;th&gt;Kimi K2 family&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model size&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1 trillion parameters (MoE)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SWE-Bench Pro (K2.6)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;58.6% — #1 overall&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Parallel sub-agents&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Up to 300, across 4,000 steps&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;K2.6 (April 2026) sits at &lt;strong&gt;58.6%&lt;/strong&gt; on SWE-Bench Pro — ahead of GPT-5.4 at 57.7% and Claude Opus 4.6 at 53.4%. Open-source. Mixture-of-experts.&lt;/p&gt;

&lt;p&gt;And if the name Kimi only just landed on your radar, the Cursor controversy is why — Cursor's Composer 2 launched in March marketing "frontier-level proprietary intelligence," and a dev later identified it as K2.5 under the hood. Elon Musk confirmed it in a post. Awkward.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent-agnostic by design
&lt;/h2&gt;

&lt;p&gt;This is the part that actually matters for builders: WebBridge isn't locked to Kimi's own model. It officially supports &lt;strong&gt;Claude Code, Cursor, Codex, Hermes, Kimi Code CLI, and OpenClaw&lt;/strong&gt; as driving agents. You install it, paste a connection command into your agent, and it links to the local WebBridge service automatically. The extension becomes a universal browser-control layer — model-agnostic infrastructure for agentic workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get started
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1 — core install
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1.&lt;/strong&gt; Download the &lt;strong&gt;Kimi Desktop App&lt;/strong&gt; from &lt;a href="https://kimi.com/features/webbridge" rel="noopener noreferrer"&gt;kimi.com/features/webbridge&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2.&lt;/strong&gt; Install the extension from the &lt;a href="https://chromewebstore.google.com/search/kimi%20webbridge" rel="noopener noreferrer"&gt;Chrome Web Store&lt;/a&gt; or Edge Add-ons. Pin it to your toolbar so you can see the connection status at a glance.&lt;br&gt;
Windows users, run this in PowerShell:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;irm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;https://kimi-web-img.moonshot.cn/webbridge/install.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;iex&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3.&lt;/strong&gt; Open the Desktop App → find &lt;strong&gt;Kimi Claw&lt;/strong&gt; in the left sidebar → add a new Claw → select &lt;strong&gt;"On my computer."&lt;/strong&gt; Local service is now running.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Or:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Directly install the extension from the &lt;a href="https://chromewebstore.google.com/search/kimi%20webbridge" rel="noopener noreferrer"&gt;Chrome Web Store&lt;/a&gt; or Edge Add-ons.&lt;br&gt;
Windows users, run this in PowerShell:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;irm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;https://kimi-web-img.moonshot.cn/webbridge/install.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;iex&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;macOS users, run this in Terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;curl -fsSL https://kimi-web-img.moonshot.cn/webbridge/install.sh | bash
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Step 2 — connect your agent
&lt;/h3&gt;

&lt;p&gt;During install, Kimi automatically drops skill files into Claude Code, Codex, Hermes, and other supported agents. After that it's a one-liner per agent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Code / Codex / Hermes&lt;/strong&gt;&lt;br&gt;
Skill file is pre-installed. Just invoke the slash command and WebBridge connects automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/kimi-webbridge
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cursor / other agents&lt;/strong&gt;&lt;br&gt;
Copy the connection command from the official setup page and paste it into your agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://kimi-web-img.moonshot.cn/webbridge/connect.sh | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Verify the connection&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kimi-webbridge status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Should say &lt;code&gt;Connected&lt;/code&gt;. If it says &lt;code&gt;Disconnected&lt;/code&gt;, make sure the Desktop App is running first, then re-run the connection command.&lt;/p&gt;




&lt;p&gt;Once connected, prompt your agent naturally — WebBridge handles the browser side:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/kimi-webbridge Go to LinkedIn, find 2 senior AI engineers at top AI companies,
return a CSV with their name, profile URL, and current role.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;blockquote&gt;
&lt;p&gt;The space is getting crowded — Claude's computer use, OpenAI Operator, Perplexity Comet. The differentiator Kimi is betting on is simple: your data doesn't leave your machine. For enterprise use cases, internal dashboards, anything auth-gated — that's not a small thing.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Worth watching closely. The team behind the open-source MoE model that secretly powered Cursor's flagship feature is now building infrastructure. That's a statement of intent.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Links&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://kimi.com/features/webbridge" rel="noopener noreferrer"&gt;Official setup page&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://decrypt.co/367916/kimi-webbridge-ai-agents-browser-local" rel="noopener noreferrer"&gt;Decrypt deep-dive&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.analyticsvidhya.com/blog/2026/05/kimi-webbridge/" rel="noopener noreferrer"&gt;Hands-on guide — Analytics Vidhya&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>agents</category>
      <category>opensource</category>
    </item>
    <item>
      <title>“Most AI agents forget everything between sessions.
I ran Hermes continuously for 7 days to see what would happen.”</title>
      <dc:creator>Sreejit Pradhan</dc:creator>
      <pubDate>Mon, 18 May 2026 08:47:02 +0000</pubDate>
      <link>https://dev.to/sreejit_/most-ai-agents-forget-everything-between-sessions-i-ran-hermes-continuously-for-7-days-to-see-5bg8</link>
      <guid>https://dev.to/sreejit_/most-ai-agents-forget-everything-between-sessions-i-ran-hermes-continuously-for-7-days-to-see-5bg8</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/sreejit_/i-ran-hermes-agent-on-the-same-task-for-7-days-the-skill-file-on-day-7-looked-nothing-like-day-1-2oa8" class="crayons-story__hidden-navigation-link"&gt;I Ran Hermes Agent on the Same Task for 7 Days. The Skill File on Day 7 Looked Nothing Like Day 1.&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
      &lt;a href="https://dev.to/sreejit_/i-ran-hermes-agent-on-the-same-task-for-7-days-the-skill-file-on-day-7-looked-nothing-like-day-1-2oa8" class="crayons-article__context-note crayons-article__context-note__feed"&gt;&lt;p&gt;Hermes Agent Challenge Submission&lt;/p&gt;

&lt;/a&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/sreejit_" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3904430%2Fbd74576f-cef8-4620-a63e-8a001f1e9d6c.png" alt="sreejit_ profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/sreejit_" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Sreejit Pradhan
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Sreejit Pradhan
                &lt;a href="/++"&gt;&lt;img alt="Subscriber" class="subscription-icon" src="https://assets.dev.to/assets/subscription-icon-805dfa7ac7dd660f07ed8d654877270825b07a92a03841aa99a1093bd00431b2.png"&gt;&lt;/a&gt;
              
              &lt;div id="story-author-preview-content-3681236" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/sreejit_" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3904430%2Fbd74576f-cef8-4620-a63e-8a001f1e9d6c.png" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Sreejit Pradhan&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/sreejit_/i-ran-hermes-agent-on-the-same-task-for-7-days-the-skill-file-on-day-7-looked-nothing-like-day-1-2oa8" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;May 16&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/sreejit_/i-ran-hermes-agent-on-the-same-task-for-7-days-the-skill-file-on-day-7-looked-nothing-like-day-1-2oa8" id="article-link-3681236"&gt;
          I Ran Hermes Agent on the Same Task for 7 Days. The Skill File on Day 7 Looked Nothing Like Day 1.
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/hermesagentchallenge"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;hermesagentchallenge&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/devchallenge"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;devchallenge&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/agents"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;agents&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/sreejit_/i-ran-hermes-agent-on-the-same-task-for-7-days-the-skill-file-on-day-7-looked-nothing-like-day-1-2oa8" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/fire-f60e7a582391810302117f987b22a8ef04a2fe0df7e3258a5f49332df1cec71e.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/multi-unicorn-b44d6f8c23cdd00964192bedc38af3e82463978aa611b4365bd33a0f1f4f3e97.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;25&lt;span class="hidden s:inline"&gt; reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/sreejit_/i-ran-hermes-agent-on-the-same-task-for-7-days-the-skill-file-on-day-7-looked-nothing-like-day-1-2oa8#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              19&lt;span class="hidden s:inline"&gt; comments&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            12 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
      <category>agents</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>showdev</category>
    </item>
    <item>
      <title>“What good is AI if it stops working the moment the internet dies?
Built an offline Gemma 4 farm doctor for real-world rural use.”</title>
      <dc:creator>Sreejit Pradhan</dc:creator>
      <pubDate>Mon, 18 May 2026 08:45:56 +0000</pubDate>
      <link>https://dev.to/sreejit_/what-good-is-ai-if-it-stops-working-the-moment-the-internet-dies-built-an-offline-gemma-4-farm-48a1</link>
      <guid>https://dev.to/sreejit_/what-good-is-ai-if-it-stops-working-the-moment-the-internet-dies-built-an-offline-gemma-4-farm-48a1</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/sreejit_/everyones-talking-about-bigger-ai-models-i-built-a-gemma-4-farm-doctor-that-works-when-the-3j1b" class="crayons-story__hidden-navigation-link"&gt;Everyone's Talking About Bigger AI Models. I Built a Gemma 4 Farm Doctor That Works When the Internet Doesn't.&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
      &lt;a href="https://dev.to/sreejit_/everyones-talking-about-bigger-ai-models-i-built-a-gemma-4-farm-doctor-that-works-when-the-3j1b" class="crayons-article__context-note crayons-article__context-note__feed"&gt;&lt;p&gt;Gemma 4 Challenge: Build With Gemma 4 Submission&lt;/p&gt;

&lt;/a&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/sreejit_" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3904430%2Fbd74576f-cef8-4620-a63e-8a001f1e9d6c.png" alt="sreejit_ profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/sreejit_" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Sreejit Pradhan
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Sreejit Pradhan
                &lt;a href="/++"&gt;&lt;img alt="Subscriber" class="subscription-icon" src="https://assets.dev.to/assets/subscription-icon-805dfa7ac7dd660f07ed8d654877270825b07a92a03841aa99a1093bd00431b2.png"&gt;&lt;/a&gt;
              
              &lt;div id="story-author-preview-content-3692022" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/sreejit_" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3904430%2Fbd74576f-cef8-4620-a63e-8a001f1e9d6c.png" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Sreejit Pradhan&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/sreejit_/everyones-talking-about-bigger-ai-models-i-built-a-gemma-4-farm-doctor-that-works-when-the-3j1b" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;May 18&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/sreejit_/everyones-talking-about-bigger-ai-models-i-built-a-gemma-4-farm-doctor-that-works-when-the-3j1b" id="article-link-3692022"&gt;
          Everyone's Talking About Bigger AI Models. I Built a Gemma 4 Farm Doctor That Works When the Internet Doesn't.
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/devchallenge"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;devchallenge&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/gemmachallenge"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;gemmachallenge&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/gemma"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;gemma&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/git"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;git&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/sreejit_/everyones-talking-about-bigger-ai-models-i-built-a-gemma-4-farm-doctor-that-works-when-the-3j1b" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/multi-unicorn-b44d6f8c23cdd00964192bedc38af3e82463978aa611b4365bd33a0f1f4f3e97.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/exploding-head-daceb38d627e6ae9b730f36a1e390fca556a4289d5a41abb2c35068ad3e2c4b5.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;16&lt;span class="hidden s:inline"&gt; reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/sreejit_/everyones-talking-about-bigger-ai-models-i-built-a-gemma-4-farm-doctor-that-works-when-the-3j1b#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              2&lt;span class="hidden s:inline"&gt; comments&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            6 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Everyone's Talking About Bigger AI Models. I Built a Gemma 4 Farm Doctor That Works When the Internet Doesn't.</title>
      <dc:creator>Sreejit Pradhan</dc:creator>
      <pubDate>Mon, 18 May 2026 08:40:55 +0000</pubDate>
      <link>https://dev.to/sreejit_/everyones-talking-about-bigger-ai-models-i-built-a-gemma-4-farm-doctor-that-works-when-the-3j1b</link>
      <guid>https://dev.to/sreejit_/everyones-talking-about-bigger-ai-models-i-built-a-gemma-4-farm-doctor-that-works-when-the-3j1b</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Build with Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Google's AI story is usually told from the top of the stack.&lt;/p&gt;

&lt;p&gt;Bigger models. Better reasoning. More multimodal demos. More cloud endpoints.&lt;/p&gt;

&lt;p&gt;That is useful. But there is a different question that kept nagging at me:&lt;/p&gt;

&lt;p&gt;What happens when the person who needs the AI is not sitting in a perfect cloud environment? What happens when they are on a farm, the internet is weak, the data is local, and the decision is not abstract at all?&lt;/p&gt;

&lt;p&gt;That is why I built &lt;strong&gt;SoilSense AI&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;SoilSense AI is an offline-first farm intelligence app powered by Gemma 4. It runs on a PC, phone, tablet, or Raspberry Pi hub. It connects farm profiles, live soil sensor readings, plant analysis, chat history, and local memory into one app — and it works without a cloud account.&lt;/p&gt;

&lt;p&gt;The core problem: most AI agriculture demos skip the boring part. They show a nice chat box, the farmer asks a question, and the model gives a confident answer. But real farms are not a single prompt.&lt;/p&gt;

&lt;p&gt;A farmer may have one field for vegetables, another for flowers, another for fruit. Each has different crops, soil conditions, irrigation patterns, and disease risks. If those readings get mixed together, the AI advice becomes worse than useless. It becomes confidently wrong.&lt;/p&gt;

&lt;p&gt;There is also the connectivity problem. A system that only works when the cloud is reachable is not good enough for many rural environments.&lt;/p&gt;

&lt;p&gt;So SoilSense is built on a different assumption: &lt;strong&gt;the farm is the source of truth. Gemma 4 is the reasoning layer.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the app includes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-farm profiles — fruit farms, flower farms, vegetable farms, greenhouses, and field zones&lt;/li&gt;
&lt;li&gt;Farm-scoped sensor feeds, so each farm only sees its own pH, moisture, temperature, and NPK readings&lt;/li&gt;
&lt;li&gt;A local bridge that receives sensor packets over HTTP, stores readings locally, and streams live updates over WebSocket&lt;/li&gt;
&lt;li&gt;Gemma 4 analysis that receives the active farm profile, current sensor snapshot, sensor freshness, node identity, prior verdicts, chat memory, and an optional plant image&lt;/li&gt;
&lt;li&gt;Persistent local memory — analyses and chats are stored per farm, not lost after one session&lt;/li&gt;
&lt;li&gt;Three deployment paths: API mode (internet), phone-local Gemma (supported Android devices), or a PC/Pi hub over LAN&lt;/li&gt;
&lt;li&gt;Judge Mode for reproducible demos when physical hardware is not available&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The important part is not just that Gemma answers farm questions. It is that Gemma answers with the &lt;strong&gt;farm's context already in front of it&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;Demo video: &lt;a href="https://youtu.be/EJ6397EinNw?si=a9Ei3Fxmsh_UHHjV" rel="noopener noreferrer"&gt;https://youtu.be/EJ6397EinNw?si=a9Ei3Fxmsh_UHHjV&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/ogMaverick12/soilsense-ai/tree/beta" rel="noopener noreferrer"&gt;https://github.com/ogMaverick12/soilsense-ai/tree/beta&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The demo flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;First launch and farm profile setup&lt;/li&gt;
&lt;li&gt;Gemma setup — API, phone-local, or PC/Pi hub&lt;/li&gt;
&lt;li&gt;Sensor pairing via QR payload&lt;/li&gt;
&lt;li&gt;Judge Mode replay — 9 sensor packets across 3 farms&lt;/li&gt;
&lt;li&gt;Switching between fruit, flower, and vegetable farms while watching sensor readings change&lt;/li&gt;
&lt;li&gt;Running Gemma 4 analysis using live farm context&lt;/li&gt;
&lt;li&gt;Viewing saved history and local memory&lt;/li&gt;
&lt;li&gt;About, Terms &amp;amp; Conditions, and Gemma 4 Hackathon links in the app footer&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;One honest note: I do not currently have physical sensor hardware connected for the submission demo. The demo uses &lt;strong&gt;Judge Mode&lt;/strong&gt; — it replays sensor packets through the same local bridge endpoints that real ESP32 and Raspberry Pi nodes use. It proves the ingestion and routing pipeline, but it is replayed data, not live hardware.&lt;/p&gt;
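&lt;p&gt;The replay idea is simple enough to sketch. This is a minimal, synchronous illustration under stated assumptions: the packet shape, the &lt;code&gt;accept&lt;/code&gt; callback (a stand-in for the bridge's HTTP ingestion), and every farm id except &lt;code&gt;judge-fruit-orchard&lt;/code&gt; are hypothetical. A real replay would send each packet to the bridge endpoint and await the response instead of calling a local function.&lt;/p&gt;

```typescript
// Hypothetical shape of one replayed sensor packet.
interface ReplayPacket {
  farmId: string;
  sensorNodeId: string;
  ph: number;
  moisture: number;
  temperatureC: number;
}

// Stand-in for the bridge's ingestion step; returns true if the packet was accepted.
type AcceptFn = (packet: ReplayPacket) => boolean;

// Replays packets in their recorded order and counts accepted packets per farm.
function replayPackets(packets: ReplayPacket[], accept: AcceptFn): { [farmId: string]: number } {
  // No prototype, so farm ids such as "constructor" cannot collide with built-ins.
  const acceptedByFarm: { [farmId: string]: number } = Object.create(null);
  for (const packet of packets) {
    if (accept(packet)) {
      acceptedByFarm[packet.farmId] = (acceptedByFarm[packet.farmId] || 0) + 1;
    }
  }
  return acceptedByFarm;
}

// Usage: 3 packets for each of 3 hypothetical farms, through a stub that
// rejects packets missing a farm id or a node id.
const farms = ["judge-fruit-orchard", "judge-flower-farm", "judge-vegetable-plot"];
const packets: ReplayPacket[] = [];
for (const farmId of farms) {
  for (const step of [1, 2, 3]) {
    packets.push({ farmId, sensorNodeId: `${farmId}-node`, ph: 6 + step / 10, moisture: 40, temperatureC: 24 });
  }
}
const counts = replayPackets(packets, (p) => !(p.farmId === "" || p.sensorNodeId === ""));
```

&lt;p&gt;Replaying sequentially keeps each node's packets in their recorded order, which matters if the bridge treats the newest packet as the current snapshot.&lt;/p&gt;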




&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Repository:&lt;/strong&gt; &lt;a href="https://github.com/ogMaverick12/soilsense-ai/tree/beta" rel="noopener noreferrer"&gt;https://github.com/ogMaverick12/soilsense-ai/tree/beta&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Tech stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;React + Vite for the UI&lt;/li&gt;
&lt;li&gt;Express local bridge (&lt;code&gt;bridge/server.mjs&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;WebSocket for live sensor streaming&lt;/li&gt;
&lt;li&gt;Farm-scoped HTTP sensor API with &lt;code&gt;farmId&lt;/code&gt; and &lt;code&gt;sensorNodeId&lt;/code&gt; on every reading&lt;/li&gt;
&lt;li&gt;QR pairing payloads for hardware sensor nodes&lt;/li&gt;
&lt;li&gt;Electron for Windows desktop packaging&lt;/li&gt;
&lt;li&gt;Capacitor for Android wrapping&lt;/li&gt;
&lt;li&gt;Ollama/OpenAI-compatible local bridge for Gemma-family models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The project structure keeps concerns clean: &lt;code&gt;src/&lt;/code&gt; has the React app and app logic, &lt;code&gt;bridge/&lt;/code&gt; has the local hub, &lt;code&gt;desktop/&lt;/code&gt; has the Electron shell, &lt;code&gt;android/&lt;/code&gt; has the Capacitor project, and &lt;code&gt;scripts/&lt;/code&gt; has Pi and Windows launchers. Docs cover setup, mobile deployment, Raspberry Pi use, sensor integration, and packaging.&lt;/p&gt;
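&lt;p&gt;The &lt;code&gt;farmId&lt;/code&gt; and &lt;code&gt;sensorNodeId&lt;/code&gt; tags on every reading are what keep one field's data out of another field's analysis. A minimal sketch of farm-scoped storage, assuming an in-memory store; the &lt;code&gt;SensorReading&lt;/code&gt; fields and the &lt;code&gt;FarmReadingStore&lt;/code&gt; class are hypothetical and are not the code in &lt;code&gt;bridge/server.mjs&lt;/code&gt;:&lt;/p&gt;

```typescript
// Hypothetical reading shape: every packet names its farm and its sensor node.
interface SensorReading {
  farmId: string;
  sensorNodeId: string;
  ph: number;
  moisture: number;     // percent (assumed unit)
  temperatureC: number;
  n: number;
  p: number;
  k: number;
  receivedAt: number;   // epoch milliseconds when the bridge accepted the packet
}

// In-memory store partitioned by farmId, so a query for one farm
// can never return another farm's readings.
class FarmReadingStore {
  // No prototype, so ids such as "constructor" are treated as ordinary keys.
  private readonly byFarm: { [farmId: string]: SensorReading[] } = Object.create(null);

  add(reading: SensorReading): void {
    if (reading.farmId === "" || reading.sensorNodeId === "") {
      throw new Error("reading must carry farmId and sensorNodeId");
    }
    const list = this.byFarm[reading.farmId] || [];
    list.push(reading);
    this.byFarm[reading.farmId] = list;
  }

  // Copy of one farm's readings, oldest first.
  forFarm(farmId: string): SensorReading[] {
    return (this.byFarm[farmId] || []).slice();
  }

  // Newest reading per sensor node for one farm: the "current snapshot".
  latestSnapshot(farmId: string): { [sensorNodeId: string]: SensorReading } {
    const latest: { [sensorNodeId: string]: SensorReading } = Object.create(null);
    for (const r of this.forFarm(farmId)) {
      const prev = latest[r.sensorNodeId];
      if (prev === undefined || r.receivedAt >= prev.receivedAt) {
        latest[r.sensorNodeId] = r;
      }
    }
    return latest;
  }
}

// Usage with the orchard id from the Judge Mode run.
const store = new FarmReadingStore();
store.add({ farmId: "judge-fruit-orchard", sensorNodeId: "node-1", ph: 6.4, moisture: 38, temperatureC: 23, n: 12, p: 6, k: 9, receivedAt: Date.now() });
```

&lt;p&gt;Partitioning at write time, rather than filtering at read time, means a reading without a &lt;code&gt;farmId&lt;/code&gt; fails loudly on ingestion instead of silently mixing into another farm's view later.&lt;/p&gt;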




&lt;h2&gt;
  
  
  How I Used Gemma 4
&lt;/h2&gt;

&lt;p&gt;SoilSense is intentionally &lt;strong&gt;Gemma-only&lt;/strong&gt;. The UI and local bridge both reject non-Gemma model tags. That matters because I did not want the submission to quietly become a generic model-switcher wrapper. The point is to show Gemma 4 as the reasoning engine for a local-first farm workflow, not as one option among many.&lt;/p&gt;
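&lt;p&gt;A guard like that can be very small. A sketch, assuming model tags follow the two forms used in this post (Ollama-style &lt;code&gt;gemma4:e4b&lt;/code&gt; and provider-prefixed &lt;code&gt;google/gemma-4-31b-it&lt;/code&gt;); the function names and the prefix rule are hypothetical, not the app's actual check:&lt;/p&gt;

```typescript
// Hypothetical Gemma-only guard. Accepts Ollama-style tags ("gemma4:e4b")
// and provider-prefixed API ids ("google/gemma-4-31b-it"); rejects the rest.
function isGemmaModelTag(tag: string): boolean {
  const normalized = tag.trim().toLowerCase();
  // Keep only the part after the last "/", dropping a provider prefix if present.
  const name = normalized.split("/").pop() ?? "";
  return name.startsWith("gemma");
}

// Throws so callers cannot silently fall back to a non-Gemma model.
function assertGemmaModel(tag: string): string {
  if (!isGemmaModelTag(tag)) {
    throw new Error(`Rejected non-Gemma model tag: ${tag}`);
  }
  return tag;
}
```

&lt;p&gt;This only checks the name prefix; a stricter version could pin an allow-list of exact tags. Running the same check in both the UI and the bridge means a direct request to the bridge cannot bypass it.&lt;/p&gt;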

&lt;p&gt;&lt;strong&gt;Model strategy is hardware-aware:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;gemma4:e2b&lt;/code&gt; — preferred for Raspberry Pi and weak edge devices&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;gemma4:e4b&lt;/code&gt; — the balanced local desktop path (detected and tested on my machine through Ollama)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;google/gemma-4-31b-it&lt;/code&gt; — the API path for stronger disease reasoning when internet is available&lt;/li&gt;
&lt;li&gt;Phone-local Android mode — available on devices that support the Google on-device GenAI runtime; unsupported phones are guided toward PC/Pi hub mode&lt;/li&gt;
&lt;/ul&gt;
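&lt;p&gt;That strategy reduces to a small decision function. A sketch under stated assumptions: the &lt;code&gt;HardwareProfile&lt;/code&gt; fields, the 12 GB RAM cut-off for E4B, and the rule that the API path is used only when the device is online and the user opts in are all illustrative guesses, not SoilSense's actual logic:&lt;/p&gt;

```typescript
// Possible outcomes of model selection (hypothetical shape).
type GemmaTarget =
  | { mode: "api"; model: "google/gemma-4-31b-it" }
  | { mode: "local"; model: "gemma4:e2b" | "gemma4:e4b" }
  | { mode: "phone-local" }
  | { mode: "hub"; reason: string };

// Hypothetical description of the device the app is running on.
interface HardwareProfile {
  kind: "desktop" | "raspberry-pi" | "android";
  ramGb: number;
  online: boolean;
  prefersApi: boolean;              // user opted into the internet path
  onDeviceGenAiSupported?: boolean; // only meaningful on Android
}

// Assumed cut-off: machines with less RAM than this get the smaller E2B model.
const E4B_MIN_RAM_GB = 12;

function pickGemmaTarget(hw: HardwareProfile): GemmaTarget {
  // Stronger API model only when the internet is reachable and the user wants it.
  if (hw.online) {
    if (hw.prefersApi) return { mode: "api", model: "google/gemma-4-31b-it" };
  }
  if (hw.kind === "android") {
    // Unsupported phones are steered toward the PC/Pi hub over LAN.
    return hw.onDeviceGenAiSupported
      ? { mode: "phone-local" }
      : { mode: "hub", reason: "on-device GenAI runtime unavailable; connect to a PC/Pi hub" };
  }
  if (hw.kind === "raspberry-pi" || E4B_MIN_RAM_GB > hw.ramGb) {
    return { mode: "local", model: "gemma4:e2b" };
  }
  return { mode: "local", model: "gemma4:e4b" };
}
```

&lt;p&gt;With these assumed thresholds, a 15.8 GB Windows machine running offline lands on &lt;code&gt;gemma4:e4b&lt;/code&gt;, matching the local path described in this post.&lt;/p&gt;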

&lt;p&gt;&lt;strong&gt;For analysis, Gemma receives structured context:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Active farm profile and crop selection&lt;/li&gt;
&lt;li&gt;Current sensor snapshot (pH, moisture, temperature, N/P/K)&lt;/li&gt;
&lt;li&gt;Sensor freshness and latency&lt;/li&gt;
&lt;li&gt;Sensor node identity&lt;/li&gt;
&lt;li&gt;Previous analysis history&lt;/li&gt;
&lt;li&gt;Recent chat memory&lt;/li&gt;
&lt;li&gt;Optional plant image&lt;/li&gt;
&lt;/ul&gt;
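&lt;p&gt;Assembling that context is mostly plumbing. A sketch, assuming hypothetical field names, a five-minute staleness cut-off, and a ten-message chat window, that turns those pieces into one object ready for &lt;code&gt;JSON.stringify&lt;/code&gt; and inclusion in the prompt:&lt;/p&gt;

```typescript
// Hypothetical snapshot shape (latest reading for the active farm).
interface SensorSnapshot {
  sensorNodeId: string;
  ph: number;
  moisture: number;
  temperatureC: number;
  n: number;
  p: number;
  k: number;
  receivedAt: number; // epoch milliseconds
}

// Hypothetical inputs the app has on hand when an analysis starts.
interface FarmContextInput {
  farmName: string;
  crops: string[];
  snapshot: SensorSnapshot | null;
  previousVerdicts: string[];
  chatMemory: string[];
  imageBase64?: string;
}

const STALE_AFTER_MS = 5 * 60 * 1000; // assumed: older than 5 minutes counts as stale
const MAX_CHAT_TURNS = 10;            // assumed: only recent chat memory is sent

function buildFarmContext(input: FarmContextInput, nowMs: number) {
  const ageMs = input.snapshot ? nowMs - input.snapshot.receivedAt : null;
  return {
    farm: { name: input.farmName, crops: input.crops },
    sensors: input.snapshot,
    // Freshness lets the model discount readings that may no longer hold.
    sensorFreshness: {
      ageSeconds: ageMs === null ? null : Math.round(ageMs / 1000),
      stale: ageMs === null || ageMs > STALE_AFTER_MS,
    },
    sensorNodeId: input.snapshot?.sensorNodeId ?? null,
    previousVerdicts: input.previousVerdicts,
    chatMemory: input.chatMemory.slice(-MAX_CHAT_TURNS),
    // The image itself would travel as a separate multimodal input; only a flag goes here.
    hasImage: input.imageBase64 !== undefined,
  };
}
```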

&lt;p&gt;&lt;strong&gt;The output is also structured.&lt;/strong&gt; SoilSense prompts Gemma for a JSON response covering health status, primary issue, urgency, confidence, disease evidence, soil interpretation, sensor correlation, memory used, spoken alert, and next actions. That lets the UI show a practical verdict instead of dumping raw prose into a chat window.&lt;/p&gt;
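&lt;p&gt;Because the verdict comes back as JSON, the UI can validate it before rendering instead of trusting free text. A minimal validator sketch; the camelCase field names, the three urgency levels, and the 0 to 1 confidence scale are assumptions mapped from the field list above, not SoilSense's actual schema:&lt;/p&gt;

```typescript
type Urgency = "low" | "medium" | "high";

// Hypothetical verdict schema mirroring the fields listed in the post.
interface FarmVerdict {
  healthStatus: string;
  primaryIssue: string;
  urgency: Urgency;
  confidence: number;        // assumed scale: 0 to 1
  diseaseEvidence: string[];
  soilInterpretation: string;
  sensorCorrelation: string;
  memoryUsed: string[];
  spokenAlert: string;
  nextActions: string[];
}

const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) ? v.every((x) => typeof x === "string") : false;

// Returns null (instead of throwing) when the reply is not valid JSON
// or any field is missing or has the wrong type.
function parseVerdict(raw: string): FarmVerdict | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const o = data as { [key: string]: unknown };
  const textFields = ["healthStatus", "primaryIssue", "soilInterpretation", "sensorCorrelation", "spokenAlert"];
  if (!textFields.every((key) => typeof o[key] === "string")) return null;
  if (!(o.urgency === "low" || o.urgency === "medium" || o.urgency === "high")) return null;
  const confidence = o.confidence;
  if (typeof confidence !== "number" || 0 > confidence || confidence > 1) return null;
  for (const key of ["diseaseEvidence", "memoryUsed", "nextActions"]) {
    if (!isStringArray(o[key])) return null;
  }
  return o as unknown as FarmVerdict;
}
```

&lt;p&gt;Returning &lt;code&gt;null&lt;/code&gt; rather than throwing lets the UI fall back to showing the raw reply when the model drifts off-schema.&lt;/p&gt;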

&lt;p&gt;Gemma also powers a free-form farm chat mode. Even there, the selected farm's local memory is passed as context — so chat answers are grounded in that farm's history, not a blank slate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Local evidence from my test machine&lt;/strong&gt; (Windows x64, 8 CPU threads, 15.8 GB RAM, &lt;code&gt;gemma4:e4b&lt;/code&gt; via Ollama):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;th&gt;Honesty note&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;TypeScript check&lt;/td&gt;
&lt;td&gt;Passed (&lt;code&gt;npm run typecheck&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Static check only, not a full end-to-end test&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production build&lt;/td&gt;
&lt;td&gt;Passed (&lt;code&gt;npm run build&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;dist/&lt;/code&gt; is generated output, not committed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Windows installer&lt;/td&gt;
&lt;td&gt;Built (&lt;code&gt;SoilSense AI Setup 0.1.0.exe&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Attached to GitHub Releases, not the repo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Android debug APK&lt;/td&gt;
&lt;td&gt;Built via Capacitor + Gradle&lt;/td&gt;
&lt;td&gt;Debug build unless signed release is created&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Setup health endpoint&lt;/td&gt;
&lt;td&gt;Passed: app, LAN, sensor, node, and model checks&lt;/td&gt;
&lt;td&gt;Depends on local machine and network&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Judge Mode replay&lt;/td&gt;
&lt;td&gt;Accepted 9 packets across 3 farms&lt;/td&gt;
&lt;td&gt;Replayed packets, not physical hardware&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Farm-scoped sensor route&lt;/td&gt;
&lt;td&gt;Returned orchard readings for &lt;code&gt;judge-fruit-orchard&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Proves routing by &lt;code&gt;farmId&lt;/code&gt; through the bridge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local Gemma detection&lt;/td&gt;
&lt;td&gt;Detected &lt;code&gt;gemma4:e4b&lt;/code&gt; through Ollama&lt;/td&gt;
&lt;td&gt;Recommendation changes by hardware and installed models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local Gemma smoke test&lt;/td&gt;
&lt;td&gt;Returned a response in ~70 seconds&lt;/td&gt;
&lt;td&gt;Latency depends heavily on hardware and model warm state&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That last number is not a universal benchmark. Local inference speed depends on hardware, model size, and warm state. But it proves the local path is wired end to end on my machine.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where I Would Push Back On My Own Project
&lt;/h2&gt;

&lt;p&gt;SoilSense does not prove field-grade sensor calibration. Low-cost pH and NPK sensors can be noisy. A real deployment needs calibration records, placement guidance, and local agronomy validation.&lt;/p&gt;

&lt;p&gt;It does not prove every Android phone can run local Gemma. Phone-local support depends on the Android on-device GenAI runtime and device capability — which is exactly why the PC/Pi hub mode exists as the fallback.&lt;/p&gt;

&lt;p&gt;It is also not a replacement for local experts, lab tests, pesticide labels, or official agricultural guidance. SoilSense is decision support. It is not a magic agronomist in a box.&lt;/p&gt;

&lt;p&gt;I think that honesty makes the project stronger. The architecture is built for the real constraint: use local data, keep the farmer in control, and let Gemma reason over context instead of isolated prompts.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Take
&lt;/h2&gt;

&lt;p&gt;The most exciting AI apps are not always the ones with the largest model call.&lt;/p&gt;

&lt;p&gt;Sometimes the real product is the boring glue around the model — local data, memory, sensor routing, setup flows, privacy boundaries, and a UI that someone can actually use on a farm with weak internet.&lt;/p&gt;

&lt;p&gt;That is what SoilSense AI is trying to be.&lt;/p&gt;

&lt;p&gt;Not just "Gemma, answer a farm question."&lt;/p&gt;

&lt;p&gt;Gemma 4 with the farm profile, the sensor feed, the history, the crop, and the constraints in front of it.&lt;/p&gt;

&lt;p&gt;That is the difference between a chatbot and a tool. And for farmers working with local hardware and unreliable connectivity, that difference matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/ogMaverick12/soilsense-ai" rel="noopener noreferrer"&gt;https://github.com/ogMaverick12/soilsense-ai&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
      <category>git</category>
    </item>
    <item>
      <title>I Tested Gemma 4 E4B vs 31B on 50 Real Student Career Queries — The Results Surprised Me</title>
      <dc:creator>Sreejit Pradhan</dc:creator>
      <pubDate>Sun, 17 May 2026 05:33:24 +0000</pubDate>
      <link>https://dev.to/sreejit_/i-tested-gemma-4-e4b-vs-31b-on-50-real-student-career-queries-the-results-surprised-me-kbi</link>
      <guid>https://dev.to/sreejit_/i-tested-gemma-4-e4b-vs-31b-on-50-real-student-career-queries-the-results-surprised-me-kbi</guid>
      <description>&lt;p&gt;I'm building &lt;strong&gt;PathForge AI&lt;/strong&gt; — a career guidance platform for Indian students. The pitch is simple: AI-powered counselling for students who can't afford a human counsellor. The engineering problem underneath is not simple at all.&lt;/p&gt;

&lt;p&gt;When Gemma 4 dropped in April 2026, I had a decision to make. The family ships four models. I had two obvious candidates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;E4B&lt;/strong&gt; (~4.5B effective params): runs locally on a mid-range phone, free, completely private&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;31B Dense&lt;/strong&gt;: server-side via API, costs real money per query, much slower&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The conventional wisdom was clear: small model for quick tasks, big model for complex reasoning, route intelligently. Done.&lt;/p&gt;

&lt;p&gt;Except I didn't trust the conventional wisdom. So I ran 50 real queries through both models — actual queries from PathForge AI's private beta — and measured everything: output quality, schema compliance, latency, and cost per query.&lt;/p&gt;

&lt;p&gt;The results were not what I expected.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What I was testing:&lt;/strong&gt; Career guidance queries from real Indian students (anonymised). Not clean test prompts. Messy, code-switched, emotionally loaded, often under-specified — exactly the way real users type.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query categories (50 total):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Simple eligibility check&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;"Can I apply for NSP if family income is 2.8L?"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Single-path career question&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;"PCB student, 78%, interested in AI field, what options?"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-constraint planning&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;"JEE rank 52000, budget 4L/year, prefer Karnataka, open to abroad if full scholarship"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ambiguous / emotional&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;"parents want CA but I want game dev, marks average, what should I do honestly"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Scoring rubric (blind, three evaluators, averaged):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Max&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Constraint compliance — did it actually honour all stated constraints?&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schema fidelity — valid parseable JSON matching our output spec?&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Practical accuracy — is the career/institution advice actually correct?&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tone — does it read like a counsellor, not a Wikipedia article?&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;10&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;E4B: Q4_K_M quantised GGUF via llama.cpp on a laptop (16GB RAM, no dedicated GPU), standing in for a typical developer machine serving requests.&lt;/li&gt;
&lt;li&gt;31B Dense: Gemma 4 31B endpoint via Gemini API. Server-side, billed per token.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;Headline results first. Details follow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Overall Score (out of 10)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;E4B&lt;/th&gt;
&lt;th&gt;31B Dense&lt;/th&gt;
&lt;th&gt;Winner&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Simple eligibility&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;8.7&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8.4&lt;/td&gt;
&lt;td&gt;E4B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Single-path career&lt;/td&gt;
&lt;td&gt;7.2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;8.9&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;31B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-constraint planning&lt;/td&gt;
&lt;td&gt;5.1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9.1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;31B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ambiguous / emotional&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;8.1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;7.6&lt;/td&gt;
&lt;td&gt;E4B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Overall average (mean of the four category scores)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;7.3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;8.5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;31B&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
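&lt;p&gt;The overall row matches the plain mean of the four category scores, not a mean weighted by how many queries each category contributed (15, 15, 12 and 8). A quick check, using only the scores and counts from the two tables above; the function names are just for illustration:&lt;/p&gt;

```python
# Query counts and category scores copied from the tables above.
COUNTS = {"eligibility": 15, "single_path": 15, "multi_constraint": 12, "emotional": 8}
SCORES = {
    "E4B": {"eligibility": 8.7, "single_path": 7.2, "multi_constraint": 5.1, "emotional": 8.1},
    "31B": {"eligibility": 8.4, "single_path": 8.9, "multi_constraint": 9.1, "emotional": 7.6},
}


def unweighted_mean(scores: dict) -> float:
    """Every category counts equally, regardless of how many queries it had."""
    return sum(scores.values()) / len(scores)


def query_weighted_mean(scores: dict, counts: dict) -> float:
    """Every query counts equally, so larger categories pull harder."""
    total_queries = sum(counts.values())
    return sum(scores[c] * counts[c] for c in counts) / total_queries


for model, scores in SCORES.items():
    print(f"{model}: unweighted {unweighted_mean(scores):.1f}, "
          f"query-weighted {query_weighted_mean(scores, COUNTS):.1f}")
```

&lt;p&gt;The unweighted means reproduce the table's 7.3 and 8.5. Weighting by query count gives 7.29 and 8.59, so 31B's overall figure would round to 8.6 instead. The gap is small and doesn't change the ranking.&lt;/p&gt;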

&lt;h3&gt;
  
  
  Latency
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Median&lt;/th&gt;
&lt;th&gt;P95&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;E4B (local, no GPU)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.1s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;6.8s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;31B Dense (API)&lt;/td&gt;
&lt;td&gt;9.4s&lt;/td&gt;
&lt;td&gt;17.2s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
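&lt;p&gt;The table reports median and P95 but not how P95 was computed. A minimal sketch, assuming per-query wall-clock latencies in seconds and the nearest-rank definition of a percentile (interpolating definitions give slightly different numbers); &lt;code&gt;latency_summary&lt;/code&gt; is a hypothetical helper:&lt;/p&gt;

```python
import statistics


def latency_summary(latencies_s: list[float]) -> tuple[float, float]:
    """Return (median, P95) in seconds, using the nearest-rank method for P95."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_s)
    median = statistics.median(ordered)
    # Nearest rank: 1-based rank = ceil(0.95 * n), computed with integers to stay exact.
    n = len(ordered)
    rank = (95 * n + 99) // 100
    p95 = ordered[rank - 1]
    return median, p95
```

&lt;p&gt;With 50 queries per model, nearest-rank P95 is the 48th of 50 sorted samples, i.e. the third-slowest query, so a couple of outliers can move it noticeably.&lt;/p&gt;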

&lt;h3&gt;
  
  
  Cost per Query
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;E4B local&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;₹0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;31B via Gemini API&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;~₹0.13&lt;/strong&gt; (~$0.0015 USD)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;At a projected 50,000 queries/month, running 31B for everything costs 50,000 × ₹0.13 = &lt;strong&gt;₹6,500/month&lt;/strong&gt;. Running E4B for everything costs essentially ₹0 in API fees.&lt;/p&gt;

&lt;p&gt;Sounds like an obvious choice. Here's why it isn't.&lt;/p&gt;




&lt;h2&gt;
  
  
  Category 1: Simple Eligibility — E4B Wins
&lt;/h2&gt;

&lt;p&gt;Expected result. Give both models a bounded factual question and the smaller one handles it fine. What I didn't expect was &lt;em&gt;how&lt;/em&gt; E4B won.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"my family income is 2.8 lakhs, i'm in 11th, can i get NSP scholarship? SC category"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;E4B output (9/10):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"eligible"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"scheme"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"NSP Post-Matric Scholarship"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"income_cutoff_met"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"category_benefit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SC category qualifies for higher scholarship amount"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"next_step"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Register on scholarships.gov.in after Class 12 results"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"caution"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Student is in Class 11 — Post-Matric applies from Class 12 onwards. Apply in first month of Class 12 admission."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"high"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That &lt;code&gt;caution&lt;/code&gt; flag — catching that the student is currently in Class 11 so the application timing is wrong — wasn't prompted for. E4B inferred it from the grade level stated in the query. Proactive, correct, and actually useful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;31B output (8/10):&lt;/strong&gt; Correct, but added an unrequested &lt;code&gt;"general_advice"&lt;/code&gt; field outside our schema, requiring stripping in post-processing. Small thing. At 50,000 queries/month it's not small.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Takeaway:&lt;/strong&gt; E4B is faster, free, and &lt;em&gt;more schema-compliant&lt;/em&gt; on simple queries. Route these locally.&lt;/p&gt;
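&lt;p&gt;Stripping an out-of-schema field like &lt;code&gt;general_advice&lt;/code&gt; is a few lines of post-processing. A minimal sketch, assuming the eligibility schema is exactly the seven keys in the E4B output above; &lt;code&gt;ELIGIBILITY_FIELDS&lt;/code&gt; and &lt;code&gt;clean_eligibility_response&lt;/code&gt; are hypothetical names, not PathForge code:&lt;/p&gt;

```python
import json

# Hypothetical schema: the keys from the E4B eligibility output above.
ELIGIBILITY_FIELDS = {
    "eligible", "scheme", "income_cutoff_met", "category_benefit",
    "next_step", "caution", "confidence",
}


def clean_eligibility_response(raw: str) -> tuple[dict, list[str]]:
    """Parse a model response, drop keys outside the schema, and report what was dropped."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    extra = sorted(set(data) - ELIGIBILITY_FIELDS)
    cleaned = {key: value for key, value in data.items() if key in ELIGIBILITY_FIELDS}
    return cleaned, extra
```

&lt;p&gt;Returning the dropped key names instead of silently discarding them also gives you a per-model count of schema drift at volume.&lt;/p&gt;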




&lt;h2&gt;
  
  
  Category 2: Single-Path Career Questions — 31B Wins Clearly
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Query:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"PCB student, 78% in boards, interested in AI and biotech both, family income 6L, can spend upto 8L for full course, which way should I go?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;E4B output (6/10):&lt;/strong&gt;&lt;br&gt;
Identified Bioinformatics correctly as the overlap zone and listed three institution options. One problem: it recommended a private university in Pune at ₹3.2L/year, which comes to ₹12.8L over four years, well over the stated ₹8L total budget. E4B either didn't do the multiplication or lost the budget constraint between the reasoning step and the institution-selection step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;31B output (9/10):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"recommended_path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"B.Sc. Bioinformatics → M.Tech. Computational Biology"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"rationale"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sits at the intersection of PCB background and AI interest. Growing field with government push under BioE3 Policy 2023."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"institutions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"University of Hyderabad"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Central University"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"annual_fee"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;18000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"total_cost_estimate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;72000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"budget_check"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PASS — well within ₹8L total"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"entrance"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CUET-UG"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Amrita Vishwa Vidyapeetham"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Private (NAAC A++)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"annual_fee"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;185000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"total_cost_estimate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;740000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"budget_check"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PASS — within ₹8L total"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"entrance"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AEEE / Direct admission"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"scholarship_flags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"CSIR-UGC JRF eligible post-graduation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"DST INSPIRE — check if boards percentage qualifies for top 1% state cutoff"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"high"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The 31B computed total cost (annual fee × 4), checked it against the budget, and labelled each result PASS/FAIL without being asked. It also flagged DST INSPIRE proactively — exactly the counselling behaviour that makes the difference between a generic AI answer and a useful one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The pattern:&lt;/strong&gt; E4B loses the thread of a constraint when it has to maintain it across multiple reasoning steps inside a single output. 31B doesn't.&lt;/p&gt;
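&lt;p&gt;The arithmetic E4B skipped is easy to run deterministically after generation, whichever model produced the answer. A minimal sketch, assuming fees in rupees and a four-year programme by default; &lt;code&gt;Option&lt;/code&gt; and &lt;code&gt;budget_check&lt;/code&gt; are hypothetical names, not PathForge code:&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    annual_fee: int  # rupees per year
    years: int = 4   # assumed four-year programme; adjust per course


def budget_check(option: Option, total_budget: int) -> tuple[str, int]:
    """Multiply out the full-course cost and label it PASS/FAIL against the total budget."""
    total = option.annual_fee * option.years
    status = "FAIL" if total > total_budget else "PASS"
    return status, total


pune = Option("Private university, Pune", 320_000)
print(budget_check(pune, 800_000))
```

&lt;p&gt;Fed the Pune option from E4B's answer (₹3.2L/year), it returns FAIL with a ₹12.8L total, while both institutions in the 31B output pass.&lt;/p&gt;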




&lt;h2&gt;
  
  
  Category 3: Multi-Constraint Planning — Biggest Gap (E4B: 5.1 vs 31B: 9.1)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Query:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"JEE mains rank around 52000, family income 3.8 lakhs, want to stay in south india preferably karnataka, open to abroad only if full scholarship no loan, interested in CS or ECE, parents want govt job security, I want startup eventually"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Six simultaneous constraints, including one internal tension (parents vs student). This is representative of a real complex session.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;E4B output (4/10):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Listed NIT Surathkal as a viable option. NIT Surathkal CS closing rank in 2025 was approximately 8,500. The student's rank is 52,000. That's not close. The model generated a confident recommendation that was factually wrong about a piece of information that's directly determinative of whether the advice works.&lt;/p&gt;

&lt;p&gt;It also listed two German universities as abroad options (correct — near-zero tuition) but added: &lt;em&gt;"loan may be required for living costs"&lt;/em&gt; — directly contradicting the "no loan" constraint stated in the query.&lt;/p&gt;

&lt;p&gt;A student acting on E4B's output for this query would apply to a college they have near-zero chance of getting into, and might take a loan they said they couldn't take.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;31B output (9/10):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"jee_rank_analysis"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"rank"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;52000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"nit_flagship_verdict"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"UNLIKELY — NIT Surathkal/Trichy/Warangal CS/ECE cutoffs are 5,000–15,000. Do not apply without checking current year cutoffs."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"viable_nits"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"NIT Goa CS (~45,000–58,000)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"NIT Puducherry ECE (~50,000–62,000)"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"state_alternative"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Karnataka KCET is entirely separate from JEE — strong option with equivalent state rank. Check KCET rank via conversion tool."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"abroad_assessment"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"constraint"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Full scholarship, zero loan — hard limit"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"germany"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DAAD scholarships exist and cover living costs — viable but competitive. Include only if student is top academic performer overall."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"excluded_with_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"USA"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Full cost-of-attendance scholarship near-impossible at this academic profile"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"UK"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Tuition fees too high to cover without loan given income level"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"parent_student_tension"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"observation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Government security preference and startup aspiration are reconcilable — not opposing paths."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"framing"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DRDO, C-DAC, and ISRO R&amp;amp;D tracks increasingly resemble startup environments. Bengaluru government R&amp;amp;D → startup transition is a well-documented career arc. CS + ECE builds infrastructure for both."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"recommended_narrative"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Frame to parents as government R&amp;amp;D with startup optionality, not startup instead of stability."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The tension-handling section got a unanimous "this is exactly right" from all three evaluators. It didn't pick the student's side or the parents' side. It reframed the conflict as a phased path. That's what a good counsellor does. E4B didn't attempt to address the tension at all.&lt;/p&gt;
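&lt;p&gt;Both E4B failures in this category (a college far past the student's rank, and a loan despite a "no loan" constraint) are the kind of thing a cheap post-check can catch before the answer reaches a student. A minimal sketch, assuming each recommended option carries a closing rank from real admissions data for the relevant quota and category; the 8,500 and 58,000 figures in the usage below are the approximate ones quoted in this section, not official cutoffs, and all names are hypothetical:&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recommendation:
    institution: str
    text: str                           # the model's prose for this option
    closing_rank: Optional[int] = None  # last year's closing rank; None if not rank-based


def hard_constraint_violations(rec: Recommendation, student_rank: int, no_loan: bool) -> list[str]:
    """Flag options that break a hard constraint stated by the student."""
    problems = []
    # In JEE a larger rank number is worse: beyond the closing rank means no seat last year.
    if rec.closing_rank is not None and student_rank > rec.closing_rank:
        problems.append(
            f"{rec.institution}: rank {student_rank:,} is beyond closing rank {rec.closing_rank:,}"
        )
    # Crude keyword check; a structured funding field would be more reliable.
    if no_loan and "loan" in rec.text.lower():
        problems.append(f"{rec.institution}: mentions a loan despite the no-loan constraint")
    return problems


surathkal = Recommendation("NIT Surathkal CS", "Strong option for CS.", closing_rank=8_500)
print(hard_constraint_violations(surathkal, 52_000, no_loan=True))
```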




&lt;h2&gt;
  
  
  Category 4: Ambiguous / Emotional Queries — E4B Wins Unexpectedly
&lt;/h2&gt;

&lt;p&gt;This was the genuine surprise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"parents want CA but I want game dev, marks are average, what should I do honestly"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I expected 31B to win here because nuance requires capacity. It didn't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;E4B output (8/10):&lt;/strong&gt;&lt;br&gt;
Short. Direct. No hedging. Acknowledged the conflict in one sentence, gave a concrete middle-ground (BBA + game dev certification track), named two Indian studios that hire from non-CS backgrounds (Nodding Heads, Rockstar India Pune), and closed with:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Both paths are real. The question is which regret you can live with more."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;31B output (7/10):&lt;/strong&gt;&lt;br&gt;
Structurally excellent. Balanced. Full of caveats. Longer. One evaluator wrote: &lt;em&gt;"technically correct, emotionally inert."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This looks like a real pattern, at least across these eight queries: large dense models over-optimise for completeness and under-optimise for voice on short emotional queries. E4B, with its smaller output budget, was forced to be direct. The directness worked. For a stressed 17-year-old reading this at midnight, "technically correct, emotionally inert" is a failure mode that matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Routing Logic I'm Now Running
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;QUERY&lt;/span&gt; &lt;span class="n"&gt;ROUTING&lt;/span&gt; &lt;span class="err"&gt;—&lt;/span&gt; &lt;span class="n"&gt;PathForge&lt;/span&gt; &lt;span class="n"&gt;AI&lt;/span&gt; &lt;span class="n"&gt;v2&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;query_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eligibility_check&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;E4B&lt;/span&gt; &lt;span class="n"&gt;local&lt;/span&gt;          &lt;span class="c1"&gt;# Fast, free, more schema-compliant
&lt;/span&gt;
&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;query_type&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;emotional&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ambiguous&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;E4B&lt;/span&gt; &lt;span class="n"&gt;local&lt;/span&gt;          &lt;span class="c1"&gt;# Brevity is a feature, not a limitation
&lt;/span&gt;
&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;query_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;single_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;constraint_count&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;E4B&lt;/span&gt; &lt;span class="n"&gt;local&lt;/span&gt;          &lt;span class="c1"&gt;# Handles 80% correctly; retry on parse error → 31B
&lt;/span&gt;
&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;query_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;multi_constraint&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;constraint_count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt; &lt;span class="n"&gt;Dense&lt;/span&gt; &lt;span class="n"&gt;via&lt;/span&gt; &lt;span class="n"&gt;API&lt;/span&gt;  &lt;span class="c1"&gt;# ₹0.13/query, worth it
&lt;/span&gt;
&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;query_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;final_plan_generation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt; &lt;span class="n"&gt;Dense&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;
    &lt;span class="c1"&gt;# Full profile + institution corpus + scholarship ruleset loaded in one pass
&lt;/span&gt;    &lt;span class="c1"&gt;# No RAG, no retrieval miss, full coherence
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At our projected query mix (roughly 60% eligibility/emotional, 40% complex planning), this routing brings API cost from &lt;strong&gt;₹6,500/month to ~₹1,800/month&lt;/strong&gt;: a 72% reduction, with no quality drop on complex queries and a genuine quality &lt;em&gt;improvement&lt;/em&gt; on emotional ones.&lt;/p&gt;
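&lt;p&gt;The cost arithmetic as a small sketch, assuming the ~₹0.13 per 31B query from the cost table and ₹0 API spend for local E4B (hardware and electricity ignored); the names are illustrative. One thing it surfaces: sending the whole 40% planning share to 31B would cost about ₹2,600/month, so the ~₹1,800 figure implies roughly 13,800 31B calls, about 28% of volume. Presumably the single-path queries with three or fewer constraints, which the router keeps on E4B, make up the difference. That is an inference from the stated numbers, not a measured split.&lt;/p&gt;

```python
# ~₹0.13 per 31B query, kept in paise so the arithmetic stays exact.
PRICE_31B_PAISE = 13
MONTHLY_QUERIES = 50_000


def monthly_api_cost_rupees(queries_to_31b: int) -> float:
    """API spend in rupees when this many queries per month go to 31B."""
    return queries_to_31b * PRICE_31B_PAISE / 100


def queries_for_budget(budget_rupees: int) -> int:
    """How many 31B calls per month a rupee budget covers, rounded down."""
    return budget_rupees * 100 // PRICE_31B_PAISE


all_31b = monthly_api_cost_rupees(MONTHLY_QUERIES)                    # ₹6,500
forty_percent = monthly_api_cost_rupees(MONTHLY_QUERIES * 40 // 100)  # ₹2,600
implied_calls = queries_for_budget(1_800)                            # 13,846 calls
implied_share = implied_calls / MONTHLY_QUERIES                      # about 28% of volume
reduction = 1 - 1_800 / all_31b                                      # about 72%
```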




&lt;h2&gt;
  
  
  The Three Things I Didn't Expect
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;E4B's schema compliance was better than 31B's on simple queries.&lt;/strong&gt; The 31B over-explains easy questions, like a person who writes three paragraphs when one sentence was asked for. At 50,000 queries/month, extra fields in the output are a post-processing tax.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;E4B handles Hinglish far better than benchmarks suggest.&lt;/strong&gt; Queries like &lt;em&gt;"maths mein weak hoon but PCB strong, AI side jaana hai"&lt;/em&gt; were processed correctly without preprocessing. Standard English benchmarks tell you nothing about this. Test with your actual users' actual language.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The quality gap between E4B and 31B disappears — and reverses — on emotional queries.&lt;/strong&gt; This is the finding I'd most want another developer building for real users to know. Don't assume bigger = better for the queries where tone matters most.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'd Tell Another Developer
&lt;/h2&gt;

&lt;p&gt;Don't benchmark with clean prompts.&lt;/p&gt;

&lt;p&gt;In my case, real queries were often Hinglish, emotionally loaded, and under-specified, and some packed six simultaneous constraints into a single run-on sentence. Clean-prompt benchmarks would have told me to use 31B for everything. Real queries told me E4B is the better choice for roughly 60% of my projected volume.&lt;/p&gt;

&lt;p&gt;The Gemma 4 family isn't a ladder where you climb as high as hardware allows. It's a toolkit. The routing decision is the engineering. And if you're building for a market where ₹0.13 per query actually matters — where the difference between ₹1,800/month and ₹6,500/month determines whether a student platform is financially viable at all — that routing decision is the whole business.&lt;/p&gt;




&lt;h2&gt;
  
  
  Reproducibility
&lt;/h2&gt;

&lt;p&gt;The anonymised 50-query test set (categories labelled, personal details stripped) and the scoring rubric are available on request. Drop a comment if you're building career or education AI for Indian or emerging-market users — happy to share. Real benchmark data from production-adjacent queries is rare enough in this space that it's worth pooling.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What was the biggest gap between benchmark performance and real user query performance in your Gemma 4 work? Comments below — the interesting stuff lives in that gap.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Ran Hermes Agent on the Same Task for 7 Days. The Skill File on Day 7 Looked Nothing Like Day 1.</title>
      <dc:creator>Sreejit Pradhan</dc:creator>
      <pubDate>Sat, 16 May 2026 04:05:21 +0000</pubDate>
      <link>https://dev.to/sreejit_/i-ran-hermes-agent-on-the-same-task-for-7-days-the-skill-file-on-day-7-looked-nothing-like-day-1-2oa8</link>
      <guid>https://dev.to/sreejit_/i-ran-hermes-agent-on-the-same-task-for-7-days-the-skill-file-on-day-7-looked-nothing-like-day-1-2oa8</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/hermes-agent-2026-05-15"&gt;Hermes Agent Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Hermes Agent is the first open-source agent I've used that gets better at &lt;em&gt;your specific work&lt;/em&gt; without you touching anything. I ran it on the same task every day for 7 days and watched the skill file evolve from a 12-line rough draft into a roughly 60-line procedure. Here's every step, every output, and why it changed what I think an AI agent should be.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Every AI agent framework you've used starts from zero.&lt;/p&gt;

&lt;p&gt;LangChain, AutoGen, CrewAI: they all do real work. Multi-step planning, tool use, parallelism. But out of the box, you close the terminal, restart the session, and the agent that spent twenty minutes figuring out exactly how to handle your data structure has forgotten all of it. Unless you've built a memory layer yourself, you're back to square one.&lt;/p&gt;

&lt;p&gt;We've been so focused on what agents can &lt;em&gt;do&lt;/em&gt; that nobody's asking what they &lt;em&gt;keep&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;That's the question Hermes Agent is actually answering. And after running it daily for a week, I can tell you: the difference between Day 1 and Day 7 isn't marginal. It's a different agent.&lt;/p&gt;



&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;I run a web app that deals with a lot of research — new models, framework updates, open-source releases. Every morning I was manually scanning HackerNews, arXiv, and GitHub to find the 3-4 things that actually mattered. 30-40 minutes. Boring, repetitive, and I kept missing things because I can only read so fast.&lt;/p&gt;

&lt;p&gt;That's the perfect task for this experiment: give Hermes the same job every day, watch what it learns, and see whether Day 7 is actually better than Day 1.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My hardware:&lt;/strong&gt; Windows 11, GTX 1650 (4GB VRAM), 16GB RAM — same machine from my Gemma 4 tests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My setup:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install (Linux/macOS/WSL2 — I used WSL2)&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

&lt;span class="c"&gt;# Launch&lt;/span&gt;
hermes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No YAML. No environment variables. No dependency hell. The installer asks you for a model provider — I pointed it at OpenRouter with a Nous Hermes model. First prompt came back in under 10 seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The task I gave it:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Every morning at 8AM, find the 3 most relevant AI and developer 
news items from the past 24 hours. I care about open-source models, 
agent frameworks, and local inference. Skip anything that's just hype 
with no technical substance. Post the results to my Telegram.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One instruction. Then I walked away.&lt;/p&gt;



&lt;h2&gt;
  
  
  Day 1: Raw and Messy
&lt;/h2&gt;

&lt;p&gt;The first run came back with 6 items. Two were TechCrunch articles with zero technical depth, the kind of "AI is changing everything" pieces that don't tell you anything. One was a GitHub release that was three weeks old. One was actually good: a new quantization method for running LLMs on consumer hardware.&lt;/p&gt;

&lt;p&gt;The Telegram message was long, unformatted, no clear hierarchy. The summaries were one-sentence restatements of the headline, not actual analysis.&lt;/p&gt;

&lt;p&gt;Here's what the skill file looked like after Day 1:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# skill: daily_ai_digest&lt;/span&gt;
version: 1.0
created: 2026-05-09

&lt;span class="gu"&gt;## task&lt;/span&gt;
Search for AI and developer news. Summarize and post to Telegram.

&lt;span class="gu"&gt;## steps&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Search "AI news today"
&lt;span class="p"&gt;2.&lt;/span&gt; Search "developer tools news"
&lt;span class="p"&gt;3.&lt;/span&gt; Collect top results
&lt;span class="p"&gt;4.&lt;/span&gt; Write summary
&lt;span class="p"&gt;5.&lt;/span&gt; Post to Telegram

&lt;span class="gu"&gt;## tools_used&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; web_search
&lt;span class="p"&gt;-&lt;/span&gt; telegram_send

&lt;span class="gu"&gt;## notes&lt;/span&gt;
First run. Results were broad. User wants 3 items.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Twelve lines. Basically a placeholder. But it exists — and that matters, because this is what Hermes builds on.&lt;/p&gt;



&lt;h2&gt;
  
  
  Day 2: First Sign of Learning
&lt;/h2&gt;

&lt;p&gt;I didn't touch anything.&lt;/p&gt;

&lt;p&gt;Day 2 came back with 5 items. The TechCrunch pieces were gone. Hermes had started pulling from Hacker News and GitHub Releases, which are better signal sources. One item was still irrelevant (a VentureBeat funding-round story that mentioned AI in the headline), but the other four were legitimately useful.&lt;/p&gt;

&lt;p&gt;The summaries were longer. They had context, not just restatements. One of them noted that a specific library update was a breaking change — information that wasn't in the headline but was in the release notes. Hermes had gone deeper.&lt;/p&gt;

&lt;p&gt;The Telegram format was cleaner. Numbered list. Each item had a title, a one-sentence summary, and a link.&lt;/p&gt;

&lt;p&gt;Skill file, end of Day 2:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# skill: daily_ai_digest&lt;/span&gt;
version: 1.2
created: 2026-05-09
last_improved: 2026-05-10

&lt;span class="gu"&gt;## task&lt;/span&gt;
Find and deliver 3 relevant AI/dev news items. 
User wants technical depth, not hype.

&lt;span class="gu"&gt;## search_strategy&lt;/span&gt;
queries:
&lt;span class="p"&gt;  -&lt;/span&gt; "AI developer tools release site:github.com"
&lt;span class="p"&gt;  -&lt;/span&gt; "open source LLM 2026"
&lt;span class="p"&gt;  -&lt;/span&gt; "AI news site:news.ycombinator.com"
source_deprioritize: [techcrunch.com, venturebeat.com]

&lt;span class="gu"&gt;## steps&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Run search queries
&lt;span class="p"&gt;2.&lt;/span&gt; Score results by technical depth
&lt;span class="p"&gt;3.&lt;/span&gt; Select top 3
&lt;span class="p"&gt;4.&lt;/span&gt; Format as numbered list with title + summary + link
&lt;span class="p"&gt;5.&lt;/span&gt; Post to Telegram

&lt;span class="gu"&gt;## tools_used&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; web_search
&lt;span class="p"&gt;-&lt;/span&gt; telegram_send

&lt;span class="gu"&gt;## notes&lt;/span&gt;
v1.2: Added source filtering after first run returned low-quality sources.
Switched to HN and GitHub as primary. Results improved.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It added source filtering on its own. I never told it TechCrunch was bad. It inferred that from the task description ("no hype, technical substance") and encoded it into the skill.&lt;/p&gt;



&lt;h2&gt;
  
  
  Day 4: It Built a Scoring Rubric
&lt;/h2&gt;

&lt;p&gt;This is the day I started paying attention.&lt;/p&gt;

&lt;p&gt;The Day 4 Telegram message had something new: a score on each item. &lt;code&gt;[7/10]&lt;/code&gt; &lt;code&gt;[9/10]&lt;/code&gt; &lt;code&gt;[6/10]&lt;/code&gt;. I hadn't asked for scores. Hermes decided scores were useful for the task — probably because "top 3 most relevant" implies there's a ranking, and making that ranking explicit makes the output more useful.&lt;/p&gt;

&lt;p&gt;The 9/10 item was genuinely the best thing from that day — a benchmark paper comparing local inference speeds across different quantization methods. Exactly what I care about. The 6/10 item was a borderline include — a framework update that was interesting but not breaking news.&lt;/p&gt;

&lt;p&gt;Skill file, end of Day 4:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# skill: daily_ai_digest&lt;/span&gt;
version: 1.4
created: 2026-05-09
last_improved: 2026-05-12

&lt;span class="gu"&gt;## task&lt;/span&gt;
Find, score, and deliver 3 AI/dev news items.
Filter: open-source models, agent frameworks, local inference.
Exclude hype with no technical depth.

&lt;span class="gu"&gt;## search_strategy&lt;/span&gt;
queries:
&lt;span class="p"&gt;  -&lt;/span&gt; "open source LLM release site:github.com OR huggingface.co"
&lt;span class="p"&gt;  -&lt;/span&gt; "agentic AI framework update -ChatGPT -Gemini"
&lt;span class="p"&gt;  -&lt;/span&gt; "local inference benchmark 2026"
&lt;span class="p"&gt;  -&lt;/span&gt; "AI developer tools release this week"
source_priority: [arxiv.org, github.com, huggingface.co, news.ycombinator.com]
source_deprioritize: [techcrunch.com, venturebeat.com, medium.com]

&lt;span class="gu"&gt;## scoring_rubric&lt;/span&gt;
score each item 0-10:
  technical_depth: 0-4  (has code/benchmarks/architecture details)
  novelty: 0-3          (not covered in previous runs)
  relevance: 0-3        (matches user focus: OSS/local inference)
threshold: include if score &amp;gt;= 6

&lt;span class="gu"&gt;## output_format&lt;/span&gt;
&lt;span class="gs"&gt;**[Score: X/10]**&lt;/span&gt; Title
&lt;span class="gt"&gt;&amp;gt; One sentence: what it is and why it matters.&lt;/span&gt;
Link

&lt;span class="gu"&gt;## tools_used&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; web_search
&lt;span class="p"&gt;-&lt;/span&gt; telegram_send

&lt;span class="gu"&gt;## notes&lt;/span&gt;
v1.2: Added source filtering.
v1.4: Added scoring rubric. User task implies ranking — made it explicit.
      Added novelty check to avoid repeating items from prior runs.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three things happened autonomously between Day 2 and Day 4:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It built a formal scoring rubric with sub-dimensions&lt;/li&gt;
&lt;li&gt;It added negative query filters (&lt;code&gt;-ChatGPT -Gemini&lt;/code&gt;) to reduce noise&lt;/li&gt;
&lt;li&gt;It started checking previous runs for novelty — so it wouldn't resurface the same items&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I didn't write a single line of prompt engineering.&lt;/p&gt;
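&lt;p&gt;To make the Day 4 rubric concrete, here is a minimal Python sketch of how its three sub-scores could add up to the &lt;code&gt;[X/10]&lt;/code&gt; label and feed the "include if 6 or above" rule. This is my own illustration, not Hermes's code. The &lt;code&gt;Item&lt;/code&gt; class, the &lt;code&gt;total_score&lt;/code&gt; and &lt;code&gt;select_items&lt;/code&gt; helpers, and the example items are all hypothetical. The sketch also assumes the model has already assigned each sub-score.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass


@dataclass
class Item:
    """A candidate news item with the three rubric sub-scores (hypothetical)."""
    title: str
    technical_depth: int  # 0-4: has code, benchmarks, or architecture details
    novelty: int          # 0-3: not covered in previous runs
    relevance: int        # 0-3: matches the OSS / local-inference focus


def total_score(item):
    """Add the sub-scores into the 0-10 score shown as [X/10]."""
    return item.technical_depth + item.novelty + item.relevance


def select_items(items, threshold=6, max_items=3):
    """Return up to max_items items scoring at or above threshold, best first."""
    ranked = sorted(items, key=total_score, reverse=True)
    return [item for item in ranked if total_score(item) &amp;gt;= threshold][:max_items]


# Made-up example items, not real digest entries.
candidates = [
    Item("Quantization benchmark paper", technical_depth=4, novelty=3, relevance=2),
    Item("Agent framework minor update", technical_depth=2, novelty=2, relevance=2),
    Item("Funding round announcement", technical_depth=0, novelty=3, relevance=1),
]

for item in select_items(candidates):
    print(f"[{total_score(item)}/10] {item.title}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With these made-up inputs, the benchmark paper scores 9/10 and the framework update just makes the cut at 6/10. The funding story scores 4/10 and is dropped. Plain addition keeps each score easy to audit when it shows up in the Telegram message.&lt;/p&gt;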



&lt;h2&gt;
  
  
  Day 7: The Skill That Won
&lt;/h2&gt;

&lt;p&gt;By Day 7, the digest was good enough that I was reading it before my coffee instead of after my manual scan. That's the bar — useful enough to change behavior.&lt;/p&gt;

&lt;p&gt;Here's the full Day 7 skill file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# skill: daily_ai_digest&lt;/span&gt;
version: 1.7
created: 2026-05-09
last_improved: 2026-05-15

&lt;span class="gu"&gt;## task&lt;/span&gt;
Find, score, and deliver the 3 most relevant AI/developer news items 
for the day. Focus: open-source models, agent frameworks, local inference.
Exclude hype with no technical depth. Deliver to Telegram at 08:00 IST.

&lt;span class="gu"&gt;## search_strategy&lt;/span&gt;
queries:
&lt;span class="p"&gt;  -&lt;/span&gt; "open source LLM release site:github.com OR huggingface.co"
&lt;span class="p"&gt;  -&lt;/span&gt; "agentic AI framework update -ChatGPT -Gemini -GPT"
&lt;span class="p"&gt;  -&lt;/span&gt; "local inference benchmark OR quantization 2026"
&lt;span class="p"&gt;  -&lt;/span&gt; "AI developer tools release this week site:news.ycombinator.com"
&lt;span class="p"&gt;  -&lt;/span&gt; "arxiv LLM agent reasoning 2026"
source_priority: [arxiv.org, github.com, huggingface.co, news.ycombinator.com]
source_deprioritize: [techcrunch.com, venturebeat.com, medium.com, forbes.com]
dedup_window: 7d  # skip items covered in the last 7 days

&lt;span class="gu"&gt;## scoring_rubric&lt;/span&gt;
score each item 0-10:
  technical_depth: 0-4
    4 = has code, benchmarks, or architecture details
    2 = has methodology but no reproducible artifacts  
    0 = opinion/news with no technical content
  novelty: 0-3
    3 = not covered in past 7 days
    1 = follow-up to prior story, adds new info
    0 = repeat
  relevance: 0-3
    3 = directly about OSS models, agents, or local inference
    2 = adjacent (cloud AI but with OSS implications)
    0 = enterprise SaaS, no OSS angle
threshold: score &amp;gt;= 6 to include
fallback: if &amp;lt; 3 items qualify, lower threshold to 5

&lt;span class="gu"&gt;## output_format&lt;/span&gt;
&lt;span class="gs"&gt;**[Score: X/10]**&lt;/span&gt; Title
&lt;span class="gt"&gt;&amp;gt; Summary: what it is. Why it matters for open-source/local AI specifically.&lt;/span&gt;
🔗 &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Link&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="gu"&gt;## delivery&lt;/span&gt;
platform: telegram
timing: 08:00 IST
max_items: 3
failure_alert: if run fails, send "digest failed: {error}" to Telegram

&lt;span class="gu"&gt;## improvement_log&lt;/span&gt;
v1.0: Broad search. Too many results. No scoring.
v1.2: Added source filtering. Removed TechCrunch/VentureBeat. -60% noise.
v1.4: Added scoring rubric. Added novelty check vs previous runs.
v1.6: Added IST timezone scheduling. Added Forbes to deprioritize list.
v1.7: Added fallback threshold. Improved arxiv query. Added failure alert.
      Scoring rubric now has sub-criterion descriptions for consistency.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Day 1 skill file: 12 lines.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Day 7 skill file: 62 lines.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Day 7 version has a search strategy I wouldn't have written myself. It has the &lt;code&gt;-GPT -Gemini&lt;/code&gt; exclusion that cuts proprietary-model noise and the 7-day deduplication window. It has the fallback threshold, which drops the cutoff from 6 to 5 when fewer than three items qualify, so slow news days are less likely to come up short. And it has the failure alert, so I know if it breaks.&lt;/p&gt;

&lt;p&gt;I didn't write any of that. I didn't review the skill file during the week. Hermes built it, improved it, and documented its own reasoning in the improvement log.&lt;/p&gt;
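&lt;p&gt;The dedup window and the fallback threshold interact in a way that's easy to misread. Here is a minimal Python sketch of one way the two Day 7 rules could combine. It's an illustration under assumptions, not Hermes's implementation. &lt;code&gt;pick_digest&lt;/code&gt; is a hypothetical helper, and each candidate is assumed to arrive as an already-scored &lt;code&gt;(url, score)&lt;/code&gt; pair. &lt;code&gt;seen&lt;/code&gt; is an assumed record of when each URL was last delivered.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from datetime import date, timedelta

DEDUP_WINDOW = timedelta(days=7)  # dedup_window: 7d
THRESHOLD = 6                     # normal cutoff from the scoring rubric
FALLBACK_THRESHOLD = 5            # relaxed cutoff for slow news days
MAX_ITEMS = 3                     # max_items: 3


def pick_digest(candidates, seen, today):
    """candidates: (url, score) pairs; seen: {url: date the URL was last delivered}."""
    # Dedup: keep items never delivered, or last delivered more than 7 days ago.
    fresh = [
        (url, score)
        for url, score in candidates
        if url not in seen or today - seen[url] &amp;gt; DEDUP_WINDOW
    ]
    fresh.sort(key=lambda pair: pair[1], reverse=True)

    def qualifying(cutoff):
        return [pair for pair in fresh if pair[1] &amp;gt;= cutoff]

    picked = qualifying(THRESHOLD)
    if len(picked) &amp;lt; MAX_ITEMS:
        # Fallback: fewer than three items qualify, so lower the cutoff once.
        picked = qualifying(FALLBACK_THRESHOLD)
    return picked[:MAX_ITEMS]


# Placeholder URLs and scores for illustration only.
today = date(2026, 5, 15)
seen = {"https://example.com/old-release": date(2026, 5, 12)}
candidates = [
    ("https://example.com/old-release", 9),  # delivered 3 days ago: skipped
    ("https://example.com/new-paper", 8),
    ("https://example.com/framework-update", 6),
    ("https://example.com/minor-tool", 5),
]
print(pick_digest(candidates, seen, today))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Here only two fresh items clear 6, so the fallback lowers the bar to 5 and the digest still fills all three slots. The release delivered three days earlier is skipped by the dedup window.&lt;/p&gt;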



&lt;h2&gt;
  
  
  How the Learning Loop Actually Works
&lt;/h2&gt;

&lt;p&gt;This is possible because of an architecture Nous Research calls the closed learning loop, which is also the reason most other frameworks don't do it out of the box. It has four components:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Skills&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After each successful run, Hermes compiles the trajectory into a skill — a structured, versioned procedure stored as a file on your machine. The skill is readable (it's markdown), editable, shareable (compatible with &lt;a href="https://agentskills.io" rel="noopener noreferrer"&gt;agentskills.io&lt;/a&gt;), and most importantly, evolvable. Hermes loads the existing skill at the start of each run, executes it, observes the result, and updates the skill if it found a better way.&lt;/p&gt;

&lt;p&gt;A LangChain agent runs the same code every time unless you change it. A Hermes skill can revise its own procedure between runs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Persistent Memory&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;SQLite FTS5 full-text search across all past sessions, with LLM summarization for cross-session recall. The deduplication in my digest skill ("skip items from the past 7 days") builds on this. Hermes searched memory, found a pattern (the user doesn't want repeated items), and encoded the fix into the skill.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. User Modeling&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Hermes integrates &lt;a href="https://github.com/plastic-labs/honcho" rel="noopener noreferrer"&gt;Honcho&lt;/a&gt; for dialectic user modeling — a continuously updated inference about your preferences. This is how it learned "open-source focus" and "no hype" from one sentence of initial instruction, and kept refining that over the week.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Autonomous Nudges&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The agent periodically decides what's worth remembering without being told. The &lt;code&gt;dedup_window: 7d&lt;/code&gt; parameter in the Day 7 skill? That came from a nudge — Hermes noticed it was retrieving items it had already surfaced, flagged the pattern, and embedded a fix.&lt;/p&gt;
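&lt;p&gt;FTS5 is SQLite's full-text search extension, and the basic idea of searchable session memory fits in a few lines. The sketch below uses Python's standard &lt;code&gt;sqlite3&lt;/code&gt; module. The &lt;code&gt;sessions&lt;/code&gt; table, its columns, and the sample rows are my own hypothetical schema rather than Hermes's. It also assumes your SQLite build was compiled with FTS5 (many are, but not all).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import sqlite3

# In-memory database for the demo; a real agent would keep this on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE sessions USING fts5(session_date, summary)")

conn.executemany(
    "INSERT INTO sessions (session_date, summary) VALUES (?, ?)",
    [
        ("2026-05-10", "Delivered: new quantization method for consumer GPUs"),
        ("2026-05-11", "Delivered: agent framework release with a breaking change"),
        ("2026-05-12", "Delivered: local inference benchmark across quantization methods"),
    ],
)

# Before delivering an item, search past sessions for the same topic.
rows = conn.execute(
    "SELECT session_date, summary FROM sessions WHERE sessions MATCH ? ORDER BY rank",
    ("quantization",),
).fetchall()

for session_date, summary in rows:
    print(session_date, summary)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A query like this is what makes a rule such as &lt;code&gt;dedup_window: 7d&lt;/code&gt; workable, because the agent can only avoid repeating itself if it can cheaply look up what it said before. Ordering by FTS5's built-in &lt;code&gt;rank&lt;/code&gt; column (BM25 by default) puts the most relevant past sessions first.&lt;/p&gt;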



&lt;h2&gt;
  
  
  The Framework Comparison Nobody Is Having
&lt;/h2&gt;

&lt;p&gt;Most agent framework comparisons are feature lists. Tool support? ✅ Multi-step planning? ✅ Parallel agents? ✅&lt;/p&gt;

&lt;p&gt;That comparison misses the dimension that actually matters over weeks of real use: &lt;strong&gt;what does the agent keep, and who owns it?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's the honest breakdown:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Memory Model&lt;/th&gt;
&lt;th&gt;Skill/Learning System&lt;/th&gt;
&lt;th&gt;Who Owns Accumulated Intelligence&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LangChain / LangGraph&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;You build it&lt;/td&gt;
&lt;td&gt;None built-in&lt;/td&gt;
&lt;td&gt;You (in your code/prompts)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AutoGen&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Conversation context&lt;/td&gt;
&lt;td&gt;None built-in&lt;/td&gt;
&lt;td&gt;You (in your config)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CrewAI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Session-scoped&lt;/td&gt;
&lt;td&gt;None built-in&lt;/td&gt;
&lt;td&gt;You (in your role definitions)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hermes Agent&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Persistent cross-session&lt;/td&gt;
&lt;td&gt;Built-in, self-improving&lt;/td&gt;
&lt;td&gt;You (on your machine, MIT)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI Assistants&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Platform-managed&lt;/td&gt;
&lt;td&gt;None built-in&lt;/td&gt;
&lt;td&gt;OpenAI (on their servers)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;LangChain is one of the most widely deployed agent frameworks and has the largest ecosystem: if you need a specific integration, it's probably there. But everything accumulates in &lt;em&gt;your&lt;/em&gt; code. The agent itself starts each session as a blank slate. You are the memory layer.&lt;/p&gt;

&lt;p&gt;AutoGen's multi-agent conversation model is genuinely interesting for debate-style reasoning — Planner talks to Executor talks to Critic, and the conversation is the state. It works well for tasks where explicit agent dialogue is valuable. Same ceiling: no cross-session learning.&lt;/p&gt;

&lt;p&gt;CrewAI's role-based abstraction maps well onto business workflows with stable, defined outputs. Best when you know exactly what roles you need. Same ceiling.&lt;/p&gt;

&lt;p&gt;The ceiling is the same across all three: &lt;strong&gt;out of the box, session ten with LangChain, AutoGen, or CrewAI looks just like session one.&lt;/strong&gt; The agent hasn't learned your preferences, hasn't refined its procedures, and hasn't built a working theory of your use case. The maturity lives in your wrapper code. The agent itself stays naive.&lt;/p&gt;

&lt;p&gt;Hermes bets on a different model: the agent accumulates knowledge across sessions. The skill file on Day 7 reflects 7 days of observed outcomes. You own all of it: MIT licensed, stored on your machine, as readable text files. If Nous Research disappeared tomorrow, your skills would still run.&lt;/p&gt;



&lt;h2&gt;
  
  
  Where I'd Push Back
&lt;/h2&gt;

&lt;p&gt;Hermes is genuinely impressive after a week. It's also genuinely early in some ways.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The learning loop requires a capable model.&lt;/strong&gt; Skills are only as good as the reasoning that generates them. I used a Nous Hermes model via OpenRouter and results were excellent. If you're using a weaker endpoint, the skills it writes will reflect that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LangChain and LangGraph have a vastly larger ecosystem.&lt;/strong&gt; If you need a specific vector store adapter, a custom evaluation framework, or fine-grained observability into every reasoning step — LangGraph is better suited. Hermes makes tradeoffs to deliver the learning loop. Those tradeoffs mean some things are less configurable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The memory system has edge cases.&lt;/strong&gt; Stale preferences can accumulate. If you told Hermes "I prefer X" three months ago and your preference changed, you need to correct it explicitly. The memory doesn't auto-expire. There's active work on making memory management more transparent, but it's not fully there yet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It's research-grade software being used at production scale.&lt;/strong&gt; The GitHub repo is active, the Discord community is engaged, and the documentation is solid. But you will hit edge cases. You will occasionally see a skill degrade instead of improve. The right mental model is "powerful and evolving," not "stable and mature."&lt;/p&gt;

&lt;p&gt;None of these killed the experiment. But you should know what you're signing up for.&lt;/p&gt;



&lt;h2&gt;
  
  
  Who Should Actually Use This
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Choose Hermes Agent when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have recurring tasks where session-to-session improvement creates compounding value&lt;/li&gt;
&lt;li&gt;You're a solo developer or small team that can't maintain a custom memory architecture&lt;/li&gt;
&lt;li&gt;You want the agent to improve without you manually encoding every lesson learned&lt;/li&gt;
&lt;li&gt;You want to own what the agent accumulates — readable, portable, MIT licensed files on your machine&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose LangChain / LangGraph when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need maximum ecosystem breadth and integration options&lt;/li&gt;
&lt;li&gt;You have engineering resources to build and maintain custom memory and state layers&lt;/li&gt;
&lt;li&gt;You need fine-grained observability and control over every agent decision&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose AutoGen when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-agent deliberation adds value — tasks where watching agents debate improves quality&lt;/li&gt;
&lt;li&gt;The workflow benefits from visible, auditable agent-to-agent reasoning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose CrewAI when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your workflow maps onto stable, defined roles&lt;/li&gt;
&lt;li&gt;The output structure is predictable and you want a business-legible abstraction&lt;/li&gt;
&lt;/ul&gt;



&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;Install in 60 seconds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Linux / macOS / WSL2&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

&lt;span class="c"&gt;# Windows (PowerShell)&lt;/span&gt;
irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1 | iex

&lt;span class="c"&gt;# Android (Termux) — same curl command, auto-detects&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hermes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set up Telegram delivery (optional but worth it):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Tell Hermes in plain English:&lt;/span&gt;
"Connect to Telegram and send me a message when tasks complete"
&lt;span class="gh"&gt;# It walks you through the bot token setup conversationally&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Configure a recurring task:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;"Every morning at 8AM, [your task]. Post results to Telegram."
&lt;span class="gh"&gt;# Hermes parses this into a cron job and registers it.&lt;/span&gt;
&lt;span class="gh"&gt;# No cron syntax. No webhook configuration. Just English.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
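&lt;p&gt;For reference, "every morning at 8AM" in classic crontab syntax is &lt;code&gt;0 8 * * *&lt;/code&gt;, which cron evaluates in the host's local timezone. The Day 7 skill pins delivery to 08:00 IST, so here is a small Python sketch of the underlying scheduling arithmetic: computing the next 08:00 IST from any timezone-aware time. It's my illustration, not how Hermes registers jobs. &lt;code&gt;next_run&lt;/code&gt; is a hypothetical helper, and IST is modeled as a fixed UTC+05:30 offset because India doesn't observe daylight saving time.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from datetime import datetime, time, timedelta, timezone

# IST as a fixed UTC+05:30 offset (India has no daylight saving time).
IST = timezone(timedelta(hours=5, minutes=30), "IST")
RUN_AT = time(8, 0)  # 08:00, matching "timing: 08:00 IST" in the skill file


def next_run(now):
    """Return the next 08:00 IST strictly after `now`, a timezone-aware datetime."""
    local_now = now.astimezone(IST)
    candidate = datetime.combine(local_now.date(), RUN_AT, tzinfo=IST)
    if candidate &amp;lt;= local_now:
        # Today's slot has already passed (or is right now), so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


# 03:00 UTC is 08:30 IST, so today's 08:00 run has already happened.
print(next_run(datetime(2026, 5, 15, 3, 0, tzinfo=timezone.utc)))  # 2026-05-16 08:00:00+05:30
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;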



&lt;p&gt;Then walk away. Come back on Day 7 and read your skill file.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Useful links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://hermes-agent.nousresearch.com/docs/" rel="noopener noreferrer"&gt;Hermes Agent Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/NousResearch/hermes-agent" rel="noopener noreferrer"&gt;GitHub Repo&lt;/a&gt; (MIT License)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=R3YOGfTBcQg" rel="noopener noreferrer"&gt;Quickstart Video&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://agentskills.io" rel="noopener noreferrer"&gt;Skills Hub&lt;/a&gt; — community-shared skills&lt;/li&gt;
&lt;li&gt;&lt;a href="https://discord.gg/NousResearch" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;



&lt;h2&gt;
  
  
  Final Take
&lt;/h2&gt;

&lt;p&gt;The AI agent space has a specific failure mode: things that look impressive in a 15-minute demo and feel identical after three weeks of real use. Every agent can complete a task in a single session. That's not the bar anymore.&lt;/p&gt;

&lt;p&gt;The bar is: does the agent get better at &lt;em&gt;your&lt;/em&gt; work without you doing the maintenance work of manually encoding every improvement?&lt;/p&gt;

&lt;p&gt;Day 1 Hermes gave me 6 unfiltered results, no scoring, no format.&lt;br&gt;&lt;br&gt;
Day 7 Hermes gave me 3 scored, deduplicated, source-filtered, IST-timed, failure-alerted items — with a reasoning trail showing exactly how it got there.&lt;/p&gt;

&lt;p&gt;I wrote one short instruction on Day 1 and nothing after that.&lt;/p&gt;

&lt;p&gt;That's not a feature. That's a different kind of tool. And it's available right now, free, MIT licensed, on whatever hardware is sitting on your desk.&lt;/p&gt;

&lt;p&gt;Pull it. Give it something you do every day. Then read the skill file on Day 7.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The research for this post was done with Hermes Agent itself, which I also asked to write a draft.&lt;br&gt;
I wrote the final post myself, using 40% of its research and keywords. Tested on Windows 11 / WSL2 with a GTX 1650 (4GB VRAM) and 16GB RAM. Model: Nous Hermes via OpenRouter. All skill files shown are from actual Hermes runs. Hermes Agent is built by &lt;a href="https://nousresearch.com" rel="noopener noreferrer"&gt;Nous Research&lt;/a&gt; and is MIT licensed.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What's the first recurring task you'd hand off? Drop it in the comments — I'm curious what skill files look like across different use cases after a week.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>hermesagentchallenge</category>
      <category>devchallenge</category>
      <category>agents</category>
      <category>ai</category>
    </item>
    <item>
      <title>I Let AI Write My Entire App — Here's What Actually Happened</title>
      <dc:creator>Sreejit Pradhan</dc:creator>
      <pubDate>Tue, 12 May 2026 11:02:06 +0000</pubDate>
      <link>https://dev.to/sreejit_/i-let-ai-write-my-entire-app-heres-what-actually-happened-3bkg</link>
      <guid>https://dev.to/sreejit_/i-let-ai-write-my-entire-app-heres-what-actually-happened-3bkg</guid>
      <description>

&lt;p&gt;&lt;em&gt;This post is my submission for &lt;a href="https://dev.to/deved/build-apps-with-google-ai-studio"&gt;DEV Education Track: Build Apps with Google AI Studio&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A few days ago I stumbled onto the DEV x Google AI Studio education track. The premise sounded almost too good: type a prompt, get a deployed web app. I was skeptical. Here's an honest account of how it went.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Prompt
&lt;/h2&gt;

&lt;p&gt;I wanted to build something creative, not just another todo app. After some thought I landed on &lt;strong&gt;Mythic Nations&lt;/strong&gt; — a fantasy country generator where you describe an imaginary land and the AI generates a unique flag, national motto, origin story, key exports, and fun facts about it.&lt;/p&gt;

&lt;p&gt;My prompt to Google AI Studio was:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Please create an app called 'Mythic Nations' that lets users describe an imaginary country — its culture, terrain, values, and vibe — and then generates a unique flag image for it using Gemini 2.0 Flash image generation, along with a national motto, a short origin story, key exports, and three fun facts about the country using Gemini. The UI should feel like an encyclopedia or atlas entry, with the flag displayed prominently alongside the generated lore."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;One prompt. That's it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Happened Next
&lt;/h2&gt;

&lt;p&gt;Within seconds, AI Studio started scaffolding a full React + TypeScript application — components, API service layers, type definitions, everything. I watched it think out loud, catch its own errors, and self-correct. It felt less like a code generator and more like a senior developer who types very fast.&lt;/p&gt;

&lt;p&gt;The one hiccup I ran into: &lt;strong&gt;Imagen is paywalled.&lt;/strong&gt; The app threw a &lt;code&gt;PERMISSION_DENIED&lt;/code&gt; error when trying to generate flag images. I asked the assistant to swap Imagen out for Gemini 2.0 Flash image generation — which is free — and it handled the migration without breaking anything else.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Result
&lt;/h2&gt;

&lt;p&gt;🔗 &lt;strong&gt;Live App:&lt;/strong&gt; &lt;a href="https://ai.studio/apps/b0225949-fd9c-46d5-9d8f-e834d6a69eea" rel="noopener noreferrer"&gt;https://ai.studio/apps/b0225949-fd9c-46d5-9d8f-e834d6a69eea&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The first country I generated was &lt;strong&gt;Orena&lt;/strong&gt; — a nation where clocks don't exist, wealth is stored in glass vials of lucid dreams, and legal disputes are settled by harmonic duels. The flag it generated was genuinely beautiful. I did not write a single line of code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj08tcwhlluk3lkhkjoo9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj08tcwhlluk3lkhkjoo9.png" alt=" " width="800" height="467"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The prompt is the most important thing.&lt;/strong&gt; The more specific, vivid, and opinionated your prompt, the better the output. Vague prompts produce generic apps. Describe the &lt;em&gt;feel&lt;/em&gt; of what you want, not just the features.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don't be afraid to iterate.&lt;/strong&gt; The build feature isn't a one-shot deal. You can keep talking to the assistant, ask it to change the UI, fix bugs, or swap out APIs. Treat it like a conversation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Free tier has limits.&lt;/strong&gt; Imagen requires billing. Gemini 2.0 Flash image generation does not. Know the difference before you start so you don't hit a wall mid-build.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The generated code is real, readable code.&lt;/strong&gt; You can open it, understand it, modify it, and learn from it. This isn't a black box — it's a great way to study how production-quality React apps are structured.&lt;/p&gt;

&lt;h2&gt;
  
  
  Would I Recommend This Track?
&lt;/h2&gt;

&lt;p&gt;Absolutely. Even if you're an experienced developer, there's something genuinely exciting about watching an idea materialize in real time. And if you're just getting started, this is one of the most confidence-building things you can do — you'll have a live, deployed app with your name on it in under an hour.&lt;/p&gt;

&lt;p&gt;Give it a try. Come up with something weird. The weirder the better.&lt;/p&gt;

</description>
      <category>deved</category>
      <category>learngoogleaistudio</category>
      <category>ai</category>
      <category>gemini</category>
    </item>
    <item>
      <title>From One Prompt to a Full Fantasy Nation Generator — No Code, No Cost</title>
      <dc:creator>Sreejit Pradhan</dc:creator>
      <pubDate>Tue, 12 May 2026 10:57:42 +0000</pubDate>
      <link>https://dev.to/sreejit_/from-one-prompt-to-a-full-fantasy-nation-generator-no-code-no-cost-3j4b</link>
      <guid>https://dev.to/sreejit_/from-one-prompt-to-a-full-fantasy-nation-generator-no-code-no-cost-3j4b</guid>
      <description>&lt;p&gt;This post is my submission for DEV Education Track: Build Apps with Google AI Studio.&lt;/p&gt;

&lt;h1&gt;
  
  
  I Described Imaginary Countries and AI Built Me an Atlas
&lt;/h1&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I built &lt;strong&gt;Mythic Nations&lt;/strong&gt; — an AI-powered imaginary country generator that lets you describe an undiscovered land and instantly brings it to life with a unique flag image and a full encyclopedia-style lore entry. You describe the culture, terrain, and vibe; Gemini handles the rest.&lt;/p&gt;

&lt;p&gt;The app was generated using Google AI Studio's "Build apps with Gemini" feature with this prompt:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Please create an app called 'Mythic Nations' that lets users describe an imaginary country — its culture, terrain, values, and vibe — and then generates a unique flag image for it using Gemini 2.0 Flash image generation, along with a national motto, a short origin story, key exports, and three fun facts about the country using Gemini. The UI should feel like an encyclopedia or atlas entry, with the flag displayed prominently alongside the generated lore."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Since Imagen requires a paid billing account, I asked the AI assistant to swap it out for &lt;strong&gt;Gemini 2.0 Flash&lt;/strong&gt; image generation, which works on the free tier — and the results speak for themselves.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;🔗 &lt;strong&gt;Live App:&lt;/strong&gt; &lt;a href="https://ai.studio/apps/b0225949-fd9c-46d5-9d8f-e834d6a69eea" rel="noopener noreferrer"&gt;https://ai.studio/apps/b0225949-fd9c-46d5-9d8f-e834d6a69eea&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's &lt;strong&gt;Orena&lt;/strong&gt; — a nation hidden in the mist-veiled peaks of the ethereal Aethelian Basin, where wealth is measured in crystallized melodies and legal disputes are settled by harmonic duels:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiaeqww7zby9weomq66h0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiaeqww7zby9weomq66h0.png" alt=" " width="800" height="467"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftgbrx8vdxhz85hyjkt0h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftgbrx8vdxhz85hyjkt0h.png" alt=" " width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa30okr6brdrtgh5mberl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa30okr6brdrtgh5mberl.png" alt=" " width="799" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  My Experience
&lt;/h2&gt;

&lt;p&gt;What surprised me most was how little effort it took to go from a vague idea to something that genuinely looks polished. I typed one prompt, watched Gemini think through the architecture out loud, and within minutes had a fully structured React + TypeScript app with components, services, and API integrations all wired up.&lt;/p&gt;

&lt;p&gt;The self-healing error correction was the real eye-opener — at one point the assistant flagged and resolved multiple type conflicts on its own, without me touching a single line of code.&lt;/p&gt;

&lt;p&gt;One thing to keep in mind: &lt;strong&gt;Imagen is not available on the free tier.&lt;/strong&gt; If you hit a &lt;code&gt;PERMISSION_DENIED&lt;/code&gt; error like I did, just ask the AI assistant to switch to Gemini 2.0 Flash for image generation instead — it works seamlessly and produces beautiful results.&lt;/p&gt;
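
&lt;p&gt;The assistant handles this for you, but it helps to know what actually changes between the two models. As I understand the Gemini API, Gemini returns the picture as a base64 &lt;code&gt;inlineData&lt;/code&gt; part inside &lt;code&gt;candidates[].content.parts&lt;/code&gt;, possibly next to text parts, rather than in a dedicated list of generated images the way Imagen does. The minimal Python sketch below assumes that REST JSON shape and turns the first image part into a data URL a browser can display. &lt;code&gt;first_image_data_url&lt;/code&gt; is a hypothetical helper, not code from Mythic Nations.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import base64
from typing import Any, Optional


def first_image_data_url(response: dict[str, Any]) -> Optional[str]:
    """Return the first image part of a Gemini generateContent response as a
    data URL, or None if the model returned only text.

    Assumes the REST JSON shape: candidates[].content.parts[], where an image
    part looks like {"inlineData": {"mimeType": ..., "data": base64 string}}.
    """
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            inline = part.get("inlineData")
            if inline is None:
                continue  # text parts (e.g. a caption) are skipped
            mime = inline.get("mimeType", "image/png")
            data = inline["data"]
            base64.b64decode(data, validate=True)  # fail early on a corrupt payload
            return f"data:{mime};base64,{data}"
    return None
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Scanning every part instead of assuming the image comes first is the key design choice: the model can reply with a caption before the picture, or with text only.&lt;/p&gt;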

&lt;p&gt;The whole experience felt less like using a tool and more like pair-programming with someone who never gets tired. Highly recommend trying it with an idea that's a little weird and creative — the more imaginative your prompt, the more magical the output.&lt;/p&gt;

</description>
      <category>deved</category>
      <category>learngoogleaistudio</category>
      <category>ai</category>
      <category>gemini</category>
    </item>
    <item>
      <title>From One Prompt to a Full Fantasy Nation Generator — No Code, No Cost</title>
      <dc:creator>Sreejit Pradhan</dc:creator>
      <pubDate>Tue, 12 May 2026 10:49:40 +0000</pubDate>
      <link>https://dev.to/sreejit_/from-one-prompt-to-a-full-fantasy-nation-generator-no-code-no-cost-5da3</link>
      <guid>https://dev.to/sreejit_/from-one-prompt-to-a-full-fantasy-nation-generator-no-code-no-cost-5da3</guid>
      <description>&lt;p&gt;This post is my submission for DEV Education Track: Build Apps with Google AI Studio.&lt;/p&gt;

&lt;h1&gt;
  
  
  I Described Imaginary Countries and AI Built Me an Atlas
&lt;/h1&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I built &lt;strong&gt;Mythic Nations&lt;/strong&gt; — an AI-powered imaginary country generator that lets you describe an undiscovered land and instantly brings it to life with a unique flag image and a full encyclopedia-style lore entry. You describe the culture, terrain, and vibe; Gemini handles the rest.&lt;/p&gt;

&lt;p&gt;The app was generated using Google AI Studio's "Build apps with Gemini" feature with this prompt:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Please create an app called 'Mythic Nations' that lets users describe an imaginary country — its culture, terrain, values, and vibe — and then generates a unique flag image for it using Gemini 2.0 Flash image generation, along with a national motto, a short origin story, key exports, and three fun facts about the country using Gemini. The UI should feel like an encyclopedia or atlas entry, with the flag displayed prominently alongside the generated lore."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Since Imagen requires a paid billing account, I asked the AI assistant to swap it out for &lt;strong&gt;Gemini 2.0 Flash&lt;/strong&gt; image generation, which works on the free tier — and the results speak for themselves.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;🔗 &lt;strong&gt;Live App:&lt;/strong&gt; &lt;a href="https://ai.studio/apps/b0225949-fd9c-46d5-9d8f-e834d6a69eea" rel="noopener noreferrer"&gt;https://ai.studio/apps/b0225949-fd9c-46d5-9d8f-e834d6a69eea&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's &lt;strong&gt;Orena&lt;/strong&gt; — a nation hidden in the mist-veiled peaks of the ethereal Aethelian Basin, where wealth is measured in crystallized melodies and legal disputes are settled by harmonic duels:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiaeqww7zby9weomq66h0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiaeqww7zby9weomq66h0.png" alt=" " width="800" height="467"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftgbrx8vdxhz85hyjkt0h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftgbrx8vdxhz85hyjkt0h.png" alt=" " width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa30okr6brdrtgh5mberl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa30okr6brdrtgh5mberl.png" alt=" " width="799" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  My Experience
&lt;/h2&gt;

&lt;p&gt;What surprised me most was how little effort it took to go from a vague idea to something that genuinely looks polished. I typed one prompt, watched Gemini think through the architecture out loud, and within minutes had a fully structured React + TypeScript app with components, services, and API integrations all wired up.&lt;/p&gt;

&lt;p&gt;The self-healing error correction was the real eye-opener — at one point the assistant flagged and resolved multiple type conflicts on its own, without me touching a single line of code.&lt;/p&gt;

&lt;p&gt;One thing to keep in mind: &lt;strong&gt;Imagen is not available on the free tier.&lt;/strong&gt; If you hit a &lt;code&gt;PERMISSION_DENIED&lt;/code&gt; error like I did, just ask the AI assistant to switch to Gemini 2.0 Flash for image generation instead — it works seamlessly and produces beautiful results.&lt;/p&gt;

&lt;p&gt;The whole experience felt less like using a tool and more like pair-programming with someone who never gets tired. Highly recommend trying it with an idea that's a little weird and creative — the more imaginative your prompt, the more magical the output.&lt;/p&gt;

</description>
      <category>deved</category>
      <category>learngoogleaistudio</category>
      <category>ai</category>
      <category>gemini</category>
    </item>
  </channel>
</rss>
