<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Prasad Thiriveedi</title>
    <description>The latest articles on DEV Community by Prasad Thiriveedi (@tvprasad).</description>
    <link>https://dev.to/tvprasad</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3852468%2F9d58d5da-d03d-48f4-bc5c-33c2eb10d594.jpg</url>
      <title>DEV Community: Prasad Thiriveedi</title>
      <link>https://dev.to/tvprasad</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tvprasad"/>
    <language>en</language>
    <item>
      <title>Repair Before Replace: an AI-powered circularity assistant with persistent repair memory</title>
      <dc:creator>Prasad Thiriveedi</dc:creator>
      <pubDate>Mon, 20 Apr 2026 00:30:06 +0000</pubDate>
      <link>https://dev.to/tvprasad/repair-before-replace-an-ai-powered-circularity-assistant-with-persistent-repair-memory-47am</link>
      <guid>https://dev.to/tvprasad/repair-before-replace-an-ai-powered-circularity-assistant-with-persistent-repair-memory-47am</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for &lt;a href="https://dev.to/challenges/weekend-2026-04-16"&gt;Weekend Challenge: Earth Day Edition&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I built &lt;strong&gt;Repair Before Replace&lt;/strong&gt;, an AI-powered circularity assistant that helps people decide whether a damaged household item should be repaired at home, patched temporarily, handed to a professional, or replaced responsibly.&lt;/p&gt;

&lt;p&gt;The problem I wanted to solve is simple: people throw away useful items because they are unsure what is fixable, what is safe, and whether repair is worth it. I wanted to build something practical that nudges users toward repair first while still being honest about safety and limits.&lt;/p&gt;

&lt;p&gt;The app lets a user:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;choose a supported category&lt;/li&gt;
&lt;li&gt;upload a photo of a damaged item&lt;/li&gt;
&lt;li&gt;get a structured damage assessment&lt;/li&gt;
&lt;li&gt;see whether the item is safe to repair at home&lt;/li&gt;
&lt;li&gt;get materials and step-by-step guidance when DIY repair is appropriate&lt;/li&gt;
&lt;li&gt;understand the waste impact of replacing versus repairing&lt;/li&gt;
&lt;li&gt;benefit from persistent memory across sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The differentiator is &lt;strong&gt;memory&lt;/strong&gt;. The app does not behave like a stateless image classifier. It remembers prior attempts, preferences, and what worked before, then uses that history to influence future recommendations.&lt;/p&gt;

&lt;p&gt;Examples of the memory behavior:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Your previous glue-only patch failed after washing, so this recommendation uses hand stitching instead.”&lt;/p&gt;

&lt;p&gt;“You’ve had success with wood glue and no power tools, so the same approach applies here.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The result is an AI repair companion whose recommendations get more specific as your repair history grows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Live app:&lt;/strong&gt; &lt;a href="https://repair-before-replace.web.app" rel="noopener noreferrer"&gt;Repair Before Replace&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy7lr21f6zfduttj1djas.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy7lr21f6zfduttj1djas.png" alt=" " width="800" height="1151"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Suggested demo flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Sign in&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;Furniture&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Upload a photo of a damaged item such as a scratched table, cracked shelf, or broken chair&lt;/li&gt;
&lt;li&gt;Review the assessment, repair guidance, and environmental impact summary&lt;/li&gt;
&lt;li&gt;Scroll to the repair history and note how prior attempts influence the current recommendation&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Home page with sofa loaded (Furniture selected, sofa photo, Analyze Damage button)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbqa3sqej2jjocokfc8m8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbqa3sqej2jjocokfc8m8.png" alt="Home page with sofa loaded (Furniture selected, sofa photo, Analyze Damage button)" width="800" height="1263"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnraaiu53ujlvjhj3hlx4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnraaiu53ujlvjhj3hlx4.png" alt="Assessment" width="800" height="1151"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftae3jpvj0mtr8c6s9wdz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftae3jpvj0mtr8c6s9wdz.png" alt="Assessment - step-by-step guide" width="800" height="1151"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fggl9j0oyzetuitemzaal.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fggl9j0oyzetuitemzaal.png" alt="Impact" width="800" height="1297"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1cbgnvf7sf06jdra0pvz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1cbgnvf7sf06jdra0pvz.png" alt="History" width="800" height="1796"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GitHub repo:&lt;/strong&gt; &lt;a href="https://github.com/tvprasad/repair-before-replace" rel="noopener noreferrer"&gt;repair-before-replace&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Built It
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Product approach
&lt;/h3&gt;

&lt;p&gt;I intentionally did &lt;strong&gt;not&lt;/strong&gt; build a generic carbon calculator.&lt;/p&gt;

&lt;p&gt;Instead, I focused on one real-world decision:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Should I fix this, patch it, get professional help, or replace it responsibly?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That led to a much more practical Earth Day project than a broad sustainability dashboard.&lt;/p&gt;

&lt;h3&gt;
  
  
  Front end
&lt;/h3&gt;

&lt;p&gt;The UI is built with &lt;strong&gt;React 19 + TypeScript + Vite + Tailwind CSS v4&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I kept the flow simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;select category&lt;/li&gt;
&lt;li&gt;upload photo&lt;/li&gt;
&lt;li&gt;review structured assessment&lt;/li&gt;
&lt;li&gt;understand the next best action&lt;/li&gt;
&lt;li&gt;revisit repair history and memory-aware recommendations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A lot of the work here was product work, not just UI work: narrowing scope, separating safe DIY from professional repair, and making the recommendations feel trustworthy instead of overly confident.&lt;/p&gt;

&lt;h3&gt;
  
  
  Google Gemini
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Gemini 2.5 Flash&lt;/strong&gt; powers the core multimodal experience.&lt;/p&gt;

&lt;p&gt;It analyzes the uploaded image and returns structured output for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;visible damage&lt;/li&gt;
&lt;li&gt;confidence level&lt;/li&gt;
&lt;li&gt;safety check&lt;/li&gt;
&lt;li&gt;recommended action&lt;/li&gt;
&lt;li&gt;materials needed&lt;/li&gt;
&lt;li&gt;repair steps&lt;/li&gt;
&lt;li&gt;environmental impact band&lt;/li&gt;
&lt;li&gt;history-aware recommendation reasoning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One important design decision was using schema-constrained JSON output so the app could reliably render the same assessment structure every time.&lt;/p&gt;
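
&lt;p&gt;To make the idea concrete, here is a minimal sketch of how a schema-constrained response could be checked before rendering. The field names, the &lt;code&gt;Assessment&lt;/code&gt; type, and &lt;code&gt;parseAssessment&lt;/code&gt; are assumptions for illustration and are not taken from the repo; the sketch only mirrors the output fields listed above.&lt;/p&gt;

```typescript
// Hypothetical assessment shape; field names are illustrative assumptions.
type Action = "repair_at_home" | "temporary_patch" | "professional_repair" | "replace_responsibly";
type Band = "low" | "medium" | "high";

interface Assessment {
  visibleDamage: string;
  confidence: Band;
  safeForDiy: boolean;
  recommendedAction: Action;
  materials: string[];
  repairSteps: string[];
  impactBand: Band;
  memoryReasoning: string;
}

const ACTIONS: string[] = ["repair_at_home", "temporary_patch", "professional_repair", "replace_responsibly"];
const BANDS: string[] = ["low", "medium", "high"];

// True only when the value is an array whose items are all strings.
function isStringArray(value: unknown): boolean {
  return Array.isArray(value) ? value.every((item) => typeof item === "string") : false;
}

// Parse the model's raw JSON text and reject anything that does not match the shape.
function parseAssessment(raw: string): Assessment {
  const data = JSON.parse(raw) as { [key: string]: unknown };
  const checks: boolean[] = [
    typeof data["visibleDamage"] === "string",
    BANDS.indexOf(String(data["confidence"])) !== -1,
    typeof data["safeForDiy"] === "boolean",
    ACTIONS.indexOf(String(data["recommendedAction"])) !== -1,
    isStringArray(data["materials"]),
    isStringArray(data["repairSteps"]),
    BANDS.indexOf(String(data["impactBand"])) !== -1,
    typeof data["memoryReasoning"] === "string",
  ];
  if (checks.indexOf(false) !== -1) {
    throw new Error("Assessment did not match the expected schema");
  }
  return data as unknown as Assessment;
}
```

&lt;p&gt;Validating on the client as well as constraining the model keeps the UI from rendering a half-filled card if a response ever drifts from the expected structure.&lt;/p&gt;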

&lt;h3&gt;
  
  
  Google AntiGravity IDE with Gemini AI
&lt;/h3&gt;

&lt;p&gt;I used &lt;strong&gt;Google AntiGravity IDE with Gemini AI&lt;/strong&gt; to accelerate the initial scaffold and part of the core implementation.&lt;/p&gt;

&lt;p&gt;It helped compress the early build phases, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;app structure&lt;/li&gt;
&lt;li&gt;component setup&lt;/li&gt;
&lt;li&gt;routing&lt;/li&gt;
&lt;li&gt;service wiring&lt;/li&gt;
&lt;li&gt;rapid iteration on working UI flows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Gemini powers the runtime intelligence, while AntiGravity helped accelerate the path from concept to working prototype.&lt;/p&gt;

&lt;h3&gt;
  
  
  Backboard
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Backboard&lt;/strong&gt; powers the persistent memory layer.&lt;/p&gt;

&lt;p&gt;Instead of storing repair history passively, I used it to make the next recommendation better.&lt;/p&gt;

&lt;p&gt;Backboard stores things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prior repair attempts&lt;/li&gt;
&lt;li&gt;what worked or failed&lt;/li&gt;
&lt;li&gt;user preferences&lt;/li&gt;
&lt;li&gt;category-specific history&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then that memory is injected into future assessments so the AI can reference earlier attempts directly.&lt;/p&gt;
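
&lt;p&gt;A rough sketch of what that injection step might look like, assuming the history has already been fetched from the memory layer. The &lt;code&gt;RepairMemory&lt;/code&gt; shape and &lt;code&gt;buildPrompt&lt;/code&gt; are hypothetical names; no Backboard SDK calls are shown here because this only illustrates turning stored attempts into prompt context.&lt;/p&gt;

```typescript
// Hypothetical record of one earlier repair attempt (illustrative shape only).
interface RepairMemory {
  category: string;
  method: string;
  outcome: "worked" | "failed";
  note: string;
}

// Build the text prompt sent alongside the photo, including only
// the history entries that belong to the selected category.
function buildPrompt(category: string, history: RepairMemory[]): string {
  const relevant = history.filter((entry) => entry.category === category);
  const lines = relevant.map(
    (entry) => "- " + entry.method + " (" + entry.outcome + "): " + entry.note
  );
  const memoryBlock = lines.length > 0 ? lines.join("\n") : "- no prior attempts recorded";
  return [
    "Assess the damaged " + category + " item in the attached photo.",
    "Prior repair history for this user:",
    memoryBlock,
    "Prefer methods that worked before and avoid repeating failed ones.",
  ].join("\n");
}
```

&lt;p&gt;Filtering by category before building the prompt keeps unrelated attempts (for example, a failed fabric patch) out of a furniture assessment.&lt;/p&gt;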

&lt;p&gt;That was a key product lesson for me: &lt;strong&gt;memory should influence the recommendation, not just create a history log.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Famv4f3rd41vpcaj9yyrr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Famv4f3rd41vpcaj9yyrr.png" alt="memory should influence the recommendation, not just create a history log" width="800" height="1368"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Auth0 for Agents
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Auth0 for Agents&lt;/strong&gt; provides identity so the memory can stay personal and persistent across sessions.&lt;/p&gt;

&lt;p&gt;Without identity, every user session becomes anonymous and the memory layer loses most of its value. Auth0 made it possible to keep repair history tied to a real user instead of treating every assessment like a first-time interaction.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwtk6b63wq0f52lb61ryd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwtk6b63wq0f52lb61ryd.png" alt="Auth0 for Agents" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Technical flow
&lt;/h3&gt;

&lt;p&gt;User uploads photo&lt;br&gt;
↓&lt;br&gt;
Auth0 identifies the user&lt;br&gt;
↓&lt;br&gt;
Backboard fetches prior repair history for that user and category&lt;br&gt;
↓&lt;br&gt;
Gemini 2.5 Flash analyzes the image plus relevant memory&lt;br&gt;
↓&lt;br&gt;
App renders structured assessment, repair guidance, impact summary, and memory-aware explanation&lt;br&gt;
↓&lt;br&gt;
Completed assessment is written back to Backboard for future use&lt;/p&gt;
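
&lt;p&gt;The flow above can be sketched as a single function with each external service passed in as a dependency. Everything here is an assumption for illustration: the &lt;code&gt;Deps&lt;/code&gt; interface, &lt;code&gt;runAssessment&lt;/code&gt;, and the in-memory stand-ins are hypothetical, and the steps are synchronous only to keep the sketch short (real Auth0, Backboard, and Gemini calls would be asynchronous).&lt;/p&gt;

```typescript
// Hypothetical stand-ins for the Auth0, Backboard, and Gemini steps.
interface Deps {
  identifyUser: (sessionToken: string) => string;
  fetchHistory: (userId: string, category: string) => string[];
  analyze: (photo: string, history: string[]) => string;
  saveAssessment: (userId: string, category: string, summary: string) => void;
}

// One pass through the flow: identify, read memory, analyze, write memory back.
function runAssessment(deps: Deps, sessionToken: string, category: string, photo: string): string {
  const userId = deps.identifyUser(sessionToken);
  const history = deps.fetchHistory(userId, category);
  const summary = deps.analyze(photo, history);
  deps.saveAssessment(userId, category, summary);
  return summary;
}

// In-memory implementation so the flow can be exercised without any network calls.
function createInMemoryDeps(): Deps {
  const store: { [key: string]: string[] } = {};
  return {
    identifyUser: (sessionToken) => "user-" + sessionToken,
    fetchHistory: (userId, category) => (store[userId + ":" + category] || []).slice(),
    analyze: (photo, history) =>
      "assessment of " + photo + " using " + history.length + " prior attempt(s)",
    saveAssessment: (userId, category, summary) => {
      const key = userId + ":" + category;
      store[key] = (store[key] || []).concat([summary]);
    },
  };
}
```

&lt;p&gt;Running &lt;code&gt;runAssessment&lt;/code&gt; twice with the same token and category shows the loop closing: the second call sees the first assessment in its history.&lt;/p&gt;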

&lt;h3&gt;
  
  
  Stack
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Runtime stack:&lt;/strong&gt; &lt;br&gt;
Gemini 2.5 Flash + Backboard SDK + Auth0 for Agents + React 19 + TypeScript + Vite + Tailwind CSS v4 + Firebase Hosting&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build tooling:&lt;/strong&gt; Google AntiGravity IDE with Gemini AI for initial scaffolding and accelerated implementation&lt;/p&gt;

&lt;h3&gt;
  
  
  Scope choices
&lt;/h3&gt;

&lt;p&gt;I kept the app intentionally narrow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Furniture&lt;/li&gt;
&lt;li&gt;Clothing &amp;amp; Textiles&lt;/li&gt;
&lt;li&gt;Cosmetic Appliance Damage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That let me focus on a believable UX and avoid overclaiming.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I learned
&lt;/h3&gt;

&lt;p&gt;The hardest part was not “getting AI output.”&lt;/p&gt;

&lt;p&gt;The hardest part was building a tool that felt honest:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;separating repairability from safety&lt;/li&gt;
&lt;li&gt;making memory visibly useful&lt;/li&gt;
&lt;li&gt;avoiding fake precision&lt;/li&gt;
&lt;li&gt;narrowing the scope enough to keep trust high&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That product discipline mattered more than any single model call.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prize Categories
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Best use of Backboard&lt;/strong&gt; — persistent cross-session repair memory that is injected into every AI call and visibly influences recommendations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best use of Auth0 for Agents&lt;/strong&gt; — identity layer that keeps repair history personal and persistent across sessions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best use of Google Gemini&lt;/strong&gt; — Gemini 2.5 Flash powers the multimodal image analysis and structured assessment flow, and Google AntiGravity IDE with Gemini AI accelerated the build process from scaffold to working prototype&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why I Built This
&lt;/h2&gt;

&lt;p&gt;For Earth Day, I wanted to build something practical, not abstract.&lt;/p&gt;

&lt;p&gt;A lot of sustainability conversations stay high-level. I wanted to make one real behavior easier:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;fix more, toss less.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If a tool can help someone confidently save one chair, shelf, lamp, or household item from being thrown away, that already matters.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://youtu.be/VYdDcGP-wPk" rel="noopener noreferrer"&gt;YouTube - Repair Before Replace&lt;/a&gt;&lt;/p&gt;

</description>
      <category>weekendchallenge</category>
      <category>backboard</category>
      <category>auth0challenge</category>
      <category>gemini</category>
    </item>
    <item>
      <title>Repair Before Replace: an AI-powered circularity assistant with persistent repair memory</title>
      <dc:creator>Prasad Thiriveedi</dc:creator>
      <pubDate>Mon, 20 Apr 2026 00:18:03 +0000</pubDate>
      <link>https://dev.to/tvprasad/repair-before-replace-an-ai-powered-circularity-assistant-with-persistent-repair-memory-5578</link>
      <guid>https://dev.to/tvprasad/repair-before-replace-an-ai-powered-circularity-assistant-with-persistent-repair-memory-5578</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for &lt;a href="https://dev.to/challenges/weekend-2026-04-16"&gt;Weekend Challenge: Earth Day Edition&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I built &lt;strong&gt;Repair Before Replace&lt;/strong&gt;, an AI-powered circularity assistant that helps people decide whether a damaged household item should be repaired at home, patched temporarily, handed to a professional, or replaced responsibly.&lt;/p&gt;

&lt;p&gt;The problem I wanted to solve is simple: people throw away useful items because they are unsure what is fixable, what is safe, and whether repair is worth it. I wanted to build something practical that nudges users toward repair first while still being honest about safety and limits.&lt;/p&gt;

&lt;p&gt;The app lets a user:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;choose a supported category&lt;/li&gt;
&lt;li&gt;upload a photo of a damaged item&lt;/li&gt;
&lt;li&gt;get a structured damage assessment&lt;/li&gt;
&lt;li&gt;see whether the item is safe to repair at home&lt;/li&gt;
&lt;li&gt;get materials and step-by-step guidance when DIY repair is appropriate&lt;/li&gt;
&lt;li&gt;understand the waste impact of replacing versus repairing&lt;/li&gt;
&lt;li&gt;benefit from persistent memory across sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The differentiator is &lt;strong&gt;memory&lt;/strong&gt;. The app does not behave like a stateless image classifier. It remembers prior attempts, preferences, and what worked before, then uses that history to influence future recommendations.&lt;/p&gt;

&lt;p&gt;Examples of the memory behavior:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Your previous glue-only patch failed after washing, so this recommendation uses hand stitching instead.”&lt;/p&gt;

&lt;p&gt;“You’ve had success with wood glue and no power tools, so the same approach applies here.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The result is an AI repair companion whose recommendations get more specific as your repair history grows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Live app:&lt;/strong&gt; &lt;a href="https://repair-before-replace.web.app" rel="noopener noreferrer"&gt;Repair Before Replace&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy7lr21f6zfduttj1djas.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy7lr21f6zfduttj1djas.png" alt=" " width="800" height="1151"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Suggested demo flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Sign in&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;Furniture&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Upload a photo of a damaged item such as a scratched table, cracked shelf, or broken chair&lt;/li&gt;
&lt;li&gt;Review the assessment, repair guidance, and environmental impact summary&lt;/li&gt;
&lt;li&gt;Scroll to the repair history and note how prior attempts influence the current recommendation&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Home page with sofa loaded (Furniture selected, sofa photo, Analyze Damage button)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbqa3sqej2jjocokfc8m8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbqa3sqej2jjocokfc8m8.png" alt=" " width="800" height="1263"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnraaiu53ujlvjhj3hlx4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnraaiu53ujlvjhj3hlx4.png" alt=" " width="800" height="1151"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftae3jpvj0mtr8c6s9wdz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftae3jpvj0mtr8c6s9wdz.png" alt=" " width="800" height="1151"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fint944zfipjar1c6i01h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fint944zfipjar1c6i01h.png" alt=" " width="800" height="2240"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fggl9j0oyzetuitemzaal.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fggl9j0oyzetuitemzaal.png" alt=" " width="800" height="1297"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1cbgnvf7sf06jdra0pvz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1cbgnvf7sf06jdra0pvz.png" alt=" " width="800" height="1796"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Repair Journey / History page - The app builds a record over time
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6lp7tzr5ie41thcqd2xb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6lp7tzr5ie41thcqd2xb.png" alt=" " width="800" height="1368"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GitHub repo:&lt;/strong&gt; &lt;a href="https://github.com/tvprasad/repair-before-replace" rel="noopener noreferrer"&gt;repair-before-replace&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Built It
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Product approach
&lt;/h3&gt;

&lt;p&gt;I intentionally did &lt;strong&gt;not&lt;/strong&gt; build a generic carbon calculator.&lt;/p&gt;

&lt;p&gt;Instead, I focused on one real-world decision:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Should I fix this, patch it, get professional help, or replace it responsibly?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That led to a much more practical Earth Day project than a broad sustainability dashboard.&lt;/p&gt;

&lt;h3&gt;
  
  
  Front end
&lt;/h3&gt;

&lt;p&gt;The UI is built with &lt;strong&gt;React 19 + TypeScript + Vite + Tailwind CSS v4&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I kept the flow simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;select category&lt;/li&gt;
&lt;li&gt;upload photo&lt;/li&gt;
&lt;li&gt;review structured assessment&lt;/li&gt;
&lt;li&gt;understand the next best action&lt;/li&gt;
&lt;li&gt;revisit repair history and memory-aware recommendations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A lot of the work here was product work, not just UI work: narrowing scope, separating safe DIY from professional repair, and making the recommendations feel trustworthy instead of overly confident.&lt;/p&gt;

&lt;h3&gt;
  
  
  Google Gemini
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Gemini 2.5 Flash&lt;/strong&gt; powers the core multimodal experience.&lt;/p&gt;

&lt;p&gt;It analyzes the uploaded image and returns structured output for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;visible damage&lt;/li&gt;
&lt;li&gt;confidence level&lt;/li&gt;
&lt;li&gt;safety check&lt;/li&gt;
&lt;li&gt;recommended action&lt;/li&gt;
&lt;li&gt;materials needed&lt;/li&gt;
&lt;li&gt;repair steps&lt;/li&gt;
&lt;li&gt;environmental impact band&lt;/li&gt;
&lt;li&gt;history-aware recommendation reasoning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One important design decision was using schema-constrained JSON output so the app could reliably render the same assessment structure every time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Google AntiGravity IDE with Gemini AI
&lt;/h3&gt;

&lt;p&gt;I used &lt;strong&gt;Google AntiGravity IDE with Gemini AI&lt;/strong&gt; to accelerate the initial scaffold and part of the core implementation.&lt;/p&gt;

&lt;p&gt;It helped compress the early build phases, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;app structure&lt;/li&gt;
&lt;li&gt;component setup&lt;/li&gt;
&lt;li&gt;routing&lt;/li&gt;
&lt;li&gt;service wiring&lt;/li&gt;
&lt;li&gt;rapid iteration on working UI flows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Gemini powers the runtime intelligence, while AntiGravity helped accelerate the path from concept to working prototype.&lt;/p&gt;

&lt;h3&gt;
  
  
  Backboard
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Backboard&lt;/strong&gt; powers the persistent memory layer.&lt;/p&gt;

&lt;p&gt;Instead of storing repair history passively, I used it to make the next recommendation better.&lt;/p&gt;

&lt;p&gt;Backboard stores things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prior repair attempts&lt;/li&gt;
&lt;li&gt;what worked or failed&lt;/li&gt;
&lt;li&gt;user preferences&lt;/li&gt;
&lt;li&gt;category-specific history&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then that memory is injected into future assessments so the AI can reference earlier attempts directly.&lt;/p&gt;

&lt;p&gt;That was a key product lesson for me: &lt;strong&gt;memory should influence the recommendation, not just create a history log.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Famv4f3rd41vpcaj9yyrr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Famv4f3rd41vpcaj9yyrr.png" alt=" " width="800" height="1368"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Auth0 for Agents
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Auth0 for Agents&lt;/strong&gt; provides identity so the memory can stay personal and persistent across sessions.&lt;/p&gt;

&lt;p&gt;Without identity, every user session becomes anonymous and the memory layer loses most of its value. Auth0 made it possible to keep repair history tied to a real user instead of treating every assessment like a first-time interaction.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwtk6b63wq0f52lb61ryd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwtk6b63wq0f52lb61ryd.png" alt=" " width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Technical flow
&lt;/h3&gt;

&lt;p&gt;User uploads photo&lt;br&gt;
↓&lt;br&gt;
Auth0 identifies the user&lt;br&gt;
↓&lt;br&gt;
Backboard fetches prior repair history for that user and category&lt;br&gt;
↓&lt;br&gt;
Gemini 2.5 Flash analyzes the image plus relevant memory&lt;br&gt;
↓&lt;br&gt;
App renders structured assessment, repair guidance, impact summary, and memory-aware explanation&lt;br&gt;
↓&lt;br&gt;
Completed assessment is written back to Backboard for future use&lt;/p&gt;

&lt;h3&gt;
  
  
  Stack
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Runtime stack:&lt;/strong&gt; &lt;br&gt;
Gemini 2.5 Flash + Backboard SDK + Auth0 for Agents + React 19 + TypeScript + Vite + Tailwind CSS v4 + Firebase Hosting&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build tooling:&lt;/strong&gt; Google AntiGravity IDE with Gemini AI for initial scaffolding and accelerated implementation&lt;/p&gt;

&lt;h3&gt;
  
  
  Scope choices
&lt;/h3&gt;

&lt;p&gt;I kept the app intentionally narrow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Furniture&lt;/li&gt;
&lt;li&gt;Clothing &amp;amp; Textiles&lt;/li&gt;
&lt;li&gt;Cosmetic Appliance Damage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That let me focus on a believable UX and avoid overclaiming.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I learned
&lt;/h3&gt;

&lt;p&gt;The hardest part was not “getting AI output.”&lt;/p&gt;

&lt;p&gt;The hardest part was building a tool that felt honest:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;separating repairability from safety&lt;/li&gt;
&lt;li&gt;making memory visibly useful&lt;/li&gt;
&lt;li&gt;avoiding fake precision&lt;/li&gt;
&lt;li&gt;narrowing the scope enough to keep trust high&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That product discipline mattered more than any single model call.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prize Categories
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Best use of Backboard&lt;/strong&gt; — persistent cross-session repair memory that is injected into every AI call and visibly influences recommendations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best use of Auth0 for Agents&lt;/strong&gt; — identity layer that keeps repair history personal and persistent across sessions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best use of Google Gemini&lt;/strong&gt; — Gemini 2.5 Flash powers the multimodal image analysis and structured assessment flow, and Google AntiGravity IDE with Gemini AI accelerated the build process from scaffold to working prototype&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why I Built This
&lt;/h2&gt;

&lt;p&gt;For Earth Day, I wanted to build something practical, not abstract.&lt;/p&gt;

&lt;p&gt;A lot of sustainability conversations stay high-level. I wanted to make one real behavior easier:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;fix more, toss less.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If a tool can help someone confidently save one chair, shelf, lamp, or household item from being thrown away, that already matters.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://youtu.be/VYdDcGP-wPk" rel="noopener noreferrer"&gt;YouTube - Repair Before Replace&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemini</category>
      <category>backboard</category>
      <category>auth0challenge</category>
    </item>
    <item>
      <title>Zero-Trust Capability Delegation for MCP Agents: How I Built AgentBond</title>
      <dc:creator>Prasad Thiriveedi</dc:creator>
      <pubDate>Sat, 04 Apr 2026 01:15:03 +0000</pubDate>
      <link>https://dev.to/tvprasad/zero-trust-capability-delegation-for-mcp-agents-how-i-built-agentbond-4el1</link>
      <guid>https://dev.to/tvprasad/zero-trust-capability-delegation-for-mcp-agents-how-i-built-agentbond-4el1</guid>
      <description>&lt;p&gt;AgentBond makes agent delegation trust by contract, not trust by accident.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs033hqoy2vs528g0ly5e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs033hqoy2vs528g0ly5e.png" alt="AgentBond Architecture Flow"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem Nobody Is Talking About
&lt;/h2&gt;

&lt;p&gt;Every on-call engineer who has handed off an investigation to an AI agent and watched it call something it was never supposed to call knows this problem.&lt;/p&gt;

&lt;p&gt;The MCP spec defines how agents call tools. It does not define what a worker agent is &lt;em&gt;allowed&lt;/em&gt; to call.&lt;/p&gt;

&lt;p&gt;When an orchestrator delegates work to a worker agent today, the worker inherits everything. There is no scope. There is no expiry. There is no audit trail. If the worker calls a tool outside its mandate, nothing stops it. If it tries to re-delegate to another agent, nothing stops that either.&lt;/p&gt;

&lt;p&gt;This is the confused deputy problem. It is real, it is unaddressed by the MCP spec, and it gets worse as agent systems get more complex.&lt;/p&gt;

&lt;p&gt;AgentBond fixes it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Distinction That Matters
&lt;/h2&gt;

&lt;p&gt;LLM agents decide what they want to do. AgentBond decides what they are actually allowed to do.&lt;/p&gt;

&lt;p&gt;These are different layers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Examples&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Orchestration&lt;/td&gt;
&lt;td&gt;LangGraph, n8n&lt;/td&gt;
&lt;td&gt;Defines how agents connect and sequence work&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Enforcement&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AgentBond&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Governs what each agent is permitted to call&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;LangGraph defines the graph. AgentBond enforces what each node in that graph can actually execute. A worker node that receives a capability token cannot exceed the scope that token grants, regardless of what the LLM reasons. These layers are complementary, not competing.&lt;/p&gt;




&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;The orchestrator issues a &lt;strong&gt;capability token&lt;/strong&gt;: a signed JWT scoped to specific tools, specific resources, and a TTL. The worker presents the token when invoking any tool. The enforcement layer checks four rules before any tool executes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Token signature valid&lt;/li&gt;
&lt;li&gt;Token not expired&lt;/li&gt;
&lt;li&gt;Tool name in &lt;code&gt;allowed_tools&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Resource arguments match &lt;code&gt;resource_scope&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Re-delegation is blocked unless the token explicitly permits it.&lt;/p&gt;

&lt;p&gt;Every delegation and every invocation attempt is recorded in a structured audit log.&lt;/p&gt;

&lt;p&gt;No tool executes without passing all four rules. No exceptions.&lt;/p&gt;
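&lt;p&gt;As a rough illustration (not the AgentBond source), the four rules could be sketched with only the Python standard library. The names &lt;code&gt;sign_token&lt;/code&gt; and &lt;code&gt;check_token&lt;/code&gt;, and every reason code except &lt;code&gt;RESOURCE_OUT_OF_SCOPE&lt;/code&gt;, are assumptions made for this example; the real project uses PyJWT, and a production check would also pin the header algorithm.&lt;/p&gt;

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as compact JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: dict, secret: bytes) -> str:
    """Build a compact HS256 JWT: header.payload.signature (illustrative helper)."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(sig)}"


def check_token(token: str, secret: bytes, tool: str, args: dict) -> tuple[bool, str]:
    """Apply the four rules in order and return (allowed, reason)."""
    try:
        header, payload, sig = token.split(".")
        expected = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
        # Rule 1: signature must match (constant-time comparison).
        if not hmac.compare_digest(expected, _b64url_decode(sig)):
            return False, "INVALID_SIGNATURE"
        claims = json.loads(_b64url_decode(payload))
    except (ValueError, TypeError):
        # Malformed tokens are treated the same as a bad signature.
        return False, "INVALID_SIGNATURE"
    # Rule 2: token must not be expired.
    if time.time() >= claims.get("exp", 0):
        return False, "TOKEN_EXPIRED"
    # Rule 3: tool name must be listed in allowed_tools.
    if tool not in claims.get("allowed_tools", []):
        return False, "TOOL_NOT_ALLOWED"
    # Rule 4: every scoped argument must equal the value the token requires.
    for key, required in claims.get("resource_scope", {}).items():
        if args.get(key) != required:
            return False, "RESOURCE_OUT_OF_SCOPE"
    return True, "OK"


if __name__ == "__main__":
    secret = b"demo-secret-key-of-at-least-32-bytes"
    token = sign_token(
        {
            "allowed_tools": ["read_customer_record"],
            "resource_scope": {"customer_id": "123"},
            "exp": time.time() + 300,
        },
        secret,
    )
    print(check_token(token, secret, "read_customer_record", {"customer_id": "123"}))
    print(check_token(token, secret, "read_customer_record", {"customer_id": "456"}))
```

&lt;p&gt;Because each rule is a plain comparison, a check like this can be unit-tested without any LLM in the loop.&lt;/p&gt;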




&lt;h2&gt;
  
  
  Real LLM Agents
&lt;/h2&gt;

&lt;p&gt;Both the orchestrator and worker are real Claude instances (claude-haiku-4-5-20251001), not hardcoded scripts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Orchestrator&lt;/strong&gt;: Claude receives the task context and reasons about minimum necessary permissions. It calls &lt;code&gt;delegate_capability&lt;/code&gt; with the scope it determines is appropriate. The token parameters are Claude's decision, not a preset.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Worker&lt;/strong&gt;: Claude receives the task and a capability token. It runs a tool_use loop, making real API calls. The enforcement layer intercepts every call and returns the result (including denial reason) back into Claude's context. Claude observes what was blocked and why.&lt;/p&gt;

&lt;p&gt;The task given to the worker: "Retrieve the full record for customer 123. Also retrieve the full record for customer 456."&lt;/p&gt;

&lt;p&gt;The token the orchestrator issued only permits &lt;code&gt;read_customer_record&lt;/code&gt; on &lt;code&gt;customer_id=123&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This is what happens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[+] read_customer_record(customer_id=123)
     ALLOWED -- {'customer_id': '123', 'name': 'Acme Corp', 'tier': 'enterprise'}

[x] read_customer_record(customer_id=456)
     DENIED  -- RESOURCE_OUT_OF_SCOPE: customer_id='456' but token requires customer_id='123'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Real Claude. Real enforcement. In real time. The LLM observes the denial in its context window and cannot proceed.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Token Format
&lt;/h2&gt;

&lt;p&gt;Tokens are JWTs signed with HMAC-SHA256 (HS256) using a shared secret key. Any token can be inspected at jwt.io.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"iss"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"orchestrator-agent-001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"sub"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"worker-agent-001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"jti"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;uuid4&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"allowed_tools"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"read_customer_record"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"resource_scope"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"customer_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"123"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"re_delegation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"exp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1234567890&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"iat"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1234567590&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why JWT? Because most engineers already understand it. The token is self-describing. Any enforcement layer that holds the signing key can validate it without calling back to the issuer. That lets it work across trust boundaries: different teams, cloud accounts, and organizations.&lt;/p&gt;
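&lt;p&gt;To make "self-describing" concrete: the payload segment of a JWT is base64url-encoded JSON, so the claims can be read with the standard library alone, which is essentially what jwt.io does. The sketch below is an illustration only. The helper names &lt;code&gt;peek_claims&lt;/code&gt; and &lt;code&gt;_segment&lt;/code&gt; are hypothetical, the sample token carries a placeholder signature, and reading claims this way is not the same as verifying them.&lt;/p&gt;

```python
import base64
import json


def peek_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying the signature."""
    payload = token.split(".")[1]
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


def _segment(obj: dict) -> str:
    """Encode a dict the way a compact JWT segment is encoded (illustrative helper)."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# A sample token assembled locally; the third segment is a placeholder,
# because peek_claims never looks at the signature.
sample = ".".join(
    [
        _segment({"alg": "HS256", "typ": "JWT"}),
        _segment(
            {
                "sub": "worker-agent-001",
                "allowed_tools": ["read_customer_record"],
                "resource_scope": {"customer_id": "123"},
                "re_delegation": False,
            }
        ),
        "signature-not-checked",
    ]
)

claims = peek_claims(sample)
print(claims["allowed_tools"], claims["resource_scope"])
```

&lt;p&gt;Anything that acts on the claims should still go through signature and expiry checks first; decoding alone proves nothing about who issued the token.&lt;/p&gt;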




&lt;h2&gt;
  
  
  Four MCP Tools
&lt;/h2&gt;

&lt;p&gt;Three primitive tools and one orchestration tool, all exposed via AgentGateway:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;delegate_capability&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Primitive&lt;/td&gt;
&lt;td&gt;Issue a scoped JWT capability token&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;invoke_tool&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Primitive&lt;/td&gt;
&lt;td&gt;Enforce token, dispatch to underlying tool&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;get_audit_log&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Primitive&lt;/td&gt;
&lt;td&gt;Return all delegation and attempt events&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;run_delegation_demo&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Orchestration&lt;/td&gt;
&lt;td&gt;One call: 4 scenarios, full audit trail&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The underlying tools (&lt;code&gt;read_customer_record&lt;/code&gt;, &lt;code&gt;write_customer_record&lt;/code&gt;) are not on the MCP surface. They are only reachable through &lt;code&gt;invoke_tool&lt;/code&gt; after enforcement passes.&lt;/p&gt;

&lt;p&gt;The enforcement layer is deterministic, not probabilistic. An LLM cannot reason its way past a denied token.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Audit Trail
&lt;/h2&gt;

&lt;p&gt;Every run produces a structured audit record. Not a log file. A machine-readable trail of every delegation issued and every invocation attempted, in order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[1] DELEGATION  token=821cefe2  worker=worker-001  tools=['read_customer_record']  scope={'customer_id': '123'}
[2] ATTEMPT     ALLOW  tool=read_customer_record  token=821cefe2
[3] ATTEMPT     DENY   tool=read_customer_record  token=821cefe2  reason=RESOURCE_OUT_OF_SCOPE: customer_id='456' but token requires customer_id='123'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In production, this trail is the difference between "the agent did something" and "here is exactly what it was authorized to do, what it tried to do, and what was blocked."&lt;/p&gt;
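&lt;p&gt;A minimal sketch of such an ordered, append-only trail, assuming an in-memory list as the backing store. The class names &lt;code&gt;AuditEvent&lt;/code&gt; and &lt;code&gt;AuditTrail&lt;/code&gt; and the field layout are assumptions for illustration and are not taken from &lt;code&gt;audit/log.py&lt;/code&gt;.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class AuditEvent:
    """One entry in the trail; frozen so a recorded event cannot be reassigned."""

    seq: int
    kind: str  # "DELEGATION" or "ATTEMPT"
    token_id: str
    details: dict[str, Any]


@dataclass
class AuditTrail:
    """Append-only, in-memory trail (hypothetical stand-in for the real audit store)."""

    _events: list[AuditEvent] = field(default_factory=list)

    def record(self, kind: str, token_id: str, **details: Any) -> AuditEvent:
        # Sequence numbers start at 1 and follow insertion order.
        event = AuditEvent(len(self._events) + 1, kind, token_id, dict(details))
        self._events.append(event)
        return event

    def events(self) -> tuple[AuditEvent, ...]:
        # Hand back a tuple so callers cannot append to or reorder the trail.
        return tuple(self._events)


trail = AuditTrail()
trail.record(
    "DELEGATION",
    "821cefe2",
    worker="worker-001",
    tools=["read_customer_record"],
    scope={"customer_id": "123"},
)
trail.record("ATTEMPT", "821cefe2", decision="ALLOW", tool="read_customer_record")
trail.record(
    "ATTEMPT",
    "821cefe2",
    decision="DENY",
    tool="read_customer_record",
    reason="RESOURCE_OUT_OF_SCOPE",
)
for event in trail.events():
    print(f"[{event.seq}] {event.kind} token={event.token_id} {event.details}")
```

&lt;p&gt;Keeping the write path behind a single &lt;code&gt;record&lt;/code&gt; method is what would let the storage be swapped for a database later without touching callers.&lt;/p&gt;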




&lt;h2&gt;
  
  
  AgentGateway: Playground-Accessible
&lt;/h2&gt;

&lt;p&gt;AgentBond ships with an AgentGateway configuration that exposes all four tools behind a production-grade HTTP proxy on port 3001.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentgateway &lt;span class="nt"&gt;-f&lt;/span&gt; agentgateway/config.yaml
&lt;span class="c"&gt;# Playground at http://localhost:15000/ui&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open the playground, connect to &lt;code&gt;http://localhost:3001/&lt;/code&gt;, select &lt;code&gt;run_delegation_demo&lt;/code&gt;, and invoke it with &lt;code&gt;{}&lt;/code&gt;. The full 4-scenario demo runs and returns all outcomes plus the audit trail. No CLI required. No JWT copying.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;run_delegation_demo&lt;/code&gt; was designed as a first-class orchestration tool from day one, not retrofitted. A single MCP call runs the complete enforcement proof.&lt;/p&gt;




&lt;h2&gt;
  
  
  Agent Roles
&lt;/h2&gt;

&lt;p&gt;Every agent in AgentBond has a declared role. Every role has a defined scope. Every scope is enforced.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Responsibility&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Security Orchestrator&lt;/td&gt;
&lt;td&gt;&lt;code&gt;orchestrator.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Reasons about minimum necessary permissions, issues capability token&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Worker&lt;/td&gt;
&lt;td&gt;&lt;code&gt;worker.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Executes assigned task within token scope, reports denials&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gatekeeper&lt;/td&gt;
&lt;td&gt;&lt;code&gt;delegation/enforcer.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Enforces four rules before any tool executes, no exceptions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auditor&lt;/td&gt;
&lt;td&gt;&lt;code&gt;audit/log.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Records every delegation and attempt in order, immutable&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This pattern runs through all VPL Solutions products:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dead Letter Oracle&lt;/strong&gt; — Gatekeeper evaluates replay safety before any message is replayed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Meridian&lt;/strong&gt; — Confidence gate enforces retrieval quality before any answer is returned&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AgentBond&lt;/strong&gt; — Enforcement layer gates every tool call before it reaches the underlying tool&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The thread: every agent has a declared role, every role has a defined scope, every scope is enforced before action executes.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real-World Applications
&lt;/h2&gt;

&lt;p&gt;AgentBond addresses a class of problems that arise in any regulated or high-stakes multi-agent deployment:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Healthcare and PBM&lt;/strong&gt; — A clinical AI orchestrator delegates tasks to sub-agents. Each receives a scoped token: read patient records for a specific patient ID, not write them, not re-delegate. Every access is auditable against HIPAA requirements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enterprise AI copilots&lt;/strong&gt; — A ServiceNow ops copilot delegates investigation steps to specialist agents. Each delegation is time-bounded and resource-scoped. No agent can exceed its mandate mid-incident.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Federated multi-agent systems&lt;/strong&gt; — In architectures spanning trust boundaries, capability tokens are the contract that crosses those boundaries. Self-validating, no callback to issuer required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Financial services and federal&lt;/strong&gt; — Every automated action is traceable to an authorized delegation chain. The audit trail is the compliance artifact.&lt;/p&gt;




&lt;h2&gt;
  
  
  By the Numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;4 MCP tools: 3 primitive, 1 orchestration&lt;/li&gt;
&lt;li&gt;37 tests: token (8), enforcer (10), tools (12), demo flow (3)&lt;/li&gt;
&lt;li&gt;2 real LLM agents: Claude Haiku orchestrator + worker&lt;/li&gt;
&lt;li&gt;Zero LLM dependency in the test suite: enforcement is fully deterministic&lt;/li&gt;
&lt;li&gt;JWT tokens inspectable at jwt.io during live demo&lt;/li&gt;
&lt;li&gt;Port 3001: no conflict with Dead Letter Oracle on 3000&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ADR-Driven Build
&lt;/h2&gt;

&lt;p&gt;Six Architecture Decision Records written before implementation:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;ADR&lt;/th&gt;
&lt;th&gt;Decision&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ADR-001&lt;/td&gt;
&lt;td&gt;Build AgentBond: confused deputy problem is real, unaddressed by MCP spec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ADR-002&lt;/td&gt;
&lt;td&gt;PyJWT + HMAC-SHA256: universally understood, inspectable, works across trust boundaries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ADR-003&lt;/td&gt;
&lt;td&gt;Enforce inside invoke_tool: audit-precise, self-contained, fully testable without mocking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ADR-004&lt;/td&gt;
&lt;td&gt;run_delegation_demo as first-class tool: designed in, not retrofitted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ADR-014&lt;/td&gt;
&lt;td&gt;LLM-backed agents: Claude Haiku for orchestrator and worker&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ADR-015&lt;/td&gt;
&lt;td&gt;Publish to agentregistry: YAML metadata entry for MCP ecosystem discoverability&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/tvprasad/agentbond
&lt;span class="nb"&gt;cd &lt;/span&gt;agentbond
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
&lt;span class="c"&gt;# Set AGENTBOND_SECRET_KEY (min 32 bytes) and ANTHROPIC_API_KEY&lt;/span&gt;
python main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub: &lt;a href="https://github.com/tvprasad/agentbond" rel="noopener noreferrer"&gt;github.com/tvprasad/agentbond&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Limitations and Future Extensions
&lt;/h2&gt;

&lt;p&gt;AgentBond is an MVP, and the current scope is intentional. The table below lists the known extension points and where each one connects:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Limitation&lt;/th&gt;
&lt;th&gt;Future extension&lt;/th&gt;
&lt;th&gt;Where it connects&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;In-memory audit only&lt;/td&gt;
&lt;td&gt;Swap &lt;code&gt;DelegationAudit&lt;/code&gt; storage for a DB write&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;audit/log.py&lt;/code&gt; — no calling code changes needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No token revocation&lt;/td&gt;
&lt;td&gt;Add a revocation registry checked inside &lt;code&gt;check()&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;delegation/enforcer.py&lt;/code&gt; before rule 3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Single-hop delegation&lt;/td&gt;
&lt;td&gt;Walk &lt;code&gt;parent_delegation_id&lt;/code&gt; chain for multi-hop&lt;/td&gt;
&lt;td&gt;JWT claim already stored; add chain validator&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HMAC-SHA256 only&lt;/td&gt;
&lt;td&gt;Add asymmetric key support (RS256) for cross-org&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;delegation/token.py&lt;/code&gt; — isolated to encode/decode&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No scope narrowing on re-delegation&lt;/td&gt;
&lt;td&gt;Child token must be a strict subset of parent&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;delegation/enforcer.py&lt;/code&gt; &lt;code&gt;check_re_delegation()&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No external policy engine&lt;/td&gt;
&lt;td&gt;Replace inline rules with OPA or custom service&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;EnforcementResult&lt;/code&gt; is already the abstraction boundary&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The design is intentionally layered so each extension lands in one place.&lt;/p&gt;




&lt;h2&gt;
  
  
  Production-Grade Standards
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;37 tests: zero flaky, no live API required&lt;/li&gt;
&lt;li&gt;ruff lint and format enforced in CI&lt;/li&gt;
&lt;li&gt;GitHub Actions: test matrix on Python 3.12 and 3.13&lt;/li&gt;
&lt;li&gt;Branch protection: CI must pass before merge&lt;/li&gt;
&lt;li&gt;Apache 2.0 license&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Built by Prasad Thiriveedi, VPL Solutions LLC&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Submitted to MCP_HACK//26, Secure &amp;amp; Govern MCP track&lt;/em&gt;&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>security</category>
      <category>python</category>
    </item>
    <item>
      <title>Governing AI Agent Decisions with MCP: How I Built Dead Letter Oracle</title>
      <dc:creator>Prasad Thiriveedi</dc:creator>
      <pubDate>Tue, 31 Mar 2026 00:52:56 +0000</pubDate>
      <link>https://dev.to/tvprasad/governing-ai-agent-decisions-with-mcp-how-i-built-dead-letter-oracle-2607</link>
      <guid>https://dev.to/tvprasad/governing-ai-agent-decisions-with-mcp-how-i-built-dead-letter-oracle-2607</guid>
      <description>&lt;p&gt;Dead Letter Oracle turns failed events into governed replay decisions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6o64fdgv7eii6vuqste2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6o64fdgv7eii6vuqste2.png" alt="Dead Letter Oracle Architecture" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem Nobody Solves
&lt;/h2&gt;

&lt;p&gt;A failed message hits the DLQ. The fix looks obvious. The replay still breaks production.&lt;/p&gt;

&lt;p&gt;Every on-call engineer who has manually replayed a DLQ message and watched it break production again knows this problem.&lt;/p&gt;

&lt;p&gt;In event-driven systems, messages fail silently. They land in a dead-letter queue with a vague error and an angry on-call engineer staring at them. The diagnosis is manual. The fix is a guess. The replay decision, whether to reprocess the message, is made without confidence scoring, without governance, and without an audit trail.&lt;/p&gt;

&lt;p&gt;Most AI agent demos show you the happy path: the agent gets it right on the first try.&lt;/p&gt;

&lt;p&gt;Dead Letter Oracle is not that demo.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Closed Loop
&lt;/h2&gt;

&lt;p&gt;Dead Letter Oracle turns failed events into governed replay decisions. It does not just diagnose. It reasons through a fix, tests it, &lt;strong&gt;revises when confidence is too low&lt;/strong&gt;, makes a governed ALLOW/WARN/BLOCK decision, and shows every step of its reasoning.&lt;/p&gt;

&lt;p&gt;The full loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Read the failed DLQ message via &lt;code&gt;dlq_read_message&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Validate the payload via &lt;code&gt;schema_validate&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;LLM proposes an initial fix (plausible, but high-level)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;replay_simulate&lt;/code&gt; tests the fix: confidence &lt;strong&gt;0.28&lt;/strong&gt;, too low to proceed&lt;/li&gt;
&lt;li&gt;LLM revises with a concrete, operationally safe fix&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;replay_simulate&lt;/code&gt; re-evaluates: confidence &lt;strong&gt;0.91&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Gatekeeper issues &lt;strong&gt;WARN&lt;/strong&gt;: production requires manual approval before live replay&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The deliberate first-fix failure is the core design moment. The first fix is plausible but not operationally safe. Simulation catches the weakness. Revision becomes concrete. Governance still restrains production replay even at 0.91.&lt;/p&gt;

&lt;p&gt;A system that always succeeds on the first try is not reasoning. It is pattern-matching.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Governance Layer
&lt;/h2&gt;

&lt;p&gt;Most agent demos skip governance. Dead Letter Oracle makes it the centerpiece.&lt;/p&gt;

&lt;p&gt;The Gatekeeper evaluates four independent factors before issuing a decision:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;th&gt;What it checks&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Schema&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Is the original mismatch resolved by the proposed fix?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Simulation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;What is the replay confidence score?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fix&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Has a confirmed, operationally specific fix been applied?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Environment&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Is this production or staging?&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Why WARN instead of ALLOW at 0.91?&lt;/strong&gt; Because the environment is production. The Gatekeeper applies a higher confidence threshold in production than in staging. A 0.91 confidence fix in staging gets ALLOW. The same fix in production gets WARN. A human operator reviews before the live replay proceeds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When does BLOCK trigger?&lt;/strong&gt; If simulation confidence stays below threshold after revision, or if no confirmed fix was applied. The Gatekeeper does not reward effort. It rewards verified outcomes.&lt;/p&gt;

&lt;p&gt;This is not a hardcoded if/else. It is multi-factor evaluation, the same pattern used in access control and fraud detection systems. The Gatekeeper is the governance layer, not a convenience wrapper.&lt;/p&gt;
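&lt;p&gt;One way to picture the four factors feeding a single decision is sketched below. Everything specific in it is an assumption: the &lt;code&gt;Assessment&lt;/code&gt; type, the &lt;code&gt;decide&lt;/code&gt; function, and especially the numeric thresholds, which were chosen only so that 0.91 yields ALLOW in staging and WARN in production as described above. The real Gatekeeper may weigh and combine the factors differently.&lt;/p&gt;

```python
from dataclasses import dataclass

# Hypothetical thresholds, picked for illustration only.
MIN_CONFIDENCE = 0.80  # below this, replay is blocked in any environment
ALLOW_AT = {"staging": 0.80, "production": 0.95}  # per-environment bar for ALLOW


@dataclass(frozen=True)
class Assessment:
    """The four factors the Gatekeeper looks at, gathered in one place."""

    schema_resolved: bool  # Schema: is the original mismatch resolved?
    confidence: float  # Simulation: replay confidence score
    fix_confirmed: bool  # Fix: was a concrete fix actually applied?
    environment: str  # Environment: "staging" or "production"


def decide(a: Assessment) -> tuple[str, str]:
    """Return (decision, reason), where decision is ALLOW, WARN, or BLOCK."""
    if not a.schema_resolved:
        return "BLOCK", "schema mismatch not resolved"
    if not a.fix_confirmed:
        return "BLOCK", "no confirmed fix applied"
    if MIN_CONFIDENCE > a.confidence:
        return "BLOCK", "simulation confidence below minimum"
    if a.confidence >= ALLOW_AT[a.environment]:
        return "ALLOW", "all factors passed"
    # Verified fix, but not confident enough for an unattended replay here.
    return "WARN", "manual approval required in " + a.environment


print(decide(Assessment(True, 0.91, True, "staging")))
print(decide(Assessment(True, 0.91, True, "production")))
print(decide(Assessment(True, 0.28, True, "production")))
```

&lt;p&gt;The point of the sketch is the shape: the confidence score alone never decides the outcome, and the same score can produce different decisions depending on the environment.&lt;/p&gt;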




&lt;h2&gt;
  
  
  Why the MCP Protocol Boundary Matters
&lt;/h2&gt;

&lt;p&gt;The agent and the MCP server run as separate processes communicating over stdio. The protocol boundary is real.&lt;/p&gt;

&lt;p&gt;This matters because the tools are genuinely callable by any MCP-compatible client, not just this agent. The tools are a contract, not an implementation detail.&lt;/p&gt;

&lt;p&gt;Four MCP tools: three deterministic, one orchestration.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Input&lt;/th&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dlq_read_message&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Deterministic&lt;/td&gt;
&lt;td&gt;file path&lt;/td&gt;
&lt;td&gt;parsed DLQ message&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;schema_validate&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Deterministic&lt;/td&gt;
&lt;td&gt;payload, expected schema&lt;/td&gt;
&lt;td&gt;valid/errors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;replay_simulate&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Deterministic&lt;/td&gt;
&lt;td&gt;original message, proposed fix&lt;/td&gt;
&lt;td&gt;confidence score, likelihood, reason&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;agent_run_incident&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Orchestration&lt;/td&gt;
&lt;td&gt;file path&lt;/td&gt;
&lt;td&gt;gatekeeper decision + 7-step trace&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The LLM is the interpretation layer only. It proposes and revises. The deterministic tools measure and verify. The orchestration tool composes them into a governed pipeline callable from any MCP client.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How is the confidence score calculated?&lt;/strong&gt; &lt;code&gt;replay_simulate&lt;/code&gt; evaluates schema validity of the proposed fix, fix specificity (concrete value vs high-level direction), and replay rule alignment. A high-level fix like "align producer schema" scores low because it describes intent, not action. A concrete fix like &lt;code&gt;user_id="12345"&lt;/code&gt; scores high because it is directly verifiable.&lt;/p&gt;




&lt;h2&gt;
  
  
  AgentGateway: Real HTTP Transport
&lt;/h2&gt;

&lt;p&gt;Dead Letter Oracle ships with an &lt;a href="https://github.com/agentgateway/agentgateway" rel="noopener noreferrer"&gt;AgentGateway&lt;/a&gt; configuration that exposes all four MCP tools behind a production-grade HTTP proxy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentgateway &lt;span class="nt"&gt;-f&lt;/span&gt; agentgateway/config.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The gateway adds CORS, session tracking, and a live playground UI at &lt;code&gt;localhost:15000/ui&lt;/code&gt;. Any client (a browser, a remote agent, or a CI pipeline) can invoke the tools at &lt;code&gt;http://localhost:3000/&lt;/code&gt; without spawning a subprocess.&lt;/p&gt;

&lt;p&gt;Open the playground, connect to &lt;code&gt;http://localhost:3000/&lt;/code&gt;, select &lt;code&gt;agent_run_incident&lt;/code&gt;, and invoke it with &lt;code&gt;{"file_path": "data/sample_dlq.json"}&lt;/code&gt;. The full governed pipeline runs and returns both simulations, the gatekeeper decision, and the complete 7-step trace from a browser. No CLI required.&lt;/p&gt;

&lt;p&gt;The agent runtime is HTTP-first: it probes the gateway URL before each tool call batch and falls back to stdio if the gateway is not running. The system works in both modes. The transport layer is transparent to the planner.&lt;/p&gt;
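&lt;p&gt;The probe-then-fall-back behavior can be sketched roughly as follows. This is an assumption about the general shape, not the project's code: the function name &lt;code&gt;choose_transport&lt;/code&gt;, the timeout, and the rule "any HTTP response counts as up" are all illustrative.&lt;/p&gt;

```python
import urllib.error
import urllib.request


def choose_transport(gateway_url: str, timeout: float = 0.5) -> str:
    """Return "http" if something answers at gateway_url, otherwise "stdio"."""
    # An empty ProxyHandler bypasses any proxy configured in the environment,
    # so the probe talks to the local gateway directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(gateway_url, timeout=timeout):
            pass
        return "http"
    except urllib.error.HTTPError:
        # The server replied, even if with an error status, so the gateway is up.
        return "http"
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: fall back to stdio.
        return "stdio"


if __name__ == "__main__":
    # With no gateway listening on port 3000 this prints "stdio".
    print(choose_transport("http://localhost:3000/"))
```

&lt;p&gt;Running a probe like this before each batch of tool calls, rather than once at startup, is what would let the agent pick up a gateway that starts or stops mid-session.&lt;/p&gt;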




&lt;h2&gt;
  
  
  The BlackBox Reasoning Trace
&lt;/h2&gt;

&lt;p&gt;Every run produces a structured 7-step audit record:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[1] READ MESSAGE     event=user_created, error=Schema validation failed
[2] VALIDATE        user_id: expected string, got int
[3] PROPOSE FIX     Align producer schema: cast user_id to string
[4] SIMULATE (1)    confidence=0.28, likelihood=low
[5] REVISE FIX      Set user_id="12345" in payload before replay
[6] SIMULATE (2)    confidence=0.91, likelihood=high
[7] GOVERN          WARN: fix validated, prod environment requires manual approval
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not a log. It is a structured audit record: every tool call, every LLM step, every policy trigger, in order. In production, this is what you attach to the incident ticket. It is the difference between "the agent decided to replay" and "here is exactly why."&lt;/p&gt;




&lt;h2&gt;
  
  
  Business Value
&lt;/h2&gt;

&lt;p&gt;Dead Letter Oracle reduces four categories of operational risk:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Risky manual replays&lt;/strong&gt;: confidence scoring and governance replace gut-feel decisions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MTTR for DLQ incidents&lt;/strong&gt;: the full loop runs in seconds, not hours of manual debugging&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Repeated failure loops&lt;/strong&gt;: simulation catches fixes that would fail again before they reach production&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit gaps&lt;/strong&gt;: every decision is traceable, every step is recorded&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Three Entry Points, One Implementation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Entry point 1: AgentGateway playground (browser, no setup)&lt;/span&gt;
&lt;span class="c"&gt;# Open http://localhost:15000/ui/playground/&lt;/span&gt;
&lt;span class="c"&gt;# Invoke agent_run_incident with {"file_path": "data/sample_dlq.json"}&lt;/span&gt;

&lt;span class="c"&gt;# Entry point 2: HTTP API&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8000/run-incident &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"file_path": "data/sample_dlq.json"}'&lt;/span&gt;

&lt;span class="c"&gt;# Entry point 3: CLI&lt;/span&gt;
python main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One implementation (&lt;code&gt;mcp_server/tools.run_incident&lt;/code&gt;), three surfaces. The MCP tool, the HTTP API, and the CLI all call the same function.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/tvprasad/dead-letter-oracle
&lt;span class="nb"&gt;cd &lt;/span&gt;dead-letter-oracle
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
&lt;span class="c"&gt;# Set LLM_PROVIDER and credentials (azure_openai, anthropic, or ollama)&lt;/span&gt;
python main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub: &lt;a href="https://github.com/tvprasad/dead-letter-oracle" rel="noopener noreferrer"&gt;github.com/tvprasad/dead-letter-oracle&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  ADR-Driven Build
&lt;/h2&gt;

&lt;p&gt;Every architectural decision was documented in an Architecture Decision Record before a line of code was written. Nine ADRs cover the MCP transport strategy, the distinction between deterministic and orchestration tools, Gatekeeper multi-factor evaluation, the BlackBox audit trace, AgentGateway integration, and the Agent HTTP API. The ADRs are in the repo.&lt;/p&gt;




&lt;h2&gt;
  
  
  By the Numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Confidence change per run: from 0.28 to 0.91 (deliberate two-pass design)&lt;/li&gt;
&lt;li&gt;7-step structured audit trace per incident&lt;/li&gt;
&lt;li&gt;23 tests: zero flaky, LLM fully mocked&lt;/li&gt;
&lt;li&gt;Full pipeline completes in under 2 seconds on local Ollama&lt;/li&gt;
&lt;li&gt;4 MCP tools: 3 deterministic, 1 orchestration&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Production-Grade Standards
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;ruff lint and format enforced in CI&lt;/li&gt;
&lt;li&gt;GitHub Actions: test matrix on Python 3.12 and 3.13&lt;/li&gt;
&lt;li&gt;Branch protection: CI must pass before merge&lt;/li&gt;
&lt;li&gt;Apache 2.0 license&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Built by Prasad Thiriveedi, VPL Solutions LLC&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Submitted to MCP_HACK//26, Secure &amp;amp; Govern MCP track&lt;/em&gt;&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>agentgateway</category>
      <category>python</category>
    </item>
  </channel>
</rss>
