<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Urvil Joshi</title>
    <description>The latest articles on DEV Community by Urvil Joshi (@urvvil).</description>
    <link>https://dev.to/urvvil</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3508528%2F261bdfdb-5c6b-4857-a9d9-ca8a09b51641.jpg</url>
      <title>DEV Community: Urvil Joshi</title>
      <link>https://dev.to/urvvil</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/urvvil"/>
    <language>en</language>
    <item>
      <title>I Built My Own Spec-Driven Dev Workflow in Claude Code. Here’s What I Learned.</title>
      <dc:creator>Urvil Joshi</dc:creator>
      <pubDate>Tue, 26 May 2026 04:58:04 +0000</pubDate>
      <link>https://dev.to/urvvil/i-built-my-own-spec-driven-dev-workflow-in-claude-code-heres-what-i-learned-5le</link>
      <guid>https://dev.to/urvvil/i-built-my-own-spec-driven-dev-workflow-in-claude-code-heres-what-i-learned-5le</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3x8s01shqbca1zwqzuj5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3x8s01shqbca1zwqzuj5.png" width="800" height="458"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I’ve been using AI to code for some time now: Copilot, Claude Code, Codex, Pi. I’ve shipped code with them. I’ve also spent more time than I’d like to admit fixing the things they confidently produced.&lt;/p&gt;

&lt;p&gt;A few months ago I started seeing a pattern I couldn’t ignore. The bugs weren’t random. They were the same bugs, over and over, across different projects. AI would write code that &lt;em&gt;looked&lt;/em&gt; right. It would compile. The obvious test case would pass. And then it would fail on the edge case nobody asked about.&lt;/p&gt;

&lt;p&gt;I thought it was my prompting. So I got better at prompting. The bugs didn’t get fewer; they got more subtle.&lt;/p&gt;

&lt;p&gt;Then I built a workflow focused on human gates for planning and review. If you want to check that article out, here is the &lt;a href="https://medium.com/@urvvil08/built-an-orchestrator-ai-agent-that-takes-my-issue-to-pull-requests-ab1912c93b52" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;em&gt;link&lt;/em&gt;&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Then I came across spec-driven development (SDD). It was an obvious upgrade to my workflow, so I built my own SDD workflow in Claude Code. I’m going to walk you through what I built and what I learned. I’m still figuring it out. This is not a “&lt;em&gt;here’s the answer&lt;/em&gt;” post. This is a “&lt;em&gt;here’s where I am&lt;/em&gt;” post.&lt;/p&gt;

&lt;h3&gt;
  
  
  🍥The Experiment That Made Me Care
&lt;/h3&gt;

&lt;p&gt;A few weeks ago I ran an experiment. I had built a refund endpoint for a Spring Boot project. Standard stuff. I decided to add a partial refund feature to it and use that to compare vibe coding with SDD.&lt;/p&gt;

&lt;p&gt;I prompted Claude Code with a reasonable description. Out came code that compiled and ran. I tested a single refund. It worked.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0s0d18t29rfsmx6x4bsm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0s0d18t29rfsmx6x4bsm.png" width="800" height="180"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9lwzxpf2qa2gfania5fo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9lwzxpf2qa2gfania5fo.png" width="634" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk98njgcfoezbeyh43v2p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk98njgcfoezbeyh43v2p.png" width="800" height="157"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then I ran two refund requests at the same time on the same order. Both succeeded. The order’s total was $100. The total refunded came out to $150.&lt;/p&gt;

&lt;p&gt;The customer would have walked away with an extra $50.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0pf8of27j5ug50fhc2ka.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0pf8of27j5ug50fhc2ka.png" width="799" height="384"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The code wasn’t &lt;em&gt;broken&lt;/em&gt; in the usual sense. It had a check. It compared the refund amount to the order total. The check was just wrong under concurrent load, a classic race condition: both requests read the same already-refunded amount, both passed the check, and both wrote their refund. The AI didn’t handle concurrency because I didn’t tell it to. I didn’t tell it because I was using it as a &lt;strong&gt;&lt;em&gt;search engine&lt;/em&gt;&lt;/strong&gt;, not a &lt;strong&gt;&lt;em&gt;pair programmer&lt;/em&gt;&lt;/strong&gt;. And I wasn’t thinking about concurrency because the prompt-and-go workflow doesn’t make you think about it.&lt;/p&gt;
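
&lt;p&gt;To make the failure concrete, here is a minimal Python sketch of the check-then-act pattern described above. It is not the project’s Spring Boot code: the Order class, refund_unsafe, and the two $75 requests are hypothetical. A threading.Barrier forces both requests to finish the balance check before either one records its refund, so the overrefund happens on every run instead of only under unlucky timing.&lt;/p&gt;

```python
import threading


class Order:
    """Hypothetical in-memory order, standing in for a real entity."""

    def __init__(self, total_cents: int):
        self.total_cents = total_cents
        self.refunded_cents = 0
        self.write_lock = threading.Lock()


def refund_unsafe(order: Order, amount_cents: int, gap: threading.Barrier) -> bool:
    # Check: would this refund push the total past the order amount?
    if order.refunded_cents + amount_cents > order.total_cents:
        return False
    # Both requests wait here until both have passed the check,
    # reproducing the race deterministically.
    gap.wait()
    # Act: the write itself is atomic, but the check above is already stale.
    with order.write_lock:
        order.refunded_cents += amount_cents
    return True


order = Order(total_cents=100_00)  # $100.00
gap = threading.Barrier(2)
results = []


def request():
    results.append(refund_unsafe(order, 75_00, gap))  # hypothetical $75.00 refund


threads = [threading.Thread(target=request) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results, order.refunded_cents / 100)  # [True, True] 150.0
```

&lt;p&gt;Note that the increment is guarded by a lock here, and it doesn’t help: the bug lives in the gap between the check and the write.&lt;/p&gt;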

&lt;p&gt;That’s the moment I stopped blaming prompts.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✨What I Actually Realized
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm1js8x1xjm0pbbh1msc0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm1js8x1xjm0pbbh1msc0.png" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With the SDD approach, you are not just writing a prompt. When you start building a feature, you should already know in your head how you will build it.&lt;/p&gt;

&lt;p&gt;That’s how we did coding previously, right?&lt;/p&gt;

&lt;p&gt;You get a feature. You create a design. You put it somewhere: Obsidian, Notion, a notepad, your notebook, whatever. You draw the full picture. You know in your head what code you’ll add, what design pattern you’ll use, what the edge cases are. Once you know everything, you start coding.&lt;/p&gt;

&lt;p&gt;That’s what spec-driven development is asking you to do with AI.&lt;/p&gt;

&lt;p&gt;You clarify everything for the agent. You’re not the audience watching it work; you’re the one driving it. You should know how the feature should be built, what changes need to be made, what the requirements are, and what design changes the code needs.&lt;/p&gt;

&lt;p&gt;The AI is incredibly capable at translating clear specs into working code. It’s bad at extracting intent from vague prompts. Once you accept that, the whole approach inverts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh92w5ykn3t5mcaz5ky9s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh92w5ykn3t5mcaz5ky9s.png" width="800" height="453"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  🔍SDD Is Not New (And That’s the Point)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhwfcukvq5ce3awb28klj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhwfcukvq5ce3awb28klj.png" width="799" height="416"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I want to be honest here because I see a lot of takes painting spec-driven development as some breakthrough AI-era methodology. It’s not.&lt;/p&gt;

&lt;p&gt;CORBA’s IDL files in the nineties pioneered the “spec generates code” pattern for interfaces. Protocol Buffers carried that pattern forward in 2001. Test-Driven Development, from the late nineties, established “write the contract first.” Behavior-Driven Development, in 2006, made specs readable in plain English.&lt;/p&gt;

&lt;p&gt;SDD takes those ideas, applies them to entire features, and scales them up with LLMs. Researcher Bryan Finster put it bluntly in a January 2026 paper: &lt;strong&gt;&lt;em&gt;“SDD is not a revolution. It’s just BDD with branding.”&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk2o4lb424ihb9pk1yxuq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk2o4lb424ihb9pk1yxuq.png" width="800" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;He’s mostly right. Still, the branding does matter, because it reminds practitioners that specs should be authoritative, not advisory.&lt;/p&gt;

&lt;p&gt;The reason this works now and didn’t work in the 2000s with UML codegen is that natural language plus LLMs can bridge the gap that diagrams and codegen compilers never could. &lt;em&gt;We’re not inventing the methodology. We’re finally making it viable.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Spec Kit and Kiro
&lt;/h3&gt;

&lt;p&gt;Two tools are getting real attention right now.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcwyf5wufm81szm5hkj46.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcwyf5wufm81szm5hkj46.png" width="800" height="569"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub’s Spec Kit&lt;/strong&gt;, open-sourced in September 2025. It’s a CLI that installs slash commands and templates into your existing project. You run it, and your AI agent (Claude Code, Copilot, Cursor, and over thirty others) gets a structured spec-driven workflow. Good if you want to get started fast.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feo07cmkf5s24aowme51h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feo07cmkf5s24aowme51h.png" width="799" height="416"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AWS’s Kiro&lt;/strong&gt;. A full IDE built on Code OSS, with spec-driven development as a first-class primitive: three documents per feature (requirements, design, tasks), with human approval gates between each. Good if you want the IDE experience.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Both are solid. Honestly. If you want to try SDD today, install one of them.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But if you’re a developer reading this, I assume you have your own way of working. Even in your career, you’ll try different workflows to find what suits your style. Instead of adopting someone else’s structure, you can create your own: something that complements how &lt;em&gt;you&lt;/em&gt; think.&lt;/p&gt;

&lt;p&gt;That’s what I did.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffrqqkh1b64jvqiywht3j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffrqqkh1b64jvqiywht3j.png" width="475" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  🧰What I Built
&lt;/h3&gt;

&lt;p&gt;I built a 14-phase workflow inside Claude Code using its native primitives — subagents, slash commands, hooks, and a status tracking file. No external tools. No third-party install. Just Claude Code’s own capabilities, composed deliberately.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw777afwyuvzo1b0usg1q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw777afwyuvzo1b0usg1q.png" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Eleven Claude subagents, each with one job:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A repo-init agent that reads my codebase and writes a project.md. Tech stack, build commands, test commands, conventions. Every later agent reads this.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fschnc09kcbocmuczw8nd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fschnc09kcbocmuczw8nd.png" width="799" height="135"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An issue-fetch agent that pulls a ticket from GitHub and creates a working folder.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmeeznpv9fogfoh2zbnnq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmeeznpv9fogfoh2zbnnq.png" width="587" height="138"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requirements clarification agents handle business questions. They ask me questions in batches until the requirements are clear.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6tagiky0du6xvo66ma4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6tagiky0du6xvo66ma4.png" width="800" height="219"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A requirements agent that drafts requirements.md from my answers. I review it and leave comments; it resolves them and presents the file again until I’m satisfied with the requirements.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwf38c9yrc1hnzcqnlvn1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwf38c9yrc1hnzcqnlvn1.png" width="799" height="217"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Technical design clarification agents handle technical questions. They ask me questions in batches until the design is clear.&lt;/li&gt;
&lt;li&gt;A technical design agent that drafts design.md. I review it and leave comments, sometimes with pseudocode, to steer it toward the design I want; it revises and presents the file again until I’m satisfied with the design.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvl030k5mtckahjsi7yr6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvl030k5mtckahjsi7yr6.png" width="800" height="487"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A task planner that breaks the design into ordered tasks with explicit test-first requirements.&lt;/li&gt;
&lt;li&gt;A TDD implementation agent that runs one task at a time (red, green, refactor), logging every test command and result to a traceability.md file.&lt;/li&gt;
&lt;li&gt;A review agent that audits the diff against requirements, design, tests, conventions, security, and maintainability.&lt;/li&gt;
&lt;/ul&gt;
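
&lt;p&gt;The traceability log kept by the TDD agent can be pictured with a small Python sketch. It runs a test command, then appends a row recording the task, the TDD stage, the command, and whether it passed. The run_and_log helper and the table layout are hypothetical; the post doesn’t show the real traceability.md format.&lt;/p&gt;

```python
import subprocess
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path

HEADER = "| time | task | stage | command | result |\n|---|---|---|---|---|\n"


def run_and_log(task_id: str, stage: str, cmd: list[str], log: Path) -> int:
    """Run a test command and append one row to a markdown traceability table."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    status = "pass" if result.returncode == 0 else "fail"
    if not log.exists():
        log.write_text(HEADER)
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    command = " ".join(cmd)
    with log.open("a") as fh:
        fh.write(f"| {stamp} | {task_id} | {stage} | {command} | {status} |\n")
    return result.returncode


# Scratch location for the demo; a real workflow would use the task folder.
log = Path(tempfile.mkdtemp()) / "traceability.md"

# Stand-in commands that simulate a failing test, then a passing one.
# Red: the new test should fail before the implementation exists.
red = run_and_log("T1", "red", [sys.executable, "-c", "raise SystemExit(1)"], log)
# Green: after the minimal implementation, the same test should pass.
green = run_and_log("T1", "green", [sys.executable, "-c", "raise SystemExit(0)"], log)
print(red, green)  # 1 0
print(log.read_text())
```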

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe4jctwklhu5vplukb45i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe4jctwklhu5vplukb45i.png" width="800" height="293"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A review resolution agent that fixes the findings the human accepts, plus any custom findings the human adds. The human should review all the code at this stage too: even with all of this, implementation mistakes slip through, and as the developer you are responsible for them and should keep them to a minimum.&lt;/li&gt;
&lt;li&gt;A human resolution review agent that shows the human what was resolved and asks for final approval. This is the gate where you review the diff again if you have time for extra caution.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkm8ah8wjx88cbpmhg5gp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkm8ah8wjx88cbpmhg5gp.png" width="798" height="127"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A final summary, created once the human approves, which updates the final summary.md file.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq88qwo4tnq3mypv7vcrk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq88qwo4tnq3mypv7vcrk.png" width="800" height="324"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A PR agent that drafts the commit message and pull request body.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv50nsolv6x7xq2zkfn1w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv50nsolv6x7xq2zkfn1w.png" width="711" height="230"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Above all of them sits an orchestrator, a thirteenth file, that parses my plain-English messages and routes them to the right subagent. I never type slash commands during the workflow. I just say “approve requirements”, “accept findings 1 and 2, reject 3, and also add this finding”, or “raise PR”, and the orchestrator handles it.&lt;/p&gt;
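
&lt;p&gt;Since the orchestrator is a file that Claude Code interprets, the model does the routing, not a hand-written parser. Purely to show the shape of the mapping from plain English to workflow commands, here is a deterministic keyword router in Python. The route function and the command names it returns are hypothetical.&lt;/p&gt;

```python
import re


def route(message: str) -> dict:
    """Map a plain-English message to a hypothetical workflow command."""
    text = message.strip().lower()

    if text.startswith("approve "):
        # "approve requirements" -> pass the gate for that phase
        return {"command": "approve", "phase": text[len("approve "):].strip()}

    if text == "raise pr":
        return {"command": "raise_pr"}

    if "finding" in text:
        decisions = {"accept": [], "reject": []}
        verb = None
        custom = False
        for token in re.findall(r"[a-z]+|\d+", text):
            if token in decisions:
                verb = token  # numbers after this belong to this verb
            elif token.isdigit() and verb:
                decisions[verb].append(int(token))
            elif token == "add":
                custom = True  # the human is adding a finding of their own
        return {"command": "resolve_findings", **decisions, "custom_finding": custom}

    return {"command": "unknown", "text": text}


print(route("accept findings 1 and 2, reject 3, and also add this finding"))
# {'command': 'resolve_findings', 'accept': [1, 2], 'reject': [3], 'custom_finding': True}
```

&lt;p&gt;A keyword router like this breaks on any phrasing it doesn’t anticipate. That brittleness is the reason letting the model interpret the message is the more forgiving choice.&lt;/p&gt;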

&lt;p&gt;I’m not claiming this is the best structure. It’s &lt;em&gt;mine&lt;/em&gt;. It fits how I work. Yours would look different and should.&lt;/p&gt;

&lt;h3&gt;
  
  
  🏁What It Caught On the Refund Bug
&lt;/h3&gt;

&lt;p&gt;I ran the same refund task through this workflow.&lt;/p&gt;

&lt;p&gt;Here is what the very first clarification agent raised, before any code existed:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9vfdpe2l4rs6dlf8ad41.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9vfdpe2l4rs6dlf8ad41.png" width="799" height="259"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;None of these were in the ticket description. None would have been caught by vibe coding. Every single one would become a bug if missed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp6bdqeyi0231uc1gwkbj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp6bdqeyi0231uc1gwkbj.png" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That’s the whole thing. The clarification phase is where the bugs that ship in production get caught before any code exists.&lt;/p&gt;

&lt;p&gt;The agent didn’t catch the race condition. I caught it, because the workflow forced me to think about concurrency &lt;em&gt;before&lt;/em&gt; I let the AI write anything.&lt;/p&gt;

&lt;p&gt;I’m the one who solved the bug. The workflow just made sure I didn’t skip the question.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔍The Results with the Spec-Driven Development Flow
&lt;/h3&gt;

&lt;p&gt;I again ran two refund requests at the same time on the same order. This time, as expected, one succeeded and one did not. The order’s total was $100. The total refunded came out to $90.&lt;/p&gt;
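
&lt;p&gt;The fix boils down to making the check and the write a single atomic step. Here is a minimal in-process Python sketch of that idea. It is hypothetical, not the code the workflow produced: the Order class, refund_safe, and the $90 request amount are assumptions chosen to match the reported $90 total.&lt;/p&gt;

```python
import threading


class Order:
    """Hypothetical in-memory order used only for this sketch."""

    def __init__(self, total_cents: int):
        self.total_cents = total_cents
        self.refunded_cents = 0
        self.lock = threading.Lock()


def refund_safe(order: Order, amount_cents: int) -> bool:
    # Check and write happen under one lock, so no other refund can
    # run between reading the balance and updating it.
    with order.lock:
        if order.refunded_cents + amount_cents > order.total_cents:
            return False  # would exceed the order total, so reject
        order.refunded_cents += amount_cents
        return True


order = Order(total_cents=100_00)  # $100.00
start = threading.Barrier(2)
results = []


def request():
    start.wait()  # release both requests at the same moment
    results.append(refund_safe(order, 90_00))  # hypothetical $90.00 refund


threads = [threading.Thread(target=request) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results), order.refunded_cents / 100)  # [False, True] 90.0
```

&lt;p&gt;An in-process lock only protects a single instance. In a Spring Boot service backed by a database, the usual equivalents are a pessimistic row lock (SELECT ... FOR UPDATE), optimistic locking with a version column, or a conditional UPDATE that only succeeds while the remaining balance covers the refund.&lt;/p&gt;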

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl7cjscl6j1zyxtn4pjko.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl7cjscl6j1zyxtn4pjko.png" width="800" height="409"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  ✨The Honest Trade-offs
&lt;/h3&gt;

&lt;p&gt;This workflow has overhead. Real overhead.&lt;/p&gt;

&lt;p&gt;A vibe-coded refund endpoint takes me maybe ten minutes. The spec-driven version through this workflow takes closer to forty minutes: clarification rounds, requirements review, design clarification, design review, then the actual TDD implementation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwtve3zekdygrg5xsfjas.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwtve3zekdygrg5xsfjas.png" width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For a one-off script? Not worth it. Friday afternoon prototype? Vibe code it. Exploring something where you don’t know what you want yet? Vibe code it.&lt;/p&gt;

&lt;p&gt;But for code that handles money, code that lives in production, code that other people will read and maintain, the forty minutes upfront saves several times that in rework, debugging, and shipped bugs. The tests aren’t an afterthought. The design doc isn’t fiction. The next person reading the code, including future me, has the requirements, the design, the tasks, and the full test history sitting right there in the repo.&lt;/p&gt;

&lt;p&gt;I’m not telling you to use my workflow. I’m telling you to think about which work in your life deserves which approach.&lt;/p&gt;

&lt;h3&gt;
  
  
  🍥Why I Wrote This
&lt;/h3&gt;

&lt;p&gt;This post isn’t a tutorial. It’s me sharing where I am.&lt;/p&gt;

&lt;p&gt;I’ve been hearing a lot of “&lt;em&gt;vibe coding is dead&lt;/em&gt;” and “&lt;em&gt;spec-driven is the future&lt;/em&gt;” takes lately. I think both are slightly wrong. Vibe coding is the right tool for some work. Spec-driven is the right tool for other work. The skill is knowing which is which.&lt;/p&gt;

&lt;p&gt;I’m continuously improving my workflow. If you’ve built something similar, or if you think mine could be improved somewhere, please tell me. That’s the whole point of writing this in public.&lt;/p&gt;

&lt;h3&gt;
  
  
  🎗️Reference
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://youtu.be/eqWGzIHNjyw" rel="noopener noreferrer"&gt;I Built a AI Dev Workflow using Spec Driven Development in Claude Code&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>specdrivendevelopmen</category>
      <category>softwareengineering</category>
      <category>developertools</category>
      <category>claudecode</category>
    </item>
    <item>
      <title>I tried Pi after watching its founder explain why he quit Claude Code</title>
      <dc:creator>Urvil Joshi</dc:creator>
      <pubDate>Thu, 07 May 2026 06:11:29 +0000</pubDate>
      <link>https://dev.to/urvvil/i-tried-pi-after-watching-its-founder-explain-why-he-quit-claude-code-2oef</link>
      <guid>https://dev.to/urvvil/i-tried-pi-after-watching-its-founder-explain-why-he-quit-claude-code-2oef</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxma1t2t0evcxpx6hf0j1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxma1t2t0evcxpx6hf0j1.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A walkthrough of the open-source coding agent that fits in 1,000 tokens, and the one reason I can’t fully switch yet.&lt;/p&gt;

&lt;p&gt;A few days ago, I watched Mario Zechner, the creator of Pi, explain why he stopped using Claude Code. By the end, he’d convinced me to try Pi.&lt;/p&gt;

&lt;p&gt;Pi is an open-source coding agent with four tools, a system prompt under a thousand tokens, and one idea: &lt;strong&gt;the agent should be minimal and should be able to modify itself.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This post walks through what’s actually different about Pi, the demo I built to test it, an honest comparison with Claude Code, and the one reason I haven’t fully switched.&lt;/p&gt;

&lt;h3&gt;
  
  
  🍥Why Mario built Pi
&lt;/h3&gt;

&lt;p&gt;Mario’s pitch is simple and authentic.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Modern coding agents got bloated.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Claude Code’s system prompt has grown to roughly 14,000 tokens. Tools get added, modified, and removed between releases. System reminders get injected into your context behind your back. &lt;strong&gt;&lt;em&gt;You aren’t the owner of your context and you have zero control over it.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Mario’s argument, paraphrased:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Models are &lt;em&gt;already&lt;/em&gt; trained to be coding agents. They don’t need a 10,000-token system prompt explaining what a coding agent is. They know.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So Pi strips it all down: four tools (read, write, edit, bash) and a system prompt under 1,000 tokens. Everything else is gone:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No MCP servers&lt;/li&gt;
&lt;li&gt;No sub-agents&lt;/li&gt;
&lt;li&gt;No permission prompts&lt;/li&gt;
&lt;li&gt;No plan mode&lt;/li&gt;
&lt;li&gt;No built-in to-dos&lt;/li&gt;
&lt;li&gt;No background bash&lt;/li&gt;
&lt;/ul&gt;
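
&lt;p&gt;To give a feel for how small that core is, here is a Python sketch of a four-tool surface in the spirit of read, write, edit, and bash, plus a dispatcher an agent loop could call. This is not Pi’s implementation. The function names, argument shapes, and the exact-match rule in edit are assumptions for illustration.&lt;/p&gt;

```python
import os
import subprocess
import tempfile
from pathlib import Path


def read(path: str) -> str:
    return Path(path).read_text()


def write(path: str, content: str) -> str:
    Path(path).write_text(content)
    return f"wrote {len(content)} chars to {path}"


def edit(path: str, old: str, new: str) -> str:
    # Hypothetical rule: replace only when the old text matches exactly once.
    text = Path(path).read_text()
    count = text.count(old)
    if count != 1:
        return f"error: expected one match for the old text, found {count}"
    Path(path).write_text(text.replace(old, new, 1))
    return "ok"


def bash(command: str) -> str:
    # Runs the command in a shell and returns combined stdout and stderr.
    done = subprocess.run(command, shell=True, capture_output=True, text=True)
    return done.stdout + done.stderr


TOOLS = {"read": read, "write": write, "edit": edit, "bash": bash}


def dispatch(call: dict) -> str:
    """Execute one tool call shaped like {"tool": name, "args": {...}}."""
    tool = TOOLS.get(call["tool"])
    if tool is None:
        return f"error: unknown tool {call['tool']}"
    return tool(**call["args"])


# Demo in a scratch directory.
path = os.path.join(tempfile.mkdtemp(), "hello.txt")
dispatch({"tool": "write", "args": {"path": path, "content": "hello world\n"}})
dispatch({"tool": "edit", "args": {"path": path, "old": "world", "new": "pi"}})
print(dispatch({"tool": "read", "args": {"path": path}}), end="")  # hello pi
```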

&lt;p&gt;Instead, &lt;em&gt;the agent extends itself&lt;/em&gt;. You ask Pi to add a feature, it writes a TypeScript extension, you hot-reload, and you’re done.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;An agent that adapts to your workflow, instead of the other way around.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That line is what got me to install it.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✨Installing Pi (60 seconds)
&lt;/h3&gt;

&lt;p&gt;Head to &lt;a href="https://pi.dev" rel="noopener noreferrer"&gt;pi.dev&lt;/a&gt; there’s a one-line install command. Or grab it from npm. Paste it in your terminal and you’re done.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F04j3eb5k2u4q7x3iu019.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F04j3eb5k2u4q7x3iu019.png" width="800" height="141"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Type pi to start it. Then /login to pick a provider.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ttdpiqay7qvi0o2m2kn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ttdpiqay7qvi0o2m2kn.png" width="800" height="132"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can sign in with your Anthropic, OpenAI, or GitHub Copilot subscription, or bring your own API key. Worth knowing: as of recently, Anthropic’s Pro plan limits don’t apply when you authenticate Pi with your Claude account. Instead, Pi usage is billed as extra usage on top of the subscription, and for me that’s the dealbreaker.&lt;/p&gt;

&lt;p&gt;I’m logged in with my Claude account, so /model lets me pick from any Anthropic model. I'm running Sonnet here. There's also a /settings slash command if you want to change reasoning level, theme, or hide thinking traces.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzmc6fb34djmiflmybssy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzmc6fb34djmiflmybssy.png" width="799" height="220"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  🔍Three things that are actually different in Pi
&lt;/h3&gt;

&lt;p&gt;I won’t bore you with every slash command; Pi’s GitHub has the full reference. But three design choices genuinely set Pi apart.&lt;/p&gt;

&lt;h4&gt;
  
  
  1. The system prompt is yours
&lt;/h4&gt;

&lt;p&gt;Drop a file called system.md in ~/.pi/agent/ and Pi uses &lt;em&gt;your&lt;/em&gt; system prompt instead of the default. Want to keep Pi's prompt and just append your own rules? Use append-system.md.&lt;/p&gt;

&lt;p&gt;This is huge. If you want to use Pi for non-coding work like research, writing, or anything else, you can swap out the entire instruction set. No coding agent I’ve used lets you do this. In Claude Code, you’re stuck inside the 14k-token prompt the team ships.&lt;/p&gt;
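&lt;p&gt;To make the replace-versus-append distinction concrete, here’s a minimal TypeScript sketch of how the effective prompt could be assembled under the rules above. It models the described behavior and is not Pi’s actual loader: resolveSystemPrompt is a hypothetical name, and appending append-system.md after a custom system.md (when both files exist) is an assumption.&lt;/p&gt;

```typescript
// Hypothetical model of Pi's two prompt files; not Pi's real loader.
// systemMd: contents of ~/.pi/agent/system.md, if that file exists
// appendMd: contents of ~/.pi/agent/append-system.md, if that file exists
function resolveSystemPrompt(
  defaultPrompt: string,
  systemMd?: string,
  appendMd?: string,
): string {
  // system.md replaces the built-in prompt entirely.
  const base = systemMd !== undefined ? systemMd : defaultPrompt;
  // append-system.md keeps the base and adds your rules after it.
  // Assumption: it is appended to whichever base is active.
  return appendMd !== undefined ? base + "\n\n" + appendMd : base;
}

// Usage: keep the default prompt and append one house rule.
// withHouseRule === "You are a coding agent.\n\nNever edit files outside src/."
const withHouseRule = resolveSystemPrompt(
  "You are a coding agent.", // placeholder standing in for Pi's default prompt
  undefined,
  "Never edit files outside src/.",
);
```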

&lt;h4&gt;
  
  
  2. Sessions are trees, not lines
&lt;/h4&gt;

&lt;p&gt;Most coding agents give you a linear conversation. If the agent went the wrong way ten messages ago, you re-prompt or restart.&lt;/p&gt;

&lt;p&gt;Pi sessions are trees. Use /tree to see the full branch structure. Use /fork to create a new branch from any earlier message. You jump back to the point where things went sideways and continue from there: no re-prompting, no context loss, and no restart.&lt;/p&gt;
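&lt;p&gt;The tree model is easier to see as a data structure. Below is a hypothetical TypeScript sketch, not Pi’s internal session format: every message points to its parent, forking just moves the active leaf back to an earlier message, and the context sent to the model is only the path from the root to that leaf. That last property is why an abandoned branch doesn’t pollute the conversation.&lt;/p&gt;

```typescript
// Hypothetical session tree; not Pi's real session format.
type Role = "user" | "assistant";

interface SessionNode {
  role: Role;
  text: string;
  parent: SessionNode | null;
  children: SessionNode[];
}

class SessionTree {
  readonly root: SessionNode;
  // The leaf that the next message gets attached to.
  current: SessionNode;

  constructor(firstPrompt: string) {
    this.root = { role: "user", text: firstPrompt, parent: null, children: [] };
    this.current = this.root;
  }

  // Append a message to the active branch.
  add(role: Role, text: string): SessionNode {
    const node: SessionNode = { role, text, parent: this.current, children: [] };
    this.current.children.push(node);
    this.current = node;
    return node;
  }

  // Fork: make an earlier message the active leaf. The next add() starts a
  // sibling branch; the old branch stays in the tree untouched.
  fork(node: SessionNode): void {
    this.current = node;
  }

  // Only the path from the root to the current leaf is sent to the model.
  context(): string[] {
    const path: string[] = [];
    for (let n: SessionNode | null = this.current; n !== null; n = n.parent) {
      path.unshift(n.role + ": " + n.text);
    }
    return path;
  }
}

// Usage
const s = new SessionTree("Add rate limiting to the API");
const plan = s.add("assistant", "Plan: token bucket per API key");
s.add("user", "Go ahead");
s.add("assistant", "Done, and I also renamed the public endpoints");
// That went sideways: branch off right after the plan instead.
s.fork(plan);
s.add("user", "Go ahead, but keep the endpoint names unchanged");
// s.context() now has 3 entries: the request, the plan, the new instruction.
// The renaming branch still exists under plan.children[0].
```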

&lt;h4&gt;
  
  
  3. Bash does almost everything
&lt;/h4&gt;

&lt;p&gt;Pi only ships four tools. There’s no dedicated grep tool, no find tool, no git_status tool. Just bash.&lt;/p&gt;

&lt;p&gt;Mario’s reasoning: &lt;strong&gt;&lt;em&gt;models are reinforcement-trained on bash.&lt;/em&gt;&lt;/strong&gt; They know how to use it, so specialized tools just add noise. If you want a custom tool, you build it as an extension.&lt;/p&gt;

&lt;p&gt;There are also no permission prompts. Pi runs full access by default. Mario’s argument is that &lt;strong&gt;&lt;em&gt;most users mindlessly click “accept” on every permission prompt anyway&lt;/em&gt;&lt;/strong&gt;. If you want real human gates, build them as an extension.&lt;/p&gt;
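&lt;p&gt;To give a feel for what “build them as an extension” could mean, here’s a rough TypeScript sketch of the policy half of such a gate: a function that decides whether a bash command should pause for human confirmation. The name needsApproval and the pattern list are hypothetical and deliberately incomplete, and wiring it into Pi would go through Pi’s extension API, which this sketch does not attempt to reproduce.&lt;/p&gt;

```typescript
// Hypothetical approval policy for bash commands. The patterns are
// illustrative examples, not a complete or safe blocklist.
const RISKY_PATTERNS: RegExp[] = [
  /\brm\s+-\w*r/i, // recursive delete: rm -r, rm -rf, rm -fr
  /\bgit\s+push\b/, // anything that leaves the machine
  /\bgit\s+reset\s+--hard\b/, // discards uncommitted work
  /\bsudo\b/,
  /\bcurl\b[^|]*\|\s*(ba)?sh\b/, // piping a download into a shell
];

// true: pause and ask the human before running the command.
function needsApproval(command: string): boolean {
  return RISKY_PATTERNS.some((pattern) => pattern.test(command));
}
```

&lt;p&gt;A pattern list like this is a speed bump, not a sandbox: a command can easily be phrased to slip past a regex.&lt;/p&gt;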

&lt;h3&gt;
  
  
  🧰The part that sold me
&lt;/h3&gt;

&lt;p&gt;Pi is a deliberately minimal harness. To get anything beyond the bare minimum, you have to build it. That sounds like a downside until you see what Mario gave you in return: &lt;strong&gt;&lt;em&gt;the agent ships with full knowledge of its own source code, and it can extend itself.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In other words, you ask Pi to add a feature. Pi reads its own extension docs. Pi writes the TypeScript file. You hot-reload. The feature is now part of your agent.&lt;/p&gt;

&lt;p&gt;I’ll show you two extensions.&lt;/p&gt;

&lt;h4&gt;
  
  
  Extension 1: rebuilding my Claude Code orchestrator in Pi
&lt;/h4&gt;

&lt;p&gt;If you read my last post, you saw my issue-to-PR workflow in Claude Code. It’s an orchestrator sub-agent that spawns five other sub-agents, with human approval gates in between.&lt;/p&gt;

&lt;p&gt;When I switched to Pi, I wanted the same workflow. But Pi has no sub-agents.&lt;/p&gt;

&lt;p&gt;So I had two choices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Option A: spawn separate Pi processes.&lt;/strong&gt; Each phase runs in its own isolated context window. This mirrors Claude Code’s sub-agent model exactly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Option B: a single shared session.&lt;/strong&gt; All phases run in one continuous Pi conversation. Higher token usage, but simpler to demo.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For this demo I went with Option B. For real production work, I’d use Option A.&lt;/p&gt;

&lt;p&gt;I won’t walk through the full flow, as it’s already in my previous &lt;a href="https://medium.com/@urvvil08/built-an-orchestrator-ai-agent-that-takes-my-issue-to-pull-requests-ab1912c93b52" rel="noopener noreferrer"&gt;post&lt;/a&gt;. The point here is that I rebuilt the same multi-phase, gated workflow that relied on Claude Code’s sub-agent system as a single Pi extension: a TypeScript file in ~/.pi/agent/extensions/. Hot-reload, and it’s live.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkawp1jib4kgb2eiwgve3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkawp1jib4kgb2eiwgve3.png" width="800" height="599"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3kbq8xn0taci96umk0di.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3kbq8xn0taci96umk0di.png" width="800" height="140"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Extension 2: a status widget Pi built for itself
&lt;/h4&gt;

&lt;p&gt;This one’s the demo that captures Pi’s whole pitch in 30 seconds. Everyone on Reddit is building this, so I built one too.&lt;/p&gt;

&lt;p&gt;I gave it this prompt:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Read your own extension docs and build a status widget that shows the current git branch and number of uncommitted changes. Save it to my extensions folder.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk5x9k2svo6ujcubqdeuh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk5x9k2svo6ujcubqdeuh.png" width="799" height="156"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Pi read its own documentation, wrote the extension, saved it to the right path, and told me to run /reload.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkesqb1okyj4cib3234pt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkesqb1okyj4cib3234pt.png" width="799" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I ran /reload.&lt;/p&gt;

&lt;p&gt;The status bar at the bottom of my terminal now showed my git branch and uncommitted file count. The agent had just extended itself. Live. In one prompt. Try doing that in Claude Code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6y6e6mgbnq2q3ylxdku2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6y6e6mgbnq2q3ylxdku2.png" width="800" height="73"&gt;&lt;/a&gt;&lt;/p&gt;
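&lt;p&gt;The data behind a widget like this comes straight from git. As a hedged sketch (not the code Pi generated), here’s TypeScript that turns the output of git status --porcelain=v1 --branch into a branch name and an uncommitted-file count. The function names are hypothetical; running git and drawing into Pi’s status bar are left to the extension API.&lt;/p&gt;

```typescript
// What a hypothetical status widget would display.
interface GitSummary {
  branch: string;
  uncommitted: number;
}

// Parses the output of `git status --porcelain=v1 --branch`.
// The first line looks like "## main...origin/main [ahead 1]" (or "## main");
// every other non-empty line is one modified, staged, or untracked file.
function parsePorcelain(output: string): GitSummary {
  let branch = "(unknown)";
  let uncommitted = 0;
  for (const line of output.split("\n")) {
    if (line.length === 0) continue;
    if (line.startsWith("## ")) {
      const header = line.slice(3);
      const unborn = "No commits yet on ";
      branch = header.startsWith(unborn)
        ? header.slice(unborn.length) // fresh repo with no commits
        : header.split("...")[0].split(" ")[0]; // drop upstream and ahead/behind
    } else {
      uncommitted += 1;
    }
  }
  return { branch, uncommitted };
}

// Status-bar text, e.g. "main [2 uncommitted]" or "main [clean]".
function formatStatus(summary: GitSummary): string {
  return summary.uncommitted === 0
    ? summary.branch + " [clean]"
    : summary.branch + " [" + summary.uncommitted + " uncommitted]";
}
```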

&lt;h3&gt;
  
  
  🏁Skills
&lt;/h3&gt;

&lt;p&gt;Pi also supports skills. They live in ~/.pi/agent/skills/ (or in your project repo for project-local skills).&lt;/p&gt;

&lt;p&gt;To invoke one, type /skill  and write your prompt. There’s nothing radically different from how skills work in other agents: you add your skill to that folder and call it as described above.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuf8qztjj8zrv780ropf3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuf8qztjj8zrv780ropf3.png" width="797" height="109"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  ✨Pi vs Claude Code — honest comparison
&lt;/h3&gt;

&lt;p&gt;Here’s what each one ships out of the box.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe519h9w5xbjlwt795m24.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe519h9w5xbjlwt795m24.png" width="800" height="359"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Code&lt;/strong&gt; has permission prompts, MCP support, sub-agents, plan mode, a large system prompt, and is locked to Anthropic models. It’s a finished product. Everything you need is there on day one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pi&lt;/strong&gt; has four tools, a sub-1,000-token system prompt, multi-provider support (Anthropic, OpenAI, Copilot, OpenRouter, Ollama, and more), an editable system prompt, extensions, and the ability to modify itself. It ships none of those features &lt;strong&gt;&lt;em&gt;by default&lt;/em&gt;&lt;/strong&gt;, but you can build any of them as extensions or install someone else’s package.&lt;/p&gt;

&lt;p&gt;If you want something that works on day one, Claude Code wins. If you want to actually own your workflow, Pi wins.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✍️The one reason I can’t fully switch
&lt;/h3&gt;

&lt;p&gt;Anthropic bills your Pi sessions as extra per-token usage. Effectively, you’re paying twice: once for your subscription, and again for every Pi token.&lt;/p&gt;

&lt;p&gt;That’s not Pi’s fault. It’s Anthropic’s policy. But it means switching to Pi while keeping my Anthropic subscription is financially dumb for me right now.&lt;/p&gt;

&lt;p&gt;If I weren’t on Anthropic, and were using a ChatGPT subscription, Copilot, or local models with Ollama instead, Pi would be my full-time coding agent. The minimalism, the extensibility, the fact that &lt;em&gt;I&lt;/em&gt; control my context: Mario nailed it. This is how I want my coding agent to work.&lt;/p&gt;

&lt;p&gt;So for now, Pi sits next to Claude Code in my workflow.&lt;/p&gt;

&lt;h3&gt;
  
  
  🎗️Reference
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Pi GitHub: &lt;a href="https://github.com/badlogic/pi-mono/tree/main/packages/coding-agent#quick-start" rel="noopener noreferrer"&gt;github.com/badlogic/pi-mono&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Install: &lt;a href="https://pi.dev" rel="noopener noreferrer"&gt;pi.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=RjfbvDXpFls&amp;amp;t=184s" rel="noopener noreferrer"&gt;Building pi in a World of Slop — Mario Zechner&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://mariozechner.at/posts/2025-11-30-pi-coding-agent/" rel="noopener noreferrer"&gt;Mario’s blog post on why he built Pi&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=VEehKa__op8" rel="noopener noreferrer"&gt;Pi Coding Agent: The Open-Source Tool That Modifies Itself&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aicodingtool</category>
      <category>claudecode</category>
      <category>picodingagent</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Built an Orchestrator AI Agent That Takes My Github issue to Pull Requests.</title>
      <dc:creator>Urvil Joshi</dc:creator>
      <pubDate>Tue, 28 Apr 2026 13:09:05 +0000</pubDate>
      <link>https://dev.to/urvvil/i-built-an-orchestrator-ai-agent-that-takes-my-github-issue-to-pull-requests-57jm</link>
      <guid>https://dev.to/urvvil/i-built-an-orchestrator-ai-agent-that-takes-my-github-issue-to-pull-requests-57jm</guid>
      <description>&lt;p&gt;&lt;em&gt;A Claude Code workflow with one orchestrator, five subagents, and three human gates running on a real Spring Boot project, end to end&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is my minimal dev setup in 2026.&lt;/p&gt;

&lt;p&gt;Pixel Agents in VS Code to monitor my agents. Claude Code in the terminal. Together, they take a GitHub issue and turn it into a merged pull request with three approvals from me along the way.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz8yqpzr2e9a290jwuarv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz8yqpzr2e9a290jwuarv.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  🍥 The project: LinkStash
&lt;/h3&gt;

&lt;p&gt;LinkStash is a Spring Boot URL shortener I built last week. To be upfront: this is not a serious production repo. It’s a demo I created specifically to show this workflow on a realistic codebase.&lt;/p&gt;

&lt;p&gt;The repo has one open issue:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhfmq6fbpi1czqcx02251.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhfmq6fbpi1czqcx02251.png" width="799" height="523"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That’s the issue I want my workflow to handle. I’m going to invoke an agent, and it will resolve the issue with my input and reviews.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fplxwjppyhqj7zohm16po.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fplxwjppyhqj7zohm16po.png" width="799" height="259"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  ✨ Step 1 → CLAUDE.md sets the rules
&lt;/h3&gt;

&lt;p&gt;Every Claude Code session reads CLAUDE.md first. It’s the file you keep at the repo root that tells Claude how your project works: conventions, what not to do, project structure, basically all of it.&lt;/p&gt;

&lt;p&gt;For LinkStash, mine includes things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Constructor injection only — never field injection&lt;/li&gt;
&lt;li&gt;Records for DTOs&lt;/li&gt;
&lt;li&gt;Don’t add Lombok&lt;/li&gt;
&lt;li&gt;Don’t push to main&lt;/li&gt;
&lt;li&gt;Always write a Flyway migration, never use ddl-auto: update&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’ve never written one, run /init in Claude Code and it’ll generate a starting point you can edit. The trick is keeping it tight: long CLAUDE.md files dilute attention. &lt;strong&gt;&lt;em&gt;Forty lines of clear rules beats two hundred lines of vague guidance.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you use multiple coding agents (Claude, Codex, Copilot), you can create an AGENTS.md for general conventions shared across all of them, and keep each agent’s own file (such as CLAUDE.md) for agent-specific instructions.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✨Step 2 → The orchestrator agent
&lt;/h3&gt;

&lt;p&gt;Here’s where it gets interesting. My main agent is called issue-resolver, and it lives in .claude/agents/.&lt;/p&gt;

&lt;p&gt;It does three things on its own and pauses three times for me. The high-level flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Fetches the GitHub issue (via my MCP server — more on this below)&lt;/li&gt;
&lt;li&gt;Spawns a subagent that explores the codebase and writes ARCHITECTURE.md&lt;/li&gt;
&lt;li&gt;Spawns a subagent that drafts plan.md&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pauses for me to approve the plan&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Spawns a subagent that implements the plan&lt;/li&gt;
&lt;li&gt;Spawns a subagent that runs /ultrareview for self-critique&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pauses for me to triage findings&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Spawns a subagent that applies accepted findings&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pauses for me to do a final review of the changes&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Pushes and opens the PR&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key rule baked into the agent prompt: &lt;em&gt;never modify code yourself, always delegate to subagents&lt;/em&gt;. The orchestrator only orchestrates. &lt;strong&gt;&lt;em&gt;Each subagent has one job&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;
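&lt;p&gt;The orchestrator’s control flow is simple enough to model directly. Here’s a hedged TypeScript sketch of that loop. The real agent is a prompt file in .claude/agents/, not code, and runSubagent, askHuman, and openPullRequest are hypothetical stand-ins for spawning a subagent, pausing for me, and pushing. What it makes concrete is the rule above: the orchestrator only sequences, delegates, and waits at the three gates.&lt;/p&gt;

```typescript
// Hypothetical model of the issue-resolver flow; not Claude Code's API.
type Phase = "explore" | "plan" | "implement" | "review" | "apply-findings";

interface Deps {
  // Hands one job to a fresh subagent and returns its output.
  runSubagent: (phase: Phase, input: string) => string;
  // Blocks until the human replies (a gate).
  askHuman: (question: string, context: string) => string;
  openPullRequest: (summary: string) => void;
}

function resolveIssue(issue: string, deps: Deps): void {
  const { runSubagent, askHuman } = deps;
  const architecture = runSubagent("explore", issue);
  let plan = runSubagent("plan", issue + "\n" + architecture);

  // Gate 1: feedback goes back to the plan subagent until I approve.
  let answer = askHuman("Approve plan? (approve / feedback)", plan);
  while (answer !== "approve") {
    plan = runSubagent("plan", plan + "\nFeedback: " + answer);
    answer = askHuman("Approve plan? (approve / feedback)", plan);
  }

  const implementation = runSubagent("implement", plan);
  const findings = runSubagent("review", implementation);

  // Gate 2: I decide which review findings are real.
  const triage = askHuman("Triage findings", findings);
  let diff = runSubagent("apply-findings", triage);

  // Gate 3: extra change requests rerun apply-findings until I type push.
  let verdict = askHuman("Push? (push / describe changes)", diff);
  while (verdict !== "push") {
    diff = runSubagent("apply-findings", verdict);
    verdict = askHuman("Push? (push / describe changes)", diff);
  }

  // No code edits happen here: the orchestrator only delegates and gates.
  deps.openPullRequest(plan);
}
```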

&lt;h3&gt;
  
  
  🔍A note on the MCP server
&lt;/h3&gt;

&lt;p&gt;For fetching the issue, I’m using a custom MCP server I built in a previous video. You don’t have to do this: the official GitHub MCP server has a gh_get_issue tool that does the same thing, or you could use Claude Code skills.&lt;/p&gt;

&lt;p&gt;I’m using my own because I built it for a related workflow already. Pick whichever fits your workflow.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✨Step 3 → Kicking off the loop
&lt;/h3&gt;

&lt;p&gt;The whole invocation is one line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@issue-resolver fetch and resolve issue #1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s it. The orchestrator goes to work. First it fetches the issue from my MCP server. Then the explore subagent reads the codebase and writes ARCHITECTURE.md: &lt;strong&gt;&lt;em&gt;entities, endpoints, data flow, testing conventions&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Then the plan subagent runs and produces plan.md. This is where the first interesting moment comes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcgzgsm19932c1w4cviov.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcgzgsm19932c1w4cviov.png" width="800" height="642"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  ✨ Step 4 → Gate 1: Plan approval
&lt;/h3&gt;

&lt;p&gt;The plan came back with two open questions the agent flagged for me:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;em&gt;Which rate-limiting algorithm — token bucket or fixed window?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Should creating a link with a past&lt;/em&gt; &lt;em&gt;expiresAt return 400?&lt;/em&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is exactly what I want. The agent isn’t guessing; it’s handing the engineering decisions back to me.&lt;/p&gt;

&lt;p&gt;I told it: greedy token bucket (Bucket4j default behavior) and yes, return 400 on past expiry validated against server time.&lt;/p&gt;
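&lt;p&gt;For context on that choice: a token bucket holds up to a fixed number of tokens, each request spends one, and tokens are refilled over time. “Greedy” refill, in Bucket4j’s terms, adds tokens as soon as possible, spread across the period, rather than all at once at the end of it. The TypeScript below is a hand-rolled sketch of that algorithm with an injectable clock for testing; it is not Bucket4j’s API and not the code the implement subagent wrote.&lt;/p&gt;

```typescript
// Hand-rolled token bucket with greedy (continuous) refill.
// A sketch of the algorithm only; not Bucket4j's API.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;

  constructor(
    private readonly capacity: number,
    private readonly refillTokens: number, // tokens added per period
    private readonly periodMs: number,
    private readonly now: () => number = () => Date.now(), // injectable clock
  ) {
    this.tokens = capacity; // start full
    this.lastRefillMs = now();
  }

  // Greedy refill: tokens trickle in proportionally to elapsed time
  // (10 per minute means one every 6 seconds) instead of all at once.
  private refill(): void {
    const t = this.now();
    const earned = ((t - this.lastRefillMs) * this.refillTokens) / this.periodMs;
    this.tokens = Math.min(this.capacity, this.tokens + earned);
    this.lastRefillMs = t;
  }

  // true: let the request through; false: reject it (an HTTP API
  // would typically answer 429 Too Many Requests).
  tryConsume(): boolean {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```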

&lt;p&gt;I sent the plan back with my answers, and it came back updated with both open questions resolved. Approved.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmxhbwe47ef1nbb1x5nh2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmxhbwe47ef1nbb1x5nh2.png" width="800" height="230"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  ✨Step 5 → Implementation runs
&lt;/h3&gt;

&lt;p&gt;The implement subagent takes over. New tables for API keys and link expiration. The Bucket4j filter. Updated controllers. Tests added. Tests run after each major change. All green.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyu9oun3dks542ktnw1he.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyu9oun3dks542ktnw1he.png" width="760" height="115"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  ✨Step 6 → Gate 2: /ultrareview findings
&lt;/h3&gt;

&lt;p&gt;After implementation, the orchestrator spawns the review subagent. /ultrareview is Claude Code's high-effort self-critique mode.&lt;/p&gt;

&lt;p&gt;Findings come back as a numbered list with severity, file location, and suggested fixes. I get a structured prompt:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0fwukmkpokb2t5niikbs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0fwukmkpokb2t5niikbs.png" width="800" height="318"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Reply: "accept all" / "accept all except [numbers]" / "accept only [numbers]"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the second gate. I read each finding, decide which are real, which are nitpicks, which are wrong. If I disagree with one, I exclude it.&lt;/p&gt;
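&lt;p&gt;Because the reply format is fixed, it can be interpreted mechanically. Here’s a small TypeScript sketch of that parsing; parseTriage is a hypothetical helper, not part of Claude Code, and it assumes findings are numbered from 1.&lt;/p&gt;

```typescript
// Turns a Gate 2 reply into the accepted finding numbers (1-based).
// Hypothetical helper; handles the three reply shapes from the prompt.
function parseTriage(reply: string, totalFindings: number): number[] {
  const all = Array.from({ length: totalFindings }, (_, i) => i + 1);
  const text = reply.trim().toLowerCase();
  // Finding numbers mentioned in the reply, ignoring out-of-range ones.
  const digits: string[] = text.match(/\d+/g) || [];
  const mentioned = digits.map(Number).filter((n) => all.includes(n));

  if (text === "accept all") return all;
  if (text.startsWith("accept all except")) {
    return all.filter((n) => !mentioned.includes(n));
  }
  if (text.startsWith("accept only")) {
    return all.filter((n) => mentioned.includes(n));
  }
  // Anything else is ambiguous: ask again rather than guess.
  throw new Error("Unrecognized triage reply: " + reply);
}
```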

&lt;p&gt;The honest part of this workflow is right here: even when /ultrareview is correct in principle, &lt;em&gt;I’m the one who is responsible for the commit&lt;/em&gt;. I don’t accept findings blindly. I read them.&lt;/p&gt;

&lt;p&gt;This time I accepted all of them, since they were all fair calls.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✨Step 7 → Gate 3: Post-fix review
&lt;/h3&gt;

&lt;p&gt;After the apply-findings subagent resolves the accepted findings, the orchestrator pauses one more time before pushing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfzsy4hnexv46npd2kom.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfzsy4hnexv46npd2kom.png" width="799" height="362"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I scroll through the diff. If I want any additional changes, even ones /ultrareview didn’t flag, I describe them and the agent runs apply-findings again. If the diff looks clean, I type push.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;You might say three gates are excessive, that they slow the workflow down&lt;/em&gt;&lt;/strong&gt;. For a demo, sure, you can argue that. But for production code, code that ships to clients, code that runs at scale, &lt;strong&gt;&lt;em&gt;you should know what your AI is shipping&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The gates aren’t friction. They’re the part of the workflow that keeps you accountable.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  ✨Step 8 → The PR
&lt;/h3&gt;

&lt;p&gt;The PR has a clean summary, “Generated by Claude Code” attribution, all the commits, all the file changes. Ready for review.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4e2hqrnbm42dfkezi7l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4e2hqrnbm42dfkezi7l.png" width="800" height="468"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  🧰What I’d take from this if I were building it myself
&lt;/h3&gt;

&lt;p&gt;A few things I learned that I’d recommend if you’re trying to set up your own version:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep your CLAUDE.md tight.&lt;/strong&gt; Forty lines of clear rules beats two hundred lines of vague guidance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Subagents for one job each.&lt;/strong&gt; The temptation is to make smart, multi-purpose agents. Resist it. Each subagent does one thing (explore, plan, implement, review, or apply findings), which keeps them predictable and easy to debug.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human gates are non-negotiable for real work.&lt;/strong&gt; Demo? Skip them if you want. Production? Human gates. That’s the floor, not the ceiling.&lt;/p&gt;

&lt;h3&gt;
  
  
  🏁Closing
&lt;/h3&gt;

&lt;p&gt;The point of this workflow isn’t “&lt;strong&gt;&lt;em&gt;AI does my job.&lt;/em&gt;&lt;/strong&gt;” It’s the opposite. AI does the typing. I do the deciding: critical decisions, in the right places. That’s the modern dev workflow if you’re trying to use these tools seriously instead of as a novelty. :)&lt;/p&gt;

&lt;h3&gt;
  
  
  🎗️Reference
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://youtu.be/51LURUiqGZA" rel="noopener noreferrer"&gt;Claude Code Agent Workflow: Issue to Merged PR (Full Demo)&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>springboot</category>
      <category>agents</category>
      <category>claudecode</category>
      <category>softwaredevelopment</category>
    </item>
    <item>
      <title>Andrej Karpathy’s LLM Wiki: Create your own knowledge base</title>
      <dc:creator>Urvil Joshi</dc:creator>
      <pubDate>Mon, 20 Apr 2026 04:50:08 +0000</pubDate>
      <link>https://dev.to/urvvil/andrej-karpathys-llm-wiki-create-your-own-knowledge-base-5dij</link>
      <guid>https://dev.to/urvvil/andrej-karpathys-llm-wiki-create-your-own-knowledge-base-5dij</guid>
      <description>&lt;p&gt;Andrej Karpathy &lt;a href="https://x.com/karpathy/status/2039805659525644595" rel="noopener noreferrer"&gt;&lt;strong&gt;tweeted&lt;/strong&gt;&lt;/a&gt;something that quietly broke the AI community’s understanding of how we should be using LLMs to manage knowledge.&lt;/p&gt;

&lt;p&gt;Two days later, he followed up with a GitHub gist called &lt;a href="https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f" rel="noopener noreferrer"&gt;&lt;strong&gt;llm-wiki.md&lt;/strong&gt;&lt;/a&gt;. The idea isn’t a product. It’s not code. It’s a &lt;em&gt;pattern&lt;/em&gt;, one that can help you build a small-scale personal knowledge base in a few minutes.&lt;/p&gt;

&lt;p&gt;Let’s break this down.&lt;/p&gt;

&lt;h3&gt;
  
  
  🍥The Tweet That Started It
&lt;/h3&gt;

&lt;p&gt;Karpathy’s original tweet:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Something I’m finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating…”&lt;/p&gt;

&lt;p&gt;&lt;em&gt;— @karpathy, April 2, 2026&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;What he published was a single markdown file on GitHub Gist, something he calls an &lt;strong&gt;idea file&lt;/strong&gt;: a document meant to be copy-pasted into an LLM agent such as Claude Code or OpenAI Codex, where &lt;em&gt;your&lt;/em&gt; agent then instantiates the pattern for &lt;em&gt;your&lt;/em&gt; specific needs.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✨ &lt;strong&gt;The Core Idea: Stop Retrieving. Start Compiling.&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Here’s the insight in one sentence: &lt;strong&gt;instead of having the LLM re-read your raw documents every time you ask a question, build a persistent, structured wiki once and keep it updated forever.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Karpathy used an analogy from software engineering: &lt;strong&gt;compilation&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────────┐
│ SOFTWARE ENGINEERING                                        │
│                                                             │
│   Source Code ──[compile once]──► Binary                    │
│   (readable)                      (runs fast every          │
│                                    single call)             │
└─────────────────────────────────────────────────────────────┘
                          ⇕ same idea ⇕
┌─────────────────────────────────────────────────────────────┐
│ LLM WIKI                                                    │
│                                                             │
│   Raw Sources ──[LLM compiles]──► Wiki                      │
│   (PDFs, notes,                   (pre-synthesized,         │
│    articles)                       interlinked,             │
│                                    always ready)            │
└─────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You don’t execute source code every time you want to run a program. You compile it once into a binary and run &lt;em&gt;that&lt;/em&gt;. Karpathy says: treat knowledge the same way. Your PDFs and notes are the source code. The wiki is the binary.&lt;/p&gt;

&lt;p&gt;Every time you add a new document, the LLM doesn’t just index it. It &lt;strong&gt;reads it, extracts the key information, updates existing pages, revises summaries, flags contradictions, and strengthens cross-links&lt;/strong&gt;. The wiki is a persistent, compounding artifact.&lt;/p&gt;
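&lt;p&gt;Most of that compiling is LLM judgment, but the cross-link bookkeeping is mechanical enough to check with plain code. As a hedged sketch, assuming the wiki’s pages use Obsidian-style [[wikilinks]] and are held in memory as page name to markdown text, this TypeScript reports backlinks, links to pages that don’t exist yet, and orphan pages. The function names are hypothetical and none of this comes from Karpathy’s gist.&lt;/p&gt;

```typescript
// Hypothetical in-memory wiki: page name -> markdown body.
interface Wiki {
  [page: string]: string;
}

// Obsidian wikilinks: [[Page]], [[Page|alias]], [[Page#Heading]].
function outgoingLinks(body: string): string[] {
  const pattern = /\[\[([^\]|#]+)(?:[|#][^\]]*)?\]\]/g;
  const links: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(body)) !== null) {
    links.push(match[1].trim());
  }
  return links;
}

// Cross-link bookkeeping: who links to each page, which links point
// at pages that don't exist yet, and which pages nothing links to.
function crossLinkReport(wiki: Wiki) {
  const backlinks: { [page: string]: string[] } = {};
  const missing: string[] = [];
  for (const [page, body] of Object.entries(wiki)) {
    for (const target of outgoingLinks(body)) {
      if (!(target in wiki)) {
        missing.push(page + " -> " + target);
        continue;
      }
      if (!backlinks[target]) backlinks[target] = [];
      backlinks[target].push(page);
    }
  }
  // Assumption: "index" is the entry point, so it is never an orphan.
  const orphans = Object.keys(wiki)
    .filter((p) => p !== "index")
    .filter((p) => !(p in backlinks));
  return { backlinks, missing, orphans };
}
```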

&lt;p&gt;In Karpathy’s own words, the line that captures the whole philosophy:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You rarely write the wiki yourself. You curate sources, ask questions, and think. The LLM does the rest: summarizing, cross-referencing, filing, and bookkeeping.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔍The Three-Layer Architecture
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;╔══════════════════════════════════════════════════════════════╗
║ LAYER 3 — THE SCHEMA                                         ║
║ (CLAUDE.md / AGENTS.md)                                      ║
║                                                              ║
║ Rules • Conventions • Workflows • How to ingest/query        ║
║                                                              ║
║ ↕ tells the LLM HOW to behave                                ║
╠══════════════════════════════════════════════════════════════╣
║ LAYER 2 — THE WIKI                                           ║
║ (LLM owns this entirely)                                     ║
║                                                              ║
║ ┌──────────┐  ┌──────────┐  ┌──────────┐                     ║
║ │  Entity  │──│ Concept  │──│ Overview │  index.md           ║
║ │  pages   │  │  pages   │  │  pages   │  log.md             ║
║ └──────────┘  └──────────┘  └──────────┘                     ║
║ ↑ LLM creates, links, updates, maintains                     ║
╠══════════════════════════════════════════════════════════════╣
║ LAYER 1 — RAW SOURCES                                        ║
║ (IMMUTABLE)                                                  ║
║                                                              ║
║ 📄 PDFs  📰 Articles  🎧 Podcast notes  🖼️ Images            ║
║                                                              ║
║ LLM reads • NEVER modifies • source of truth                 ║
╚══════════════════════════════════════════════════════════════╝
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Layer 1 — Raw sources.&lt;/strong&gt; Your curated collection. Articles, papers, meeting notes, images. Immutable. The LLM reads them but &lt;em&gt;never&lt;/em&gt; modifies them. This is your ground truth. The fact that they’re immutable is a deliberate design choice: you can always re-compile the wiki from scratch if needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 2 — The wiki.&lt;/strong&gt; A directory of markdown files the LLM owns completely. Entity pages, concept pages, summaries, an index, a log. You read it. The LLM writes it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 3 — The schema.&lt;/strong&gt; This is a CLAUDE.md (for Claude Code) or AGENTS.md (for Codex) file. It’s the config that turns a generic agent into a &lt;em&gt;disciplined wiki maintainer&lt;/em&gt;. It defines how pages are structured, how new sources get ingested, how answers get formatted.&lt;/p&gt;

&lt;h3&gt;
  
  
  🧰The Three Operations
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                    ┌──────────────────────┐
                    │ YOU (Human) │
                    │ curates &amp;amp; asks │
                    └──────────┬───────────┘
                               │
          ┌────────────────────┼────────────────────┐
          │ │ │
          ▼ ▼ ▼
   ┌────────────┐ ┌────────────┐ ┌────────────┐
   │ 1. INGEST │ │ 2. QUERY │ │ 3. LINT │
   ├────────────┤ ├────────────┤ ├────────────┤
   │ Drop new │ │ Ask a │ │ Health- │
   │ source → │ │ question → │ │ check wiki │
   │ LLM reads, │ │ LLM reads │ │ → find │
   │ summarises,│ │ wiki &amp;amp; │ │ contra- │
   │ updates │ │ synthesises│ │ dictions, │
   │ 10–15 wiki │ │ answer │ │ orphans, │
   │ pages │ │ w/ cites │ │ stale data │
   └─────┬──────┘ └─────┬──────┘ └─────┬──────┘
         │ │ │
         └────────────────────┴────────────────────┘
                              │
                              ▼
                    ┌──────────────────────┐
                    │ WIKI COMPOUNDS │
                    │ (every op makes it │
                    │ richer over time)│
                    └──────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Ingest.&lt;/strong&gt; You drop a source into the raw folder. The LLM reads it, writes a summary page, and then revises the related pages: updating them, adding cross-links, and flagging contradictions. A single article can ripple out into changes across 10–15 pages of your knowledge base.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query.&lt;/strong&gt; You ask a question. The LLM doesn’t search the raw documents; it reads the already-synthesized wiki and answers from that. And here’s the compounding trick: &lt;strong&gt;good answers can be filed back into the wiki as new pages&lt;/strong&gt;. Your explorations become permanent knowledge.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lint.&lt;/strong&gt; Periodically, you ask the LLM to audit the whole wiki. Find contradictions. Find orphan pages with no links pointing in. Find concepts that are mentioned but missing their own page. The wiki stays healthy because the LLM does the maintenance no human ever wants to do.&lt;/p&gt;
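&lt;p&gt;Contradictions still need the LLM’s judgment, but the structural half of a lint pass is mechanical enough to script. Here’s a minimal Python sketch (my own helper, not something from Karpathy’s gist) that assumes the wiki is a folder of .md files connected by Obsidian-style [[wikilinks]]. It reports orphan pages that nothing links to and missing pages that are linked but were never written. The set of pages exempt from the orphan check (index and log) is an assumption you’d adjust to your own schema.&lt;/p&gt;

```python
import re
from pathlib import Path

# Captures the target of [[target]], [[target|alias]] and [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]")

# Hypothetical: entry-point pages that are allowed to have no inbound links.
ROOT_PAGES = {"index", "log"}


def lint_wiki(wiki_dir):
    """Return (orphans, missing) for a folder of Obsidian-style markdown pages."""
    # Obsidian resolves links by file name, so pages are keyed on their stem.
    pages = {p.stem: p for p in Path(wiki_dir).rglob("*.md")}
    inbound = {name: 0 for name in pages}
    missing = set()
    for name, path in pages.items():
        text = path.read_text(encoding="utf-8")
        for match in WIKILINK.finditer(text):
            # Keep only the last path segment of links like [[concepts/search]].
            target = match.group(1).strip().split("/")[-1]
            if target == name:
                continue  # a self-link does not count as an inbound link
            if target in pages:
                inbound[target] += 1
            else:
                missing.add(target)
    orphans = sorted(
        n for n, count in inbound.items() if count == 0 and n not in ROOT_PAGES
    )
    return orphans, sorted(missing)
```

&lt;p&gt;Calling lint_wiki("wiki") returns two sorted lists you can paste into the lint prompt, so the LLM spends its effort fixing pages rather than hunting for them. Because pages are keyed on file name, two pages with the same name in different folders would collide in this sketch.&lt;/p&gt;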

&lt;h3&gt;
  
  
  ✨Let’s Actually Build One
&lt;/h3&gt;

&lt;p&gt;Let’s build a working LLM Wiki together.&lt;/p&gt;

&lt;h4&gt;
  
  
  What you need
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; (or OpenAI Codex, or another coding agent): the brain&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Obsidian&lt;/strong&gt; (free, &lt;a href="https://obsidian.md" rel="noopener noreferrer"&gt;obsidian.md&lt;/a&gt;): the viewer&lt;/li&gt;
&lt;li&gt;A folder on your computer: your vault&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Step 1: Create the folder structure
&lt;/h4&gt;

&lt;p&gt;Open your terminal:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir llm-wiki-demo &amp;amp;&amp;amp; cd llm-wiki-demo
mkdir raw
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You now have:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;llm-wiki-demo/
└── raw/    (your immutable sources go here)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Step 2: Open Claude Code in that folder and paste this single message
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;“I want you to read this idea file by Andrej Karpathy and help me set up an LLM Wiki in this directory. Before you do anything, ask me what this wiki will be about, and what sources I plan to feed it. Once I answer, write me a CLAUDE.md schema file based on my answer”.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Then paste the full contents of &lt;a href="https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f" rel="noopener noreferrer"&gt;Karpathy’s original gist&lt;/a&gt; directly after that message.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 3: Claude will respond with some clarifying questions
&lt;/h4&gt;

&lt;p&gt;Expect questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What topic will this wiki cover?&lt;/li&gt;
&lt;li&gt;What kinds of sources will you feed it?&lt;/li&gt;
&lt;li&gt;Roughly how many sources are you planning to ingest?&lt;/li&gt;
&lt;li&gt;What page types do you want?&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Step 4: Answer honestly
&lt;/h4&gt;

&lt;p&gt;For this demo, I’m building a wiki about &lt;strong&gt;AI and the philosophy of software&lt;/strong&gt;. My answer:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“The wiki covers AI research and the philosophy of software. I’ll feed it short essays and blog posts from people like Rich Sutton and Andrej Karpathy. Probably 10–20 sources. I want concept pages, essay summaries, and author pages.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Claude will now write a CLAUDE.md file tailored to that use case, initialize wiki/index.md and wiki/log.md, and say something like &lt;em&gt;"Ready to ingest your first source."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You just built the whole schema without writing a line of code. That’s Karpathy’s pattern working exactly as intended.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 5: Ingest sources
&lt;/h4&gt;

&lt;p&gt;For this demo, I used two sources:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;#1 Rich Sutton’s “The Bitter Lesson”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Drop Rich Sutton’s “The Bitter Lesson” into raw/ as bitter-lesson.pdf.&lt;/p&gt;

&lt;p&gt;Tell Claude:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Ingest raw/bitter-lesson.pdf.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Watch what happens. Claude reads the 2-page essay and generates something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wiki/
├── index.md (updated)
├── log.md (new entry appended)
├── sources/
│ └── bitter-lesson.md (summary page)
├── concepts/
│ ├── search.md
│ ├── learning.md
│ ├── moores-law.md
│ ├── general-methods.md
│ └── human-knowledge-approaches.md
├── examples/
│ ├── computer-chess.md
│ ├── computer-go.md
│ ├── speech-recognition.md
│ └── computer-vision.md
└── people/
    └── rich-sutton.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One 2-page PDF just became ~10 interlinked pages. Each page cross-references the others with Obsidian-style [[wikilinks]].&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;#2 Karpathy’s “Software 2.0”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Drop Karpathy’s “Software 2.0” into raw/ as software-2-0.pdf.&lt;/p&gt;

&lt;p&gt;Tell Claude:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Ingest raw/software-2-0.pdf.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Claude doesn’t start from scratch. It reads your existing wiki first, recognizes that Karpathy’s “Software 2.0” essay is arguing something closely related to the Bitter Lesson, and does something remarkable: it &lt;strong&gt;updates the existing pages&lt;/strong&gt; to add Karpathy’s framing, strengthens the cross-references, and creates new pages only where needed.&lt;/p&gt;

&lt;p&gt;The software-2-0.md page now includes a [[bitter-lesson]] backlink because the LLM detected the conceptual connection between the two essays: a link &lt;em&gt;no human added&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your wiki got denser, not just bigger.&lt;/strong&gt; This is the compounding property Karpathy is pointing at.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 6: Ask a synthesis question
&lt;/h4&gt;

&lt;p&gt;Now the payoff:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“How do Sutton and Karpathy agree about the future of software, and where might they disagree?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Claude doesn’t reopen the PDFs. It reads the wiki pages you just built, follows the [[links]] between them, and gives you a grounded cross-author synthesis in seconds. That answer, which draws on connections that didn’t exist 60 seconds ago, can then be filed back into your vault as a permanent page.&lt;/p&gt;

&lt;p&gt;This is what Karpathy means when he says knowledge &lt;em&gt;compounds&lt;/em&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 7: Open Obsidian and point it at the folder
&lt;/h4&gt;

&lt;p&gt;Install &lt;a href="https://obsidian.md" rel="noopener noreferrer"&gt;Obsidian&lt;/a&gt;, create a new vault, point it at your llm-wiki-demo/ folder, and hit the &lt;strong&gt;graph view&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You’re now looking at your knowledge as a network. Nodes are pages. Edges are the links Claude added automatically. Every source you add makes the graph denser.&lt;/p&gt;
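&lt;p&gt;If you want the same graph outside Obsidian (to compare it before and after an ingest, say), the edges can be rebuilt from the markdown itself. This is a hypothetical Python sketch, assuming Obsidian-style [[wikilinks]] and one page per .md file. It returns every distinct page-to-page link plus a backlinks map, which is one way to confirm that software-2-0 really links to bitter-lesson.&lt;/p&gt;

```python
import re
from collections import defaultdict
from pathlib import Path

# Captures the target of [[target]], [[target|alias]] and [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]")


def link_graph(wiki_dir):
    """Return (edges, backlinks) for a vault of markdown pages.

    edges: sorted list of distinct (source_page, target_page) pairs.
    backlinks: dict mapping each linked page to the sorted pages linking to it.
    """
    edges = set()
    for path in Path(wiki_dir).rglob("*.md"):
        text = path.read_text(encoding="utf-8")
        for match in WIKILINK.finditer(text):
            target = match.group(1).strip().split("/")[-1]
            if target != path.stem:  # ignore self-links
                edges.add((path.stem, target))
    backlinks = defaultdict(list)
    for source, target in sorted(edges):
        backlinks[target].append(source)
    return sorted(edges), dict(backlinks)
```

&lt;p&gt;Repeated links between the same two pages collapse into one edge, which matches how a graph view draws them.&lt;/p&gt;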

&lt;p&gt;That moment when the graph renders for the first time is when most people get it.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔍RAG vs LLM Wiki: The Honest Comparison
&lt;/h3&gt;

&lt;p&gt;The question everyone asks: is this actually better than RAG?&lt;/p&gt;

&lt;p&gt;Honest answer: &lt;strong&gt;neither wins. They solve different problems.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────┬─────────────────────────────────┐
│ RAG │ LLM WIKI │
├─────────────────────────────────┼─────────────────────────────────┤
│ │ │
│ 📄 Raw docs stay raw │ 📄 Raw docs compiled into │
│ │ structured wiki pages │
│ │ │
│ 🔍 Retrieves chunks per query │ 📖 Reads pre-synthesized pages │
│ │ │
│ 🔁 Stateless — every query │ 📈 Stateful — knowledge │
│ starts from scratch │ compounds over time │
│ │ │
│ 🧩 Answers assembled from │ 🔗 Answers drawn from already- │
│ fragments at runtime │ connected concepts │
│ │ │
│ 🕒 Cheap per query │ 💰 Expensive ingest, │
│ │ cheap query │
│ │ │
│ ✅ Perfect traceability to │ ⚠️ Answers 1–2 steps removed │
│ source (which chunk?) │ from raw source │
│ │ │
│ ❌ No cross-time synthesis │ ✅ Links March article to │
│ │ October article naturally │
│ │ │
│ ✅ Fresh data always re-read │ ⚠️ Updates require re-ingest │
│ │ │
│ ✅ Hallucinations stay local │ ⚠️ Hallucinations can get │
│ to one answer │ baked in as "facts" │
│ │ │
│ 🎯 Best for: large, changing │ 🎯 Best for: ~100–500 curated │
│ corpora, fact lookup, │ sources, research projects, │
│ millions of docs │ personal knowledge, books │
│ │ │
└─────────────────────────────────┴─────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;RAG&lt;/strong&gt; is great when you have millions of documents that change constantly and you need precise citations to an exact chunk. Think customer support, legal search, enterprise fact lookup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLM Wiki&lt;/strong&gt; is great when you have a bounded, curated corpus: maybe a few hundred sources on a topic you’re going deep on. Research projects. A book you’re studying. A course you’re taking. Your own journal. Situations where &lt;strong&gt;synthesis matters more than retrieval&lt;/strong&gt;, where the valuable answers require connecting five sources, not looking up one.&lt;/p&gt;

&lt;p&gt;There’s a real critique of the LLM Wiki pattern worth taking seriously: because the LLM summarizes and compresses sources into wiki pages, there’s a risk of hallucinations getting baked in as &lt;em&gt;“facts.”&lt;/em&gt; With pure RAG, a wrong answer is just one wrong answer. With an LLM Wiki, a small misunderstanding can quietly propagate across linked pages.&lt;/p&gt;

&lt;p&gt;That’s why Karpathy emphasizes the &lt;strong&gt;lint&lt;/strong&gt; step (periodic audits), and why any serious implementation should spot-check generated pages against the raw sources.&lt;/p&gt;
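&lt;p&gt;One cheap spot-check targets a specific failure: quotations that drifted or were invented during summarization. The Python sketch below is a hypothetical helper, not part of the pattern as Karpathy describes it. It assumes you have the raw source as plain text (a PDF would need text extraction first) and flags any quoted passage of 20 or more characters in a wiki page that doesn’t appear verbatim in that text. It can’t catch paraphrased errors, so it complements reading the pages rather than replacing it.&lt;/p&gt;

```python
import re

# Text inside straight or curly double quotes, at least 20 characters long.
QUOTE = re.compile(r'["“]([^"”]{20,})["”]')


def normalize(text):
    """Lowercase, unify apostrophes, and collapse whitespace so line wrapping doesn't matter."""
    return " ".join(text.replace("’", "'").split()).lower()


def unverified_quotes(wiki_page_text, source_text):
    """Return quoted passages in a wiki page that are not found verbatim in the source."""
    haystack = normalize(source_text)
    return [q for q in QUOTE.findall(wiki_page_text) if normalize(q) not in haystack]
```

&lt;p&gt;Running it over every page generated from a source after each ingest gives you a short list of quotes to check by hand, instead of rereading the whole wiki.&lt;/p&gt;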

&lt;h3&gt;
  
  
  🧰Why This Actually Matters
&lt;/h3&gt;

&lt;p&gt;It’s not really about wikis. Karpathy is pointing at something much older: a 1945 vision by Vannevar Bush called the &lt;strong&gt;Memex&lt;/strong&gt;, a personal, curated knowledge store where the &lt;em&gt;connections between documents&lt;/em&gt; are as valuable as the documents themselves.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffbyju0hwounu0jvezf96.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffbyju0hwounu0jvezf96.png" width="800" height="551"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Bush’s vision was closer to this than to what the web became: private, actively curated, with associative trails between ideas. The reason the Memex was never really built isn’t technical. It’s that nobody wants to do the &lt;em&gt;bookkeeping&lt;/em&gt;: updating cross-references, keeping summaries current, noting when new data contradicts old claims.&lt;/p&gt;

&lt;p&gt;As Karpathy writes in the gist:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“The tedious part of maintaining a knowledge base is not the reading or the thinking — it’s the bookkeeping. Humans abandon wikis because the maintenance burden grows faster than the value. LLMs don’t get bored, don’t forget to update a cross-reference, and can touch 15 files in one pass.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;The tedious part of knowledge work is finally solved.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Your job shifts from &lt;em&gt;filing&lt;/em&gt; to &lt;em&gt;thinking&lt;/em&gt;. From &lt;em&gt;organizing&lt;/em&gt; to &lt;em&gt;curating&lt;/em&gt;. From &lt;em&gt;searching&lt;/em&gt; to &lt;em&gt;asking better questions&lt;/em&gt;. The LLM handles everything else.&lt;/p&gt;

&lt;h3&gt;
  
  
  🎗️Reference
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Karpathy’s Tweet:&lt;/strong&gt; &lt;a href="https://x.com/karpathy/status/2039805659525644595" rel="noopener noreferrer"&gt;https://x.com/karpathy/status/2039805659525644595&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Karpathy’s original gist:&lt;/strong&gt; &lt;a href="https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f" rel="noopener noreferrer"&gt;gist.github.com/karpathy/442a6bf555914893e9891c11519de94f&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code:&lt;/strong&gt; &lt;a href="https://claude.com/claude-code" rel="noopener noreferrer"&gt;claude.com/claude-code&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Obsidian:&lt;/strong&gt; &lt;a href="https://obsidian.md" rel="noopener noreferrer"&gt;obsidian.md&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Demo source 1 — Sutton’s “The Bitter Lesson”:&lt;/strong&gt; &lt;a href="http://www.incompleteideas.net/IncIdeas/BitterLesson.html" rel="noopener noreferrer"&gt;incompleteideas.net/IncIdeas/BitterLesson.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Demo source 2 — Karpathy’s “Software 2.0”:&lt;/strong&gt; &lt;a href="https://karpathy.medium.com/software-2-0-a64152b37c35" rel="noopener noreferrer"&gt;karpathy.medium.com/software-2-0-a64152b37c35&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Karpathy’s LLM Wiki Changes Everything:&lt;/strong&gt; &lt;a href="https://youtu.be/04z2M_Nv_Rk" rel="noopener noreferrer"&gt;https://youtu.be/04z2M_Nv_Rk&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>artificialintelligen</category>
      <category>productivity</category>
      <category>rags</category>
      <category>knowledgemanagement</category>
    </item>
    <item>
      <title>Karpathy’s Auto Research and its application beyond ML</title>
      <dc:creator>Urvil Joshi</dc:creator>
      <pubDate>Mon, 13 Apr 2026 04:53:16 +0000</pubDate>
      <link>https://dev.to/urvvil/karpathys-auto-research-and-its-application-beyond-ml-bb</link>
      <guid>https://dev.to/urvvil/karpathys-auto-research-and-its-application-beyond-ml-bb</guid>
      <description>&lt;p&gt;What if you could hand an AI agent a single file and a scoring function, say &lt;em&gt;“make this better”&lt;/em&gt;, then go to sleep? You wake up to a hundred experiments completed, the best ones committed to your git history, and a better result than what you started with.&lt;/p&gt;

&lt;p&gt;Andrej Karpathy open-sourced exactly this a while ago. It’s called &lt;strong&gt;Auto Research&lt;/strong&gt;, and once you understand the pattern, you start seeing places to apply it everywhere.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Who Is Andrej Karpathy and Why Does This Matter?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foxt17w8ujx0igv9wxpas.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foxt17w8ujx0igv9wxpas.png" width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you write code for a living, you’ve probably used something Karpathy built without knowing it.&lt;/p&gt;

&lt;p&gt;He was a co-founder of OpenAI. He led the AI team behind Tesla’s Autopilot, building the neural networks that power its self-driving features. He created nanoGPT, minbpe, and llm.c, three widely followed open-source AI projects. He also coined the term &lt;em&gt;vibe coding&lt;/em&gt;, which, love it or hate it, is now part of every developer’s vocabulary.&lt;/p&gt;

&lt;p&gt;So when Karpathy open-sources something, it’s worth paying attention.&lt;/p&gt;

&lt;h3&gt;
  
  
  🍥 &lt;strong&gt;What Is Auto Research?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The story behind Auto Research is simple. Karpathy had a training script for GPT-2 that he’d been hand-optimizing for months. Tweaking hyperparameters. Trying different learning rate schedules. Adjusting batch sizes. At some point he asked himself the obvious question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Why am I doing this manually? Why not have an AI agent run these experiments for me?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That question became Auto Research.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnnuf70xkncxn0rsghgxb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnnuf70xkncxn0rsghgxb.png" width="799" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Auto Research is a closed-loop autonomous optimization system. An AI agent runs experiments in a tight loop: hypothesize, modify, evaluate, keep or revert. Then it repeats, forever or until you tell it to stop.&lt;/p&gt;

&lt;p&gt;Here’s the loop in plain terms:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Hypothesize.&lt;/strong&gt; The agent reads the current state of whatever we want to optimize, looks at previous results, and forms a theory about what to try next.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modify.&lt;/strong&gt; It edits exactly one file.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluate.&lt;/strong&gt; It runs an evaluation script that returns a single score.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep or revert.&lt;/strong&gt; If the score improved, git commit. If it got worse, git reset --hard. Clean slate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Loop.&lt;/strong&gt; Back to step 1 with the new context.&lt;/li&gt;
&lt;/ol&gt;
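&lt;p&gt;To make the five steps concrete, here’s a stripped-down Python sketch of the keep-or-revert loop. It is not Karpathy’s code: propose_change stands in for the agent editing the one mutable file, evaluate stands in for the locked evaluation, and higher scores are assumed to be better. By default it commits and reverts with real git commands, but both can be swapped out, which also makes the loop easy to try without a repository.&lt;/p&gt;

```python
import subprocess


def git(*args):
    """Run a git command in the current repository and return its stdout."""
    result = subprocess.run(["git", *args], check=True, capture_output=True, text=True)
    return result.stdout


def research_loop(propose_change, evaluate, iterations, commit=None, revert=None):
    """Keep-or-revert loop (hypothetical sketch; higher scores are better).

    propose_change(history) edits the one mutable file and returns a short description.
    evaluate() runs the locked evaluation and returns a single float.
    """
    commit = commit or (lambda message: git("commit", "--all", "--quiet", "-m", message))
    revert = revert or (lambda: git("reset", "--hard", "--quiet"))
    best = evaluate()  # baseline score of the untouched file
    history = []
    for _ in range(iterations):
        description = propose_change(history)  # steps 1-2: hypothesize and modify
        score = evaluate()  # step 3: evaluate
        kept = score > best
        if kept:  # step 4: keep the change...
            best = score
            commit(f"{description} (score {score:.4f})")
        else:  # ...or throw it away and return to the last good commit
            revert()
        history.append((description, score, kept))  # step 5: context for the next round
    return best, history
```

&lt;p&gt;For a lower-is-better metric such as latency, have evaluate return the negated value. The history list plays the role of the agent’s context for its next hypothesis; the git log, by contrast, only records the experiments that were kept.&lt;/p&gt;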

&lt;p&gt;There’s a detail in here that’s easy to miss but absolutely critical: &lt;strong&gt;fixed time budget per experiment.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every experiment gets the same amount of compute. Why? Because otherwise the agent could cheat. If experiment A gets five minutes and experiment B gets fifty, of course B might look better: it had ten times the compute. By fixing the time budget, you force the agent to win on the &lt;em&gt;quality of its ideas&lt;/em&gt;, not on brute force.&lt;/p&gt;
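&lt;p&gt;One way to enforce a fixed budget from outside the experiment is a hard timeout. The Python sketch below assumes each experiment is a command that prints its score as the last number on stdout, which is my own convention for illustration and not necessarily how Karpathy’s repo reports results. A run that overruns, crashes, or prints no score counts as a failure, so going over budget can never produce a “win.”&lt;/p&gt;

```python
import subprocess


def run_with_budget(command, budget_seconds):
    """Run one experiment under a fixed wall-clock budget.

    Returns the last number printed on stdout, or None if the run overran the
    budget, exited with an error, or printed no number.
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=budget_seconds
        )
    except subprocess.TimeoutExpired:
        return None  # subprocess.run kills the child process before raising this
    if result.returncode != 0:
        return None
    for line in reversed(result.stdout.splitlines()):
        parts = line.split()
        if not parts:
            continue
        try:
            return float(parts[-1])  # accepts "0.873" or "score: 0.873"
        except ValueError:
            continue
    return None
```

&lt;p&gt;For example, run_with_budget(["python", "train.py"], 300) would give each run five minutes. Wall-clock time is only a fair stand-in for compute if every run uses the same hardware.&lt;/p&gt;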

&lt;p&gt;And notice what’s acting as the memory: &lt;strong&gt;git&lt;/strong&gt;. Your git log becomes a complete trail of every successful experiment. Every commit message says what the agent tried and what score it achieved. At the end of an overnight run, you can run git log --oneline and see the entire optimization journey.&lt;/p&gt;

&lt;p&gt;If you start it before bed, you can wake up to roughly a hundred experiments completed.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✨The Three-File Architecture
&lt;/h3&gt;

&lt;p&gt;Auto Research works because of a constraint system built around three files. Each one has a specific role, and the boundaries between them are what prevent the whole thing from collapsing into chaos.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8w1zbf0ojblf0ooo0evg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8w1zbf0ojblf0ooo0evg.png" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  File 1: program.md
&lt;/h4&gt;

&lt;p&gt;This is the file &lt;em&gt;you&lt;/em&gt; write. Think of it as a system prompt for the experiment loop. You define three things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The objective.&lt;/strong&gt; What are you optimizing? “Minimize p99 latency.” “Maximize test pass rate.” “Reduce image size.”&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The constraints.&lt;/strong&gt; What can’t the agent do? “Don’t exceed 512MB of memory.”&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The protocol.&lt;/strong&gt; How should the agent behave? “Run eval after every change.” “Commit if better, revert if worse.” “Don’t stop to ask questions.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;program.md is the job description. You're hiring an AI employee, and this is their employment contract.&lt;/p&gt;

&lt;h4&gt;
  
  
  File 2: train.py
&lt;/h4&gt;

&lt;p&gt;This is the one and only file the agent can edit. The name comes from Karpathy’s original use case (training GPT-2), but it doesn’t have to be a Python script. It can be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A system prompt&lt;/li&gt;
&lt;li&gt;A SQL query&lt;/li&gt;
&lt;li&gt;A Dockerfile&lt;/li&gt;
&lt;li&gt;A CSS file&lt;/li&gt;
&lt;li&gt;A config file&lt;/li&gt;
&lt;li&gt;Literally anything you want to optimize&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The single-file constraint is deliberate. By giving the agent one degree of freedom, you prevent it from making sprawling changes you can’t review. The agent has a focused surface area; you have a reviewable diff.&lt;/p&gt;

&lt;h4&gt;
  
  
  File 3: prepare.py
&lt;/h4&gt;

&lt;p&gt;This is the most important file in the entire system, and the agent &lt;strong&gt;absolutely cannot touch it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;prepare.py defines what &lt;em&gt;better&lt;/em&gt; means. It runs the evaluation, computes the metric, and outputs a single scalar number. The agent reads this score and decides whether to commit or revert.&lt;/p&gt;

&lt;p&gt;Why is it locked? Because if the agent could edit the evaluation, it could just rewrite the scoring function to always return a perfect score. Game over. Optimization meaningless.&lt;/p&gt;
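&lt;p&gt;Telling the agent not to edit prepare.py in program.md is a request, not a guarantee. A cheap extra safeguard, sketched below in Python as a hypothetical addition rather than part of Karpathy’s setup, is to fingerprint the evaluation file before the run starts and refuse any score produced after that fingerprint changes. Making the file read-only, or keeping it outside the agent’s workspace entirely, would be stronger still.&lt;/p&gt;

```python
import hashlib
from pathlib import Path


def fingerprint(path):
    """SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


class LockedEvaluator:
    """Refuse to trust scores once the evaluation file has changed."""

    def __init__(self, eval_path):
        self.eval_path = eval_path
        self.expected = fingerprint(eval_path)  # recorded before the agent starts

    def check(self):
        """Call before accepting each score; raises if the file was modified."""
        if fingerprint(self.eval_path) != self.expected:
            raise RuntimeError(f"{self.eval_path} was modified; discard this experiment")
```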

&lt;p&gt;There’s a subtle but important corollary here: &lt;strong&gt;if you set the wrong metric, the agent will confidently optimize the wrong thing.&lt;/strong&gt; It will improve the number you gave it, even if that number doesn’t measure what you actually care about. Choosing the right metric is your job. The agent handles execution. You handle direction.&lt;/p&gt;

&lt;h3&gt;
  
  
  🚨The Misconception That’s Costing People
&lt;/h3&gt;

&lt;p&gt;Most people who hear about Auto Research think it’s a machine learning thing. They see Karpathy’s GPT-2 example and assume the pattern only applies to training models.&lt;/p&gt;

&lt;p&gt;This is wrong, and it’s the most expensive misconception you can have about this technology.&lt;/p&gt;

&lt;p&gt;Auto Research is a &lt;em&gt;pattern&lt;/em&gt;, not a tool. The pattern works anywhere you have three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;One scalar metric.&lt;/strong&gt; A single number that tells you if things got better or worse.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automated evaluation.&lt;/strong&gt; No human in the loop. If you need a person to look at the result and judge, it’s too slow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One mutable file.&lt;/strong&gt; A focused surface area for the agent to work with.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If all three conditions are met, you can Auto Research it. Here’s what that opens up:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt engineering.&lt;/strong&gt; Your file is system_prompt.txt. Your metric is accuracy on a labeled test set. The agent tries different phrasings, few-shot examples, chain-of-thought instructions, even different languages. Each experiment runs the prompt against your test data and reports accuracy.&lt;/p&gt;
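&lt;p&gt;The evaluation side of the prompt-engineering case is just accuracy over labeled examples. In this Python sketch, predict is a placeholder for whatever calls your model with the current system_prompt.txt (the article doesn’t specify a model or client), and matching is exact after trimming and lowercasing, an assumption that only suits short, label-style answers.&lt;/p&gt;

```python
def accuracy(predict, labeled_examples):
    """Fraction of (input, expected_label) pairs that predict() gets right.

    predict is a placeholder for a call to your model using the current
    system prompt; comparison is exact after trimming and lowercasing.
    """
    if not labeled_examples:
        raise ValueError("need at least one labeled example")
    correct = sum(
        1
        for text, label in labeled_examples
        if predict(text).strip().lower() == label.strip().lower()
    )
    return correct / len(labeled_examples)
```

&lt;p&gt;The eval script would then print that accuracy as its single scalar. Keep the labeled set fixed and out of the agent’s reach; otherwise the score stops meaning anything.&lt;/p&gt;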

&lt;p&gt;&lt;strong&gt;API performance.&lt;/strong&gt; Your file is the handler code. Your metric is p99 latency under load. The agent experiments with caching, connection pooling, query batching, async patterns. The eval script fires a thousand requests and measures the 99th percentile.&lt;/p&gt;
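&lt;p&gt;Here’s roughly what the metric half of that eval script could look like, as a Python sketch. p99 uses the nearest-rank definition (one of several percentile conventions), and send_request is a hypothetical placeholder for a call to your handler. This version sends requests one after another; a genuine under-load test would issue them concurrently with a proper load-testing tool.&lt;/p&gt;

```python
import time


def p99(samples):
    """99th percentile using the nearest-rank method (no interpolation)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = -(-99 * len(ordered) // 100)  # ceil(0.99 * n) in integer arithmetic
    return ordered[rank - 1]


def measure_p99_ms(send_request, n=1000):
    """Call send_request() n times sequentially; return p99 latency in milliseconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        send_request()
        latencies.append((time.perf_counter() - start) * 1000)
    return p99(latencies)
```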

&lt;p&gt;&lt;strong&gt;Dockerfile optimization.&lt;/strong&gt; Your file is the Dockerfile. Your metric is build_time × image_size. The agent tries multi-stage builds, different base images, layer reordering. The eval runs docker build and measures both numbers.&lt;/p&gt;
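&lt;p&gt;A Python sketch of that eval, assuming the Docker CLI is installed and using a placeholder image tag. It times a docker build and reads the image size in bytes with docker image inspect --format {{.Size}}. Building with --no-cache is my choice, not the article’s: without it, later experiments would look faster simply because earlier ones warmed the layer cache.&lt;/p&gt;

```python
import subprocess
import time


def docker_score(build_seconds, image_bytes):
    """build_time × image_size; lower is better."""
    return build_seconds * image_bytes


def measure_dockerfile(tag="autoresearch-candidate", context="."):
    """Build the image from scratch and return (build_seconds, image_bytes).

    Requires the Docker CLI; the default tag is a hypothetical placeholder.
    """
    start = time.perf_counter()
    # --no-cache so every candidate pays the full build cost, not a cached one.
    subprocess.run(
        ["docker", "build", "--no-cache", "-t", tag, context],
        check=True,
        capture_output=True,
    )
    build_seconds = time.perf_counter() - start
    size = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{.Size}}", tag],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return build_seconds, int(size.strip())
```

&lt;p&gt;Because the metric is a product, halving either the build time or the image size halves the score, so the agent is free to trade one against the other. Lower is better here; flip the sign if your loop keeps higher scores.&lt;/p&gt;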

&lt;p&gt;&lt;strong&gt;SQL query tuning.&lt;/strong&gt; Your file is query.sql. Your metric is execution time on a fixed dataset. The agent tries index hints, join strategies, CTEs vs subqueries. The eval just runs the query and reports wall-clock time.&lt;/p&gt;
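&lt;p&gt;A Python sketch of the timing side, using the standard library’s sqlite3 as a stand-in for your real database (swap in your own driver and connection). It reports the median of several runs to damp caching noise. It also optionally checks the row count against a known baseline, my own addition, so a “faster” query that silently returns fewer rows doesn’t get committed.&lt;/p&gt;

```python
import sqlite3
import statistics
import time


def time_query(conn, sql, expected_rows=None, runs=5):
    """Return the median wall-clock seconds to run sql and fetch all rows.

    If expected_rows is given, a result with a different row count is rejected,
    since a query that returns less data is not a valid speedup.
    """
    timings = []
    row_count = 0
    for _ in range(runs):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        timings.append(time.perf_counter() - start)
        row_count = len(rows)
    if expected_rows is not None and row_count != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, got {row_count}")
    return statistics.median(timings)
```

&lt;p&gt;In practice you’d open a connection to the fixed dataset, read query.sql, and print the returned median as the eval’s single score.&lt;/p&gt;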

&lt;p&gt;The pattern doesn’t change. The loop doesn’t change. The three files don’t change. Only the contents change.&lt;/p&gt;

&lt;p&gt;The rule is simple: &lt;strong&gt;if you can score it, you can Auto Research it. If you can’t score it, don’t try.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  🔍Why This Is Bigger Than It Looks
&lt;/h3&gt;

&lt;p&gt;I want to zoom out for a second, because the implications of Auto Research go beyond “neat trick for optimizing code.”&lt;/p&gt;

&lt;p&gt;Karpathy has talked about his end vision for this. Back in the early 2000s, a project called SETI@home let you donate spare computing power on your home PC to the search for extraterrestrial intelligence. Karpathy wants to build the same thing for AI research: millions of agents, distributed across thousands of computers, all running Auto Research loops on different problems.&lt;/p&gt;

&lt;p&gt;Think about what that means. Right now, AI labs spend tens of millions of dollars on researchers whose job is essentially to be the experiment loop: propose changes, launch training runs, look at results, decide what to try next. Much of that loop is now scriptable.&lt;/p&gt;

&lt;p&gt;Karpathy’s prediction is that every frontier AI lab will eventually adopt some form of Auto Research internally. And he made the basic version open source.&lt;/p&gt;

&lt;p&gt;The person who can set up the loop correctly will out-produce a team doing it manually.&lt;/p&gt;

&lt;h3&gt;
  
  
  🧰What I Think You Should Do With This
&lt;/h3&gt;

&lt;p&gt;If you’ve read this far and you’re a developer, here’s my honest take.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Try it once.&lt;/strong&gt; Clone Karpathy’s repo. Pick the smallest possible problem in your codebase that has a clear metric: a slow function, a bloated Dockerfile, a config file with too many knobs. Set up the three files. Let an AI agent loop on it for an hour. You’ll learn more from one run than from reading ten articles like this one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Then start noticing.&lt;/strong&gt; Once the pattern is in your head, you’ll start seeing Auto Research opportunities everywhere. That nightly batch job that takes 40 minutes? The system prompt you’ve been hand-tuning for weeks? The query that’s too slow but you don’t have time to optimize? All of these are candidates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Get good at picking metrics.&lt;/strong&gt; This is the hard part and the only part the agent can’t do for you. A bad metric gives you confident, automated, beautifully-committed garbage. A good metric gives you measurable progress toward something that actually matters. Spend more time on this than on anything else.&lt;/p&gt;

&lt;p&gt;The closing thought I keep coming back to is something Karpathy himself said:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Any metric you care about that is reasonably efficient to evaluate can be auto researched.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The loop is open source. The pattern is freely available. The only thing standing between you and using it is picking your first metric.&lt;/p&gt;

&lt;h3&gt;
  
  
  🎗️Reference
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/karpathy/autoresearch" rel="noopener noreferrer"&gt;Karpathy Autoresearch Github&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://youtu.be/xj2nn-RcOcY?si=qtLZB59yzh0LBvrM" rel="noopener noreferrer"&gt;Karpathy’s AutoResearch and its applications beyond ML&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>autoresearch</category>
      <category>autonomousai</category>
      <category>andrejkarpathy</category>
    </item>
    <item>
      <title>7 Git Concepts That Will Boost Your Productivity Exponentially</title>
      <dc:creator>Urvil Joshi</dc:creator>
      <pubDate>Sat, 28 Mar 2026 23:44:37 +0000</pubDate>
      <link>https://dev.to/urvvil/7-git-concepts-that-will-boost-your-productivity-exponentially-5gi</link>
      <guid>https://dev.to/urvvil/7-git-concepts-that-will-boost-your-productivity-exponentially-5gi</guid>
      <description>&lt;p&gt;If you’ve been using Git for a while and have the basics down, there’s a lot more power left to discover. Git is packed with features that most developers never touch, and learning just a handful of them can noticeably improve your workflow.&lt;/p&gt;

&lt;p&gt;Here are seven concepts that, once learned, you’ll find yourself reaching for regularly.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Git Worktree: Work on Multiple Branches Without the Stash
&lt;/h3&gt;

&lt;p&gt;We’ve all been there. You’re deep in a feature, changes are halfway done, and suddenly you need to review someone’s code or jump on a hotfix. So you run git stash, switch branches, do your thing, switch back, run git stash pop, and hope you remember which stash was which.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft165ze7ryo8hpb3pdn24.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft165ze7ryo8hpb3pdn24.png" width="768" height="595"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git stash
# ...do other work...
git stash pop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works, but it has real downsides. If your codebase is large and takes 20–30 minutes to compile, switching branches means recompiling &lt;em&gt;twice&lt;/em&gt;. And if you’re a serial stasher, you’ll inevitably lose track of what’s where.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Git worktree&lt;/strong&gt; solves this elegantly. It lets you check out multiple branches into separate directories simultaneously, all backed by the same repository.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2keg1fd2s3mnyr70tyc5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2keg1fd2s3mnyr70tyc5.png" width="800" height="466"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git worktree add -b code-review ../code-review release-branch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a new branch named code-review, starting from release-branch, and checks it out in a new directory, ../code-review. Your original working directory is untouched: no stashing, no recompilation.&lt;/p&gt;

&lt;p&gt;A few things to keep in mind. Always create your worktree in a &lt;em&gt;sibling&lt;/em&gt; directory, not inside your main repo. Nesting a worktree inside the main checkout puts a second copy of the project’s files inside the first, which tends to confuse search tools, IDEs, and build scripts. The main worktree contains the .git folder with all repository metadata, while secondary worktrees have a .git &lt;em&gt;file&lt;/em&gt; that points back to the main one.&lt;/p&gt;

&lt;p&gt;You can list all active worktrees with git worktree list and clean them up when you're done:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git worktree remove ../code-review
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
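&lt;p&gt;If you end up with several worktrees, scripting around git worktree list --porcelain (the machine-readable form of the listing mentioned above) makes cleanup easier. The sketch below is a hypothetical Python helper, not part of Git. It assumes the porcelain format of one "worktree PATH" line followed by attribute lines such as HEAD, branch, or detached, with a blank line between entries.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import subprocess


def parse_worktrees(porcelain):
    """Parse `git worktree list --porcelain` output into one dict per worktree."""
    worktrees = []
    current = None
    for line in porcelain.splitlines():
        if not line.strip():
            current = None  # a blank line ends one worktree's record
            continue
        key, _, value = line.partition(" ")
        if key == "worktree":
            current = {"path": value}
            worktrees.append(current)
        elif current is not None:
            # Attributes such as "bare" or "detached" carry no value; store True.
            current[key] = value or True
    return worktrees


def list_worktrees(repo="."):
    """Run git in `repo` and return its worktrees (requires git on PATH)."""
    output = subprocess.run(
        ["git", "-C", repo, "worktree", "list", "--porcelain"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_worktrees(output)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, [w["path"] for w in list_worktrees() if w.get("branch") == "refs/heads/code-review"] finds the directory to pass to git worktree remove.&lt;/p&gt;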



&lt;p&gt;Personally, I keep worktrees open for release branches and branches that need regular code review. It saves a surprising amount of context-switching time.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Git Squash: Keep Your History Clean
&lt;/h3&gt;

&lt;p&gt;Over the course of a feature or hotfix, it’s easy to accumulate a trail of commits like “fix typo,” “change color,” “update docs,” and “refactor again.” These might be meaningful while you’re working, but they clutter the project history for everyone else.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7jep8bqir7s3dukf2fii.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7jep8bqir7s3dukf2fii.png" width="456" height="458"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Git squash lets you collapse multiple commits into one, giving you a clean, readable history. There are two common approaches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Interactive rebase:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git rebase -i HEAD~3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This opens an editor listing your last three commits, oldest first. Keep the top (oldest) one as pick, mark the ones you want to fold into it with squash (or s), and Git combines them. You’ll then get a chance to write a new commit message.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pick abc1234 Code refactor
squash def5678 Code refactor
squash ghi9012 Code refactor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
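&lt;p&gt;If you squash the same way often, you can skip the manual editing. Git runs the program named in the GIT_SEQUENCE_EDITOR environment variable on the todo file instead of your usual editor, so a small script can rewrite it. This is a minimal sketch, assuming a hypothetical file name, squash_todo.py, and Git’s default unabbreviated pick commands. It keeps the first pick and turns every later one into squash.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#!/usr/bin/env python3
"""squash_todo.py (hypothetical name): squash a rebase todo list into its first commit."""
import sys


def squash_todo(text):
    """Keep the first 'pick' line and turn every later 'pick' into 'squash'."""
    out = []
    seen_first_pick = False
    for line in text.splitlines():
        if line.startswith("pick "):
            if seen_first_pick:
                line = "squash " + line[len("pick "):]
            seen_first_pick = True
        out.append(line)  # comments and blank lines pass through unchanged
    return "\n".join(out) + "\n"


# Git invokes the sequence editor with the todo file's path as the only argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    todo_path = sys.argv[1]
    with open(todo_path) as todo_file:
        rewritten = squash_todo(todo_file.read())
    with open(todo_path, "w") as todo_file:
        todo_file.write(rewritten)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Run it as GIT_SEQUENCE_EDITOR="python3 squash_todo.py" git rebase -i HEAD~3. Git still opens your normal editor afterwards for the combined commit message.&lt;/p&gt;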



&lt;p&gt;&lt;strong&gt;Squash merge:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When merging a feature branch into your release branch, you can use the --squash flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git merge --squash feature-branch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



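&lt;p&gt;This stages all the changes from the branch but doesn’t create a commit automatically. You then commit once with a clean message that summarizes the entire body of work. The result is a single, meaningful entry in your branch’s history instead of a dozen incremental ones. Because no merge commit is recorded, Git won’t consider feature-branch merged afterwards, so plan to delete it rather than keep building on it.&lt;/p&gt;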

&lt;h3&gt;
  
  
  3. Git Aliases: Stop Typing the Same Long Commands
&lt;/h3&gt;

&lt;p&gt;If you’re anything like most Git users, you probably type certain commands dozens of times a day. Commands like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git log --oneline --graph --decorate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That gets old fast. Git aliases let you create shortcuts for frequently used commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git config --global alias.logs "log --oneline --graph --decorate"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now git logs runs the full command. You can alias any long or complex command you use frequently. If you're maintaining an open-source project or spending a lot of time in the terminal, aliases add up to real time savings over the course of a day.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Git Bisect: Find the Exact Commit That Broke Things
&lt;/h3&gt;

&lt;p&gt;Regression bugs are frustrating. Something that worked last week is now broken, and somewhere in the last 100 commits, something went wrong. You &lt;em&gt;could&lt;/em&gt; manually check out commits one by one, but Git has a smarter tool built in.&lt;/p&gt;

&lt;p&gt;git bisect performs a binary search through your commit history to find the exact commit that introduced the bug.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6nv8303qczzny1zd3i24.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6nv8303qczzny1zd3i24.png" width="472" height="509"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git bisect start
git bisect bad # current HEAD is broken
git bisect good abc1234 # this older commit was working
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Git checks out a commit halfway between the two. You run your tests, then tell Git whether that commit is good or bad:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git bisect good # or: git bisect bad
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



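&lt;p&gt;It narrows the range and checks out the next candidate. Because each step halves the range, 100 commits take roughly 7 steps (log₂ 100 ≈ 6.6) instead of 100. When you’re done, run git bisect reset to return to the branch you started from.&lt;/p&gt;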
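&lt;p&gt;To see why the step count grows so slowly, here is a small Python simulation of the search bisect performs on a linear history. It illustrates binary search and is not Git’s actual implementation (Git picks midpoints over the commit graph). The names bisect_first_bad and is_bad are hypothetical; is_bad stands in for running your tests on a commit.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def bisect_first_bad(commits, is_bad):
    """Binary-search `commits` (oldest first) for the first bad one.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the culprit is bad too. Returns (culprit, commits_tested).
    """
    if len(commits) in (0, 1):
        raise ValueError("need a known-good and a known-bad commit")
    good, bad = 0, len(commits) - 1
    tested = 0
    while bad - good != 1:
        mid = (good + bad) // 2  # test the midpoint of the remaining range
        tested += 1
        if is_bad(commits[mid]):
            bad = mid   # culprit is mid or earlier
        else:
            good = mid  # culprit comes after mid
    return commits[bad], tested
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With 101 commits (one known-good commit plus 100 newer ones), every possible culprit is found after at most 7 tests. In a real repository, git bisect run followed by a test command automates the same loop, treating exit status 0 as good and most non-zero statuses as bad.&lt;/p&gt;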

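&lt;p&gt;&lt;strong&gt;A practical tip:&lt;/strong&gt; if you write a throwaway test file to reproduce the bug, keep it untracked (listing it in .git/info/exclude keeps it out of git status) so it persists across checkouts. Git leaves untracked files in place as bisect moves between commits, but a file that is committed in some of those revisions and not others will appear and disappear.&lt;/p&gt;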

&lt;h3&gt;
  
  
  5. Git Cherry-Pick: Selectively Apply Commits Across Branches
&lt;/h3&gt;

&lt;p&gt;Sometimes you need to move a specific commit from one branch to another without merging the entire branch. Maybe you fixed a bug on your feature branch that the release branch also needs, but the feature isn’t ready to merge yet.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0mlx6o9j1g3kz25yoab6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0mlx6o9j1g3kz25yoab6.png" width="293" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;git cherry-pick does exactly this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git checkout release-branch
git cherry-pick &amp;lt;commit-hash&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This applies the changes from that single commit onto your current branch as a new commit.&lt;/p&gt;

&lt;p&gt;Common use cases include applying a bugfix to a release branch without merging unfinished feature work, backporting fixes to a main or master branch that’s publicly visible, and recovering lost commits that you found via git reflog. It's far cleaner than copy-pasting code changes and committing them separately, which is something you see more often than you'd expect.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Git Reflog: Your Safety Net for Lost Commits
&lt;/h3&gt;

&lt;p&gt;git log shows you commit history. git reflog shows you &lt;em&gt;everything&lt;/em&gt;: every branch switch, every rebase, every cherry-pick, every checkout. It's a full, local activity log of what your HEAD has pointed to.&lt;/p&gt;

&lt;p&gt;This becomes invaluable when something goes wrong. Say you accidentally delete a local branch that was never pushed to remote:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git branch -D feature-branch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



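&lt;p&gt;The commits aren’t actually gone yet. Git only deleted the branch &lt;em&gt;label&lt;/em&gt;; the commits persist until HEAD’s reflog entries for them expire and garbage collection prunes them, which with default settings takes weeks (entries for unreachable commits are kept for 30 days).&lt;/p&gt;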

&lt;p&gt;To recover:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git reflog
# Find the commit hash from before the deletion
git checkout -b feature-branch &amp;lt;commit-hash&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your branch and all its commits are back. This works for detached HEAD situations, careless rebases, and any number of “I’ve made a terrible mistake” moments. Think of git reflog as Git's undo history.&lt;/p&gt;
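&lt;p&gt;Finding the right hash in a long reflog can be tedious. Each “checkout: moving from A to B” entry records where HEAD landed (B), so the tip of A is the hash on the next, older line. The hypothetical Python helper below applies that rule, assuming the default git reflog output format shown in its comment.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import re

# Default `git reflog` lines look like:
#   a1b2c3d HEAD@{0}: checkout: moving from feature-branch to main
ENTRY = re.compile(r"^(\w+) HEAD@\{\d+\}: (.*)$")


def last_tip_of(branch, reflog_text):
    """Return the hash HEAD pointed at just before it last left `branch`, or None.

    Reflog output is newest first, so the entry after a
    "checkout: moving from BRANCH to ..." line holds the branch's old tip.
    """
    entries = [m.groups() for m in map(ENTRY.match, reflog_text.splitlines()) if m]
    marker = f"checkout: moving from {branch} to "
    for (_, message), (older_hash, _) in zip(entries, entries[1:]):
        if message.startswith(marker):
            return older_hash
    return None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The returned hash is what you pass to git checkout -b feature-branch in the recovery step.&lt;/p&gt;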

&lt;h3&gt;
  
  
  7. Git Hooks: Automate Quality Checks Before You Commit
&lt;/h3&gt;

&lt;p&gt;Git hooks are scripts that run automatically at specific points in your Git workflow like before a commit, before a push, after a merge, and more. They’re a powerful way to enforce quality standards without relying on memory or discipline.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftq13fz37kbkbpo18es8x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftq13fz37kbkbpo18es8x.png" width="800" height="237"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The most commonly useful hook is the &lt;strong&gt;pre-commit hook&lt;/strong&gt;. Here’s how to set one up from scratch.&lt;/p&gt;

&lt;p&gt;Inside any Git repository, navigate to .git/hooks/. You'll find sample files there. To create a pre-commit hook, rename pre-commit.sample to pre-commit (removing the .sample extension), make sure it is executable (chmod +x .git/hooks/pre-commit), and write your script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#!/bin/bash
# Pre-commit hook: compile Java files before allowing commit

echo "Running pre-commit checks..."
javac *.java
if [$? -ne 0]; then
    echo "Compilation failed. Commit aborted."
    exit 1
fi
echo "All checks passed."
exit 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, every time you run git commit, this script executes first. If the compilation fails, the commit is blocked.&lt;/p&gt;

&lt;p&gt;You can extend this pattern to run unit tests, static analysis tools like SonarQube, linting, or any validation your project requires. If the hook exits with a non-zero status, the commit is rejected. It’s a simple mechanism that prevents “I forgot to do this before pushing” problems.&lt;/p&gt;
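&lt;p&gt;Hooks don’t have to be shell scripts; Git runs any executable file with the right name. As one illustration of the linting idea, here is a minimal Python pre-commit sketch that rejects commits whose staged .py files still contain debugger calls. The list of forbidden patterns is an assumption to adapt. The hook reads staged content with git diff --cached and git show :path, so it checks what will actually be committed rather than the working tree.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#!/usr/bin/env python3
"""Sketch of a pre-commit hook that blocks leftover debugger calls in staged .py files."""
import subprocess
import sys

# Patterns treated as "forgot to remove this" markers; an assumption to adapt.
FORBIDDEN = ("breakpoint()", "pdb.set_trace()")


def find_problems(staged):
    """`staged` maps path to file content; returns "path:line: text" strings."""
    problems = []
    for path, content in staged.items():
        for lineno, line in enumerate(content.splitlines(), start=1):
            if any(marker in line for marker in FORBIDDEN):
                problems.append(f"{path}:{lineno}: {line.strip()}")
    return problems


def staged_python_files():
    """Read the staged (index) version of each added, copied, or modified .py file."""
    names = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        check=True, capture_output=True, text=True,
    ).stdout.splitlines()
    staged = {}
    for name in names:
        if name.endswith(".py"):
            # ":path" names the blob in the index, i.e. what will be committed.
            staged[name] = subprocess.run(
                ["git", "show", f":{name}"],
                check=True, capture_output=True, text=True,
            ).stdout
    return staged


def main():
    problems = find_problems(staged_python_files())
    for problem in problems:
        print(problem, file=sys.stderr)
    # Used as the exit status: non-zero makes Git abort the commit.
    return 1 if problems else 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;To install it, save the file as .git/hooks/pre-commit, make it executable, and end it with the entry point if __name__ == "__main__": sys.exit(main()) so Git sees the exit status.&lt;/p&gt;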

&lt;h3&gt;
  
  
  ✍️Conclusion
&lt;/h3&gt;

&lt;p&gt;These seven concepts sit just beyond the basics, but each one addresses a real-world problem that developers face regularly. Worktrees eliminate context-switching overhead. Squash keeps your history readable. Aliases save keystrokes. Bisect turns debugging from a guessing game into a logarithmic search. Cherry-pick gives you precision. Reflog acts as your safety net. And hooks automate quality checks at key points in the Git lifecycle.&lt;/p&gt;

&lt;p&gt;Pick one that solves a problem you’re facing right now and try it out. You’ll be surprised how quickly these become part of your daily workflow.&lt;/p&gt;

&lt;h3&gt;
  
  
  🎗️Reference
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=d_xZgcRJ--Q" rel="noopener noreferrer"&gt;10X Your Git Workflow: 7 Pro Tips [Git Productivity 2025]&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>gitproductivity</category>
      <category>git</category>
      <category>gittipandtricks</category>
      <category>gitpractices</category>
    </item>
    <item>
      <title>Canonical Log Lines Stripe Brilliant Technique for Production Observability</title>
      <dc:creator>Urvil Joshi</dc:creator>
      <pubDate>Tue, 17 Mar 2026 13:38:47 +0000</pubDate>
      <link>https://dev.to/urvvil/canonical-log-lines-stripe-brilliant-technique-for-production-observability-di8</link>
      <guid>https://dev.to/urvvil/canonical-log-lines-stripe-brilliant-technique-for-production-observability-di8</guid>
      <description>&lt;p&gt;Stripe published a technique they call canonical log lines one fat, structured log line emitted at the end of every request that contains all the important telemetry in one place. Sounds too simple right but it fundamentally changes how you query, debug, and understand production systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  🚨The Problem With Traditional Logging
&lt;/h3&gt;

&lt;p&gt;Let’s say your payment API receives a single request. Internally, it passes through several layers: authentication, rate limiting, database queries, and response generation. Traditional logging looks something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[2024-03-18 22:48:32.990] Request started http_method=POST http_path=/v1/charges request_id=req_123
[2024-03-18 22:48:32.991] User authenticated auth_type=api_key key_id=mk_123 user_id=usr_123
[2024-03-18 22:48:32.992] Rate limiting ran rate_allowed=true rate_quota=100 rate_remaining=99
[2024-03-18 22:48:32.998] Charge created charge_id=ch_123 team=acquiring
[2024-03-18 22:48:32.999] Request finished duration=0.009 http_status=200 database_queries=34
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This looks perfectly fine. And for simple questions, it works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Was anything rate limited recently?
"Rate limiting ran" rate_allowed=false

# Duration stats over the last hour
"Request finished" earliest=-1h | stats count p50(duration) p95(duration) p99(duration)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But now imagine you’re in the middle of an incident and someone asks: “Which users are being rate limited the most?”&lt;/p&gt;

&lt;p&gt;Suddenly you have a problem. The &lt;em&gt;user_id&lt;/em&gt; field is in the &lt;em&gt;authentication&lt;/em&gt; line. The &lt;em&gt;rate_allowed&lt;/em&gt; field is in the &lt;em&gt;rate limiting&lt;/em&gt; line. They're completely separate. To link them, you need to join across lines using &lt;em&gt;request_id&lt;/em&gt; as a common bridge field.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxjetcsr55nqcbbw47iy8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxjetcsr55nqcbbw47iy8.png" width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; each log line only knows what its own module knows. The rate limiter doesn’t know the user ID. The auth module doesn’t know whether rate limiting passed. The information you need is usually spread across multiple log lines, and joining them back together becomes expensive at the scale of millions or billions of requests.&lt;/p&gt;
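&lt;p&gt;To make the cost concrete, here is a minimal Python sketch of answering that question from traditional logs. It assumes every line carries request_id (the sample above only shows it on the first line) and that values contain no spaces. Note that it has to buffer and merge every request’s lines before it can count anything.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import re
from collections import Counter, defaultdict

# Matches key=value pairs such as user_id=usr_123 (values contain no spaces).
PAIR = re.compile(r"(\w+)=(\S+)")


def rate_limited_users_from_traditional(lines):
    """Merge per-module lines by request_id, then count rate-limited requests per user."""
    by_request = defaultdict(dict)
    for line in lines:
        fields = dict(PAIR.findall(line))
        request_id = fields.get("request_id")
        if request_id:
            by_request[request_id].update(fields)  # the join step
    counts = Counter()
    for fields in by_request.values():
        if fields.get("rate_allowed") == "false" and "user_id" in fields:
            counts[fields["user_id"]] += 1
    return counts.most_common()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The by_request dictionary grows with every request in the time window, which is exactly the join that becomes expensive at high volume.&lt;/p&gt;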

&lt;h3&gt;
  
  
  ✨The Solution: One Canonical Line With All Important Data
&lt;/h3&gt;

&lt;p&gt;The canonical log line is Stripe’s answer: at the end of every request, emit a &lt;em&gt;single&lt;/em&gt;, fat, structured log line that contains all the key data together. Keep in mind that it is emitted &lt;em&gt;in addition to&lt;/em&gt; the regular logs, not instead of them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[2024-03-18 22:48:32.999] canonical-log-line
  alloc_count=9123
  auth_type=api_key
  database_queries=34
  duration=0.009
  http_method=POST
  http_path=/v1/charges
  http_status=200
  key_id=mk_123
  permissions_used=account_write
  rate_allowed=true
  rate_quota=100
  rate_remaining=99
  request_id=req_123
  user_id=usr_123
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every key piece of information about this request (auth, rate limiting, HTTP details, database stats) lives in a single readable line. Now the query for “who is being rate limited most?” becomes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6hsra5vgymptwdllrq76.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6hsra5vgymptwdllrq76.png" width="800" height="245"&gt;&lt;/a&gt;&lt;/p&gt;
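&lt;p&gt;As a tool-agnostic illustration, the same question over canonical lines takes a single pass in Python with no join. This sketch assumes each canonical line is written as one physical line (the example above is wrapped for readability) containing space-separated key=value pairs.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import re
from collections import Counter

# Matches key=value pairs such as user_id=usr_123 (values contain no spaces).
PAIR = re.compile(r"(\w+)=(\S+)")


def rate_limited_users_from_canonical(lines):
    """Count rate-limited requests per user in one pass: no buffering, no join."""
    counts = Counter()
    for line in lines:
        if "canonical-log-line" not in line:
            continue  # skip the ordinary log lines emitted alongside
        fields = dict(PAIR.findall(line))
        if fields.get("rate_allowed") == "false":
            counts[fields.get("user_id", "unknown")] += 1
    return counts.most_common()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;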

&lt;p&gt;Canonical lines are an ergonomic feature for engineers. By collecting everything important in one place and making it reachable through easy-to-write queries, they make production incidents much easier to debug and analyse.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foqc6kzdb8yjosz4vjqvx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foqc6kzdb8yjosz4vjqvx.png" width="799" height="363"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  🔍More Real Query Examples
&lt;/h3&gt;

&lt;p&gt;The power of canonical log lines goes well beyond rate limiting. Because every key field sits on the same line, you can answer almost any operational question with a single-line query.&lt;/p&gt;

&lt;h4&gt;
  
  
  Example 1: Performance by Endpoint
&lt;/h4&gt;

&lt;p&gt;During an incident, you want to see latency percentiles for the /v1/charges endpoint for a specific user, while filtering out client errors:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvgxolezm6ouobhebewqr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvgxolezm6ouobhebewqr.png" width="800" height="171"&gt;&lt;/a&gt;&lt;/p&gt;
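&lt;p&gt;Outside a log search tool, the same filter-then-aggregate shape is easy to sketch in Python. This hypothetical helper keeps canonical lines for one endpoint and user, drops 4xx responses, and computes p50, p95, and p99 of duration with the standard library’s statistics.quantiles. The inclusive method, which keeps percentiles within the observed range, is a choice made for this sketch.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import re
import statistics

# Matches key=value pairs such as http_path=/v1/charges (values contain no spaces).
PAIR = re.compile(r"(\w+)=(\S+)")


def latency_percentiles(lines, path, user_id):
    """p50/p95/p99 of duration for one endpoint and user, excluding 4xx responses."""
    durations = []
    for line in lines:
        fields = dict(PAIR.findall(line))
        if fields.get("http_path") != path or fields.get("user_id") != user_id:
            continue
        if fields.get("http_status", "").startswith("4"):
            continue  # filter out client errors
        durations.append(float(fields["duration"]))
    # quantiles(n=100) returns the 99 cut points p1..p99 and raises
    # StatisticsError if there are too few samples.
    cuts = statistics.quantiles(durations, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;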

&lt;h4&gt;
  
  
  Example 2 — Detect a Bug vs. Legitimate Rate Limiting
&lt;/h4&gt;

&lt;p&gt;Is rate limiting hitting a few users, or is a bug affecting everyone?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8jlb0hdvz256bqxu7la5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8jlb0hdvz256bqxu7la5.png" width="800" height="196"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Example 3 — TLS Version Adoption (Real Stripe Use Case)
&lt;/h4&gt;

&lt;p&gt;Stripe needed to migrate users from TLS 1.0/1.1 to TLS 1.2. They could query this instantly with canonical lines:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4jejjt8c69x7b4smsm8x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4jejjt8c69x7b4smsm8x.png" width="800" height="171"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  🍥Architecture Used By Stripe
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftstxb3km1tq89nc6nhu4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftstxb3km1tq89nc6nhu4.png" width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The beauty of canonical log lines is that the implementation is simple. The idea is to emit them from middleware, which makes them completely automatic and gives you strict control over their contents.&lt;/p&gt;

&lt;p&gt;During a request’s lifecycle, each module decorates a shared environment object with relevant fields. The canonical logger sits at the very end of the middleware chain and drains all those fields into one log line.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fndbw9zt6h6ycb0dz2b15.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fndbw9zt6h6ycb0dz2b15.png" width="800" height="340"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Stripe wraps the canonical line emission in a Ruby ensure block which means it runs even if an exception was thrown mid-request. This guarantees you always have observability, especially during the incidents when you need it most.&lt;/p&gt;
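&lt;p&gt;Stripe’s implementation is Ruby; as a rough analogue, here is a minimal Python WSGI middleware sketch in which try/finally plays the role of ensure. Downstream code adds fields to a shared dict stored in the WSGI environ under a hypothetical key, canonical.fields, and the middleware emits everything as one sorted key=value line, even when the application raises.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import time

# Hypothetical environ key under which modules stash their canonical fields.
FIELDS_KEY = "canonical.fields"


class CanonicalLogLineMiddleware:
    """WSGI middleware that emits exactly one canonical log line per request."""

    def __init__(self, app, emit=print):
        self.app = app
        self.emit = emit  # swap in a real logger; print keeps the sketch self-contained

    def __call__(self, environ, start_response):
        # Shared "environment object": auth, rate limiting, etc. add keys here.
        fields = environ.setdefault(FIELDS_KEY, {})
        fields["http_method"] = environ.get("REQUEST_METHOD")
        fields["http_path"] = environ.get("PATH_INFO")
        started = time.perf_counter()

        def capture_status(status, headers, exc_info=None):
            fields["http_status"] = int(status.split()[0])  # "200 OK" becomes 200
            return start_response(status, headers, exc_info)

        try:
            return self.app(environ, capture_status)
        except Exception as exc:
            fields["http_status"] = 500
            fields["error"] = type(exc).__name__
            raise
        finally:
            # Like Ruby's ensure: runs whether the app returned or raised.
            fields["duration"] = round(time.perf_counter() - started, 6)
            pairs = " ".join(f"{key}={value}" for key, value in sorted(fields.items()))
            self.emit(f"canonical-log-line {pairs}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;One simplification to be aware of: duration is measured when the application returns its response iterable, so time spent streaming a large body afterwards is not included.&lt;/p&gt;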

&lt;p&gt;Stripe’s approach for canonical line storage:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Kafka&lt;/strong&gt;: canonical lines are serialized as Protocol Buffers and pushed asynchronously to a Kafka topic, which keeps the request path fast.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Splunk&lt;/strong&gt;: lines are ingested in near real time, which is perfect when you need answers in seconds about the last hour or so of data.&lt;/li&gt;
&lt;li&gt;A consumer reads from Kafka, batches the data, and writes it to &lt;strong&gt;S3&lt;/strong&gt;. Periodic jobs ingest it into &lt;strong&gt;Redshift&lt;/strong&gt; for SQL-based long-term analytics over months of history.&lt;/li&gt;
&lt;li&gt;Stripe’s &lt;strong&gt;Developer Dashboard&lt;/strong&gt; is powered by &lt;strong&gt;MapReduce&lt;/strong&gt; jobs over these S3 archives.&lt;/li&gt;
&lt;/ol&gt;
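&lt;p&gt;The key property of the Kafka step is that the request thread only enqueues the line and moves on. The sketch below shows that shape with Python’s queue and threading modules. The in-process queue and the sink callback are stand-ins for a real Kafka producer, and Protocol Buffers serialization is left out.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import queue
import threading


class AsyncCanonicalEmitter:
    """Hands canonical lines to a background thread so requests never wait on I/O.

    A stand-in for an asynchronous Kafka producer: `sink` receives lists of lines.
    """

    def __init__(self, sink, batch_size=100):
        self.sink = sink
        self.batch_size = batch_size
        self.lines = queue.Queue()
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def emit(self, line):
        self.lines.put(line)  # cheap on the request path; no network call here

    def close(self):
        self.lines.put(None)  # sentinel: flush whatever is left, then stop
        self.worker.join()

    def _drain(self):
        batch = []
        while True:
            line = self.lines.get()
            if line is None:
                break
            batch.append(line)
            if len(batch) == self.batch_size:
                self.sink(batch)
                batch = []
        if batch:
            self.sink(batch)  # final partial batch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;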

&lt;h4&gt;
  
  
  Long-Term Analytics in SQL (Redshift)
&lt;/h4&gt;

&lt;p&gt;Stripe archives canonical lines to Redshift, which lets them run queries over months of history in standard SQL:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqj0ny0g7g6z57p7q7muq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqj0ny0g7g6z57p7q7muq.png" width="800" height="353"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  🧰Canonical vs. Other Observability Tools
&lt;/h3&gt;

&lt;p&gt;Canonical log lines do not replace metrics or distributed tracing. They occupy a unique place in observability:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnpntg653j8hq8lca2wl5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnpntg653j8hq8lca2wl5.png" width="800" height="337"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Ideally, use all four together, with canonical log lines as the first stop in any debugging session, because they are fast to query and cheap to aggregate.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✍️Conclusion
&lt;/h3&gt;

&lt;p&gt;Canonical log lines are not a new technology. They are a simple convention that makes your existing logging infrastructure dramatically more powerful:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faza6ynqgb0puesqvwjzr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faza6ynqgb0puesqvwjzr.png" width="799" height="349"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The beauty of this pattern is that it scales from a small start-up to Stripe’s global payment infrastructure. The implementation is simple yet powerful.&lt;/p&gt;

&lt;h3&gt;
  
  
  🎗️Reference
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://stripe.com/blog/canonical-log-lines" rel="noopener noreferrer"&gt;Blog post by Brandur Leach&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=9XJtEzmzG3g" rel="noopener noreferrer"&gt;Stripe’s Smarter Approach to Structured Logging By Arpit Bhayani&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>backend</category>
      <category>devops</category>
      <category>monitoring</category>
      <category>sre</category>
    </item>
    <item>
      <title>Docker Model Runner: Run AI Models Locally Within Your Docker Ecosystem</title>
      <dc:creator>Urvil Joshi</dc:creator>
      <pubDate>Tue, 07 Oct 2025 13:10:57 +0000</pubDate>
      <link>https://dev.to/urvvil/docker-model-runner-run-ai-models-locally-within-your-docker-ecosystem-n9h</link>
      <guid>https://dev.to/urvvil/docker-model-runner-run-ai-models-locally-within-your-docker-ecosystem-n9h</guid>
      <description>&lt;h1&gt;
  
  
  Docker Model Runner (DMR)
&lt;/h1&gt;

&lt;p&gt;Docker Model Runner (DMR) officially reached General Availability on September 18th, transitioning from its beta phase that began in April. This powerful tool enables developers to pull, run, and manage AI models locally within the Docker ecosystem, bringing the convenience of containerization to machine learning workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Docker Model Runner?
&lt;/h2&gt;

&lt;p&gt;Docker Model Runner allows you to run Large Language Models (LLMs) directly on your local machine while leveraging Docker's robust ecosystem. It combines the best features of local AI inference with Docker's familiar tooling and workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Local LLM Execution
&lt;/h3&gt;

&lt;p&gt;Running LLM models locally provides several critical advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enhanced Data Security&lt;/strong&gt;: Your data remains entirely on your local machine, never leaving your control or being sent to external services. This is particularly important for sensitive or proprietary information.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Accelerated Development Workflows&lt;/strong&gt;: Developers can iterate faster by running AI models alongside their applications without network latency or API rate limits.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Seamless Integration&lt;/strong&gt;: If you're already using Docker Compose for your development environment, you can easily add AI models to your stack. When you spin up your containers, your LLM will launch simultaneously, creating a fully integrated local development environment.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foi9rs1d97pfmcu8kdg5j.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foi9rs1d97pfmcu8kdg5j.webp" alt=" " width="539" height="523"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. OpenAI-Compatible APIs
&lt;/h3&gt;

&lt;p&gt;Docker Model Runner provides OpenAI-compatible API endpoints, making integration straightforward. Many applications already use OpenAI's API format, which means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Little or no code change in existing applications: typically you only point the client at a different base URL and model name&lt;/li&gt;
&lt;li&gt;Client applications can switch seamlessly between cloud and local models&lt;/li&gt;
&lt;li&gt;Response formats remain consistent with OpenAI standards&lt;/li&gt;
&lt;li&gt;Your existing parsing logic continues to work without modification&lt;/li&gt;
&lt;/ul&gt;
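&lt;p&gt;The "switch between cloud and local models" point can be illustrated with a minimal Java sketch using the JDK's &lt;code&gt;java.net.http&lt;/code&gt; client (Java 15 or later, for the text block). It only builds the requests and prints their targets, so it runs without a model server. The local base URL &lt;code&gt;http://localhost:12434&lt;/code&gt; is an assumption: use the port, and any path prefix, that your Docker Desktop host TCP setting actually exposes. The class name and the cloud model name are illustrative only.&lt;/p&gt;

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class ChatRequestSketch {

    // Builds an OpenAI-style chat completion request against any base URL.
    // Only the base URL and model name differ between cloud and local use.
    // No JSON escaping is done: assumes the prompt has no quotes or backslashes.
    static HttpRequest buildRequest(String baseUrl, String model, String prompt) {
        String body = """
                {"model": "%s", "messages": [{"role": "user", "content": "%s"}]}
                """.formatted(model, prompt);
        return HttpRequest.newBuilder(URI.create(baseUrl + "/v1/chat/completions"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        // Assumed local endpoint: adjust host, port, and prefix to your Docker Desktop settings.
        HttpRequest local = buildRequest("http://localhost:12434",
                "ai/llama3.2:1b-instruct-q4_K_M", "What is Docker?");
        // A real cloud call would also need an Authorization header with an API key (omitted).
        HttpRequest cloud = buildRequest("https://api.openai.com",
                "gpt-4o-mini", "What is Docker?");
        System.out.println(local.method() + " " + local.uri());
        System.out.println(cloud.method() + " " + cloud.uri());
    }
}
```

&lt;p&gt;To actually send a request, pass it to &lt;code&gt;HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())&lt;/code&gt;. Because the response follows OpenAI's format, existing response-parsing code should not need to change.&lt;/p&gt;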

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgw9mo6c9btz02bnfdbid.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgw9mo6c9btz02bnfdbid.webp" alt=" " width="720" height="341"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Integrated Inference Engine
&lt;/h3&gt;

&lt;p&gt;The architecture is designed for optimal performance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Models run on your host machine rather than inside Docker containers, maximizing performance&lt;/li&gt;
&lt;li&gt;Uses the llama.cpp inference server for efficient model execution&lt;/li&gt;
&lt;li&gt;Automatic NVIDIA GPU support when available&lt;/li&gt;
&lt;li&gt;Combines Docker's ecosystem management capabilities with Ollama-like performance&lt;/li&gt;
&lt;li&gt;Provides Docker commands for pulling, caching, managing, and running models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg7bxg9700jq5ujzoic00.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg7bxg9700jq5ujzoic00.webp" alt=" " width="657" height="655"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  4. OCI Artifact Distribution
&lt;/h3&gt;

&lt;p&gt;Models are packaged and distributed as Open Container Initiative (OCI) artifacts, the same standardized format used for Docker images. This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Models can be pushed to any OCI-compatible registry&lt;/li&gt;
&lt;li&gt;Standardized packaging ensures consistency and portability&lt;/li&gt;
&lt;li&gt;Most models are distributed in GGUF (GPT-Generated Unified Format)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;GGUF uses quantization to reduce model size, enabling AI models to run on standard hardware, including CPU-only systems. This format is ideal for local deployments where computational resources may be limited.&lt;/p&gt;
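&lt;p&gt;As a rough worked example of why quantization matters, weight storage is approximately parameters × bits per weight ÷ 8 bytes. The sketch below assumes a round 1 billion parameters and a nominal 4 bits per weight. Real GGUF quantization types such as Q4_K_M also store per-block scale data, so actual files are somewhat larger, and running the model needs extra memory (for example, the KV cache) on top of this.&lt;/p&gt;

```java
public class QuantizedSizeEstimate {

    // Approximate weight storage in gigabytes (10^9 bytes):
    // parameters x bits per weight / 8 bits per byte.
    // Ignores quantization scale data, file metadata, and runtime memory such as the KV cache.
    static double weightGigabytes(long parameters, double bitsPerWeight) {
        return parameters * bitsPerWeight / 8.0 / 1e9;
    }

    public static void main(String[] args) {
        long parameters = 1_000_000_000L; // assumed round figure for a "1B" model
        System.out.printf("16-bit weights: %.2f GB%n", weightGigabytes(parameters, 16));
        System.out.printf(" 4-bit weights: %.2f GB%n", weightGigabytes(parameters, 4));
    }
}
```

&lt;p&gt;Under these assumptions, 16-bit weights need about 2 GB while 4-bit weights need about 0.5 GB, a fourfold reduction that is what makes CPU-only and low-memory machines viable.&lt;/p&gt;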

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9cczd28zk6kathd6f972.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9cczd28zk6kathd6f972.webp" alt=" " width="720" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Multiple Interaction Methods
&lt;/h3&gt;

&lt;p&gt;Docker Model Runner offers flexibility in how you interact with models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Command-line interface for terminal-based interactions&lt;/li&gt;
&lt;li&gt;Docker Desktop GUI for visual model management&lt;/li&gt;
&lt;li&gt;OpenAI-compatible REST APIs for programmatic access&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. Parallel Multi-Model Support
&lt;/h3&gt;

&lt;p&gt;Need to run multiple models simultaneously? Docker Model Runner handles this effortlessly. For example, if you're building an AI agent that performs text summarization and image generation, you can run both models in parallel without complex configuration. Models can be accessed through the GUI, CLI, and API endpoints concurrently.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe5c4wrtxqe4juig8jxgw.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe5c4wrtxqe4juig8jxgw.webp" alt=" " width="720" height="352"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Docker Desktop version 4.41.0 or higher&lt;/li&gt;
&lt;li&gt;(Optional) NVIDIA GPU for accelerated inference&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Configuration
&lt;/h3&gt;

&lt;p&gt;Open Docker Desktop settings and enable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPU-backed inference&lt;/strong&gt;: Allows automatic NVIDIA GPU detection and utilization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Host TCP support&lt;/strong&gt;: Enables OpenAI-compatible API access via HTTP&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CORS settings&lt;/strong&gt;: Set to "all" if you encounter API access issues&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa8m31qo6dakr2l6gglsp.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa8m31qo6dakr2l6gglsp.webp" alt=" " width="720" height="417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Finding and Pulling Models
&lt;/h3&gt;

&lt;h4&gt;
  
  
  From Docker Hub:
&lt;/h4&gt;

&lt;p&gt;Navigate to the AI section in Docker Hub and search for models using the &lt;code&gt;ai/&lt;/code&gt; prefix. Popular models like Llama 3.2, Mistral, and Phi-3 are readily available. Each model listing shows different quantization versions, allowing you to balance performance and resource requirements.&lt;/p&gt;

&lt;p&gt;To pull a model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker model pull ai/llama3.2:1b-instruct-q4_K_M
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  From Hugging Face:
&lt;/h4&gt;

&lt;p&gt;Browse to your desired model on Hugging Face, select "Use this model," and choose "Docker Model Runner" as the deployment method. The interface will display the appropriate pull command with your selected quantization level.&lt;/p&gt;

&lt;h3&gt;
  
  
  Basic Commands
&lt;/h3&gt;

&lt;p&gt;Check Docker Model Runner status:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker model status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;List downloaded models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker model list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This displays model metadata including name, parameters, quantization level, and architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Interacting with Models
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Command-Line Interface
&lt;/h3&gt;

&lt;p&gt;Single query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker model run ai/llama3.2:1b-instruct-q4_K_M &lt;span class="s2"&gt;"What is Docker?"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Interactive session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker model run ai/llama3.2:1b-instruct-q4_K_M
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This opens a chat interface where you can have multi-turn conversations. Docker Model Runner maintains context across multiple exchanges. Exit by typing &lt;code&gt;/bye&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Docker Desktop GUI
&lt;/h3&gt;

&lt;p&gt;In the Models tab, navigate to the Local section and click "Run" next to your desired model. This launches an interactive interface where you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chat with the model through a text input field&lt;/li&gt;
&lt;li&gt;View the Inspect tab for model metadata and architecture details&lt;/li&gt;
&lt;li&gt;Check the Requests tab to see your conversation history&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The GUI maintains multi-turn conversation context, allowing natural, contextual interactions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fflmz8iret90duu0dl2w6.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fflmz8iret90duu0dl2w6.webp" alt=" " width="720" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  OpenAI-Compatible API
&lt;/h3&gt;

&lt;p&gt;With host TCP support enabled in Docker Desktop settings, you can access models through the REST API. Replace &lt;code&gt;PORT&lt;/code&gt; below with the port configured in that setting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:PORT/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "ai/llama3.2:1b-instruct-q4_K_M",
    "messages": [{"role": "user", "content": "What is Docker?"}]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response format matches OpenAI's API specification, ensuring compatibility with existing tooling and parsers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Docker Model Runner vs. Ollama
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2k18o6qwk962bkvzyvg0.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2k18o6qwk962bkvzyvg0.webp" alt=" " width="720" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Both tools enable local AI model execution, but they have distinct characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Performance&lt;/strong&gt;: Docker Model Runner runs models on the host machine rather than in containers, typically achieving approximately 12% better performance than containerized approaches. Ollama also runs on the host, either as a standalone binary or managed service, providing similar performance benefits.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Integration&lt;/strong&gt;: Docker Model Runner provides seamless integration with Docker Desktop and Docker Compose, making it ideal if you're already using Docker for development. Models can be defined in your compose files and started automatically with your application stack. Ollama operates as a standalone application with its own CLI and basic API.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;API Endpoints&lt;/strong&gt;: Both offer OpenAI-compatible endpoints, but they use different default ports. You can configure these as needed for your environment.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Tips and Resources
&lt;/h2&gt;

&lt;p&gt;The official Docker Model Runner documentation provides comprehensive guidance for various platforms including WSL 2, Linux, and macOS. The "Known Issues" section addresses common problems and their solutions.&lt;/p&gt;

&lt;p&gt;For those interested in the technical details, the Docker team has published an in-depth blog post covering the design philosophy, goals, GPU acceleration strategies, and high-level architecture. This resource is invaluable for understanding the engineering decisions behind Docker Model Runner.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Docker Model Runner represents a natural evolution for developers already invested in the Docker ecosystem. By bringing local AI model execution to Docker Desktop, it eliminates the need for separate tools while providing familiar commands and workflows.&lt;/p&gt;

&lt;p&gt;The combination of data privacy, development speed, and seamless integration makes Docker Model Runner particularly attractive for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Development teams building AI-powered applications&lt;/li&gt;
&lt;li&gt;Organizations with data sensitivity requirements&lt;/li&gt;
&lt;li&gt;Developers seeking faster iteration cycles&lt;/li&gt;
&lt;li&gt;Teams already standardized on Docker tooling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're currently using Docker for development but haven't explored local AI model execution, Docker Model Runner offers a compelling entry point. Its integration with existing Docker workflows means minimal learning curve while unlocking powerful AI capabilities directly in your development environment.&lt;/p&gt;

&lt;p&gt;Whether you're building chatbots, implementing RAG systems, or experimenting with AI agents, Docker Model Runner provides the infrastructure to do so efficiently and securely on your local machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://youtu.be/CV5uBoA78qI" rel="noopener noreferrer"&gt;Docker Model Runner Tutorial 2025: Run AI Models Locally in Minutes | Complete Guide&lt;/a&gt;&lt;/p&gt;

</description>
      <category>docker</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>devops</category>
    </item>
    <item>
      <title>10X Your Git Workflow: 7 Pro Tips to Boost Productivity 🚀</title>
      <dc:creator>Urvil Joshi</dc:creator>
      <pubDate>Wed, 17 Sep 2025 08:52:10 +0000</pubDate>
      <link>https://dev.to/urvvil/10x-your-git-workflow-7-pro-tips-to-boost-productivity-28nj</link>
      <guid>https://dev.to/urvvil/10x-your-git-workflow-7-pro-tips-to-boost-productivity-28nj</guid>
      <description>&lt;p&gt;Hey DEV community! Tired of Git stashes or messy commits? My new YouTube video, 10X Your Git Workflow: 7 Pro Tips (Worktree, Hooks &amp;amp; More), shares advanced hacks to save time and streamline version control.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Highlights&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Swap git stash for git worktree to juggle branches smoothly.&lt;/li&gt;
&lt;li&gt;Clean commits with interactive rebase for polished PRs.&lt;/li&gt;
&lt;li&gt;Automate checks with Git hooks to catch errors early.&lt;/li&gt;
&lt;li&gt;Recover lost commits with git reflog—your safety net!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Perfect for devs using GitHub or GitLab. Watch now: &lt;a href="https://youtu.be/d_xZgcRJ--Q" rel="noopener noreferrer"&gt;https://youtu.be/d_xZgcRJ--Q&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What’s your top Git trick or worst Git headache? Share below! 😄&lt;/p&gt;

&lt;p&gt;#git #versioncontrol #developerproductivity #programming #coding&lt;/p&gt;

</description>
      <category>git</category>
      <category>versioncontrol</category>
      <category>developerproductivity</category>
      <category>gittips</category>
    </item>
    <item>
      <title>Hey DEV community! Tired of Git chaos? My new YouTube video, 10X Your Git Workflow: 7 Pro Tips, shares advanced hacks to save time https://youtu.be/d_xZgcRJ--Q #git #versioncontrol #developerproductivity #programming #coding</title>
      <dc:creator>Urvil Joshi</dc:creator>
      <pubDate>Wed, 17 Sep 2025 08:44:00 +0000</pubDate>
      <link>https://dev.to/urvvil/hey-dev-community-tired-of-git-chaos-my-new-youtube-video-10x-your-git-workflow-7-pro-tips-2n5h</link>
      <guid>https://dev.to/urvvil/hey-dev-community-tired-of-git-chaos-my-new-youtube-video-10x-your-git-workflow-7-pro-tips-2n5h</guid>
      <description>&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://youtu.be/d_xZgcRJ--Q" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;youtu.be&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


</description>
    </item>
    <item>
      <title>Understanding False Sharing and How to Mitigate It in Java</title>
      <dc:creator>Urvil Joshi</dc:creator>
      <pubDate>Sat, 08 Feb 2025 17:06:53 +0000</pubDate>
      <link>https://dev.to/urvvil/understanding-false-sharing-and-how-to-mitigate-it-in-java-2oh1</link>
      <guid>https://dev.to/urvvil/understanding-false-sharing-and-how-to-mitigate-it-in-java-2oh1</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz61qfp407cdrh5xoss22.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz61qfp407cdrh5xoss22.jpeg" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In multi-threaded programming, optimizing performance is often a never-ending task. One bottleneck that developers frequently overlook is &lt;strong&gt;false sharing&lt;/strong&gt;. This article explains what false sharing is, how it affects performance, and several ways to mitigate it, with practical examples in Java.&lt;/p&gt;

&lt;h3&gt;
  
  
  🍥What is False Sharing?
&lt;/h3&gt;

&lt;p&gt;False sharing occurs when multiple threads modify different variables that reside on the same cache line. A &lt;strong&gt;cache line&lt;/strong&gt; is the smallest unit of data that can be transferred between main memory (RAM) and the CPU cache. Modern CPUs cache data in these chunks (typically 64 bytes), and the cache-coherence protocol tracks each line as a whole, not individual variables. When one core writes to any variable in a cache line, the copies of that line held by other cores are invalidated, and those cores must fetch the line again before their next access, even though they are working on different variables within it.&lt;/p&gt;

&lt;p&gt;The result? Unnecessary cache invalidations and reloads, leading to significant performance degradation, especially in high-concurrency scenarios.&lt;/p&gt;

&lt;h3&gt;
  
  
  🚨The Problem: False Sharing in Action
&lt;/h3&gt;

&lt;p&gt;Let’s start by looking at a simple example that demonstrates false sharing. Consider the following Java code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;public class FalseSharingProblem {

    public static void main(String[] args) {
        FalseSharingCounter falseSharingCounter1 = new FalseSharingCounter();
        // Both references point to the SAME object, so count1 and count2 very likely share a cache line.
        FalseSharingCounter falseSharingCounter2 = falseSharingCounter1;

        Runnable r1 = () -&amp;gt; {
            int iterations = 1_000_000_000;
            long start = System.currentTimeMillis();
            for (int i = 0; i &amp;lt; iterations; i++) {
                falseSharingCounter1.count1++;
            }
            System.out.println(Thread.currentThread().getName() + " time taken " + (System.currentTimeMillis() - start) + " ms");
        };

        Runnable r2 = () -&amp;gt; {
            int iterations = 1_000_000_000;
            long start = System.currentTimeMillis();
            for (int i = 0; i &amp;lt; iterations; i++) {
                falseSharingCounter2.count2++;
            }
            System.out.println(Thread.currentThread().getName() + " time taken " + (System.currentTimeMillis() - start) + " ms");
        };

        Thread.ofPlatform().name("Thread1").start(r1);
        Thread.ofPlatform().name("Thread2").start(r2);
    }
}

public class FalseSharingCounter {

    public volatile int count1 = 0;
    public volatile int count2 = 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example, two threads (Thread1 and Thread2) are incrementing two different counters (count1 and count2) that reside in the same FalseSharingCounter object. Since count1 and count2 are likely to be on the same cache line, updating one counter will invalidate the cache line for the other thread, causing false sharing.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔍The Impact
&lt;/h3&gt;

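&lt;p&gt;When you run this code, each thread typically takes noticeably longer to finish its increments than in the separate-object version below. The extra time comes from the cache line bouncing between the two cores on every write. Exact timings depend on your CPU, JVM version, and JIT behavior, so compare runs on the same machine.&lt;/p&gt;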

&lt;h3&gt;
  
  
  ✨The Artificial Solution: Separate Objects
&lt;/h3&gt;

&lt;p&gt;One way to mitigate false sharing is to ensure that the counters are not on the same cache line. This can be achieved by using separate objects for each counter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;public class FalseSharingArtificialSolution {

    public static void main(String[] args) {
        // Two separate objects: each thread updates a field in its own instance.
        FalseSharingCounter falseSharingCounter1 = new FalseSharingCounter();
        FalseSharingCounter falseSharingCounter2 = new FalseSharingCounter();

        Runnable r1 = () -&amp;gt; {
            int iterations = 1_000_000_000;
            long start = System.currentTimeMillis();
            for (int i = 0; i &amp;lt; iterations; i++) {
                falseSharingCounter1.count1++;
            }
            System.out.println(Thread.currentThread().getName() + " time taken " + (System.currentTimeMillis() - start) + " ms");
        };

        Runnable r2 = () -&amp;gt; {
            int iterations = 1_000_000_000;
            long start = System.currentTimeMillis();
            for (int i = 0; i &amp;lt; iterations; i++) {
                falseSharingCounter2.count2++;
            }
            System.out.println(Thread.currentThread().getName() + " time taken " + (System.currentTimeMillis() - start) + " ms");
        };

        Thread.ofPlatform().name("Thread1").start(r1);
        Thread.ofPlatform().name("Thread2").start(r2);
    }
}

public class FalseSharingCounter {

    public volatile int count1 = 0;
    public volatile int count2 = 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



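&lt;p&gt;In this version, falseSharingCounter1 and falseSharingCounter2 are two separate objects, so count1 and count2 usually end up on different cache lines. When they do, false sharing disappears and the loops typically run noticeably faster. The JVM does not guarantee this, though: two small objects allocated one after another can still land in the same 64-byte line, which is one reason to prefer explicit padding.&lt;/p&gt;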

&lt;h3&gt;
  
  
  🧰The Elegant Solution: Using @Contended
&lt;/h3&gt;

&lt;p&gt;While the artificial solution can help, it is not always practical to create separate objects for every counter, and it gives no layout guarantee. Another option is to pad the fields so that each frequently written field occupies its own cache line. You can do this manually, but the JDK offers a more convenient mechanism: the @jdk.internal.vm.annotation.Contended annotation, which tells the JVM to add padding around the annotated field or class to prevent false sharing.&lt;/p&gt;
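&lt;p&gt;Before turning to @Contended, here is a sketch of what manual padding can look like. Filler fields declared between two counters in a single class are not reliable, because the JVM is free to reorder fields (HotSpot, for instance, groups them by size). A commonly used workaround, assumed here, relies on HotSpot placing superclass fields before subclass fields, so an intermediate padding class sits between the two counters. The class names are hypothetical, 64 bytes is an assumed cache-line size, and the Java Language Specification guarantees no particular layout, so a tool such as JOL (Java Object Layout) is needed to confirm the offsets on your JVM.&lt;/p&gt;

```java
// Hypothetical manual-padding layout. HotSpot usually lays out superclass
// fields before subclass fields, so Count1Padding ends up between count1 and count2.
class Count1Holder {
    volatile int count1 = 0;
}

class Count1Padding extends Count1Holder {
    // 8 longs = 64 bytes, one typical cache line; these fields are never used.
    long p1, p2, p3, p4, p5, p6, p7, p8;
}

public class ManuallyPaddedCounter extends Count1Padding {
    volatile int count2 = 0;

    public static void main(String[] args) throws InterruptedException {
        ManuallyPaddedCounter counter = new ManuallyPaddedCounter();
        // Each thread writes only its own field, so plain ++ on volatile is safe here.
        Thread t1 = Thread.ofPlatform().name("Thread1").start(() -> {
            for (int i = 100_000_000; i > 0; i--) counter.count1++;
        });
        Thread t2 = Thread.ofPlatform().name("Thread2").start(() -> {
            for (int i = 100_000_000; i > 0; i--) counter.count2++;
        });
        t1.join();
        t2.join();
        System.out.println(counter.count1 + " " + counter.count2);
    }
}
```

&lt;p&gt;It prints 100000000 100000000. The output itself is not the point: time the loops against an unpadded version on the same machine to see whether the padding helps.&lt;/p&gt;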

&lt;h3&gt;
  
  
  Example 1: Padding a Single Field
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;public class FalseSharingContendedCounter1 {
    // Asks the JVM to pad count1 so it does not share a cache line with the other fields of this class
    @jdk.internal.vm.annotation.Contended
    public volatile int count1 = 0;
    public volatile int count2 = 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example, count1 is padded to ensure it doesn't share a cache line with count2.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example 2: Padding the Entire Class
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@jdk.internal.vm.annotation.Contended
public class FalseSharingContendedCounter2 {
    public volatile int count1 = 0;
    public volatile int count2 = 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



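&lt;p&gt;Here the annotation is applied to the class, which treats all of its fields as a single contended group. The JVM pads that group as a whole, isolating the object's fields from those of neighboring objects, but count1 and count2 can still share a cache line with each other. Class-level @Contended therefore does not fix false sharing between fields of the same object.&lt;/p&gt;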

&lt;h3&gt;
  
  
  Example 3: Grouping Fields
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;public class FalseSharingContendedCounter3 {
    @jdk.internal.vm.annotation.Contended("group1")
    public volatile int count1 = 0;
    @jdk.internal.vm.annotation.Contended("group1")
    public volatile int count2 = 0;
    @jdk.internal.vm.annotation.Contended("group2")
    public volatile int count3 = 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example, count1 and count2 belong to the same contention group, so the JVM may place them on the same cache line, while count3, in a different group, is isolated from both.&lt;/p&gt;

&lt;h3&gt;
  
  
  Running the Contended Solution
&lt;/h3&gt;

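&lt;p&gt;Because &lt;code&gt;jdk.internal.vm.annotation&lt;/code&gt; is an internal package that &lt;code&gt;java.base&lt;/code&gt; does not export, compiling these classes requires passing &lt;code&gt;--add-exports java.base/jdk.internal.vm.annotation=ALL-UNNAMED&lt;/code&gt; to &lt;code&gt;javac&lt;/code&gt;. At run time, the JVM ignores @Contended on application classes unless you start it with the &lt;code&gt;-XX:-RestrictContended&lt;/code&gt; option.&lt;/p&gt;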

&lt;p&gt;Here’s the complete code for the contended solution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Compile with: --add-exports java.base/jdk.internal.vm.annotation=ALL-UNNAMED
// Run with the JVM option -XX:-RestrictContended, otherwise @Contended is ignored.
public class FalseSharingContendedSolution {

    public static void main(String[] args) {
        FalseSharingContendedCounter1 falseSharingCounter1 = new FalseSharingContendedCounter1();
        // Same object for both threads, but count1 is now padded away from count2.
        FalseSharingContendedCounter1 falseSharingCounter2 = falseSharingCounter1;

        Runnable r1 = () -&amp;gt; {
            int iterations = 1_000_000_000;
            long start = System.currentTimeMillis();
            for (int i = 0; i &amp;lt; iterations; i++) {
                falseSharingCounter1.count1++;
            }
            System.out.println(Thread.currentThread().getName() + " time taken " + (System.currentTimeMillis() - start) + " ms");
        };

        Runnable r2 = () -&amp;gt; {
            int iterations = 1_000_000_000;
            long start = System.currentTimeMillis();
            for (int i = 0; i &amp;lt; iterations; i++) {
                falseSharingCounter2.count2++;
            }
            System.out.println(Thread.currentThread().getName() + " time taken " + (System.currentTimeMillis() - start) + " ms");
        };

        Thread.ofPlatform().name("Thread1").start(r1);
        Thread.ofPlatform().name("Thread2").start(r2);
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  ✍️Conclusion
&lt;/h3&gt;

&lt;p&gt;False sharing is a subtle but significant performance issue in multi-threaded applications. By understanding how cache lines work and using techniques like object separation or the @Contended annotation, you can mitigate false sharing and improve the performance of your Java applications.&lt;/p&gt;

&lt;p&gt;Remember, in the world of high-performance computing, every nanosecond counts. So, the next time you’re dealing with multi-threaded counters or shared variables, don’t forget to check for false sharing!&lt;/p&gt;

&lt;h3&gt;
  
  
  🎗️Reference
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=tLS85IfsbYE&amp;amp;list=PLL8woMHwr36EDxjUoCzboZjedsnhLP1j4&amp;amp;index=22" rel="noopener noreferrer"&gt;False Sharing in Java — Jakob Jenkov&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Happy coding! 🚀&lt;/p&gt;

</description>
      <category>falsesharing</category>
      <category>multithreadinginjava</category>
      <category>multithreading</category>
    </item>
    <item>
      <title>Understanding and Preventing Deadlocks in Java: A Comprehensive Guide</title>
      <dc:creator>Urvil Joshi</dc:creator>
      <pubDate>Sat, 01 Feb 2025 22:09:26 +0000</pubDate>
      <link>https://dev.to/urvvil/understanding-and-preventing-deadlocks-in-java-a-comprehensive-guide-1e5c</link>
      <guid>https://dev.to/urvvil/understanding-and-preventing-deadlocks-in-java-a-comprehensive-guide-1e5c</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqkhmfuzt0nqvfixxwti3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqkhmfuzt0nqvfixxwti3.png" width="512" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Deadlocks represent a significant challenge in multi-threaded applications. For an introduction, refer to &lt;a href="https://medium.com/@urvvil08/deadlocks-in-java-understanding-the-concurrency-nightmare-25961085c75b" rel="noopener noreferrer"&gt;Deadlocks in Java: Understanding the Concurrency Nightmare&lt;/a&gt;. They happen when two or more threads become indefinitely blocked, each waiting for the other to release resources. This article digs into effective strategies for preventing deadlocks in Java applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  🍥What is a Deadlock?
&lt;/h3&gt;

&lt;p&gt;A deadlock happens when several threads are waiting for resources that are held by one another, forming a circular dependency. For instance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Thread 1 holds Lock A and waits for Lock B&lt;/li&gt;
&lt;li&gt;Thread 2 holds Lock B and waits for Lock A&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As a result, neither thread can move forward, leading to a complete halt.&lt;/p&gt;
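&lt;p&gt;The two-thread cycle described above can be reproduced in a short, self-terminating Java sketch. A latch guarantees that each thread holds its first lock before either requests its second, so the deadlock is deterministic. To keep the demo from hanging forever, the threads are daemon threads, and the main thread uses &lt;code&gt;ThreadMXBean.findDeadlockedThreads()&lt;/code&gt;, which also detects cycles among &lt;code&gt;ReentrantLock&lt;/code&gt;s, to confirm the deadlock before exiting. The class and method names are illustrative.&lt;/p&gt;

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockDemo {

    // Starts two threads that take lockA and lockB in opposite orders, waits until
    // the JVM reports them as deadlocked, and returns how many threads it found.
    static int createAndDetectDeadlock() throws InterruptedException {
        Lock lockA = new ReentrantLock();
        Lock lockB = new ReentrantLock();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        startLocker("thread 1", lockA, lockB, bothHoldFirstLock); // holds A, waits for B
        startLocker("thread 2", lockB, lockA, bothHoldFirstLock); // holds B, waits for A

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] deadlocked = threads.findDeadlockedThreads(); // null while no cycle exists
        while (deadlocked == null) {
            Thread.sleep(50);
            deadlocked = threads.findDeadlockedThreads();
        }
        return deadlocked.length;
    }

    static void startLocker(String name, Lock first, Lock second, CountDownLatch latch) {
        // Daemon threads do not keep the JVM alive, so the program can exit
        // even though these threads stay blocked forever.
        Thread.ofPlatform().name(name).daemon(true).start(() -> {
            first.lock(); // intentionally never released: this demo is meant to deadlock
            latch.countDown();
            try {
                latch.await(); // wait until the other thread holds its first lock
            } catch (InterruptedException e) {
                first.unlock();
                return;
            }
            second.lock(); // blocks forever: the other thread holds this lock
        });
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Deadlocked threads: " + createAndDetectDeadlock());
    }
}
```

&lt;p&gt;Run on its own, it prints &lt;code&gt;Deadlocked threads: 2&lt;/code&gt;.&lt;/p&gt;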

&lt;h3&gt;
  
  
  ✨Prevention Strategies
&lt;/h3&gt;

&lt;h3&gt;
  
  
  1. Lock Reordering
&lt;/h3&gt;

&lt;p&gt;Lock reordering is a simple but effective strategy where we ensure that all threads acquire locks in the same order. This prevents the circular wait condition necessary for deadlocks to occur.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockPreventionLockRunnableA implements Runnable {

    private final Lock lock1;
    private final Lock lock2;

    public DeadlockPreventionLockRunnableA(Lock lock1, Lock lock2) {
        this.lock1 = lock1;
        this.lock2 = lock2;
    }

    @Override
    public void run() {
        String name = Thread.currentThread().getName();
        System.out.println(name + " is attempting to lock lock1");
        lock1.lock();
        System.out.println(name + " has locked lock1");
        try {
            try {
                Thread.sleep(3000); // simulate work while holding lock1
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the interrupt flag
            }
            System.out.println(name + " is attempting to lock lock2");
            lock2.lock();
            try {
                System.out.println(name + " has locked lock2");
            } finally {
                lock2.unlock();
                System.out.println(name + " has unlocked lock2");
            }
        } finally {
            lock1.unlock();
            System.out.println(name + " has unlocked lock1");
        }
    }
}

public class DeadlockPreventionLockRunnableB implements Runnable {

    private final Lock lock1;
    private final Lock lock2;

    public DeadlockPreventionLockRunnableB(Lock lock1, Lock lock2) {
        this.lock1 = lock1;
        this.lock2 = lock2;
    }

    @Override
    public void run() {
        String name = Thread.currentThread().getName();
        System.out.println(name + " is attempting to lock lock1");
        lock1.lock();
        System.out.println(name + " has locked lock1");
        try {
            try {
                Thread.sleep(3000); // simulate work while holding lock1
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the interrupt flag
            }
            System.out.println(name + " is attempting to lock lock2");
            lock2.lock();
            try {
                System.out.println(name + " has locked lock2");
            } finally {
                lock2.unlock();
                System.out.println(name + " has unlocked lock2");
            }
        } finally {
            lock1.unlock();
            System.out.println(name + " has unlocked lock1");
        }
    }
}

public class DeadlockPreventionLockReordering {

    public static void main(String[] args) {

        Lock lock1 = new ReentrantLock();
        Lock lock2 = new ReentrantLock();

        // What prevents deadlock is the order in which locks are acquired, not the order
        // in which they are released: both runnables take lock1 before lock2.
        Runnable r1 = new DeadlockPreventionLockRunnableA(lock1, lock2);
        Runnable r2 = new DeadlockPreventionLockRunnableB(lock1, lock2);

        Thread.ofPlatform().name("thread 1").start(r1);
        Thread.ofPlatform().name("thread 2").start(r2);
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



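&lt;p&gt;Both classes acquire lock1 before lock2. Because neither thread can hold lock2 while waiting for lock1, a circular wait can never form: whichever thread gets lock1 first finishes its work, and the other one simply waits its turn.&lt;/p&gt;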
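&lt;p&gt;The example above hard-codes the order. When the locks are only known at runtime, a common way to get the same guarantee is to derive the order from a unique key. The sketch below assumes a hypothetical &lt;code&gt;Account&lt;/code&gt; type with a unique &lt;code&gt;id&lt;/code&gt; and its own &lt;code&gt;ReentrantLock&lt;/code&gt;. Its &lt;code&gt;transfer&lt;/code&gt; method always locks the account with the smaller id first, so two transfers running in opposite directions cannot deadlock.&lt;/p&gt;

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.stream.IntStream;

public class OrderedTransfer {

    // Hypothetical account: the unique, immutable id defines the global lock order.
    static final class Account {
        final long id;
        final ReentrantLock lock = new ReentrantLock();
        long balance; // only modified while holding this account's lock

        Account(long id, long balance) {
            this.id = id;
            this.balance = balance;
        }
    }

    // Locks the account with the smaller id first, whatever the transfer direction.
    public static void transfer(Account from, Account to, long amount) {
        Account first = from.id > to.id ? to : from;
        Account second = first == from ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.balance -= amount;
                to.balance += amount;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }

    // Runs n transfers a -> b and n transfers b -> a concurrently; returns both balances.
    public static long[] runOppositeTransfers(int n) throws InterruptedException {
        Account a = new Account(1, 1_000);
        Account b = new Account(2, 1_000);
        Thread t1 = new Thread(() -> IntStream.range(0, n).forEach(i -> transfer(a, b, 1)));
        Thread t2 = new Thread(() -> IntStream.range(0, n).forEach(i -> transfer(b, a, 1)));
        t1.start();
        t2.start();
        t1.join(); // with an inconsistent lock order, these joins could hang forever
        t2.join();
        return new long[] {a.balance, b.balance};
    }

    public static void main(String[] args) throws InterruptedException {
        long[] balances = runOppositeTransfers(10_000);
        System.out.println("a=" + balances[0] + ", b=" + balances[1]); // a=1000, b=1000
    }
}
```

&lt;p&gt;This sketch assumes ids are unique. If two different accounts could share an id, the order would be ambiguous and an extra tie-breaking rule would be needed.&lt;/p&gt;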

&lt;h3&gt;
  
  
  2. Timeout and Back-off Strategy
&lt;/h3&gt;

&lt;p&gt;Sometimes lock ordering isn’t feasible due to application requirements. In such cases, we can implement a timeout and back-off strategy where threads:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Attempt to acquire locks with a timeout&lt;/li&gt;
&lt;li&gt;Release all held locks if they can’t acquire all needed locks&lt;/li&gt;
&lt;li&gt;Wait for a random period before retrying&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Implementation&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Put all three classes in DeadlockPreventionTimeoutBackOff.java
// (only the class with main() is public).
public class DeadlockPreventionTimeoutBackOff {

    public static void main(String[] args) {

        Lock lock1 = new ReentrantLock();
        Lock lock2 = new ReentrantLock();

        // The two runnables acquire the locks in OPPOSITE orders; the timeouts and
        // back-off are what keep them from deadlocking. Both loop forever (demo).
        Runnable r1 = new DeadlockPreventionTimeoutBackOffRunnableA(lock1, lock2);
        Runnable r2 = new DeadlockPreventionTimeoutBackOffRunnableB(lock1, lock2);

        Thread.ofPlatform().name("thread 1").start(r1);
        Thread.ofPlatform().name("thread 2").start(r2);
    }
}

// Tries lock1 first, then lock2.
class DeadlockPreventionTimeoutBackOffRunnableA implements Runnable {

    private final Lock lock1;
    private final Lock lock2;

    public DeadlockPreventionTimeoutBackOffRunnableA(Lock lock1, Lock lock2) {
        this.lock1 = lock1;
        this.lock2 = lock2;
    }

    @Override
    public void run() {
        while (true) {
            int failures = 0;
            while (!lockBothLocks()) {
                failures++;
                System.err.println(Thread.currentThread().getName() + " is not able to lock both locks, failed attempt no. " + failures);
                // Random back-off of 0-99 ms, so the two threads don't retry in lockstep.
                sleep(ThreadLocalRandom.current().nextLong(100));
            }
            if (failures &amp;gt; 0) {
                System.out.println(Thread.currentThread().getName() + " is able to lock both locks after " + failures + " attempts");
            }
            // Both locks are held here: do the work, then release them.
            lock2.unlock();
            lock1.unlock();
        }
    }

    // Returns true only if BOTH locks are now held; on failure nothing is held.
    private boolean lockBothLocks() {
        try {
            boolean lock1Locked = lock1.tryLock(1000, TimeUnit.MILLISECONDS);
            if (!lock1Locked) {
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            boolean lock2Locked = lock2.tryLock(1000, TimeUnit.MILLISECONDS);
            if (!lock2Locked) {
                lock1.unlock(); // back off: release lock1 so the other thread can proceed
                return false;
            }
        } catch (InterruptedException e) {
            lock1.unlock();
            Thread.currentThread().interrupt();
            return false;
        }
        return true;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }
}

// Tries lock2 first, then lock1 (the opposite order to A).
class DeadlockPreventionTimeoutBackOffRunnableB implements Runnable {

    private final Lock lock1;
    private final Lock lock2;

    public DeadlockPreventionTimeoutBackOffRunnableB(Lock lock1, Lock lock2) {
        this.lock1 = lock1;
        this.lock2 = lock2;
    }

    @Override
    public void run() {
        while (true) {
            int failures = 0;
            while (!lockBothLocks()) {
                failures++;
                System.err.println(Thread.currentThread().getName() + " is not able to lock both locks, failed attempt no. " + failures);
                // Random back-off of 0-99 ms, so the two threads don't retry in lockstep.
                sleep(ThreadLocalRandom.current().nextLong(100));
            }
            if (failures &amp;gt; 0) {
                System.out.println(Thread.currentThread().getName() + " is able to lock both locks after " + failures + " attempts");
            }
            // Both locks are held here: do the work, then release them.
            lock1.unlock();
            lock2.unlock();
        }
    }

    // Returns true only if BOTH locks are now held; on failure nothing is held.
    private boolean lockBothLocks() {
        try {
            boolean lock2Locked = lock2.tryLock(1000, TimeUnit.MILLISECONDS);
            if (!lock2Locked) {
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            boolean lock1Locked = lock1.tryLock(1000, TimeUnit.MILLISECONDS);
            if (!lock1Locked) {
                lock2.unlock(); // back off: release lock2 so the other thread can proceed
                return false;
            }
        } catch (InterruptedException e) {
            lock2.unlock();
            Thread.currentThread().interrupt();
            return false;
        }
        return true;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key features of this implementation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uses tryLock() with a timeout instead of the blocking lock(), so a thread never waits indefinitely for its second lock.&lt;/li&gt;
&lt;li&gt;Backs off for a random delay before retrying, which makes it unlikely that both threads retry in lockstep and gives one of them the chance to acquire both locks.&lt;/li&gt;
&lt;li&gt;Releases any lock it already holds if it cannot acquire all the locks it needs.&lt;/li&gt;
&lt;/ul&gt;
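&lt;p&gt;The two-lock version above generalizes to any number of locks. The following is a minimal sketch using the hypothetical helper names &lt;code&gt;tryLockAll&lt;/code&gt; and &lt;code&gt;lockAllWithBackOff&lt;/code&gt;. Either every lock is acquired within the timeout, or the locks already taken are released in reverse order and the caller retries after a random pause. The 10 ms timeout, the back-off range and the round count are arbitrary demo values, not recommendations.&lt;/p&gt;

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class TryLockAll {

    private static long counter = 0; // only touched while holding every lock

    // All-or-nothing: returns true with every lock held, or false with none held.
    public static boolean tryLockAll(Lock[] locks, long timeoutMillis) throws InterruptedException {
        int acquired = 0;
        try {
            for (Lock lock : locks) {
                if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    break; // timed out: stop and roll back below
                }
                acquired++;
            }
        } finally {
            if (acquired != locks.length) {
                unlockFirst(locks, acquired); // also runs if tryLock was interrupted
            }
        }
        return acquired == locks.length;
    }

    // Releases the first `count` locks in reverse acquisition order.
    public static void unlockFirst(Lock[] locks, int count) {
        for (int i = count - 1; i >= 0; i--) {
            locks[i].unlock();
        }
    }

    // Retries tryLockAll after a random pause until every lock is held.
    public static void lockAllWithBackOff(Lock[] locks) throws InterruptedException {
        while (!tryLockAll(locks, 10)) {
            Thread.sleep(ThreadLocalRandom.current().nextLong(1, 10)); // random 1-9 ms
        }
    }

    private static Thread worker(Lock[] order, int rounds) {
        return new Thread(() -> {
            try {
                int remaining = rounds;
                while (remaining-- > 0) {
                    lockAllWithBackOff(order);
                    try {
                        counter++; // critical section
                    } finally {
                        unlockFirst(order, order.length);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    // Two workers take the same two locks in OPPOSITE orders.
    public static long runDemo(int rounds) throws InterruptedException {
        Lock lock1 = new ReentrantLock();
        Lock lock2 = new ReentrantLock();
        counter = 0;
        Thread a = worker(new Lock[] {lock1, lock2}, rounds);
        Thread b = worker(new Lock[] {lock2, lock1}, rounds);
        a.start();
        b.start();
        a.join();
        b.join();
        return counter;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("counter = " + runDemo(100)); // counter = 200
    }
}
```

&lt;p&gt;Releasing in reverse order is not required for correctness here, but it mirrors the nesting of the acquisitions and keeps the helper symmetric with &lt;code&gt;unlockFirst&lt;/code&gt;.&lt;/p&gt;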

&lt;h3&gt;
  
  
  3. Deadlock Detection
&lt;/h3&gt;

&lt;p&gt;For complex systems, implementing deadlock detection can be valuable. The basic approach involves:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Maintaining a graph of lock acquisitions&lt;/li&gt;
&lt;li&gt;Checking for cycles in this graph before granting new locks&lt;/li&gt;
&lt;li&gt;Taking corrective action if a potential deadlock is detected&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz6zv77w39y8zku9ojgwe.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz6zv77w39y8zku9ojgwe.gif" width="686" height="726"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Graph 1&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Here, lock nodes and thread nodes represent the locks and threads in the system, and the edges record which thread holds which lock (lock 1 is locked by thread 1, and so on) and which lock each thread is waiting for. Before thread 3 locks lock 1, it has to check whether the new edge would make the graph cyclic. If it would, the threads involved would be deadlocked, as in Graph 2.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foxcp3g7f5kxoqswlfgib.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foxcp3g7f5kxoqswlfgib.gif" width="630" height="430"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Graph 2&lt;/em&gt;&lt;/p&gt;
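&lt;p&gt;The cycle check described above can be sketched as follows. This is a simplified model, not a real lock manager. Threads and locks are hypothetical 0-based integer ids, and locks are exclusive, so each lock has at most one holder and each blocked thread waits for exactly one lock. Under those assumptions the check is a walk along the chain lock → holder → lock that holder waits for → … The walk stops when it either returns to the requesting thread (a cycle, so waiting would deadlock) or reaches a thread that is not waiting.&lt;/p&gt;

```java
import java.util.Arrays;

// Simplified wait-for bookkeeping with hypothetical 0-based int ids for threads and locks.
public class WaitForGraph {

    private final int[] holderOf;  // lock id   -> id of the thread holding it, or -1
    private final int[] waitingOn; // thread id -> id of the lock it waits for, or -1

    public WaitForGraph(int threads, int locks) {
        holderOf = new int[locks];
        waitingOn = new int[threads];
        Arrays.fill(holderOf, -1);
        Arrays.fill(waitingOn, -1);
    }

    public void acquired(int thread, int lock) {
        holderOf[lock] = thread;
        waitingOn[thread] = -1;
    }

    public void waiting(int thread, int lock) {
        waitingOn[thread] = lock;
    }

    public void released(int lock) {
        holderOf[lock] = -1;
    }

    // True if letting `thread` wait for `lock` would close a cycle in the graph.
    public boolean wouldDeadlock(int thread, int lock) {
        int owner = holderOf[lock];
        if (owner == thread) {
            return false; // re-entrant acquisition (as with ReentrantLock), not a wait
        }
        int steps = 0;
        while (owner != -1) {
            if (owner == thread) {
                return true; // the chain leads back to the requester: a cycle
            }
            if (++steps > waitingOn.length) {
                return false; // ran into an existing cycle that does not include `thread`
            }
            int next = waitingOn[owner];
            if (next == -1) {
                return false; // the owner is not blocked, so no cycle passes through it
            }
            owner = holderOf[next];
        }
        return false; // the lock, or some lock along the chain, is free
    }

    public static void main(String[] args) {
        WaitForGraph g = new WaitForGraph(3, 3);
        g.acquired(0, 0);
        g.acquired(1, 1);
        g.acquired(2, 2);
        g.waiting(0, 1); // thread 0 waits for lock 1 (held by thread 1)
        g.waiting(1, 2); // thread 1 waits for lock 2 (held by thread 2)
        System.out.println("thread 2 -> lock 0 deadlocks? " + g.wouldDeadlock(2, 0)); // true

        WaitForGraph h = new WaitForGraph(3, 3);
        h.acquired(0, 0);
        h.acquired(1, 1);
        h.waiting(0, 1); // thread 0 waits for lock 1; thread 1 is not blocked
        System.out.println("thread 2 -> lock 0 deadlocks? " + h.wouldDeadlock(2, 0)); // false
    }
}
```

&lt;p&gt;A real implementation would also have to update this bookkeeping atomically with the lock operations themselves, and decide what to do when a cycle is found, for example by making the requesting thread release its locks and back off.&lt;/p&gt;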

&lt;h3&gt;
  
  
  🧰Best Practices for Deadlock Prevention
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Always acquire locks in a consistent order across your application&lt;/li&gt;
&lt;li&gt;Use timeout mechanisms instead of indefinite waiting&lt;/li&gt;
&lt;li&gt;Implement back-off strategies for retry attempts&lt;/li&gt;
&lt;li&gt;Keep lock holding times as short as possible&lt;/li&gt;
&lt;li&gt;Consider using higher-level concurrency utilities like java.util.concurrent collections&lt;/li&gt;
&lt;li&gt;Implement monitoring and detection mechanisms for production systems&lt;/li&gt;
&lt;/ol&gt;
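&lt;p&gt;For the last point, the JDK already ships a detector. &lt;code&gt;ThreadMXBean.findDeadlockedThreads()&lt;/code&gt; from &lt;code&gt;java.lang.management&lt;/code&gt; returns the IDs of threads deadlocked on object monitors or ownable synchronizers such as &lt;code&gt;ReentrantLock&lt;/code&gt;, or &lt;code&gt;null&lt;/code&gt; if there are none. The watchdog below is a minimal sketch with hypothetical class and helper names. Its demo deliberately deadlocks two daemon threads so there is something to find. In production you would run the check periodically and log or alert instead of printing.&lt;/p&gt;

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockWatchdog {

    private static final ThreadMXBean MX = ManagementFactory.getThreadMXBean();

    // IDs of deadlocked threads, or an empty array if none were found.
    public static long[] findDeadlocks() {
        long[] ids = MX.findDeadlockedThreads(); // returns null when there is no deadlock
        return ids == null ? new long[0] : ids;
    }

    // Prints each deadlocked thread, the lock it waits for, and that lock's owner.
    public static void report(long[] ids) {
        for (ThreadInfo info : MX.getThreadInfo(ids)) {
            if (info == null) {
                continue; // the thread terminated in the meantime
            }
            System.out.println(info.getThreadName() + " waits for " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
    }

    // Polls findDeadlocks() until it finds something or the timeout expires.
    public static long[] waitForDeadlock(long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        long[] ids = findDeadlocks();
        while (ids.length == 0) {
            if (System.currentTimeMillis() > deadline) {
                break;
            }
            Thread.sleep(50);
            ids = findDeadlocks();
        }
        return ids;
    }

    // Demo only: starts two daemon threads that deadlock on purpose.
    public static void startDeadlockedPair() {
        Lock a = new ReentrantLock();
        Lock b = new ReentrantLock();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        Thread t1 = new Thread(() -> lockBoth(a, b, bothHoldFirstLock), "worker-1");
        Thread t2 = new Thread(() -> lockBoth(b, a, bothHoldFirstLock), "worker-2");
        t1.setDaemon(true); // daemon threads, so the JVM can still exit
        t2.setDaemon(true);
        t1.start();
        t2.start();
    }

    private static void lockBoth(Lock first, Lock second, CountDownLatch latch) {
        first.lock();
        try {
            latch.countDown();
            latch.await(); // wait until the other thread holds its first lock
            second.lock(); // blocks forever: each thread waits for the other's lock
            second.unlock();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        startDeadlockedPair();
        long[] ids = waitForDeadlock(5_000);
        System.out.println("Deadlocked threads found: " + ids.length);
        report(ids);
    }
}
```

&lt;p&gt;Note that this only detects a deadlock after it has happened; it cannot undo it. The prevention techniques above remain the first line of defense.&lt;/p&gt;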

&lt;h3&gt;
  
  
  ✍️Conclusion
&lt;/h3&gt;

&lt;p&gt;Deadlock prevention has to be approached systematically, through good design. No single technique fits every situation, but lock ordering, timeouts, and back-off mechanisms go a long way toward preventing deadlocks in your applications.&lt;br&gt;&lt;br&gt;
Remember that the right approach depends on your specific use case: sometimes lock ordering is all you need, and at other times you need more elaborate timeout and retry mechanisms.&lt;/p&gt;

&lt;h3&gt;
  
  
  🎗️Reference
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=6E3aYf3jXdk&amp;amp;list=PLL8woMHwr36EDxjUoCzboZjedsnhLP1j4&amp;amp;index=18" rel="noopener noreferrer"&gt;Deadlock Prevention in Java — Jakob Jenkov&lt;/a&gt;&lt;/p&gt;

</description>
      <category>deadlock</category>
      <category>java</category>
      <category>deadlockprevention</category>
    </item>
  </channel>
</rss>
