<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jenny</title>
    <description>The latest articles on DEV Community by Jenny (@jenny_dev).</description>
    <link>https://dev.to/jenny_dev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3784533%2F837c365f-95ab-496d-aa12-57141bb922d6.jpeg</url>
      <title>DEV Community: Jenny</title>
      <link>https://dev.to/jenny_dev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jenny_dev"/>
    <language>en</language>
    <item>
      <title>"My GPT-5 coding agent was a 10x developer until I actually looked at the Git diff"</title>
      <dc:creator>Jenny</dc:creator>
      <pubDate>Sun, 22 Feb 2026 16:04:46 +0000</pubDate>
      <link>https://dev.to/jenny_dev/my-gpt-5-coding-agent-was-a-10x-developer-until-i-actually-looked-at-the-git-diff-4d0</link>
      <guid>https://dev.to/jenny_dev/my-gpt-5-coding-agent-was-a-10x-developer-until-i-actually-looked-at-the-git-diff-4d0</guid>
      <description>&lt;p&gt;I love modern AI coding tools. I really do. &lt;/p&gt;

&lt;p&gt;They are incredibly fast, hyper-confident, and fully willing to refactor your entire codebase because you casually typed the words "clean up the auth flow" at 2 AM. &lt;/p&gt;

&lt;p&gt;Which is how I learned a very annoying, very expensive truth:&lt;br&gt;
&lt;strong&gt;AI doesn't ship your project. Your workflow does.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For a while, my workflow was basically what everyone is calling "vibe coding." It looked like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ask for a feature.&lt;/li&gt;
&lt;li&gt;Watch the AI write 400 lines of code.&lt;/li&gt;
&lt;li&gt;Ship it.&lt;/li&gt;
&lt;li&gt;Fix the weird edge case it broke.&lt;/li&gt;
&lt;li&gt;Repeat until my repo looks like a thrift store of mismatched design patterns.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It worked... until I tried to change something two weeks later and realized half the code was written for a version of my app (and a version of me) that no longer exists. &lt;/p&gt;

&lt;p&gt;So I ran an experiment on a real project (small SaaS backend work: an auth tweak, a webhook handler, and a permissions edge case). Not huge, but enough surface area for the AI to cause some damage. &lt;/p&gt;

&lt;p&gt;And here’s the punchline: The model didn’t matter as much as I expected. The spec did. &lt;/p&gt;

&lt;h3&gt;The Real Enemy Is "Drift"&lt;/h3&gt;

&lt;p&gt;Most people vibe-code like this: You give the agent a vague prompt, chat back and forth, and let it "figure it out." &lt;/p&gt;

&lt;p&gt;Even with the absolute bleeding-edge February 2026 models (GPT-5, Claude Sonnet 4.5, or Gemini 3.1 Pro), this approach breaks down eventually. It might work for a while, but usually the AI does extra stuff you didn’t ask for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It changes unrelated files.&lt;/li&gt;
&lt;li&gt;It introduces new npm dependencies.&lt;/li&gt;
&lt;li&gt;It invents a completely new architecture mid-task.&lt;/li&gt;
&lt;li&gt;It fixes the symptom but silently breaks an edge case.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is code that technically compiles but makes future-you suffer. That’s &lt;strong&gt;drift&lt;/strong&gt;. And drift is what turns "AI speed" into "AI debt."&lt;/p&gt;

&lt;h3&gt;What Changed Everything: The Micro-Spec&lt;/h3&gt;

&lt;p&gt;I don’t write long documents. I’m not trying to cosplay as a PM for my own side projects. But I realized I had to stop handing the AI "vibes" and start handing it rules.&lt;/p&gt;

&lt;p&gt;Now, I write a one-screen execution spec &lt;em&gt;before&lt;/em&gt; I touch any code. This is the template I use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Goal:&lt;/strong&gt; One sentence. What should happen when the feature is done?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non-goals:&lt;/strong&gt; What are we explicitly NOT doing?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scope:&lt;/strong&gt; Which specific modules/files are allowed to change?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Constraints:&lt;/strong&gt; No new dependencies? Follow existing patterns? &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Acceptance checks:&lt;/strong&gt; Tests or behavior checks that prove it’s actually done.&lt;/li&gt;
&lt;/ul&gt;
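&lt;p&gt;If you want the template to live next to the code, it maps onto a small TypeScript type. This is only a sketch: &lt;code&gt;MicroSpec&lt;/code&gt; and &lt;code&gt;renderSpec&lt;/code&gt; are hypothetical names, not part of any tool mentioned in this post.&lt;/p&gt;

```typescript
// Hypothetical shape for the one-screen micro-spec; field names are illustrative.
interface MicroSpec {
  goal: string; // one sentence: what should happen when the feature is done
  nonGoals: string[]; // what we are explicitly NOT doing
  scope: string[]; // the only files/modules allowed to change
  constraints: string[]; // e.g. "No new dependencies"
  acceptanceChecks: string[]; // tests or behavior checks that prove it's done
}

// Renders the spec as plain text to paste at the top of an agent prompt.
function renderSpec(spec: MicroSpec): string {
  // One labeled section per field, each item on its own "- " line.
  const section = (label: string, items: string[]): string =>
    [`${label}:`, ...items.map((item) => `- ${item}`)].join("\n");

  return [
    `Goal: ${spec.goal}`,
    section("Non-goals", spec.nonGoals),
    section("Scope (ONLY these files may change)", spec.scope),
    section("Constraints", spec.constraints),
    section("Acceptance checks", spec.acceptanceChecks),
  ].join("\n\n");
}
```

&lt;p&gt;The idea is that pasting the rendered text into every agent run keeps the Scope and Non-goals in front of the model, instead of hoping it remembers a message from earlier in the chat.&lt;/p&gt;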

&lt;p&gt;&lt;strong&gt;Example (Realistic Webhook Task):&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Handle subscription upgrade webhook from Stripe.&lt;br&gt;
&lt;strong&gt;Non-goals:&lt;/strong&gt; No database schema refactors. No UI changes.&lt;br&gt;
&lt;strong&gt;Scope:&lt;/strong&gt; &lt;code&gt;billing.service.ts&lt;/code&gt; + &lt;code&gt;webhook.route.ts&lt;/code&gt; ONLY.&lt;br&gt;
&lt;strong&gt;Constraints:&lt;/strong&gt; Handler must be idempotent. Strict signature verification.&lt;br&gt;
&lt;strong&gt;Acceptance:&lt;/strong&gt; Replay test passes. Invalid-signature request is rejected. Double-event test passes.&lt;/p&gt;
&lt;/blockquote&gt;
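&lt;p&gt;Here is a minimal sketch of what the Constraints line can translate into, assuming a Node/TypeScript backend and only the built-in &lt;code&gt;node:crypto&lt;/code&gt; module. It hand-rolls the scheme Stripe documents for the &lt;code&gt;Stripe-Signature&lt;/code&gt; header (an HMAC-SHA256 of &lt;code&gt;timestamp.rawBody&lt;/code&gt;, sent as &lt;code&gt;t=...,v1=...&lt;/code&gt;); in a real project the official &lt;code&gt;stripe&lt;/code&gt; package’s &lt;code&gt;stripe.webhooks.constructEvent&lt;/code&gt; does this for you. &lt;code&gt;verifySignature&lt;/code&gt;, &lt;code&gt;handleWebhook&lt;/code&gt;, &lt;code&gt;ProcessedEventStore&lt;/code&gt;, and the &lt;code&gt;applyUpgrade&lt;/code&gt; callback are hypothetical names, and treating &lt;code&gt;customer.subscription.updated&lt;/code&gt; as "the upgrade event" is an assumption.&lt;/p&gt;

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";
import { Buffer } from "node:buffer";

const TOLERANCE_SECONDS = 300; // reject signed requests older than 5 minutes (replay protection)

// Verifies a Stripe-style header ("t=TIMESTAMP,v1=HEX_SIGNATURE") over the raw request body.
function verifySignature(rawBody: string, header: string, secret: string, nowSeconds?: number): boolean {
  const now = nowSeconds ?? Math.floor(Date.now() / 1000);
  let timestamp = "";
  const candidates: string[] = [];
  for (const part of header.split(",")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue; // skip malformed fragments
    const key = part.slice(0, eq).trim();
    const value = part.slice(eq + 1).trim();
    if (key === "t") timestamp = value;
    if (key === "v1") candidates.push(value); // several may be present during secret rotation
  }
  const ts = Number(timestamp);
  if (!timestamp || !Number.isFinite(ts)) return false;
  if (Math.abs(now - ts) > TOLERANCE_SECONDS) return false;

  const expected = Buffer.from(createHmac("sha256", secret).update(`${timestamp}.${rawBody}`).digest("hex"));
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return candidates.some((sig) => {
    const given = Buffer.from(sig);
    return given.length === expected.length ? timingSafeEqual(given, expected) : false;
  });
}

interface WebhookEvent {
  id: string; // Stripe event IDs start with "evt_"
  type: string;
  data: unknown;
}

// Anything with has/add works: an in-memory Set for tests, a DB table in production.
interface ProcessedEventStore {
  has(eventId: string): boolean;
  add(eventId: string): unknown;
}

type HandleResult = "rejected" | "ignored" | "duplicate" | "processed";

function handleWebhook(
  rawBody: string,
  signatureHeader: string,
  secret: string,
  store: ProcessedEventStore,
  applyUpgrade: (event: WebhookEvent) => void,
  nowSeconds?: number,
): HandleResult {
  if (!verifySignature(rawBody, signatureHeader, secret, nowSeconds)) return "rejected";
  const event = JSON.parse(rawBody) as WebhookEvent;
  if (event.type !== "customer.subscription.updated") return "ignored";
  if (store.has(event.id)) return "duplicate"; // redelivered event: do nothing
  applyUpgrade(event);
  store.add(event.id); // record only after success so a failed attempt can be retried
  return "processed";
}
```

&lt;p&gt;Feeding the same signed event in twice should return &lt;code&gt;"processed"&lt;/code&gt; and then &lt;code&gt;"duplicate"&lt;/code&gt;, with &lt;code&gt;applyUpgrade&lt;/code&gt; running once, which is exactly what the double-event check asks for. This in-memory version is not safe against two deliveries racing each other; with a real database you would typically enforce idempotency with a unique constraint on the event ID.&lt;/p&gt;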

&lt;p&gt;Once this exists, the AI stops improvising product decisions and becomes an executor, which is exactly what you want.&lt;/p&gt;

&lt;h3&gt;The 2026 Tool Stack (What Actually Helps)&lt;/h3&gt;

&lt;p&gt;I’m not married to any tool, but after testing the latest updates, here’s how each one fits into a spec-driven workflow:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Chat Models (Claude 4.5 / GPT-5 / Gemini 3.1 Pro)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Best for:&lt;/strong&gt; Drafting the micro-spec, listing edge cases, and suggesting acceptance tests. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Worst for:&lt;/strong&gt; Directly prompting "build the whole feature" without constraints. That’s how you get scope creep.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Planning &amp;amp; Spec Layers (Traycer.ai)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If the project is big enough, your one-screen spec turns into multiple sections (auth, DB, UI). At that point, hand-maintained Markdown gets messy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best for:&lt;/strong&gt; Breaking a larger goal into file-level steps &lt;em&gt;before&lt;/em&gt; any code is written. I’ve started using a planning extension inside VS Code, specifically &lt;strong&gt;Traycer&lt;/strong&gt;. You give it your high-level goal, and it generates a strict Phase Plan. It’s not magic; it’s just structured enough that you stop handing agents vibes. It also verifies the final code against the plan, which helps catch hallucinated or out-of-scope changes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. IDE Agents (Cursor / Copilot Workspace)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Best for:&lt;/strong&gt; Implementing a strict Traycer spec within a tightly scoped blast radius. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Worst for:&lt;/strong&gt; Vague requirements. If you ask Cursor to "fix billing," it will still drift. Just faster.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;The Workflow That Stopped My Repo from Haunting Me&lt;/h3&gt;

&lt;p&gt;Here’s what I do now for anything that actually matters:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write the tiny spec (or use Traycer to generate the Phase Plan).&lt;/li&gt;
&lt;li&gt;Ask a chat model to list edge cases I missed.&lt;/li&gt;
&lt;li&gt;Execute in an IDE agent with a strictly locked scope.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Review the diff against the original spec.&lt;/strong&gt; (Did it sneak in a dependency?)&lt;/li&gt;
&lt;li&gt;Run tests and commit.&lt;/li&gt;
&lt;/ol&gt;
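&lt;p&gt;Part of step 4 can be automated. Here is a rough sketch, assuming you pass in the output of &lt;code&gt;git diff --name-only main&lt;/code&gt; and the &lt;code&gt;dependencies&lt;/code&gt; objects from &lt;code&gt;package.json&lt;/code&gt; before and after the agent ran. &lt;code&gt;reviewDiff&lt;/code&gt; is a hypothetical helper, not an existing tool, and it only flags out-of-scope files and new packages; it doesn’t replace actually reading the diff.&lt;/p&gt;

```typescript
// Hypothetical check for step 4: does the agent's diff respect the spec?
interface ReviewInput {
  changedFiles: string[]; // e.g. output of `git diff --name-only main`, one path per entry
  allowedScope: string[]; // the spec's Scope list; entries ending in "/" allow a whole directory
  oldDependencies: { [name: string]: string }; // package.json "dependencies" before the change
  newDependencies: { [name: string]: string }; // package.json "dependencies" after the change
}

function reviewDiff(input: ReviewInput): string[] {
  const problems: string[] = [];

  // Every changed file must match a Scope entry exactly, or sit under an allowed directory.
  for (const file of input.changedFiles) {
    const inScope = input.allowedScope.some((allowed) =>
      allowed.endsWith("/") ? file.startsWith(allowed) : file === allowed,
    );
    if (!inScope) problems.push(`out of scope: ${file}`);
  }

  // Any package that appears only in the new dependencies is a "sneaked-in" dependency.
  for (const name of Object.keys(input.newDependencies)) {
    // hasOwnProperty avoids false matches on inherited keys like "constructor".
    if (!Object.prototype.hasOwnProperty.call(input.oldDependencies, name)) {
      problems.push(`new dependency: ${name}`);
    }
  }

  return problems;
}

// Example: the agent touched an auth file and added a package the spec never mentioned.
const problems = reviewDiff({
  changedFiles: ["src/billing.service.ts", "src/webhook.route.ts", "src/auth.ts"],
  allowedScope: ["src/billing.service.ts", "src/webhook.route.ts"],
  oldDependencies: { stripe: "^14.0.0" },
  newDependencies: { stripe: "^14.0.0", "left-pad": "^1.3.0" },
});
// problems: ["out of scope: src/auth.ts", "new dependency: left-pad"]
```

&lt;p&gt;Note that &lt;code&gt;git diff --name-only&lt;/code&gt; prints repo-relative paths, so the Scope entries need to use the same full paths (or a trailing-slash directory) for the comparison to work.&lt;/p&gt;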

&lt;p&gt;If something goes wrong, I don't yell at the AI. I update the spec first, then re-run it. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My hot take:&lt;/strong&gt;&lt;br&gt;
If you can’t write acceptance checks, the task isn’t ready for a GPT-5 or Claude 4.5 agent. It’s ready for you to sit down and think.&lt;/p&gt;

&lt;h3&gt;Questions for you guys (because I’m curious):&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;What’s your biggest AI failure mode lately? Scope creep? Random dependencies? The infinite fix loop?&lt;/li&gt;
&lt;li&gt;Are you writing specs first, or just prompting until it compiles and praying? Let me know below.&lt;/li&gt;
&lt;/ol&gt;

</description>
    </item>
  </channel>
</rss>
