<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dennis Schmock</title>
    <description>The latest articles on DEV Community by Dennis Schmock (@dennisschmock).</description>
    <link>https://dev.to/dennisschmock</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F981842%2Fcf768f66-4d41-45e1-b940-f065c2f0ed27.jpeg</url>
      <title>DEV Community: Dennis Schmock</title>
      <link>https://dev.to/dennisschmock</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dennisschmock"/>
    <language>en</language>
    <item>
      <title>Vibe Coding vs AI-Driven Development: The Contracts Problem (and GS-TDD)</title>
      <dc:creator>Dennis Schmock</dc:creator>
      <pubDate>Fri, 09 Jan 2026 22:08:42 +0000</pubDate>
      <link>https://dev.to/dennisschmock/vibe-coding-vs-ai-driven-development-the-contracts-problem-and-gs-tdd-3h39</link>
      <guid>https://dev.to/dennisschmock/vibe-coding-vs-ai-driven-development-the-contracts-problem-and-gs-tdd-3h39</guid>
      <description>&lt;p&gt;Most AI-assisted coding right now is just… vibes.&lt;/p&gt;

&lt;p&gt;You ask for a feature, you get a plausible diff, it compiles, your dopamine spikes, and two days later production starts doing interpretive dance.&lt;/p&gt;

&lt;p&gt;That failure mode has a name: &lt;strong&gt;vibe coding&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;What I’m aiming for instead is &lt;strong&gt;AI-driven development&lt;/strong&gt;. Whether you call it AI-Assisted Engineering or (as we say in my native Denmark) &lt;em&gt;AI-drevet udvikling&lt;/em&gt;, the goal is the same: a workflow where AI helps you design and implement software every day, while humans keep full accountability for correctness, security, and operations.&lt;/p&gt;

&lt;p&gt;The difference isn’t “which model” you use. It’s &lt;strong&gt;contracts&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;Vibe coding is a trust problem&lt;/h2&gt;

&lt;p&gt;Vibe coding is what happens when we treat generated code like it’s “probably fine” because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it &lt;em&gt;looks&lt;/em&gt; reasonable&lt;/li&gt;
&lt;li&gt;it uses familiar patterns&lt;/li&gt;
&lt;li&gt;it passes a quick smoke test&lt;/li&gt;
&lt;li&gt;it’s late and we want to go home&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI is great at producing plausible code. But plausible code is not the same as correct code.&lt;/p&gt;

&lt;p&gt;If your workflow is “generate → ship”, you’re not doing AI-driven development. You’re doing probabilistic deployment.&lt;/p&gt;

&lt;h2&gt;AI-driven development is a contract problem&lt;/h2&gt;

&lt;p&gt;The reliable version is simple, but it requires discipline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Define the contract&lt;/strong&gt; (Humans)&lt;/li&gt;
&lt;li&gt;Let AI work &lt;strong&gt;inside&lt;/strong&gt; that contract (Machine)&lt;/li&gt;
&lt;li&gt;Verify the result like an adult (Humans + CI)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The “contract” can be acceptance tests, unit tests, invariants, schemas, types, or all of the above. But tests are the most universal contract because they encode behavior &lt;em&gt;and&lt;/em&gt; they fail loudly.&lt;/p&gt;
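
&lt;p&gt;As a rough illustration of a contract that is not a test, an invariant can be written as a plain check that throws when it is violated. The sketch below is TypeScript; the &lt;code&gt;KeyStore&lt;/code&gt; shape, the &lt;code&gt;assertKeyLimit&lt;/code&gt; name, and the limit of 3 are made up for the example.&lt;/p&gt;

```typescript
// Hypothetical data shape: user id mapped to that user's API keys.
type KeyStore = { [userId: string]: string[] };

// Hypothetical limit, chosen only for this example.
const MAX_KEYS_PER_USER = 3;

// Invariant: no user holds more than MAX_KEYS_PER_USER keys.
// Throws (fails loudly) and names the offending users.
function assertKeyLimit(store: KeyStore) {
  const offenders = Object.keys(store).filter(
    (userId) => store[userId].length > MAX_KEYS_PER_USER
  );
  if (offenders.length > 0) {
    throw new Error(`Key limit invariant violated for: ${offenders.join(", ")}`);
  }
}

// A store within the limit passes silently.
assertKeyLimit({ alice: ["k1", "k2"], bob: [] });
```

&lt;p&gt;A check like this could run in a test, in CI against fixtures, or at a service boundary. The point is only that the rule is written down somewhere a machine can enforce it.&lt;/p&gt;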

&lt;h2&gt;The boring loop: Red → Gold → Refactor (GS-TDD)&lt;/h2&gt;

&lt;p&gt;To operationalize this, I use a variation of TDD I call &lt;strong&gt;Gold Standard TDD (GS-TDD)&lt;/strong&gt;. It evolves the classic Red–Green–Refactor loop into &lt;strong&gt;Red–Gold–Refactor&lt;/strong&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Red&lt;/strong&gt;: Write (or generate) a failing test suite that defines behavior. Ideally BDD-style so intent and edge cases are explicit.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Gold&lt;/strong&gt;: Instruct the AI to produce a &lt;strong&gt;Gold Standard implementation&lt;/strong&gt; on the first pass, one that is &lt;em&gt;production-oriented by design&lt;/em&gt;. That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Security-aware defaults&lt;/li&gt;
&lt;li&gt;Maintainable structure&lt;/li&gt;
&lt;li&gt;Solid error handling and boundaries&lt;/li&gt;
&lt;li&gt;Boring, standard architecture&lt;/li&gt;
&lt;li&gt;No MVP shortcuts and no throwaway prototyping&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Refactor&lt;/strong&gt;: Improve the internals safely while keeping behavior unchanged.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The key shift:&lt;/strong&gt; GS-TDD replaces “minimal Green” with “Gold Standard”, because AI makes a production-oriented first pass cheap enough that the throwaway minimal implementation can be skipped. Your tests (the contract) keep the AI honest, and the “Gold” prompt keeps the architecture clean.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;For a deeper dive into the methodology, check out my research note on &lt;a href="https://www.boringreliability.dev/research/gs-tdd" rel="noopener noreferrer"&gt;GS-TDD&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;A tiny example (The Workflow)&lt;/h2&gt;

&lt;p&gt;Say you’re adding a rule: “Users can’t create more than 3 API keys”.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Red&lt;/strong&gt;: Write the contract first.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;limits API keys per user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;seedUser&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// This defines the contract: behavior AND error shape&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;createKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nx"&gt;rejects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toThrow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;API key limit reached&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the test suite is the boss.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gold&lt;/strong&gt;: Prompt the AI for a production-oriented first pass &lt;em&gt;under constraints&lt;/em&gt;. You don't just say "make it pass". You say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Implement this feature. Enforce the limit transactionally to avoid race conditions, use our standard domain error types, and add a structured log line when the limit is hit."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The “Gold” code might still fail a test or two on the first run — that’s fine. You debug inside the contract until it’s green.&lt;/p&gt;
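
&lt;p&gt;One possible shape for the Gold pass, sketched in TypeScript under heavy assumptions: the in-memory &lt;code&gt;keysByUser&lt;/code&gt; store stands in for a real database, and &lt;code&gt;ApiKeyLimitError&lt;/code&gt;, &lt;code&gt;logLimitHit&lt;/code&gt;, and the log field names are hypothetical, not code from a real project. Only the error message and the limit of 3 come from the test above.&lt;/p&gt;

```typescript
// Hypothetical domain error type; only the message is fixed by the test.
class ApiKeyLimitError extends Error {
  readonly userId: string;
  readonly limit: number;

  constructor(userId: string, limit: number) {
    super("API key limit reached");
    this.name = "ApiKeyLimitError";
    this.userId = userId;
    this.limit = limit;
  }
}

const MAX_KEYS_PER_USER = 3;

// In-memory stand-in for the key table (an assumption for this sketch).
const keysByUser: { [userId: string]: string[] | undefined } = {};

// Structured log line; a real service would use its own logger.
function logLimitHit(userId: string) {
  console.warn(
    JSON.stringify({ event: "api_key_limit_reached", userId, limit: MAX_KEYS_PER_USER })
  );
}

async function createKey(user: { id: string }) {
  // The check and the insert run with no await in between, so two calls
  // cannot interleave in this single-process sketch. Against a real
  // database the same guarantee needs a transaction or a constraint.
  const keys = keysByUser[user.id] ?? [];
  if (keys.length >= MAX_KEYS_PER_USER) {
    logLimitHit(user.id);
    throw new ApiKeyLimitError(user.id, MAX_KEYS_PER_USER);
  }
  const key = `key_${user.id}_${keys.length + 1}`;
  keysByUser[user.id] = [...keys, key];
  return key;
}
```

&lt;p&gt;Because &lt;code&gt;createKey&lt;/code&gt; is &lt;code&gt;async&lt;/code&gt;, the thrown error surfaces as a rejected promise, which is what the &lt;code&gt;rejects.toThrow&lt;/code&gt; assertion in the Red step expects.&lt;/p&gt;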

&lt;p&gt;&lt;strong&gt;Refactor&lt;/strong&gt;: Once behavior is proven, you clean up naming and file structure.&lt;/p&gt;

&lt;p&gt;AI can help in all steps — but &lt;strong&gt;the contract decides what’s allowed&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;The Boring AI Checklist (8 rules I actually follow)&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Define the contract&lt;/strong&gt;: Tests describe what must remain true.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verify, don’t trust&lt;/strong&gt;: Run tests, lint, and build locally.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prefer small diffs&lt;/strong&gt;: Models tend to hallucinate more as the context grows. Keep changes reviewable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep prompts in the diff&lt;/strong&gt;: The “why” belongs in PRs/docs, not in your chat history.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Review like it’s human code&lt;/strong&gt;: Check naming, invariants, and boundaries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Guard risky edges&lt;/strong&gt;: Auth, data loss, security headers, rate limits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Watch production&lt;/strong&gt;: Logs/metrics/alerts confirm the change behaved.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rollback is a feature&lt;/strong&gt;: Know how you undo the change before you ship.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That’s it. No mysticism. Just boring verification at high speed.&lt;/p&gt;

&lt;h2&gt;“But isn’t this just TDD?”&lt;/h2&gt;

&lt;p&gt;Sort of — but with a twist in motivation.&lt;/p&gt;

&lt;p&gt;Classic TDD helps humans think clearly and design well. With AI in the loop, tests also become &lt;strong&gt;safety rails&lt;/strong&gt; that stop plausible nonsense from sliding into main.&lt;/p&gt;

&lt;p&gt;If your AI workflow doesn’t have contracts, your delivery pipeline is basically a slot machine that sometimes pays out.&lt;/p&gt;

&lt;h2&gt;Read the full series&lt;/h2&gt;

&lt;p&gt;I’m documenting this entire methodology (and the concept of "Wards") on my site.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Hub&lt;/strong&gt;: &lt;a href="https://www.boringreliability.dev/ai-driven-development" rel="noopener noreferrer"&gt;AI-driven development / AI-drevet udvikling&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Core Idea&lt;/strong&gt;: &lt;a href="https://www.boringreliability.dev/blog/ward-2-red-gold-refactor" rel="noopener noreferrer"&gt;Ward #2: Red-Gold-Refactor&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The future of coding is boring: small diffs, explicit contracts, and fewer production surprises.&lt;/p&gt;

&lt;p&gt;Which is exactly the kind of future worth shipping.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>testing</category>
      <category>tdd</category>
      <category>reliability</category>
    </item>
  </channel>
</rss>
