<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Adun.J.Fong</title>
    <description>The latest articles on DEV Community by Adun.J.Fong (@adun1982).</description>
    <link>https://dev.to/adun1982</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3949467%2F034f31ce-0961-4b58-82ab-fae1eceff93e.jpeg</url>
      <title>DEV Community: Adun.J.Fong</title>
      <link>https://dev.to/adun1982</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/adun1982"/>
    <language>en</language>
    <item>
      <title>AI-Native Step Tracing: Verify AI Code by Observing Behavior, Not Reading It</title>
      <dc:creator>Adun.J.Fong</dc:creator>
      <pubDate>Sun, 24 May 2026 19:08:23 +0000</pubDate>
      <link>https://dev.to/adun1982/ai-native-step-tracing-verify-ai-code-by-observing-behavior-not-reading-it-10jd</link>
      <guid>https://dev.to/adun1982/ai-native-step-tracing-verify-ai-code-by-observing-behavior-not-reading-it-10jd</guid>
      <description>&lt;p&gt;I realized something that changed how I work with AI:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code review of AI-generated code costs as much as writing it yourself.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That cancels out the speed advantage entirely. You're back to square one.&lt;/p&gt;

&lt;p&gt;The problem isn't the model. The problem is where you're putting the verification effort.&lt;/p&gt;

&lt;h2&gt;The Insight&lt;/h2&gt;

&lt;p&gt;AI hallucinations show up in &lt;strong&gt;behavior&lt;/strong&gt;, not in syntax. The code looks fine. It passes review. It passes tests. Then it breaks in production.&lt;/p&gt;

&lt;p&gt;So stop verifying code. Verify behavior.&lt;/p&gt;

&lt;h2&gt;The Methodology&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Assume AI will always make mistakes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not sometimes — always. Same engineering logic as "networks drop packets." Design around the failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Define steps before implementation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Steps exist in every implementation anyway: either you define them beforehand, or the AI invents them along the way. Defining them up front costs almost nothing and gives you a behavioral contract you can verify.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. How to choose step granularity&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the hardest part. The conventional instinct is to split along code boundaries — functions, modules, services.&lt;/p&gt;

&lt;p&gt;The right instinct: split along &lt;strong&gt;data change moments&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A 1500-line function can be a single step if it produces one data change you care about. Three lines of code can be a step if they produce data you need to verify. Step size has nothing to do with code size.&lt;/p&gt;

&lt;p&gt;This also means you're not constraining the AI's implementation. You're only saying "I need to see the data at this point." How the AI gets there is entirely its creative space.&lt;/p&gt;
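
&lt;p&gt;As a rough illustration, a step list for an order flow could be written down as plain data before any implementation exists. This is a sketch, not the demo's code: the &lt;code&gt;StepDefinition&lt;/code&gt; type, the &lt;code&gt;produces&lt;/code&gt; descriptions, and &lt;code&gt;describeSteps&lt;/code&gt; are assumptions made for illustration. Each entry names a data change moment and says nothing about how the code reaches it.&lt;/p&gt;

```typescript
// Hypothetical step contract for an order flow. Step names follow the trace
// output shown later in the post; the `produces` descriptions are assumptions.
type StepDefinition = {
  name: string;     // label that will appear in the trace
  produces: string; // the data change a human wants to inspect
};

const orderSteps: StepDefinition[] = [
  { name: "Check stock", produces: "available quantity for the requested product" },
  { name: "Lock stock and create order", produces: "order record plus reduced stock" },
  { name: "Push to payment queue", produces: "queued payment message" },
  { name: "Payment processed", produces: "payment result for the order" },
  { name: "Notify shipping", produces: "shipping notification" },
];

// Turn the contract into plain text that can be pasted into a prompt.
function describeSteps(steps: StepDefinition[]): string {
  return steps
    .map((s, i) => `${i + 1}. ${s.name}: expose ${s.produces}`)
    .join("\n");
}
```

&lt;p&gt;Calling &lt;code&gt;describeSteps(orderSteps)&lt;/code&gt; yields a numbered list that states what must be observable at each point and leaves the implementation open.&lt;/p&gt;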

&lt;p&gt;&lt;strong&gt;4. Wrap each step in a tracer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Record each step's input, output, duration, and success or failure. Because tracing is built in from the start, nothing has to be instrumented after the fact.&lt;/p&gt;
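
&lt;p&gt;A minimal sketch of such a tracer, assuming synchronous steps and an in-memory list of records. The names &lt;code&gt;traceStep&lt;/code&gt;, &lt;code&gt;StepRecord&lt;/code&gt;, and &lt;code&gt;records&lt;/code&gt; are hypothetical and are not taken from the demo repository; an async variant would follow the same shape.&lt;/p&gt;

```typescript
// Hypothetical shape of one trace record; the field names are assumptions.
type StepRecord = {
  traceId: string;
  step: string;
  input: unknown;
  output?: unknown;
  error?: string;
  durationMs: number;
  ok: boolean;
};

// In-memory sink for this sketch; a real setup would likely persist records.
const records: StepRecord[] = [];

// Run `fn` as one named step and record input, output, duration, and outcome.
function traceStep(
  traceId: string,
  step: string,
  input: unknown,
  fn: (input: unknown) => unknown,
): unknown {
  const start = Date.now();
  try {
    const output = fn(input);
    records.push({ traceId, step, input, output, durationMs: Date.now() - start, ok: true });
    return output;
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    records.push({ traceId, step, input, error, durationMs: Date.now() - start, ok: false });
    throw err; // the failure is recorded, then passed on to the caller
  }
}
```

&lt;p&gt;A call such as &lt;code&gt;traceStep("trace_001", "Check stock", { productId: "prod_001", quantity: 1 }, checkStock)&lt;/code&gt;, where &lt;code&gt;checkStock&lt;/code&gt; is whatever the AI implemented, leaves one record behind whether the step succeeds or throws.&lt;/p&gt;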

&lt;p&gt;&lt;strong&gt;5. Verify by observing results&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Did the step run? Is the data correct? Is the timing reasonable? A human can answer these in seconds. No code review required.&lt;/p&gt;

&lt;p&gt;Human defines steps → AI implements → Trace verifies → Human observes&lt;/p&gt;
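
&lt;p&gt;Two of those questions, whether a step ran and whether its timing is reasonable, can also be pre-checked mechanically before a human looks at the data. The sketch below assumes a record shape with &lt;code&gt;step&lt;/code&gt;, &lt;code&gt;ok&lt;/code&gt;, and &lt;code&gt;durationMs&lt;/code&gt; fields and a single time budget &lt;code&gt;maxMs&lt;/code&gt;; &lt;code&gt;reviewTrace&lt;/code&gt; and &lt;code&gt;Finding&lt;/code&gt; are hypothetical names. It does not judge whether the data is correct, which stays with the human observer.&lt;/p&gt;

```typescript
// Hypothetical record shape; only these three fields are used here.
type StepRecord = { step: string; ok: boolean; durationMs: number };

type Finding = { step: string; problem: string };

// Compare what ran against what was defined.
// `maxMs` is an assumed per-step time budget in milliseconds.
function reviewTrace(expected: string[], trace: StepRecord[], maxMs: number): Finding[] {
  const findings: Finding[] = [];
  for (const step of expected) {
    const record = trace.find((r) => r.step === step);
    if (record === undefined) {
      findings.push({ step, problem: "did not run" });
    } else if (!record.ok) {
      findings.push({ step, problem: "failed" });
    } else if (record.durationMs > maxMs) {
      findings.push({ step, problem: `took ${record.durationMs}ms (budget ${maxMs}ms)` });
    }
  }
  return findings;
}
```

&lt;p&gt;An empty result means every defined step ran, succeeded, and stayed within budget; anything else points at the step to inspect first.&lt;/p&gt;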

&lt;h2&gt;Without Tracing vs. With Tracing&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Without tracing:&lt;/strong&gt;&lt;br&gt;
  ❌ Error: Insufficient stock&lt;br&gt;
  → Which step failed? How far did it get? Unknown.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With tracing:&lt;/strong&gt;&lt;br&gt;
  ❌ Step 1: ① Check stock (44ms)&lt;br&gt;
     Input  : {"productId":"prod_002","quantity":1}&lt;br&gt;
     Error  : Insufficient stock: available=0, requested=1&lt;/p&gt;

&lt;p&gt;Exact step. Exact input. Exact error. Located instantly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Successful order with tracing:&lt;/strong&gt;&lt;br&gt;
  ✅ Step 1: ① Check stock (31ms)&lt;br&gt;
  ✅ Step 2: ② Lock stock &amp;amp; create order (32ms)&lt;br&gt;
  ✅ Step 3: ③ Push to payment queue (15ms)&lt;br&gt;
  ✅ Step 4: ④ Payment processed (64ms)&lt;br&gt;
  ✅ Step 5: ⑤ Notify shipping (28ms)&lt;/p&gt;

&lt;p&gt;The trace &lt;strong&gt;is&lt;/strong&gt; the delivery proof. No code review needed.&lt;/p&gt;
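
&lt;p&gt;Output in the style of the traces above could be produced from recorded steps by a small formatter. This is only a sketch under assumptions: the &lt;code&gt;StepRecord&lt;/code&gt; shape and the &lt;code&gt;formatTrace&lt;/code&gt; name are hypothetical, and the demo's real formatting code may differ.&lt;/p&gt;

```typescript
// Hypothetical record shape for this sketch.
type StepRecord = {
  step: string;
  ok: boolean;
  durationMs: number;
  input?: unknown;
  error?: string;
};

// Render one line per step; failed steps also show their input and error.
function formatTrace(trace: StepRecord[]): string {
  const lines: string[] = [];
  trace.forEach((r, i) => {
    const mark = r.ok ? "✅" : "❌";
    lines.push(`${mark} Step ${i + 1}: ${r.step} (${r.durationMs}ms)`);
    if (!r.ok) {
      lines.push(`   Input  : ${JSON.stringify(r.input)}`);
      lines.push(`   Error  : ${r.error ?? "unknown"}`);
    }
  });
  return lines.join("\n");
}
```

&lt;p&gt;Keeping the formatter separate from the tracer means the same records can feed a terminal, a log file, or a dashboard.&lt;/p&gt;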

&lt;h2&gt;Why Heavy Prompt Constraints Backfire&lt;/h2&gt;

&lt;p&gt;Heavy prompt constraints tell AI what &lt;em&gt;not&lt;/em&gt; to do. AI spends attention avoiding rules instead of solving the problem.&lt;/p&gt;

&lt;p&gt;Step definitions tell AI &lt;em&gt;what to achieve&lt;/em&gt;. The implementation is AI's creative space. You get better results and more flexibility — not less.&lt;/p&gt;

&lt;h2&gt;The Bugs You Can't See&lt;/h2&gt;

&lt;p&gt;The demo (&lt;code&gt;src/order.ts&lt;/code&gt;) contains both implementations. The naive version has three real bugs invisible in code review:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Race condition&lt;/strong&gt; — stock checked and deducted separately. Concurrent requests oversell.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Missing transaction&lt;/strong&gt; — order creation and stock deduction not atomic. Crash between them = inconsistent data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No idempotency&lt;/strong&gt; — duplicate payment messages process twice.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These pass code review. They pass unit tests. They show up in production under load.&lt;/p&gt;

&lt;p&gt;Defining the steps precisely forces these boundaries to become visible.&lt;/p&gt;
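
&lt;p&gt;To make the first and third bugs concrete, here is a small in-memory sketch of the safer shape: stock is checked and deducted in one operation, and a payment message that was already handled is skipped. Everything in it is an assumption for illustration, including the &lt;code&gt;stock&lt;/code&gt; and &lt;code&gt;processedPayments&lt;/code&gt; stores and both function names; it is not the code in &lt;code&gt;src/order.ts&lt;/code&gt;. Against a real database, the same guarantees, and the missing transaction from the second bug, would have to come from a transaction or a conditional update.&lt;/p&gt;

```typescript
// In-memory stand-ins for a database; all names here are hypothetical.
const stock: { [productId: string]: number } = { prod_001: 5, prod_002: 0 };
const processedPayments: { [messageId: string]: boolean } = {};

// Check and deduct in one synchronous operation, so no other request can run
// between the check and the write. A real database would need a transaction
// or a conditional update to get the same effect.
function reserveStock(productId: string, quantity: number): void {
  const available = stock[productId] ?? 0;
  if (quantity > available) {
    throw new Error(`Insufficient stock: available=${available}, requested=${quantity}`);
  }
  stock[productId] = available - quantity;
}

// Idempotent handler: a message id that was already seen is skipped.
// Returns true when the payment was applied, false for a duplicate.
function handlePayment(messageId: string, apply: () => void): boolean {
  if (processedPayments[messageId]) {
    return false;
  }
  apply();
  processedPayments[messageId] = true;
  return true;
}
```

&lt;p&gt;Traced as a step, &lt;code&gt;reserveStock&lt;/code&gt; shows stock before and after in a single record, and a duplicate payment appears as a step whose output is &lt;code&gt;false&lt;/code&gt; instead of as a second charge.&lt;/p&gt;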

&lt;h2&gt;Design Principles&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No model lock-in&lt;/strong&gt; — works with any model. Optimized for the weakest model you use; stronger models only improve results.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No language lock-in&lt;/strong&gt; — the pattern works in any language. This demo is TypeScript.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimum viable contract&lt;/strong&gt; — two fields: &lt;code&gt;traceId&lt;/code&gt; and step name. Everything else is implementation.&lt;/li&gt;
&lt;/ul&gt;
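
&lt;p&gt;The two-field contract from the list above is enough to reconstruct the path each request took. A sketch, assuming events arrive as a flat list in order; the &lt;code&gt;StepEvent&lt;/code&gt; type and &lt;code&gt;groupByTrace&lt;/code&gt; helper are hypothetical names.&lt;/p&gt;

```typescript
// The minimum contract: a trace id and a step name.
type StepEvent = { traceId: string; step: string };

// Collect step names per traceId, preserving arrival order.
function groupByTrace(events: StepEvent[]): { [traceId: string]: string[] } {
  const paths: { [traceId: string]: string[] } = {};
  for (const e of events) {
    if (paths[e.traceId] === undefined) {
      paths[e.traceId] = [];
    }
    paths[e.traceId].push(e.step);
  }
  return paths;
}
```

&lt;p&gt;Input, output, and timing can be layered on top without changing this core, which is what keeps the pattern portable across languages and models.&lt;/p&gt;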

&lt;h2&gt;Try It&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/adun1982/step-trace
cd step-trace
npm install
npx tsx src/index.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;hr&gt;

&lt;p&gt;This is shared freely. No product. No upsell. Just a pattern that emerged from real production use: solo developer, 20 restaurant chains, 400 locations, team-scale output.&lt;/p&gt;

&lt;p&gt;If it changes how you think about AI-assisted development, that's enough.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
