<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Anh Nguyen Lewis</title>
    <description>The latest articles on DEV Community by Anh Nguyen Lewis (@anhnguyensynctree).</description>
    <link>https://dev.to/anhnguyensynctree</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3867015%2F0db7d0d6-8646-4bd5-b6ac-691f91dcf99b.png</url>
      <title>DEV Community: Anh Nguyen Lewis</title>
      <link>https://dev.to/anhnguyensynctree</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/anhnguyensynctree"/>
    <language>en</language>
    <item>
      <title>AI Writes Your Tests. Here's What It Systematically Misses.</title>
      <dc:creator>Anh Nguyen Lewis</dc:creator>
      <pubDate>Wed, 08 Apr 2026 05:57:21 +0000</pubDate>
      <link>https://dev.to/anhnguyensynctree/ai-writes-your-tests-heres-what-it-systematically-misses-3a38</link>
      <guid>https://dev.to/anhnguyensynctree/ai-writes-your-tests-heres-what-it-systematically-misses-3a38</guid>
      <description>&lt;h1&gt;
  
  
  AI Writes Your Tests. Here's What It Systematically Misses.
&lt;/h1&gt;

&lt;p&gt;We ran a tool called Optinum against 16 real bugs from SWE-bench Verified — a dataset of production OSS issues with human-verified patches. In 62.5% of cases, the AI-written tests that accompanied each fix missed the exact failure class the bug belonged to.&lt;/p&gt;

&lt;p&gt;Not random misses. The same categories, over and over.&lt;/p&gt;

&lt;p&gt;We also took one instance, synthesized a test, and proved it in Docker: the test fails on the bug commit and passes on the fix commit. No spreadsheets, no hand-waving.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;optinum benchmark &lt;span class="nt"&gt;--verify&lt;/span&gt; sympy__sympy-18199
&lt;span class="go"&gt;
Optinum E2E Verify — sympy__sympy-18199
  Pattern:    cascade-change (cascade-blindness catalog)
  Test code:  def test_nthroot_mod_cubic_composite():

  test_fails_on_bug:   true
  test_passes_on_fix:  true
  execution_verified:  true
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the headline. Here's the full story.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem Is Structural, Not a Quality Issue
&lt;/h2&gt;

&lt;p&gt;When an AI coding tool fixes a bug, it typically generates a test alongside the code. The test covers the function that was changed. Coverage goes up. The PR ships.&lt;/p&gt;

&lt;p&gt;The problem is not that AI writes bad tests. The problem is that AI writes tests that share the same blind spots as the code it just wrote.&lt;/p&gt;

&lt;p&gt;When an AI modifies &lt;code&gt;method A&lt;/code&gt;, it has that diff fully in context. It writes a test for &lt;code&gt;method A&lt;/code&gt;. What it doesn't do, because nothing in its workflow prompts it to, is ask: &lt;em&gt;what other functions in this file or repo are affected by this change, and do those also need updating or testing?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The blast radius is invisible to it.&lt;/p&gt;

&lt;p&gt;A human doing a code review would look at the diff, see &lt;code&gt;_build_repr&lt;/code&gt; changed, and grep for all callers. The AI tests what it authored. The rest of the codebase is outside its context.&lt;/p&gt;

&lt;p&gt;This is the pattern we call &lt;strong&gt;cascade-blindness&lt;/strong&gt;. It's not the only one.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Evidence: SWE-bench Verified
&lt;/h2&gt;

&lt;p&gt;SWE-bench Verified is a benchmark of 500 real GitHub issues from production OSS projects, each with a verified patch. It's built to test whether automated tools can reproduce realistic bug-fixing scenarios — not toy examples.&lt;/p&gt;

&lt;p&gt;We ran Optinum's classifier against all 500 instances and the full synthesis pipeline against a 16-instance pilot. The pilot spans eight projects: Astropy, Django, Matplotlib, requests, scikit-learn, Sphinx, sympy, and LangChain. Every instance has a human-verified ground-truth label for the bug's change type.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pilot Results
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Optinum SWE-bench Pilot — 16 instances [catalog-classification]

  ✓ astropy__astropy-7336
  ✓ django__django-10973
  ✓ django__django-11066             [AI gap]
  ✓ django__django-13964
  ✓ django__django-14034
  ✓ django__django-15695             [AI gap]
  ✓ django__django-7530
  ✓ psf__requests-1724               [AI gap]
  ✓ sphinx-doc__sphinx-8265          [AI gap]
  ✓ sympy__sympy-18199
  ✓ scikit-learn__scikit-learn-14983 [AI gap]
  ✓ matplotlib__matplotlib-23413     [AI gap]
  ✓ django__django-12589             [AI gap]
  ✓ django__django-14855             [AI gap]
  ✓ sphinx-doc__sphinx-9367          [AI gap]
  ✓ langchain-ai__langchain-35871    [AI gap]

── Summary ──────────────────────────────────
  Pilot size:      16
  Catch rate:      16/16  (catalog pattern matched)
  AI gap hits:     10/16  (AI missed, Optinum catches)
  False-pos rate:  0/16   (dry-run)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;[AI gap]&lt;/code&gt; means the AI-generated test suite that accompanied the fix did not cover the failure class Optinum targets for that instance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Full Run (500 instances)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Change Type&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;th&gt;Share&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;new-write-endpoint&lt;/td&gt;
&lt;td&gt;347&lt;/td&gt;
&lt;td&gt;69.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cascade-change&lt;/td&gt;
&lt;td&gt;68&lt;/td&gt;
&lt;td&gt;13.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;contract-change&lt;/td&gt;
&lt;td&gt;58&lt;/td&gt;
&lt;td&gt;11.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;schema-migration&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;4.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;type-widening&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;0.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Every one of the 500 instances mapped to a pattern in the catalog. The catalog was built independently from SWE-bench — its coverage is a validation, not a training outcome.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Proof: Docker Execution
&lt;/h2&gt;

&lt;p&gt;The sympy instance is the clearest proof. Here's what it demonstrates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The bug&lt;/strong&gt; (&lt;code&gt;sympy__sympy-18199&lt;/code&gt;): &lt;code&gt;nthroot_mod&lt;/code&gt;, the function that finds nth roots in modular arithmetic, raised &lt;code&gt;NotImplementedError&lt;/code&gt; for composite moduli. Only the &lt;code&gt;n == 2&lt;/code&gt; case, which delegates to &lt;code&gt;sqrt_mod&lt;/code&gt;, avoided that check; everything else handled prime moduli only.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The patch:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt; def nthroot_mod(a, n, p, all_roots=False):
     ...
     if n == 2:
         return sqrt_mod(a, p, all_roots)
&lt;span class="gd"&gt;-    if not is_nthpow_residue(a, n, p):
-        return None
&lt;/span&gt;     if not isprime(p):
&lt;span class="gd"&gt;-        raise NotImplementedError("Not implemented for composite p")
&lt;/span&gt;&lt;span class="gi"&gt;+        return _nthroot_mod_composite(a, n, p)
+    if a % p == 0:
+        return [0]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The test Optinum synthesized:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sympy.ntheory.residue_ntheory&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;nthroot_mod&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_nthroot_mod_cubic_composite&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# n=3 hits the composite check directly (n=2 shortcuts to sqrt_mod)
&lt;/span&gt;    &lt;span class="c1"&gt;# Pre-fix: raises NotImplementedError("Not implemented for composite p")
&lt;/span&gt;    &lt;span class="c1"&gt;# Post-fix: returns roots via _nthroot_mod_composite using CRT
&lt;/span&gt;    &lt;span class="n"&gt;roots&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;nthroot_mod&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;all_roots&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;roots&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nthroot_mod returned None for composite modulus&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
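&lt;p&gt;The assertion above only checks that some result comes back. A stricter variant could compare the result against an exhaustive search, which is cheap for a modulus this small. The helper below is a hypothetical reference oracle, not part of Optinum or sympy; it is a minimal sketch of what "all cube roots of 1 modulo 15" means.&lt;/p&gt;

```python
def brute_force_nthroot_mod(a, n, p):
    """Return every x in [0, p) with x**n congruent to a modulo p.

    Hypothetical reference oracle: exhaustive search, only practical for small p.
    """
    target = a % p
    return [x for x in range(p) if pow(x, n, p) == target]


# The inputs used by the synthesized test: cube roots of 1 modulo 15.
print(brute_force_nthroot_mod(1, 3, 15))  # prints [1]
```

&lt;p&gt;Comparing a sorted result against this list would catch a wrong set of roots as well as a missing one.&lt;/p&gt;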



&lt;p&gt;&lt;strong&gt;The Docker execution:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The sandbox:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Clones sympy at commit &lt;code&gt;ba80d1e&lt;/code&gt; (the bug commit)&lt;/li&gt;
&lt;li&gt;Installs from source with &lt;code&gt;pip install -e .&lt;/code&gt; — no compilation required&lt;/li&gt;
&lt;li&gt;Runs the test → &lt;strong&gt;FAILS&lt;/strong&gt;: &lt;code&gt;NotImplementedError: Not implemented for composite p&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Applies the patch with &lt;code&gt;git apply&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Runs the test again → &lt;strong&gt;PASSES&lt;/strong&gt;: roots found via Chinese Remainder Theorem
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;  &lt;span class="na"&gt;test_fails_on_bug&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;   &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;test_passes_on_fix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;execution_verified&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a reproducible fact, not a benchmark claim. The test fails until the patch is applied. After the patch, it passes.&lt;/p&gt;
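&lt;p&gt;The sandbox steps above reduce to a small fail-then-pass check. The sketch below is an illustration under assumptions, not Optinum's implementation: &lt;code&gt;verify_fail_then_pass&lt;/code&gt;, &lt;code&gt;run_test&lt;/code&gt;, and &lt;code&gt;apply_patch&lt;/code&gt; are hypothetical names, and treating &lt;code&gt;execution_verified&lt;/code&gt; as the conjunction of the other two flags is our reading of the output, not something the tool documents here.&lt;/p&gt;

```python
def verify_fail_then_pass(run_test, apply_patch):
    """Run a test before and after patching and report the three flags.

    run_test: hypothetical callable returning True when the test passes.
    apply_patch: hypothetical callable that moves the checkout to the fix.
    """
    fails_on_bug = not run_test()   # expected to fail on the bug commit
    apply_patch()                   # e.g. git apply inside the sandbox
    passes_on_fix = run_test()      # expected to pass once patched
    return {
        "test_fails_on_bug": fails_on_bug,
        "test_passes_on_fix": passes_on_fix,
        # Assumption: verified only when both halves hold.
        "execution_verified": fails_on_bug and passes_on_fix,
    }


# In-memory stand-in for a real checkout: the "test" passes only once patched.
state = {"patched": False}
report = verify_fail_then_pass(
    run_test=lambda: state["patched"],
    apply_patch=lambda: state.update(patched=True),
)
print(report)
```

&lt;p&gt;A test that already passes on the bug commit would leave &lt;code&gt;execution_verified&lt;/code&gt; false, which is the case this check exists to reject.&lt;/p&gt;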




&lt;h2&gt;
  
  
  The Catalog: 23 Patterns Across 6 Change Types
&lt;/h2&gt;

&lt;p&gt;The core of Optinum is a &lt;strong&gt;blind spot catalog&lt;/strong&gt;: a taxonomy of failures in AI-generated code that its own tests don't catch. Every pattern has OSS evidence, a severity rating, and a flag marking whether it appears specifically because of how LLMs generate code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Contract-Change Patterns
&lt;/h3&gt;

&lt;p&gt;These fire when an API signature, parameter, or response shape changes.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Severity&lt;/th&gt;
&lt;th&gt;AI-Native&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Renamed API Parameters&lt;/td&gt;
&lt;td&gt;high&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Changed Response Shape&lt;/td&gt;
&lt;td&gt;high&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;New Required Field Not Sent by Callers&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;critical&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Removed Field Still Sent by Callers&lt;/td&gt;
&lt;td&gt;medium&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HTTP Status Code Changed&lt;/td&gt;
&lt;td&gt;medium&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Error Response Cast Without Runtime Validation&lt;/td&gt;
&lt;td&gt;medium&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unit Test Mocks the Same Assumption the Code Has&lt;/td&gt;
&lt;td&gt;high&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The last one is the most insidious:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;AI writes a function expecting &lt;code&gt;{ user: { id } }&lt;/code&gt;. AI writes a test mocking the dependency to return &lt;code&gt;{ user: { id } }&lt;/code&gt;. Both the code and the mock share the same wrong assumption. The test passes. Production sends &lt;code&gt;{ userId }&lt;/code&gt; and crashes.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The test is a circular proof of nothing.&lt;/p&gt;
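&lt;p&gt;The circularity is easy to reproduce. The sketch below is a Python rendering of the scenario in the quote, with hypothetical names (&lt;code&gt;get_user_id&lt;/code&gt;, &lt;code&gt;fetch_user&lt;/code&gt;) chosen for illustration. The first check mocks the dependency with the shape the code already assumes; the second replays a payload shaped like the one production sends.&lt;/p&gt;

```python
from unittest.mock import Mock


def get_user_id(client):
    # Code under test assumes a nested payload: {"user": {"id": ...}}
    payload = client.fetch_user()
    return payload["user"]["id"]


def circular_test_passes():
    # The mock encodes the same assumption as the code, so this always passes.
    client = Mock()
    client.fetch_user.return_value = {"user": {"id": 42}}
    return get_user_id(client) == 42


def replayed_payload_test_passes(recorded_payload, expected_id):
    # Replaying a payload captured from the real dependency exposes the mismatch.
    client = Mock()
    client.fetch_user.return_value = recorded_payload
    try:
        return get_user_id(client) == expected_id
    except KeyError:
        return False


print(circular_test_passes())                             # prints True
print(replayed_payload_test_passes({"userId": 42}, 42))   # prints False
```

&lt;p&gt;The only difference between the two checks is where the mock's return value comes from: the author's assumption, or a recording of the real contract.&lt;/p&gt;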

&lt;h3&gt;
  
  
  New-Write-Endpoint Patterns
&lt;/h3&gt;

&lt;p&gt;These fire when a new API endpoint or mutation is added.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Severity&lt;/th&gt;
&lt;th&gt;AI-Native&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Idempotency Key Not Sent by Callers&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;critical&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auth Check Without Ownership Verification (IDOR)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;critical&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;New Endpoint Added Outside Auth Middleware&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;critical&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sanitization at HTTP Boundary Bypassed Internally&lt;/td&gt;
&lt;td&gt;high&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-Step Operation Without Database Transaction&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;critical&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Cascade-Change Patterns
&lt;/h3&gt;

&lt;p&gt;These fire when a change in one function should have propagated to related functions but didn't.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Severity&lt;/th&gt;
&lt;th&gt;AI-Native&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Delete/Update Without Cascading to Related Entities&lt;/td&gt;
&lt;td&gt;high&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Event Emission Dropped During Refactor&lt;/td&gt;
&lt;td&gt;high&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write Handler Loses Cache Invalidation&lt;/td&gt;
&lt;td&gt;medium&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Error Handler Swallows Exceptions Silently&lt;/td&gt;
&lt;td&gt;high&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;forEach(async ...)&lt;/code&gt; Fire-and-Forget&lt;/td&gt;
&lt;td&gt;high&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Error Path Exists in Code but Has No Test&lt;/td&gt;
&lt;td&gt;high&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Schema-Migration Pattern
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Severity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ORM Schema Updated Without Migration File&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;critical&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Type-Widening Patterns
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Severity&lt;/th&gt;
&lt;th&gt;AI-Native&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Return Type Widened to Include Null&lt;/td&gt;
&lt;td&gt;high&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Nested Property Access Without Null Guard&lt;/td&gt;
&lt;td&gt;high&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI Test Suite Has No Boundary/Edge Case Assertions&lt;/td&gt;
&lt;td&gt;high&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Config-Drift Pattern
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Severity&lt;/th&gt;
&lt;th&gt;AI-Native&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Provider Config Updated in One File, Missed Elsewhere&lt;/td&gt;
&lt;td&gt;high&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Why These Patterns Are AI-Native
&lt;/h2&gt;

&lt;p&gt;The catalog distinguishes AI-native patterns — bugs that appear significantly more often in AI-generated code — from classic bugs that affect all code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;async-forEach-fire-forget&lt;/code&gt;&lt;/strong&gt;: Experienced developers know that &lt;code&gt;forEach&lt;/code&gt; ignores the promises its callback returns. LLMs learn the &lt;code&gt;async/await&lt;/code&gt; pattern but not the iterator contract. The code looks correct and runs. The async operations complete at undefined times, errors bypass the caller's &lt;code&gt;try/catch&lt;/code&gt;, and callers observe completion before the work is done. (&lt;a href="https://github.com/microsoft/vscode/pull/304898" rel="noopener noreferrer"&gt;microsoft/vscode#304898&lt;/a&gt;: &lt;code&gt;forEach(async ...)&lt;/code&gt; causing fire-and-forget promises; fixed by switching to &lt;code&gt;Promise.all&lt;/code&gt;.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;optional-chain-assumed-truthy&lt;/code&gt;&lt;/strong&gt;: LLMs train on happy-path code. Null and undefined branches appear far less frequently in training data. Guards are omitted systematically, not randomly. (&lt;a href="https://github.com/open-feature/java-sdk-contrib/pull/1709" rel="noopener noreferrer"&gt;open-feature/java-sdk-contrib#1709&lt;/a&gt; — null pointer from missing descriptor check on optional value.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;config-drift-across-files&lt;/code&gt;&lt;/strong&gt;: When an AI updates a file, it applies the change to that file. It doesn't grep the codebase for all other references the way a human doing a provider swap would. (&lt;a href="https://github.com/MalikAhmad911/infinite-rankers/pull/1" rel="noopener noreferrer"&gt;MalikAhmad911/infinite-rankers#1&lt;/a&gt; — Claude renamed &lt;code&gt;NEON_DATABASE_URL&lt;/code&gt; → &lt;code&gt;DATABASE_URL&lt;/code&gt; in code; &lt;code&gt;.env.example&lt;/code&gt; still had the old key.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;mocked-dependency-circular-test&lt;/code&gt;&lt;/strong&gt;: Both the code and the mock are generated in the same session with the same context window. The mock reflects the AI's assumption about the dependency contract. If that assumption is wrong, the test is a proof of the wrong thing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;boundary-values-untested&lt;/code&gt;&lt;/strong&gt;: Training data for test generation overwhelmingly shows success-path tests. Empty arrays, zero, null, empty string, max integer — these inputs appear far less often in examples. The test suite looks comprehensive and isn't.&lt;/p&gt;

&lt;p&gt;In all these cases, the pattern is not a quality failure. It's a predictable artifact of the distribution the model learned from.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real OSS Evidence
&lt;/h2&gt;

&lt;p&gt;Every non-provisional pattern in the catalog has at least one confirmed incident.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;transaction-missing&lt;/code&gt;&lt;/strong&gt; — &lt;code&gt;LedgerService.recordPayment()&lt;/code&gt; writes DEBIT and CREDIT entries sequentially with no transaction boundary. Crash between writes leaves the ledger permanently imbalanced. A second incident: &lt;code&gt;SaveEvmTransaction&lt;/code&gt;, &lt;code&gt;SaveEvmLog&lt;/code&gt;, &lt;code&gt;NextEvmBlock&lt;/code&gt; executed as independent statements — orphaned records on partial failure.&lt;/p&gt;
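&lt;p&gt;For illustration, here is a minimal sketch of the missing transaction boundary using Python's standard &lt;code&gt;sqlite3&lt;/code&gt; module. The &lt;code&gt;ledger&lt;/code&gt; table and the &lt;code&gt;record_payment&lt;/code&gt; function are hypothetical stand-ins, not the code from the incident, and the crash is simulated with an exception.&lt;/p&gt;

```python
import sqlite3


def record_payment(conn, amount, fail_between_writes=False):
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so the DEBIT and CREDIT rows are written together or not at all.
    with conn:
        conn.execute("INSERT INTO ledger (kind, amount) VALUES ('DEBIT', ?)", (amount,))
        if fail_between_writes:
            raise RuntimeError("simulated crash between writes")
        conn.execute("INSERT INTO ledger (kind, amount) VALUES ('CREDIT', ?)", (amount,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ledger (kind TEXT, amount INTEGER)")

try:
    record_payment(conn, 100, fail_between_writes=True)
except RuntimeError:
    pass
rows_after_failure = conn.execute("SELECT COUNT(*) FROM ledger").fetchone()[0]

record_payment(conn, 100)
rows_after_success = conn.execute("SELECT COUNT(*) FROM ledger").fetchone()[0]

print(rows_after_failure, rows_after_success)  # prints 0 2
```

&lt;p&gt;A blind-spot test for this pattern injects the failure between the two writes and asserts that no orphaned row survives.&lt;/p&gt;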

&lt;p&gt;&lt;strong&gt;&lt;code&gt;auth-ownership-gap&lt;/code&gt;&lt;/strong&gt; — &lt;code&gt;router()&lt;/code&gt; exposed publicly without auth middleware. &lt;code&gt;post_scope_check&lt;/code&gt; accepted arbitrary &lt;code&gt;project_id&lt;/code&gt; without verifying the caller owned the resource. Every authenticated user could read every other user's data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;migration-drift&lt;/code&gt;&lt;/strong&gt; — Prisma schema for &lt;code&gt;LiteLLM_MCPServerTable&lt;/code&gt; updated with &lt;code&gt;source_url&lt;/code&gt; column. Migration file not generated. Column missing on every container restart causing runtime errors. (&lt;a href="https://github.com/BerriAI/litellm/issues/24433" rel="noopener noreferrer"&gt;BerriAI/litellm#24433&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;cascade-blindness&lt;/code&gt;&lt;/strong&gt; — Claude dead-code removal deleted &lt;code&gt;get_model()&lt;/code&gt; from &lt;code&gt;function_app.py&lt;/code&gt; because it appeared unused. &lt;code&gt;fb_gen.py&lt;/code&gt; called it internally. Two separate restore commits required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;params-renamed&lt;/code&gt;&lt;/strong&gt; — LangChain: dispatch built &lt;code&gt;{"path": path}&lt;/code&gt; but both &lt;code&gt;_handle_rename&lt;/code&gt; implementations read &lt;code&gt;args["old_path"]&lt;/code&gt;. &lt;code&gt;KeyError&lt;/code&gt; on every rename. Two classes with identical bugs, written in separate AI sessions from the same wrong template. (&lt;a href="https://github.com/langchain-ai/langchain/issues/35852" rel="noopener noreferrer"&gt;langchain-ai/langchain#35852&lt;/a&gt;)&lt;/p&gt;




&lt;h2&gt;
  
  
  The LangChain Case in Detail
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;langchain-ai__langchain-35871&lt;/code&gt; is the instance that best illustrates why the standard answer — "just review the diff" — is insufficient.&lt;/p&gt;

&lt;p&gt;Two middleware classes, &lt;code&gt;_StateClaudeFileToolMiddleware&lt;/code&gt; and &lt;code&gt;_FilesystemClaudeFileToolMiddleware&lt;/code&gt;, were generated from the same flawed template in &lt;strong&gt;separate AI coding sessions&lt;/strong&gt;. Both classes had the same bug: dispatch sent &lt;code&gt;{"path": path}&lt;/code&gt; but both &lt;code&gt;_handle_rename&lt;/code&gt; implementations read &lt;code&gt;args["old_path"]&lt;/code&gt;. A &lt;code&gt;KeyError&lt;/code&gt; on every rename call.&lt;/p&gt;

&lt;p&gt;Neither session knew about the other. Neither session's tests caught it. The bug was present in both classes because AI reproduces the pattern it learned, and both sessions learned from the same wrong template.&lt;/p&gt;

&lt;p&gt;This is cascade-blindness at the codebase level: not a cascade within one function, but a cascade across the entire assumption set the AI inherited from its context.&lt;/p&gt;

&lt;p&gt;The test Optinum generates for this pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# cascade-blindness: sibling classes share a broken assumption
# Both _handle_rename implementations read args['old_path']
# but dispatch sends args['path'] — KeyError on every rename
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_file_tool_middleware_rename_dispatch_key&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.callbacks.manager&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;_StateClaudeFileToolMiddleware&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;_FilesystemClaudeFileToolMiddleware&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;cls&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;_StateClaudeFileToolMiddleware&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_FilesystemClaudeFileToolMiddleware&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="c1"&gt;# Dispatch sends 'path', not 'old_path'
&lt;/span&gt;        &lt;span class="c1"&gt;# Pre-fix: KeyError on every rename
&lt;/span&gt;        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;handle_tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rename&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/tmp/a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;new_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/tmp/b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  How Optinum Works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: AST blast radius
&lt;/h3&gt;

&lt;p&gt;Optinum parses the diff to find every changed function. It then walks the AST to find every function that calls, inherits from, or is a sibling of the changed functions — in the same file and in the dependency graph.&lt;/p&gt;

&lt;p&gt;The output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;changed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;functionName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;_build_repr&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sklearn/model_selection/_split.py&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="nx"&gt;dependents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;functionName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;__repr__&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     &lt;span class="na"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sklearn/model_selection/_split.py&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;functionName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;get_n_splits&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sklearn/model_selection/_split.py&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="nx"&gt;highFanOut&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;dependents&lt;/code&gt; are the functions that are not in the diff but are in the blast radius. These are the functions the AI's own tests are most likely to miss.&lt;/p&gt;
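&lt;p&gt;To make the idea concrete, here is a minimal sketch of caller discovery with Python's standard &lt;code&gt;ast&lt;/code&gt; module. It is not Optinum's parser: &lt;code&gt;find_callers&lt;/code&gt; and the sample source are hypothetical, and the sketch covers only direct calls within a single file, not inheritance, siblings, or cross-file dependencies.&lt;/p&gt;

```python
import ast


def find_callers(source, changed_name):
    """Return sorted names of functions in `source` that call `changed_name`."""
    tree = ast.parse(source)
    callers = set()
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if node.name == changed_name:
            continue  # the changed function itself is not a dependent
        for inner in ast.walk(node):
            if isinstance(inner, ast.Call):
                func = inner.func
                # Handles plain calls f(...) and attribute calls obj.f(...)
                called = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
                if called == changed_name:
                    callers.add(node.name)
    return sorted(callers)


# Hypothetical sample loosely modeled on the blast-radius output above.
SAMPLE = '''
def _build_repr(self):
    return type(self).__name__

class Splitter:
    def __repr__(self):
        return _build_repr(self)

    def get_n_splits(self):
        return 5
'''

print(find_callers(SAMPLE, "_build_repr"))  # prints ['__repr__']
```

&lt;p&gt;Name matching like this is deliberately naive: it cannot tell two functions with the same name apart, which is one reason a real implementation would need scope and import resolution.&lt;/p&gt;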

&lt;p&gt;The router handles both Python and TypeScript:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;parseBlastRadius&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;changedFiles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="nx"&gt;projectRoot&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pyFiles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;changedFiles&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;.py&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tsFiles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;changedFiles&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;.ts&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;.tsx&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// Dispatch to py-parser or ts-parser, merge results&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Catalog classification
&lt;/h3&gt;

&lt;p&gt;The blast radius feeds into the classifier, which maps the change to a pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cascade-change detected:
  2 non-test files changed
  _build_repr modified, __repr__ not updated
  Pattern: cascade-blindness (severity: high)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
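
&lt;p&gt;As a rough illustration of this step, the sketch below maps a blast radius to a catalog pattern. The &lt;code&gt;BlastRadius&lt;/code&gt; shape, the &lt;code&gt;classify&lt;/code&gt; function, and the rule itself (a changed function whose known dependent was left untouched) are assumptions made for the example, not Optinum's actual classifier.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class BlastRadius:
    """Hypothetical summary of a diff: what changed and what depends on it."""
    changed_functions: set[str]
    # Maps a function name to the functions that rely on it (assumed input).
    dependents: dict[str, list[str]] = field(default_factory=dict)


def classify(radius: BlastRadius):
    """Return a finding dict, or None when no catalog pattern matches."""
    for changed in sorted(radius.changed_functions):
        for sibling in radius.dependents.get(changed, []):
            # A dependent that the diff did not touch is the cascade signal.
            if sibling not in radius.changed_functions:
                return {
                    "change_type": "cascade-change",
                    "pattern": "cascade-blindness",
                    "severity": "high",
                    "evidence": f"{changed} modified, {sibling} not updated",
                }
    return None


radius = BlastRadius(
    changed_functions={"_build_repr"},
    dependents={"_build_repr": ["__repr__"]},
)
finding = classify(radius)
print(finding["evidence"])  # _build_repr modified, __repr__ not updated
```

&lt;p&gt;A real catalog would hold many such rules. Keeping each rule as a small predicate over the blast radius makes it cheap to add a row when a new pattern is confirmed.&lt;/p&gt;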



&lt;h3&gt;
  
  
  Step 3: Two-layer synthesis
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Layer 1&lt;/strong&gt; generates tests for the changed functions, using the blast radius and API contracts as grounding. &lt;strong&gt;Layer 2&lt;/strong&gt; generates AI-blind-spot tests from the matched catalog pattern: the tests that wouldn't appear in a normal AI-written suite.&lt;/p&gt;

&lt;p&gt;For Python projects, the synthesis switches idioms automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Generated for Python ecosystem
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;

&lt;span class="n"&gt;BASE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# cascade-blindness: sibling method not updated in fix
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_split_cascade_sibling_repr&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BASE_URL&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/model_selection/split&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;n_splits&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Validation loop
&lt;/h3&gt;

&lt;p&gt;Generated tests pass through three self-validation cycles: structural correctness, pattern targeting, and mutation testing. Tests that don't survive the loop are regenerated.&lt;/p&gt;
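
&lt;p&gt;The first cycle, structural correctness, can be approximated without running anything. The sketch below is one possible reading of that check for generated Python tests: the source must parse, and it must define at least one &lt;code&gt;test_&lt;/code&gt; function that contains an &lt;code&gt;assert&lt;/code&gt;. The function name &lt;code&gt;is_structurally_valid&lt;/code&gt; and the criteria are assumptions; the post does not specify what Optinum checks here.&lt;/p&gt;

```python
import ast


def is_structurally_valid(source: str):
    """Return True if source parses and has a test_ function with an assert."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name.startswith("test_"):
            # Look for an assert statement anywhere inside this test function.
            if any(isinstance(child, ast.Assert) for child in ast.walk(node)):
                return True
    return False


good = "def test_status():\n    assert 1 + 1 == 2\n"
no_assert = "def test_status():\n    print('ran')\n"
broken = "def test_status(:\n    assert True\n"

print(is_structurally_valid(good))       # True
print(is_structurally_valid(no_assert))  # False
print(is_structurally_valid(broken))     # False
```

&lt;p&gt;This only covers synchronous &lt;code&gt;def&lt;/code&gt; tests. Pattern targeting and mutation testing need the matched pattern and a test run, so they are left out of the sketch.&lt;/p&gt;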




&lt;h2&gt;
  
  
  The 10 Gaps, Laid Out
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Instance&lt;/th&gt;
&lt;th&gt;Change Type&lt;/th&gt;
&lt;th&gt;AI Gap&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;django__django-11066&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;contract-change&lt;/td&gt;
&lt;td&gt;YES — &lt;code&gt;_rename()&lt;/code&gt; saves to wrong database; test only checks default DB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;django__django-12589&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;contract-change&lt;/td&gt;
&lt;td&gt;YES — positional args removed from &lt;code&gt;filter()&lt;/code&gt;; callers break silently&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;django__django-14855&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;cascade-change&lt;/td&gt;
&lt;td&gt;YES — &lt;code&gt;prefetch_related&lt;/code&gt; cache fix not propagated to &lt;code&gt;add()&lt;/code&gt;/&lt;code&gt;remove()&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;django__django-15695&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;cascade-change&lt;/td&gt;
&lt;td&gt;YES — &lt;code&gt;RenameIndex&lt;/code&gt; crash on backward move not tested&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;langchain-ai__langchain-35871&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;cascade-change&lt;/td&gt;
&lt;td&gt;YES — two classes, same broken key, separate AI sessions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;matplotlib__matplotlib-23413&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;type-widening&lt;/td&gt;
&lt;td&gt;YES — &lt;code&gt;bottom=None&lt;/code&gt; path not tested by callers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;psf__requests-1724&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;contract-change&lt;/td&gt;
&lt;td&gt;YES — bytes method names not normalized before &lt;code&gt;.upper()&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;scikit-learn__scikit-learn-14983&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;cascade-change&lt;/td&gt;
&lt;td&gt;YES — &lt;code&gt;__repr__&lt;/code&gt; added to &lt;code&gt;_RepeatedSplits&lt;/code&gt; but &lt;code&gt;_build_repr&lt;/code&gt; cvargs lookup not tested&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sphinx-doc__sphinx-8265&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;type-widening&lt;/td&gt;
&lt;td&gt;YES — docstring default arg None handling untested&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sphinx-doc__sphinx-9367&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;contract-change&lt;/td&gt;
&lt;td&gt;YES — &lt;code&gt;Config.init_values()&lt;/code&gt; signature change breaks extension callers&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Of the 6 non-gap instances, some had no AI-written test at the time (the fix was human-authored), and in the rest the AI's tests happened to cover the right failure class. All 10 gap instances had AI tests that covered the fix but not the pattern Optinum targets.&lt;/p&gt;
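
&lt;p&gt;The headline figure follows directly from the table. The snippet below recomputes it from the ten rows above, using the change-type labels exactly as listed; only the variable names are invented.&lt;/p&gt;

```python
from collections import Counter

# Change-type labels copied from the table, one entry per gap instance.
gap_change_types = {
    "django__django-11066": "contract-change",
    "django__django-12589": "contract-change",
    "django__django-14855": "cascade-change",
    "django__django-15695": "cascade-change",
    "langchain-ai__langchain-35871": "cascade-change",
    "matplotlib__matplotlib-23413": "type-widening",
    "psf__requests-1724": "contract-change",
    "scikit-learn__scikit-learn-14983": "cascade-change",
    "sphinx-doc__sphinx-8265": "type-widening",
    "sphinx-doc__sphinx-9367": "contract-change",
}
TOTAL_INSTANCES = 16  # size of the pilot

by_type = Counter(gap_change_types.values())
gap_rate = len(gap_change_types) / TOTAL_INSTANCES

print(dict(by_type))  # {'contract-change': 4, 'cascade-change': 4, 'type-widening': 2}
print(f"{gap_rate:.1%}")  # 62.5%
```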




&lt;h2&gt;
  
  
  What This Isn't
&lt;/h2&gt;

&lt;p&gt;Optinum does not replace your test suite. It does not run your existing tests. It does not lint your code or rewrite anything.&lt;/p&gt;

&lt;p&gt;It generates a focused set of tests targeting the one category of failure that AI-written code most consistently misses: the assumption that what wasn't changed doesn't need testing.&lt;/p&gt;

&lt;p&gt;The test suite AI writes is correct for what it tested. The question Optinum asks is: &lt;em&gt;what didn't it test?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;62 blind-spot tests. Under 2 minutes. Three production AI-native repos.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The current pipeline uses catalog-based pattern matching for test classification. The next version closes the loop fully: synthesize a test via LLM → run it in Docker → if the test doesn't fail on the bug commit, regenerate and retry. The loop terminates when the run reports &lt;code&gt;test_fails_on_bug: true&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The catalog will grow as more AI-native patterns are confirmed with OSS evidence. Every confirmed incident is a new row. Every new row is a new category of test the tool generates automatically.&lt;/p&gt;

&lt;p&gt;If you're shipping AI-generated code and haven't asked what its test suites systematically miss, you now have a list.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;Install Optinum globally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; github:anhnguyensynctree/optinum
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run pattern detection on the cascade-blindness example from this post:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;optinum &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;--diff&lt;/span&gt; demo/cascade-blindness.diff
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TypeScript tests written to optinum-tests/generated.test.ts

Detected blind spot patterns:
  • Delete/Update Without Cascading to Related Entities
  • Event Emission Dropped During Refactor
  • Error Handler Swallows Exceptions Silently
  • Renamed API Parameters
  • Changed Response Shape
  • New Required Field Not Sent by Callers
  ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Pattern detection works offline. Test synthesis (with the &lt;code&gt;--claude&lt;/code&gt; flag) requires a Claude Code subscription.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;SWE-bench Verified dataset: &lt;a href="https://huggingface.co/datasets/princeton-nlp/SWE-bench_Verified" rel="noopener noreferrer"&gt;princeton-nlp/SWE-bench_Verified&lt;/a&gt; (500 instances, human-verified patches).&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Pilot methodology: 16 instances, ground-truth change-type labels, catalog-classification mode.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Execution proof: sympy__sympy-18199, commit ba80d1e, verified in isolated Docker container.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>testing</category>
      <category>ai</category>
      <category>devtools</category>
      <category>javascript</category>
    </item>
  </channel>
</rss>
