<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: IshvaTheGuru</title>
    <description>The latest articles on DEV Community by IshvaTheGuru (@ishvatheguru).</description>
    <link>https://dev.to/ishvatheguru</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3999749%2Fc3bc122f-c8e2-4f6d-b64b-97183bbe94f3.png</url>
      <title>DEV Community: IshvaTheGuru</title>
      <link>https://dev.to/ishvatheguru</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ishvatheguru"/>
    <language>en</language>
    <item>
      <title>Prove your AI-written code, or get the exact input that breaks it</title>
      <dc:creator>IshvaTheGuru</dc:creator>
      <pubDate>Wed, 24 Jun 2026 04:54:39 +0000</pubDate>
      <link>https://dev.to/ishvatheguru/prove-your-ai-written-code-or-get-the-exact-input-that-breaks-it-5bon</link>
      <guid>https://dev.to/ishvatheguru/prove-your-ai-written-code-or-get-the-exact-input-that-breaks-it-5bon</guid>
      <description>

&lt;p&gt;AI coding assistants are fast, and they ship confident bugs. The output looks right, the explanation sounds right, and the failing case turns up in production. The missing piece isn't a smarter generator — it's something that can &lt;em&gt;check&lt;/em&gt; the generated code and refuse to bluff when it can't.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ishvaproducts-png/ishvacerto" rel="noopener noreferrer"&gt;&lt;code&gt;ishvacerto&lt;/code&gt;&lt;/a&gt; is that gate. Give it a function and a way to check it — its own doctests, your tests, or a reference implementation — and it returns exactly one of three answers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;VERIFIED&lt;/strong&gt; — it passed the captured spec on every input the gate could exercise.&lt;/li&gt;
&lt;li&gt;❌ &lt;strong&gt;REFUTED&lt;/strong&gt; — it fails, and here is the &lt;em&gt;exact failing input&lt;/em&gt;. Not "looks suspicious." For example: &lt;code&gt;REFUTED [doctest] fn=square counterexample: square(3) (got 6, expected 9)&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;🤷 &lt;strong&gt;ABSTAIN&lt;/strong&gt; — no checkable spec could be captured, so it says so instead of rubber-stamping.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The whole promise lives in that third answer. &lt;strong&gt;Never wrong, sometimes silent.&lt;/strong&gt; A REFUTED always comes with a concrete failing input, and anything the gate can't check gets an ABSTAIN instead of a guess, which is why it doesn't false-alarm on correct code.&lt;/p&gt;

&lt;h2&gt;Try it&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;ishvacerto
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ishvacerto&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;verify&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;verify_against_reference&lt;/span&gt;

&lt;span class="nf"&gt;verify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;f.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;                    &lt;span class="c1"&gt;# uses the code's own doctests
&lt;/span&gt;&lt;span class="nf"&gt;verify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tests&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;f(3)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;9&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;            &lt;span class="c1"&gt;# against your tests
&lt;/span&gt;&lt;span class="nf"&gt;verify_against_reference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ai_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# where does it diverge from a reference?
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From the command line (exits &lt;code&gt;1&lt;/code&gt; on REFUTED, so it gates CI directly):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ishvacerto my_function.py
ishvacerto &lt;span class="nt"&gt;--ref&lt;/span&gt; reference.py &lt;span class="nt"&gt;--entry&lt;/span&gt; my_func ai_generated.py   &lt;span class="c"&gt;# differential&lt;/span&gt;
ishvacerto &lt;span class="nt"&gt;--json&lt;/span&gt; my_function.py                                &lt;span class="c"&gt;# machine-readable&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Measured, not asserted&lt;/h2&gt;

&lt;p&gt;You can reproduce the headline numbers yourself — there's a script in the repo:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python benchmarks/humaneval_gate.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the real &lt;strong&gt;HumanEval&lt;/strong&gt; benchmark (164 problems), the gate produces &lt;strong&gt;0 false alarms&lt;/strong&gt; on the canonical correct solutions, captures a checkable doctest spec on &lt;strong&gt;76/164 (~46%)&lt;/strong&gt; of problems, and abstains on the rest. It even flags HumanEval's &lt;em&gt;own&lt;/em&gt; wrong doctest (problem 47) as a spec/code conflict rather than a false alarm — it caught a benchmark bug instead of blaming the code.&lt;/p&gt;

&lt;p&gt;Coverage grows with the spec or reference you give it. Next on the roadmap is a &lt;strong&gt;reference proposer&lt;/strong&gt; that retrieves a verified reference for the same task when code ships with no tests, widening reach while keeping false alarms at zero.&lt;/p&gt;

&lt;h2&gt;How it decides&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;VERIFIED&lt;/strong&gt; only if the captured spec passed on every input it could exercise.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;REFUTED&lt;/strong&gt; only on a clean mismatch — and it tells you the input.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ABSTAIN&lt;/strong&gt; if it couldn't capture a usable spec. That discipline is what keeps it from false-alarming on good code.&lt;/li&gt;
&lt;/ul&gt;
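
&lt;p&gt;The three rules above can be sketched with nothing but the standard library's &lt;code&gt;doctest&lt;/code&gt; parser. This is a minimal illustration, not ishvacerto's actual implementation: the &lt;code&gt;gate&lt;/code&gt; helper and its return shape are hypothetical, it assumes each doctest example is a single expression, and it runs the code in-process with no timeout.&lt;/p&gt;

```python
import doctest


def gate(source, fn_name):
    """Return (verdict, detail) for one function defined in `source`.

    Hypothetical helper for illustration; not part of the ishvacerto API.
    """
    namespace = {}
    try:
        exec(source, namespace)  # trusted code only: this is not a sandbox
    except Exception as exc:
        return ("ABSTAIN", f"could not load code: {exc!r}")
    fn = namespace.get(fn_name)
    doc = getattr(fn, "__doc__", None) if fn is not None else None
    examples = doctest.DocTestParser().get_examples(doc or "")
    if not examples:
        return ("ABSTAIN", "no doctest spec captured")
    for ex in examples:
        call, want = ex.source.strip(), ex.want.strip()
        try:
            got = repr(eval(call, namespace))
        except Exception as exc:
            # Not a clean mismatch, so stay silent rather than guess.
            return ("ABSTAIN", f"could not evaluate {call}: {exc!r}")
        if got != want:
            return ("REFUTED", f"{call} (got {got}, expected {want})")
    return ("VERIFIED", f"{len(examples)} example(s) passed")


BUGGY = '''
def square(x):
    """
    >>> square(3)
    9
    """
    return x * 2
'''

print(gate(BUGGY, "square"))
# ('REFUTED', 'square(3) (got 6, expected 9)')
```

&lt;p&gt;Returning ABSTAIN whenever an example cannot be evaluated is a deliberate choice in this sketch: only a clean mismatch between the computed and expected value counts as a refutation.&lt;/p&gt;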

&lt;p&gt;The differential mode is the fun part: it generates inputs, runs the candidate and the reference, and shows the &lt;strong&gt;first input where they disagree&lt;/strong&gt;. Input generation is signature-agnostic — it produces generic argument tuples, lets the reference filter the valid ones, and abstains if it can't exercise at least one.&lt;/p&gt;
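
&lt;p&gt;A rough sketch of that differential loop, under stated assumptions: the value pool, the &lt;code&gt;first_divergence&lt;/code&gt; name, and the decision to treat a candidate exception as a disagreement are all illustrative choices, not ishvacerto's real generator or API.&lt;/p&gt;

```python
import copy
import itertools

# Generic values to combine into argument tuples (an illustrative pool).
POOL = [0, 1, -1, 2, 10, "", "ab", [], [3, 1, 2]]


def first_divergence(candidate, reference, max_arity=2):
    """Return (verdict, detail) by comparing candidate against reference.

    Hypothetical helper for illustration; not part of the ishvacerto API.
    """
    exercised = 0
    for arity in range(1, max_arity + 1):
        for args in itertools.product(POOL, repeat=arity):
            try:
                # Deep-copy so an in-place mutation cannot leak between calls.
                expected = reference(*copy.deepcopy(args))
            except Exception:
                continue  # the reference rejects this input, so skip it
            exercised += 1
            try:
                got = candidate(*copy.deepcopy(args))
            except Exception as exc:
                return ("REFUTED", f"{args!r} raised {exc!r}, expected {expected!r}")
            if got != expected:
                return ("REFUTED", f"{args!r} (got {got!r}, expected {expected!r})")
    if exercised == 0:
        return ("ABSTAIN", "reference accepted none of the generated inputs")
    return ("VERIFIED", f"agreed on {exercised} input(s)")


def ref_square(x):
    return x * x


def bad_square(x):
    return x * 2


print(first_divergence(bad_square, ref_square))
# ('REFUTED', '(1,) (got 2, expected 1)')
```

&lt;p&gt;Letting the reference act as the input filter is what keeps the generator signature-agnostic: tuples of the wrong arity or type simply raise in the reference and are skipped.&lt;/p&gt;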

&lt;h2&gt;Engineering&lt;/h2&gt;

&lt;p&gt;Pure Python &lt;strong&gt;standard library&lt;/strong&gt;, &lt;strong&gt;zero dependencies&lt;/strong&gt;, &lt;strong&gt;13/13&lt;/strong&gt; tests passing, CI green on Python &lt;strong&gt;3.9 / 3.11 / 3.12&lt;/strong&gt;, MIT-licensed. It &lt;strong&gt;runs entirely on your machine&lt;/strong&gt;: no account, no cloud, no telemetry, and your code never leaves the box. There's also a VS Code extension that shows the counterexample inline.&lt;/p&gt;

&lt;h2&gt;Honest scope&lt;/h2&gt;

&lt;p&gt;It verifies what it can check and &lt;strong&gt;abstains on the rest&lt;/strong&gt;: coverage is a function of the spec or reference you give it, never a guess. The subprocess timeout only guards against hangs; it is &lt;strong&gt;not&lt;/strong&gt; a security sandbox, so verify only code whose source you trust (such as your own assistant's output), or run the gate in a container.&lt;/p&gt;
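
&lt;p&gt;To make the distinction concrete, here is a minimal sketch of a timeout-guarded run using only &lt;code&gt;subprocess&lt;/code&gt;. The &lt;code&gt;run_with_timeout&lt;/code&gt; helper is hypothetical and is not taken from ishvacerto; it shows that a timeout stops a hang but places no limit on what the child process may read, write, or send.&lt;/p&gt;

```python
import subprocess
import sys


def run_with_timeout(source, seconds=5.0):
    """Run `source` in a fresh interpreter; return (status, stdout).

    Hypothetical helper for illustration; not part of the ishvacerto API.
    The timeout stops hangs but does nothing to restrict what the code does.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        return ("TIMEOUT", "")
    status = "OK" if proc.returncode == 0 else "ERROR"
    return (status, proc.stdout)


print(run_with_timeout("print(6 * 7)")[0])                  # OK
print(run_with_timeout("while True: pass", seconds=1.0)[0])  # TIMEOUT
```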

&lt;p&gt;It doesn't compete with your AI coder — it makes its output &lt;strong&gt;safe to ship&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;⭐ MIT, free, and the measurements are reproducible: &lt;strong&gt;&lt;a href="https://github.com/ishvaproducts-png/ishvacerto" rel="noopener noreferrer"&gt;https://github.com/ishvaproducts-png/ishvacerto&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;




</description>
      <category>ai</category>
      <category>opensource</category>
      <category>python</category>
      <category>testing</category>
    </item>
  </channel>
</rss>
