<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Éric Jacopin</title>
    <description>The latest articles on DEV Community by Éric Jacopin (@pcfvw).</description>
    <link>https://dev.to/pcfvw</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3714313%2Ffe297d44-115e-493f-b93c-1a9844693cf5.png</url>
      <title>DEV Community: Éric Jacopin</title>
      <link>https://dev.to/pcfvw</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pcfvw"/>
    <language>en</language>
    <item>
      <title>In 2023, 52% of Python Devs Used Pytest. In 2026, 100% of AI Models Understand Doctests.</title>
      <dc:creator>Éric Jacopin</dc:creator>
      <pubDate>Wed, 21 Jan 2026 13:37:34 +0000</pubDate>
      <link>https://dev.to/pcfvw/in-2023-52-of-python-devs-used-pytest-in-2026-100-of-ai-models-understand-doctests-23m3</link>
      <guid>https://dev.to/pcfvw/in-2023-52-of-python-devs-used-pytest-in-2026-100-of-ai-models-understand-doctests-23m3</guid>
      <description>&lt;p&gt;&lt;em&gt;The testing format nobody uses is the one every AI actually understands.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;You're probably using pytest. So is 52% of the Python community, according to the &lt;a href="https://lp.jetbrains.com/python-developers-survey-2023/" rel="noopener noreferrer"&gt;JetBrains Python Developer Survey 2023&lt;/a&gt;. It's powerful, flexible, and has an incredible plugin ecosystem.&lt;/p&gt;

&lt;p&gt;Meanwhile, doctests sit at 9%. The forgotten sibling. "Too simple for real testing." "Just for documentation examples." "Nobody uses those anymore."&lt;/p&gt;

&lt;p&gt;But here's what we discovered after testing 10 different AI models on code generation tasks:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Every single model—100%—preserves doctests perfectly.&lt;/strong&gt; (When you include doctests in your prompt, the AI keeps them intact in its generated code.)&lt;/p&gt;

&lt;p&gt;Not pytest. Not unittest. Doctests.&lt;/p&gt;




&lt;h2&gt;The Experiment&lt;/h2&gt;

&lt;p&gt;We ran a systematic experiment across 10 large language models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude Haiku 3, Haiku 4.5, Sonnet 4, Sonnet 4.5, Opus 4, Opus 4.1, Opus 4.5&lt;/li&gt;
&lt;li&gt;Mistral Medium (mistral-medium-2508), Devstral (devstral-2512)&lt;/li&gt;
&lt;li&gt;EssentialAI RNJ-1 (a 5GB model you can run locally with LM Studio)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Why these models?&lt;/em&gt; Anthropic's Claude powers popular AI coding tools like Kiro and Augment Code. Mistral offers a competitive alternative that's less explored. And EssentialAI's RNJ-1 tests whether the finding holds for small, locally run models: if a 5GB model on your laptop gets it right, this isn't just "big model magic."&lt;/p&gt;

&lt;p&gt;The task: generate implementations for functions that included test cases in the prompt.&lt;/p&gt;

&lt;p&gt;The question: which test formats do AI models preserve?&lt;/p&gt;




&lt;h2&gt;The Results&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test Format&lt;/th&gt;
&lt;th&gt;Preservation Rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Python doctests&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100% (all 10 models)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rust &lt;code&gt;#[test]&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;100% (Sonnet models)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Zig &lt;code&gt;test&lt;/code&gt; blocks&lt;/td&gt;
&lt;td&gt;100% (Sonnet models)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Go &lt;code&gt;_test.go&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;C++ gtest&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TypeScript Jest&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Python doctests were the &lt;strong&gt;only&lt;/strong&gt; format that achieved universal preservation. Every model. Every time.&lt;/p&gt;

&lt;p&gt;And yes, that includes EssentialAI's RNJ-1—a 5GB model running on a laptop. No cloud API required. No expensive tokens. Just a small local model that somehow knows exactly what to do with &lt;code&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;Why Doctests?&lt;/h2&gt;

&lt;p&gt;Three reasons:&lt;/p&gt;

&lt;h3&gt;1. Inline Structure&lt;/h3&gt;

&lt;p&gt;Doctests live &lt;em&gt;inside&lt;/em&gt; the docstring, which lives &lt;em&gt;inside&lt;/em&gt; the function. There's no file boundary confusion. No ambiguity about whether tests are "part of this" or "somewhere else."&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fibonacci&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Return the nth Fibonacci number.
&lt;/span&gt;&lt;span class="gp"&gt;
    &amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;fibonacci&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;fibonacci&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;fibonacci&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="mi"&gt;55&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;  &lt;span class="c1"&gt;# AI will implement this AND preserve the doctests
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you give this to an AI, the structure says: "these tests are part of the function's definition." The AI preserves them.&lt;/p&gt;

&lt;h3&gt;2. Unambiguous Syntax&lt;/h3&gt;

&lt;p&gt;There's exactly one way to write a doctest: &lt;code&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/code&gt; followed by code, then the expected output on the next line.&lt;/p&gt;

&lt;p&gt;No decorators to remember. No assertion library to import. No class inheritance. Just &lt;code&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;AI models thrive on unambiguous patterns. Doctests are as unambiguous as it gets.&lt;/p&gt;
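
&lt;p&gt;To illustrate how mechanical that syntax is, here is a minimal sketch that uses the standard library's &lt;code&gt;doctest.DocTestParser&lt;/code&gt; to split a docstring into (code, expected output) pairs. The helper name &lt;code&gt;parse_examples&lt;/code&gt; and the sample docstring are our own illustrations, not part of the experiment.&lt;/p&gt;

```python
import doctest

# Sample docstring in the same style as the fibonacci example above.
DOCSTRING = """Return the nth Fibonacci number.

>>> fibonacci(0)
0
>>> fibonacci(10)
55
"""


def parse_examples(docstring):
    """Split a docstring into (code, expected output) pairs."""
    parser = doctest.DocTestParser()
    return [
        (example.source.strip(), example.want.strip())
        for example in parser.get_examples(docstring)
    ]
```

&lt;p&gt;Calling &lt;code&gt;parse_examples(DOCSTRING)&lt;/code&gt; returns &lt;code&gt;[('fibonacci(0)', '0'), ('fibonacci(10)', '55')]&lt;/code&gt;: each prompt line pairs with the output that follows it, with nothing else to interpret.&lt;/p&gt;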

&lt;h3&gt;3. Training Data Ubiquity&lt;/h3&gt;

&lt;p&gt;Doctests appear everywhere in Python's ecosystem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The official Python documentation&lt;/li&gt;
&lt;li&gt;Standard library docstrings&lt;/li&gt;
&lt;li&gt;Countless tutorials and examples&lt;/li&gt;
&lt;li&gt;Stack Overflow answers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Any model trained on public Python code has very likely seen thousands of doctests. They're part of Python's DNA.&lt;/p&gt;




&lt;h2&gt;The Irony&lt;/h2&gt;

&lt;p&gt;The Python community moved away from doctests because they're "too simple." The criticisms are well-documented:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Probably the most significant limitation of doctest compared to other testing frameworks is the lack of features equivalent to fixtures in pytest."&lt;/em&gt;&lt;br&gt;
— &lt;a href="https://realpython.com/python-doctest/" rel="noopener noreferrer"&gt;Real Python&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Doctests are not a replacement for unit tests... You should continue using unit tests for structured, scalable, and thorough validation of the behavior of your code."&lt;/em&gt;&lt;br&gt;
— &lt;a href="https://medium.com/@laurentkubaski/python-doctest-module-or-how-to-incorporate-unit-tests-in-your-docstrings-7c5cc55cc632" rel="noopener noreferrer"&gt;Laurent Kubaski, Medium&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Though doctest is an extremely useful module, the examples we write in docstrings are only simple cases meant to illustrate typical uses of the function. As functions get more complex, we'll require more extensive tests... We could put all these tests into the function docstrings, but that would make the docstrings far too long. So instead, we will use pytest."&lt;/em&gt;&lt;br&gt;
— &lt;a href="https://www.teach.cs.toronto.edu/~csc110y/fall/notes/02-functions/08-testing-functions-1.html" rel="noopener noreferrer"&gt;University of Toronto CS Course&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Doesn't support parameterized testing. Advanced testing features like Test Discovery, Fixtures, etc not supported."&lt;/em&gt;&lt;br&gt;
— &lt;a href="https://pytest-with-eric.com/comparisons/python-testing-frameworks/" rel="noopener noreferrer"&gt;Pytest with Eric&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The verdict is clear: doctests are for documentation examples, not "real" testing. Too simple. No fixtures. No parameterization. No mocking.&lt;/p&gt;

&lt;p&gt;All true.&lt;/p&gt;

&lt;p&gt;But here's the twist: &lt;strong&gt;those same "limitations" are exactly why AI models handle them perfectly.&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Doctest "Limitation"&lt;/th&gt;
&lt;th&gt;Why AI Loves It&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Too simple&lt;/td&gt;
&lt;td&gt;Unambiguous for the model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No fixtures&lt;/td&gt;
&lt;td&gt;No external state to track&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No parameterization&lt;/td&gt;
&lt;td&gt;Each test is self-contained&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No mocking&lt;/td&gt;
&lt;td&gt;No hidden complexity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exact output matching&lt;/td&gt;
&lt;td&gt;Clear success criteria&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We optimized our testing for human power users. AI models—at least every one we tested in 2026—prefer the beginner-friendly format we left behind.&lt;/p&gt;




&lt;h2&gt;The Practical Recommendation&lt;/h2&gt;

&lt;p&gt;I'm not saying abandon pytest. Keep it. It's great for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complex test scenarios&lt;/li&gt;
&lt;li&gt;Fixtures and setup/teardown&lt;/li&gt;
&lt;li&gt;Parameterized testing&lt;/li&gt;
&lt;li&gt;CI/CD pipelines&lt;/li&gt;
&lt;li&gt;Coverage reporting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But consider &lt;strong&gt;dual-testing&lt;/strong&gt;:&lt;/p&gt;

&lt;h3&gt;For AI-Assisted Development&lt;/h3&gt;

&lt;p&gt;Use doctests as &lt;em&gt;a template&lt;/em&gt; when prompting AI:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Extract email components from a string.
&lt;/span&gt;&lt;span class="gp"&gt;
    &amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;parse_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;John Doe &amp;lt;john@example.com&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;John Doe&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;address&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;john@example.com&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;parse_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jane@example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;address&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;jane@example.com&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;parse_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;invalid&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nc"&gt;Traceback &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;most&lt;/span&gt; &lt;span class="n"&gt;recent&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt; &lt;span class="n"&gt;last&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="nb"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Invalid&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="nb"&gt;format&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;  &lt;span class="c1"&gt;# Ask AI to implement
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The doctests communicate:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The expected input/output contract&lt;/li&gt;
&lt;li&gt;Edge cases to handle&lt;/li&gt;
&lt;li&gt;Error conditions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;And, as in our experiment, the AI should &lt;strong&gt;preserve&lt;/strong&gt; them, giving you tests you can run from the start.&lt;/p&gt;

&lt;h3&gt;For Comprehensive Testing&lt;/h3&gt;

&lt;p&gt;Keep your pytest suite for everything else:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# test_email.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mymodule&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;parse_email&lt;/span&gt;

&lt;span class="nd"&gt;@pytest.fixture&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;email_samples&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;load_test_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;emails.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@pytest.mark.parametrize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input,expected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[...])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_parse_email_parametrized&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;parse_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_parse_email_performance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;benchmark&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;benchmark&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parse_email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test@example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Doctests for AI prompts. Pytest for human-scale testing.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;Try It Yourself&lt;/h2&gt;

&lt;p&gt;Next time you're about to ask an AI to implement a function:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write the function signature&lt;/li&gt;
&lt;li&gt;Add a docstring with 2-3 doctests showing expected behavior (like the &lt;code&gt;parse_email&lt;/code&gt; example above)&lt;/li&gt;
&lt;li&gt;Ask the AI to implement it&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Compare this to your usual prompt. I bet you'll notice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The AI preserves your test cases&lt;/li&gt;
&lt;li&gt;The implementation matches your examples&lt;/li&gt;
&lt;li&gt;You have working tests immediately&lt;/li&gt;
&lt;/ul&gt;
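
&lt;p&gt;To check the last point rather than take it on faith, you can run the doctests in whatever code the AI returns. The sketch below is one way to do that with the standard library; the helper name &lt;code&gt;check_generated_code&lt;/code&gt; is our own, and it assumes the generated text is a complete Python module. It calls &lt;code&gt;exec&lt;/code&gt;, so only use it on code you have read or sandboxed.&lt;/p&gt;

```python
import doctest


def check_generated_code(source, name="generated"):
    """Execute generated source and run the doctests of everything it defines.

    Returns a (failed, attempted) tuple.
    """
    namespace = {"__name__": name}
    exec(compile(source, name, "exec"), namespace)

    parser = doctest.DocTestParser()
    runner = doctest.DocTestRunner(verbose=False)
    for obj_name, obj in list(namespace.items()):
        # Only test callables defined by the generated code itself,
        # not names it merely imported.
        defined_here = getattr(obj, "__module__", None) == name
        if callable(obj) and defined_here and obj.__doc__:
            test = parser.get_doctest(obj.__doc__, namespace, obj_name, name, 0)
            runner.run(test)

    results = runner.summarize(verbose=False)
    return results.failed, results.attempted
```

&lt;p&gt;A result of &lt;code&gt;(0, 3)&lt;/code&gt; would mean three examples ran and none failed. If the first number is not zero, the implementation does not match the examples you wrote.&lt;/p&gt;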




&lt;h2&gt;The Bigger Picture&lt;/h2&gt;

&lt;p&gt;This isn't just about doctests. It's about a broader pattern we discovered:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inline test formats work. External test formats don't.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you include tests in your prompt, does the AI preserve them in its output? We measured &lt;em&gt;test preservation rate&lt;/em&gt; — 100% means every test you provided appears in the generated code, intact. 0% means the AI ignored your tests entirely.&lt;/p&gt;

&lt;p&gt;The pattern was stark: &lt;strong&gt;inline tests achieved 100% preservation (Python doctests on all 10 models; Rust &lt;code&gt;#[test]&lt;/code&gt; and Zig &lt;code&gt;test&lt;/code&gt; blocks on the Sonnet models). External tests (Go &lt;code&gt;_test.go&lt;/code&gt;, C++ gtest, TypeScript Jest) achieved 0%.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When tests are &lt;em&gt;structurally part of&lt;/em&gt; the code, AI preserves them. When tests are &lt;em&gt;separate files&lt;/em&gt;, AI ignores them.&lt;/p&gt;

&lt;p&gt;Doctests just happen to be the most universal example of this pattern—the one format that works across every model we tested, including tiny local ones.&lt;/p&gt;
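
&lt;p&gt;For doctests, a test preservation rate like the one described above could be computed along these lines. This is a sketch under our own assumptions, not the script used in the experiment: it assumes both the prompt and the generated output are valid Python source, and it counts an example as preserved only if its code and expected output reappear unchanged (ignoring surrounding whitespace). The function names are illustrative.&lt;/p&gt;

```python
import ast
import doctest


def doctest_examples(source):
    """Collect (code, expected output) pairs from every docstring in source."""
    parser = doctest.DocTestParser()
    holders = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, holders):
            docstring = ast.get_docstring(node)
            if docstring:
                for example in parser.get_examples(docstring):
                    found.add((example.source.strip(), example.want.strip()))
    return found


def preservation_rate(prompt, generated):
    """Fraction of the prompt's doctest examples that reappear in generated."""
    wanted = doctest_examples(prompt)
    if not wanted:
        raise ValueError("prompt contains no doctest examples")
    kept = wanted.intersection(doctest_examples(generated))
    return len(kept) / len(wanted)
```

&lt;p&gt;With this definition, output that keeps every example scores 1.0 and output that drops the docstring entirely scores 0.0.&lt;/p&gt;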




&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;In 2023, pytest dominated with 52%. Doctests languished at 9%.&lt;/p&gt;

&lt;p&gt;In 2026, AI models tell a different story: doctests are the universal language of test specification.&lt;/p&gt;

&lt;p&gt;The 9% might be onto something. If you're using AI coding tools, consider joining them.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post is based on findings from a larger experiment on AI code generation across multiple languages and models. Full results available at &lt;a href="https://github.com/PCfVW/d-Heap-priority-queue" rel="noopener noreferrer"&gt;d-Heap-priority-queue&lt;/a&gt;. See also &lt;a href="https://github.com/PCfVW/Amphigraphic-Strict" rel="noopener noreferrer"&gt;Amphigraphic-Strict&lt;/a&gt; for strict language subsets optimized for AI-assisted development.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Have you tried using doctests with AI assistants? Please share your experience in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>testing</category>
      <category>ai</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
