<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Yaniv </title>
    <description>The latest articles on DEV Community by Yaniv  (@yaniv2809).</description>
    <link>https://dev.to/yaniv2809</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3872105%2F08b03eee-3e37-4859-bab2-f155bcea6a16.png</url>
      <title>DEV Community: Yaniv </title>
      <link>https://dev.to/yaniv2809</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yaniv2809"/>
    <language>en</language>
    <item>
      <title>Why I Built an AI-Powered Test Data Generator (and When You Shouldn't Use AI for Fixtures)</title>
      <dc:creator>Yaniv </dc:creator>
      <pubDate>Fri, 17 Apr 2026 19:11:03 +0000</pubDate>
      <link>https://dev.to/yaniv2809/why-i-built-an-ai-powered-test-data-generator-and-when-you-shouldnt-use-ai-for-fixtures-3e4a</link>
      <guid>https://dev.to/yaniv2809/why-i-built-an-ai-powered-test-data-generator-and-when-you-shouldnt-use-ai-for-fixtures-3e4a</guid>
      <description>&lt;p&gt;Every test suite has the same dirty secret: &lt;code&gt;name="Test User"&lt;/code&gt;, &lt;code&gt;email="test@test.com"&lt;/code&gt;, &lt;code&gt;bio="Lorem ipsum"&lt;/code&gt;. Copy-pasted across 50 tests, never catching real edge cases, never feeling like production data.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://github.com/Yaniv2809/fixtureforge" rel="noopener noreferrer"&gt;FixtureForge&lt;/a&gt; to fix this — but along the way, I learned that AI is the wrong tool for most of the problem. Here's what I mean.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem With "Just Use Faker"
&lt;/h2&gt;

&lt;p&gt;Faker is great for structured fields — names, emails, phone numbers, addresses. But it can't generate a realistic user bio, a convincing product review, or an angry customer complaint that actually tests your edge cases.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# This is what most test data looks like:
&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Test User&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test@test.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Lorem ipsum...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# It doesn't catch real-world edge cases.
# It doesn't feel like production data.
# Writing 500 of them by hand? Not happening.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The obvious answer in 2026 is "use AI." But sending every field to an LLM is expensive, slow, and unnecessary. An email address doesn't need AI. An auto-incrementing ID definitely doesn't need AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Insight: Only Semantic Fields Need AI
&lt;/h2&gt;

&lt;p&gt;FixtureForge splits every model field into four tiers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Examples&lt;/th&gt;
&lt;th&gt;Generator&lt;/th&gt;
&lt;th&gt;API Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Structural&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;id&lt;/code&gt;, &lt;code&gt;user_id&lt;/code&gt;, &lt;code&gt;created_at&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Internal counters / FK registry&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Standard&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;name&lt;/code&gt;, &lt;code&gt;email&lt;/code&gt;, &lt;code&gt;phone&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Faker&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Computed&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;@computed_field&lt;/code&gt; properties&lt;/td&gt;
&lt;td&gt;Pydantic&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Semantic&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;bio&lt;/code&gt;, &lt;code&gt;description&lt;/code&gt;, &lt;code&gt;review&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;LLM (batched)&lt;/td&gt;
&lt;td&gt;API tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key: 100 users with 2 semantic fields = &lt;strong&gt;2 API calls&lt;/strong&gt;, not 200. FixtureForge batches all semantic values into a single prompt and asks the LLM to return a JSON array.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fixtureforge&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Forge&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="n"&gt;forge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Forge&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;forge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SaaS platform users&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;FixtureForge routes &lt;code&gt;id&lt;/code&gt; to a counter, &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;email&lt;/code&gt; to Faker, and only &lt;code&gt;bio&lt;/code&gt; hits the AI — once, for all 50 records.&lt;/p&gt;

&lt;h2&gt;
  
  
  CI Mode: No AI, No Network, No Flakiness
&lt;/h2&gt;

&lt;p&gt;This is the part that matters most. In CI, you don't want non-deterministic AI calls making your pipeline flaky. FixtureForge has a deterministic mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;forge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Forge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;use_ai&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;forge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Identical output every run — no network calls
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;seed=42&lt;/code&gt; guarantees byte-identical output across every run, every machine. Faker handles the standard fields deterministically, and semantic fields fall back to template-based generation. No API key required.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Context Parameter Is Where It Gets Interesting
&lt;/h2&gt;

&lt;p&gt;The real power isn't generating random data — it's generating data that tests specific scenarios:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;angry_users&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;forge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Review&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1-star reviews from angry holiday shoppers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each &lt;code&gt;bio&lt;/code&gt; or &lt;code&gt;review&lt;/code&gt; field comes back with realistic frustration, specific complaints, edge-case formatting (ALL CAPS, emoji, long rants). This is the kind of data that catches bugs in text processing, truncation, rendering, and content moderation — bugs that &lt;code&gt;"Lorem ipsum"&lt;/code&gt; never finds.&lt;/p&gt;

&lt;h2&gt;
  
  
  pytest Integration
&lt;/h2&gt;

&lt;p&gt;In &lt;code&gt;conftest.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fixtureforge&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;forge_fixture&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;myapp.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Order&lt;/span&gt;

&lt;span class="nf"&gt;forge_fixture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;forge_fixture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In your tests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_users_have_emails&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_order_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;forge&lt;/code&gt; fixture is auto-available. No factory classes to maintain, no fixture files to update.&lt;/p&gt;

&lt;h2&gt;
  
  
  When You Should NOT Use This
&lt;/h2&gt;

&lt;p&gt;I want to be honest about the limitations:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don't use FixtureForge if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your tests only need IDs and emails — Faker alone is sufficient and simpler&lt;/li&gt;
&lt;li&gt;You're in a strict air-gapped environment with no API access — CI mode works, but you lose the AI-generated quality&lt;/li&gt;
&lt;li&gt;Your test data needs to match a specific production database schema exactly — use database dumps or migrations instead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Do use FixtureForge if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need realistic text content (bios, reviews, descriptions) at scale&lt;/li&gt;
&lt;li&gt;You want to test edge cases in text processing without writing them by hand&lt;/li&gt;
&lt;li&gt;You need deterministic CI with realistic dev-time data from one tool&lt;/li&gt;
&lt;li&gt;You're tired of maintaining factory_boy factory classes for every model change&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How It Compares
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;FixtureForge&lt;/th&gt;
&lt;th&gt;factory_boy&lt;/th&gt;
&lt;th&gt;faker&lt;/th&gt;
&lt;th&gt;hypothesis&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AI-generated content&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deterministic seed&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FK relationships&lt;/td&gt;
&lt;td&gt;Auto&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pytest plugin&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Via pytest-factoryboy&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Large datasets (100k+)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Manual loops&lt;/td&gt;
&lt;td&gt;Manual loops&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Zero config&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Factory classes needed&lt;/td&gt;
&lt;td&gt;Provider setup&lt;/td&gt;
&lt;td&gt;Strategy setup&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;FixtureForge isn't a replacement for Faker — it uses Faker internally. It's the layer between "I need data" and "I need it to feel real."&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;fixtureforge
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Yaniv2809/fixtureforge" rel="noopener noreferrer"&gt;Yaniv2809/fixtureforge&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Docs:&lt;/strong&gt; &lt;a href="https://yaniv2809.github.io/fixtureforge/" rel="noopener noreferrer"&gt;yaniv2809.github.io/fixtureforge&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you've built something similar or have opinions on AI-generated test data vs traditional fixtures, I'd like to hear about it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Yaniv Metuku (yaniv2809) — QA Automation Engineer. Also building &lt;a href="https://github.com/Yaniv2809/Financial-Integrity-Ecosystem" rel="noopener noreferrer"&gt;Financial-Integrity-Ecosystem&lt;/a&gt; and &lt;a href="https://pypi.org/project/failscope/" rel="noopener noreferrer"&gt;Failscope&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>testing</category>
      <category>python</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Stop writing fake test data by hand — I built a library that generates it for you</title>
      <dc:creator>Yaniv </dc:creator>
      <pubDate>Thu, 16 Apr 2026 19:03:16 +0000</pubDate>
      <link>https://dev.to/yaniv2809/stop-writing-fake-test-data-by-hand-i-built-a-library-that-generates-it-for-you-2k0f</link>
      <guid>https://dev.to/yaniv2809/stop-writing-fake-test-data-by-hand-i-built-a-library-that-generates-it-for-you-2k0f</guid>
      <description>&lt;p&gt;Every Python project I've worked on has the same problem in the test suite:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Test User&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test@test.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;age&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Lorem ipsum dolor sit amet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's not realistic. It doesn't catch edge cases. And when you need 200 of them,&lt;br&gt;
nobody writes them — you just copy-paste the same record and pretend it's a dataset.&lt;/p&gt;

&lt;p&gt;I got tired of this and built &lt;a href="https://github.com/Yaniv2809/fixtureforge" rel="noopener noreferrer"&gt;FixtureForge&lt;/a&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  The idea
&lt;/h2&gt;

&lt;p&gt;Define a Pydantic model. Get realistic data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fixtureforge&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Forge&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="n"&gt;forge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Forge&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;forge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SaaS platform users&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;FixtureForge routes each field to the right generator:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Generator&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;id&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Sequential counter&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;name&lt;/code&gt;, &lt;code&gt;email&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Faker&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;bio&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;LLM (batched)&lt;/td&gt;
&lt;td&gt;1 API call for all 50&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Only semantic fields — descriptions, bios, reviews, messages — hit the AI.&lt;br&gt;
Everything else is free.&lt;/p&gt;


&lt;h2&gt;
  
  
  CI mode: zero AI, fully deterministic
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;forge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Forge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;use_ai&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;forge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Same output on every machine, every run, forever
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The &lt;code&gt;seed=&lt;/code&gt; parameter controls both Faker and random generation at the instance level —&lt;br&gt;
two &lt;code&gt;Forge(seed=42)&lt;/code&gt; instances produce identical data without interfering with each other.&lt;/p&gt;


&lt;h2&gt;
  
  
  pytest plugin
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# conftest.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fixtureforge&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;forge_fixture&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;myapp.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Order&lt;/span&gt;

&lt;span class="nf"&gt;forge_fixture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;forge_fixture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# test_users.py
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_all_users_have_emails&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_order_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;No boilerplate. Fixtures are named automatically from the model&lt;br&gt;
(&lt;code&gt;User&lt;/code&gt; → &lt;code&gt;users&lt;/code&gt;, &lt;code&gt;OrderItem&lt;/code&gt; → &lt;code&gt;order_items&lt;/code&gt;).&lt;/p&gt;


&lt;h2&gt;
  
  
  Verbose mode
&lt;/h2&gt;

&lt;p&gt;Not sure where a value came from? Turn on verbose:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;forge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Forge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;use_ai&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;forge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# [structural] id    = 1
# [faker]      name  = 'Allison Hill'
# [faker]      email = 'donaldgarcia@example.net'
# [ai]         bio   = 'Passionate developer with 8 years of experience...'
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Foreign keys
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;forge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Customer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;forge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# order.customer_id always points to a real customer.id — automatically
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Provider-agnostic
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GROQ_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gsk_...      &lt;span class="c"&gt;# Groq (free tier — 14,400 req/day)&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-...  &lt;span class="c"&gt;# Claude&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-...     &lt;span class="c"&gt;# GPT&lt;/span&gt;
&lt;span class="c"&gt;# No key? Falls back to Faker-only mode. CI never breaks.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What it's not
&lt;/h2&gt;

&lt;p&gt;This isn't a replacement for &lt;code&gt;faker&lt;/code&gt; — it uses faker internally.&lt;br&gt;
It's not a replacement for &lt;code&gt;hypothesis&lt;/code&gt; — different problem.&lt;/p&gt;

&lt;p&gt;It's the layer between &lt;em&gt;"I need realistic data"&lt;/em&gt; and&lt;br&gt;
&lt;em&gt;"I need it to feel like production."&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  How to get it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;fixtureforge
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"fixtureforge[groq]"&lt;/span&gt;  &lt;span class="c"&gt;# + AI support via Groq free tier&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Docs: &lt;a href="https://yaniv2809.github.io/fixtureforge/" rel="noopener noreferrer"&gt;yaniv2809.github.io/fixtureforge&lt;/a&gt;&lt;br&gt;
GitHub: &lt;a href="https://github.com/Yaniv2809/fixtureforge" rel="noopener noreferrer"&gt;github.com/Yaniv2809/fixtureforge&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;I'd genuinely like to hear: what's your current approach to test data?&lt;br&gt;
factory_boy? raw Faker? just hardcoded dicts?&lt;br&gt;
And is there a use case this doesn't cover that you'd want it to?&lt;/p&gt;

</description>
      <category>python</category>
      <category>testing</category>
      <category>pydantic</category>
      <category>pytest</category>
    </item>
    <item>
      <title>CSV vs JSON for Test Data: What I Learned Using Both in the Same Framework</title>
      <dc:creator>Yaniv </dc:creator>
      <pubDate>Mon, 13 Apr 2026 11:35:22 +0000</pubDate>
      <link>https://dev.to/yaniv2809/csv-vs-json-for-test-data-what-i-learned-using-both-in-the-same-framework-3jn2</link>
      <guid>https://dev.to/yaniv2809/csv-vs-json-for-test-data-what-i-learned-using-both-in-the-same-framework-3jn2</guid>
      <description>&lt;p&gt;Most test automation tutorials pick one data format and stick with it. I used both — CSV for Web tests, JSON for API and Mobile — in the same framework. Here's why, and what I'd change.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;My framework tests a financial expense tracker across three platforms: Web (Playwright), API (Flask/requests), and Mobile (Appium). Each platform has its own test suite, and each suite needs external test data.&lt;/p&gt;

&lt;p&gt;The core requirement: test data should live outside the test code, so adding a new test case means adding a row to a file — not modifying Python.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Two Formats?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  CSV for Web Tests
&lt;/h3&gt;

&lt;p&gt;Web test data is tabular and flat. Every expense has the same fields: name, amount, date, category. CSV maps perfectly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;test_id,expense_name,amount,category,date,status
test01,Business Lunch Web,150,Food,2025-05-20,success
test02,Taxi to office,50,Transportation,2025-05-21,success
test02,Client Dinner,320,Food,2025-05-22,success
test02,New Monitor,850,Accommodation,2025-05-23,success
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;test_id&lt;/code&gt; column is the key design decision. Multiple rows can share the same &lt;code&gt;test_id&lt;/code&gt; — this is how one test function runs multiple data sets. &lt;code&gt;test02&lt;/code&gt; has three rows, so &lt;code&gt;test02&lt;/code&gt; runs three times.&lt;/p&gt;

&lt;h3&gt;
  
  
  JSON for API and Mobile Tests
&lt;/h3&gt;

&lt;p&gt;API and Mobile test data often needs nested structures or type-specific values that CSV can't express cleanly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"test_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"test03"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"expense_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DDT_Coffee1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"amount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"15"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-03-01"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Food"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"test_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"test03"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"expense_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DDT_Flight Ticket2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"amount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1500"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-03-02"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Transportation"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"test_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"test03"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"expense_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DDT_Hotel_Berlin3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"amount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"850.50"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-03-03"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Transportation"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;JSON preserves field names with every record (self-documenting), handles special characters better, and works natively with Python's &lt;code&gt;json.load()&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Filtering Pattern
&lt;/h2&gt;

&lt;p&gt;The real power is in the &lt;code&gt;test_id&lt;/code&gt; filtering. Instead of loading all data for every test, each test requests only its own subset:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;read_data_from_csv_by_test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;all_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;read_data_from_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;all_data&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;test_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;read_json_data_by_test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;all_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;all_data&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;test_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means one master data file per platform — not one file per test. When I need to add a new expense variant for &lt;code&gt;test03&lt;/code&gt;, I add a row to the JSON file. The test automatically picks it up via &lt;code&gt;@pytest.mark.parametrize&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wiring It Into pytest
&lt;/h2&gt;

&lt;p&gt;Here's how the data flows into the test:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Web (CSV):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@pytest.mark.parametrize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expense_data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;read_data_from_csv_by_test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MASTER_CSV&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test02&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test02_create_multiple_expenses_ddt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expense_data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;WebWorkflows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_expense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;expense_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;expense_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expense_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;expense_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;expense_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;expense_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;API (JSON):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@pytest.mark.parametrize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expense_data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;read_json_data_by_test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MASTER_API_DATA&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test03&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test03_create_multiple_expenses_api&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expense_data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;APIWorkflows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_expense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;expense_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;expense_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expense_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;expense_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;expense_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;expense_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;APIVerifications&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;verify_status_code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;201&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same pattern, different data source. pytest's &lt;code&gt;parametrize&lt;/code&gt; handles the multiplication — 3 rows for &lt;code&gt;test02&lt;/code&gt; means 3 test executions in the report, each with its own pass/fail status.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Looks Like in Practice
&lt;/h2&gt;

&lt;p&gt;The Allure report shows each parameterized run as a separate test case. If the "DDT_Hotel_Berlin3" dataset fails but the other two pass, you see exactly which data combination broke — not just "test03 failed."&lt;/p&gt;

&lt;p&gt;Adding test coverage for a new edge case (say, an expense with a zero amount) means adding one line to the JSON file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"test_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"test03"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"expense_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Zero Amount"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"amount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Food"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No code change. No new test function. The next &lt;code&gt;pytest&lt;/code&gt; run picks it up automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd Do Differently
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. I'd standardize on JSON for everything.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;CSV seemed simpler at first, but maintaining two reader functions and two data formats adds friction. JSON handles everything CSV does, plus nested data, plus better type preservation. The CSV advantage (editable in Excel) never mattered in practice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. I'd add schema validation to the data files.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Right now, if someone adds a row with a typo in a field name (&lt;code&gt;expnse_name&lt;/code&gt; instead of &lt;code&gt;expense_name&lt;/code&gt;), the test fails with a &lt;code&gt;KeyError&lt;/code&gt; — which looks like a code bug, not a data bug. A quick JSON Schema or Pydantic validation on load would catch this immediately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. I'd separate test data from test metadata.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;test_id&lt;/code&gt; field mixes data with routing information. A cleaner approach would be a separate mapping file that says "test02 uses rows 2-4 from the data file" — but honestly, for a framework this size, the inline &lt;code&gt;test_id&lt;/code&gt; approach is simple and works.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Tradeoff Table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;th&gt;CSV&lt;/th&gt;
&lt;th&gt;JSON&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Readability&lt;/td&gt;
&lt;td&gt;Easy for flat tables&lt;/td&gt;
&lt;td&gt;Better for nested data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tooling&lt;/td&gt;
&lt;td&gt;Excel, Google Sheets&lt;/td&gt;
&lt;td&gt;Any text editor, APIs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Type safety&lt;/td&gt;
&lt;td&gt;Everything is a string&lt;/td&gt;
&lt;td&gt;Supports numbers, booleans, null&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-documenting&lt;/td&gt;
&lt;td&gt;Headers only at top&lt;/td&gt;
&lt;td&gt;Field names in every record&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Python parsing&lt;/td&gt;
&lt;td&gt;&lt;code&gt;csv.DictReader&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;json.load()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Edge cases&lt;/td&gt;
&lt;td&gt;Commas in values break it&lt;/td&gt;
&lt;td&gt;Handles special characters natively&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Version control diffs&lt;/td&gt;
&lt;td&gt;Clean line-by-line diffs&lt;/td&gt;
&lt;td&gt;Noisier diffs (brackets, commas)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For test automation specifically, JSON wins in most categories. CSV's only real advantage is the Excel factor — and if your QA team doesn't use Excel for test data (most don't), it's not a factor.&lt;/p&gt;

&lt;h2&gt;
  
  
  Full Source
&lt;/h2&gt;

&lt;p&gt;The complete implementation — both data formats, both reader functions, and all parameterized tests:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Yaniv2809/Financial-Integrity-Ecosystem" rel="noopener noreferrer"&gt;Financial-Integrity-Ecosystem&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Data files: &lt;code&gt;data/web/expense_data.csv&lt;/code&gt;, &lt;code&gt;data/ddt/expenses_json_data.json&lt;/code&gt;, &lt;code&gt;data/ddt/mobile_expense_data.json&lt;/code&gt;&lt;br&gt;
Reader functions: &lt;code&gt;utils/common_ops.py&lt;/code&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is part 3 of a series on building a multi-layer test automation framework. &lt;a href="https://dev.to/yaniv2809/how-i-used-set-theory-to-catch-bugs-that-unit-tests-miss-5d8a"&gt;Part 1&lt;/a&gt; covered Set Theory for data integrity validation. Part 2 covered the CI/CD pipeline.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Yaniv Metuku — QA Automation Engineer&lt;/em&gt;&lt;/p&gt;

</description>
      <category>testing</category>
      <category>python</category>
      <category>automation</category>
      <category>beginners</category>
    </item>
    <item>
      <title>How I Built a 12-Step CI/CD Pipeline That Spins Up MySQL, Flask, and Playwright From Scratch</title>
      <dc:creator>Yaniv </dc:creator>
      <pubDate>Sat, 11 Apr 2026 14:44:47 +0000</pubDate>
      <link>https://dev.to/yaniv2809/how-i-built-a-12-step-cicd-pipeline-that-spins-up-mysql-flask-and-playwright-from-scratch-912</link>
      <guid>https://dev.to/yaniv2809/how-i-built-a-12-step-cicd-pipeline-that-spins-up-mysql-flask-and-playwright-from-scratch-912</guid>
      <description>&lt;p&gt;Setting up CI for a web app is straightforward. Setting up CI for a test automation framework that needs a real database, two backend servers, and a headless browser — all starting from zero on every run — is a different problem.&lt;/p&gt;

&lt;p&gt;This is how I built a GitHub Actions pipeline that provisions the entire infrastructure, runs 37 tests across 3 layers, and deploys a historical test report — every time I push to main.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: "Works On My Machine" Doesn't Scale
&lt;/h2&gt;

&lt;p&gt;Locally, my test framework needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MySQL 8.0 with a specific schema loaded&lt;/li&gt;
&lt;li&gt;A JSON Server running on port 3000&lt;/li&gt;
&lt;li&gt;A Flask API server running on port 5000&lt;/li&gt;
&lt;li&gt;Playwright with Chromium installed&lt;/li&gt;
&lt;li&gt;Environment variables for DB credentials and API keys&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Running &lt;code&gt;pytest&lt;/code&gt; locally assumes all of this is already set up. In CI, nothing exists. Every run starts from an empty Ubuntu container.&lt;/p&gt;

&lt;p&gt;The challenge isn't running the tests — it's building the world they need to run in.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 12 Steps
&lt;/h2&gt;

&lt;p&gt;Here's the full pipeline, and why each step exists:&lt;/p&gt;

&lt;h3&gt;
  
  
  Steps 1-3: Foundation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1. Checkout Code&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2. Set up Python &lt;/span&gt;&lt;span class="m"&gt;3.13&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-python@v5&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;python-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;3.13'&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;3. Set up Node.js &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-node@v4&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;node-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;18'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Python for the test framework. Node.js for JSON Server. Nothing surprising here, but note the explicit version pinning — &lt;code&gt;3.13&lt;/code&gt;, not &lt;code&gt;3.x&lt;/code&gt;. CI flakiness often starts with "we let the runner pick the version."&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4-5: Dependencies
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;4. Install Python Dependencies&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;python -m pip install --upgrade pip&lt;/span&gt;
    &lt;span class="s"&gt;pip install -r requirements.txt&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5. Install Playwright Browsers&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;playwright install chromium --with-deps&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;--with-deps&lt;/code&gt; is critical. Without it, Playwright installs the browser binary but not the OS-level libraries it needs (libgbm, libasound, etc.). The test run will fail with a cryptic error about missing shared objects.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6: Database Schema
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mysql&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mysql:8.0&lt;/span&gt;
    &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;MYSQL_ROOT_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;root_password&lt;/span&gt;
      &lt;span class="na"&gt;MYSQL_DATABASE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;expense_test_db&lt;/span&gt;
      &lt;span class="na"&gt;MYSQL_USER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test_user&lt;/span&gt;
      &lt;span class="na"&gt;MYSQL_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test_password&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;3306:3306&lt;/span&gt;
    &lt;span class="na"&gt;options&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;-&lt;/span&gt;
      &lt;span class="s"&gt;--health-cmd="mysqladmin ping"&lt;/span&gt;
      &lt;span class="s"&gt;--health-interval=10s&lt;/span&gt;
      &lt;span class="s"&gt;--health-timeout=5s&lt;/span&gt;
      &lt;span class="s"&gt;--health-retries=5&lt;/span&gt;

&lt;span class="c1"&gt;# In steps:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;6. Initialize MySQL Schema&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;sudo apt-get install -y mysql-client&lt;/span&gt;
    &lt;span class="s"&gt;mysql -h 127.0.0.1 -u test_user -ptest_password expense_test_db &amp;lt; data/init_mysql.sql&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The MySQL service container starts alongside the job. The &lt;code&gt;health-cmd&lt;/code&gt; ensures the container is actually ready before we try to load the schema. Without health checks, you'll hit "Connection refused" errors roughly 30% of the time.&lt;/p&gt;

&lt;p&gt;The schema itself is minimal — one table with a CHECK constraint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;expenses&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="n"&gt;AUTO_INCREMENT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;expense_name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="nb"&gt;DOUBLE&lt;/span&gt; &lt;span class="k"&gt;CHECK&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;category&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That CHECK constraint is actually a test target — one of my E2E tests validates that negative amounts get rejected at the DB level even though the UI accepts them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Steps 7-9: Server Orchestration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;7. Install &amp;amp; Start JSON Server&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;npm install -g json-server&lt;/span&gt;
    &lt;span class="s"&gt;json-server --watch json-server/db.json --port 3000 &amp;amp;&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;8. Start Flask Server&lt;/span&gt;
  &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;DB_TYPE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mysql&lt;/span&gt;
    &lt;span class="na"&gt;MYSQL_HOST&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;127.0.0.1&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;python server/app.py &amp;amp;&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;9. Wait for Servers to be Ready&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;curl --retry 10 --retry-delay 2 --retry-connrefused http://localhost:3000/expenses&lt;/span&gt;
    &lt;span class="s"&gt;curl --retry 10 --retry-delay 2 --retry-connrefused http://localhost:5000/expenses&lt;/span&gt;
    &lt;span class="s"&gt;echo "All servers are up and running!"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;&amp;amp;&lt;/code&gt; at the end of each server command runs it in the background. Step 9 is the safety net — it polls both servers with retry logic until they respond, or fails after 20 seconds.&lt;/p&gt;

&lt;p&gt;This is a pattern I see skipped in a lot of CI setups. People start a server and immediately run tests, then wonder why they get intermittent connection errors. Always add a readiness check.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 10: The Actual Tests
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10. Run Tests (exclude Mobile)&lt;/span&gt;
  &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;GROQ_API_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.GROQ_API_KEY }}&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;pytest -m "not mobile" --alluredir=allure-results --ai-analysis&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;-m "not mobile"&lt;/code&gt; excludes tests that require a physical Android device. The remaining 37 tests cover Web (Playwright), API, Database, and cross-layer E2E.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;--ai-analysis&lt;/code&gt; triggers an optional Groq LLM call on test failures to classify the root cause. The API key is stored as a GitHub Secret.&lt;/p&gt;

&lt;h3&gt;
  
  
  Steps 11-12: Reporting
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;11. Generate Allure Report&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;simple-elf/allure-report-action@master&lt;/span&gt;
  &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always()&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;allure_results&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;allure-results&lt;/span&gt;
    &lt;span class="na"&gt;allure_history&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;allure-history&lt;/span&gt;
    &lt;span class="na"&gt;keep_reports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;12. Deploy Allure Report to GitHub Pages&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;peaceiris/actions-gh-pages@v3&lt;/span&gt;
  &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always()&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;github_token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.GITHUB_TOKEN }}&lt;/span&gt;
    &lt;span class="na"&gt;publish_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;allure-history&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;if: always()&lt;/code&gt; is key — the report is generated even when tests fail. Without this, failures produce no report, which is exactly when you need one most.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;keep_reports: 20&lt;/code&gt; maintains the last 20 runs, so you get historical trend analysis in Allure — pass rates over time, flakiness detection, duration trends.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned Building This
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Health checks prevent 80% of CI flakiness.&lt;/strong&gt; The MySQL health-cmd and the curl retry loops in step 9 eliminated almost all intermittent failures. Before adding them, roughly 1 in 4 runs failed due to timing issues.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. &lt;code&gt;if: always()&lt;/code&gt; on reporting steps is non-negotiable.&lt;/strong&gt; The whole point of CI reports is to understand failures. If the report step only runs on success, it's useless.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Service containers beat &lt;code&gt;docker-compose&lt;/code&gt; in GitHub Actions.&lt;/strong&gt; I originally tried running &lt;code&gt;docker-compose up&lt;/code&gt; inside the workflow. It works, but it's slower and harder to debug. Native service containers integrate better with the runner's networking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Pin your versions.&lt;/strong&gt; Python 3.13, Node 18, MySQL 8.0 — not &lt;code&gt;latest&lt;/code&gt;, not &lt;code&gt;3.x&lt;/code&gt;. A version bump in a dependency should be a deliberate commit, not a surprise in CI.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Docker Alternative
&lt;/h2&gt;

&lt;p&gt;For local development, the same test suite runs via Docker Compose with a single command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker-compose up &lt;span class="nt"&gt;--build&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This uses a custom entrypoint script that replicates the CI steps — waits for MySQL, starts both servers, runs pytest:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;[1/5] Waiting for MySQL...        ✓
[2/5] Starting JSON Server...     ✓
[3/5] Starting Flask Server...    ✓
[4/5] Waiting for servers...      ✓
[5/5] Running tests...
========================= 34 passed, 3 xfailed =========================
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same tests, same infrastructure, same results — whether it runs in GitHub Actions or on a developer's laptop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Full Source
&lt;/h2&gt;

&lt;p&gt;The complete workflow file and Docker setup are in the repo:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Yaniv2809/Financial-Integrity-Ecosystem" rel="noopener noreferrer"&gt;Financial-Integrity-Ecosystem&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The CI config is at &lt;code&gt;.github/workflows/ci.yml&lt;/code&gt;. The Docker setup is in &lt;code&gt;Dockerfile&lt;/code&gt;, &lt;code&gt;docker-compose.yml&lt;/code&gt;, and &lt;code&gt;docker-entrypoint.sh&lt;/code&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is part 2 of a series on building a multi-layer test automation framework. &lt;a href="https://dev.to/yaniv2809/how-i-used-set-theory-to-catch-bugs-that-unit-tests-miss-5d8a"&gt;Part 1&lt;/a&gt; covered using Set Theory for cross-layer data integrity validation.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Yaniv Metuku — QA Automation Engineer&lt;/em&gt;&lt;/p&gt;

</description>
      <category>testing</category>
      <category>python</category>
      <category>devops</category>
      <category>github</category>
    </item>
    <item>
      <title>How I Used Set Theory to Catch Bugs That Unit Tests Miss</title>
      <dc:creator>Yaniv </dc:creator>
      <pubDate>Fri, 10 Apr 2026 16:40:02 +0000</pubDate>
      <link>https://dev.to/yaniv2809/how-i-used-set-theory-to-catch-bugs-that-unit-tests-miss-5d8a</link>
      <guid>https://dev.to/yaniv2809/how-i-used-set-theory-to-catch-bugs-that-unit-tests-miss-5d8a</guid>
      <description>&lt;p&gt;Most test automation tutorials teach you to test layers in isolation: UI tests check buttons, API tests check status codes, DB tests check records. But the bugs that actually cost money in production? They live &lt;strong&gt;between&lt;/strong&gt; the layers.&lt;/p&gt;

&lt;p&gt;I learned this the hard way while building a test automation framework for a financial expense tracker. This post is about one specific technique — &lt;strong&gt;Set Theory validation&lt;/strong&gt; — that catches data integrity bugs that no single-layer test will ever find.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Everything Passes, But Data Is Wrong
&lt;/h2&gt;

&lt;p&gt;Imagine this scenario:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A user creates an expense for $100 through the Web UI&lt;/li&gt;
&lt;li&gt;The UI shows a success message ✅&lt;/li&gt;
&lt;li&gt;The API returns &lt;code&gt;201 Created&lt;/code&gt; ✅&lt;/li&gt;
&lt;li&gt;The database has... $0. Or two records. Or nothing.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Every individual layer test passes. The UI test confirms the success message appeared. The API test confirms the status code. But nobody verified that the &lt;strong&gt;actual data&lt;/strong&gt; made it through the entire pipeline correctly.&lt;/p&gt;

&lt;p&gt;In financial applications, this is not a cosmetic bug — it's a &lt;strong&gt;silent data inconsistency&lt;/strong&gt; that can go unnoticed until an audit.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Approach: Database State as a Mathematical Set
&lt;/h2&gt;

&lt;p&gt;Instead of checking "does a record exist?", I treat the entire database table as a mathematical set, and use &lt;strong&gt;set difference&lt;/strong&gt; to prove exactly what changed.&lt;/p&gt;

&lt;p&gt;Here's the concept:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Step 1: Capture DB state BEFORE the action
&lt;/span&gt;&lt;span class="n"&gt;old_set&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;each&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;expenses&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;old_sum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;expenses&lt;/span&gt;

&lt;span class="c1"&gt;# Step 2: Perform the action (create expense via UI or API)
&lt;/span&gt;
&lt;span class="c1"&gt;# Step 3: Capture DB state AFTER the action
&lt;/span&gt;&lt;span class="n"&gt;new_set&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;each&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;expenses&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;new_sum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;expenses&lt;/span&gt;

&lt;span class="c1"&gt;# Step 4: Validate using set difference
&lt;/span&gt;&lt;span class="n"&gt;isolated_record&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new_set&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;old_set&lt;/span&gt;

&lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;isolated_record&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;          &lt;span class="c1"&gt;# Exactly ONE new record
&lt;/span&gt;&lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;new_sum&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;old_sum&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;expected_amount&lt;/span&gt;  &lt;span class="c1"&gt;# Amount is correct
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is powerful because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;It's operation-independent.&lt;/strong&gt; Whether the expense was created via UI, API, or direct SQL — the validation is the same.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It catches duplicates.&lt;/strong&gt; If a bug causes two records to be inserted, &lt;code&gt;len(isolated_record)&lt;/code&gt; will be 2.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It catches phantom data.&lt;/strong&gt; If some other process modified the table during the test, the set difference will include unexpected records.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It catches amount drift.&lt;/strong&gt; If &lt;code&gt;$100&lt;/code&gt; was entered but &lt;code&gt;$99.99&lt;/code&gt; was stored (floating point issues, rounding bugs), the sum check catches it.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Real Implementation
&lt;/h2&gt;

&lt;p&gt;In my framework, this looks like this across the stack:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DB helper that captures state:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@staticmethod&lt;/span&gt;
&lt;span class="nd"&gt;@allure.step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DB: Get all expenses as set&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_all_expenses_as_set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT id, expense_name, amount FROM expenses&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;

&lt;span class="nd"&gt;@staticmethod&lt;/span&gt;
&lt;span class="nd"&gt;@allure.step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DB: Get sum of amounts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_sum_of_amounts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT COALESCE(SUM(amount), 0) FROM expenses&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchone&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cross-layer E2E test that uses it:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_api_create_reflects_in_db&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Capture pre-state
&lt;/span&gt;    &lt;span class="n"&gt;old_set&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DBActions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_all_expenses_as_set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;old_sum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DBActions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_sum_of_amounts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Act: Create expense via API
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;APIActions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;APIVerification&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;verify_status_code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;201&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Capture post-state
&lt;/span&gt;    &lt;span class="n"&gt;new_set&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DBActions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_all_expenses_as_set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;new_sum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DBActions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_sum_of_amounts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Validate integrity
&lt;/span&gt;    &lt;span class="n"&gt;diff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new_set&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;old_set&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Expected 1 new record, got &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;new_sum&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;old_sum&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;expected_amount&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same pattern applies to &lt;strong&gt;update&lt;/strong&gt; (set difference shows one record with changed values) and &lt;strong&gt;delete&lt;/strong&gt; (old_set - new_set shows the removed record).&lt;/p&gt;

&lt;h2&gt;
  
  
  Where This Caught Real Bugs
&lt;/h2&gt;

&lt;p&gt;While building this, the Set Theory approach caught two issues that single-layer tests completely missed:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. MySQL CHECK constraint silently rejecting negative amounts.&lt;/strong&gt;&lt;br&gt;
The UI happily accepted &lt;code&gt;-50&lt;/code&gt; as an expense amount. The API returned &lt;code&gt;201&lt;/code&gt;. But MySQL's &lt;code&gt;CHECK (amount &amp;gt;= 0)&lt;/code&gt; constraint blocked the INSERT — so the record never existed in the DB. The set difference was empty when it should have contained one record. Without the cross-layer test, this would have looked like a perfectly passing test suite.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. VARCHAR(255) overflow truncation.&lt;/strong&gt;&lt;br&gt;
A 300-character expense name was entered through the UI. The API accepted it. MySQL truncated it to 255 characters silently. The set difference caught the mismatch because the stored record didn't match the expected data.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Full Picture
&lt;/h2&gt;

&lt;p&gt;This technique is one piece of a larger framework I built with 53 tests across 4 layers (Web/Playwright, API/Flask, Mobile/Appium, Database/MySQL). The cross-layer E2E tests that use Set Theory are a small percentage of the total test count, but they catch the highest-risk bugs.&lt;/p&gt;

&lt;p&gt;The architecture enforces strict separation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Tests → Workflows → Actions/Verifications → Page Objects + Data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every layer has one job. Tests never call raw UI or API actions directly — they go through workflows that compose actions into business flows. This keeps the set theory validation reusable across different test scenarios.&lt;/p&gt;

&lt;h2&gt;
  
  
  When You Should (and Shouldn't) Use This
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Use it when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your application handles financial data, inventory, or any domain where data accuracy matters more than UI polish&lt;/li&gt;
&lt;li&gt;Data flows through multiple systems (frontend → backend → database → reporting)&lt;/li&gt;
&lt;li&gt;You've had production bugs where "the UI said X but the DB had Y"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Don't use it when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're testing a static website or content-only app&lt;/li&gt;
&lt;li&gt;The DB is behind a well-tested ORM with strong constraints and you trust the abstraction&lt;/li&gt;
&lt;li&gt;Test execution time is critical and you can't afford the extra DB queries&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;The full framework is open source:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Yaniv2809/Financial-Integrity-Ecosystem" rel="noopener noreferrer"&gt;Financial-Integrity-Ecosystem&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can run the entire suite with a single command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Yaniv2809/Financial-Integrity-Ecosystem.git
&lt;span class="nb"&gt;cd &lt;/span&gt;Financial-Integrity-Ecosystem
docker-compose up &lt;span class="nt"&gt;--build&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This spins up MySQL, Flask, JSON Server, and Playwright — runs all 37 non-mobile tests automatically.&lt;/p&gt;

&lt;p&gt;The cross-layer E2E tests are in &lt;code&gt;tests/api/test_e2e_api_db_expense.py&lt;/code&gt; and &lt;code&gt;tests/test_e2e_web_api_db.py&lt;/code&gt; if you want to see the set theory pattern in action.&lt;/p&gt;




&lt;p&gt;I recently shared this project on r/QualityAssurance and got valuable feedback that led to several improvements, including adding a testing philosophy section that explains the business risk behind each layer. If you have feedback or have used similar patterns in production, I'd genuinely like to hear about it.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Yaniv Metuku — QA Automation Engineer&lt;/em&gt;&lt;/p&gt;

</description>
      <category>testing</category>
      <category>python</category>
      <category>automation</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
