<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mihir Shinde</title>
    <description>The latest articles on DEV Community by Mihir Shinde (@byteframe).</description>
    <link>https://dev.to/byteframe</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3883273%2Fb5e4faf5-1081-4776-8fdf-aaf7ef45f74b.jpg</url>
      <title>DEV Community: Mihir Shinde</title>
      <link>https://dev.to/byteframe</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/byteframe"/>
    <language>en</language>
    <item>
      <title>Why Claude Code AutoFix Can’t Fix Flaky Tests</title>
      <dc:creator>Mihir Shinde</dc:creator>
      <pubDate>Mon, 18 May 2026 17:01:14 +0000</pubDate>
      <link>https://dev.to/byteframe/why-claude-code-autofix-cant-fix-flaky-tests-e6d</link>
      <guid>https://dev.to/byteframe/why-claude-code-autofix-cant-fix-flaky-tests-e6d</guid>
      <description>&lt;h1&gt;
  
  
  Why Claude Code AutoFix Can’t Fix Flaky Tests
&lt;/h1&gt;

&lt;p&gt;AutoFix is great at real bugs. Flaky tests break it — and the fix loop costs more than the flake did.&lt;/p&gt;

&lt;p&gt;May 8, 2026 · 8 min read&lt;/p&gt;

&lt;p&gt;Anthropic shipped &lt;strong&gt;Claude Code AutoFix&lt;/strong&gt; — an agent that subscribes to your PR’s GitHub events, watches CI, and pushes commits to fix failing tests and address review comments. For real bugs, it’s genuinely good. For &lt;em&gt;flaky&lt;/em&gt; tests, it’s a footgun.&lt;/p&gt;

&lt;p&gt;We build a tool in this exact space (Kleore quantifies and surfaces flaky-test waste), so we watched the AutoFix launch closely. Here’s the honest read on what it does, where it breaks, and why throwing an AI agent at flakiness makes the problem more expensive, not less.&lt;/p&gt;

&lt;h2&gt;
  
  
  What AutoFix actually does
&lt;/h2&gt;

&lt;p&gt;AutoFix is a cloud-hosted Claude Code session attached to a pull request. When CI fails or a reviewer leaves a comment, it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reads the failure log or comment&lt;/li&gt;
&lt;li&gt;Investigates the relevant code&lt;/li&gt;
&lt;li&gt;Pushes a commit with an explanation&lt;/li&gt;
&lt;li&gt;Re-runs CI and iterates&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For a deterministic failure — a real null deref, a type error, a missing import — this loop converges. Test fails → agent reads stack trace → agent fixes code → test passes. Clean.&lt;/p&gt;

&lt;h2&gt;
  
  
  Flaky tests break the loop
&lt;/h2&gt;

&lt;p&gt;A flaky test, by definition, fails for reasons unrelated to the code change. Race conditions. Unstable network mocks. Order-dependent fixtures. Timezone drift. The test that failed on this PR will pass on the next run with no change at all.&lt;/p&gt;

&lt;p&gt;AutoFix doesn’t know that. It sees a red CI, assumes the diff broke something, and starts hunting. The result is one of three failure modes:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The speculative fix loop
&lt;/h3&gt;

&lt;p&gt;AutoFix reads the stack trace, invents a plausible cause, and pushes a “fix.” The flaky test passes on the next run, not because of the fix but because flaky tests pass most of the time anyway (roughly 70% in this example). AutoFix declares victory. You’ve now merged a code change that was triggered by randomness, not by an actual bug.&lt;/p&gt;

&lt;p&gt;Multiply this across a quarter and your codebase fills up with cargo-cult fixes: extra &lt;code&gt;await&lt;/code&gt;s, defensive null checks, retries, sleep statements, narrowed test assertions. Each one looks reasonable in isolation. Together they’re the AI version of &lt;em&gt;“don’t touch this, it works.”&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The infinite re-run
&lt;/h3&gt;

&lt;p&gt;Sometimes AutoFix guesses right and just re-runs CI. The flake fails again. It fixes harder. Re-runs. Fails. Each iteration burns tokens, CI minutes, and your patience. The blog post that introduced AutoFix flags this directly: agents can “enter speculative fix loops that consume resources without resolving the underlying problem.”&lt;/p&gt;

&lt;p&gt;A single flaky test can cost an AutoFix session 10–30 LLM calls, each touching multiple files, each pushing a commit. Your token bill and your git history both look terrible.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The wrong file blamed
&lt;/h3&gt;

&lt;p&gt;Race conditions and shared-state bugs rarely live in the file the test exercises. AutoFix looks where the stack trace points. The actual cause (a fixture in a sibling file, a global mock that leaked, a database row left by an earlier test) is two directories away. AutoFix “fixes” the wrong thing, the symptom moves, and the next PR sees a new flake.&lt;/p&gt;

&lt;h2&gt;
  
  
  The cost math is bad
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Failure type&lt;/th&gt;
&lt;th&gt;Real bug&lt;/th&gt;
&lt;th&gt;Flaky test&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AutoFix iterations&lt;/td&gt;
&lt;td&gt;1–3&lt;/td&gt;
&lt;td&gt;5–30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CI minutes consumed&lt;/td&gt;
&lt;td&gt;10–30&lt;/td&gt;
&lt;td&gt;60–300&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Token spend per failure&lt;/td&gt;
&lt;td&gt;$0.50–$2&lt;/td&gt;
&lt;td&gt;$5–$25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Outcome&lt;/td&gt;
&lt;td&gt;Bug fixed&lt;/td&gt;
&lt;td&gt;Symptom hidden&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The unit economics flip on flaky tests. You pay 10x more, end up with worse code, and the underlying flake is still there waiting for the next PR.&lt;/p&gt;

&lt;h2&gt;
  
  
  The right move: classify before you fix
&lt;/h2&gt;

&lt;p&gt;Every CI failure should be sorted into one of two buckets &lt;em&gt;before&lt;/em&gt; an agent (or a human) starts fixing it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Real failure&lt;/strong&gt; — this PR’s code change broke something. Send to AutoFix. It’ll do a great job.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flake&lt;/strong&gt; — this test fails on unrelated PRs too. Don’t fix it on this PR. Quarantine, log, and address the root cause separately.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;AutoFix has no way to make this distinction on its own. It only sees one PR. Flake detection requires looking across &lt;em&gt;many&lt;/em&gt; PRs, over time, and asking: &lt;em&gt;does this test fail on diffs that have nothing to do with it?&lt;/em&gt;&lt;/p&gt;
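
&lt;p&gt;A minimal sketch of that cross-PR check, in TypeScript. The &lt;code&gt;CiRun&lt;/code&gt; record shape, the &lt;code&gt;diffTouchesTest&lt;/code&gt; flag, and the threshold of two unrelated PRs are assumptions made for illustration. They are not part of AutoFix or Kleore, and a real classifier would need a more careful signal for whether a diff is related to a test.&lt;/p&gt;

```typescript
// Hypothetical shape of one CI result row; field names are assumptions.
interface CiRun {
  testName: string;
  prNumber: number;
  passed: boolean;
  // True when the PR's diff touched code this test exercises.
  diffTouchesTest: boolean;
}

type Verdict = "real-failure" | "flake";

// Count the distinct PRs where the test failed although the diff was unrelated.
function unrelatedFailingPrs(testName: string, history: CiRun[]): number {
  const prs = new Set(
    history
      .filter((run) => run.testName === testName)
      .filter((run) => !run.passed)
      .filter((run) => !run.diffTouchesTest)
      .map((run) => run.prNumber)
  );
  return prs.size;
}

// Route a failure: flakes go to quarantine, everything else goes to the fixer.
function classifyFailure(testName: string, history: CiRun[], minUnrelatedPrs = 2): Verdict {
  return unrelatedFailingPrs(testName, history) >= minUnrelatedPrs ? "flake" : "real-failure";
}

const history: CiRun[] = [
  { testName: "checkout total", prNumber: 101, passed: false, diffTouchesTest: false },
  { testName: "checkout total", prNumber: 102, passed: true, diffTouchesTest: false },
  { testName: "checkout total", prNumber: 107, passed: false, diffTouchesTest: false },
  { testName: "login redirect", prNumber: 107, passed: false, diffTouchesTest: true },
];

console.log(classifyFailure("checkout total", history)); // "flake"
console.log(classifyFailure("login redirect", history)); // "real-failure"
```

&lt;p&gt;The point of the sketch is the input, not the rule: the verdict depends on history from other PRs, which a single-PR agent does not have.&lt;/p&gt;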

&lt;h2&gt;
  
  
  This is what Kleore does
&lt;/h2&gt;

&lt;p&gt;Kleore connects to your GitHub repos, scans your CI history, and ranks every flaky test by frequency and dollar cost. It’s the missing classifier in front of AutoFix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Test fails on a PR → check Kleore → if the test is on the flake list, skip AutoFix entirely&lt;/li&gt;
&lt;li&gt;Top flakes get triaged as their own work item, not patched into random PRs&lt;/li&gt;
&lt;li&gt;Engineering managers see a weekly $ number and can decide whether to invest fix-time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AutoFix and Kleore are complements, not competitors. AutoFix needs a flake-aware front door to be safe in a real codebase. Without one, every flaky test in your suite becomes a recurring tax on your token bill.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stop letting AutoFix burn cycles on flakes.
&lt;/h3&gt;

&lt;p&gt;Install the Kleore GitHub App. Get a ranked list of every flaky test in your repos — with dollar costs attached — in two minutes. Free to start.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/apps/kleore" rel="noopener noreferrer"&gt;Scan my repos — free&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ: Claude Code AutoFix and flaky tests
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Can Claude Code AutoFix fix flaky tests?
&lt;/h3&gt;

&lt;p&gt;No. AutoFix is designed to fix deterministic failures caused by the current PR’s code change. Flaky tests fail for reasons unrelated to the diff — race conditions, shared state, network jitter — so AutoFix either invents a cargo-cult fix that “works” because the flake passed by chance, or it loops indefinitely burning tokens and CI minutes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why does AutoFix loop on flaky tests?
&lt;/h3&gt;

&lt;p&gt;AutoFix treats every red CI as a code bug. When a flake fails, AutoFix pushes a speculative fix and re-runs. The flake fails again for unrelated reasons, AutoFix fixes harder, and the loop repeats. Each iteration consumes LLM calls and CI minutes, and adds noisy commits to your git history.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I stop AutoFix from wasting cycles on flaky tests?
&lt;/h3&gt;

&lt;p&gt;Classify failures before AutoFix runs. If a test has historically failed on PRs unrelated to its code, treat it as a flake: quarantine it and address it separately. Tools like Kleore scan your CI history across many PRs to identify flaky tests and rank them by frequency and dollar cost, so AutoFix only engages on real bugs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Are AutoFix and Kleore competitors?
&lt;/h3&gt;

&lt;p&gt;No, they’re complements. AutoFix is a per-PR fixer that needs a flake-aware front door. Kleore provides cross-PR flaky test detection and dollar-cost reporting that tells AutoFix (and your engineers) when to fix and when to quarantine.&lt;/p&gt;

&lt;h3&gt;
  
  
  How much does running AutoFix on a flaky test cost?
&lt;/h3&gt;

&lt;p&gt;A single flaky test can drive AutoFix through 5–30 iterations, consuming 60–300 CI minutes and $5–$25 in tokens per failure — roughly 10x the cost of fixing a real bug. Multiplied across a quarter, this becomes a recurring tax on your engineering budget.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/what-are-flaky-tests"&gt;What Are Flaky Tests?&lt;/a&gt; — The primer on why tests fail without code changes.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/flaky-test-cost"&gt;How Much Do Flaky Tests Actually Cost?&lt;/a&gt; — Compute is the smallest line item.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/fix-flaky-tests-github-actions"&gt;How to Fix Flaky Tests in GitHub Actions&lt;/a&gt; — Six root causes and how to fix each one.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ci</category>
      <category>testing</category>
      <category>github</category>
      <category>devops</category>
    </item>
    <item>
      <title>How to Find and Fix Flaky Tests in Jest</title>
      <dc:creator>Mihir Shinde</dc:creator>
      <pubDate>Thu, 14 May 2026 17:53:07 +0000</pubDate>
      <link>https://dev.to/byteframe/how-to-find-and-fix-flaky-tests-in-jest-31nd</link>
      <guid>https://dev.to/byteframe/how-to-find-and-fix-flaky-tests-in-jest-31nd</guid>
      <description>&lt;h1&gt;
  
  
  How to Find and Fix Flaky Tests in Jest
&lt;/h1&gt;

&lt;p&gt;The most common root causes of Jest flakiness — and battle-tested fixes you can apply today.&lt;/p&gt;

&lt;p&gt;March 28, 2026 · 14 min read&lt;/p&gt;

&lt;p&gt;Jest is the most popular JavaScript testing framework, powering test suites at companies from startups to Fortune 500s. It’s also one of the most common sources of flaky tests. The combination of parallel test execution, module mocking, and shared Node.js process state creates a perfect storm for intermittent failures.&lt;/p&gt;

&lt;p&gt;If your CI pipeline randomly fails with tests that pass on re-run, this guide is for you. We’ll cover why Jest tests become flaky, how to identify the culprits, and concrete fixes for each pattern — with code you can copy into your codebase today.&lt;/p&gt;

&lt;p&gt;Want to skip the guesswork?&lt;/p&gt;

&lt;p&gt;Before you start debugging one test at a time, get a ranked list of your flakiest tests. &lt;a href="https://github.com/apps/kleore" rel="noopener noreferrer"&gt;Kleore scans your CI history&lt;/a&gt; and shows you exactly which Jest tests are flaky, how often they fail, and how much each one costs in wasted CI minutes and developer time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Jest tests become flaky
&lt;/h2&gt;

&lt;p&gt;Jest runs test files in parallel worker processes by default. Each test file gets its own module registry and global environment, but tests &lt;em&gt;within&lt;/em&gt; a file share the same process and the same module-level state. This architecture means that any state leaking between tests in the same file, or any external resource shared across workers (a database, a port, a temp directory), becomes a flakiness vector.&lt;/p&gt;

&lt;p&gt;The four most common root causes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Shared mutable state&lt;/strong&gt; — Global variables, module-level caches, singleton instances, or database records that persist between tests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Timer and date dependencies&lt;/strong&gt; — Tests that rely on &lt;code&gt;setTimeout&lt;/code&gt;, &lt;code&gt;setInterval&lt;/code&gt;, &lt;code&gt;Date.now()&lt;/code&gt;, or real clock time. CI runners are slower than your laptop, and timing assumptions break.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Async race conditions&lt;/strong&gt; — Tests that don’t properly wait for async operations to complete. The test asserts before the state has updated, and it fails intermittently depending on execution speed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Module mocking leaks&lt;/strong&gt; — &lt;code&gt;jest.mock()&lt;/code&gt; calls that bleed across tests because mocks aren’t properly reset between test cases.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  How to identify flaky Jest tests
&lt;/h2&gt;

&lt;p&gt;Before you fix anything, you need to know &lt;em&gt;which&lt;/em&gt; tests are flaky. Here are the tools Jest gives you to flush out non-deterministic tests.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use --detectOpenHandles to find leaked resources
&lt;/h3&gt;

&lt;p&gt;If Jest hangs after tests complete or you see “Jest did not exit one second after the test run has completed,” you have open handles — unclosed database connections, running servers, or pending timers.&lt;/p&gt;

&lt;p&gt;Detect leaked handles&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Find open handles that prevent Jest from exiting cleanly&lt;/span&gt;
npx jest &lt;span class="nt"&gt;--detectOpenHandles&lt;/span&gt; &lt;span class="nt"&gt;--forceExit&lt;/span&gt;

&lt;span class="c"&gt;# Run tests sequentially to isolate ordering issues&lt;/span&gt;
npx jest &lt;span class="nt"&gt;--runInBand&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Run tests repeatedly to reproduce flakes
&lt;/h3&gt;

&lt;p&gt;A test that passes once might fail on the 50th run. Jest’s documented CLI options do not include a repeat flag, so a simple bash loop is the portable way to stress-test suspected flaky tests.&lt;/p&gt;

&lt;p&gt;Stress-test a suspected flaky test&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Run the file up to 50 times and stop at the first failure&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;i &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;seq &lt;/span&gt;1 50&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;npx jest path/to/suspected-flaky.test.ts &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"FAILED on run &lt;/span&gt;&lt;span class="nv"&gt;$i&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nb"&gt;exit &lt;/span&gt;1&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
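
&lt;p&gt;How many repetitions are enough depends on how often the test fails. As a rough guide, if each run fails independently with probability &lt;code&gt;f&lt;/code&gt;, the chance of seeing at least one failure in &lt;code&gt;n&lt;/code&gt; runs is &lt;code&gt;1 - (1 - f)^n&lt;/code&gt;. The helper names below are hypothetical, and real flakes are often not independent between runs (they can correlate with machine load), so treat the numbers as an estimate rather than a guarantee.&lt;/p&gt;

```typescript
// Probability of seeing at least one failure in `runs` independent runs
// of a test that fails with probability `failureRate` per run.
function chanceOfCatchingFlake(failureRate: number, runs: number): number {
  return 1 - Math.pow(1 - failureRate, runs);
}

// Smallest number of runs that reaches the requested confidence.
// Assumes failureRate and confidence are strictly between 0 and 1.
function runsNeeded(failureRate: number, confidence = 0.95): number {
  return Math.ceil(Math.log(1 - confidence) / Math.log(1 - failureRate));
}

console.log(chanceOfCatchingFlake(0.02, 50).toFixed(2)); // "0.64"
console.log(runsNeeded(0.02)); // 149
console.log(runsNeeded(0.3)); // 9
```

&lt;p&gt;Under these assumptions, 50 runs catch a test that fails 2% of the time only about two times in three, so a clean loop is weak evidence that a rare flake is gone.&lt;/p&gt;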



&lt;h3&gt;
  
  
  Randomize test order to find hidden dependencies
&lt;/h3&gt;

&lt;p&gt;Jest 29.2+ supports &lt;code&gt;--randomize&lt;/code&gt; to shuffle test order within each file (it requires the default &lt;code&gt;jest-circus&lt;/code&gt; runner). If a test only passes when another test runs before it, randomization will expose it.&lt;/p&gt;

&lt;p&gt;Randomize and isolate&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Randomize test order within files&lt;/span&gt;
npx jest &lt;span class="nt"&gt;--randomize&lt;/span&gt;

&lt;span class="c"&gt;# If a test fails, re-run with the same seed to reproduce&lt;/span&gt;
npx jest &lt;span class="nt"&gt;--randomize&lt;/span&gt; &lt;span class="nt"&gt;--seed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;12345
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
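
&lt;p&gt;A seed makes a failure reproducible because a seeded pseudo-random generator always yields the same sequence, so the same seed produces the same test order. The sketch below illustrates the idea with a small generator (mulberry32) and a Fisher-Yates shuffle. It is not Jest’s internal implementation, and the test names are made up.&lt;/p&gt;

```typescript
// Small seeded pseudo-random generator (mulberry32). Returns values in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed;
  return () => {
    state = (state + 0x6d2b79f5) | 0;
    let t = Math.imul(state ^ (state >>> 15), state | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle driven by the seeded generator; returns a new array.
function shuffleWithSeed(items: string[], seed: number): string[] {
  const random = mulberry32(seed);
  const result = items.slice();
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = result[i];
    result[i] = result[j];
    result[j] = tmp;
  }
  return result;
}

const tests = ["creates user", "updates user", "deletes user", "lists users"];
const firstOrder = shuffleWithSeed(tests, 12345);
const secondOrder = shuffleWithSeed(tests, 12345);

// Same seed, same order: this is what makes a randomized failure replayable.
console.log(firstOrder.join(",") === secondOrder.join(",")); // true
```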



&lt;h3&gt;
  
  
  Use jest-circus retry for automatic detection
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;jest-circus&lt;/code&gt; test runner (default since Jest 27) supports retries through &lt;code&gt;jest.retryTimes()&lt;/code&gt;, which you call from a test file or a setup file rather than set as a config key. While retries are a bandaid, they’re useful for &lt;em&gt;identifying&lt;/em&gt; which tests need attention: any test that passes on retry is flaky by definition.&lt;/p&gt;

&lt;p&gt;jest.setup.ts: retry configuration&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// jest.setup.ts (list this file under setupFilesAfterEnv in jest.config.ts)&lt;/span&gt;

&lt;span class="c1"&gt;// Retry failed tests up to 2 times&lt;/span&gt;
&lt;span class="c1"&gt;// Tests that need retries are flaky, so track them&lt;/span&gt;
&lt;span class="c1"&gt;// logErrorsBeforeRetry prints each failed attempt so you can find it in CI output&lt;/span&gt;
&lt;span class="nx"&gt;jest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retryTimes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;logErrorsBeforeRetry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
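
&lt;p&gt;To act on that signal you need the per-attempt outcome of each test. A minimal sketch of the rule, assuming you have already extracted attempts from your CI output into the hypothetical &lt;code&gt;TestAttempts&lt;/code&gt; shape below (the shape and the test names are assumptions, not a Jest output format):&lt;/p&gt;

```typescript
// Hypothetical record: outcomes of each attempt at one test, in order.
interface TestAttempts {
  name: string;
  attempts: boolean[]; // true = passed, false = failed
}

// A test is flaky when at least one attempt failed and a later attempt passed.
function isFlaky(test: TestAttempts): boolean {
  const firstFailure = test.attempts.indexOf(false);
  if (firstFailure === -1) return false;
  return test.attempts.indexOf(true, firstFailure + 1) !== -1;
}

const results: TestAttempts[] = [
  { name: "renders header", attempts: [true] },
  { name: "debounces search input", attempts: [false, true] },
  { name: "rejects bad token", attempts: [false, false, false] },
];

const flaky = results.filter(isFlaky).map((t) => t.name);
console.log(flaky); // ["debounces search input"]
```

&lt;p&gt;A test that fails every attempt is left out on purpose: a consistent failure looks like a real bug, not a flake.&lt;/p&gt;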



&lt;h2&gt;
  
  
  Common patterns and fixes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pattern 1: Shared mutable state between tests
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; Test passes with &lt;code&gt;it.only&lt;/code&gt;, fails when run with the full suite. Or it only fails when another specific test runs before it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; A module-level variable, database record, or in-memory cache is modified by one test and not cleaned up before the next.&lt;/p&gt;

&lt;p&gt;Bad — shared state across tests&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// userService.ts&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;cachedUsers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;User&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt; &lt;span class="c1"&gt;// Module-level cache&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getUsers&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cachedUsers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;cachedUsers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;cachedUsers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fetchUsersFromDB&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;cachedUsers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// test file — Test A populates the cache, Test B reads stale data&lt;/span&gt;
&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;fetches users&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;users&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getUsers&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Populates cache&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;users&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toHaveLength&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;handles empty state&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// This SHOULD test empty state, but cache is warm from previous test&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;users&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getUsers&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Returns cached data!&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;users&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toHaveLength&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// FAILS&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fix — reset state in beforeEach&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// resetCache is a helper you export from userService.ts that sets cachedUsers back to []&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;resetCache&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./userService&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nf"&gt;beforeEach&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;resetCache&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Clear module-level state&lt;/span&gt;
  &lt;span class="nx"&gt;jest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clearAllMocks&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Clear recorded calls and results (mock implementations stay)&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Or use jest.isolateModules for complete isolation&lt;/span&gt;
&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;handles empty state&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;jest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isolateModules&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;getUsers&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./userService&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="c1"&gt;// Fresh module instance — no cached data&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;users&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getUsers&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;users&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toHaveLength&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pattern 2: Timer-dependent tests
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; Tests involving debounce, throttle, setTimeout, or animations pass locally but fail in CI where runners are slower.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; The test relies on real clock time. A 250ms debounce might take longer than 300ms to fire on a loaded CI runner, so an assertion that waits a fixed 300ms runs too early.&lt;/p&gt;

&lt;p&gt;Bad — real timers in tests&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;debounces search input&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;render&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;SearchBox&lt;/span&gt; &lt;span class="o"&gt;/&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;fireEvent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;change&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;hello&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// Hoping 300ms is enough on CI... it's not&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;mockSearch&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toHaveBeenCalledWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;hello&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fix — jest.useFakeTimers()&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;debounces search input&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;jest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;useFakeTimers&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="nf"&gt;render&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;SearchBox&lt;/span&gt; &lt;span class="o"&gt;/&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;fireEvent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;change&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;hello&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// Advance the clock by exactly 250ms (debounce delay)&lt;/span&gt;
  &lt;span class="nx"&gt;jest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;advanceTimersByTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;250&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;mockSearch&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toHaveBeenCalledWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;hello&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;jest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;useRealTimers&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the assertion throws, the &lt;code&gt;jest.useRealTimers()&lt;/code&gt; call at the end of the test never runs. Always call it in an &lt;code&gt;afterEach&lt;/code&gt; instead to prevent fake timers from leaking into other tests. Better yet, set it up globally in your Jest setup file.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 3: Async race conditions
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; Tests using React Testing Library’s &lt;code&gt;getBy*&lt;/code&gt; queries fail because the element hasn’t rendered yet. The test passes when you add a small delay.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; The test asserts synchronously against an asynchronously updated DOM. On faster machines it works; on slower CI runners, the render hasn’t completed yet.&lt;/p&gt;

&lt;p&gt;Bad — synchronous assertion on async DOM&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;shows success message after submit&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;render&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;Form&lt;/span&gt; &lt;span class="o"&gt;/&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;fireEvent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;click&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;screen&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Submit&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

  &lt;span class="c1"&gt;// Element hasn't rendered yet — race condition!&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;screen&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Success&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;toBeInTheDocument&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fix — use waitFor or findBy&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;shows success message after submit&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;render&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;Form&lt;/span&gt; &lt;span class="o"&gt;/&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;fireEvent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;click&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;screen&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Submit&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

  &lt;span class="c1"&gt;// waitFor retries until the assertion passes or times out&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;waitFor&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;screen&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Success&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;toBeInTheDocument&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// Or use findBy* which combines getBy + waitFor&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;screen&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findByText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Success&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBeInTheDocument&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pattern 4: Port conflicts in integration tests
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; &lt;code&gt;EADDRINUSE: address already in use&lt;/code&gt; errors. Tests pass individually but fail when multiple test files run in parallel.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; Multiple test workers trying to bind to the same hardcoded port.&lt;/p&gt;

&lt;p&gt;Bad — hardcoded port&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Every test file tries to use port 3000&lt;/span&gt;
&lt;span class="nf"&gt;beforeAll&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fix — dynamic port allocation&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;AddressInfo&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;net&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nf"&gt;beforeAll&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;done&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Port 0 = OS assigns an available port&lt;/span&gt;
  &lt;span class="nx"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Read the port only once the server is actually listening&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;port&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;address&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;AddressInfo&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nx"&gt;baseUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`http://localhost:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;port&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nf"&gt;done&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nf"&gt;afterAll&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;done&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Wait for the server to close so no open handle outlives this file&lt;/span&gt;
  &lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;done&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pattern 5: Snapshot drift
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; Snapshot tests fail in CI but pass locally. Or they fail after someone else’s PR merges but no one updated the snapshots.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; Snapshots contain environment-specific data (timestamps, random IDs, absolute paths) or were committed from a different OS/locale.&lt;/p&gt;

&lt;p&gt;Fix — deterministic snapshots&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Mock Date so snapshots don't change daily&lt;/span&gt;
&lt;span class="nf"&gt;beforeAll&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;jest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;useFakeTimers&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="nx"&gt;jest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setSystemTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;2025-01-01T00:00:00Z&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Use property matchers for dynamic values&lt;/span&gt;
&lt;span class="c1"&gt;// (they apply to plain objects, not to rendered DOM nodes;&lt;/span&gt;
&lt;span class="c1"&gt;// this assumes testUser has string id and createdAt fields)&lt;/span&gt;
&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;snapshots user data with dynamic fields&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;testUser&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toMatchSnapshot&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="c1"&gt;// Allow these properties to be any string&lt;/span&gt;
    &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;createdAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Add to CI: fail on missing snapshots instead of writing them&lt;/span&gt;
&lt;span class="c1"&gt;// package.json script:&lt;/span&gt;
&lt;span class="c1"&gt;// "test:ci": "jest --ci"&lt;/span&gt;
&lt;span class="c1"&gt;// With --ci, Jest does not write new snapshots automatically, so a test with a missing snapshot fails&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How to quarantine flaky Jest tests
&lt;/h2&gt;

&lt;p&gt;Sometimes you can’t fix a flaky test immediately. Maybe it’s in a complex integration test that requires a larger refactor, or the root cause is an upstream dependency. In these cases, quarantining the test prevents it from blocking your team while you work on a fix.&lt;/p&gt;

&lt;p&gt;A basic quarantine approach with Jest:&lt;/p&gt;

&lt;p&gt;Manual quarantine with describe.skip&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Mark flaky tests so they don't block CI&lt;/span&gt;
&lt;span class="c1"&gt;// TODO: Fix flaky — https://linear.app/team/issue/ENG-1234&lt;/span&gt;
&lt;span class="nx"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;skip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;payment webhook handler&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;processes refund events&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// This test flakes due to Stripe webhook timing&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The problem with manual quarantine is that it’s easy to forget about skipped tests. They accumulate, and eventually you have a pile of tests that no one runs. &lt;a href="https://github.com/apps/kleore" rel="noopener noreferrer"&gt;Kleore&lt;/a&gt; automates this by detecting flaky tests from your CI history, tracking them over time, and alerting you when quarantined tests haven’t been addressed. No manual tagging required.&lt;/p&gt;
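&lt;p&gt;If you do quarantine by hand, one way to keep skipped tests from piling up unnoticed is to record when each one was quarantined and flag entries that have sat too long. The sketch below is a hypothetical helper, not part of Jest or Kleore: the &lt;code&gt;QuarantineEntry&lt;/code&gt; shape, the &lt;code&gt;overdueQuarantines&lt;/code&gt; function, the sample entries, and the 30-day limit are all assumptions for illustration.&lt;/p&gt;

```typescript
// Hypothetical record kept next to each skipped test (for example in a JSON file).
interface QuarantineEntry {
  testName: string;
  ticketUrl: string;     // tracking issue for the real fix
  quarantinedOn: string; // ISO date such as "2026-01-15", parsed as UTC
}

// Returns the entries that have been quarantined for more than `maxAgeDays`.
function overdueQuarantines(
  entries: QuarantineEntry[],
  today: Date,
  maxAgeDays: number,
): QuarantineEntry[] {
  const msPerDay = 24 * 60 * 60 * 1000;
  return entries.filter((entry) => {
    const ageDays = (today.getTime() - new Date(entry.quarantinedOn).getTime()) / msPerDay;
    return ageDays > maxAgeDays;
  });
}

// Sample data is made up; only the first entry is older than 30 days.
const quarantineList: QuarantineEntry[] = [
  {
    testName: "payment webhook handler",
    ticketUrl: "https://linear.app/team/issue/ENG-1234",
    quarantinedOn: "2026-01-15",
  },
  {
    testName: "search box debounce",
    ticketUrl: "https://example.com/issues/2",
    quarantinedOn: "2026-02-20",
  },
];

const overdueEntries = overdueQuarantines(quarantineList, new Date("2026-03-01T00:00:00Z"), 30);
```

&lt;p&gt;Running a check like this in CI, and failing or warning when the list is non-empty, turns a forgotten &lt;code&gt;describe.skip&lt;/code&gt; into something the team has to look at again.&lt;/p&gt;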

&lt;h2&gt;
  
  
  Prevention: jest.config settings that reduce flakiness
&lt;/h2&gt;

&lt;p&gt;The right Jest configuration can prevent flaky tests from sneaking in. Here’s a production-hardened &lt;code&gt;jest.config.ts&lt;/code&gt; with annotations explaining each setting.&lt;/p&gt;

&lt;p&gt;jest.config.ts — production-hardened&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Config&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;jest&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Run tests in random order to catch hidden dependencies&lt;/span&gt;
  &lt;span class="na"&gt;randomize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;// In CI, run tests sequentially to reduce resource contention&lt;/span&gt;
  &lt;span class="c1"&gt;// Locally, use half the available cores&lt;/span&gt;
  &lt;span class="na"&gt;maxWorkers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CI&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;50%&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;// Fail fast: stop after the first failed test suite in CI&lt;/span&gt;
  &lt;span class="na"&gt;bail&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CI&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;// Clear mock calls and restore spied-on implementations before every test&lt;/span&gt;
  &lt;span class="na"&gt;clearMocks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;restoreMocks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;// In CI, don't write new snapshots automatically; a missing snapshot fails the test&lt;/span&gt;
  &lt;span class="na"&gt;ci&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;!!&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;// Timeout per test — generous enough for CI, strict enough to catch hangs&lt;/span&gt;
  &lt;span class="na"&gt;testTimeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CI&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;30000&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;// Detect open handles that prevent clean exit&lt;/span&gt;
  &lt;span class="na"&gt;detectOpenHandles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;// Force Jest to exit after all tests complete&lt;/span&gt;
  &lt;span class="c1"&gt;// (safety net — fix the root cause, don't rely on this)&lt;/span&gt;
  &lt;span class="na"&gt;forceExit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;!!&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key insight: running a single worker in CI (&lt;code&gt;maxWorkers: 1&lt;/code&gt; in the config above, or &lt;code&gt;--maxWorkers=1&lt;/code&gt; on the command line) removes the flakiness that comes from test files competing for CPU, ports, and other shared resources. Yes, it’s slower. But a 10-minute reliable suite is better than a 5-minute suite that fails 20% of the time and gets re-run. If you need speed, invest in splitting your test suite into parallel CI jobs with &lt;a href="https://dev.to/blog/github-actions-ci-optimization"&gt;GitHub Actions matrix strategy&lt;/a&gt; instead.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stop guessing which Jest tests are flaky.
&lt;/h3&gt;

&lt;p&gt;Kleore scans your GitHub Actions history and gives you a ranked list of every flaky test — with failure rates, cost estimates, and fix priority. Free to start.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/apps/kleore" rel="noopener noreferrer"&gt;Scan my repos — free&lt;/a&gt; &lt;a href="https://dev.to/blog/calculator"&gt;Calculate my CI waste&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/fix-flaky-tests-github-actions"&gt;How to Fix Flaky Tests in GitHub Actions&lt;/a&gt; — Framework-agnostic patterns for CI-level flakiness.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/flaky-tests-pytest"&gt;How to Find and Fix Flaky Tests in pytest&lt;/a&gt; — The Python equivalent of this guide.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/flaky-test-cost"&gt;How Much Do Flaky Tests Actually Cost?&lt;/a&gt; — The dollar math to justify the fix.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/calculator"&gt;Flaky Test Cost Calculator&lt;/a&gt; — Plug in your team’s numbers and see the impact.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ci</category>
      <category>testing</category>
      <category>github</category>
      <category>devops</category>
    </item>
    <item>
      <title>We Analyzed 10,000 GitHub Actions Runs — Here’s What Flaky Tests Actually Cost</title>
      <dc:creator>Mihir Shinde</dc:creator>
      <pubDate>Tue, 12 May 2026 21:39:54 +0000</pubDate>
      <link>https://dev.to/byteframe/we-analyzed-10000-github-actions-runs-heres-what-flaky-tests-actually-cost-21n1</link>
      <guid>https://dev.to/byteframe/we-analyzed-10000-github-actions-runs-heres-what-flaky-tests-actually-cost-21n1</guid>
      <description>&lt;h1&gt;
  
  
  We Analyzed 10,000 GitHub Actions Runs — Here’s What Flaky Tests Actually Cost
&lt;/h1&gt;

&lt;p&gt;Five findings from real CI data. The numbers are worse than you think.&lt;/p&gt;

&lt;p&gt;March 28, 2026 · 9 min read&lt;/p&gt;

&lt;p&gt;We looked at 10,000 workflow runs across GitHub Actions repos. Not a survey. Not opinions. Actual CI run data — pass/fail outcomes, rerun patterns, timing distributions, and cost estimates.&lt;/p&gt;

&lt;p&gt;Here’s what the data says about flaky tests.&lt;/p&gt;

&lt;h2&gt;
  
  
  Finding 1: 30% of CI reruns are caused by flaky tests, not real bugs
&lt;/h2&gt;

&lt;p&gt;Across the dataset, nearly one in three workflow reruns was triggered by a test that passed on the second attempt with no code changes. The failure wasn’t a real bug — it was noise.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Total workflow runs analyzed&lt;/td&gt;
&lt;td&gt;10,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Runs that were reruns&lt;/td&gt;
&lt;td&gt;2,140 (21.4%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reruns caused by flaky tests (passed on retry, no code change)&lt;/td&gt;
&lt;td&gt;642 (30% of reruns)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Share of total CI compute wasted on flaky reruns&lt;/td&gt;
&lt;td&gt;15–25% depending on repo&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Most teams don’t realize the scale because reruns “just work” on the second try. The failure disappears. Nobody files a bug. The cost accrues silently.&lt;/p&gt;
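&lt;p&gt;The classification rule behind this finding (a rerun that passes on the same commit after an earlier failure) can be sketched in a few lines. This is an illustration under assumptions, not our actual pipeline: the &lt;code&gt;RunAttempt&lt;/code&gt; shape and the &lt;code&gt;countFlakyReruns&lt;/code&gt; name are hypothetical, and the sample data is made up.&lt;/p&gt;

```typescript
// Hypothetical shape for one workflow run attempt; adapt it to your CI export.
interface RunAttempt {
  commitSha: string;
  attempt: number; // 1 for the first run, 2 or more for reruns
  conclusion: "success" | "failure";
}

// A rerun counts as flaky when it succeeds on a commit that already had
// a failed attempt: same code, different outcome.
function countFlakyReruns(attempts: RunAttempt[]): number {
  const ordered = attempts.slice().sort((a, b) => a.attempt - b.attempt);
  const failedShas: string[] = [];
  let flakyReruns = 0;
  for (const run of ordered) {
    if (run.conclusion === "failure") {
      failedShas.push(run.commitSha);
    } else if (run.attempt > 1) {
      if (failedShas.indexOf(run.commitSha) !== -1) flakyReruns += 1;
    }
  }
  return flakyReruns;
}

const sampleAttempts: RunAttempt[] = [
  { commitSha: "a1", attempt: 1, conclusion: "failure" },
  { commitSha: "a1", attempt: 2, conclusion: "success" }, // flaky: passed with no code change
  { commitSha: "b2", attempt: 1, conclusion: "success" },
  { commitSha: "c3", attempt: 1, conclusion: "failure" },
  { commitSha: "c3", attempt: 2, conclusion: "failure" }, // still failing: more likely a real bug
];

const flakyRerunCount = countFlakyReruns(sampleAttempts); // 1
```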

&lt;h2&gt;
  
  
  Finding 2: The average flaky test costs $37.50 per occurrence
&lt;/h2&gt;

&lt;p&gt;We calculated the cost per flaky occurrence using a conservative model:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Cost component&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CI wait time (rerun)&lt;/td&gt;
&lt;td&gt;20 min&lt;/td&gt;
&lt;td&gt;$0.16 compute&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Developer context switch + investigation&lt;/td&gt;
&lt;td&gt;10 min&lt;/td&gt;
&lt;td&gt;$12.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Focus recovery (research avg: 23 min to regain deep work)&lt;/td&gt;
&lt;td&gt;~20 min&lt;/td&gt;
&lt;td&gt;$25.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total per flaky occurrence&lt;/td&gt;
&lt;td&gt;~30 min of developer time&lt;/td&gt;
&lt;td&gt;~$37.50&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

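&lt;p&gt;At a fully loaded engineering cost of $75/hr, the 30 minutes of developer time (10 investigating plus roughly 20 regaining focus) comes to $37.50; the $0.16 of compute barely registers. That’s per occurrence: per developer, per flake.&lt;/p&gt;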

&lt;p&gt;A single test that flakes 3 times a week costs &lt;strong&gt;$5,850 per year&lt;/strong&gt;. Most repos have more than one flaky test.&lt;/p&gt;
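&lt;p&gt;For readers who want to check the arithmetic, here is the same model as a small sketch. The dollar and minute values come from the table above; the &lt;code&gt;FlakeCostModel&lt;/code&gt; type and the function names are our own illustrative choices, and compute cost is left out, as it is in the table’s total.&lt;/p&gt;

```typescript
// Inputs for the developer-time part of the cost model.
interface FlakeCostModel {
  hourlyRateUsd: number;        // fully loaded engineering cost per hour
  investigationMinutes: number; // context switch + investigation
  recoveryMinutes: number;      // time to regain focus
}

const articleCostModel: FlakeCostModel = {
  hourlyRateUsd: 75,
  investigationMinutes: 10,
  recoveryMinutes: 20,
};

// Developer-time cost of a single flaky occurrence.
function costPerOccurrence(model: FlakeCostModel): number {
  const wastedHours = (model.investigationMinutes + model.recoveryMinutes) / 60;
  return wastedHours * model.hourlyRateUsd;
}

// Annual cost of a test that flakes `flakesPerWeek` times a week, over 52 weeks.
function annualFlakeCost(model: FlakeCostModel, flakesPerWeek: number): number {
  return costPerOccurrence(model) * flakesPerWeek * 52;
}

const perOccurrenceUsd = costPerOccurrence(articleCostModel); // 37.5
const threePerWeekUsd = annualFlakeCost(articleCostModel, 3); // 5850
```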

&lt;p&gt;Run the math on your own team&lt;/p&gt;

&lt;p&gt;These are averages. Your numbers may be better or worse. &lt;a href="https://dev.to/blog/calculator"&gt;Use the flaky test cost calculator&lt;/a&gt; to plug in your team’s actual CI duration, failure rate, and hourly cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  Finding 3: 80% of CI waste comes from the top 3 tests
&lt;/h2&gt;

&lt;p&gt;The Pareto principle applies hard. In repo after repo, the same pattern emerges: a tiny handful of tests cause the vast majority of flaky reruns.&lt;/p&gt;

&lt;p&gt;Typical “worst offenders” breakdown (composite example)&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;th&gt;Test&lt;/th&gt;
&lt;th&gt;Flake rate&lt;/th&gt;
&lt;th&gt;Weekly reruns&lt;/th&gt;
&lt;th&gt;Annual cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;#1&lt;/td&gt;
&lt;td&gt;&lt;code&gt;checkout.e2e → “applies discount code”&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;18%&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;$13,650&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;#2&lt;/td&gt;
&lt;td&gt;&lt;code&gt;auth.integration → “refreshes expired token”&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;12%&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;$7,800&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;#3&lt;/td&gt;
&lt;td&gt;&lt;code&gt;dashboard.render → “loads within 3s”&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;8%&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;$5,850&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;All other tests combined&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&amp;lt;3%&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;$5,400&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Fix three tests and you eliminate &lt;strong&gt;80% of the waste&lt;/strong&gt;. That’s not a quarter-long initiative — it’s a week of focused work with an outsized return.&lt;/p&gt;

&lt;p&gt;The challenge is knowing &lt;em&gt;which&lt;/em&gt; three. Most teams are guessing based on gut feel or recent Slack complaints. The data tells a different story.&lt;/p&gt;
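&lt;p&gt;Once you have a per-test cost estimate, finding the worst offenders is a sort and a sum. The sketch below uses made-up test names and costs purely for illustration; &lt;code&gt;FlakyTestCost&lt;/code&gt; and &lt;code&gt;topOffenderShare&lt;/code&gt; are hypothetical names, not a Kleore API.&lt;/p&gt;

```typescript
interface FlakyTestCost {
  testName: string;
  annualCostUsd: number;
}

// Share of the total flaky-test cost (0 to 1) that comes from the
// `topN` most expensive tests.
function topOffenderShare(tests: FlakyTestCost[], topN: number): number {
  const sorted = tests.slice().sort((a, b) => b.annualCostUsd - a.annualCostUsd);
  const totalCost = sorted.reduce((sum, t) => sum + t.annualCostUsd, 0);
  if (totalCost === 0) return 0;
  const topCost = sorted.slice(0, topN).reduce((sum, t) => sum + t.annualCostUsd, 0);
  return topCost / totalCost;
}

// Hypothetical numbers, chosen only to make the arithmetic easy to follow.
const exampleCosts: FlakyTestCost[] = [
  { testName: "test A", annualCostUsd: 200 },
  { testName: "test B", annualCostUsd: 600 },
  { testName: "test C", annualCostUsd: 50 },
  { testName: "test D", annualCostUsd: 150 },
];

const topTwoShare = topOffenderShare(exampleCosts, 2); // (600 + 200) / 1000 = 0.8
```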

&lt;h2&gt;
  
  
  Finding 4: Weekend and off-hours failures are the strongest flakiness signal
&lt;/h2&gt;

&lt;p&gt;This was the most useful pattern in the dataset. Tests that fail more frequently on weekends and outside business hours are almost certainly flaky — because nobody is pushing code at 3 AM on a Saturday.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Detection signal&lt;/th&gt;
&lt;th&gt;Precision&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Weekend / off-hours failure spike&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;No human code changes to explain the failure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Passes on rerun with no diff&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Same code, different outcome = non-deterministic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High failure rate alone&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Could be a real bug that nobody has fixed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;“Known flaky” labels in code&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Incomplete, outdated, self-reported&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Time-of-day and day-of-week patterns are more reliable than raw failure rate because they separate flakiness from “tests that are genuinely broken.” A test with a 40% failure rate might just be broken. A test that fails 10% of the time, but only on weekends when no code is changing, is almost certainly flaky.&lt;/p&gt;
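&lt;p&gt;A rough version of this signal is easy to compute from failure timestamps. The sketch below is an assumption-laden illustration: the &lt;code&gt;TestFailure&lt;/code&gt; shape, the function names, and the definition of off-hours (weekends, or before 09:00 and from 18:00 onward, in UTC) are ours, and a real detector would also compare against how many runs happen off-hours in the first place.&lt;/p&gt;

```typescript
interface TestFailure {
  testName: string;
  failedAt: Date;
}

// Off-hours here means Saturday or Sunday, or outside 09:00 to 18:00, all in UTC.
// These cut-offs are assumptions; adjust them to your team's hours and time zone.
function isOffHours(timestamp: Date): boolean {
  const day = timestamp.getUTCDay(); // 0 = Sunday, 6 = Saturday
  const hour = timestamp.getUTCHours();
  return day === 0 || day === 6 || 9 > hour || hour >= 18;
}

// Fraction of one test's failures that happened off-hours (0 when it never failed).
function offHoursFailureShare(failures: TestFailure[], testName: string): number {
  const own = failures.filter((f) => f.testName === testName);
  if (own.length === 0) return 0;
  return own.filter((f) => isOffHours(f.failedAt)).length / own.length;
}

// Made-up sample: 2024-01-06 was a Saturday, 2024-01-08 a Monday.
const sampleFailures: TestFailure[] = [
  { testName: "applies discount code", failedAt: new Date("2024-01-06T03:00:00Z") },
  { testName: "applies discount code", failedAt: new Date("2024-01-08T14:00:00Z") },
  { testName: "refreshes expired token", failedAt: new Date("2024-01-08T20:00:00Z") },
];

const discountOffHoursShare = offHoursFailureShare(sampleFailures, "applies discount code"); // 0.5
```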

&lt;h2&gt;
  
  
  Finding 5: Quarantining flaky tests cuts CI reruns by 60% within 2 weeks
&lt;/h2&gt;

&lt;p&gt;Teams that quarantine their worst flaky tests — isolating them so they run separately and don’t block the main CI pipeline — see immediate results.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After quarantine (2 weeks)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CI reruns per week&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avg PR merge time&lt;/td&gt;
&lt;td&gt;4.2 hours&lt;/td&gt;
&lt;td&gt;2.1 hours&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Developer trust in CI (survey)&lt;/td&gt;
&lt;td&gt;3.1 / 5&lt;/td&gt;
&lt;td&gt;4.4 / 5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Quarantine works because it stops the bleeding immediately. The flaky test still runs — it just doesn’t block merges while you fix the root cause. The best quarantine systems auto-unquarantine when a test passes consistently for a configurable window, so tests don’t get permanently sidelined.&lt;/p&gt;
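&lt;p&gt;The auto-unquarantine rule can be as simple as “the last N runs all passed.” A minimal sketch, assuming you keep each quarantined test’s recent outcomes in order from oldest to newest; &lt;code&gt;canUnquarantine&lt;/code&gt; and &lt;code&gt;requiredPasses&lt;/code&gt; are hypothetical names standing in for the configurable window:&lt;/p&gt;

```typescript
type RunOutcome = "pass" | "fail";

// Returns true when the most recent `requiredPasses` runs all passed.
// Outcomes are ordered oldest first, newest last.
function canUnquarantine(recentOutcomes: RunOutcome[], requiredPasses: number): boolean {
  // Reject a non-positive window, and wait until there is enough history.
  if (1 > requiredPasses || requiredPasses > recentOutcomes.length) return false;
  return recentOutcomes.slice(-requiredPasses).every((outcome) => outcome === "pass");
}

const readyToReturn = canUnquarantine(["fail", "pass", "pass", "pass"], 3); // true
const stillFlaky = canUnquarantine(["pass", "fail", "pass"], 3);            // false
const tooLittleHistory = canUnquarantine(["pass", "pass"], 3);              // false
```

&lt;p&gt;A larger window is safer for tests with a low flake rate: a test that fails 10% of the time will still pass three runs in a row fairly often.&lt;/p&gt;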

&lt;p&gt;The psychological effect matters too. When CI goes green reliably, developers stop reflexively re-running and start trusting the signal again. That trust compounds.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you can do about it
&lt;/h2&gt;

&lt;p&gt;The data points to a clear playbook:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Measure the damage.&lt;/strong&gt; You can’t prioritize fixes without knowing which tests are flaky and what they cost. Guessing based on Slack noise doesn’t work — the loudest complaints don’t always point to the most expensive tests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fix the top 3.&lt;/strong&gt; The Pareto distribution means you get 80% of the benefit from fixing a tiny handful of tests. Start there.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quarantine while you fix.&lt;/strong&gt; Don’t let flaky tests block the pipeline while you work on the root cause. Isolate them immediately.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use time-of-day signals.&lt;/strong&gt; Weekend and off-hours failure patterns are the most reliable way to separate flaky from genuinely broken.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Track the trend.&lt;/strong&gt; After you fix or quarantine, make sure the numbers actually improve. Flakiness has a tendency to creep back.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  See your own numbers.
&lt;/h3&gt;

&lt;p&gt;Kleore scans your GitHub Actions history and shows you exactly which tests are flaky, how often they flake, and what they cost in dollars. No configuration. No test framework changes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/apps/kleore" rel="noopener noreferrer"&gt;Scan my repos — free&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/calculator"&gt;Flaky Test Cost Calculator&lt;/a&gt; — Plug in your team’s numbers and see the real cost.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/fix-flaky-tests-github-actions"&gt;How to Fix Flaky Tests in GitHub Actions&lt;/a&gt; — Concrete fixes for the six most common flaky test patterns.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/flaky-test-cost"&gt;How Much Do Flaky Tests Actually Cost?&lt;/a&gt; — The full cost breakdown: compute, developer time, velocity, and trust.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ci</category>
      <category>testing</category>
      <category>github</category>
      <category>devops</category>
    </item>
    <item>
      <title>Kleore vs Alternatives: Honest Comparison</title>
      <dc:creator>Mihir Shinde</dc:creator>
      <pubDate>Fri, 08 May 2026 04:44:57 +0000</pubDate>
      <link>https://dev.to/byteframe/kleore-vs-alternatives-honest-comparison-4gpa</link>
      <guid>https://dev.to/byteframe/kleore-vs-alternatives-honest-comparison-4gpa</guid>
      <description>&lt;h1&gt;
  
  
  Kleore vs Alternatives: Honest Comparison
&lt;/h1&gt;

&lt;p&gt;Picking a flaky test tool? Here’s how Kleore stacks up against BuildPulse, Trunk, and Datadog — including where they beat us.&lt;/p&gt;

&lt;p&gt;March 21, 2026 · 7 min read&lt;/p&gt;

&lt;p&gt;There are a handful of tools that tackle flaky test detection. Each one makes different trade-offs around setup complexity, pricing, CI coverage, and features. We built Kleore because we thought the existing options were either too expensive, too complex, or too narrow. But we’ll let you judge.&lt;/p&gt;

&lt;h2&gt;
  
  
  Feature comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Kleore&lt;/th&gt;
&lt;th&gt;BuildPulse&lt;/th&gt;
&lt;th&gt;Trunk&lt;/th&gt;
&lt;th&gt;Datadog&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Zero-config setup&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No test framework changes&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dollar cost per flaky test&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;~&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub-native (no separate login)&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Quarantine management&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Owner assignment + SLA&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;~&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shareable health reports&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI-powered diagnosis&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auto-fix PRs&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Weekly digest (Slack/email)&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;~&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-CI support&lt;/td&gt;
&lt;td&gt;~&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test-level analytics&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free tier&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;~&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;✓ = supported   ~ = partial/limited   — = not available. Based on publicly available information as of March 2026.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Kleore
&lt;/h3&gt;

&lt;p&gt;Free — $149/mo Pro&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Free: unlimited repos, flaky detection, cost breakdown, health reports&lt;/li&gt;
&lt;li&gt;Pro: quarantine, assignment, SLA tracking, AI diagnosis, auto-fix PRs&lt;/li&gt;
&lt;li&gt;Flat rate — no per-seat or per-repo pricing&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  BuildPulse
&lt;/h3&gt;

&lt;p&gt;From $50/mo&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Priced per-repo&lt;/li&gt;
&lt;li&gt;No free tier&lt;/li&gt;
&lt;li&gt;Requires JUnit XML upload step in CI&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Trunk Flaky Tests
&lt;/h3&gt;

&lt;p&gt;Free — usage-based paid&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Free tier with limits&lt;/li&gt;
&lt;li&gt;Paid tiers based on test runs&lt;/li&gt;
&lt;li&gt;Requires Trunk CLI integration in CI&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Datadog Test Visibility
&lt;/h3&gt;

&lt;p&gt;Part of Datadog CI Visibility&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bundled with Datadog CI module&lt;/li&gt;
&lt;li&gt;Priced per committed test run&lt;/li&gt;
&lt;li&gt;Requires Datadog agent + tracer in CI&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Setup complexity
&lt;/h2&gt;

&lt;p&gt;This is where the tools diverge most sharply. Here’s what setup actually looks like for each:&lt;/p&gt;

&lt;h3&gt;
  
  
  Kleore
&lt;/h3&gt;

&lt;p&gt;~2 minutes&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install the Kleore GitHub App&lt;/li&gt;
&lt;li&gt;Select which repos to scan&lt;/li&gt;
&lt;li&gt;Done — Kleore reads your existing CI runs, no workflow changes needed&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  BuildPulse
&lt;/h3&gt;

&lt;p&gt;~15-30 minutes&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a BuildPulse account&lt;/li&gt;
&lt;li&gt;Add a CI step to upload JUnit XML test results&lt;/li&gt;
&lt;li&gt;Configure test suite grouping&lt;/li&gt;
&lt;li&gt;Modify CI workflow files for each repo&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Trunk
&lt;/h3&gt;

&lt;p&gt;~20-45 minutes&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install Trunk CLI locally and in CI&lt;/li&gt;
&lt;li&gt;Initialize Trunk in your repo&lt;/li&gt;
&lt;li&gt;Configure test runner integration&lt;/li&gt;
&lt;li&gt;Add Trunk upload step to CI workflow&lt;/li&gt;
&lt;li&gt;Set up Trunk dashboard account&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Datadog
&lt;/h3&gt;

&lt;p&gt;~30-60 minutes&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Set up Datadog account with CI Visibility module&lt;/li&gt;
&lt;li&gt;Install Datadog agent in CI runners&lt;/li&gt;
&lt;li&gt;Add language-specific tracer to test framework&lt;/li&gt;
&lt;li&gt;Configure environment variables in CI&lt;/li&gt;
&lt;li&gt;Verify traces are being sent correctly&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Where each tool wins
&lt;/h2&gt;

&lt;p&gt;We’re biased, but we try to be honest. Here’s when each tool is the better choice:&lt;/p&gt;

&lt;h3&gt;
  
  
  Choose Kleore if...
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You want zero-config setup — no CI workflow changes, no test framework modifications&lt;/li&gt;
&lt;li&gt;You need dollar-cost visibility to justify fixing flaky tests to leadership&lt;/li&gt;
&lt;li&gt;You want assignment, SLA tracking, and quarantine in one tool&lt;/li&gt;
&lt;li&gt;You're a small-to-mid team that needs results fast without a long integration project&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Choose BuildPulse if...
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You use multiple CI providers beyond GitHub Actions&lt;/li&gt;
&lt;li&gt;You need deep JUnit XML parsing and test-level timing analytics&lt;/li&gt;
&lt;li&gt;You're already uploading test artifacts and want to layer on flaky detection&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Choose Trunk if...
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You want flaky test detection as part of a broader developer tools suite&lt;/li&gt;
&lt;li&gt;You're already using other Trunk products (linting, merge queues)&lt;/li&gt;
&lt;li&gt;You have engineering bandwidth to manage the CLI integration&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Choose Datadog if...
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You're already a Datadog customer and want everything in one dashboard&lt;/li&gt;
&lt;li&gt;You need test visibility across multiple languages and CI providers&lt;/li&gt;
&lt;li&gt;Cost is not a primary concern — you're optimizing for observability breadth&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The bottom line
&lt;/h2&gt;

&lt;p&gt;If you want to &lt;strong&gt;see your flaky test problem in two minutes&lt;/strong&gt; without modifying a single workflow file, Kleore is the fastest path. Install the GitHub App, and you immediately get a ranked list of every flaky test with dollar costs.&lt;/p&gt;

&lt;p&gt;If you need multi-CI support or are already deep in another tool’s ecosystem, one of the alternatives might be a better fit. But for GitHub-native teams that want results without a setup project, Kleore is purpose-built for you.&lt;/p&gt;

&lt;h3&gt;
  
  
  See for yourself — it takes two minutes.
&lt;/h3&gt;

&lt;p&gt;Install Kleore, pick your repos, and see every flaky test ranked by cost. No credit card. No workflow changes. No vendor lock-in.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/apps/kleore" rel="noopener noreferrer"&gt;Scan my repos — free&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/calculator"&gt;Flaky Test Cost Calculator&lt;/a&gt; — See what flaky tests cost your specific team.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/flaky-test-cost"&gt;How Much Do Flaky Tests Actually Cost?&lt;/a&gt; — The full cost breakdown beyond CI minutes.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/fix-flaky-tests-github-actions"&gt;How to Fix Flaky Tests in GitHub Actions&lt;/a&gt; — Practical code fixes for the most common patterns.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ci</category>
      <category>testing</category>
      <category>github</category>
      <category>devops</category>
    </item>
    <item>
      <title>What Are Flaky Tests? The Silent Killer of CI Pipelines</title>
      <dc:creator>Mihir Shinde</dc:creator>
      <pubDate>Fri, 08 May 2026 04:43:45 +0000</pubDate>
      <link>https://dev.to/byteframe/what-are-flaky-tests-the-silent-killer-of-ci-pipelines-kb6</link>
      <guid>https://dev.to/byteframe/what-are-flaky-tests-the-silent-killer-of-ci-pipelines-kb6</guid>
      <description>&lt;h1&gt;
  
  
  What Are Flaky Tests? The Silent Killer of CI Pipelines
&lt;/h1&gt;

&lt;p&gt;They pass. They fail. Nothing changed. And your team just lost another hour.&lt;/p&gt;

&lt;p&gt;March 21, 2026 · 8 min read&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;flaky test&lt;/strong&gt; is a test that produces different results — pass or fail — when run against the same code. No one touched the source. No dependency changed. Yet the test failed, your build went red, and someone on your team had to stop what they were doing to investigate.&lt;/p&gt;

&lt;p&gt;Thirty minutes later, they re-run the pipeline. It passes. They shrug, merge the PR, and move on — but the damage is already done: time wasted, context lost, and a little more trust eroded in your test suite.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why do tests become flaky?
&lt;/h2&gt;

&lt;p&gt;Flaky tests aren’t random. They have root causes, but those causes are often subtle enough that they don’t surface on every run. The most common culprits:&lt;/p&gt;

&lt;h3&gt;
  
  
  Timing &amp;amp; race conditions
&lt;/h3&gt;

&lt;p&gt;Tests that depend on specific timing — setTimeout, polling intervals, animations — fail when the runner is a few milliseconds slower than expected.&lt;/p&gt;

&lt;h3&gt;
  
  
  Shared state
&lt;/h3&gt;

&lt;p&gt;Tests that read from or write to shared databases, files, or global variables. Run them in a different order and they break.&lt;/p&gt;
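
&lt;p&gt;A minimal Python sketch of that failure mode, using a module-level list as the shared state. The names &lt;code&gt;registry&lt;/code&gt;, &lt;code&gt;check_register_user&lt;/code&gt;, and &lt;code&gt;run_in_order&lt;/code&gt; are made up for illustration.&lt;/p&gt;

```python
registry = []  # module-level state shared by both checks (hypothetical example)


def check_register_user():
    registry.append("alice")
    assert "alice" in registry


def check_registry_starts_empty():
    # Passes only if it runs before check_register_user
    assert registry == []


def run_in_order(checks):
    """Reset the shared state once, run the checks in order, report failures."""
    registry.clear()
    failed = []
    for check in checks:
        try:
            check()
        except AssertionError:
            failed.append(check.__name__)
    return failed


print(run_in_order([check_registry_starts_empty, check_register_user]))
# []
print(run_in_order([check_register_user, check_registry_starts_empty]))
# ['check_registry_starts_empty']
```

&lt;p&gt;Neither check is wrong on its own. The second one only fails because the first one left data behind, which is why resetting state per test, not per run, is the usual fix.&lt;/p&gt;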

&lt;h3&gt;
  
  
  External dependencies
&lt;/h3&gt;

&lt;p&gt;API calls to third-party services, DNS lookups, and network requests that time out intermittently under load.&lt;/p&gt;

&lt;h3&gt;
  
  
  Environment differences
&lt;/h3&gt;

&lt;p&gt;Your test passes locally on macOS but fails on the Linux CI runner due to filesystem case sensitivity, timezone differences, or resource limits.&lt;/p&gt;

&lt;h3&gt;
  
  
  Date &amp;amp; time sensitivity
&lt;/h3&gt;

&lt;p&gt;Tests that compare against "now" or assume a specific day of the week. They fail at midnight, on weekends, or across timezone boundaries.&lt;/p&gt;

&lt;h3&gt;
  
  
  Resource contention
&lt;/h3&gt;

&lt;p&gt;Parallel test runners competing for ports, file locks, or database connections. Works fine sequentially, breaks under concurrency.&lt;/p&gt;
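
&lt;p&gt;One common way to avoid port collisions is to let the operating system choose a free port instead of hard-coding one. The sketch below uses only Python’s standard &lt;code&gt;socket&lt;/code&gt; module; the helper name &lt;code&gt;bind_free_port&lt;/code&gt; is an assumption for illustration.&lt;/p&gt;

```python
import socket


def bind_free_port():
    """Bind to port 0 so the OS picks an unused port for this process."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))
    sock.listen(1)
    return sock, sock.getsockname()[1]


# Two "parallel workers" each get their own port instead of competing for a fixed one
server_a, port_a = bind_free_port()
server_b, port_b = bind_free_port()
print(port_a != port_b)  # True while both sockets stay open
server_a.close()
server_b.close()
```

&lt;p&gt;The same idea applies to files and databases: give each worker its own temporary directory or schema so nothing is shared by accident.&lt;/p&gt;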

&lt;h2&gt;
  
  
  The real cost is invisible
&lt;/h2&gt;

&lt;p&gt;Most teams underestimate flaky tests because the cost is diffuse. It’s not one big outage — it’s a thousand small interruptions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Re-runs burn CI minutes.&lt;/strong&gt; Every retry is compute you’re paying for twice. At scale, this adds up to thousands of dollars per month.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Developer time is the hidden multiplier.&lt;/strong&gt; An engineer investigating a false failure for 20 minutes costs more than the CI compute. Multiply that by every flaky test, every day.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Trust erodes slowly, then all at once.&lt;/strong&gt; Once developers stop trusting the test suite, they start ignoring real failures. That’s when bugs ship to production.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Merge velocity drops.&lt;/strong&gt; PRs sit open longer because the build is “probably just flaky.” Reviews stack up. Shipping slows down.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Industry data point&lt;/p&gt;

&lt;p&gt;Google’s internal research found that roughly &lt;strong&gt;1.5% of all test runs&lt;/strong&gt; across their monorepo were flaky. At Google’s scale, that translated to millions of wasted compute hours per year. Your team is smaller, but the proportional cost can be just as painful.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you know if you have a flaky test problem?
&lt;/h2&gt;

&lt;p&gt;If any of these sound familiar, you already do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✔️ Developers routinely re-run CI without changing code&lt;/li&gt;
&lt;li&gt;✔️ Your team has a Slack message template for “just re-run it”&lt;/li&gt;
&lt;li&gt;✔️ Certain tests are known to be unreliable but no one has time to fix them&lt;/li&gt;
&lt;li&gt;✔️ CI costs have been creeping up and nobody knows exactly why&lt;/li&gt;
&lt;li&gt;✔️ Engineers merge PRs even when CI is red, saying “it’s a known flake”&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What high-performing teams do differently
&lt;/h2&gt;

&lt;p&gt;The best engineering teams don’t just fix flaky tests — they build systems to catch and manage them before they metastasize. Here’s the playbook:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Detect automatically
&lt;/h3&gt;

&lt;p&gt;Don’t wait for developers to report flaky tests in Slack. Analyze CI run history programmatically. A test that fails on one commit but passes on a retry — with no code diff — is flaky. Flag it immediately.&lt;/p&gt;
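
&lt;p&gt;The detection rule above can be sketched in a few lines of Python. The tuple shape &lt;code&gt;(test_name, commit_sha, passed)&lt;/code&gt; is an assumed export format for CI history, and &lt;code&gt;find_flaky_tests&lt;/code&gt; is a hypothetical helper, not a real API.&lt;/p&gt;

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Return test names that both passed and failed on the same commit.

    runs: iterable of (test_name, commit_sha, passed) tuples, a hypothetical
    shape for data exported from CI history.
    """
    outcomes = defaultdict(set)
    for test_name, commit_sha, passed in runs:
        outcomes[(test_name, commit_sha)].add(passed)
    flaky = {name for (name, _sha), seen in outcomes.items() if seen == {True, False}}
    return sorted(flaky)


runs = [
    ("test_checkout", "abc123", False),  # first attempt fails
    ("test_checkout", "abc123", True),   # retry on the same commit passes
    ("test_login", "abc123", True),
    ("test_search", "def456", False),    # fails consistently: broken, not flaky
    ("test_search", "def456", False),
]
print(find_flaky_tests(runs))  # ['test_checkout']
```

&lt;p&gt;Keying on the commit SHA is what separates a flake from a real regression: the code did not change between the failing and the passing attempt.&lt;/p&gt;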

&lt;h3&gt;
  
  
  2. Quantify the damage
&lt;/h3&gt;

&lt;p&gt;Knowing a test is flaky isn’t enough. You need to know how much it’s costing you: in CI minutes, in re-runs, in dollars. That’s what turns a “we should fix this” into a “we need to fix this now.”&lt;/p&gt;
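
&lt;p&gt;A back-of-the-envelope version of that calculation might look like the Python sketch below. Every input is a made-up assumption (rerun count, pipeline length, CI price, triage time, engineer rate); substitute your own figures.&lt;/p&gt;

```python
def monthly_flake_cost(reruns, minutes_per_run, dollars_per_ci_minute,
                       triage_minutes_per_rerun, dollars_per_engineer_hour):
    """Estimate the monthly cost of one flaky test (all inputs are assumptions)."""
    compute = reruns * minutes_per_run * dollars_per_ci_minute
    people = reruns * triage_minutes_per_rerun / 60 * dollars_per_engineer_hour
    return {
        "compute": round(compute, 2),
        "people": round(people, 2),
        "total": round(compute + people, 2),
    }


# Hypothetical inputs: 40 reruns a month, a 12-minute pipeline, $0.008 per CI
# minute, 20 minutes of triage per rerun, $90 per engineer hour.
cost = monthly_flake_cost(40, 12, 0.008, 20, 90)
print(cost)
```

&lt;p&gt;With these placeholder numbers the engineer time is far larger than the compute bill, so a ranking based on CI minutes alone would understate the problem.&lt;/p&gt;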

&lt;h3&gt;
  
  
  3. Assign ownership
&lt;/h3&gt;

&lt;p&gt;Flaky tests without owners don’t get fixed. Assign each flaky test to a person with an SLA. Track resolution like you track incidents.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Quarantine strategically
&lt;/h3&gt;

&lt;p&gt;While a fix is in progress, quarantine the test so it stops blocking other developers. But quarantine with an expiration date — otherwise it becomes a graveyard.&lt;/p&gt;
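
&lt;p&gt;One way to enforce an expiration date is to store it next to the quarantined test and check it on every run. This is a minimal Python sketch; the &lt;code&gt;QUARANTINE&lt;/code&gt; mapping and &lt;code&gt;is_quarantined&lt;/code&gt; helper are hypothetical, and the dates are placeholders.&lt;/p&gt;

```python
from datetime import date

# Hypothetical quarantine list: test ID -> date the quarantine expires
QUARANTINE = {
    "tests/test_checkout.py::test_apply_discount": date(2026, 4, 15),
}


def is_quarantined(test_id, today):
    """A test is quarantined only until its expiry date; after that it blocks again."""
    expires = QUARANTINE.get(test_id)
    return expires is not None and expires >= today


test_id = "tests/test_checkout.py::test_apply_discount"
print(is_quarantined(test_id, date(2026, 4, 1)))   # True: still inside the window
print(is_quarantined(test_id, date(2026, 4, 30)))  # False: expired, failures block merges again
```

&lt;p&gt;Because the expiry is data and not a comment in the test file, an unfixed test starts blocking again on its own instead of quietly staying sidelined.&lt;/p&gt;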

&lt;h3&gt;
  
  
  5. Measure improvement over time
&lt;/h3&gt;

&lt;p&gt;Track flaky test count and cost week over week. If the trend isn’t going down, your process isn’t working.&lt;/p&gt;

&lt;h3&gt;
  
  
  This is exactly what Kleore does.
&lt;/h3&gt;

&lt;p&gt;Kleore connects to your GitHub repos, analyzes your CI history, and shows you every flaky test — ranked by cost. Assign owners, quarantine tests, track your burn-down. Two-minute setup, no config changes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/apps/kleore" rel="noopener noreferrer"&gt;Scan my repos — free&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/flaky-test-cost"&gt;How Much Do Flaky Tests Actually Cost?&lt;/a&gt; — The dollar math behind CI waste.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/fix-flaky-tests-github-actions"&gt;How to Fix Flaky Tests in GitHub Actions&lt;/a&gt; — Practical patterns for the most common root causes.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/calculator"&gt;Flaky Test Cost Calculator&lt;/a&gt; — See what flaky tests cost your specific team.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ci</category>
      <category>testing</category>
      <category>github</category>
      <category>devops</category>
    </item>
    <item>
      <title>How to Find and Fix Flaky Tests in pytest</title>
      <dc:creator>Mihir Shinde</dc:creator>
      <pubDate>Fri, 17 Apr 2026 17:05:53 +0000</pubDate>
      <link>https://dev.to/byteframe/how-to-find-and-fix-flaky-tests-in-pytest-1p9a</link>
      <guid>https://dev.to/byteframe/how-to-find-and-fix-flaky-tests-in-pytest-1p9a</guid>
      <description>&lt;h1&gt;
  
  
  How to Find and Fix Flaky Tests in pytest
&lt;/h1&gt;

&lt;p&gt;Database state, network calls, import side effects — the most common causes of Python test flakiness and how to eliminate each one.&lt;/p&gt;

&lt;p&gt;March 28, 2026 · 14 min read&lt;/p&gt;

&lt;p&gt;pytest is the gold standard for Python testing. Its fixture system, plugin ecosystem, and clean syntax make it a joy to write tests with. But those same powerful features — especially fixtures with broad scopes and plugin interactions — can introduce subtle flakiness that only shows up in CI.&lt;/p&gt;

&lt;p&gt;This guide covers the most common patterns behind flaky pytest tests and gives you concrete fixes with real code. Whether you’re dealing with database state leaks, time-dependent assertions, or mysterious import side effects, you’ll find the solution here.&lt;/p&gt;

&lt;p&gt;Want to skip the guesswork?&lt;/p&gt;

&lt;p&gt;Instead of hunting through CI logs manually, &lt;a href="https://github.com/apps/kleore" rel="noopener noreferrer"&gt;Kleore analyzes your CI history&lt;/a&gt; and ranks every flaky test by failure rate and cost — so you fix the worst ones first.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why pytest tests become flaky
&lt;/h2&gt;

&lt;p&gt;Python’s dynamic nature and pytest’s powerful fixture system create unique flakiness vectors that don’t exist in more constrained testing frameworks. Here are the five most common root causes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Database state leaking between tests&lt;/strong&gt; — Tests share a database and don’t properly isolate transactions. Test A creates a record, Test B doesn’t expect it to exist.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;File system conflicts&lt;/strong&gt; — Tests write to the same files or directories. Parallel execution causes race conditions on file reads/writes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network calls to real services&lt;/strong&gt; — Tests make HTTP requests to external APIs that are slow, rate-limited, or occasionally down.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Import side effects&lt;/strong&gt; — Python modules that execute code at import time (database connections, config loading, signal handlers) create hidden coupling between tests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test ordering dependencies&lt;/strong&gt; — Test B only passes when Test A runs first because A sets up state that B implicitly relies on.&lt;/li&gt;
&lt;/ol&gt;
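
&lt;p&gt;Import side effects are the least obvious item on that list, so here is a small Python sketch of the usual remedy: create expensive objects on first use and not at import time. &lt;code&gt;FakeClient&lt;/code&gt;, &lt;code&gt;get_client&lt;/code&gt;, and the &lt;code&gt;db://test&lt;/code&gt; URL are stand-ins invented for this example.&lt;/p&gt;

```python
import functools

CONNECTIONS_OPENED = []  # stands in for real side effects such as DB connections


class FakeClient:
    """Hypothetical client whose constructor has a visible side effect."""

    def __init__(self, url):
        CONNECTIONS_OPENED.append(url)
        self.url = url


# Risky: a module-level `client = FakeClient(...)` would run at import time,
# before any fixture can configure or isolate it.

@functools.lru_cache(maxsize=None)
def get_client():
    """Create the client on first use instead of at import time."""
    return FakeClient("db://test")


print(CONNECTIONS_OPENED)  # []: nothing happened at import
get_client()
get_client()
print(CONNECTIONS_OPENED)  # ['db://test']: created once, on demand
get_client.cache_clear()   # tests can reset the cached client between runs
```

&lt;p&gt;With lazy creation, a fixture can point the client at a test database or clear the cache before each test, which is not possible once the object has been built during import.&lt;/p&gt;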

&lt;h2&gt;
  
  
  How to identify flaky pytest tests
&lt;/h2&gt;

&lt;p&gt;pytest’s plugin ecosystem includes several tools specifically designed to flush out non-deterministic tests.&lt;/p&gt;

&lt;h3&gt;
  
  
  pytest-randomly: Shuffle test order
&lt;/h3&gt;

&lt;p&gt;This is the most effective way to find tests with hidden ordering dependencies. &lt;code&gt;pytest-randomly&lt;/code&gt; shuffles the order of test modules, classes, and functions on every run. When a test passes in the default order but fails under a particular shuffle, it most likely depends on state left behind by another test.&lt;/p&gt;

&lt;p&gt;Install and use pytest-randomly&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;pytest-randomly

&lt;span class="c"&gt;# Run with randomized order (enabled by default after install)&lt;/span&gt;
pytest

&lt;span class="c"&gt;# Reproduce a specific failure with the same seed&lt;/span&gt;
pytest &lt;span class="nt"&gt;-p&lt;/span&gt; randomly &lt;span class="nt"&gt;--randomly-seed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;12345

&lt;span class="c"&gt;# Disable randomization temporarily&lt;/span&gt;
pytest &lt;span class="nt"&gt;-p&lt;/span&gt; no:randomly
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  pytest-repeat: Stress-test suspected flakes
&lt;/h3&gt;

&lt;p&gt;Run a specific test many times to confirm it’s non-deterministic.&lt;/p&gt;

&lt;p&gt;Repeat a test to confirm flakiness&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;pytest-repeat

&lt;span class="c"&gt;# Run a test 100 times — if it fails once, it's flaky&lt;/span&gt;
pytest &lt;span class="nt"&gt;--count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;100 tests/test_checkout.py::test_apply_discount

&lt;span class="c"&gt;# Stop on first failure&lt;/span&gt;
pytest &lt;span class="nt"&gt;--count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;100 &lt;span class="nt"&gt;-x&lt;/span&gt; tests/test_checkout.py::test_apply_discount
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  pytest-rerunfailures: Detect and retry
&lt;/h3&gt;

&lt;p&gt;This plugin automatically reruns failed tests. Tests that pass on rerun are flaky by definition. Use this for detection, not as a permanent solution.&lt;/p&gt;

&lt;p&gt;Detect flaky tests with reruns&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;pytest-rerunfailures

&lt;span class="c"&gt;# Rerun failed tests up to 3 times&lt;/span&gt;
pytest &lt;span class="nt"&gt;--reruns&lt;/span&gt; 3

&lt;span class="c"&gt;# Add a delay between reruns (useful for timing-dependent flakes)&lt;/span&gt;
pytest &lt;span class="nt"&gt;--reruns&lt;/span&gt; 3 &lt;span class="nt"&gt;--reruns-delay&lt;/span&gt; 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;

&lt;span class="c1"&gt;# Mark specific tests as expected to flake
&lt;/span&gt;&lt;span class="nd"&gt;@pytest.mark.flaky&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reruns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reruns_delay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_webhook_delivery&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Common patterns and fixes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pattern 1: Database state leaking between tests
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; Tests pass individually but fail when run together. Failures involve unexpected records in the database or unique constraint violations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; Tests create database records that persist across test boundaries. One test’s setup data becomes another test’s pollution.&lt;/p&gt;

&lt;p&gt;Fix — transaction rollback with autouse fixture&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# conftest.py
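&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_engine&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy.orm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sessionmaker&lt;/span&gt;

&lt;span class="c1"&gt;# Assumption: the test database URL is supplied through the environment
&lt;/span&gt;&lt;span class="n"&gt;TEST_DATABASE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TEST_DATABASE_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;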

&lt;span class="nd"&gt;@pytest.fixture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;autouse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;db_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Wrap every test in a transaction that rolls back.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TEST_DATABASE_URL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;connection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;transaction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;begin&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sessionmaker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bind&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;)()&lt;/span&gt;

    &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;

    &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;transaction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rollback&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# For Django projects, use pytest-django's db fixture instead:
&lt;/span&gt;&lt;span class="nd"&gt;@pytest.fixture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;autouse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;enable_db_access&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;The db fixture from pytest-django wraps each test in a transaction and rolls it back.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;

&lt;span class="c1"&gt;# Or opt in per module with pytest-django's marker:
# pytestmark = pytest.mark.django_db
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



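&lt;p&gt;The &lt;code&gt;autouse=True&lt;/code&gt; parameter ensures every test gets isolation automatically, without needing to request the fixture explicitly. This prevents new tests from accidentally skipping isolation. The rollback only covers work done through this session and its connection, so code that opens its own connection and commits will still leak state.&lt;/p&gt;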

&lt;h3&gt;
  
  
  Pattern 2: Time-dependent tests
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; Tests that check expiration, scheduling, or duration fail at certain times of day or run slower in CI than expected.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; Tests use &lt;code&gt;datetime.now()&lt;/code&gt; or &lt;code&gt;time.time()&lt;/code&gt; directly, and their assertions depend on the current time.&lt;/p&gt;

&lt;p&gt;Fix — freeze time with freezegun&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;freezegun
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using freezegun in tests&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;timedelta&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;freezegun&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;freeze_time&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;myapp.auth&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;is_token_expired&lt;/span&gt;

&lt;span class="nd"&gt;@freeze_time&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2025-06-15 12:00:00&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_token_expiration&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expires_in&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hours&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# Still within the hour — not expired
&lt;/span&gt;    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;is_token_expired&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@freeze_time&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2025-06-15 14:00:00&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_token_is_expired&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# Create a token that expired an hour ago
&lt;/span&gt;    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;freeze_time&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2025-06-15 12:00:00&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expires_in&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hours&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# Now it's 2pm — token expired at 1pm
&lt;/span&gt;    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;is_token_expired&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# As a fixture for broader use:
&lt;/span&gt;&lt;span class="nd"&gt;@pytest.fixture&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;frozen_time&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;freeze_time&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2025-01-01 00:00:00&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;frozen&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;frozen&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pattern 3: Network calls to real services
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; Tests fail with &lt;code&gt;ConnectionError&lt;/code&gt;, &lt;code&gt;Timeout&lt;/code&gt;, or &lt;code&gt;429 Too Many Requests&lt;/code&gt;. Failures happen in bursts when the external service has issues.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; Tests make real HTTP requests to APIs you don’t control.&lt;/p&gt;

&lt;p&gt;Fix — mock HTTP with responses library&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;responses
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Mocking HTTP calls&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;responses&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
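&lt;span class="c1"&gt;# Assumes PaymentError is defined alongside charge_customer in myapp.payment
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;myapp.payment&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PaymentError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;charge_customer&lt;/span&gt;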

&lt;span class="nd"&gt;@responses.activate&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_successful_charge&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;POST&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.stripe.com/v1/charges&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ch_test_123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;succeeded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;charge_customer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tok_visa&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;succeeded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nd"&gt;@responses.activate&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_payment_gateway_timeout&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;POST&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.stripe.com/v1/charges&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exceptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Timeout&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raises&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PaymentError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timeout&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;charge_customer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tok_visa&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# For httpx (sync or async), the httpx_mock fixture comes from the pytest-httpx plugin:
# pip install pytest-httpx
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AsyncClient&lt;/span&gt;

&lt;span class="nd"&gt;@pytest.fixture&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;mock_httpx&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;httpx_mock&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;httpx_mock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.example.com/data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;httpx_mock&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pattern 4: File system conflicts
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; Tests fail with &lt;code&gt;FileNotFoundError&lt;/code&gt;, &lt;code&gt;PermissionError&lt;/code&gt;, or produce corrupted output. Especially common with parallel test execution via &lt;code&gt;pytest-xdist&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; Multiple tests read/write the same file paths concurrently.&lt;/p&gt;

&lt;p&gt;Fix — use pytest's tmp_path fixture&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_export_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;tmp_path gives each test a unique temporary directory.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;output_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tmp_path&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;export.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="nf"&gt;export_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;output_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;output_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;header1,header2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;splitlines&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;101&lt;/span&gt;  &lt;span class="c1"&gt;# header + 100 rows
&lt;/span&gt;
    &lt;span class="c1"&gt;# tmp_path is automatically cleaned up after the test
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_config_loading&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Create isolated config files per test.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;config_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tmp_path&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;config.yaml&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;config_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
database:
  host: localhost
  port: 5432
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config_file&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;database&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;host&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;localhost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# For fixtures that need a persistent temp directory across a test class:
&lt;/span&gt;&lt;span class="nd"&gt;@pytest.fixture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scope&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;class&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;shared_tmp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp_path_factory&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tmp_path_factory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mktemp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;shared&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pattern 5: Import side effects
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; Tests fail with errors about database connections already being open, signal handlers being registered twice, or global config having unexpected values.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; Python modules execute code at import time. If a module opens a database connection, registers a signal handler, or modifies global state when imported, that side effect persists for the entire test session.&lt;/p&gt;

&lt;p&gt;Fix — mock at module level or use importlib&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# If the module connects to a database on import:
# myapp/db.py
# connection = psycopg2.connect(DATABASE_URL)  # Runs at import time!
&lt;/span&gt;
&lt;span class="c1"&gt;# Option 1: Mock before import
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;unittest.mock&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MagicMock&lt;/span&gt;

&lt;span class="c1"&gt;# Prevent the real module from connecting
&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;modules&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;psycopg2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MagicMock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;myapp.db&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;get_users&lt;/span&gt;  &lt;span class="c1"&gt;# Now uses mocked connection
&lt;/span&gt;
&lt;span class="c1"&gt;# Option 2: Use importlib for fresh imports
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;importlib&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_with_fresh_module&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;myapp.db&lt;/span&gt;
    &lt;span class="n"&gt;importlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;myapp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Re-executes module code
&lt;/span&gt;    &lt;span class="c1"&gt;# ... test with fresh state
&lt;/span&gt;
&lt;span class="c1"&gt;# Option 3 (best): Refactor to lazy initialization
# myapp/db.py
&lt;/span&gt;&lt;span class="n"&gt;_connection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_connection&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;_connection&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_connection&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;_connection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;_connection&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Quarantining flaky pytest tests
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;pytest-quarantine&lt;/code&gt; plugin lets you mark tests as known-flaky so they don’t block your CI pipeline while you work on fixes.&lt;/p&gt;

&lt;p&gt;pytest-quarantine setup&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;pytest-quarantine

&lt;span class="c"&gt;# Run the suite and save the IDs of the failing tests to a quarantine file&lt;/span&gt;
pytest &lt;span class="nt"&gt;--save-quarantine&lt;/span&gt; quarantine.txt

&lt;span class="c"&gt;# Run tests, treating quarantined tests as expected failures&lt;/span&gt;
pytest &lt;span class="nt"&gt;--quarantine&lt;/span&gt; quarantine.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a more automated approach, &lt;a href="https://github.com/apps/kleore" rel="noopener noreferrer"&gt;Kleore&lt;/a&gt; detects flaky tests automatically from your CI history — no manual tagging needed. It tracks every test that has passed and failed on the same commit, ranks them by impact, and gives you a prioritized fix list with cost estimates.&lt;/p&gt;

&lt;h2&gt;
  
  
  CI configuration tips for pytest
&lt;/h2&gt;

&lt;p&gt;Beyond fixing individual tests, your CI configuration can reduce flakiness across the board.&lt;/p&gt;

&lt;p&gt;pyproject.toml — hardened pytest config&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[tool.pytest.ini_options]&lt;/span&gt;
&lt;span class="c"&gt;# Randomize test order to catch hidden dependencies&lt;/span&gt;
&lt;span class="py"&gt;addopts&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"-p randomly --randomly-seed=last"&lt;/span&gt;

&lt;span class="c"&gt;# Strict markers — prevent typos in marker names&lt;/span&gt;
&lt;span class="py"&gt;markers&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s"&gt;"slow: marks tests as slow (deselect with '-m \"not slow\"')"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"integration: marks integration tests"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"flaky: marks known flaky tests"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;strict_markers&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="c"&gt;# Timeout per test (requires pytest-timeout)&lt;/span&gt;
&lt;span class="py"&gt;timeout&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;

&lt;span class="c"&gt;# Fail on warnings to catch deprecation issues early&lt;/span&gt;
&lt;span class="py"&gt;filterwarnings&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"ignore::DeprecationWarning:third_party_lib.*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;.github/workflows/test.yml — pytest CI config&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;PYTHONDONTWRITEBYTECODE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1"&lt;/span&gt;
      &lt;span class="na"&gt;PYTHONHASHSEED&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0"&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-python@v5&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;python-version-file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.python-version"&lt;/span&gt;
          &lt;span class="na"&gt;cache&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pip"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pip install -r requirements-test.txt&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pytest -x --tb=short -q&lt;/span&gt;
        &lt;span class="c1"&gt;# -x: stop on first failure&lt;/span&gt;
        &lt;span class="c1"&gt;# --tb=short: concise tracebacks&lt;/span&gt;
        &lt;span class="c1"&gt;# -q: quiet output&lt;/span&gt;

  &lt;span class="c1"&gt;# For parallel execution with pytest-xdist:&lt;/span&gt;
  &lt;span class="na"&gt;test-parallel&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-python@v5&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;python-version-file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.python-version"&lt;/span&gt;
          &lt;span class="na"&gt;cache&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pip"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pip install -r requirements-test.txt&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pytest --forked -n auto&lt;/span&gt;
        &lt;span class="c1"&gt;# --forked: each test in its own subprocess (requires the pytest-forked plugin)&lt;/span&gt;
        &lt;span class="c1"&gt;# -n auto: use all available CPUs&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



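&lt;p&gt;Setting &lt;code&gt;PYTHONDONTWRITEBYTECODE=1&lt;/code&gt; stops Python from writing &lt;code&gt;.pyc&lt;/code&gt; files, which avoids bytecode-cache conflicts in parallel runs. &lt;code&gt;PYTHONHASHSEED=0&lt;/code&gt; disables hash randomization, so the iteration order of sets of strings (and anything else that depends on &lt;code&gt;hash()&lt;/code&gt;) is the same on every run, removing one source of order-dependent flakes. Dictionaries already preserve insertion order in Python 3.7+, so the seed does not change their ordering. In the workflow above, both variables are set only on the &lt;code&gt;test&lt;/code&gt; job; add the same &lt;code&gt;env&lt;/code&gt; block to &lt;code&gt;test-parallel&lt;/code&gt; if you want them to apply there too.&lt;/p&gt;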
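
&lt;p&gt;To see what the hash seed actually changes, here is a minimal sketch that prints a set of strings from fresh interpreters started with different &lt;code&gt;PYTHONHASHSEED&lt;/code&gt; values. The &lt;code&gt;set_order&lt;/code&gt; helper and the sample strings are illustrative assumptions, not part of the workflow above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os
import subprocess
import sys

# Illustrative snippet: the iteration order of a set of strings depends on their hashes.
SNIPPET = "print(list({'alpha', 'beta', 'gamma', 'delta', 'epsilon'}))"


def set_order(seed):
    """Run SNIPPET in a fresh interpreter with PYTHONHASHSEED set to seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # A fixed seed gives the same order on every run.
    print(set_order(0) == set_order(0))
    # Different seeds usually, but not always, give different orders.
    print(set_order(1), set_order(2), sep="\n")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If a test only passes under one seed, it is probably depending on set order. Pinning the seed hides that dependency; sorting before comparing is the more durable fix.&lt;/p&gt;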

&lt;h3&gt;
  
  
  Stop guessing which pytest tests are flaky.
&lt;/h3&gt;

&lt;p&gt;Kleore scans your GitHub Actions history and gives you a ranked list of every flaky test — with failure rates, cost estimates, and fix priority. Free to start.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/apps/kleore" rel="noopener noreferrer"&gt;Scan my repos — free&lt;/a&gt; &lt;a href="https://dev.to/blog/calculator"&gt;Calculate my CI waste&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/flaky-tests-jest"&gt;How to Find and Fix Flaky Tests in Jest&lt;/a&gt; — The JavaScript equivalent of this guide.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/fix-flaky-tests-github-actions"&gt;How to Fix Flaky Tests in GitHub Actions&lt;/a&gt; — Framework-agnostic patterns for CI-level flakiness.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/flaky-test-cost"&gt;How Much Do Flaky Tests Actually Cost?&lt;/a&gt; — The dollar math to justify the fix.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/calculator"&gt;Flaky Test Cost Calculator&lt;/a&gt; — Plug in your team’s numbers and see the impact.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ci</category>
      <category>testing</category>
      <category>github</category>
      <category>devops</category>
    </item>
    <item>
      <title>How to Fix Flaky Tests in GitHub Actions</title>
      <dc:creator>Mihir Shinde</dc:creator>
      <pubDate>Fri, 17 Apr 2026 00:36:33 +0000</pubDate>
      <link>https://dev.to/byteframe/how-to-fix-flaky-tests-in-github-actions-3e32</link>
      <guid>https://dev.to/byteframe/how-to-fix-flaky-tests-in-github-actions-3e32</guid>
      <description>&lt;h1&gt;
  
  
  How to Fix Flaky Tests in GitHub Actions
&lt;/h1&gt;

&lt;p&gt;Six patterns that cause 90% of test flakiness — and how to fix each one with concrete code changes.&lt;/p&gt;

&lt;p&gt;March 21, 2026 · 12 min read&lt;/p&gt;

&lt;p&gt;You know the drill: CI goes red, you check the logs, the failure looks unrelated to your changes. You hit re-run. It passes. You merge. And the cycle repeats tomorrow.&lt;/p&gt;

&lt;p&gt;This guide covers the six most common patterns behind flaky tests in GitHub Actions and gives you concrete fixes for each. Not theories — actual code changes and configuration updates you can apply today.&lt;/p&gt;

&lt;p&gt;Before you start fixing&lt;/p&gt;

&lt;p&gt;The first step is knowing &lt;em&gt;which&lt;/em&gt; tests are flaky and &lt;em&gt;how often&lt;/em&gt; they fail. If you’re guessing based on Slack complaints, you’re working blind. &lt;a href="https://github.com/apps/kleore" rel="noopener noreferrer"&gt;Kleore analyzes your CI history&lt;/a&gt; and ranks every flaky test by failure rate and cost — so you fix the worst ones first.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Timing &amp;amp; race conditions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; Test passes locally, fails intermittently in CI. Often involves UI tests, async operations, or anything that waits for a condition to become true.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; GitHub Actions runners have variable performance. A 2-core runner under load is slower than your M3 MacBook. Tests that assume operations complete within a specific window break when the runner is under pressure.&lt;/p&gt;

&lt;p&gt;The fix: Replace fixed waits with condition-based polling.&lt;/p&gt;

&lt;p&gt;Before — fragile timing&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Bad: assumes the element appears within 500ms&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;screen&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Success&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;toBeInTheDocument&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After — condition-based wait&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Good: waits for the condition, not the clock&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;waitFor&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;screen&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Success&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;toBeInTheDocument&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For E2E tests with Playwright or Cypress, use their built-in auto-waiting mechanisms instead of explicit sleeps. For backend tests, poll with exponential backoff rather than sleeping.&lt;/p&gt;
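&lt;p&gt;For the backend case, polling with exponential backoff could look roughly like the sketch below. &lt;code&gt;pollUntil&lt;/code&gt; and its option names are hypothetical, not part of any library, and the default delays are assumptions you would tune for your own suite.&lt;/p&gt;

```javascript
// Hypothetical helper (not from any library): polls `check` until it returns
// a truthy value, doubling the delay between attempts up to `maxDelayMs`.
async function pollUntil(check, options = {}) {
  const { timeoutMs = 5000, initialDelayMs = 25, maxDelayMs = 500 } = options;
  const deadline = Date.now() + timeoutMs;
  let delayMs = initialDelayMs;

  for (;;) {
    const result = await check();
    if (result) return result;

    const remainingMs = deadline - Date.now();
    if (remainingMs > 0) {
      // Sleep for the current delay, but never past the deadline.
      await new Promise((resolve) => setTimeout(resolve, Math.min(delayMs, remainingMs)));
      delayMs = Math.min(delayMs * 2, maxDelayMs);
    } else {
      throw new Error(`pollUntil: condition not met within ${timeoutMs}ms`);
    }
  }
}
```

&lt;p&gt;In a test, &lt;code&gt;check&lt;/code&gt; might query a job table and return the row once its status is done. The test then finishes as soon as the condition holds instead of always paying a fixed sleep.&lt;/p&gt;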

&lt;h2&gt;
  
  
  2. Shared mutable state
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; Test passes in isolation (&lt;code&gt;it.only&lt;/code&gt;) but fails when run with the full suite. Or it fails only when a specific other test runs before it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; Tests share a database, in-memory store, filesystem, or global variable. Test A writes data that Test B doesn’t expect, or Test A forgets to clean up.&lt;/p&gt;

&lt;p&gt;The fix: Isolate test state completely.&lt;/p&gt;

&lt;p&gt;Database isolation&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Run each test in a transaction that rolls back&lt;/span&gt;
&lt;span class="nf"&gt;beforeEach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;BEGIN&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nf"&gt;afterEach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ROLLBACK&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Unique identifiers per test&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Instead of hardcoding IDs that collide:&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`test-user-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randomUUID&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createUser&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Test&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you’re using a shared test database, consider running each test file in its own database schema or using containers. The small overhead is worth the determinism.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. External service dependencies
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; Tests fail with network timeouts, 503 errors, or rate-limit responses. Usually happens in bursts (when the external service has issues).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; Your tests make real HTTP calls to APIs you don’t control — payment gateways, auth providers, third-party data services.&lt;/p&gt;

&lt;p&gt;The fix: Mock at the HTTP boundary, not the function level.&lt;/p&gt;

&lt;p&gt;MSW (Mock Service Worker) approach&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;HttpResponse&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;msw&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;setupServer&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;msw/node&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;setupServer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://api.stripe.com/v1/charges&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;HttpResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ch_test_123&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;succeeded&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nf"&gt;beforeAll&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listen&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="nf"&gt;afterEach&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resetHandlers&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="nf"&gt;afterAll&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use MSW or similar tools to intercept HTTP at the network level. This tests your actual HTTP client code (headers, serialization, error handling) while eliminating network flakiness. Reserve real API calls for a small set of integration tests that run separately.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Environment differences
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; Tests pass on macOS, fail on Linux. Or pass with Node 20, fail with Node 22. Or pass Monday through Friday, fail on weekends.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; Assumptions baked into tests about the OS, timezone, locale, filesystem behavior, or available system resources.&lt;/p&gt;

&lt;p&gt;The fix: Pin your CI environment explicitly.&lt;/p&gt;

&lt;p&gt;.github/workflows/test.yml&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;TZ&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;UTC&lt;/span&gt;
      &lt;span class="na"&gt;LC_ALL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;C.UTF-8&lt;/span&gt;
      &lt;span class="na"&gt;NODE_ENV&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-node@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;node-version-file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.node-version"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm ci&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key practices: always set &lt;code&gt;TZ=UTC&lt;/code&gt;, use a &lt;code&gt;.node-version&lt;/code&gt; file instead of hardcoding versions, and test with the same OS as production. If your tests compare file paths, normalize separators.&lt;/p&gt;
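&lt;p&gt;Normalizing path separators can be a one-line helper. This is a minimal sketch using Node’s built-in &lt;code&gt;path&lt;/code&gt; module; the name &lt;code&gt;toPosixPath&lt;/code&gt; is hypothetical.&lt;/p&gt;

```javascript
const path = require("node:path");

// Converts the platform's separator ("\\" on Windows, "/" elsewhere) to "/"
// so path assertions compare equal on every OS.
function toPosixPath(p) {
  return p.split(path.sep).join(path.posix.sep);
}

// Builds the path with the platform separator, then normalizes it.
console.log(toPosixPath(path.join("src", "utils", "date.js"))); // src/utils/date.js
```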

&lt;h2&gt;
  
  
  5. Port &amp;amp; resource conflicts
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; &lt;code&gt;EADDRINUSE&lt;/code&gt; errors, database connection failures, or file lock errors. Happens especially when tests run in parallel.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; Multiple test processes or test files trying to bind to the same port, open the same file, or connect to the same database concurrently.&lt;/p&gt;

&lt;p&gt;The fix: Use dynamic port allocation.&lt;/p&gt;

&lt;p&gt;Dynamic port allocation&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Instead of: server.listen(3000)&lt;/span&gt;
&lt;span class="c1"&gt;// Use port 0 to let the OS assign an available port&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;address&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;AddressInfo&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;port&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Pass port to your test client&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createTestClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`http://localhost:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;port&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For database tests, use unique database names per test worker or use Docker containers. For file-based tests, use &lt;code&gt;os.tmpdir()&lt;/code&gt; with random suffixes.&lt;/p&gt;
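&lt;p&gt;For the file-based case, Node’s &lt;code&gt;fs.mkdtemp&lt;/code&gt; already appends random characters to a prefix, so a per-test scratch directory might be sketched as follows. The helper names &lt;code&gt;makeScratchDir&lt;/code&gt; and &lt;code&gt;removeScratchDir&lt;/code&gt; are hypothetical.&lt;/p&gt;

```javascript
const fs = require("node:fs/promises");
const os = require("node:os");
const path = require("node:path");

// Creates a unique scratch directory under the OS temp dir.
// fs.mkdtemp appends six random characters to the given prefix.
async function makeScratchDir(prefix = "test-") {
  return fs.mkdtemp(path.join(os.tmpdir(), prefix));
}

// Removes the directory and everything in it.
// `force: true` makes this a no-op if the directory is already gone.
async function removeScratchDir(dir) {
  await fs.rm(dir, { recursive: true, force: true });
}
```

&lt;p&gt;Calling &lt;code&gt;makeScratchDir&lt;/code&gt; in &lt;code&gt;beforeEach&lt;/code&gt; and &lt;code&gt;removeScratchDir&lt;/code&gt; in &lt;code&gt;afterEach&lt;/code&gt; gives each test its own directory, so parallel workers never touch the same files.&lt;/p&gt;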

&lt;h2&gt;
  
  
  6. Test order dependency
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; Tests pass when run in the default order, fail when randomized or when a specific test file is skipped.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; Test A sets up state that Test B implicitly depends on. When A doesn’t run first, B fails.&lt;/p&gt;

&lt;p&gt;The fix: Make every test self-contained.&lt;/p&gt;

&lt;p&gt;Self-contained test&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;checkout flow&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Each test creates its own state from scratch&lt;/span&gt;
  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;applies discount code&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Setup: create the user, cart, and product for this test&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createTestUser&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;product&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createTestProduct&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cart&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createCart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

    &lt;span class="c1"&gt;// Act&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;applyDiscount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cart&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;SAVE20&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Assert&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;total&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Enable test randomization to catch these issues early. Jest supports &lt;code&gt;--randomize&lt;/code&gt;, and Vitest can be configured with &lt;code&gt;sequence.shuffle: true&lt;/code&gt;. If your tests slow down from redundant setup, invest in fast factory functions — not shared state.&lt;/p&gt;

&lt;h2&gt;
  
  
  The meta-fix: Retry as a bandaid, not a cure
&lt;/h2&gt;

&lt;p&gt;GitHub Actions lets you re-run failed jobs, and community actions such as &lt;code&gt;nick-fields/retry&lt;/code&gt; can retry a single step automatically. Many teams add retry logic as a first response:&lt;/p&gt;

&lt;p&gt;Retry step (bandaid)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nick-fields/retry@v3&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;max_attempts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
    &lt;span class="na"&gt;timeout_minutes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is fine as a &lt;strong&gt;short-term bandaid&lt;/strong&gt; while you fix the root cause. But retrying hides the problem. A test that fails 30% of the time and gets up to 3 attempts will &lt;em&gt;appear&lt;/em&gt; to pass about 97.3% of the time (all three attempts fail only 0.3 × 0.3 × 0.3 = 2.7% of the time, assuming independent attempts). Meanwhile it still costs about 1.4x the CI minutes on average, up to 3x on a bad run, and masks the underlying issue.&lt;/p&gt;
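&lt;p&gt;The retry arithmetic is easy to check. This is a minimal sketch for a 30% failure rate and 3 total attempts (matching &lt;code&gt;max_attempts: 3&lt;/code&gt;), assuming each attempt fails independently with the same probability. Real flakes are often correlated, so treat the result as an optimistic estimate. The function names are hypothetical.&lt;/p&gt;

```javascript
// Probability that at least one of `maxAttempts` independent attempts passes.
function passRateWithRetries(failureRate, maxAttempts) {
  return 1 - failureRate ** maxAttempts;
}

// Expected number of attempts actually run: attempt k + 1 only happens
// when the first k attempts all failed, which has probability failureRate ** k.
function expectedAttempts(failureRate, maxAttempts) {
  return Array.from({ length: maxAttempts }, (_, k) => failureRate ** k)
    .reduce((sum, p) => sum + p, 0);
}

console.log(passRateWithRetries(0.3, 3).toFixed(3)); // 0.973
console.log(expectedAttempts(0.3, 3).toFixed(2)); // 1.39
```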

&lt;p&gt;&lt;strong&gt;Retry to unblock your team today. Fix the root cause this sprint.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How to prioritize which tests to fix first
&lt;/h2&gt;

&lt;p&gt;Not all flaky tests are equal. A test that flakes once a month is annoying. A test that flakes daily on your critical path is an emergency. Prioritize by:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Failure frequency&lt;/strong&gt; — How often does it flake? Daily flakes first.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blast radius&lt;/strong&gt; — Does it block all PRs, or just one workflow?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost per failure&lt;/strong&gt; — Long test suites cost more per re-run.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fix complexity&lt;/strong&gt; — Can you fix it in an hour, or does it need a refactor?&lt;/li&gt;
&lt;/ol&gt;
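&lt;p&gt;One way to combine the four factors above into a single ranking is sketched below. The formula and field names are assumptions for illustration, not a standard metric and not how any particular tool scores tests. It takes the weekly minutes lost, scales them by blast radius, and divides by the estimated fix effort.&lt;/p&gt;

```javascript
// Hypothetical score: weekly re-run minutes lost, multiplied by the number
// of workflows the test blocks, divided by the estimated hours to fix it.
function priorityScore({ flakesPerWeek, rerunMinutes, blockedWorkflows, fixHours }) {
  const weeklyMinutesLost = flakesPerWeek * rerunMinutes * blockedWorkflows;
  // Treat any fix under an hour as one hour to avoid dividing by tiny values.
  return weeklyMinutesLost / Math.max(fixHours, 1);
}

function rankFlakyTests(tests) {
  // Copy before sorting so the caller's array is not mutated.
  return [...tests].sort((a, b) => priorityScore(b) - priorityScore(a));
}

// Made-up sample data.
const ranked = rankFlakyTests([
  { name: "checkout e2e", flakesPerWeek: 10, rerunMinutes: 12, blockedWorkflows: 3, fixHours: 8 },
  { name: "date formatter", flakesPerWeek: 1, rerunMinutes: 2, blockedWorkflows: 1, fixHours: 1 },
  { name: "login e2e", flakesPerWeek: 5, rerunMinutes: 12, blockedWorkflows: 3, fixHours: 1 },
]);

console.log(ranked.map((t) => t.name).join(", ")); // login e2e, checkout e2e, date formatter
```

&lt;p&gt;In this made-up data the cheap-to-fix login test outranks the noisier checkout test, which is the point of weighing fix complexity alongside frequency.&lt;/p&gt;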

&lt;h3&gt;
  
  
  Let Kleore do the prioritization for you.
&lt;/h3&gt;

&lt;p&gt;Kleore analyzes your GitHub Actions history and ranks every flaky test by failure rate, cost, and impact. You get a prioritized list with dollar amounts — so you know exactly where to start.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/apps/kleore" rel="noopener noreferrer"&gt;Scan my repos — free&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/what-are-flaky-tests"&gt;What Are Flaky Tests?&lt;/a&gt; — The fundamentals of test flakiness.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/flaky-test-cost"&gt;How Much Do Flaky Tests Actually Cost?&lt;/a&gt; — The dollar math to justify the fix.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/calculator"&gt;Flaky Test Cost Calculator&lt;/a&gt; — Plug in your team’s numbers and see the impact.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ci</category>
      <category>testing</category>
      <category>github</category>
      <category>devops</category>
    </item>
    <item>
      <title>How Much Do Flaky Tests Actually Cost?</title>
      <dc:creator>Mihir Shinde</dc:creator>
      <pubDate>Fri, 17 Apr 2026 00:10:34 +0000</pubDate>
      <link>https://dev.to/byteframe/how-much-do-flaky-tests-actually-cost-1mjh</link>
      <guid>https://dev.to/byteframe/how-much-do-flaky-tests-actually-cost-1mjh</guid>
      <description>&lt;h1&gt;
  
  
  How Much Do Flaky Tests Actually Cost?
&lt;/h1&gt;

&lt;p&gt;Spoiler: it’s not just CI minutes. The real number will make your engineering manager wince.&lt;/p&gt;

&lt;p&gt;March 21, 2026 · 10 min read&lt;/p&gt;

&lt;p&gt;When teams talk about the cost of flaky tests, they usually start with CI minutes. That’s the visible part — the line item on your GitHub bill. But CI compute is maybe 10% of the real cost. The other 90% is human time, delayed shipping, and the slow erosion of engineering culture.&lt;/p&gt;

&lt;p&gt;Let’s break it down with real numbers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 1: CI compute
&lt;/h2&gt;

&lt;p&gt;This is the easy math. Every time a flaky test causes a re-run, you’re paying for the same CI job twice.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Example team&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Average CI run duration&lt;/td&gt;
&lt;td&gt;12 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flaky-caused re-runs per week&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wasted CI minutes per week&lt;/td&gt;
&lt;td&gt;480 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub Actions cost per minute&lt;/td&gt;
&lt;td&gt;$0.008&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monthly CI waste&lt;/td&gt;
&lt;td&gt;~$15/month&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;About $15 a month (480 minutes × $0.008 × 4 weeks)? That’s nothing, right? That’s the trap. CI compute is cheap enough that nobody escalates it. But it’s the tip of the iceberg.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 2: Developer time
&lt;/h2&gt;

&lt;p&gt;This is where the real money goes. Every flaky failure triggers a human response:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Developer sees red CI badge on their PR&lt;/li&gt;
&lt;li&gt;Opens CI logs, scrolls through output&lt;/li&gt;
&lt;li&gt;Tries to figure out if the failure is real or flaky&lt;/li&gt;
&lt;li&gt;Decides to re-run (or asks a teammate)&lt;/li&gt;
&lt;li&gt;Waits for the re-run to finish&lt;/li&gt;
&lt;li&gt;Resumes their previous work — but the context switch already happened&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Research on context switching shows it takes an average of &lt;strong&gt;23 minutes&lt;/strong&gt; to regain deep focus after an interruption. Even if the investigation itself takes only 5 minutes, the true cost per interruption is closer to 30 minutes of productive time.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Example team&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Flaky interruptions per week&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context-switch cost per interruption&lt;/td&gt;
&lt;td&gt;30 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total developer hours lost per week&lt;/td&gt;
&lt;td&gt;20 hours&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Average fully-loaded eng cost&lt;/td&gt;
&lt;td&gt;$85/hour&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monthly developer time waste&lt;/td&gt;
&lt;td&gt;~$6,800/month&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That’s over &lt;strong&gt;100x&lt;/strong&gt; the CI compute cost. And this is for a modest team with a moderate flaky test problem. A team of 30 engineers with a bad flaky test culture can easily burn $20,000+/month in lost productivity.&lt;/p&gt;
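&lt;p&gt;To run the developer-time math with your own numbers, a small sketch of the calculation follows. The function and parameter names are hypothetical, and the default of 4 weeks per month is an assumption chosen to match the table’s rounding.&lt;/p&gt;

```javascript
// Monthly cost of flaky-test interruptions in developer time.
function monthlyDeveloperCost({
  interruptionsPerWeek,
  minutesPerInterruption,
  hourlyRate,
  weeksPerMonth = 4, // assumption: matches the rounding used in the table
}) {
  const hoursPerWeek = (interruptionsPerWeek * minutesPerInterruption) / 60;
  return hoursPerWeek * hourlyRate * weeksPerMonth;
}

// The example team: 40 interruptions a week, 30 minutes each, $85/hour.
console.log(
  monthlyDeveloperCost({ interruptionsPerWeek: 40, minutesPerInterruption: 30, hourlyRate: 85 })
); // 6800
```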

&lt;h2&gt;
  
  
  Layer 3: Shipping velocity
&lt;/h2&gt;

&lt;p&gt;Flaky tests don’t just waste time — they slow down how fast you ship.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;PRs stay open longer.&lt;/strong&gt; A PR that gets a flaky red build sits in review limbo. The author re-runs, waits, and the reviewer has moved on to something else. Round-trip time expands from hours to days.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Merge conflicts compound.&lt;/strong&gt; Longer PR lifetimes mean more merge conflicts. Each conflict is another context switch, another re-run, another delay.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Deploys batch up.&lt;/strong&gt; When teams can’t merge quickly, changes pile up into larger, riskier deploys. The opposite of continuous delivery.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the hardest cost to quantify but often the most painful. Your competitors ship daily while your team spends a quarter of their time fighting CI noise.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 4: Trust erosion
&lt;/h2&gt;

&lt;p&gt;This is the most dangerous cost because it’s invisible until it’s catastrophic.&lt;/p&gt;

&lt;p&gt;When tests are unreliable, developers develop a reflex: &lt;em&gt;“It’s probably just flaky.”&lt;/em&gt; This is rational behavior given unreliable signals. But it means &lt;strong&gt;real failures get ignored too.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The progression looks like this:&lt;/p&gt;

&lt;p&gt;Phase 1: Team re-runs flaky tests and reports them in Slack&lt;/p&gt;

&lt;p&gt;Phase 2: Team re-runs without reporting — it's just background noise&lt;/p&gt;

&lt;p&gt;Phase 3: Team merges with red CI, assuming flakiness&lt;/p&gt;

&lt;p&gt;Phase 4: A real bug slips through. "We thought it was flaky."&lt;/p&gt;

&lt;p&gt;Phase 5: Production incident. Post-mortem identifies eroded CI trust as root cause.&lt;/p&gt;

&lt;h2&gt;
  
  
  The total picture
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Cost layer&lt;/th&gt;
&lt;th&gt;Monthly cost&lt;/th&gt;
&lt;th&gt;Visibility&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CI compute&lt;/td&gt;
&lt;td&gt;~$15&lt;/td&gt;
&lt;td&gt;On your bill&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Developer time&lt;/td&gt;
&lt;td&gt;$6,800&lt;/td&gt;
&lt;td&gt;Hidden&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shipping velocity&lt;/td&gt;
&lt;td&gt;$???&lt;/td&gt;
&lt;td&gt;Invisible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trust erosion&lt;/td&gt;
&lt;td&gt;$???&lt;/td&gt;
&lt;td&gt;Invisible until incident&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total&lt;/td&gt;
&lt;td&gt;$7,000 – $25,000+/month&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The irony: the cost that shows up on your bill (CI minutes) is the smallest component. The costs that don’t show up anywhere — developer time, delayed shipping, trust — are 100x larger.&lt;/p&gt;

&lt;h2&gt;
  
  
  What can you actually do about it?
&lt;/h2&gt;

&lt;p&gt;Step one is visibility. You can’t fix what you can’t see. Most teams have no idea how many flaky tests they have, which ones are the worst, or what they cost.&lt;/p&gt;

&lt;p&gt;That’s the gap Kleore fills. It connects to your GitHub repos, analyzes your CI run history, and gives you a ranked list of every flaky test — with dollar costs attached. No configuration, no test framework changes, no new CLI tools. Just the data you need to start making decisions.&lt;/p&gt;

&lt;h3&gt;
  
  
  See your real CI waste in two minutes.
&lt;/h3&gt;

&lt;p&gt;Install the Kleore GitHub App and get a dollar-cost breakdown of every flaky test in your repos. Free to start. No credit card required.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/apps/kleore" rel="noopener noreferrer"&gt;Scan my repos — free&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/what-are-flaky-tests"&gt;What Are Flaky Tests?&lt;/a&gt; — A primer on what causes test flakiness.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/fix-flaky-tests-github-actions"&gt;How to Fix Flaky Tests in GitHub Actions&lt;/a&gt; — Practical fixes for the most common root causes.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/calculator"&gt;Flaky Test Cost Calculator&lt;/a&gt; — Plug in your numbers and see the real cost.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ci</category>
      <category>testing</category>
      <category>github</category>
      <category>devops</category>
    </item>
    <item>
      <title>GitHub Actions CI Is Slow? Here’s What’s Actually Wasting Your Time</title>
      <dc:creator>Mihir Shinde</dc:creator>
      <pubDate>Thu, 16 Apr 2026 23:02:01 +0000</pubDate>
      <link>https://dev.to/byteframe/github-actions-slow-whats-actually-wasting-your-time-4elh</link>
      <guid>https://dev.to/byteframe/github-actions-slow-whats-actually-wasting-your-time-4elh</guid>
      <description>&lt;h1&gt;
  
  
  GitHub Actions CI Is Slow? Here’s What’s Actually Wasting Your Time
&lt;/h1&gt;

&lt;p&gt;The top 5 time wasters in GitHub Actions pipelines — and how to fix each one with real workflow examples.&lt;/p&gt;

&lt;p&gt;March 28, 2026 · 13 min read&lt;/p&gt;

&lt;p&gt;Your GitHub Actions pipeline takes 20 minutes. Your team runs it 50 times a day. That’s nearly 17 hours of CI compute daily, and most of it is waste. Developers context-switch while waiting, merge queues back up, and by the end of the week your team has lost an entire engineer’s worth of productive time to a slow pipeline.&lt;/p&gt;

&lt;p&gt;The fix isn’t “buy bigger runners.” It’s eliminating the waste that’s already in your pipeline. Here are the five biggest time wasters and how to fix each one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The hidden cost of slow CI
&lt;/h2&gt;

&lt;p&gt;Slow CI doesn’t just waste compute. It creates a cascade of productivity losses that compound across your team:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Developer wait time:&lt;/strong&gt; A developer waiting 20 minutes for CI is not coding. They’re checking Slack, reading Hacker News, or starting a second task that creates costly context-switching when CI finishes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context switching:&lt;/strong&gt; Studies show it takes 23 minutes to fully refocus after a context switch. A 20-minute CI wait often creates a 43-minute productivity gap.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Merge queue bottlenecks:&lt;/strong&gt; When CI takes 20 minutes, your merge queue can process 3 PRs per hour at most (serially). With a team of 10 developers, PRs stack up and block each other.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment velocity:&lt;/strong&gt; Slow CI means fewer deployments per day, which means larger batch sizes, which means more risk per deploy. It’s a vicious cycle.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The math is simple: if your CI takes 20 minutes, you have 10 developers, and each waits on one run per day, optimizing it to 8 minutes saves 2 hours of developer wait time per day. At $150/hour loaded engineering cost, that’s $300/day, or &lt;strong&gt;$78,000/year&lt;/strong&gt; over 260 working days.&lt;/p&gt;

&lt;p&gt;How much is your CI actually wasting?&lt;/p&gt;

&lt;p&gt;Use our &lt;a href="https://dev.to/blog/calculator"&gt;Flaky Test Cost Calculator&lt;/a&gt; to plug in your team’s numbers and see the dollar impact. Or &lt;a href="https://github.com/apps/kleore" rel="noopener noreferrer"&gt;install Kleore&lt;/a&gt; for an automated analysis of your actual CI history.&lt;/p&gt;

&lt;h2&gt;
  
  
  Time waster #1: Flaky test reruns
&lt;/h2&gt;

&lt;p&gt;This is the single biggest source of CI waste, and it’s the one most teams underestimate. When a flaky test fails, developers re-run the entire pipeline. The whole re-run is wasted compute: you’re running the same tests again just to get a different roll of the dice.&lt;/p&gt;

&lt;p&gt;The numbers add up quickly. In our analysis of 10,000 GitHub Actions workflow runs, &lt;a href="https://dev.to/blog/flaky-test-data-analysis"&gt;we found that 15-25% of CI compute is wasted on flaky test reruns&lt;/a&gt;. That means if you spend $10,000/month on GitHub Actions, $1,500 to $2,500 of it goes to re-running tests that aren’t actually broken.&lt;/p&gt;

&lt;p&gt;The fix: Identify and quarantine flaky tests.&lt;/p&gt;

&lt;p&gt;You can’t fix what you can’t measure. Start by identifying which tests are flaky, then quarantine them so they don’t block CI while you fix the root causes.&lt;/p&gt;

&lt;p&gt;.github/workflows/test.yml — retry with reporting&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-node@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;node-version-file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.node-version"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm ci&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run tests with retry reporting&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;# Run tests, capture exit code&lt;/span&gt;
          &lt;span class="s"&gt;npm test -- --json --outputFile=test-results.json || true&lt;/span&gt;

          &lt;span class="s"&gt;# If tests failed, check if it's a known flaky test&lt;/span&gt;
          &lt;span class="s"&gt;if [ -f test-results.json ]; then&lt;/span&gt;
            &lt;span class="s"&gt;node scripts/check-flaky.js test-results.json&lt;/span&gt;
          &lt;span class="s"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a deeper dive on fixing flaky tests specifically, see our guides for &lt;a href="https://dev.to/blog/flaky-tests-jest"&gt;Jest&lt;/a&gt; and &lt;a href="https://dev.to/blog/flaky-tests-pytest"&gt;pytest&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Time waster #2: No dependency caching
&lt;/h2&gt;

&lt;p&gt;Every CI run that starts with &lt;code&gt;npm install&lt;/code&gt; or &lt;code&gt;pip install -r requirements.txt&lt;/code&gt; from scratch downloads the same packages over and over. For a typical Node.js project, this wastes 1-3 minutes per run. At 50 runs/day, that’s roughly 50 minutes to 2.5 hours of compute lost daily.&lt;/p&gt;

&lt;p&gt;The fix: Use actions/cache or built-in caching.&lt;/p&gt;

&lt;p&gt;Node.js: cache npm’s package downloads&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-node@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;node-version-file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.node-version"&lt;/span&gt;
          &lt;span class="na"&gt;cache&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;npm"&lt;/span&gt;  &lt;span class="c1"&gt;# Built-in npm cache support&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm ci&lt;/span&gt;    &lt;span class="c1"&gt;# Uses cache when lockfile hasn't changed&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Python — cache pip packages&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-python@v5&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;python-version-file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.python-version"&lt;/span&gt;
          &lt;span class="na"&gt;cache&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pip"&lt;/span&gt;  &lt;span class="c1"&gt;# Built-in pip cache support&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pip install -r requirements.txt&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Custom cache for monorepos or complex setups&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Cache dependencies&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/cache@v4&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;node_modules&lt;/span&gt;
      &lt;span class="s"&gt;~/.cache/Cypress&lt;/span&gt;
      &lt;span class="s"&gt;.next/cache&lt;/span&gt;
    &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;deps-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}&lt;/span&gt;
    &lt;span class="na"&gt;restore-keys&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;deps-${{ runner.os }}-&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



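&lt;p&gt;Pro tip: &lt;code&gt;npm ci&lt;/code&gt; is faster than &lt;code&gt;npm install&lt;/code&gt; in CI because it installs exactly what the lockfile specifies instead of re-resolving dependency versions. Always use &lt;code&gt;npm ci&lt;/code&gt; when you have a lockfile. Note that &lt;code&gt;npm ci&lt;/code&gt; deletes any existing &lt;code&gt;node_modules&lt;/code&gt; before installing, so a cached &lt;code&gt;node_modules&lt;/code&gt; directory (as in the custom cache above) only pays off if you skip the install step on a cache hit.&lt;/p&gt;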

&lt;h2&gt;
  
  
  Time waster #3: Running all tests on every PR
&lt;/h2&gt;

&lt;p&gt;If a PR only changes a README file, there’s no reason to run your entire test suite. Yet most teams configure their pipeline to run everything on every push. For large monorepos, this wastes enormous amounts of compute.&lt;/p&gt;

&lt;p&gt;The fix: Use path filters and affected test detection.&lt;/p&gt;

&lt;p&gt;Path filters — skip tests for docs-only changes&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;# Only run tests when code files change&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;src/**"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tests/**"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;package.json"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;package-lock.json"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.github/workflows/test.yml"&lt;/span&gt;
    &lt;span class="na"&gt;paths-ignore&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;# Never run tests for these changes&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;**.md"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;docs/**"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.vscode/**"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Conditional jobs based on changed files&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;changes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;outputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.filter.outputs.backend }}&lt;/span&gt;
      &lt;span class="na"&gt;frontend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.filter.outputs.frontend }}&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dorny/paths-filter@v3&lt;/span&gt;
        &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;filter&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;filters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
            &lt;span class="s"&gt;backend:&lt;/span&gt;
              &lt;span class="s"&gt;- "api/**"&lt;/span&gt;
              &lt;span class="s"&gt;- "tests/api/**"&lt;/span&gt;
            &lt;span class="s"&gt;frontend:&lt;/span&gt;
              &lt;span class="s"&gt;- "web/**"&lt;/span&gt;
              &lt;span class="s"&gt;- "tests/web/**"&lt;/span&gt;

  &lt;span class="na"&gt;test-backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;changes&lt;/span&gt;
    &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ needs.changes.outputs.backend == 'true' }}&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm run test:backend&lt;/span&gt;

  &lt;span class="na"&gt;test-frontend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;changes&lt;/span&gt;
    &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ needs.changes.outputs.frontend == 'true' }}&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm run test:frontend&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Time waster #4: Sequential jobs that could be parallel
&lt;/h2&gt;

&lt;p&gt;Many teams structure their pipeline as a linear chain: lint, then type-check, then unit tests, then integration tests, then e2e tests. If linting takes 2 minutes and tests take 15 minutes, you’re waiting 17 minutes total. But lint and tests don’t depend on each other — they can run simultaneously.&lt;/p&gt;

&lt;p&gt;The fix: Parallelize independent jobs and use matrix strategy.&lt;/p&gt;

&lt;p&gt;Parallel independent jobs&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;lint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm ci&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm run lint&lt;/span&gt;

  &lt;span class="na"&gt;typecheck&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm ci&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm run typecheck&lt;/span&gt;

  &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;matrix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;shard&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;1&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;2&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;3&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;4&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Split tests across 4 runners&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm ci&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npx jest --shard=${{ matrix.shard }}/4&lt;/span&gt;

  &lt;span class="c1"&gt;# Gate deployment on all checks passing&lt;/span&gt;
  &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;lint&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;typecheck&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;echo "All checks passed, deploying..."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this setup, lint (2 min), typecheck (1 min), and 4 parallel test shards (4 min each instead of 16 min total) all run simultaneously. Total wall time drops from 19 minutes to about 4 minutes, plus per-job setup. You pay for more compute-minutes because every job repeats checkout and &lt;code&gt;npm ci&lt;/code&gt;, but your developers get feedback nearly 5x faster.&lt;/p&gt;
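
&lt;p&gt;The trade-off between wall time and compute-minutes can be sketched with a few lines of JavaScript. The job durations are the illustrative figures from this section. The one-minute setup cost per job (checkout plus install) is an assumption added here, and &lt;code&gt;compareLayouts&lt;/code&gt; is a made-up helper name.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;
```javascript
// Compare one sequential job against fully parallel jobs.
// jobMinutes: work done by each job, excluding setup.
// setupMinutes: checkout + install overhead per job (assumed value).
function compareLayouts(jobMinutes, setupMinutes) {
  const work = jobMinutes.reduce(function (sum, m) { return sum + m; }, 0);
  return {
    // One job runs everything back to back and pays for setup once.
    sequentialWall: setupMinutes + work,
    // In parallel, the slowest job decides when feedback arrives.
    parallelWall: setupMinutes + Math.max(...jobMinutes),
    // Every parallel job repeats the setup, so total compute goes up.
    parallelCompute: jobMinutes.length * setupMinutes + work
  };
}

// lint, typecheck, and four 4-minute test shards
console.log(compareLayouts([2, 1, 4, 4, 4, 4], 1));
// { sequentialWall: 20, parallelWall: 5, parallelCompute: 25 }
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Under these assumptions feedback arrives 4x sooner for 25% more compute. Dependency caching keeps the per-job setup small, which is what makes the trade worthwhile.&lt;/p&gt;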

&lt;h2&gt;
  
  
  Time waster #5: Oversized Docker images
&lt;/h2&gt;

&lt;p&gt;If your CI builds Docker images, the image size directly impacts build time, push time, and pull time. A 2GB image takes minutes to push to a registry and minutes to pull on every deploy. Most of that size is build dependencies and tooling that aren’t needed at runtime.&lt;/p&gt;

&lt;p&gt;The fix: Multi-stage builds with slim base images.&lt;/p&gt;

&lt;p&gt;Dockerfile — multi-stage build&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Stage 1: Build&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;node:20-alpine&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;builder&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; package*.json ./&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm ci
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm run build

&lt;span class="c"&gt;# Stage 2: Production (only runtime dependencies)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;node:20-alpine&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;runner&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; NODE_ENV=production&lt;/span&gt;

&lt;span class="c"&gt;# Only copy what's needed to run&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=builder /app/package*.json ./&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm ci &lt;span class="nt"&gt;--omit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;dev
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=builder /app/dist ./dist&lt;/span&gt;

&lt;span class="c"&gt;# Result: ~150MB instead of ~1.5GB&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["node", "dist/server.js"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub Actions — Docker layer caching&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build and push Docker image&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/build-push-action@v5&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;
    &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;myapp:latest&lt;/span&gt;
    &lt;span class="na"&gt;cache-from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;type=gha&lt;/span&gt;    &lt;span class="c1"&gt;# Use GitHub Actions cache&lt;/span&gt;
    &lt;span class="na"&gt;cache-to&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;type=gha,mode=max&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How to measure CI waste
&lt;/h2&gt;

&lt;p&gt;Before optimizing, measure where your time actually goes. GitHub provides a built-in usage report, but it only shows total minutes. To understand &lt;em&gt;why&lt;/em&gt; those minutes are being spent, you need more granularity.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Actions usage report:&lt;/strong&gt; Go to Settings → Billing → Actions to see total minutes consumed. This gives you the dollar baseline.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow run duration trends:&lt;/strong&gt; Use the GitHub API or &lt;code&gt;gh run list&lt;/code&gt; to track how your workflow duration has changed over time. If it’s trending up, something is degrading.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Job-level timing:&lt;/strong&gt; Look at individual job durations in the Actions tab. The longest job is your bottleneck — that’s where optimization has the biggest impact.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flaky test cost:&lt;/strong&gt; &lt;a href="https://github.com/apps/kleore" rel="noopener noreferrer"&gt;Kleore&lt;/a&gt; specifically measures the cost of flaky test reruns — how many minutes are wasted re-running workflows that failed due to flaky tests rather than real bugs.&lt;/li&gt;
&lt;/ol&gt;
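
&lt;p&gt;For the duration-trend step, one possible approach is to export recent runs with &lt;code&gt;gh run list --limit 200 --json startedAt,updatedAt,conclusion&lt;/code&gt; and summarize them with a small script. The sketch below is a rough illustration: &lt;code&gt;durationStats&lt;/code&gt; is a hypothetical helper, it treats the gap between &lt;code&gt;startedAt&lt;/code&gt; and &lt;code&gt;updatedAt&lt;/code&gt; as an approximation of run duration, and its percentiles use a simple index lookup rather than interpolation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;
```javascript
// Summarize run durations (in minutes) from `gh run list --json ...` output.
// Only successful runs are counted so early failures don't skew the numbers.
function durationStats(runs) {
  const minutes = runs
    .filter(function (run) { return run.conclusion === "success"; })
    .map(function (run) {
      return (Date.parse(run.updatedAt) - Date.parse(run.startedAt)) / 60000;
    })
    .sort(function (a, b) { return a - b; });

  if (minutes.length === 0) return null;

  // Rough percentile: pick the value at position p in the sorted list.
  const percentile = function (p) {
    const index = Math.min(minutes.length - 1, Math.floor(p * minutes.length));
    return minutes[index];
  };

  return { count: minutes.length, median: percentile(0.5), p90: percentile(0.9) };
}
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Running this weekly on the exported JSON and comparing the median and p90 values is a cheap way to notice a pipeline that is slowly getting slower.&lt;/p&gt;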

&lt;h2&gt;
  
  
  Quick wins checklist
&lt;/h2&gt;

&lt;p&gt;Here’s a prioritized checklist you can work through this week. Each item is independent — start with whichever is easiest for your setup.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Enable dependency caching:&lt;/strong&gt; 5 minutes to set up, saves 1-3 minutes per run. Use &lt;code&gt;actions/setup-node&lt;/code&gt; or &lt;code&gt;actions/setup-python&lt;/code&gt; with the &lt;code&gt;cache&lt;/code&gt; option.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallelize lint/typecheck/test:&lt;/strong&gt; 15 minutes to restructure your workflow. Independent jobs run simultaneously instead of sequentially.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add path filters:&lt;/strong&gt; 10 minutes to add a &lt;code&gt;paths&lt;/code&gt; or &lt;code&gt;paths-ignore&lt;/code&gt; filter to your workflow trigger (GitHub Actions does not allow both on the same event). Docs-only PRs skip the test workflow entirely.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shard your test suite:&lt;/strong&gt; 20 minutes to set up a matrix strategy. Split tests across 2-4 runners for a roughly proportional speedup.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Identify and quarantine flaky tests:&lt;/strong&gt; 5 minutes to install Kleore. Get a ranked list of every flaky test, then quarantine the worst offenders to stop wasting reruns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use multi-stage Docker builds:&lt;/strong&gt; 30 minutes to refactor your Dockerfile. Cuts image size by 50-90%, which speeds up both build and deploy.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  See how much your CI is wasting.
&lt;/h3&gt;

&lt;p&gt;Kleore scans your GitHub Actions history and shows you exactly where your CI minutes go — flaky reruns, slow tests, and wasted compute. You get a dollar amount and a prioritized fix list.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/apps/kleore" rel="noopener noreferrer"&gt;Scan my repos — free&lt;/a&gt; &lt;a href="https://dev.to/blog/calculator"&gt;Calculate my CI waste&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/fix-flaky-tests-github-actions"&gt;How to Fix Flaky Tests in GitHub Actions&lt;/a&gt; — Deep dive into the #1 CI time waster.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/flaky-tests-jest"&gt;How to Find and Fix Flaky Tests in Jest&lt;/a&gt; — Jest-specific flaky test patterns and fixes.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/flaky-tests-pytest"&gt;How to Find and Fix Flaky Tests in pytest&lt;/a&gt; — Python-specific flaky test patterns and fixes.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/flaky-test-data-analysis"&gt;We Analyzed 10,000 GitHub Actions Runs&lt;/a&gt; — The data behind CI waste claims.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/calculator"&gt;Flaky Test Cost Calculator&lt;/a&gt; — Plug in your team’s numbers and see the impact.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ci</category>
      <category>testing</category>
      <category>github</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
