DEV Community

Alex Aslam

Slay the Flaky Test Dragon: How to Quarantine Monorepo Chaos Without Losing Your Mind

Hey there, fellow developer! 👋 Let’s talk about the silent saboteur of every monorepo pipeline: flaky tests. You know the ones—tests that pass 90% of the time but fail randomly, leaving you squinting at logs at 2 AM, muttering, “But it worked on my machine!”

In a monorepo, flaky tests aren’t just annoying—they’re codebase-wide grenades. One shaky test can block deployments for all projects. But fear not! Let’s turn those flaky foes into harmless quirks with quarantine pipelines and battle-tested hacks.


What Makes Flaky Tests So Dangerous in Monorepos?

Flaky tests fail unpredictably due to:

  • Race conditions (e.g., timing issues in async code).
  • Shared state (tests stepping on each other’s toes).
  • External dependencies (APIs, databases acting up).

In monorepos, these failures ripple across projects, grinding pipelines to a halt. 😱
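To make the shared-state case concrete, here's a tiny sketch in plain Node (no test runner needed). The `cache`, the two test functions, and `runInOrder` are all made up for illustration; the point is only that the same two tests pass or fail depending on the order they run in.

```javascript
// Hypothetical module-level state shared by two tests.
const cache = new Map();

// Test A writes to the shared cache and never cleans up after itself.
function testSavesUser() {
  cache.set("user", { name: "Ada" });
  return cache.get("user").name === "Ada";
}

// Test B silently assumes the cache starts out empty.
function testCacheStartsEmpty() {
  return cache.size === 0;
}

// Clear the cache once, then run the tests in the given order.
function runInOrder(tests) {
  cache.clear();
  return tests.map((t) => ({ name: t.name, passed: t() }));
}

console.log(runInOrder([testCacheStartsEmpty, testSavesUser])); // both pass
console.log(runInOrder([testSavesUser, testCacheStartsEmpty])); // the second test fails
```

Nothing in either test is "wrong" in isolation, which is exactly why this kind of failure is so hard to spot in a big monorepo suite.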


Strategy 1: The Quarantine Pipeline 🚨

Don’t let flaky tests poison your main CI/CD pipeline. Isolate them!

Step 1: Identify the Culprits

Use a plugin such as pytest-flakefinder, Jest's jest.retryTimes() (supported by the default jest-circus runner), or a custom script to rerun tests N times and spot the ones whose results change between runs:

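# Example: run the suite 3 times; a test that fails in only some runs is likely flaky
# (Jest has no CLI retry flag, so this loops in the shell instead)
for i in 1 2 3; do npm test -- --ci --runInBand --detectOpenHandles || echo "run $i failed"; done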
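If you'd rather roll your own detector, the "rerun N times" idea can be sketched in a few lines of plain Node. This is a minimal sketch, not a drop-in tool: `classifyByRerun` and `sometimesFails` are hypothetical names, and it assumes a test is any function that throws (or rejects) when it fails.

```javascript
// Run a test function `runs` times and classify it by its pass/fail mix.
async function classifyByRerun(testFn, runs = 10) {
  let passes = 0;
  for (let i = 0; i < runs; i++) {
    try {
      await testFn();
      passes++;
    } catch {
      // A failure is simply counted as "not a pass".
    }
  }
  if (passes === runs) return { verdict: "stable", passes, runs };
  if (passes === 0) return { verdict: "broken", passes, runs };
  return { verdict: "flaky", passes, runs };
}

// Hypothetical test that fails on every third call, standing in for a timing bug.
let calls = 0;
async function sometimesFails() {
  calls++;
  if (calls % 3 === 0) throw new Error("simulated intermittent failure");
}

classifyByRerun(sometimesFails, 9).then((r) => console.log(r));
// verdict: "flaky", passes: 6, runs: 9
```

Separating "broken" (always fails) from "flaky" (mixed results) matters: only the second kind belongs in quarantine.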

Step 2: Move Them to Quarantine

Create a separate pipeline/job just for flaky tests:

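# GitHub Actions Example
# Assumes flaky test files are renamed to *.flaky.test.js (a convention for this sketch)
jobs:
  main-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test -- --testPathIgnorePatterns='\.flaky\.'

  quarantine:
    needs: main-tests
    runs-on: ubuntu-latest
    continue-on-error: true  # a quarantine failure does not fail the workflow
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test -- '\.flaky\.'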

Why this works:

  • Main pipeline stays fast and stable.
  • Quarantine runs flaky tests after the main build, so they don’t block deploys.
  • You get visibility into flaky tests without derailing progress.
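One way to decide which files go where is to keep an explicit quarantine list in the repo and split the suite from it. Here's a rough sketch of that idea; `splitSuites`, the file paths, and the list itself are hypothetical, and how you feed the resulting lists to your test runner is up to your setup.

```javascript
// Split test files into a blocking "main" set and a non-blocking "quarantine" set.
// `quarantined` is a hypothetical list kept in the repo, e.g. loaded from a JSON file.
function splitSuites(allTestFiles, quarantined) {
  const quarantineSet = new Set(quarantined);
  const main = allTestFiles.filter((f) => !quarantineSet.has(f));
  const quarantine = allTestFiles.filter((f) => quarantineSet.has(f));
  // Entries that no longer match a real file are stale and should be pruned.
  const known = new Set(allTestFiles);
  const stale = quarantined.filter((f) => !known.has(f));
  return { main, quarantine, stale };
}

const result = splitSuites(
  ["packages/api/auth.test.js", "packages/api/orders.test.js", "packages/web/cart.test.js"],
  ["packages/api/orders.test.js", "packages/web/deleted.test.js"]
);
console.log(result);
```

An explicit list has a nice side effect: every quarantined test shows up in code review, so nobody can quietly park a test there forever.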

Strategy 2: Stability Hacks to Neutralize Flakiness 🛡️

Hack 1: Retry with Care

Retry flaky tests in the quarantine pipeline only:

# GitLab CI Example  
quarantine:  
  retry:  
    max: 2  # Re-run the whole job (not individual tests) up to 2 times
  script:  
    - npm run test:flaky  

Pro Tip: Avoid retries in your main pipeline—they mask real issues!
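If you do retry, make the retry visible. The sketch below is a minimal take on that, assuming a test is an async function that throws on failure; `runWithRetry` and `flakyOnce` are hypothetical names, not part of any test framework.

```javascript
// Retry an async test up to `maxRetries` extra times and report how many attempts it took.
async function runWithRetry(testFn, maxRetries = 2) {
  let lastError;
  for (let attempt = 1; attempt <= maxRetries + 1; attempt++) {
    try {
      await testFn();
      // Passing only after a retry is a flakiness signal worth logging, not hiding.
      return { passed: true, attempts: attempt, flaky: attempt > 1 };
    } catch (err) {
      lastError = err;
    }
  }
  return { passed: false, attempts: maxRetries + 1, flaky: false, error: lastError };
}

// Hypothetical test that fails on its first call only.
let firstCall = true;
async function flakyOnce() {
  if (firstCall) {
    firstCall = false;
    throw new Error("simulated first-run failure");
  }
}

runWithRetry(flakyOnce).then((r) => console.log(r));
// passed: true, attempts: 2, flaky: true
```

The `flaky` field is the important part: a green result that needed a second attempt should still land on someone's to-fix list.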

Hack 2: Kill Shared State

  • Isolate databases: Use Docker containers with fresh DB instances per test.
  • Randomize test order: Prevent tests from depending on execution sequence.
  jest --randomize  # Shuffle test order within each file (Jest 29.2+)
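Random order is only useful if you can replay the order that failed. Here's a small sketch of a seeded shuffle in plain Node to show the idea. It is not how Jest implements randomization; `seededRandom` (a mulberry32-style generator) and `shuffleWithSeed` are illustrative helpers.

```javascript
// Small seeded pseudo-random generator (mulberry32-style); returns numbers in [0, 1).
function seededRandom(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle driven by the seeded generator; the input array is not mutated.
function shuffleWithSeed(items, seed) {
  const rand = seededRandom(seed);
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

const tests = ["creates user", "updates user", "deletes user", "lists users"];
const seed = 12345; // log this in CI so a failing order can be replayed locally
console.log(seed, shuffleWithSeed(tests, seed));
```

Print the seed with every run. When an order-dependent failure shows up, rerunning with the same seed gives you the exact same sequence to debug.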

Hack 3: Timeout Buffers

Add grace periods for slow operations:

// Jest example
test("slow API test", async () => {
  // ...await the slow call here
}, 15000); // 15s timeout for this test instead of the default 5s
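A test-level timeout tells you that something was slow, but not what. One option is to wrap the slow call itself so the error names the culprit. The sketch below assumes nothing beyond plain Node; `withTimeout` and `fakeApiCall` are hypothetical helpers for illustration.

```javascript
// Reject if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms, label = "operation") {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Always clear the timer so it cannot keep the process alive after the race is decided.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical slow dependency that resolves after `delayMs`.
function fakeApiCall(delayMs) {
  return new Promise((resolve) => setTimeout(() => resolve("ok"), delayMs));
}

withTimeout(fakeApiCall(20), 200, "orders API").then((v) => console.log(v)); // logs "ok"
withTimeout(fakeApiCall(100), 10, "orders API").catch((e) => console.log(e.message));
// logs "orders API timed out after 10ms"
```

Note that this only stops waiting; it does not cancel the underlying work. For real HTTP calls you'd want to pair it with whatever cancellation your client supports.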

Real-World Win: How Startup X Saved Their Sanity

A 20-project monorepo was failing 30% of deployments due to flaky tests. They:

  1. Moved 15 flaky tests to a quarantine pipeline.
  2. Fixed 10/15 with better timeout handling and DB isolation.
  3. Reduced pipeline failures by 80% in 2 weeks.

Pitfalls to Avoid

  • Ignoring Quarantine: Don’t let flaky tests pile up—fix them incrementally.
  • Over-Retrying: Retries != fixes. Use them sparingly.
  • No Monitoring: Track flaky test frequency with dashboards (e.g., Datadog, Grafana).
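Before you wire up a dashboard, it helps to pin down what number you're tracking. Here's one possible definition, sketched in plain Node: a test is flaky on a commit if it both passed and failed on that same commit. The record shape (`test`, `commit`, `passed`) and `flakeReport` are assumptions for this example, not the format of any particular CI tool.

```javascript
// Each record is one CI run of one test: { test, commit, passed }.
function flakeReport(records) {
  const byTest = new Map();
  for (const { test, commit, passed } of records) {
    if (!byTest.has(test)) byTest.set(test, new Map());
    const commits = byTest.get(test);
    if (!commits.has(commit)) commits.set(commit, { pass: 0, fail: 0 });
    commits.get(commit)[passed ? "pass" : "fail"]++;
  }
  const report = [];
  for (const [test, commits] of byTest) {
    const outcomes = [...commits.values()];
    // Mixed results on the same commit mean the code did not change but the outcome did.
    const flakyCommits = outcomes.filter((o) => o.pass > 0 && o.fail > 0).length;
    report.push({ test, commits: outcomes.length, flakyCommits, flakeRate: flakyCommits / outcomes.length });
  }
  // Worst offenders first.
  return report.sort((a, b) => b.flakeRate - a.flakeRate);
}

const history = [
  { test: "checkout.test.js", commit: "a1", passed: true },
  { test: "checkout.test.js", commit: "a1", passed: false },
  { test: "checkout.test.js", commit: "b2", passed: true },
  { test: "login.test.js", commit: "a1", passed: true },
  { test: "login.test.js", commit: "b2", passed: true },
  { test: "search.test.js", commit: "a1", passed: false },
  { test: "search.test.js", commit: "b2", passed: false },
];
console.log(flakeReport(history));
```

In this sample, `checkout.test.js` comes out with a flake rate of 0.5, while `search.test.js` scores 0 because it fails consistently. That one is broken, not flaky, and needs a fix rather than a quarantine.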

Tools to Fight the Flake

  • CircleCI Test Insights: Auto-detects and surfaces flaky tests.
  • Buildkite Test Analytics: Visualizes flaky test trends.
  • Custom Scripts: Automate flaky test detection with cron jobs.

Your Action Plan

  1. Audit: Find flaky tests with reruns and logs.
  2. Quarantine: Move them to a separate pipeline.
  3. Fix: Tackle the root cause (shared state, timeouts, etc.).
  4. Monitor: Prevent new flaky tests from creeping in.

Final Thought: Flaky tests are like weeds—ignore them, and they’ll take over your garden. But with quarantine pipelines and smart hacks, you’ll reclaim your CI/CD sanity.

Got a flaky test horror story or hack? Share it below—let’s build a flaky-free future! 🚀
