DEV Community

Mihir Shinde

Posted on • Originally published at kleore.com

How to Fix Flaky Tests in GitHub Actions

You know the drill: CI goes red, you check the logs, the failure looks unrelated to your changes. You hit re-run. It passes. You merge. And the cycle repeats tomorrow.

This guide covers the six most common patterns behind flaky tests in GitHub Actions and gives you concrete fixes for each. Not theories — actual code changes and configuration updates you can apply today.

Before you start fixing

The first step is knowing which tests are flaky and how often they fail. If you're guessing based on Slack complaints, you're working blind. Kleore analyzes your CI history and ranks every flaky test by failure rate and cost — so you fix the worst ones first.

1. Timing & race conditions

Symptom: Test passes locally, fails intermittently in CI. Often involves UI tests, async operations, or anything that waits for a condition to become true.

Root cause: GitHub Actions runners have variable performance. A 2-core runner under load is slower than your M3 MacBook. Tests that assume operations complete within a specific window break when the runner is under pressure.

The fix: Replace fixed waits with condition-based polling. For E2E tests with Playwright or Cypress, use their built-in auto-waiting mechanisms instead of explicit sleeps. For backend tests, poll with exponential backoff rather than sleeping.
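A minimal sketch of what condition-based polling with exponential backoff looks like — `waitFor` and its option names are illustrative, not a library API:

```javascript
// Poll a condition with exponential backoff instead of a fixed sleep.
// `waitFor` and its options are illustrative names, not a library API.
async function waitFor(condition, { timeoutMs = 10_000, initialDelayMs = 50, maxDelayMs = 1_000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  let delay = initialDelayMs;
  while (Date.now() < deadline) {
    if (await condition()) return;            // condition met: stop polling
    await new Promise((r) => setTimeout(r, delay));
    delay = Math.min(delay * 2, maxDelayMs);  // back off, but cap the delay
  }
  throw new Error(`Condition not met within ${timeoutMs}ms`);
}
```

On a fast runner this returns almost immediately; on a loaded 2-core runner it simply polls a few more times instead of failing — the timeout becomes a generous ceiling rather than an assumption about typical speed.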

2. Shared mutable state

Symptom: Test passes in isolation (it.only) but fails when run with the full suite.

Root cause: Tests share a database, in-memory store, filesystem, or global variable. Test A writes data that Test B doesn't expect, or Test A forgets to clean up.

The fix: Isolate test state completely. If you're using a shared test database, consider running each test file in its own database schema or using containers.
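One common isolation trick, sketched under the assumption you run Jest (which sets `JEST_WORKER_ID` per worker) — the naming scheme and helper are just a convention, not a standard API:

```javascript
// Give each parallel test worker its own schema/database so tests never collide.
// JEST_WORKER_ID is set by Jest ("1", "2", ...); the naming scheme is our own convention.
function schemaForWorker(baseName, workerId = process.env.JEST_WORKER_ID ?? "0") {
  return `${baseName}_w${workerId}`;
}

// e.g. in a global setup file (db client is hypothetical):
// const schema = schemaForWorker("app_test");   // "app_test_w3" on worker 3
// await db.query(`CREATE SCHEMA IF NOT EXISTS ${schema}`);
```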

3. External service dependencies

Symptom: Tests fail with network timeouts, 503 errors, or rate-limit responses.

Root cause: Your tests make real HTTP calls to APIs you don't control.

The fix: Mock at the HTTP boundary, not the function level. Use MSW or similar tools to intercept HTTP at the network level.
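To show the idea without any dependency, here is a toy version of HTTP-boundary mocking — swapping out global `fetch` for a route table. The URLs and route data are made up; in a real suite you'd use MSW's `setupServer`, which intercepts requests more robustly and for all HTTP clients, not just `fetch`:

```javascript
// Dependency-free sketch of mocking at the HTTP boundary:
// replace global fetch with a route table of canned responses.
// Routes and URLs below are hypothetical examples.
const routes = new Map([
  ["GET https://api.example.com/user/1", { id: 1, name: "Ada" }],
]);

globalThis.fetch = async (url, opts = {}) => {
  const key = `${opts.method ?? "GET"} ${url}`;
  if (routes.has(key)) {
    // Return a canned Response instead of touching the network.
    return new Response(JSON.stringify(routes.get(key)), {
      status: 200,
      headers: { "content-type": "application/json" },
    });
  }
  // Fail loudly on unmocked calls so real network flakiness can't sneak back in.
  throw new Error(`Unmocked request: ${key}`);
};
```

The key property: code under test still "makes HTTP calls" through its normal client, so you exercise serialization, headers, and error paths — you just never leave the process.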

4. Environment differences

Symptom: Tests pass on macOS, fail on Linux. Or pass with Node 20, fail with Node 22.

Root cause: Assumptions baked into tests about the OS, timezone, locale, filesystem behavior, or available system resources.

The fix: Pin your CI environment explicitly. Always set TZ=UTC, use a .node-version file, and normalize filesystem paths.
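A workflow fragment showing what that pinning looks like in practice — the job name, runner image, and versions are examples, not prescriptions:

```yaml
# Illustrative job fragment — names and versions are examples.
jobs:
  test:
    runs-on: ubuntu-24.04        # pin the runner image, not ubuntu-latest
    env:
      TZ: UTC                    # same timezone everywhere
      LANG: en_US.UTF-8          # same locale everywhere
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version-file: .node-version   # single source of truth for Node
      - run: npm ci              # lockfile-exact installs
      - run: npm test
```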

5. Port & resource conflicts

Symptom: EADDRINUSE errors, database connection failures, or file lock errors — especially when tests run in parallel.

Root cause: Parallel test workers (or a leftover process from a previous run) all try to bind the same hardcoded port, database name, or temp file.

The fix: Use dynamic port allocation (bind to port 0 and let the OS choose) and unique database names per test worker.

6. Test order dependency

Symptom: Tests pass when run in the default order, fail when randomized.

Root cause: Test A sets up state that Test B implicitly depends on.

The fix: Enable test randomization (--randomize in Jest, sequence.shuffle in Vitest) to catch these issues early. Make every test self-contained.
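For Vitest, the shuffle lives in the config file — a minimal fragment (seed handling is one possible convention, not required):

```typescript
// vitest.config.ts — run tests in shuffled order so hidden couplings surface early.
// (For Jest, the equivalent is the --randomize CLI flag.)
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    sequence: {
      shuffle: true,      // randomize test order
      seed: Date.now(),   // pin/log the seed to reproduce a failing order
    },
  },
});
```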

The meta-fix: Retry as a bandaid, not a cure

Retry logic hides the problem. A test that fails 30% of the time and gets retried up to 3 times will appear to pass about 99.2% of the time — while still costing you up to 4x the CI minutes on bad runs and masking the underlying issue.
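A quick sanity check on the numbers, assuming "retried 3 times" means three extra attempts after the first (4 attempts total):

```javascript
// Apparent pass rate and CI cost for a test that fails 30% of the time
// with up to 3 retries (4 attempts total).
const failRate = 0.3;
const maxAttempts = 4;

// The run only stays red if every single attempt fails.
const apparentPassRate = 1 - failRate ** maxAttempts;   // 1 - 0.3^4 ≈ 0.992

// Expected attempts per run: attempt k happens only if the first k-1 all failed.
let expectedAttempts = 0;
for (let k = 0; k < maxAttempts; k++) expectedAttempts += failRate ** k;  // ≈ 1.42
```

So the dashboard shows green almost every time, while each run of that test quietly averages ~1.4x its nominal cost — and 4x in the worst case.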

Retry to unblock your team today. Fix the root cause this sprint.

How to prioritize which tests to fix first

Prioritize by failure frequency, blast radius, cost per failure, and fix complexity.
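One way to turn those four factors into a rough ranking — every field name and weight here is made up for illustration, not a standard formula:

```javascript
// Rough priority score from the four factors above.
// Field names and weights are illustrative, not a standard formula.
function priorityScore(t) {
  const annoyance = t.failureRate * t.runsPerWeek;         // how often it bites
  const impact = annoyance * t.blockedDevelopers;          // blast radius
  const cost = annoyance * t.minutesLostPerFailure;        // CI + human minutes
  return (impact + cost) / Math.max(t.fixEffortHours, 1);  // cheap fixes float up
}

// Hypothetical data:
const tests = [
  { name: "checkout.e2e", failureRate: 0.2,  runsPerWeek: 200, blockedDevelopers: 8, minutesLostPerFailure: 15, fixEffortHours: 4 },
  { name: "utils.unit",   failureRate: 0.05, runsPerWeek: 200, blockedDevelopers: 1, minutesLostPerFailure: 2,  fixEffortHours: 1 },
];
tests.sort((a, b) => priorityScore(b) - priorityScore(a));
```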

Or let Kleore do the prioritization for you. Kleore analyzes your GitHub Actions history and ranks every flaky test by failure rate, cost, and impact. You get a prioritized list with dollar amounts — so you know exactly where to start.



Also read: What Are Flaky Tests? | How Much Do Flaky Tests Actually Cost?
