DEV Community

Mihir Shinde

Posted on • Originally published at kleore.com

How to Fix Flaky Tests in GitHub Actions

You know the drill: CI goes red, you check the logs, the failure looks unrelated to your changes. You hit re-run. It passes. You merge. And the cycle repeats tomorrow.

This guide covers the six most common patterns behind flaky tests in GitHub Actions and gives you concrete fixes for each. Not theories — actual code changes and configuration updates you can apply today.

Before you start fixing

The first step is knowing which tests are flaky and how often they fail. If you're guessing based on Slack complaints, you're working blind. Kleore analyzes your CI history and ranks every flaky test by failure rate and cost — so you fix the worst ones first.

1. Timing & race conditions

Symptom: Test passes locally, fails intermittently in CI. Often involves UI tests, async operations, or anything that waits for a condition to become true.

Root cause: GitHub Actions runners have variable performance. A 2-core runner under load is slower than your M3 MacBook. Tests that assume operations complete within a specific window break when the runner is under pressure.

The fix: Replace fixed waits with condition-based polling. For E2E tests with Playwright or Cypress, use their built-in auto-waiting mechanisms instead of explicit sleeps. For backend tests, poll with exponential backoff rather than sleeping.
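A minimal sketch of what condition-based polling with exponential backoff looks like — `waitFor` and its option names are illustrative, not a library API:

```javascript
// Poll a condition with exponential backoff instead of a fixed sleep.
// `waitFor` and its options are illustrative names, not a library API.
async function waitFor(condition, { timeoutMs = 10_000, initialDelayMs = 50, maxDelayMs = 1_000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  let delay = initialDelayMs;
  while (Date.now() < deadline) {
    if (await condition()) return;            // condition met: stop polling
    await new Promise((r) => setTimeout(r, delay));
    delay = Math.min(delay * 2, maxDelayMs);  // back off, but cap the delay
  }
  throw new Error(`Condition not met within ${timeoutMs}ms`);
}
```

On a fast runner this returns almost immediately; on a loaded 2-core runner it simply polls a few more times instead of failing — the timeout becomes a generous ceiling rather than an assumption about typical speed.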

2. Shared mutable state

Symptom: Test passes in isolation (it.only) but fails when run with the full suite.

Root cause: Tests share a database, in-memory store, filesystem, or global variable. Test A writes data that Test B doesn't expect, or Test A forgets to clean up.

The fix: Isolate test state completely. If you're using a shared test database, consider running each test file in its own database schema or using containers.
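One common isolation trick, sketched under the assumption you run Jest (which sets `JEST_WORKER_ID` per worker) — the naming scheme and helper are just a convention, not a standard API:

```javascript
// Give each parallel test worker its own schema/database so tests never collide.
// JEST_WORKER_ID is set by Jest ("1", "2", ...); the naming scheme is our own convention.
function schemaForWorker(baseName, workerId = process.env.JEST_WORKER_ID ?? "0") {
  return `${baseName}_w${workerId}`;
}

// e.g. in a global setup file (db client is hypothetical):
// const schema = schemaForWorker("app_test");   // "app_test_w3" on worker 3
// await db.query(`CREATE SCHEMA IF NOT EXISTS ${schema}`);
```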

3. External service dependencies

Symptom: Tests fail with network timeouts, 503 errors, or rate-limit responses.

Root cause: Your tests make real HTTP calls to APIs you don't control.

The fix: Mock at the HTTP boundary, not the function level. Use MSW or similar tools to intercept HTTP at the network level.
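To show the idea without any dependency, here is a toy version of HTTP-boundary mocking — swapping out global `fetch` for a route table. The URLs and route data are made up; in a real suite you'd use MSW's `setupServer`, which intercepts requests more robustly and for all HTTP clients, not just `fetch`:

```javascript
// Dependency-free sketch of mocking at the HTTP boundary:
// replace global fetch with a route table of canned responses.
// Routes and URLs below are hypothetical examples.
const routes = new Map([
  ["GET https://api.example.com/user/1", { id: 1, name: "Ada" }],
]);

globalThis.fetch = async (url, opts = {}) => {
  const key = `${opts.method ?? "GET"} ${url}`;
  if (routes.has(key)) {
    // Return a canned Response instead of touching the network.
    return new Response(JSON.stringify(routes.get(key)), {
      status: 200,
      headers: { "content-type": "application/json" },
    });
  }
  // Fail loudly on unmocked calls so real network flakiness can't sneak back in.
  throw new Error(`Unmocked request: ${key}`);
};
```

The key property: code under test still "makes HTTP calls" through its normal client, so you exercise serialization, headers, and error paths — you just never leave the process.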

4. Environment differences

Symptom: Tests pass on macOS, fail on Linux. Or pass with Node 20, fail with Node 22.

Root cause: Assumptions baked into tests about the OS, timezone, locale, filesystem behavior, or available system resources.

The fix: Pin your CI environment explicitly. Always set TZ=UTC, use a .node-version file, and normalize filesystem paths.
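A workflow fragment showing what that pinning looks like in practice — the job name, runner image, and versions are examples, not prescriptions:

```yaml
# Illustrative job fragment — names and versions are examples.
jobs:
  test:
    runs-on: ubuntu-24.04        # pin the runner image, not ubuntu-latest
    env:
      TZ: UTC                    # same timezone everywhere
      LANG: en_US.UTF-8          # same locale everywhere
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version-file: .node-version   # single source of truth for Node
      - run: npm ci              # lockfile-exact installs
      - run: npm test
```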

5. Port & resource conflicts

Symptom: EADDRINUSE errors, database connection failures, or file lock errors — especially when tests run in parallel.

Root cause: Parallel test workers (or a leftover process from a previous run) all try to bind the same hardcoded port, database name, or temp file.

The fix: Use dynamic port allocation (bind to port 0 and let the OS choose) and unique database names per test worker.

6. Test order dependency

Symptom: Tests pass when run in the default order, fail when randomized.

Root cause: Test A sets up state that Test B implicitly depends on.

The fix: Enable test randomization (--randomize in Jest, sequence.shuffle in Vitest) to catch these issues early. Make every test self-contained.
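For Vitest, the shuffle lives in the config file — a minimal fragment (seed handling is one possible convention, not required):

```typescript
// vitest.config.ts — run tests in shuffled order so hidden couplings surface early.
// (For Jest, the equivalent is the --randomize CLI flag.)
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    sequence: {
      shuffle: true,      // randomize test order
      seed: Date.now(),   // pin/log the seed to reproduce a failing order
    },
  },
});
```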

The meta-fix: Retry as a bandaid, not a cure

Retry logic hides the problem. A test that fails 30% of the time and gets retried up to 3 times will appear to pass about 99.2% of the time — while still costing you up to 4x the CI minutes on bad runs and masking the underlying issue.
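A quick sanity check on the numbers, assuming "retried 3 times" means three extra attempts after the first (4 attempts total):

```javascript
// Apparent pass rate and CI cost for a test that fails 30% of the time
// with up to 3 retries (4 attempts total).
const failRate = 0.3;
const maxAttempts = 4;

// The run only stays red if every single attempt fails.
const apparentPassRate = 1 - failRate ** maxAttempts;   // 1 - 0.3^4 ≈ 0.992

// Expected attempts per run: attempt k happens only if the first k-1 all failed.
let expectedAttempts = 0;
for (let k = 0; k < maxAttempts; k++) expectedAttempts += failRate ** k;  // ≈ 1.42
```

So the dashboard shows green almost every time, while each run of that test quietly averages ~1.4x its nominal cost — and 4x in the worst case.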

Retry to unblock your team today. Fix the root cause this sprint.

How to prioritize which tests to fix first

Prioritize by failure frequency, blast radius, cost per failure, and fix complexity.
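One way to turn those four factors into a rough ranking — every field name and weight here is made up for illustration, not a standard formula:

```javascript
// Rough priority score from the four factors above.
// Field names and weights are illustrative, not a standard formula.
function priorityScore(t) {
  const annoyance = t.failureRate * t.runsPerWeek;         // how often it bites
  const impact = annoyance * t.blockedDevelopers;          // blast radius
  const cost = annoyance * t.minutesLostPerFailure;        // CI + human minutes
  return (impact + cost) / Math.max(t.fixEffortHours, 1);  // cheap fixes float up
}

// Hypothetical data:
const tests = [
  { name: "checkout.e2e", failureRate: 0.2,  runsPerWeek: 200, blockedDevelopers: 8, minutesLostPerFailure: 15, fixEffortHours: 4 },
  { name: "utils.unit",   failureRate: 0.05, runsPerWeek: 200, blockedDevelopers: 1, minutesLostPerFailure: 2,  fixEffortHours: 1 },
];
tests.sort((a, b) => priorityScore(b) - priorityScore(a));
```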

Or let Kleore do the prioritization for you. Kleore analyzes your GitHub Actions history and ranks every flaky test by failure rate, cost, and impact. You get a prioritized list with dollar amounts — so you know exactly where to start.



Also read: What Are Flaky Tests? | How Much Do Flaky Tests Actually Cost?
