
Flaky Tests: Causes, Examples, and Best Practices

Why Flaky Tests Are Worse Than You Think

Flaky tests aren’t just annoying. They’re destructive. They break trust in your CI pipeline, slow down engineering teams, and hide real bugs under the noise of random failures.

The worst part? Developers start ignoring all test failures, assuming they’re “just flaky.” At that point, your test suite is worse than useless — it’s lying to you.

This guide is about fixing flakiness at the root. Not band-aiding it. Not retrying endlessly. Actually understanding why it happens and how to prevent it — from local dev to cloud CI.

What Causes Flaky Tests (And What to Do About It)

Flakiness sneaks into every layer of testing, but it wears different disguises depending on what you’re testing.

UI Testing

When you test user interfaces, you’re testing against the most asynchronous, unpredictable layer of your stack.

Imagine this: Your test navigates to a page and clicks “Submit” — but the button’s disabled for 200ms after load to allow animations. Sometimes the click happens too soon, sometimes not. Your test randomly fails.

Code Example: Stable Clicking

import static org.junit.jupiter.api.Assertions.assertEquals;

import java.time.Duration;
import org.junit.jupiter.api.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class FormTest {

    @Test
    public void submitButtonShouldBeClickable() {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("https://example.com/form");

            // Explicit wait: block until the button is actually clickable,
            // however long the post-load animation takes (up to 10s).
            WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
            WebElement submitButton = wait.until(
                ExpectedConditions.elementToBeClickable(By.id("submit-button"))
            );

            submitButton.click();

            // Wait on the navigation event itself, not a fixed sleep.
            wait.until(ExpectedConditions.urlToBe("https://example.com/success"));
            assertEquals("https://example.com/success", driver.getCurrentUrl());
        } finally {
            driver.quit();
        }
    }
}

Notice: no fixed sleeps. Only event-based waits. That’s how you remove race conditions.

API Testing (REST and GraphQL)

APIs are supposed to be deterministic — but when tests hit real servers, all bets are off.

Network spikes, caching issues, or asynchronous database replication can cause tests to fail randomly.

Code Example: Mocked REST API Test

import static com.github.tomakehurst.wiremock.client.WireMock.*;
import static io.restassured.RestAssured.given;
import static org.junit.jupiter.api.Assertions.assertEquals;

import com.github.tomakehurst.wiremock.WireMockServer;
import io.restassured.response.Response;
import org.junit.jupiter.api.Test;

public class UserApiTest {

    @Test
    public void createUserReturnsStubbedResponse() {
        // In-process mock server: the test never touches a real network.
        WireMockServer wireMockServer = new WireMockServer(8080);
        wireMockServer.start();
        try {
            // Define exactly what the "server" returns for POST /users.
            stubFor(post(urlEqualTo("/users"))
                .willReturn(aResponse()
                    .withStatus(201)
                    .withBody("{\"id\":\"123\", \"name\":\"Alice\"}")));

            Response response = given()
                .baseUri("http://localhost:8080")
                .contentType("application/json")
                .body("{\"name\":\"Alice\"}")
                .post("/users");

            response.then().statusCode(201);
            assertEquals("Alice", response.jsonPath().getString("name"));
        } finally {
            wireMockServer.stop();
        }
    }
}

Here, you control every byte of the server response. Zero external dependencies. Zero flakiness.

Performance Testing

Performance is inherently noisy — but that doesn’t mean your tests have to be flaky.

Imagine your load tests show 1s latency on Monday and 3s latency on Wednesday — but no code has changed. That’s not a flaky test — that’s a flaky environment.

To fix it, performance tests must run in controlled conditions.

Key Tip: Always run tests multiple times and report the median and 95th percentile rather than the average. Averages lie when there's noise.
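To make that concrete, here is a minimal sketch of nearest-rank percentiles over repeated latency samples. It's plain Java with made-up numbers; a real harness would collect many more runs and likely lean on a stats library.

Code Example: Median and p95 Instead of Averages

import java.util.Arrays;

public class LatencyStats {

    // Nearest-rank percentile over a sorted copy of the samples.
    static double percentile(double[] samples, double p) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int index = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, index)];
    }

    public static void main(String[] args) {
        // Hypothetical latencies (ms) from repeated runs, with one outlier.
        double[] latencies = {102, 98, 105, 101, 99, 103, 100, 2400};

        double mean = Arrays.stream(latencies).average().orElse(0);
        System.out.printf("mean   = %.1f ms (dragged up by the outlier)%n", mean);
        System.out.printf("median = %.1f ms (stable)%n", percentile(latencies, 50));
        System.out.printf("p95    = %.1f ms (exposes the tail)%n", percentile(latencies, 95));
    }
}

One 2.4s outlier drags the mean to 388.5ms while the median stays at 101ms. With only eight samples the p95 lands on the outlier itself, which is exactly why you collect many runs before trusting percentiles.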

Security Testing

Security tests like fuzzing or vulnerability scanning can become flaky if you don’t control randomness.

If your fuzzing engine seeds randomly every run, you’ll get different results, making failures look random.

Solutions (see the sketch after this list):

  • Fix random seeds in fuzzers
  • Run against snapshots of systems
  • Log all input payloads for reproducibility
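Here is a minimal sketch of that idea, using plain java.util.Random as a stand-in for a real fuzzer's seed control. The parseUntrustedInput method is a hypothetical system under test.

Code Example: Seeded, Reproducible Fuzzing

import java.util.Random;

public class SeededFuzzExample {

    public static void main(String[] args) {
        // Fixed seed: every run generates the exact same payload sequence,
        // so any failure is reproducible instead of looking random.
        long seed = 42L;
        Random random = new Random(seed);

        for (int i = 0; i < 100; i++) {
            byte[] payload = new byte[random.nextInt(256)];
            random.nextBytes(payload);

            // Log every input so a failing case can be replayed exactly.
            System.out.printf("seed=%d iteration=%d payloadLength=%d%n",
                seed, i, payload.length);

            parseUntrustedInput(payload);
        }
    }

    // Hypothetical system under test; stands in for real parsing logic.
    static void parseUntrustedInput(byte[] payload) {
        // ...
    }
}

Change the seed deliberately when you want new coverage. Never let it change by accident.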

Chaos Engineering

Chaos tests intentionally cause system failures. But without tight control, they make your test suite untrustworthy instead of making your system more resilient.

The goal is targeted chaos, not blind chaos.
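As one way to keep chaos targeted, WireMock can inject a single, well-defined network fault into a single dependency while everything else stays healthy. A minimal sketch follows; the /inventory endpoint and the port are assumptions.

Code Example: Targeted Fault Injection

import static com.github.tomakehurst.wiremock.client.WireMock.aResponse;
import static com.github.tomakehurst.wiremock.client.WireMock.get;
import static com.github.tomakehurst.wiremock.client.WireMock.stubFor;
import static com.github.tomakehurst.wiremock.client.WireMock.urlEqualTo;

import com.github.tomakehurst.wiremock.WireMockServer;
import com.github.tomakehurst.wiremock.http.Fault;

public class TargetedChaosExample {

    public static void main(String[] args) {
        WireMockServer wireMockServer = new WireMockServer(8080);
        wireMockServer.start();
        try {
            // Targeted chaos: exactly one dependency fails, in exactly one
            // way, on every run. A failure here points at one resilience path.
            stubFor(get(urlEqualTo("/inventory"))
                .willReturn(aResponse().withFault(Fault.CONNECTION_RESET_BY_PEER)));

            // ... exercise the system under test against http://localhost:8080 ...
        } finally {
            wireMockServer.stop();
        }
    }
}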

Why Environments Matter More Than Your Test Code

99% of articles about flaky tests focus only on “bad test code.”

Reality: environment stability is just as important.

Tests that hit unstable environments are doomed from the start.

Cloud Computing Considerations

  • Use ephemeral test environments spun up per PR (e.g., with Terraform)
  • Create immutable infrastructure — never “fix” a test env by hand
  • Control resource auto-scaling and instance types for tests
  • Snapshot entire DBs or services pre-test to ensure known states
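Terraform handles this at the cloud level. For a smaller-scale sketch of the same idea in this article's language, Testcontainers gives each run a throwaway database in a known state; the image tag and the seeding step here are assumptions.

Code Example: Ephemeral Database Per Run

import org.testcontainers.containers.PostgreSQLContainer;

public class EphemeralDatabaseExample {

    public static void main(String[] args) {
        // A fresh, throwaway Postgres per run: a known state every time,
        // and no shared environment that can drift or be "fixed" by hand.
        try (PostgreSQLContainer<?> postgres =
                 new PostgreSQLContainer<>("postgres:16-alpine")) {
            postgres.start();

            String jdbcUrl = postgres.getJdbcUrl();
            System.out.println("Ephemeral DB ready at " + jdbcUrl);
            // ... run migrations, seed known data, execute tests ...
        } // Container is destroyed here; nothing leaks into the next run.
    }
}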

Observability in Testing

Good engineers log and monitor not just their app — but their tests too.

✅ Track test start/end times
✅ Monitor resource usage during tests
✅ Correlate test failures with infrastructure anomalies

Tools to Add Observability:

  • Grafana dashboards from test results
  • Prometheus metrics from CI pipelines
  • OpenTelemetry traces through tests
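As a sketch of the last point, a JUnit 5 extension can wrap every test in an OpenTelemetry span, assuming a tracer and exporter are configured elsewhere in the build; the tracer name here is arbitrary.

Code Example: Tracing Every Test

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import org.junit.jupiter.api.extension.AfterEachCallback;
import org.junit.jupiter.api.extension.BeforeEachCallback;
import org.junit.jupiter.api.extension.ExtensionContext;

// Wraps each test in a span so durations and failures land in the
// same tracing backend as your infrastructure telemetry.
public class TracedTestExtension implements BeforeEachCallback, AfterEachCallback {

    private final Tracer tracer = GlobalOpenTelemetry.getTracer("test-suite");

    // A plain field is fine for sequential tests; for parallel
    // execution, stash the span in the ExtensionContext store instead.
    private Span span;

    @Override
    public void beforeEach(ExtensionContext context) {
        span = tracer.spanBuilder(context.getDisplayName()).startSpan();
    }

    @Override
    public void afterEach(ExtensionContext context) {
        // Attach the failure (if any) to the span before closing it.
        context.getExecutionException().ifPresent(span::recordException);
        span.end();
    }
}

Register it with @ExtendWith(TracedTestExtension.class) and each test shows up as a span you can correlate with infrastructure anomalies.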

Stable Testing Pyramid: Best Practice Design

A stable suite follows the classic testing pyramid: a broad base of fast, isolated unit tests, a smaller middle layer of integration and API tests, and only a thin top layer of end-to-end UI tests. The higher a test sits in the pyramid, the more asynchronous moving parts it touches, and the more opportunities it has to flake.

Industry Case Studies

Here are three examples from tech giants showcasing how flakiness is tackled:

Quick Pro-Tips

  • Never Retry Blindly: Retrying flaky tests without understanding the cause just masks real problems.
  • Build Test Observability First: Know exactly where your tests fail, not just that they failed.
  • Cloud is Your Friend (if used right): Use ephemeral cloud environments spun up per PR and torn down afterward.
  • Always Prefer Mocks for External Services: You don't control Google's API or AWS outages. Mock them aggressively.
  • Prioritize Test Stability as a Feature: Test stability is not “extra work.” It’s a product quality feature.

Final Thoughts: Flaky Tests Are a Systemic Issue, Not a “Test Code” Issue

Flaky tests point to flaky systems:

  • Fragile environments
  • Bad assumptions about timing
  • Poor infrastructure control
  • Missing observability

Fixing flaky tests makes your product better, your systems more resilient, and your team much faster.

If you’re passionate about software testing, infrastructure, and creating high-quality solutions in a dynamic, knowledge-sharing environment, we invite you to explore our job opportunities at Agile Actors. Here, your personal growth is a key part of our collective development, especially as we tackle the ever-evolving challenges in software engineering and testing. Join us and be a part of our journey to shape the future of testing and infrastructure excellence.
