Flaky Tests from Race Conditions: Root Causes and Fixes

#softwaretesting #testing #testautomation #devassure

Flaky tests are one of the most frustrating challenges in software development.

They make test suites unreliable, waste developer time, and erode trust in automation. Even worse, flaky tests can mask real bugs in your codebase: once a team gets used to re-running failed tests, genuine regressions get dismissed as noise. The result is a false sense of security, slower CI/CD pipelines, and delayed releases.

At the heart of many flaky tests lies a common culprit: race conditions.

What Are Race Conditions in Test Automation?

A race condition occurs when two or more processes or threads access shared data at the same time and at least one of them modifies it, so the outcome depends on the order in which they happen to run.
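To make the definition concrete, here is a minimal sketch in plain JavaScript (no test framework involved). The names `counter`, `unsafeIncrement`, and `runRace` are hypothetical and exist only for this illustration. Two async tasks read the same value, yield, and then write back, so one of the updates is lost.

```javascript
// Shared state that both tasks read and write.
let counter = 0;

// Read-modify-write with a yield in the middle: the classic race window.
async function unsafeIncrement() {
  const current = counter; // 1. read the shared value
  await new Promise((resolve) => setTimeout(resolve, 0)); // 2. yield to other tasks
  counter = current + 1; // 3. write a value based on a stale read
}

// Runs two increments concurrently and reports the final counter value.
async function runRace() {
  counter = 0;
  await Promise.all([unsafeIncrement(), unsafeIncrement()]);
  return counter;
}
```

Both tasks read 0 before either one writes, so `runRace()` resolves to 1 rather than the 2 you might expect.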

In automated testing, race conditions typically happen when the test interacts with the application before it has reached a stable state. This is especially common in asynchronous applications, where UI updates are not immediately synchronized with backend operations.

Example:

A test clicks a button before it is fully rendered or enabled, sometimes succeeding and sometimes failing, depending on timing.

Deterministic vs Non-Deterministic Outcomes

Deterministic Outcomes: Tests produce the same result every time under the same conditions. Example: clicking a button only after it's loaded always passes.

Non-Deterministic Outcomes: Tests pass or fail unpredictably under the same conditions. Example: clicking a button before it's fully loaded may pass occasionally but fail most of the time.

Race conditions lead to non-deterministic outcomes, which are the root of flakiness.
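The difference can be sketched with a small simulation. Everything here is made up for illustration: the readiness times in `readyAtMs`, and the helper names `fixedDelayClick` and `waitForReadyClick`. The first helper models a test that acts at a fixed moment, the second models a test that acts as soon as the button is ready (up to a timeout).

```javascript
// Simulated times (ms) at which the same button becomes ready in three runs.
const readyAtMs = [10, 40, 80];

// Timing-based: always waits `delayMs`, passes only if the button was ready by then.
function fixedDelayClick(delayMs, readyMs) {
  return { passed: readyMs <= delayMs, waitedMs: delayMs };
}

// State-based: waits only as long as needed, up to `timeoutMs`.
function waitForReadyClick(timeoutMs, readyMs) {
  const passed = readyMs <= timeoutMs;
  return { passed, waitedMs: passed ? readyMs : timeoutMs };
}

const timingBased = readyAtMs.map((ready) => fixedDelayClick(50, ready));
const stateBased = readyAtMs.map((ready) => waitForReadyClick(30000, ready));

console.log(timingBased.map((r) => r.passed));   // [ true, true, false ]
console.log(timingBased.map((r) => r.waitedMs)); // [ 50, 50, 50 ]
console.log(stateBased.map((r) => r.passed));    // [ true, true, true ]
console.log(stateBased.map((r) => r.waitedMs));  // [ 10, 40, 80 ]
```

The timing-based version gives different results for the same test depending on how fast the application happened to be, while the state-based version passes every run and waits no longer than necessary.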

Symptoms of Flaky Tests Caused by Race Conditions
You'll typically see:

Intermittent failures - tests pass sometimes and fail other times.
Timing issues - failures when elements take too long to load.
Inconsistent results - running the same test multiple times produces different outcomes.
Resource contention - failures due to conflicts over shared databases, files, or memory.
Order-dependent issues - tests that fail only when executed in a specific sequence.
Environment-specific errors - passing locally but failing in CI/CD.

Common Error Messages

Timeout 30000ms exceeded while waiting for element to be clickable
Element not visible / not attached to the DOM
Stale Element Reference
Element is not interactable
Network request failed or timed out
Database deadlock or timeout errors
File access errors
Unexpected application state (e.g., modal not open, form not submitted)

Root Causes of Flaky Tests

Pending AJAX/Fetch Calls: Tests interact with elements before data has fully loaded.
Animations/Transitions: UI elements in motion are unclickable or in intermediate states.
Stale Elements: DOM nodes re-rendered by React/Vue/Angular invalidate references.
Racing Assertions: Assertions made before the UI finishes updating lead to failures.
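The stale element case is the least intuitive of these, so here is a toy model of it in plain JavaScript. `FakePage`, `click`, `clickHeldReference`, and `clickFreshLookup` are hypothetical names, and the "DOM" is just an object. The point is only to show why a reference captured before a re-render stops working, and why looking the element up again at interaction time (the idea behind Playwright locators, which resolve the element each time an action runs) avoids the problem.

```javascript
// A toy "DOM": render() replaces the button node with a brand-new object,
// the way a framework re-render can replace real DOM nodes.
class FakePage {
  constructor() {
    this.render();
  }
  render() {
    if (this.button) this.button.attached = false; // the old node is detached
    this.button = { attached: true, clicks: 0 };
  }
  query() {
    return this.button; // always returns the current node
  }
}

// Clicking a detached node fails, similar in spirit to a stale element error.
function click(node) {
  if (!node.attached) throw new Error('stale element: node is detached');
  node.clicks += 1;
}

// Anti-pattern: hold a reference across a re-render.
function clickHeldReference(page) {
  const held = page.query();
  page.render(); // re-render happens between lookup and click
  click(held);   // throws: `held` points at the old node
}

// Fix: resolve the element again at the moment of interaction.
function clickFreshLookup(page) {
  page.render();
  click(page.query()); // succeeds: uses the current node
}
```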

Example: Race Condition in Login Test

import { test } from '@playwright/test';

test('should display user profile after login', async ({ page }) => {
  await page.goto('https://example.com/login');
  await page.fill('#username', 'testuser');
  await page.fill('#password', 'password');
  await page.fill('#otp', '123456');
  // Fails intermittently if OTP verification isn't complete before the click
  await page.click('#login-button');
});

Why does it fail?

The button becomes clickable only after OTP verification. If the test clicks too soon, it fails.

Why is it flaky?

Sometimes the verification is fast, sometimes it isn't - leading to inconsistent results.

Best Practices to Fix Flaky Tests

✅ Use Explicit Waits
Wait for specific conditions before continuing.
await page.waitForSelector('#login-button:enabled');
await page.click('#login-button');
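If it helps to see what an explicit wait does conceptually, here is a rough sketch in plain JavaScript. This is not Playwright's implementation. `waitFor`, `createButton`, `click`, and `loginFlow` are hypothetical helpers, and the button is a plain object that becomes enabled after a delay.

```javascript
// Polls `predicate` until it returns true or `timeoutMs` elapses.
async function waitFor(predicate, { timeoutMs = 1000, intervalMs = 10 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await predicate()) return;
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs}ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Hypothetical button that becomes enabled after an unpredictable delay.
function createButton(enableAfterMs) {
  const button = { enabled: false, clicked: false };
  setTimeout(() => { button.enabled = true; }, enableAfterMs);
  return button;
}

// Clicking a disabled button fails, like the flaky login click.
function click(button) {
  if (!button.enabled) throw new Error('button is not enabled');
  button.clicked = true;
}

// Waits on the button's state, not on the clock, before clicking.
async function loginFlow(enableAfterMs) {
  const button = createButton(enableAfterMs);
  await waitFor(() => button.enabled);
  click(button);
  return button.clicked;
}
```

With this shape, `loginFlow(5)` and `loginFlow(120)` both succeed, because the click is gated on the button's state and not on how long it took to get there.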

✅ Wait for Network Idle

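Ensure the AJAX/Fetch requests that the next step depends on have completed before the test continues. Waiting for a specific response is usually more reliable than waiting for the whole page to go network-idle.

const [response] = await Promise.all([
  // Start waiting before the click so the response cannot be missed
  page.waitForResponse(/api\/otp-verify/),
  page.click('#login-button'),
]);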
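Why wrap the wait and the click in `Promise.all`? Because the wait has to be listening before the action fires, otherwise a fast response can arrive unobserved. The sketch below models that ordering problem in plain JavaScript. `ResponseBus`, `triggerAction`, `withTimeout`, and the two scenario functions are hypothetical stand-ins, not Playwright APIs.

```javascript
// Minimal event source standing in for "a network response arrived".
class ResponseBus {
  constructor() {
    this.listeners = [];
  }
  // Resolves with the next emitted value; anything emitted earlier is missed.
  next() {
    return new Promise((resolve) => this.listeners.push(resolve));
  }
  emit(value) {
    const pending = this.listeners;
    this.listeners = [];
    pending.forEach((resolve) => resolve(value));
  }
}

// Hypothetical action whose response arrives immediately.
function triggerAction(bus) {
  bus.emit('200 OK');
}

// Resolves to 'timeout' if `promise` has not settled within `ms`.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve('timeout'), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Wrong order: the response is emitted before anyone is listening.
async function listenAfterTrigger() {
  const bus = new ResponseBus();
  triggerAction(bus);
  return withTimeout(bus.next(), 50);
}

// Right order: start listening first, then trigger, then await both.
async function listenBeforeTrigger() {
  const bus = new ResponseBus();
  const [response] = await Promise.all([bus.next(), triggerAction(bus)]);
  return response;
}
```

In this model `listenAfterTrigger()` resolves to `'timeout'` because the response was already gone, while `listenBeforeTrigger()` resolves to `'200 OK'`.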

✅ Assertions with Retries

Use assertions that retry until the condition holds or a timeout expires, instead of checking the state only once.

await expect(page.locator('#login-button')).toBeEnabled();
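For intuition, a retrying assertion can be sketched in a few lines of plain JavaScript. This is an illustration of the idea, not how Playwright's `expect` is implemented. `expectEventually`, `createUi`, and `assertProfileShown` are hypothetical names, and the UI is a plain object that updates after a delay.

```javascript
// Retries `assertion` until it stops throwing or `timeoutMs` elapses,
// then rethrows the last failure. Resolves with the number of attempts.
async function expectEventually(assertion, { timeoutMs = 1000, intervalMs = 10 } = {}) {
  const deadline = Date.now() + timeoutMs;
  let attempts = 0;
  while (true) {
    attempts += 1;
    try {
      await assertion();
      return attempts;
    } catch (error) {
      if (Date.now() >= deadline) throw error;
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
}

// Hypothetical UI state that shows the profile some time after login.
function createUi(updateAfterMs) {
  const ui = { profileName: null };
  setTimeout(() => { ui.profileName = 'testuser'; }, updateAfterMs);
  return ui;
}

// A one-shot assertion: throws if the profile is not shown right now.
function assertProfileShown(ui) {
  if (ui.profileName !== 'testuser') {
    throw new Error(`expected profile "testuser", got ${ui.profileName}`);
  }
}
```

Calling `assertProfileShown(ui)` right after `createUi(30)` throws, because the UI has not updated yet. Wrapping it as `expectEventually(() => assertProfileShown(ui))` retries until the update lands, which is the behavior you want from an assertion on asynchronous UI.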

✅ Disable Animations in Test Environment (optional)

* { transition-duration: 0s !important; animation: none !important; }

Anti-Patterns to Avoid

❌ Fixed Delays

await page.waitForTimeout(5000);
Fixed delays add unnecessary time when the application is fast and still fail when it is slower than the delay, so they don't guarantee reliability.

❌ Chained Locators Without Context

await page.click('div >> text=Submit');
Overly generic locators break easily with UI changes.

Key Takeaways

Flaky tests undermine automation confidence and slow delivery.
Race conditions are a leading cause, especially in async applications.
Use explicit waits, network idle strategies, and retriable assertions instead of hardcoded delays.
Avoid anti-patterns that make tests brittle and unpredictable.
Focus on synchronization with application state rather than timing hacks.

👉 By adopting robust synchronization techniques and avoiding bad practices, you can dramatically reduce test flakiness and build reliable, trustworthy automation pipelines.