Dhiraj Das

Posted on Dec 15, 2025 • Originally published at dhirajdas.dev

Why Your Selenium Tests Are Flaky (And How to Fix Them Forever)

#python #automation #testing

🎯

What This Article Covers

The Flakiness Problem: Why time.sleep() and WebDriverWait aren't enough
What Causes Flaky Tests: Racing against UI state changes
The Stability Solution: Monitoring DOM, network, animations, and layout shifts
One-Line Integration: Wrap your driver with stabilize() — zero test rewrites
Full Diagnostics: Know exactly why tests are blocked

If you've worked with Selenium for more than a week, you've written code like this:

driver.get("https://myapp.com/dashboard")
time.sleep(2)  # Wait for page to load
driver.find_element(By.ID, "submit-btn").click()
time.sleep(1)  # Wait for AJAX

And you've felt the shame of knowing it's wrong—but also the relief of "it works." Until it doesn't. Until the CI server is 10% slower than your machine, and suddenly your tests fail 20% of the time.

This is the story of flaky tests, why they happen, and how I built a library called waitless to eliminate them.

⚠️

The Flakiness Problem

Let me show you a real scenario. You have a React dashboard. User clicks a button. The button triggers an API call. The API returns data. React re-renders the component. A spinner disappears. A table appears.

This entire sequence takes maybe 400ms. But your test does this:

button = driver.find_element(By.ID, "load-data")
button.click()
table = driver.find_element(By.ID, "data-table")  # 💥 BOOM

The table doesn't exist yet. React is still fetching. Selenium throws NoSuchElementException.

So you "fix" it:

button.click()
time.sleep(2)
table = driver.find_element(By.ID, "data-table")  # Works... usually

The Problem with time.sleep()

The Problem with time.sleep()

Congratulations. You've just made your test: 1) 2 seconds slower than necessary, 2) Still flaky when the API takes 2.5 seconds, 3) Impossible to debug when it fails.

❌

Why Traditional Solutions Don't Work

time.sleep() — The Naive Approach

Sleep for a fixed duration and hope the UI is ready. Problems: Too short → test fails. Too long → test suite takes forever. No feedback on what's actually happening.

WebDriverWait — The "Correct" Approach

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.ID, "submit-btn"))
)

This is better. You're waiting for a specific condition. But here's the dirty secret: it only checks one element.

What about the modal that's still animating into view?
What about the AJAX request that hasn't finished?
What about the React re-render that's about to move your button?

WebDriverWait says "the button is clickable." Reality says "there's an invisible overlay from an animation that will intercept your click."

Retry Decorators — The Denial Approach

@retry(tries=3, delay=1)
def test_dashboard():
    driver.find_element(By.ID, "submit-btn").click()

This is the equivalent of saying "I know my code is broken, but if I run it enough times, it'll eventually work." Retries don't fix flakiness. They hide it.

🔍

What Actually Causes Flaky Tests?

After debugging hundreds of flaky tests, I found they all come down to racing against the UI:

| What You Do              | What's Actually Happening           |
|--------------------------|-------------------------------------|
| Click a button           | DOM is being mutated by framework   |
| Assert text content      | AJAX response still in flight       |
| Interact with modal      | CSS transition still animating      |
| Click navigation link    | Layout shift moves element          |

The Real Question

The question isn't "is this element clickable?" The question is: "Is the entire page stable and ready for interaction?" That's what I set out to answer with waitless.

✨

Defining "Stability"

What does it mean for a UI to be "stable"? I identified four key signals:

1. DOM Stability

The DOM structure has stopped changing. No elements being added, removed, or modified. How to detect: MutationObserver watching the document root. Track time since last mutation.

2. Network Idle

All AJAX requests have completed. No pending API calls. How to detect: Intercept fetch() and XMLHttpRequest. Count pending requests.

3. Animation Complete

All CSS animations and transitions have finished. How to detect: Listen for animationstart, animationend, transitionstart, transitionend events.

4. Layout Stable

Elements have stopped moving. No more layout shifts. How to detect: Track bounding box positions of interactive elements. Compare over time.

🏗️

The Architecture

Waitless has two parts:

JavaScript Instrumentation (runs in browser)

window.__waitless__ = {
    pendingRequests: 0,
    lastMutationTime: Date.now(),
    activeAnimations: 0,

    isStable() {
        if (this.pendingRequests > 0) return false;
        if (Date.now() - this.lastMutationTime < 100) return false;
        return true;
    }
};

This script is injected into the page via execute_script(). It monitors everything happening in the browser.

Python Engine (evaluates stability)

class StabilizationEngine:
    def wait_for_stability(self):
        """Waits until all stability signals are satisfied."""
        # Checks performed automatically:
        # ✓ DOM mutations have settled
        # ✓ Network requests completed
        # ✓ Animations finished
        # ✓ Layout is stable

The Python engine continuously evaluates browser state until all configured stability signals indicate the page is ready for interaction.

🪄

The Magic: One-Line Integration

The key design goal was zero test modifications. Adding stability detection should require changing ONE line:

from waitless import stabilize

driver = webdriver.Chrome()
driver = stabilize(driver)  # ← This is the only change

# All your existing tests work as-is
driver.find_element(By.ID, "button").click()  # Now auto-waits!

How does this work? The stabilize() function wraps the driver in a StabilizedWebDriver that intercepts find_element() calls. Retrieved elements are wrapped in StabilizedWebElement. When you call .click(), it first waits for stability, then clicks.

class StabilizedWebElement:
    def click(self):
        self._engine.wait_for_stability()  # Auto-wait!
        return self._element.click()  # Then click

Zero Rewrites Required

Your tests don't know they're waiting. They just... stop failing.

🔧

Handling Edge Cases

Real apps aren't simple. Here's how waitless handles the messy reality:

Problem: Infinite Animations

Some apps have spinners that rotate forever. Analytics scripts that poll constantly. WebSocket heartbeats that never stop.

Solution: Configurable thresholds

from waitless import StabilizationConfig

config = StabilizationConfig(
    network_idle_threshold=2,  # Allow 2 pending requests
    animation_detection=False,  # Ignore spinners
    strictness='relaxed'        # Only check DOM mutations
)

driver = stabilize(driver, config=config)

Problem: Navigation Destroys Instrumentation

Single-page apps remake the DOM on route changes. The injected JavaScript disappears.

Solution: Re-validation before each wait

def wait_for_stability(self):
    if not self._is_instrumentation_alive():
        self._inject_instrumentation()  # Re-inject if gone
    # Then wait...

📊

Diagnostics: The Secret Weapon

When tests still fail, understanding why is half the battle. Waitless includes a diagnostic system that explains exactly what's blocking stability:

╔═════════════════════════════════════════════════════════════╗
║              WAITLESS STABILITY REPORT                      ║
╠═════════════════════════════════════════════════════════════╣
║ Timeout: 10.0s                                              ║
║                                                             ║
║ BLOCKING FACTORS:                                           ║
║   ⚠ NETWORK: 2 request(s) still pending                    ║
║   → GET /api/users (started 2.3s ago)                       ║
║   → POST /analytics (started 1.1s ago)                      ║
║                                                             ║
║   ⚠ ANIMATIONS: 1 active animation(s)                      ║
║   → .spinner { animation: rotate 1s infinite }              ║
║                                                             ║
╠═════════════════════════════════════════════════════════════╣
║ SUGGESTIONS:                                                ║
║   1. /api/users is slow. Consider mocking in tests.         ║
║   2. Spinner has infinite animation. Set                    ║
║      animation_detection=False                              ║
╚═════════════════════════════════════════════════════════════╝

This isn't just "test failed." It's "test failed because your analytics endpoint is slow, and here's exactly how to fix it."

📈

The Results

Here's what changes when you adopt waitless:

Before

driver.get("https://myapp.com")
time.sleep(2)
WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.ID, "login-btn"))
)
driver.find_element(By.ID, "login-btn").click()
time.sleep(1)
driver.find_element(By.ID, "username").send_keys("user")

After

driver = stabilize(driver)
driver.get("https://myapp.com")
driver.find_element(By.ID, "login-btn").click()
driver.find_element(By.ID, "username").send_keys("user")

| Metric               | Before               | After           |
|----------------------|----------------------|-----------------|
| Lines of wait code   | 4+ per test          | 1 total         |
| Arbitrary delays     | 3+ seconds           | 0               |
| Flaky failures       | Common               | Rare            |
| Debug information    | "Element not found"  | Full stability report |

🎭

Why Not Just Use Playwright?

Playwright has auto-waiting built in. It's great! But:

Migration cost — You have 10,000 Selenium tests. Rewriting isn't an option.
Framework lock-in — Playwright auto-wait is Playwright-only.
Different approach — Playwright waits for element actionability. Waitless waits for page-wide stability.

The Best of Both Worlds

Waitless gives Selenium users the reliability of Playwright without the rewrite.

⚠️

Current Limitations (v0.2.0)

Being honest about what doesn't work yet:

Selenium only — Playwright integration planned for v1
Sync only — No async/await support
Main frame only — iframes not monitored
No Shadow DOM — MutationObserver can't see shadow roots
Chrome-focused — Tested primarily on Chromium

These will be addressed in future versions — contributions welcome!

🚀

Try It Yourself

pip install waitless

from selenium import webdriver
from waitless import stabilize

driver = webdriver.Chrome()
driver = stabilize(driver)

# Your tests are now stable
driver.get("https://your-app.com")
driver.find_element(By.ID, "button").click()

One line. Zero test rewrites. No more flaky failures.

✅

Conclusion

Flaky tests are a symptom of racing against UI state. The solution isn't longer sleeps or more retries—it's understanding when the UI is truly stable.

Waitless monitors DOM mutations, network requests, animations, and layout shifts to answer one question: "Is this page ready for interaction?"

The Bottom Line

Your tests should be deterministic. Your CI should be green. And you should never write time.sleep() again.

PyPI: pypi.org/project/waitless
GitHub: github.com/godhiraj-code/waitless

Built by Dhiraj Das

Automation Architect. Making Selenium tests deterministic, one at a time.

Top comments (2)

OnlineProxy • Jan 4

The real aha moment here is that page ready isn't just about one button being clickable, it's the entire stack settling down - DOM, network calls finishing up, animations wrapping, layout stops thrashing. What really stands out is the diagnostic reporting that shows you exactly which requests are still hanging and what animations are being stubborn, most tools just throw some generic timeout error and call it a day, but this actually gives engineers the intel they need to debug like humans, not guess blindly. Curious though - how does this handle WebSocket and SSE connections that are technically open but just chilling there, and could you layer in framework-specific hooks like React's useEffect or Angular's NgZone alongside the network monitoring to catch settling work even earlier? The zero-rewrite thing is chef's kiss for real teams living with massive test suites who can't exactly refactor 10K tests overnight. One thing that'd help adoption: drop some numbers on the performance overhead per test and spell out how stabilize() actually handles SPAs and navigation so teams can make informed calls. Bottom line - this solves a real Selenium headache that's been bugging the community for years. Get this in front of people

Dhiraj Das • Jan 5 • Edited

Thanks for taking the time to read and dig in , already shared it in pycoders weekly issue 714 , pycoders.com/issues/714 , for experts to explore and find the issues or edge cases in the code. Currentlrly ,it intercepts fetch() and XMLHttpRequest for network monitoring , SSE/websocket actvitiy tracking is in the roadmap for the upcoming release (likely via an opt in flag like track_websocket=True_... , i will work on the upcoming items websocket /sse awareness , hook adapters , iframe support in the upcoming weeks and update the library. please feel free to have a look : github.com/godhira-code/waitless (most of the upcoming items , are almost complete, will update by this week)