You can read Part 1 here.
When Automation Fails, It’s Usually a Design Problem
After automation has been in place for a while, teams start to notice a pattern. Certain tests fail intermittently. Others require retries to pass. Some failures disappear when rerun locally but resurface in the pipeline. Over time, the test suite becomes something engineers learn to work around rather than rely on.
At this stage, the question inevitably arises: Is our automation bad, or is the system itself the problem?
Answering that question correctly is one of the most important skills in building sustainable automation. Many teams get it wrong not because they lack experience, but because automation failures are easier to see than design flaws.
Why Automation Takes the Blame
Automation operates in public. When it fails, pipelines turn red, notifications fire, and progress stops. Application design issues, by contrast, often remain invisible. They manifest as ambiguity, hidden coupling, or unclear state: things humans adapt to without consciously noticing.
When an automated test times out, fails to locate an element, or produces inconsistent results, the failure message points directly to the test. The system itself remains silent. Over time, this creates a false narrative: automation is fragile, slow, and unreliable.
In reality, automation is often exposing behavior that was already uncertain. It simply does so consistently and without bias.
The Core Diagnostic Question
A useful way to separate automation problems from design problems is to ask a simple question:
Would a human tester be able to explain this failure clearly and consistently without rerunning the test multiple times?
If the answer is no, the problem is rarely automation.
When humans need to refresh the page, repeat the action, or “just try again,” they are compensating for missing signals in the system. Automation cannot make those assumptions. It needs the system to be explicit.
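To make that concrete, here is a minimal sketch (in TypeScript) of what "explicit" can mean at the API level. The /api/jobs endpoint and its status values are hypothetical, not a real contract; the point is that the caller waits on a state the system declares rather than retrying and hoping.

```typescript
// Sketch only: assumes a hypothetical job API that exposes an explicit
// status field ("pending" | "ready" | "failed") instead of leaving the
// caller to guess and retry blindly.
type JobStatus = 'pending' | 'ready' | 'failed';

async function waitForJob(jobId: string, timeoutMs = 30_000): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const res = await fetch(`/api/jobs/${jobId}`);                   // hypothetical endpoint
    const { status } = (await res.json()) as { status: JobStatus };
    if (status === 'ready') return;                                  // explicit success signal
    if (status === 'failed') throw new Error(`Job ${jobId} failed`); // explicit failure signal
    await new Promise((resolve) => setTimeout(resolve, 500));        // poll a declared state, not a guess
  }
  throw new Error(`Job ${jobId} did not become ready within ${timeoutMs} ms`);
}
```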
Design Ambiguity Masquerading as Automation Failure
Many automation issues originate from design decisions that obscure system behavior. User interfaces that re-render unpredictably, workflows that depend on timing rather than state, and systems that expose results only visually force automation to guess.
These guesses take the form of brittle selectors, complex wait conditions, and retries. While these techniques can make tests pass, they also hide the underlying problem: the system does not clearly communicate what it is doing.
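As an illustration, here is roughly what those guesses look like in a Playwright test. The URL, selectors, and flow are invented for the example; the shape of the compensation is the recognizable part.

```typescript
import { test, expect } from '@playwright/test';

// Illustrative anti-pattern only (selectors and URL are made up): the test
// compensates for a silent system with a positional selector, a fixed sleep,
// and a retry loop.
test('report appears (brittle version)', async ({ page }) => {
  await page.goto('https://app.example.com/reports');
  await page.click('div.toolbar > button:nth-child(3)'); // brittle positional selector
  await page.waitForTimeout(5000);                        // guesses how long rendering takes
  for (let attempt = 0; attempt < 3; attempt++) {         // retries hide the missing signal
    if (await page.locator('.grid .row').first().isVisible()) break;
    await page.reload();
  }
  await expect(page.locator('.grid .row').first()).toBeVisible();
});
```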
When a test fails with “element not found,” the real issue is often that the system never signaled whether the element should exist yet. Automation is blamed for being impatient when the system is simply silent.
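Compare that with a version where the system declares its own readiness. The data-state attribute and test IDs below are assumptions about how such a contract could look, not an existing one:

```typescript
import { test, expect } from '@playwright/test';

// Sketch of the same flow once the system declares its own state. The
// data-state attribute and data-testid values are assumed for illustration.
test('report appears (explicit version)', async ({ page }) => {
  await page.goto('https://app.example.com/reports');
  await page.getByRole('button', { name: 'Generate report' }).click();

  // The UI marks the grid as ready only when its data has actually loaded,
  // so the test waits on a declared state instead of guessing.
  const grid = page.getByTestId('report-grid');
  await expect(grid).toHaveAttribute('data-state', 'ready');
  await expect(grid.locator('[data-testid="report-row"]').first()).toBeVisible();
});
```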
What a True Automation Problem Looks Like
Not all failures are design-related. Genuine automation problems do exist, and recognizing them matters.
Automation problems typically:
- Fail deterministically in the same place
- Improve significantly with better tooling or implementation
- Do not affect manual testing behavior
- Are isolated to test code rather than spreading across scenarios
Examples include poor selector strategies, misuse of the automation framework, or over-reliance on end-to-end tests where lower-level tests would suffice. These issues are real, but they tend to be easier to fix and cheaper to maintain over time.
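For contrast, here is a sketch of a genuine automation problem and its fix. Everything stays in the test code, the system is untouched, and the selectors are purely illustrative.

```typescript
import { test, expect } from '@playwright/test';

// Example of a genuine automation problem: the failure is deterministic,
// manual testing is unaffected, and the fix lives entirely in the test code.
test('submit order', async ({ page }) => {
  await page.goto('https://app.example.com/checkout');

  // Before: tied to markup details that change with every restyle.
  // await page.click('#root > div > div.main > form > div:nth-child(7) > button');

  // After: tied to user-facing semantics the system already exposes.
  await page.getByRole('button', { name: 'Place order' }).click();
  await expect(page.getByRole('heading', { name: 'Order confirmed' })).toBeVisible();
});
```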
Design problems, by contrast, resist tool changes and resurface regardless of framework.
The Cost of Misdiagnosis
When design problems are misclassified as automation problems, teams respond by hardening tests rather than improving systems. They add retries, increase timeouts, and build layers of abstraction. Test suites become slower and harder to understand, while the system remains just as opaque as before.
Eventually, the automation suite becomes fragile not because the tests are poorly written, but because they are carrying the burden of compensating for unclear behavior.
This is the point where teams begin to question the value of automation altogether.
Listening to Automation Instead of Fighting It
Automation is often the first place where design weaknesses become visible at scale. It interacts with systems relentlessly and without tolerance for ambiguity. Instead of suppressing this feedback, high-performing teams treat it as a signal.
When a test is hard to write, hard to stabilize, or hard to debug, they ask what the system is failing to communicate. They look for missing state signals, unclear boundaries, or hidden dependencies. Fixing those issues improves automation and usually improves production behavior as well.
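On the system side, the fix can be as small as announcing state. The sketch below uses a hypothetical orders grid and endpoint; the same data-state and aria-busy signals that stabilize tests also help real users and assistive technology.

```typescript
// Sketch of the system-side half of the fix (element ID and endpoint are
// hypothetical): the UI announces what it is doing instead of leaving
// humans and tests to infer it.
async function loadOrders(): Promise<void> {
  const grid = document.querySelector<HTMLElement>('#orders-grid');
  if (!grid) return;

  grid.dataset.state = 'loading';           // explicit, observable state
  grid.setAttribute('aria-busy', 'true');   // the same signal helps assistive tech

  try {
    const res = await fetch('/api/orders'); // hypothetical endpoint
    renderRows(grid, (await res.json()) as Array<{ id: string }>);
    grid.dataset.state = 'ready';
  } catch {
    grid.dataset.state = 'error';
  } finally {
    grid.setAttribute('aria-busy', 'false');
  }
}

function renderRows(target: HTMLElement, rows: Array<{ id: string }>): void {
  target.innerHTML = rows
    .map((row) => `<div data-testid="order-row">${row.id}</div>`)
    .join('');
}
```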
Shifting the Conversation
The most productive teams shift the conversation away from “How do we fix this test?” to “What is the system not making explicit?”
This shift changes how failures are handled. Automation failures become opportunities to improve system clarity rather than sources of frustration. Over time, automation becomes more reliable not because the tests are more complex, but because the system itself is easier to reason about.
Looking Ahead
In the next post, we’ll examine one of the most common triggers for automation instability: slow and asynchronous user interfaces.
We’ll explore why performance issues are often misdiagnosed, why waiting is not a strategy, and how observability, not speed, is the key to reliable automation.
If you’re finding that your automation suite is exposing uncomfortable truths about your system, you’re probably on the right path.