Last month I asked a senior candidate the same question I ask everyone. "Tell me about a flaky test you fixed and how you fixed it."
She paused and said, "I just added a retry."
The interview was essentially over after that. Not because retries are wrong, but because she had no curiosity about why the test was flaky in the first place.
I have conducted 50+ automation engineer interviews over the past two years, and flaky tests are the single topic where I can predict pass or fail within 90 seconds. Here is what separates the candidates who get offers from the ones who do not.
Great candidates reject the "flaky equals unlucky" mental model
Most candidates treat flakiness like weather. Sometimes it rains. Sometimes the test passes. Move on.
The best candidates I interview treat every flaky test like a bug with a specific root cause. They talk about races, missing waits, shared state across tests, timezone bugs, animation frames, stale locators, hydration timing, and network jitter.
When I hear any of those words in an answer, I already know the interview is going well.
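Missing waits are the most common of those causes, and Playwright's web-first assertions are the idiomatic fix. A minimal sketch, with a hypothetical page, selector, and toast text standing in for real ones:

```typescript
import { test, expect } from '@playwright/test';

test('toast appears after save', async ({ page }) => {
  // Hypothetical URL and selectors, for illustration only.
  await page.goto('https://example.com/settings');
  await page.getByRole('button', { name: 'Save' }).click();

  // Flaky version: a fixed sleep races the render.
  //   await page.waitForTimeout(2000);
  //   expect(await page.locator('.toast').textContent()).toBe('Saved');

  // Deterministic version: the web-first assertion retries the check
  // until it passes or the assertion timeout expires.
  await expect(page.locator('.toast')).toHaveText('Saved');
});
```

The difference is that the assertion polls the condition instead of betting on a timing guess.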
I see the same pattern in mock interviews on AssertHired. The candidates who score highest on the debugging round always start by asking, "What do we know about the failure? Is it consistent across CI runs? Is it only on the first run or the second?"
They are not guessing. They are narrowing.
"I just added a retry" is the red flag
I am not anti-retry. Playwright's built-in retries are useful for legitimate infrastructure blips. But retries are not a fix. They are a stall.
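For context, scoping retries to CI is one way to use them as a gate without hiding flakes locally. A sketch of the relevant playwright.config.ts fragment, with illustrative values:

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Retry only on CI, so local runs still surface every flake loudly.
  retries: process.env.CI ? 2 : 0,
  // Note: a test that passes on a retry is reported by Playwright as
  // "flaky", not "passed", which keeps it visible instead of quietly green.
});
```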
The candidates who impress me always frame retries the same way. They say something like, "I added retries as a temporary gate so the suite stopped blocking deploys, then I spent the afternoon figuring out the real cause."
That sentence alone pushes a candidate to the next round for me.
Here is a concrete example from my actual work at Resilience. We had a dashboard test that failed roughly 1 in 40 runs. It was a card count assertion. The "safe" answer would have been to add a wait or a retry. What I actually found after digging through three CI runs and a trace viewer session: the dashboard rendered 12 cards, then React re-rendered after a subscription update two frames later, and my locator was pointing at the first render. Switching the locator to a data-testid attached to the final rendered state eliminated 100% of the flakes.
Not 90%. 100%. That is what real root cause work feels like.
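The fix from that dashboard story looks roughly like this. The URL, test id, and card count below are hypothetical stand-ins, not the real code:

```typescript
import { test, expect } from '@playwright/test';

test('dashboard shows all cards', async ({ page }) => {
  await page.goto('https://example.com/dashboard'); // hypothetical URL

  // Flaky version: counts whatever the first render produced, even if a
  // subscription update re-renders the grid a couple of frames later.
  //   expect(await page.locator('.card').count()).toBe(12);

  // Fixed version: a test id attached only to the final rendered state,
  // asserted with toHaveCount, which retries until the count settles.
  await expect(page.getByTestId('dashboard-card')).toHaveCount(12);
});
```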
The three questions I ask on every flaky test
These are the three questions I have watched work on every flaky test I have personally chased down, and the three questions I listen for in interview answers.
First, can I reproduce it deterministically? If the answer is no, I do not try to fix it yet. I slow it down, add logging, run it 50 times in a loop locally, or drop the worker count to one. Reproduction is the whole game. Teams that skip this step will patch the same test three times in six months.
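In Playwright, those reproduction moves map directly to CLI flags. The spec path below is a hypothetical example; swap in your own:

```shell
# Run the suspect spec 50 times on a single worker, so parallelism
# and test ordering cannot hide the flake.
npx playwright test tests/dashboard.spec.ts --repeat-each=50 --workers=1

# Or brute-force with a loop and record which iteration fails.
for i in $(seq 1 50); do
  npx playwright test tests/dashboard.spec.ts --workers=1 || echo "failed on run $i"
done
```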
Second, is the test wrong or is the product wrong? This question separates senior candidates from mid-level candidates. Sometimes the flake is a real product bug. A race condition in production code shows up as a flake in the test. If your instinct is always "the test is wrong," you will miss real bugs forever.
Third, what is the smallest change that kills the flake? The best fix is almost never the biggest fix. I have seen candidates rewrite entire page objects to solve a problem that needed one locator change. Depth means knowing exactly where to cut.
Tools I actually use, not the ones I list on my resume
Interview answer red flag: the candidate rattles off eight tools without explaining when to use any of them.
Here is my actual workflow when I hit a flake at Resilience.
Playwright trace viewer first, every single time. It is free, it is built in, and it shows me exactly what the browser saw at every step. I save at least four hours a week by starting there instead of guessing.
Second, I run the failing test in headed mode with --repeat-each=10. If it passes 10 times locally but fails on CI, I know the issue is environmental and I stop looking in my code.
Third, I check the CI log for the test that ran just before the one that failed. Shared state between tests is the most common flake cause I see in real codebases, and it is almost never the test that gets reported as failing. People lose full days chasing the wrong test because nobody taught them to look one row up in the log.
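To make that pattern concrete, here is a minimal self-contained sketch, no Playwright involved and all names invented, of how shared state makes the wrong test look flaky:

```typescript
// Shared module-level state: the real bug. testA mutates it and never
// cleans up, so testB only fails when testA happens to run first.
const cart: string[] = [];

function testA_addsItem(): void {
  cart.push('widget'); // dirties shared state
  if (cart.length !== 1) throw new Error('testA failed');
}

function testB_expectsEmptyCart(): void {
  if (cart.length !== 0) throw new Error('testB failed: cart not empty');
}

// Runs each test, collecting failure messages instead of stopping.
function run(tests: Array<() => void>): string[] {
  const failures: string[] = [];
  for (const t of tests) {
    try {
      t();
    } catch (e) {
      failures.push((e as Error).message);
    }
  }
  return failures;
}

// Run order decides the outcome: testB alone passes, testA-then-testB
// fails, and CI reports testB as the flaky one.
console.log(run([testA_addsItem, testB_expectsEmptyCart]));
```

The reported failure is testB, but the fix belongs in testA's cleanup, which is exactly why you look one row up in the CI log.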
That is it. Three moves. I have solved something like 80% of the flakes I have touched with just those three.
What changed in 2026, and what did not
A lot of the noise this year is about Playwright MCP, self-healing locators, and AI agents that debug tests for you. Some of it is real. Some of it is marketing.
Here is my honest take after trying most of them. AI tools are fantastic at explaining traces, generating candidate locators, and suggesting where to look. They are not yet good at deciding whether the test is wrong or the product is wrong. That judgment call is still a human skill, and it is still the skill I pay the most attention to in interviews.
Candidates who lean on AI without understanding the fundamentals get exposed the moment I ask a follow-up question. Candidates who use AI as an accelerator on top of solid debugging instincts look like the future of the role.
This is the reason I keep pushing fundamentals over certs. A cert expires the month it comes out. Knowing how a browser actually renders a page, how your framework dispatches events, and how your CI environment differs from your laptop will serve you for the next 10 years.
What I tell candidates preparing for interviews
When candidates ask me how to prep for QA automation interviews, I give the same advice every time.
Do not memorize the Playwright API. Do not grind LeetCode. Pick one flaky test from a real codebase, even a personal project, and fix it from scratch. Write down what you tried, what worked, what did not. Do that three times and you will outperform most candidates with five years of experience.
This is the exact exercise I built AssertHired around. The debugging scenarios pull from real flaky test patterns, and the AI interviewer pushes back the way I do in live interviews. Candidates who run three to five reps before a real onsite usually walk in calmer, because they have already felt the heat once.
No tool replaces the reps. But the reps have to be on the right problems.
One thing to take away
The best QA automation engineers I have worked with and interviewed all share one habit. They get curious when something behaves weirdly, even once. They do not paper over the weirdness with a retry and move on.
Next time a test fails intermittently, do not fix it yet. Reproduce it first. The fix almost writes itself once you know what you are actually fixing.
That is the habit I hire for. That is the habit that promotes people. And that is the habit that makes a 1 in 40 flake go to zero instead of 1 in 200.
Aston Cook is a Senior QA Automation Engineer and founder of AssertHired, an AI-powered mock interview platform for QA professionals. He has conducted 50+ automation engineer interviews and writes about QA career development. Find him on LinkedIn (16K+ followers) or at asserthired.com.