<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Aston Cook</title>
    <description>The latest articles on DEV Community by Aston Cook (@aston_cook_b6e2cd3f3c477b).</description>
    <link>https://dev.to/aston_cook_b6e2cd3f3c477b</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3842392%2Fea5dbe41-518e-4337-a9ed-d73d7301bba8.jpg</url>
      <title>DEV Community: Aston Cook</title>
      <link>https://dev.to/aston_cook_b6e2cd3f3c477b</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aston_cook_b6e2cd3f3c477b"/>
    <language>en</language>
    <item>
      <title>The Flaky Test Question That Separates Senior QA Engineers From Juniors</title>
      <dc:creator>Aston Cook</dc:creator>
      <pubDate>Wed, 15 Apr 2026 14:11:08 +0000</pubDate>
      <link>https://dev.to/aston_cook_b6e2cd3f3c477b/the-flaky-test-question-that-separates-senior-qa-engineers-from-juniors-4f2d</link>
      <guid>https://dev.to/aston_cook_b6e2cd3f3c477b/the-flaky-test-question-that-separates-senior-qa-engineers-from-juniors-4f2d</guid>
      <description>&lt;p&gt;I've run more than 50 automation interviews in the past year. The same question exposes experience gaps faster than any other:&lt;/p&gt;

&lt;p&gt;"Tell me about the last flaky test you fixed. Walk me through your debugging process."&lt;/p&gt;

&lt;p&gt;That's it. Two sentences. And within two minutes, I usually know if a candidate has actually shipped automation or just read about it.&lt;/p&gt;

&lt;p&gt;Here's what I hear from junior candidates, what I hear from senior ones, and what you should say if you want to sound like you belong in the room.&lt;/p&gt;

&lt;h2&gt;The junior answer&lt;/h2&gt;

&lt;p&gt;Junior candidates almost always say something like this:&lt;/p&gt;

&lt;p&gt;"I added a wait. Then it passed."&lt;/p&gt;

&lt;p&gt;Sometimes it's &lt;code&gt;sleep(5)&lt;/code&gt;. Sometimes it's &lt;code&gt;page.waitForTimeout(3000)&lt;/code&gt;. Sometimes they swap the selector. The test goes green, they merge the PR, and they move on.&lt;/p&gt;

&lt;p&gt;I'm not trying to pile on. Every automation engineer has done this at some point, myself included. The problem is not the quick fix. The problem is that the candidate treats a flaky test as a nuisance to silence rather than a signal to investigate.&lt;/p&gt;

&lt;p&gt;When I push and ask "why was it flaky in the first place," the room gets quiet. That silence is what I'm listening for.&lt;/p&gt;

&lt;h2&gt;The senior answer&lt;/h2&gt;

&lt;p&gt;Senior candidates treat flakiness as a diagnostic puzzle. They have a mental checklist. When I ask the same question, they give me something closer to this:&lt;/p&gt;

&lt;p&gt;"I start by asking whether the flake is in the test, the app, or the environment. Most flakes I've seen in the last year have been race conditions where the test asserts before the app finishes a network call. I check the trace first, then I look at whether we have an auto-waiting locator strategy, then I look at test isolation."&lt;/p&gt;

&lt;p&gt;Notice what's happening. They are not naming a tool. They are naming a process. Tools change. Process compounds.&lt;/p&gt;

&lt;h2&gt;The three buckets every senior engineer knows&lt;/h2&gt;

&lt;p&gt;When you get the flaky test question, structure your answer around three buckets. This is how I teach it in mock interviews on AssertHired, and it mirrors how strong engineers actually think on the job.&lt;/p&gt;

&lt;h3&gt;Bucket 1: The test itself&lt;/h3&gt;

&lt;p&gt;Most flaky tests are bad tests. Common patterns I see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hardcoded waits instead of web-first assertions&lt;/li&gt;
&lt;li&gt;Selectors that rely on DOM structure instead of accessibility or test IDs&lt;/li&gt;
&lt;li&gt;Tests that depend on previous test state&lt;/li&gt;
&lt;li&gt;Assertions that run before the network call resolves&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Playwright made a lot of this easier with built-in auto-waiting. Reports suggest teams moving from Selenium with custom waits to Playwright cut flake rates by around 60 percent, and that matches what I saw when we migrated our suite at Resilience. We did not get to zero flakes. We got to a place where flakes usually pointed at a real bug instead of a timing trick.&lt;/p&gt;

&lt;p&gt;If you fix bucket one, you sound competent. You do not yet sound senior.&lt;/p&gt;

&lt;h3&gt;Bucket 2: The application under test&lt;/h3&gt;

&lt;p&gt;This is where the senior engineers earn their title.&lt;/p&gt;

&lt;p&gt;A test that flakes on &lt;code&gt;button.click()&lt;/code&gt; followed by a modal assertion might not be a test problem. It might be a product bug. The modal sometimes opens in 200ms and sometimes in 2 seconds because a backend call is slow under load. Fixing the wait hides the bug. Logging the timing, filing a ticket, and partnering with the dev team surfaces it.&lt;/p&gt;

&lt;p&gt;In one interview last month, a candidate told me she found a flaky login test that was actually catching a rare session race condition. She escalated it, the team patched it, and the flake disappeared. That story took her 90 seconds to tell, and it got her the recommendation.&lt;/p&gt;

&lt;p&gt;When you interview, bring a story like that. One is enough.&lt;/p&gt;

&lt;h3&gt;Bucket 3: The environment and infrastructure&lt;/h3&gt;

&lt;p&gt;Flaky tests in CI that pass locally are almost always environmental. Things I check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is the CI runner sharing resources? A test that passes on a fresh machine but flakes on a loaded runner is a concurrency problem, not a test problem.&lt;/li&gt;
&lt;li&gt;Are test artifacts being cleaned between runs?&lt;/li&gt;
&lt;li&gt;Is the database seeded deterministically or are we pulling from a shared staging DB?&lt;/li&gt;
&lt;li&gt;Are we running with too much parallelism for the app to handle?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I once chased a flake for three days before realizing two parallel test workers were hitting the same user account. The fix was not in the test. It was in the test data strategy. We moved to per-worker test users, and the flake rate on that suite dropped from around 8 percent to under 1 percent.&lt;/p&gt;
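
&lt;p&gt;The per-worker approach is simple to sketch in plain JavaScript. A minimal example, assuming Playwright's &lt;code&gt;TEST_WORKER_INDEX&lt;/code&gt; environment variable as the source of the index; the "qa-worker" naming scheme is just an illustrative convention, not a real API:&lt;/p&gt;

```javascript
// Derive one deterministic, isolated account per parallel worker so
// two workers can never collide on the same user. The "qa-worker"
// naming convention here is illustrative, not part of any framework.
function testUserForWorker(workerIndex) {
  return {
    email: `qa-worker-${workerIndex}@example.com`,
    username: `qa_worker_${workerIndex}`,
  };
}

// In a Playwright project, the worker index is exposed as an env var:
// const user = testUserForWorker(Number(process.env.TEST_WORKER_INDEX ?? 0));
```

&lt;p&gt;Because the mapping is deterministic, a failure log that mentions &lt;code&gt;qa-worker-3&lt;/code&gt; immediately tells you which worker owned the data.&lt;/p&gt;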

&lt;p&gt;Test data is where a lot of senior engineers separate themselves. If you have a story about building a fixture factory, a per-test tenant, or a cleanup hook that actually works, tell it.&lt;/p&gt;

&lt;h2&gt;What I want to hear in the first 60 seconds&lt;/h2&gt;

&lt;p&gt;When I ask the flaky test question, the best answers share three traits.&lt;/p&gt;

&lt;p&gt;First, they start with a real example. Not "I would do X." Instead: "Last month I had a test that failed once every 20 runs." Real numbers beat hypotheticals every time.&lt;/p&gt;

&lt;p&gt;Second, they name the category before the fix. Saying "this was a test isolation problem, not a timing problem" tells me the candidate has a mental taxonomy.&lt;/p&gt;

&lt;p&gt;Third, they end with a system change, not just a code change. "We added a pre-run health check for the staging environment" is a better answer than "I added a retry." Retries are fine. Systemic improvements are better.&lt;/p&gt;

&lt;h2&gt;Common interview traps I see&lt;/h2&gt;

&lt;p&gt;A few anti-patterns that tank otherwise strong candidates.&lt;/p&gt;

&lt;p&gt;Talking about test retries as the primary solution. Retries are a pressure valve. They are not a strategy. If your answer starts and ends with "we retry three times," I will assume you have not actually fixed a flake.&lt;/p&gt;

&lt;p&gt;Name-dropping tools without context. Saying "we use Playwright with the trace viewer" is fine. Saying "we use Playwright with the trace viewer because the trace viewer showed us the button re-rendered during our click" is much better. The second version proves you actually use the tool.&lt;/p&gt;

&lt;p&gt;Blaming the devs. Sometimes the app is the problem. Saying so is fine. Saying it with contempt is not. Strong engineers talk about partnering with the dev team, filing tickets, and sharing reproductions. That cultural signal matters more than people think.&lt;/p&gt;

&lt;h2&gt;Preparing for this question&lt;/h2&gt;

&lt;p&gt;If you have an interview coming up, spend 20 minutes doing this exercise.&lt;/p&gt;

&lt;p&gt;Pick the last three flaky tests you worked on. For each one, write down: the category (test, app, or environment), the symptom, the root cause, the fix, and the systemic improvement you made after.&lt;/p&gt;

&lt;p&gt;If you cannot fill in all five fields for at least one test, you need more reps before your interview. That's not a judgment. It's just a gap you can close in a week of focused work.&lt;/p&gt;

&lt;p&gt;If you can fill in all five for multiple tests, you are further along than most candidates I see. Practice telling one story in under 90 seconds, and you are going to stand out.&lt;/p&gt;

&lt;h2&gt;The bigger point&lt;/h2&gt;

&lt;p&gt;Flaky tests are the most honest question I can ask in an interview. They reveal whether a candidate has actually shipped automation, whether they treat testing as a craft or a checkbox, and whether they think in systems or in snippets.&lt;/p&gt;

&lt;p&gt;If you can answer this question well, you can probably answer most of my other questions too. And if you cannot answer it yet, the path is straightforward: ship more automation, fix more flakes, keep a journal of what you learn. There is no substitute for reps.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Aston Cook is a Senior QA Automation Engineer and founder of AssertHired, an AI-powered mock interview platform for QA professionals. He has conducted 50+ automation engineer interviews and writes about QA career development. Find him on LinkedIn (16K+ followers) or at asserthired.com.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>qa</category>
      <category>testing</category>
      <category>playwright</category>
      <category>career</category>
    </item>
    <item>
      <title>What 50+ QA Automation Interviews Taught Me About Flaky Tests</title>
      <dc:creator>Aston Cook</dc:creator>
      <pubDate>Thu, 09 Apr 2026 16:30:20 +0000</pubDate>
      <link>https://dev.to/aston_cook_b6e2cd3f3c477b/what-50-qa-automation-interviews-taught-me-about-flaky-tests-4776</link>
      <guid>https://dev.to/aston_cook_b6e2cd3f3c477b/what-50-qa-automation-interviews-taught-me-about-flaky-tests-4776</guid>
      <description>&lt;p&gt;Last month I asked a senior candidate the same question I ask everyone. "Tell me about a flaky test you fixed and how you fixed it."&lt;/p&gt;

&lt;p&gt;She paused and said, "I just added a retry."&lt;/p&gt;

&lt;p&gt;The interview was essentially over after that. Not because retries are wrong. Because she had no curiosity about why the test was flaky in the first place.&lt;/p&gt;

&lt;p&gt;I have conducted 50+ automation engineer interviews over the past two years, and flaky tests are the single topic where I can predict pass or fail within 90 seconds. Here is what separates the candidates who get offers from the ones who do not.&lt;/p&gt;

&lt;h2&gt;Great candidates reject the "flaky equals unlucky" mental model&lt;/h2&gt;

&lt;p&gt;Most candidates treat flakiness like weather. Sometimes it rains. Sometimes the test passes. Move on.&lt;/p&gt;

&lt;p&gt;The best candidates I interview treat every flaky test like a bug with a specific root cause. They talk about races, missing waits, shared state across tests, timezone bugs, animation frames, stale locators, hydration timing, and network jitter.&lt;/p&gt;

&lt;p&gt;When I hear any of those words in an answer, I already know the interview is going well.&lt;/p&gt;

&lt;p&gt;I see the same pattern in mock interviews on AssertHired. The candidates who score highest on the debugging round always start by asking, "What do we know about the failure? Is it consistent across CI runs? Is it only on the first run or the second?"&lt;/p&gt;

&lt;p&gt;They are not guessing. They are narrowing.&lt;/p&gt;

&lt;h2&gt;"I just added a retry" is the red flag&lt;/h2&gt;

&lt;p&gt;I am not anti-retry. Playwright's built-in retries are useful for legitimate infrastructure blips. But retries are not a fix. They are a stall.&lt;/p&gt;

&lt;p&gt;The candidates who impress me always frame retries the same way. They say something like, "I added retries as a temporary gate so the suite stopped blocking deploys, then I spent the afternoon figuring out the real cause."&lt;/p&gt;

&lt;p&gt;That sentence alone pushes a candidate to the next round for me.&lt;/p&gt;

&lt;p&gt;Here is a concrete example from my actual work at Resilience. We had a dashboard test that failed roughly 1 in 40 runs. It was a card count assertion. The "safe" answer would be add a wait or a retry. What I actually found after digging through three CI runs and a trace viewer session: the dashboard rendered 12 cards, then React re-rendered after a subscription update two frames later, and my locator was pointing at the first render. Switching the locator to a data-testid attached to the final rendered state eliminated 100% of the flakes.&lt;/p&gt;

&lt;p&gt;Not 90%. 100%. That is what real root cause work feels like.&lt;/p&gt;

&lt;h2&gt;The three questions I ask on every flaky test&lt;/h2&gt;

&lt;p&gt;These are the three questions I have watched work on every flaky test I have personally chased down, and the three questions I listen for in interview answers.&lt;/p&gt;

&lt;p&gt;First, can I reproduce it deterministically? If the answer is no, I do not try to fix it yet. I slow it down, add logging, run it 50 times in a loop locally, or drop the worker count to one. Reproduction is the whole game. Teams that skip this step will patch the same test three times in six months.&lt;/p&gt;

&lt;p&gt;Second, is the test wrong or is the product wrong? This question separates senior candidates from mid-level candidates. Sometimes the flake is a real product bug. A race condition in production code shows up as a flake in the test. If your instinct is always "the test is wrong," you will miss real bugs forever.&lt;/p&gt;

&lt;p&gt;Third, what is the smallest change that kills the flake? The best fix is almost never the biggest fix. I have seen candidates rewrite entire page objects to solve a problem that needed one locator change. Depth means knowing exactly where to cut.&lt;/p&gt;

&lt;h2&gt;Tools I actually use, not the ones I list on my resume&lt;/h2&gt;

&lt;p&gt;Interview answer red flag: the candidate rattles off eight tools without explaining when to use any of them.&lt;/p&gt;

&lt;p&gt;Here is my actual workflow when I hit a flake at Resilience.&lt;/p&gt;

&lt;p&gt;Playwright trace viewer first, every single time. It is free, it is built in, and it shows me exactly what the browser saw at every step. I save at least four hours a week by starting there instead of guessing.&lt;/p&gt;

&lt;p&gt;Second, I run the failing test in headed mode with &lt;code&gt;--repeat-each=10&lt;/code&gt;. If it passes 10 times locally but fails on CI, I know the issue is environmental and I stop looking in my code.&lt;/p&gt;

&lt;p&gt;Third, I check the CI log for the test that ran just before the one that failed. Shared state between tests is the most common flake cause I see in real codebases, and it is almost never the test that gets reported as failing. People lose full days chasing the wrong test because nobody taught them to look one row up in the log.&lt;/p&gt;

&lt;p&gt;That is it. Three moves. I have solved something like 80% of the flakes I have touched with just those three.&lt;/p&gt;
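
&lt;p&gt;The third move, looking one row up, can even be scripted against an exported CI run log. A sketch in plain JavaScript, assuming a simple per-test record of the form { name, status } in execution order; the record shape and helper name are illustrative:&lt;/p&gt;

```javascript
// "Look one row up": for each failed test, report the test that ran
// immediately before it. Shared-state culprits usually show up in
// that column, not in the test that gets reported as failing.
// The { name, status } record shape is an assumption for illustration.
function suspectsBefore(results) {
  const suspects = [];
  results.forEach((result, i) => {
    if (result.status === 'failed') {
      if (i > 0) {
        suspects.push({ failed: result.name, ranJustBefore: results[i - 1].name });
      }
    }
  });
  return suspects;
}
```

&lt;p&gt;Run that over a week of CI logs and the same "ran just before" name tends to show up again and again. That name is where the investigation starts.&lt;/p&gt;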

&lt;h2&gt;What changed in 2026, and what did not&lt;/h2&gt;

&lt;p&gt;A lot of the noise this year is about Playwright MCP, self-healing locators, and AI agents that debug tests for you. Some of it is real. Some of it is marketing.&lt;/p&gt;

&lt;p&gt;Here is my honest take after trying most of them. AI tools are fantastic at explaining traces, generating candidate locators, and suggesting where to look. They are not yet good at deciding whether the test is wrong or the product is wrong. That judgment call is still a human skill, and it is still the skill I pay the most attention to in interviews.&lt;/p&gt;

&lt;p&gt;Candidates who lean on AI without understanding the fundamentals get exposed the moment I ask a follow-up question. Candidates who use AI as an accelerator on top of solid debugging instincts look like the future of the role.&lt;/p&gt;

&lt;p&gt;This is the reason I keep pushing fundamentals over certs. A cert expires the month it comes out. Knowing how a browser actually renders a page, how your framework dispatches events, and how your CI environment differs from your laptop will serve you for the next 10 years.&lt;/p&gt;

&lt;h2&gt;What I tell candidates preparing for interviews&lt;/h2&gt;

&lt;p&gt;When candidates ask me how to prep for QA automation interviews, I give the same advice every time.&lt;/p&gt;

&lt;p&gt;Do not memorize the Playwright API. Do not grind LeetCode. Pick one flaky test from a real codebase, even a personal project, and fix it from scratch. Write down what you tried, what worked, what did not. Do that three times and you will outperform most candidates with five years of experience.&lt;/p&gt;

&lt;p&gt;This is the exact exercise I built AssertHired around. The debugging scenarios pull from real flaky test patterns, and the AI interviewer pushes back the way I do in live interviews. Candidates who run three to five reps before a real onsite usually walk in calmer, because they have already felt the heat once.&lt;/p&gt;

&lt;p&gt;No tool replaces the reps. But the reps have to be on the right problems.&lt;/p&gt;

&lt;h2&gt;One thing to take away&lt;/h2&gt;

&lt;p&gt;The best QA automation engineers I have worked with and interviewed all share one habit. They get curious when something behaves weirdly, even once. They do not paper over the weirdness with a retry and move on.&lt;/p&gt;

&lt;p&gt;Next time a test fails intermittently, do not fix it yet. Reproduce it first. The fix almost writes itself once you know what you are actually fixing.&lt;/p&gt;

&lt;p&gt;That is the habit I hire for. That is the habit that promotes people. And that is the habit that makes a 1 in 40 flake go to zero instead of 1 in 200.&lt;/p&gt;





</description>
      <category>qa</category>
      <category>testing</category>
      <category>career</category>
      <category>playwright</category>
    </item>
    <item>
      <title>The Flaky Test Epidemic: A Practical Guide to Tests You Can Actually Trust</title>
      <dc:creator>Aston Cook</dc:creator>
      <pubDate>Fri, 03 Apr 2026 12:12:22 +0000</pubDate>
      <link>https://dev.to/aston_cook_b6e2cd3f3c477b/the-flaky-test-epidemic-a-practical-guide-to-tests-you-can-actually-trust-od0</link>
      <guid>https://dev.to/aston_cook_b6e2cd3f3c477b/the-flaky-test-epidemic-a-practical-guide-to-tests-you-can-actually-trust-od0</guid>
      <description>&lt;p&gt;Last month, I watched a senior engineer on my team disable a test that had been failing intermittently for three weeks. His exact words: "I don't have time to babysit this thing." That test was covering a critical auth flow. Two weeks later, a bug shipped to production in that exact flow.&lt;/p&gt;

&lt;p&gt;Flaky tests are not just annoying. They are actively dangerous. And based on conversations I have in interviews and across QA communities, this problem is getting worse, not better.&lt;/p&gt;

&lt;h2&gt;The Real Cost of Flaky Tests&lt;/h2&gt;

&lt;p&gt;Here is a number that should scare you: teams with high flaky test rates spend up to 30% of their engineering time investigating false failures. I have seen this firsthand across multiple organizations.&lt;/p&gt;

&lt;p&gt;But the bigger cost is not time. It is trust.&lt;/p&gt;

&lt;p&gt;When your CI pipeline cries wolf enough times, people stop listening. They start clicking "re-run" without reading the failure. They start merging with failing tests. They start skipping the pipeline entirely on "small changes." And suddenly your test suite is decoration, not protection.&lt;/p&gt;

&lt;p&gt;I see this pattern constantly when conducting mock interviews on AssertHired. I will ask a candidate how they handle flaky tests, and the most common answer is "we just retry them." That is not a strategy. That is a coping mechanism.&lt;/p&gt;

&lt;h2&gt;The Five Usual Suspects&lt;/h2&gt;

&lt;p&gt;After debugging hundreds of flaky tests across different codebases, I have found that nearly all of them fall into five categories. Knowing which category you are dealing with cuts your debugging time in half.&lt;/p&gt;

&lt;h3&gt;1. Timing and Race Conditions&lt;/h3&gt;

&lt;p&gt;This is the biggest one, accounting for roughly 40% of flaky tests I have encountered. Your test assumes something will happen in a specific order, but the application does not guarantee that order.&lt;/p&gt;

&lt;p&gt;The classic example: clicking a button and immediately asserting that a modal appeared. Sometimes the modal takes 50ms. Sometimes it takes 500ms. Your test passes locally but fails in CI where resources are more constrained.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Stop using arbitrary waits. In Playwright, use &lt;code&gt;await expect(locator).toBeVisible()&lt;/code&gt; instead of &lt;code&gt;await page.waitForTimeout(2000)&lt;/code&gt;. The difference is that the first approach polls intelligently until the condition is true (or times out), while the second just hopes two seconds is enough.&lt;/p&gt;

&lt;p&gt;A more subtle version of this problem: your test creates data via an API call and immediately navigates to a page expecting that data to be rendered. If there is any async processing, caching, or eventual consistency involved, you have a race condition.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Wait for the actual signal, not an arbitrary delay. Poll the API until the data is confirmed, or wait for a specific DOM element that only appears once the data has loaded.&lt;/p&gt;
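
&lt;p&gt;The "wait for the signal" idea is a small polling loop. Here is a minimal plain-JavaScript sketch; Playwright's web-first assertions already do this for in-browser conditions, so a helper like this is only needed for out-of-band waits such as API seeding. The helper name, defaults, and the &lt;code&gt;fetchOrder&lt;/code&gt; call in the usage comment are illustrative:&lt;/p&gt;

```javascript
// Poll a predicate until it returns true or a deadline passes.
// This replaces "sleep and hope" with "wait for the actual signal".
async function waitFor(predicate, { timeout = 5000, interval = 100 } = {}) {
  const deadline = Date.now() + timeout;
  while (deadline - Date.now() > 0) {
    if (await predicate()) return true; // the real signal arrived
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
  throw new Error(`condition not met within ${timeout}ms`);
}

// Usage (illustrative): await waitFor(async () => (await fetchOrder(id)).status === 'ready');
```

&lt;p&gt;The key design choice is that the loop exits as soon as the condition holds, so the happy path costs milliseconds while the timeout only bounds the worst case.&lt;/p&gt;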

&lt;h3&gt;2. Shared State Between Tests&lt;/h3&gt;

&lt;p&gt;This one is sneaky. Test A creates a user named "testuser@example.com." Test B also creates a user with that same email. When they run in sequence, everything is fine. When they run in parallel or in a different order, one of them explodes with a unique constraint violation.&lt;/p&gt;

&lt;p&gt;I once spent two full days debugging a flaky test that only failed on Tuesdays. Turns out, it shared a database record with another test that only ran as part of the Tuesday scheduled suite. Two days of my life I will never get back.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Every test should create its own isolated data with unique identifiers. I like using a pattern like &lt;code&gt;test-${Date.now()}-${randomSuffix}&lt;/code&gt; for any test data. Yes, it means more data cleanup, but it means zero cross-test contamination.&lt;/p&gt;
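
&lt;p&gt;As a tiny helper, that pattern looks like this (a sketch; the helper name is illustrative):&lt;/p&gt;

```javascript
// Unique, greppable identifiers for per-test data. The timestamp
// orders records by creation time, and the random suffix prevents
// collisions between parallel workers in the same millisecond.
function uniqueTestId(prefix = 'test') {
  const randomSuffix = Math.random().toString(36).slice(2, 8);
  return `${prefix}-${Date.now()}-${randomSuffix}`;
}

// e.g. uniqueTestId('user') might give something like "user-1770000000000-k3f9qz"
```

&lt;p&gt;The prefix doubles as a cleanup handle: a nightly job can safely delete anything matching the prefix without touching real data.&lt;/p&gt;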

&lt;h3&gt;3. Environment Dependencies&lt;/h3&gt;

&lt;p&gt;Your test works perfectly on your MacBook. It fails in CI. Why? Because your CI runner has 2 CPU cores and 4GB of RAM instead of your 16-core M3 with 32GB.&lt;/p&gt;

&lt;p&gt;Other environment culprits: different timezone settings, different locale settings, different screen resolutions for visual tests, network latency to external services, and DNS resolution timing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Make your test environment as deterministic as possible. Pin your timezone in CI. Use fixed viewports for visual tests. And for the love of all things good, do not let your tests hit real external services. Mock them.&lt;/p&gt;

&lt;h3&gt;4. Order-Dependent Assertions&lt;/h3&gt;

&lt;p&gt;Your test asserts that a list contains items in a specific order, but the API does not guarantee ordering. Or your test checks &lt;code&gt;element.textContent === "3 items"&lt;/code&gt; but the count depends on data that other tests may have created.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; If ordering does not matter for the feature, do not assert on ordering. Use &lt;code&gt;expect(items).toContain(expected)&lt;/code&gt; instead of &lt;code&gt;expect(items[0]).toBe(expected)&lt;/code&gt;. If you need to verify a count, make sure you are counting within an isolated scope.&lt;/p&gt;

&lt;h3&gt;5. Resource Cleanup Failures&lt;/h3&gt;

&lt;p&gt;A test opens a database connection, creates a WebSocket, or spawns a subprocess. When the test passes, cleanup runs. When it fails, cleanup gets skipped. Now the next test starts with leaked resources and behaves unpredictably.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Use &lt;code&gt;beforeEach&lt;/code&gt;/&lt;code&gt;afterEach&lt;/code&gt; hooks for setup and teardown, not inline code. In Playwright, leverage the built-in fixtures system. The framework guarantees teardown runs regardless of test outcome. This is exactly the kind of thing that separates solid automation from fragile scripts.&lt;/p&gt;
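
&lt;p&gt;Under the hood, that guarantee is a &lt;code&gt;try/finally&lt;/code&gt;. A plain-JavaScript sketch of what fixture-style teardown gives you; the helper name is illustrative, and in Playwright itself you would use a fixture rather than hand-rolling this:&lt;/p&gt;

```javascript
// Acquire a resource, run the test body, and release the resource
// whether the body succeeded or threw. This is the guarantee that
// fixture systems provide automatically.
async function withResource(acquire, release, body) {
  const resource = await acquire();
  try {
    return await body(resource);
  } finally {
    await release(resource); // runs on success AND on failure
  }
}
```

&lt;p&gt;Inline cleanup at the bottom of a test is exactly the code that gets skipped when an assertion throws; wrapping the body this way makes skipping it impossible.&lt;/p&gt;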

&lt;h2&gt;My Debugging Workflow&lt;/h2&gt;

&lt;p&gt;When I encounter a flaky test, I follow a specific sequence before I touch any code:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Reproduce it.&lt;/strong&gt; Run the test 50 times in a loop. In Playwright: &lt;code&gt;npx playwright test my-test.spec.ts --repeat-each=50&lt;/code&gt;. If it does not fail at least once in 50 runs, run it in CI instead, since the environment difference might be the trigger.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Check the category.&lt;/strong&gt; Look at the failure message and stack trace. Is it a timeout? Probably timing. Is it a data conflict? Probably shared state. Is it consistent in one environment but not another? Probably environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Isolate it.&lt;/strong&gt; Run the flaky test by itself. If it passes consistently in isolation but fails when run with others, you have a shared state or resource cleanup problem. If it fails even in isolation, you have a timing or environment issue.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Add logging, not retries.&lt;/strong&gt; Before you add a retry, add a console.log at every async boundary in the test. Capture timestamps. You want to see exactly where the timing gap is happening. Retries hide bugs. Logging reveals them.&lt;/p&gt;

&lt;h2&gt;Building a Flake-Resistant Culture&lt;/h2&gt;

&lt;p&gt;Fixing individual flaky tests is important, but the real win comes from building habits that prevent them in the first place.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quarantine immediately.&lt;/strong&gt; When a test starts flaking, move it to a separate "quarantine" test suite that runs but does not block the pipeline. This keeps your main pipeline trustworthy while giving you time to fix the flake properly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Track your flake rate.&lt;/strong&gt; Measure the percentage of CI runs that fail due to flaky tests (not real bugs). If that number is above 5%, you have a problem that deserves dedicated sprint time. Most mature teams I have seen target under 2%.&lt;/p&gt;
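
&lt;p&gt;The metric itself is one function over your CI history. A sketch, assuming each run is recorded as a { failed, flaky } pair where "flaky" means the failure was not a real bug, for example because the same commit passed on an immediate re-run; the record shape is an assumption for illustration:&lt;/p&gt;

```javascript
// Fraction of CI runs whose failure was attributed to flakiness
// rather than a real bug. Track it per suite, per week.
function flakeRate(runs) {
  if (runs.length === 0) return 0;
  const flakyFailures = runs.filter((r) => r.failed).filter((r) => r.flaky).length;
  return flakyFailures / runs.length;
}

// Alert when the number creeps past your budget, e.g. flakeRate(runs) above 0.05.
```

&lt;p&gt;The hard part is not the arithmetic, it is the labeling: someone has to decide, for each red run, whether it was a real bug or a flake. Teams that do that labeling honestly are the ones whose 5% targets mean anything.&lt;/p&gt;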

&lt;p&gt;&lt;strong&gt;Make flake fixes count as real work.&lt;/strong&gt; This is a culture problem as much as a technical one. If fixing flaky tests is seen as janitorial work that does not "count," nobody will do it. Flake fixes are quality engineering. Treat them that way.&lt;/p&gt;

&lt;h2&gt;Stop Retrying, Start Fixing&lt;/h2&gt;

&lt;p&gt;The QA community's biggest struggle right now is not a lack of tools or frameworks. It is trust erosion. Every flaky test that gets retried instead of fixed is a small withdrawal from your team's confidence in the test suite.&lt;/p&gt;

&lt;p&gt;The next time a test flakes on you, resist the urge to click "re-run." Open the failure. Categorize it. Fix it. Your future self (and your team) will thank you.&lt;/p&gt;





</description>
      <category>testing</category>
      <category>automation</category>
      <category>qa</category>
      <category>playwright</category>
    </item>
    <item>
      <title>The QA Skills Gap Nobody Talks About: Why Knowing Playwright Isn't Enough</title>
      <dc:creator>Aston Cook</dc:creator>
      <pubDate>Wed, 25 Mar 2026 02:34:38 +0000</pubDate>
      <link>https://dev.to/aston_cook_b6e2cd3f3c477b/the-qa-skills-gap-nobody-talks-about-why-knowing-playwright-isnt-enough-1ani</link>
      <guid>https://dev.to/aston_cook_b6e2cd3f3c477b/the-qa-skills-gap-nobody-talks-about-why-knowing-playwright-isnt-enough-1ani</guid>
      <description>&lt;p&gt;&lt;em&gt;By Aston Cook&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;I see it every week on LinkedIn. "Just learned Playwright in 30 days! Ready for automation roles!" And I genuinely root for these people. Learning a new tool is a real accomplishment. But I also know that most of them are about to hit a wall.&lt;/p&gt;

&lt;p&gt;Because knowing Playwright (or Cypress, or Selenium, or whatever the hot framework is this month) is table stakes. It gets your resume past the keyword filter. It does not get you the job. And it definitely does not make you effective once you are in the seat.&lt;/p&gt;

&lt;p&gt;The real skills gap in QA is not about tools. It is about everything that surrounds the tools.&lt;/p&gt;

&lt;h2&gt;The tool trap&lt;/h2&gt;

&lt;p&gt;Here is what happens. Someone decides they want to break into automation. They Google "best automation tool 2026" and find a dozen articles saying Playwright. They take a Udemy course. They follow along with the instructor, build a practice project against a demo site, and put "Playwright" on their resume.&lt;/p&gt;

&lt;p&gt;Then they walk into an interview and I ask them how they would design a test strategy for a microservices application with 12 APIs and a React frontend. And they stare at me.&lt;/p&gt;

&lt;p&gt;The tool did not prepare them for that question because that question is not about the tool. It is about understanding software systems, knowing what to test and why, and being able to communicate a testing strategy to engineers and product managers who think differently than you do.&lt;/p&gt;

&lt;h2&gt;What actually separates good QA engineers&lt;/h2&gt;

&lt;p&gt;I have worked across frontend, backend, and DevOps before landing in QA automation full-time. That winding path taught me something: the best QA engineers are not the ones with the most tools on their resume. They are the ones who understand systems.&lt;/p&gt;

&lt;p&gt;Let me break down what I mean.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They understand how software actually works.&lt;/strong&gt; Not at a PhD level. But they know what happens when you click a button. The HTTP request, the server-side processing, the database query, the response, the DOM update. When a test fails, they can narrow down where in that chain the problem lives. If you cannot follow a request from the browser to the database and back, you are going to struggle to write meaningful automation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They know what to test.&lt;/strong&gt; This sounds obvious but it is shockingly rare. I have reviewed test suites with 400 tests where half of them tested the same happy path in slightly different ways and zero of them covered the error states that users actually hit in production. Knowing what to test requires understanding risk, user behavior, and business context. No tool teaches you that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They communicate clearly.&lt;/strong&gt; QA engineers sit between developers, product managers, and sometimes customers. You need to explain a bug to a developer with enough technical detail that they can reproduce it. You need to explain test coverage to a PM in terms they care about. You need to write test plans that other QA engineers can follow six months from now. Writing a Playwright script and writing a clear bug report are two entirely different skills, and the second one matters more than most people think.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They think in systems, not scripts.&lt;/strong&gt; A script tests one thing. A strategy tests a system. Good QA engineers think about how components interact, where integration points can break, what happens when third-party services go down, and how data flows through the application. They are not just running tests. They are modeling risk.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fundamentals that never go out of style
&lt;/h2&gt;

&lt;p&gt;Tools change. Selenium dominated for a decade and now many teams have moved to Playwright or Cypress. In another five years it will probably be something else. But certain skills transfer across every tool and every era of testing.&lt;/p&gt;

&lt;p&gt;HTTP and networking basics. If you do not understand status codes, headers, request/response cycles, and how cookies and sessions work, you are going to write brittle API tests and have no idea why.&lt;/p&gt;
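&lt;p&gt;A minimal sketch of the status-code knowledge that paragraph is talking about. Knowing the classes by heart tells you whether a failed API test is probably your bug (4xx), the application's bug (5xx), or not a bug at all (a 3xx redirect your client should have followed). The &lt;code&gt;classify_status&lt;/code&gt; helper is illustrative, not from any library.&lt;/p&gt;

```python
# Classify an HTTP status code into the class that determines how a
# test should react to it.

def classify_status(code: int) -> str:
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirect"        # follow the Location header, don't fail
    if 400 <= code < 500:
        return "client error"    # bad request, auth, or stale test data
    if 500 <= code < 600:
        return "server error"    # the application broke; file a bug
    return "unexpected"

print(classify_status(201))  # -> success
print(classify_status(302))  # -> redirect
print(classify_status(503))  # -> server error
```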

&lt;p&gt;Programming fundamentals. Not just "enough to write a test." Actual fundamentals. Data structures, control flow, error handling, debugging, reading stack traces. When your test fails with a cryptic error at 2am in CI, these are the skills that help you fix it.&lt;/p&gt;
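&lt;p&gt;Here is what those fundamentals look like in about twenty lines: control flow, error handling, and turning a cryptic failure into a readable one. The &lt;code&gt;retry&lt;/code&gt; helper and the fake &lt;code&gt;flaky_step&lt;/code&gt; are invented for the example; they are a sketch of the pattern, not a production utility.&lt;/p&gt;

```python
import time

def retry(fn, attempts=3, delay=0.01):
    """Run fn, retrying on exception with a linear backoff.
    If every attempt fails, raise a readable error that keeps the
    original exception attached as the cause."""
    last = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as err:
            last = err
            time.sleep(delay * attempt)
    raise RuntimeError(f"failed after {attempts} attempts: {last}") from last

calls = {"n": 0}

def flaky_step():
    # Fails twice, then succeeds -- a stand-in for a timing-dependent step.
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("element not visible")
    return "clicked"

print(retry(flaky_step))  # succeeds on the third attempt -> clicked
```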

&lt;p&gt;Version control. I still see QA engineers who are uncomfortable with git beyond basic commit and push. Branching strategies, merge conflicts, rebasing, reading diffs. You work in a codebase. Act like it.&lt;/p&gt;

&lt;p&gt;CI/CD understanding. Your tests do not exist in isolation. They run in a pipeline. Knowing how that pipeline works, how to configure it, how to debug failures that only happen in CI, and how to optimize test parallelization will set you apart from 90% of automation engineers.&lt;/p&gt;
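&lt;p&gt;One concrete parallelization idea, sketched: splitting a suite across N CI workers so each worker runs a stable, non-overlapping slice. Real runners do this for you (Playwright has a built-in shard option, for example); the &lt;code&gt;shard&lt;/code&gt; function below is an illustrative round-robin version so you can see the mechanism.&lt;/p&gt;

```python
def shard(tests, total_shards, shard_index):
    """Return the tests assigned to shard_index (0-based) out of
    total_shards. Sorting first means every worker computes the same
    deterministic split without coordinating."""
    ordered = sorted(tests)
    return [t for i, t in enumerate(ordered) if i % total_shards == shard_index]

suite = ["checkout.spec", "login.spec", "search.spec", "signup.spec", "cart.spec"]
for worker in range(2):
    print(worker, shard(suite, 2, worker))
```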

&lt;p&gt;Database basics. You need to set up test data. You need to verify data state after tests run. You need to clean up after yourself. Basic SQL and an understanding of how your application stores data are not optional.&lt;/p&gt;
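&lt;p&gt;That test-data lifecycle, end to end, using an in-memory SQLite database as a stand-in for your application's real store. The table and the simulated action are invented for the example; the seed, act, verify, clean-up shape is the point.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, active INTEGER)")

# 1. Set up test data
conn.execute("INSERT INTO users (email, active) VALUES ('qa@example.com', 1)")

# 2. The action under test (here, simulated directly in SQL)
conn.execute("UPDATE users SET active = 0 WHERE email = 'qa@example.com'")

# 3. Verify data state after the test runs
row = conn.execute(
    "SELECT active FROM users WHERE email = 'qa@example.com'"
).fetchone()
print(row[0])  # -> 0, the user was deactivated

# 4. Clean up after yourself
conn.execute("DELETE FROM users WHERE email = 'qa@example.com'")
count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # -> 0, no leftover rows for the next test
```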

&lt;h2&gt;
  
  
  The mindset problem
&lt;/h2&gt;

&lt;p&gt;There is also something less tangible going on. A lot of engineers treat QA automation as "writing scripts that click buttons." That mindset limits everything they do.&lt;/p&gt;

&lt;p&gt;The better framing: you are an engineer who specializes in quality. Your job is to find problems before users do, to give the team confidence that the software works, and to make releases less scary. Automation is one of your tools. It is not your identity.&lt;/p&gt;

&lt;p&gt;When you adopt this framing, your behavior changes. You start attending design reviews because catching a bad API contract early saves more time than any test suite. You start thinking about observability and monitoring as extensions of testing. You start asking product managers "what would make this release risky?" instead of waiting to be handed requirements.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I would do if I were starting over
&lt;/h2&gt;

&lt;p&gt;If I were building my QA automation skills from scratch in 2026, here is how I would split my time:&lt;/p&gt;

&lt;p&gt;40% on programming and CS fundamentals. Get comfortable in one language. Write code outside of test files. Build a small API. Understand object-oriented design well enough to structure a test framework that does not collapse under its own weight.&lt;/p&gt;

&lt;p&gt;25% on a single automation framework. Go deep, not wide. Understand the architecture, not just the syntax. Read the source code when something behaves unexpectedly. Learn the config options that nobody talks about in tutorials.&lt;/p&gt;

&lt;p&gt;20% on system knowledge. How browsers work. How APIs work. How databases work. How CI/CD works. How containers work. You do not need to be an expert in any of these. But you need to be conversational.&lt;/p&gt;

&lt;p&gt;15% on communication and soft skills. Practice writing bug reports. Practice explaining technical concepts to non-technical people. Practice presenting test results. If you want a safe environment to practice the interview side of this, tools like AssertHired exist specifically for that.&lt;/p&gt;
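&lt;p&gt;To make the "structure a test framework that does not collapse under its own weight" point from the first bucket concrete, here is a tiny page-object sketch: selectors hidden behind intent-named methods, with the driver injected so the page class works against any backend. Every name here (&lt;code&gt;FakeDriver&lt;/code&gt;, &lt;code&gt;LoginPage&lt;/code&gt;, the selectors) is made up for the example.&lt;/p&gt;

```python
class FakeDriver:
    """Stands in for a real browser driver; just records actions."""
    def __init__(self):
        self.actions = []
    def fill(self, selector, value):
        self.actions.append(("fill", selector, value))
    def click(self, selector):
        self.actions.append(("click", selector))

class LoginPage:
    # Selectors live in one place; tests never touch them directly.
    EMAIL = "#email"
    PASSWORD = "#password"
    SUBMIT = "button[type=submit]"

    def __init__(self, driver):
        self.driver = driver

    def login(self, email, password):
        # One intent-level method instead of three raw interactions.
        self.driver.fill(self.EMAIL, email)
        self.driver.fill(self.PASSWORD, password)
        self.driver.click(self.SUBMIT)

driver = FakeDriver()
LoginPage(driver).login("qa@example.com", "hunter2")
print(len(driver.actions))  # -> 3
```

&lt;p&gt;When a selector changes, you edit one class instead of fifty tests. That is the difference between a framework and a pile of scripts.&lt;/p&gt;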

&lt;h2&gt;
  
  
  This is not a gatekeeping argument
&lt;/h2&gt;

&lt;p&gt;I want to be clear about something. I am not saying you need to know all of this before you apply for your first automation role. Nobody starts with the complete package. I certainly did not.&lt;/p&gt;

&lt;p&gt;What I am saying is that if your entire learning plan is "learn Playwright," you are setting yourself up for frustration. The engineers who grow fastest are the ones who treat the tool as one piece of a much bigger puzzle.&lt;/p&gt;

&lt;p&gt;Learn Playwright. Learn it well. But also learn why your tests matter, how your application works, and how to talk about both of those things to people who are not testers.&lt;/p&gt;

&lt;p&gt;That is the gap. And closing it is what turns a Playwright user into a QA engineer.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Aston Cook is a Senior QA Automation Engineer at Resilience (cybersecurity) and the creator of AssertHired, an AI-powered mock interview platform for QA engineers. He writes about QA careers, automation fundamentals, and the stuff nobody tells you before your first interview. Find him on LinkedIn where he shares QA content with 16K+ followers.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>qa</category>
      <category>testing</category>
      <category>automation</category>
      <category>career</category>
    </item>
  </channel>
</rss>
