After years of building frontend applications across e-health and e-learning products, I've sat in enough sprint reviews to notice a pattern: QA test cases are written the same way every time. Happy path first, a handful of negative cases if the deadline allows, edge cases if the tester has seen that bug before.
The process is repetitive, experience-dependent, and the first thing to get cut when a release is running late.
So I started experimenting — feeding acceptance criteria directly to an AI and asking for a complete test suite. Here's an honest account of what works, what doesn't, and what it actually changes about the process.
What the AI gets right immediately
The output quality on structured coverage is genuinely impressive. Given clear acceptance criteria, the AI will produce happy path cases, negative scenarios, boundary conditions, and precondition states faster than any manual process — and it won't skip the boring ones.
It also structures the output consistently: steps, expected results, preconditions. That consistency alone has value when you're maintaining a growing test library across releases.
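That consistent structure is easy to capture in code. A minimal sketch, assuming a hypothetical `TestCase` schema (the field names are illustrative, not any standard) with the three parts mentioned above:

```python
from dataclasses import dataclass

# Hypothetical schema for AI-generated test cases.
# Field names are illustrative, not a standard.
@dataclass
class TestCase:
    title: str
    preconditions: list[str]
    steps: list[str]
    expected_results: list[str]

# Example of the shape a generated case tends to take:
case = TestCase(
    title="Checkout rejects an empty cart",
    preconditions=["User is authenticated", "Cart is empty"],
    steps=["Navigate to checkout", "Click 'Place order'"],
    expected_results=["Order is not created", "Empty-cart message is shown"],
)
```

Keeping every generated case in one shape like this is what makes a growing test library reviewable release after release.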
Where it falls short
The AI has no knowledge of your system beyond what you give it. It doesn't know that your application handles an unauthenticated empty cart differently from an authenticated one, or that a particular field has a known edge case from three sprints ago.
More critically: vague acceptance criteria produce vague test cases. With a human tester, ambiguity triggers a question. With an AI, it triggers a confident but incorrect assumption. If your requirements only describe the happy path, the generated test suite will skew heavily toward the happy path.
What actually determines the output quality
After enough iterations, the pattern is consistent: the quality of the generated tests is almost entirely determined by the quality of the input. A few things that made a measurable difference:
Write constraints explicitly. "The form should validate correctly" is not a requirement. "The email field must reject inputs without an @ symbol and a valid domain" is.
Include failure conditions in your acceptance criteria. If you only document what should succeed, the AI will generate tests for success.
Specify the user role and context. "As an admin" and "as a guest" produce meaningfully different test suites for the same feature.
Add environment context. First-time user vs returning user, mobile vs desktop, authenticated vs unauthenticated — these details shape coverage significantly.
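The difference an explicit constraint makes is easy to demonstrate. A minimal sketch, assuming a hypothetical `is_valid_email` validator that implements the explicit criterion above (reject inputs without an `@` symbol and a valid domain); the cases mirror the negative coverage an AI can only produce once the constraint is written down:

```python
import re

# Hypothetical validator implementing the explicit criterion:
# reject inputs without an @ symbol and a valid domain.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[a-zA-Z]{2,}$")

def is_valid_email(value: str) -> bool:
    return bool(EMAIL_RE.match(value))

# Test cases that follow from the explicit constraint:
# (input, expected_result)
CASES = [
    ("user@example.com", True),    # happy path
    ("userexample.com", False),    # missing @ symbol
    ("user@example", False),       # missing domain TLD
    ("user@@example.com", False),  # malformed: double @
    ("", False),                   # empty input
]

for value, expected in CASES:
    assert is_valid_email(value) == expected, value
```

With only "the form should validate correctly" as input, none of the four negative cases is derivable; with the explicit criterion, all of them are.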
An honest assessment
AI doesn't replace a QA engineer. It replaces the first draft.
A good tester still needs to review the output, discard cases that don't apply to the actual system, and add scenarios based on knowledge no requirements document captures. That judgment isn't going away.
But the shift from writing to reviewing is more significant than it sounds. Starting with 80% of the test suite already structured means your QA effort goes toward the cases that actually require expertise — the ones that come from understanding the system, not from reading the spec.
That's a different kind of QA work. Arguably a more valuable one.
Has anyone else been experimenting with AI-generated test cases? Curious whether the input quality pattern holds across different approaches — and what you've found the AI consistently gets wrong.