After years of building frontend applications across e-health and e-learning products, I've sat in enough sprint reviews to notice a pattern: QA test cases are written the same way every time. Happy path first, a handful of negative cases if the deadline allows, edge cases if the tester has seen that bug before.
The process is repetitive, experience-dependent, and the first thing to get cut when a release is running late.
So I started experimenting — feeding acceptance criteria directly to an AI and asking for a complete test suite. Here's an honest account of what works, what doesn't, and what it actually changes about the process.
What the AI gets right immediately
The output quality on structured coverage is genuinely impressive. Given clear acceptance criteria, the AI will produce happy path cases, negative scenarios, boundary conditions, and precondition states faster than any manual process — and it won't skip the boring ones.
It also structures the output consistently: steps, expected results, preconditions. That consistency alone has value when you're maintaining a growing test library across releases.
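That consistent structure is easy to capture in code. A minimal sketch, assuming a hypothetical `TestCase` schema (the field names are illustrative, not any standard) with the three parts mentioned above:

```python
from dataclasses import dataclass

# Hypothetical schema for AI-generated test cases.
# Field names are illustrative, not a standard.
@dataclass
class TestCase:
    title: str
    preconditions: list[str]
    steps: list[str]
    expected_results: list[str]

# Example of the shape a generated case tends to take:
case = TestCase(
    title="Checkout rejects an empty cart",
    preconditions=["User is authenticated", "Cart is empty"],
    steps=["Navigate to checkout", "Click 'Place order'"],
    expected_results=["Order is not created", "Empty-cart message is shown"],
)
```

Keeping every generated case in one shape like this is what makes a growing test library reviewable release after release.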
Where it falls short
The AI has no knowledge of your system beyond what you give it. It doesn't know that your application handles an unauthenticated empty cart differently from an authenticated one, or that a particular field has a known edge case from three sprints ago.
More critically: vague acceptance criteria produce vague test cases. With a human tester, ambiguity triggers a question. With an AI, it triggers a confident but incorrect assumption. If your requirements only describe the happy path, the generated test suite will skew heavily toward the happy path.
What actually determines the output quality
After enough iterations, the pattern is consistent: the quality of the generated tests is almost entirely determined by the quality of the input. A few things that made a measurable difference:
Write constraints explicitly. "The form should validate correctly" is not a requirement. "The email field must reject inputs without an @ symbol and a valid domain" is.
Include failure conditions in your acceptance criteria. If you only document what should succeed, the AI will generate tests for success.
Specify the user role and context. "As an admin" and "as a guest" produce meaningfully different test suites for the same feature.
Add environment context. First-time user vs returning user, mobile vs desktop, authenticated vs unauthenticated — these details shape coverage significantly.
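The difference an explicit constraint makes is easy to demonstrate. A minimal sketch, assuming a hypothetical `is_valid_email` validator that implements the explicit criterion above (reject inputs without an `@` symbol and a valid domain); the cases mirror the negative coverage an AI can only produce once the constraint is written down:

```python
import re

# Hypothetical validator implementing the explicit criterion:
# reject inputs without an @ symbol and a valid domain.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[a-zA-Z]{2,}$")

def is_valid_email(value: str) -> bool:
    return bool(EMAIL_RE.match(value))

# Test cases that follow from the explicit constraint:
# (input, expected_result)
CASES = [
    ("user@example.com", True),    # happy path
    ("userexample.com", False),    # missing @ symbol
    ("user@example", False),       # missing domain TLD
    ("user@@example.com", False),  # malformed: double @
    ("", False),                   # empty input
]

for value, expected in CASES:
    assert is_valid_email(value) == expected, value
```

With only "the form should validate correctly" as input, none of the four negative cases is derivable; with the explicit criterion, all of them are.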
An honest assessment
AI doesn't replace a QA engineer. It replaces the first draft.
A good tester still needs to review the output, discard cases that don't apply to the actual system, and add scenarios based on knowledge no requirements document captures. That judgment isn't going away.
But the shift from writing to reviewing is more significant than it sounds. Starting with 80% of the test suite already structured means your QA effort goes toward the cases that actually require expertise — the ones that come from understanding the system, not from reading the spec.
That's a different kind of QA work. Arguably a more valuable one.
Has anyone else been experimenting with AI-generated test cases? Curious whether the input quality pattern holds across different approaches — and what you've found the AI consistently gets wrong.