Umair Iftikhar

Posted on Jun 25

What I've Learned Using AI for Testing in Industries Where Mistakes Actually Cost Something

#testing #qa #automation #ai

A while back I sat through a demo where someone generated an entire regression suite from a single prompt. It was genuinely impressive. The tests appeared in seconds, the locators looked clean, and the presenter barely touched the keyboard.

My first thought wasn't "this is going to replace me." It was "I would never let that run on a claims platform without a human reading every line first."

That gap, between what AI testing tools can do and what you can actually trust them to do in a regulated environment, is something I think about constantly. I've spent the last five years building automation frameworks for clients in fintech, insurance, and healthcare. In those worlds the conversation about AI sounds very different from the optimism I usually see online.

It isn't that AI doesn't belong in testing. It clearly does, and I use it most days. It's that the cost of being wrong changes everything about how you use it.

When a passing test is the dangerous outcome

Most AI testing tools are built, whether the makers say so or not, for environments where a test that wrongly passes is irritating rather than serious.

If a test wrongly passes on an ecommerce checkout, a bad order slips through. You refund the customer and move on. If a test wrongly passes on a claims workflow, or a dosage calculation in a health system, or a transaction check in a payments engine, you are in completely different territory. Audit exposure. Regulatory questions. Real harm to real people.

So nothing here is an argument against AI. It's an argument for understanding the room you're standing in before you decide how much to trust the machine.

The three things that quietly break

Three areas come up again and again.

The first is test data. In a normal product, fake data is trivial. Invent a name, invent an email, done. In healthcare that data has to respect HIPAA. In fintech you're handling account numbers and transaction histories. In insurance, policy data is both legally and personally sensitive. Tools that generate test data on demand will happily hand you something that looks perfectly realistic and quietly breaks your data handling rules the moment it lands in a test environment. I've seen suites seeded with data exactly like that. It looked fine. It would have failed an audit on sight. The tool wasn't being careless. It simply had no idea what world it was operating in.
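One way to make that concrete is to let seeded data come only from a factory that uses reserved, obviously fake values and stamps every record with its origin. The sketch below is illustrative only: `PatientRecord`, `make_patient`, `assert_safe_to_seed`, and the `synthetic-v1` tag are hypothetical names, and a provenance tag is a team convention, not evidence of HIPAA compliance.

```python
import random
from dataclasses import dataclass

SYNTHETIC_TAG = "synthetic-v1"  # hypothetical provenance marker, a team convention


@dataclass(frozen=True)
class PatientRecord:
    name: str
    email: str
    phone: str
    provenance: str


def make_patient(rng: random.Random) -> PatientRecord:
    """Build a record from reserved, obviously fake values."""
    n = rng.randint(0, 99)
    return PatientRecord(
        name=f"Test Patient {n:02d}",
        # example.com is reserved for documentation and examples (RFC 2606).
        email=f"patient{n:02d}@example.com",
        # 555-0100 through 555-0199 are set aside for fictional use in North America.
        phone=f"555-01{n:02d}",
        provenance=SYNTHETIC_TAG,
    )


def assert_safe_to_seed(records) -> None:
    """Refuse to seed any record that does not carry the approved tag."""
    for record in records:
        if getattr(record, "provenance", None) != SYNTHETIC_TAG:
            raise ValueError(f"untagged record rejected: {record!r}")


# Usage: factory output passes, anything pasted in from elsewhere does not.
rng = random.Random(0)
records = [make_patient(rng) for _ in range(3)]
assert_safe_to_seed(records)
```

The point of the guard is not that the tag is hard to forge. It is that data a model invented on the spot has to pass through a reviewed code path before it reaches a test environment.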

The second is the audit trail. Regulated work gets audited, sometimes by people with the power to stop a release. When an auditor asks why a test passed, "the model wrote it that way" is not an answer anyone will accept. Every test in that kind of pipeline needs a reason a human can read and defend. AI can absolutely help draft the test. What it can't do is own the intent. That part stays with an engineer who can explain the logic out loud.
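A minimal sketch of what "a reason a human can read and defend" might look like in code, assuming a plain Python test suite. The `audited` decorator, `AUDIT_REGISTRY`, the requirement ID, and the reviewer name are all hypothetical, and the claim-routing logic is a stand-in for a real system under test.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class AuditRecord:
    requirement_id: str
    intent: str
    reviewed_by: str


# Maps test name to the human-written reason it exists.
AUDIT_REGISTRY: Dict[str, AuditRecord] = {}


def audited(requirement_id: str, intent: str, reviewed_by: str) -> Callable:
    """Attach a readable reason to a test and refuse blank fields."""
    fields = {
        "requirement_id": requirement_id,
        "intent": intent,
        "reviewed_by": reviewed_by,
    }
    missing = [name for name, value in fields.items() if not value.strip()]
    if missing:
        raise ValueError(f"audit fields missing: {', '.join(missing)}")

    def decorator(test_func: Callable) -> Callable:
        AUDIT_REGISTRY[test_func.__name__] = AuditRecord(
            requirement_id, intent, reviewed_by
        )
        return test_func

    return decorator


@audited(
    requirement_id="CLM-REQ-014",  # hypothetical requirement ID
    intent="A claim above the auto-approval limit must go to manual review.",
    reviewed_by="j.doe",  # hypothetical reviewer
)
def test_large_claim_goes_to_manual_review():
    limit = 5_000
    claim_amount = 7_500
    routed_to = "manual_review" if claim_amount > limit else "auto_approve"
    assert routed_to == "manual_review"
```

A model can draft the test body, but the `intent` and `reviewed_by` fields have to be filled in by someone who can defend them, and the registry gives an auditor one place to read them.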

The third is change. In consumer software you ship, test, and fix at speed. In an insurance or healthcare platform, a single release can pass through compliance gates, change boards, and approvals before it ever reaches production. This is where self-healing locators, the feature that lets tests repair their own selectors, get interesting. Tests that quietly rewrite themselves when the interface shifts sound wonderful, until you remember that in a validated suite an undocumented automatic change is still a change, and under frameworks like GAMP 5 that can drag you back into validating the whole suite again. A model that helpfully fixed your tests overnight can hand you a compliance problem that takes weeks to untangle.
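One hedged way to keep automatic locator repairs from slipping through unrecorded is to diff the locators a suite is currently using against the set that was last approved. This is a sketch under assumptions: the locator names and selectors are invented, and `find_locator_drift` is a hypothetical helper, not part of any testing tool.

```python
from typing import Dict, Optional, Tuple

Drift = Dict[str, Tuple[Optional[str], Optional[str]]]


def find_locator_drift(approved: Dict[str, str], current: Dict[str, str]) -> Drift:
    """Return every locator that was added, removed, or rewritten."""
    drift: Drift = {}
    for name in sorted(set(approved) | set(current)):
        before, after = approved.get(name), current.get(name)
        if before != after:
            drift[name] = (before, after)
    return drift


# Hypothetical approved baseline, as it might be stored under change control.
approved = {
    "submit_claim": "#submit-claim",
    "policy_number": "input[name='policy']",
}

# Hypothetical state after a tool repaired a locator on its own.
current = {
    "submit_claim": "button.primary",
    "policy_number": "input[name='policy']",
}

drift = find_locator_drift(approved, current)
for name, (before, after) in drift.items():
    print(f"{name}: {before!r} -> {after!r} needs a change record")
```

Run as a pipeline step, a non-empty result would fail the build until someone documents the change, which turns a silent rewrite into a visible one.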

Where it genuinely earns its place

I don't want this to read as a list of reasons to avoid AI, because I lean on it. The trick is putting it where the downside is small.

Drafting is the obvious one. Handing a model a requirements document or a set of user stories and asking for a first cut of test cases saves real time. In regulated work that draft is never the finished article. A human reviews it, adds the compliance edge cases the model missed, and writes down the reasoning. But starting from something solid beats starting from a blank file, and the model often surfaces a scenario a tired reviewer would skip.

Coverage analysis is similar. Feeding a spec to a model and asking what scenarios it implies is useful less as an answer and more as a second opinion. When a missed scenario can mean a failed audit, another set of eyes, even synthetic ones, has value.

Then there's the plumbing. Page objects, data factories, pipeline configuration, all the mechanical scaffolding nobody enjoys writing. Tools like GitHub Copilot have made that part of my week noticeably faster, and the risk stays low because I'm reading and integrating the output, not shipping it untouched.

The skill that actually matters now

Here's the part I think a lot of people are dancing around.

AI is not going to delete QA jobs in regulated industries. But it is going to change what makes a QA engineer worth hiring.

Knowing how to prompt a model for a test case is not really a skill anymore. It's a baseline, like knowing how to use version control. The people who become hard to replace are the ones who know when to overrule the model. The engineer who reads a generated test and instantly spots that it ignores a data residency rule in the third scenario. Who notices that a locator was silently rewritten in a way that changes what the test actually checks. Who can sit across from an auditor and explain, in plain language, what every test is verifying and why it exists at all.

That kind of judgement rests on domain knowledge. Understanding how a payment really settles, how a claim moves through a system, how health data is allowed to be handled. A model can't stand in for that, and in regulated industries it's exactly what makes an engineer valuable.

The future I see isn't AI replacing testers. It's testers who understand both the rules and the tooling, leaning on the tooling to move faster and on the rules to know where to push back.

A rough test I apply

Before I let anything a model produced into a regulated suite, I ask myself three questions.

Can I explain this to an auditor in plain language? If I can't describe the test, its inputs, and its expected results to someone outside the technical team, it needs more documentation before it goes anywhere near the suite.

Does it touch sensitive data? If it does, anything the model generated or suggested gets a compliance review before it runs anywhere that isn't fully isolated.

Is this suite under formal validation? If the project falls under something like GAMP 5 or 21 CFR Part 11, any automated change, including one the model made on its own, goes through change control. No shortcuts.

None of that is red tape for its own sake. It's the line between automation that helps a regulated product ship safely and automation that turns into a liability you discover at the worst possible moment.
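The three questions above can be sketched as a small pre-merge gate. Everything here is an assumption made for illustration: the `GeneratedChange` fields, the `blocking_reasons` function, and the wording of the reasons. A real gate would pull these answers from a review tool or a ticketing system instead of a hand-built object.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeneratedChange:
    plain_language_summary: str
    touches_sensitive_data: bool
    compliance_reviewed: bool
    suite_under_validation: bool
    change_control_ticket: str  # empty when no ticket has been raised


def blocking_reasons(change: GeneratedChange) -> List[str]:
    """Apply the three questions and list whatever still blocks the merge."""
    reasons = []
    if not change.plain_language_summary.strip():
        reasons.append("no plain-language explanation for an auditor")
    if change.touches_sensitive_data and not change.compliance_reviewed:
        reasons.append("touches sensitive data without compliance review")
    if change.suite_under_validation and not change.change_control_ticket.strip():
        reasons.append("validated suite changed without change control")
    return reasons


change = GeneratedChange(
    plain_language_summary="Checks that a dose above the daily maximum is rejected.",
    touches_sensitive_data=True,
    compliance_reviewed=False,
    suite_under_validation=True,
    change_control_ticket="",
)

for reason in blocking_reasons(change):
    print("BLOCKED:", reason)
```

An empty list means none of the three questions objects. It does not mean the test is correct, which is still a judgement for the engineer reading it.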

The bigger picture

AI is going to reshape testing. I believe that, and I'm actively learning the new tools as they arrive.

But in regulated industries the change will look different. Slower, more deliberate, with human judgement carrying real weight for longer. That isn't a weakness. When the software you're testing handles a medical record, a mortgage, or someone's insurance claim, a testing mistake reaches a lot further than a bad afternoon for one user.

The engineers who can hold both ideas at once, moving faster with AI while keeping the rigour these industries demand, are the ones who will set the standard for what good looks like next.

I'm still working out that balance myself. These are just the questions I've found worth asking.

If you've used AI tools inside a regulated environment, I'd genuinely like to hear what has held up for you and what hasn't.

Top comments (4)

xulingfeng • Jun 26

The 'model that helpfully fixed your tests overnight' line hit different. Compliance cleanup from that kind of auto-fix takes longer than writing tests by hand 😅 But the three-question test at the end is the real gold — especially 'can I explain this to an auditor.' Already stealing that.

Umair Iftikhar • Jun 26

Ha, the overnight auto-fix line came straight from scar tissue, so I'm glad it landed. And you've put your finger on the real cost: the fix itself is basically free, the reconciliation is what eats the week. The 'can I explain this to an auditor' question is the one that's saved me the most grief, so steal away. Curious what's actually worked for you on the cleanup side; is it vigilance at the PR stage, or have you found something better?

xulingfeng • Jun 27

PR review helps but it's not enough. What made it stick: the AI writes a quick note in plain English covering what it changed and why. If someone can't read it in 30 seconds and go 'yeah okay that adds up', it doesn't merge.

Umair Iftikhar • Jun 28

Exactly. In high-stakes work, "it works" was never enough; "I can see why in 30 seconds" is. The plain-English note is the accountability layer, not a nice-to-have. An explanation you can't verify at a glance is just a liability with better formatting.