<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Robert Cizmas</title>
    <description>The latest articles on DEV Community by Robert Cizmas (@robert_cizmas).</description>
    <link>https://dev.to/robert_cizmas</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3763816%2Fcb5208ef-931b-4dd1-8107-350c390b4bfd.jpg</url>
      <title>DEV Community: Robert Cizmas</title>
      <link>https://dev.to/robert_cizmas</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/robert_cizmas"/>
    <language>en</language>
    <item>
      <title>AI Risks in Health and Finance: When Errors Matter</title>
      <dc:creator>Robert Cizmas</dc:creator>
      <pubDate>Mon, 30 Mar 2026 11:20:24 +0000</pubDate>
      <link>https://dev.to/robert_cizmas/ai-risks-in-health-and-finance-when-errors-matter-1gaj</link>
      <guid>https://dev.to/robert_cizmas/ai-risks-in-health-and-finance-when-errors-matter-1gaj</guid>
      <description>&lt;p&gt;You're probably already using AI in some form, or at least thinking about it. And you should be. The potential is real. But if you work in healthcare or finance, the gap between "impressive demo" and "reliable in production" carries consequences that go well beyond a bad quarter or a frustrated customer. In these industries, errors cost money, trust, and sometimes lives.&lt;/p&gt;

&lt;p&gt;There is good news though. You don't have to choose between adopting AI and managing risk. But you do need to understand where things go wrong, and why the right checks and balances matter more here than anywhere else.&lt;/p&gt;

&lt;h2&gt;Healthcare: When the Algorithm Gets It Wrong, Patients Pay&lt;/h2&gt;

&lt;p&gt;In March 2025, ECRI, the independent nonprofit that monitors healthcare safety, named insufficient governance of artificial intelligence as the number two patient safety concern for the year. That's not a fringe concern buried in a footnote. It sits just behind medical gaslighting on a list informed by incident data, scientific literature, and expert analysis. ECRI warned that AI-generated medical errors could lead to misdiagnoses and inappropriate treatment decisions, causing injury or death, and that staff may struggle to identify when errors are actually attributable to AI.&lt;/p&gt;

&lt;p&gt;The regulatory picture hasn't kept pace, either. A study published in JAMA Health Forum in August 2025 examined recalls of AI-enabled medical devices cleared by the FDA. Researchers found that 43.4% of recalls occurred within the first year of device clearance, which is roughly double the rate for all 510(k) devices. The vast majority of recalled devices had not undergone clinical trials, and devices without reported clinical validation were associated with larger recalls and more recall events per device. Publicly traded companies manufactured 53.2% of AI-enabled devices but accounted for 91.8% of recalls.&lt;/p&gt;

&lt;p&gt;Meanwhile, the ongoing legal battle over the nH Predict algorithm continues to highlight the human cost of unchecked AI in healthcare decision-making. In September 2025, a federal judge denied UnitedHealth's request to limit discovery in the class action lawsuit alleging the insurer used the AI tool to override physicians' recommendations and deny post-acute care to elderly Medicare Advantage members. The plaintiffs allege that more than 90% of appealed denials were reversed, yet the tool remained in use.&lt;/p&gt;

&lt;h2&gt;Finance: Speed Without Oversight Is Just Fast Failure&lt;/h2&gt;

&lt;p&gt;On 7 April 2025, the Warsaw Stock Exchange suspended all trading for approximately 75 minutes after a flood of automated, high-frequency trading orders overwhelmed the exchange during a period of extreme global volatility. The WIG20 index plunged as much as 7% intraday before the halt was imposed. Bloomberg subsequently reported that the exchange began reviewing its algorithmic trading regulations in the aftermath, noting that algorithmic and high-frequency strategies accounted for 18.4% of Warsaw's equity trading volumes in the prior year.&lt;/p&gt;

&lt;p&gt;It's a reminder that in financial markets, automated systems don't just execute faster — they fail faster too. When multiple algorithms react to the same signals simultaneously, without adequate circuit breakers or human oversight, the result can escalate from a minor data error into a market-wide disruption in minutes.&lt;/p&gt;
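
&lt;p&gt;To make "circuit breaker" concrete, here is a minimal, hypothetical sketch of the idea in Python: halt automated order flow once an intraday move crosses a fixed threshold. The class name and the 7% threshold are illustrative, not any exchange's actual mechanism.&lt;/p&gt;

```python
# Hypothetical circuit breaker: halt trading once the intraday move
# from a reference price crosses a fixed threshold.
class CircuitBreaker:
    def __init__(self, reference_price, halt_threshold=0.07):
        self.reference_price = reference_price
        self.halt_threshold = halt_threshold  # e.g. a 7% intraday move
        self.halted = False

    def check(self, price):
        """Return True if trading should be halted at this price."""
        move = abs(price - self.reference_price) / self.reference_price
        if move >= self.halt_threshold:
            self.halted = True
        return self.halted

breaker = CircuitBreaker(reference_price=100.0)
print(breaker.check(96.0))   # 4% move: keep trading (False)
print(breaker.check(92.5))   # 7.5% move: halt (True)
```

&lt;p&gt;Note that once tripped, the breaker stays tripped until a human resets it; the whole point is to take the decision out of the feedback loop.&lt;/p&gt;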

&lt;h2&gt;The Theme Isn't Caution. It's Confidence.&lt;/h2&gt;

&lt;p&gt;None of this means you should avoid AI. These technologies are transforming both industries for good reasons, and standing still carries its own risks. But there's a meaningful difference between moving fast and moving with confidence.&lt;/p&gt;

&lt;p&gt;The pattern across every example above is the same: systems deployed without sufficient testing, monitoring, or human oversight. AI that enters production without proper validation. Outputs that nobody checks until something breaks. Governance frameworks that haven't caught up to the tools they're meant to govern.&lt;/p&gt;

&lt;p&gt;The teams that will get the most from AI in healthcare and finance are the ones that build verification into their workflows from the start — not as an afterthought. That means understanding what your models are doing, testing them before and after deployment, and maintaining the kind of visibility that lets you catch problems early rather than explaining them later.&lt;/p&gt;

&lt;p&gt;It's not about slowing down. It's about knowing that what you're shipping actually works.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>The Hidden Complexity of a 60-Line Script: Why Visual Programming and Testing Are the Next Step</title>
      <dc:creator>Robert Cizmas</dc:creator>
      <pubDate>Wed, 11 Mar 2026 15:02:56 +0000</pubDate>
      <link>https://dev.to/robert_cizmas/the-hidden-complexity-of-a-60-line-script-why-visual-programming-and-testing-are-the-next-step-13p4</link>
      <guid>https://dev.to/robert_cizmas/the-hidden-complexity-of-a-60-line-script-why-visual-programming-and-testing-are-the-next-step-13p4</guid>
      <description>&lt;p&gt;You've written a neat little script. Sixty lines, maybe seventy. It loads some data, runs a few transformations, trains a model, and spits out a result. Clean. Simple. Done.&lt;/p&gt;

&lt;p&gt;Except it isn't simple. Not really.&lt;/p&gt;

&lt;h2&gt;What 60 Lines Are Actually Doing&lt;/h2&gt;

&lt;p&gt;Here's the thing about data science code: it doesn't read like a novel. It reads like a conversation happening in five different rooms at once. A dataframe created on line 12 might not be touched again until line 58. A feature engineered on line 30 quietly feeds into a join on line 47, which itself depends on a filter defined way back on line 15. The logic isn't linear, even if the script is.&lt;/p&gt;

&lt;p&gt;And that's just one script. In most real pipelines, you're dealing with multiple scripts, shared datasets, and transformations that ripple across files in ways that aren't obvious from reading the code top to bottom.&lt;/p&gt;

&lt;p&gt;This is what we call the interplay between data and code, and it's where hidden complexity lives. Your code executes sequentially, but the relationships between your data objects don't follow that order. They form a network: branching, merging, looping back. The script might be 60 lines, but the logical structure underneath could be far more tangled than it appears.&lt;/p&gt;
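
&lt;p&gt;A toy version of that pattern, with invented column names: a mask defined "early" silently decides which rows a join "much later" ever sees.&lt;/p&gt;

```python
import pandas as pd

# Toy pipeline: the filter built early quietly shapes the join later on.
orders = pd.DataFrame({"user_id": [1, 2, 3], "amount": [10.0, 250.0, 40.0]})
users = pd.DataFrame({"user_id": [1, 2, 3], "segment": ["a", "b", "a"]})

big_spender = orders["amount"] > 100                  # filter defined early
features = users.assign(is_new=[True, False, True])   # feature engineered later

# The join only ever sees the rows the early mask let through.
joined = orders[big_spender].merge(features, on="user_id")
print(joined["segment"].tolist())
```

&lt;p&gt;Reading top to bottom, nothing flags that &lt;code&gt;big_spender&lt;/code&gt; controls what &lt;code&gt;joined&lt;/code&gt; contains; the dependency only shows up when you trace the data, not the lines.&lt;/p&gt;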

&lt;h2&gt;Why Reading the Code Isn't Enough&lt;/h2&gt;

&lt;p&gt;As data scientists, we tend to trust our ability to hold the pipeline in our heads. And for a while, that works. But the moment you step away for a week, hand the work to a colleague, or need to explain your process to a compliance team, the gaps become obvious.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiqk5n5cc5m8djaql52ch.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiqk5n5cc5m8djaql52ch.png" alt=" " width="600" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can't see the dependency between line 20 and line 60 just by scanning the code. You can't easily spot that a single data transformation feeds three downstream outputs. And you definitely can't explain the logical flow to a non-technical stakeholder by showing them a Python script.&lt;/p&gt;

&lt;p&gt;This isn't a failure of skill. It's a limitation of the medium. Code is a set of instructions. What's missing is a way to see how those instructions actually interact with your data, and that requires a different kind of representation entirely.&lt;/p&gt;

&lt;h2&gt;Seeing the Network, Not Just the Script&lt;/h2&gt;

&lt;p&gt;This is why we built Lineage as part of Etiq's Data Science Copilot. Lineage takes your script and visualises the interplay between your data and your code as a network diagram, directly in your IDE. Data objects become nodes. Functions and transformations become connections. And suddenly, that hidden complexity isn't hidden anymore.&lt;/p&gt;
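
&lt;p&gt;This is not Etiq's implementation, but the underlying idea can be sketched in a few lines with Python's standard &lt;code&gt;ast&lt;/code&gt; module: walk a script's assignments and record which names each one reads, giving a crude data-dependency network.&lt;/p&gt;

```python
import ast
from collections import defaultdict

def dependency_edges(source):
    """Crude lineage: map each assigned name to the names its expression reads."""
    edges = defaultdict(set)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
            reads = {n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)}
            for target in targets:
                edges[target] |= reads
    return dict(edges)

script = """
raw = load_data()
clean = transform(raw)
model = train(clean, raw)
"""
print(dependency_edges(script))
```

&lt;p&gt;Even this toy version surfaces the non-linear structure: &lt;code&gt;model&lt;/code&gt; depends on both &lt;code&gt;clean&lt;/code&gt; and &lt;code&gt;raw&lt;/code&gt;, something you would otherwise reconstruct by eye. A production tool also has to handle mutation, control flow, and cross-file references.&lt;/p&gt;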

&lt;p&gt;You can trace a single data table through your entire pipeline: where it was created, what transformed it, where its outputs end up, and how it connects to everything else. Those non-linear relationships that were invisible in the code are now laid out clearly in front of you.&lt;/p&gt;

&lt;p&gt;It doesn't matter whether you're building something new or inheriting someone else's work. Lineage works with what you already have, analysing your scripts without requiring you to change how you code.&lt;/p&gt;

&lt;h2&gt;Testing What You Can Now See&lt;/h2&gt;

&lt;p&gt;Visibility is only half the picture. Once you can see the complexity in your pipeline, the next question is: how do you verify it? How do you know that the data flowing through those connections is behaving the way you expect?&lt;/p&gt;

&lt;p&gt;That's where targeted testing comes in. When you can see your pipeline as a network, you can identify exactly where to place tests, at the critical junctures where data transforms, merges, or feeds into model training. You're not guessing what to test anymore; you're testing what matters, precisely where it matters.&lt;/p&gt;
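
&lt;p&gt;A minimal example of what one such targeted check might look like at a merge point, with invented names and a deliberately simple invariant (row counts preserved across a left join):&lt;/p&gt;

```python
# Targeted check at one pipeline juncture: verify a left join neither
# dropped nor duplicated rows before anything downstream consumes it.
def check_merge(left_rows, merged_rows):
    diff = merged_rows - left_rows
    if diff != 0:
        kind = "duplicated" if diff > 0 else "dropped"
        raise ValueError(f"merge {kind} {abs(diff)} row(s)")

check_merge(left_rows=1000, merged_rows=1000)  # passes silently
print("merge check passed")
```

&lt;p&gt;The value isn't the check itself, which is trivial; it's knowing from the pipeline's structure that this juncture is the one worth guarding.&lt;/p&gt;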

&lt;p&gt;Etiq's Testing Recommendations work alongside Lineage for exactly this reason. Once you can see the structure, our copilot recommends the right tests for the specific points in your pipeline that carry the most risk, and lets you run them with a single click.&lt;/p&gt;

&lt;h2&gt;Complexity Isn't the Enemy. Invisibility Is.&lt;/h2&gt;

&lt;p&gt;A 60-line script can hide a surprising amount of complexity, and that's fine. Data science pipelines are complex because the problems they solve are complex. The issue isn't the complexity itself. It's not being able to see it, verify it, and communicate it.&lt;/p&gt;

&lt;p&gt;When you can visualise the network your code creates and test the data flowing through it, you move from hoping your pipeline works to knowing it does. And that's a very different place to be on a Friday afternoon.&lt;/p&gt;

&lt;p&gt;&lt;a href="//etiq.ai"&gt;Etiq.ai&lt;/a&gt;&lt;/p&gt;

</description>
      <category>codequality</category>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>testing</category>
    </item>
    <item>
      <title>With genAI writing all the code it feels like programmers are now debuggers</title>
      <dc:creator>Robert Cizmas</dc:creator>
      <pubDate>Thu, 05 Mar 2026 11:35:17 +0000</pubDate>
      <link>https://dev.to/robert_cizmas/with-genai-writing-all-the-code-it-feels-like-programmers-are-now-debuggers-3men</link>
      <guid>https://dev.to/robert_cizmas/with-genai-writing-all-the-code-it-feels-like-programmers-are-now-debuggers-3men</guid>
      <description></description>
    </item>
    <item>
      <title>How to Actually Solve AI Hallucinations</title>
      <dc:creator>Robert Cizmas</dc:creator>
      <pubDate>Wed, 04 Mar 2026 11:21:52 +0000</pubDate>
      <link>https://dev.to/robert_cizmas/how-to-actually-solve-ai-hallucinations-2mei</link>
      <guid>https://dev.to/robert_cizmas/how-to-actually-solve-ai-hallucinations-2mei</guid>
      <description>&lt;p&gt;Why hallucinations appear, why they're predictable in real environments, and what practical techniques teams use to reduce or control them.&lt;/p&gt;

&lt;p&gt;You've just asked your AI coding assistant to help refactor a data pipeline. The code it hands back looks clean, reads well, runs well, and even includes helpful comments. There's just one problem: the data it transforms as part of your pipeline doesn't exist. Welcome to the world of AI hallucinations, where confidence and correctness are two very different things.&lt;/p&gt;

&lt;p&gt;If you're a data scientist or ML engineer using AI assistants in your daily workflow, hallucinations aren't a theoretical risk. They're a practical, measurable reality that your team is almost certainly encountering right now.&lt;/p&gt;

&lt;h2&gt;The Numbers Don't Lie&lt;/h2&gt;

&lt;p&gt;The scale of the problem is becoming increasingly well-documented. CodeRabbit's State of AI vs Human Code Generation report (December 2025), which analysed 470 open-source GitHub pull requests, found that AI-authored code produces 1.7x more issues than human-written code. That includes 1.75x more logic and correctness errors, 1.57x more security findings, and 1.64x more code quality and maintainability issues.&lt;/p&gt;

&lt;p&gt;Meanwhile, the Stack Overflow 2025 Developer Survey reveals a striking paradox: 84% of developers now use or plan to use AI tools, but only 29% trust those tools to produce accurate output, down from 40% the year before. The most commonly cited frustration, reported by 66% of developers? Code that is "almost right, but not quite."&lt;/p&gt;

&lt;p&gt;For data science teams specifically, the risks compound. A USENIX Security 2025 study tested 16 code-generating LLMs and found that nearly 20% of over 2.2 million packages referenced across 576,000 code samples were entirely hallucinated. Worse still, 43% of those hallucinated package names were repeated consistently across multiple queries, making them predictable and therefore exploitable through what researchers have termed "slopsquatting" attacks.&lt;/p&gt;
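
&lt;p&gt;One cheap defence against that failure mode, sketched here with a hypothetical allowlist rather than a live package-index lookup: refuse to install any AI-suggested dependency you haven't already vetted.&lt;/p&gt;

```python
# Hypothetical guard against hallucinated package names: compare every
# AI-suggested import against an allowlist of dependencies you have vetted.
VETTED_PACKAGES = {"numpy", "pandas", "scikit-learn", "requests"}

def flag_unvetted(suggested):
    """Return the suggested package names that are not on the vetted list."""
    return [pkg for pkg in suggested if pkg not in VETTED_PACKAGES]

# "fastparquet-pro" is invented for this example; a real check would also
# consult your lockfile or internal index before running any install.
print(flag_unvetted(["pandas", "fastparquet-pro", "numpy"]))
```

&lt;p&gt;Because hallucinated names repeat predictably, a static allowlist catches a surprising share of them before they ever reach &lt;code&gt;pip install&lt;/code&gt;.&lt;/p&gt;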

&lt;h2&gt;Why Hallucinations Are Predictable&lt;/h2&gt;

&lt;p&gt;Here's the thing: hallucinations aren't random. They follow patterns. AI models generate code statistically, not semantically. They don't understand your business logic, your data schema, or why you chose that particular feature engineering approach. They pattern-match against training data, and when the context is ambiguous or domain-specific, they fill in the gaps with confident-sounding guesses.&lt;/p&gt;

&lt;p&gt;This is particularly acute in data science and ML work, where pipelines involve complex interactions between code, data, and model behaviour. A general-purpose coding assistant doesn't know that your preprocessing step introduces target leakage, or that the library it's suggesting was deprecated two versions ago. It generates what looks plausible, and it's on you to verify whether it's actually correct.&lt;/p&gt;

&lt;h2&gt;Verification Is the Real Solution&lt;/h2&gt;

&lt;p&gt;The most effective mitigation isn't about finding a model that hallucinates less, though improvements are real and ongoing. It's about building verification into your workflow so that hallucinations are caught before they cause damage.&lt;/p&gt;

&lt;p&gt;Research increasingly supports this approach. A comprehensive review of hallucination mitigation techniques published in Mathematics (2025) found that the most effective current pattern is to stop treating AI as an oracle and start treating it as a generator inside a verification loop: reduce the model's freedom to improvise, and increase your system's ability to check what was produced.&lt;/p&gt;
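
&lt;p&gt;That "generator inside a verification loop" pattern is straightforward to sketch. Here &lt;code&gt;generate&lt;/code&gt; and &lt;code&gt;verify&lt;/code&gt; are stand-ins for a model call and your own checks; only the loop structure is the point.&lt;/p&gt;

```python
# Sketch of a verification loop: treat the model as a proposal generator
# and accept its output only once it passes explicit checks.
def verification_loop(generate, verify, max_attempts=3):
    """Call generate(), retrying with verifier feedback until verify() passes."""
    feedback = None
    for _ in range(max_attempts):
        candidate = generate(feedback)
        ok, feedback = verify(candidate)
        if ok:
            return candidate
    raise RuntimeError("no candidate passed verification")

# Toy stand-ins: the "model" proposes numbers, the verifier wants an even one.
proposals = iter([3, 7, 8])
result = verification_loop(
    generate=lambda fb: next(proposals),
    verify=lambda c: (c % 2 == 0, "must be even"),
)
print(result)  # → 8
```

&lt;p&gt;The key design choice is that failure is the default: nothing ships unless a check explicitly passes, which is the opposite of treating the model as an oracle.&lt;/p&gt;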

&lt;p&gt;For data science and ML teams, this means systematic testing at the point of development, not as an afterthought. It means knowing what to test, where to test it, and having the ability to trace how data and code interact throughout your pipeline so that when something goes wrong, you can identify exactly where and why.&lt;/p&gt;

&lt;p&gt;This is the approach we've built into Etiq's Data Science Copilot: verification of outputs by default. Does the code you've written behave the way you intended? By building Lineage first, then testing and validating the data points it surfaces, verification becomes a natural part of development rather than an additional burden.&lt;/p&gt;

&lt;h2&gt;Building Confidence, Not Just Code&lt;/h2&gt;

&lt;p&gt;The gap between AI adoption and AI trust is real, and it's growing. Closing that gap doesn't require abandoning AI tools. It requires building the verification infrastructure that makes them reliable. For data science teams working with complex pipelines, sensitive data, and high-stakes decisions, that infrastructure isn't optional. It's the difference between shipping models you hope work and shipping models you know work.&lt;/p&gt;

&lt;p&gt;Your AI assistant can help you write code faster. But only proper verification ensures that speed doesn't come at the cost of reliability.&lt;/p&gt;

&lt;p&gt;Ready to build verification into your data science workflow? Start a free trial of Etiq's Data Science Copilot and see how testing recommendations, lineage, and root cause analysis work together to catch hallucinations before they catch you.&lt;/p&gt;

</description>
      <category>hallucination</category>
      <category>ai</category>
      <category>aiops</category>
      <category>mle</category>
    </item>
    <item>
      <title>How do you solve AI Hallucinations?</title>
      <dc:creator>Robert Cizmas</dc:creator>
      <pubDate>Wed, 04 Mar 2026 11:19:23 +0000</pubDate>
      <link>https://dev.to/robert_cizmas/how-do-you-solve-ai-hallucinations-34b2</link>
      <guid>https://dev.to/robert_cizmas/how-do-you-solve-ai-hallucinations-34b2</guid>
      <description></description>
    </item>
  </channel>
</rss>
