I'm working on an AI Data Analyst in MLJAR Studio.
The idea is simple: you ask a question in natural language, AI writes Python code, executes it, and shows the result.
But recently I found a small example that reminded me why AI data analysis needs more than code generation.
The code worked
I was testing a medical data analysis use case with a diabetes CSV file.
The first task was simple:
load data from this URL
AI generated Pandas code with read_csv().
The code executed without errors.
The dataframe was displayed.
The shape looked correct: 768 rows and 9 columns.
So everything looked fine.
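The generated code was essentially a one-line read_csv(). A minimal stand-in looks like this (the inline CSV below replaces the real URL, which I'm not reproducing here):

```python
import io
import pandas as pd

# Stand-in for the real file: the generated code called pd.read_csv()
# on a remote URL; here a small inline CSV plays that role.
csv_text = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (3, 9) here; the real file gave (768, 9)
print(df.head())
```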
But then I looked at the dataframe.
148 pregnancies?
In the first row, the Pregnancies column had value 148.
That immediately looked wrong.
Values like 0, 1, 2, 6, or 8 make sense for number of pregnancies.
But 148?
No.
Then I noticed more strange things:
- Pregnancies had values like 148, 85, 183
- Age had values like 0 and 1
- Outcome was empty
- the whole dataframe looked shifted
The code worked, but the data was wrong.
AI also checked the output
In MLJAR Studio, my AI Data Analyst is not a one-step workflow.
It doesn't only:
- generate code
- execute code
- show result
After the code is executed, there is another step. The LLM analyzes the generated output.
So AI doesn't only ask:
Did the code run?
It also asks:
Does the output make sense?
In this case, AI also noticed that something was wrong. It found suspicious values, missing values in the last column, and strange statistics.
This was very useful because Pandas didn't raise an error. The dataframe was created. But the output was incorrect.
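MLJAR Studio does this step with an LLM, so the snippet below is only a hand-rolled analogy of the idea: a few plausibility rules over the dataframe. The ranges are my own assumptions for this kind of medical data, not something the product ships with:

```python
import io
import pandas as pd

def sanity_report(df: pd.DataFrame) -> list[str]:
    """Collect simple red flags: fully empty columns and out-of-range values."""
    warnings = []
    # A completely empty column often means the data was shifted.
    for col in df.columns:
        if df[col].isna().all():
            warnings.append(f"column '{col}' is completely empty")
    # Hypothetical plausibility ranges for a diabetes dataset.
    ranges = {"Pregnancies": (0, 20), "Age": (18, 120)}
    for col, (lo, hi) in ranges.items():
        if col in df.columns:
            bad = df[col].dropna()
            bad = bad[(bad < lo) | (bad > hi)]
            if not bad.empty:
                warnings.append(
                    f"column '{col}': {len(bad)} values outside [{lo}, {hi}]"
                )
    return warnings

# A shifted frame like the one I saw: glucose under Pregnancies,
# outcomes under Age, Outcome empty.
shifted = pd.read_csv(io.StringIO(
    "Pregnancies,Glucose,Age,Outcome\n148,72,1,\n85,66,0,\n"
))
for warning in sanity_report(shifted):
    print(warning)
```

Three warnings fire here: the empty Outcome column, the impossible Pregnancies values, and the impossible Age values.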
What happened?
The CSV had a small formatting issue: a stray comma that left every data row with one field more than the header.
When Pandas sees that mismatch, it assumes the first value in each row is the dataframe index.
So all the columns were shifted left by one.
The value 148 was not the number of pregnancies. It was the glucose value.
That is why glucose values appeared under Pregnancies, outcome values appeared under Age, and the real Outcome column was empty.
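I can't share the original file, so here is a reconstruction of the same failure mode: a stray trailing comma on each data row gives Pandas one field more than there are header names, and it silently promotes the first value to the index. Passing index_col=False is one way to stop that inference:

```python
import io
import pandas as pd

# Nine column names, but every data row ends with a stray comma,
# so each row tokenizes into ten fields.
broken = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
6,148,72,35,0,33.6,0.627,50,1,
1,85,66,29,0,26.6,0.351,31,0,"""

shifted = pd.read_csv(io.StringIO(broken))
print(shifted["Pregnancies"].tolist())  # [148, 85] -- these are glucose values
print(shifted["Outcome"].isna().all())  # True -- real outcomes landed under Age

# index_col=False tells Pandas not to turn the extra field into an index.
fixed = pd.read_csv(io.StringIO(broken), index_col=False)
print(fixed["Pregnancies"].tolist())    # [6, 1]
print(fixed["Outcome"].tolist())        # [1, 0]
```

No exception anywhere in the broken version, which is exactly why it is so easy to miss.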
The lesson
This example is small, but the lesson is important.
AI-generated code can look correct.
The notebook can run without errors.
The dataframe can be displayed.
And the data can still be wrong.
That is why AI data analysis needs output verification.
We need a human in the loop, because humans can use common sense. In this case, 148 pregnancies was clearly impossible.
But AI in the loop is helpful as well. AI can scan the output, check basic statistics, and warn us about suspicious values.
For me, the best workflow is:
- ask AI to generate code
- execute the code
- display the output
- let AI inspect the output
- let the human review the result
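As a sketch, that loop could look like this. The two LLM calls are replaced by made-up stubs (ask_ai_for_code and ask_ai_to_inspect are hypothetical names, not MLJAR Studio APIs):

```python
# Hypothetical stubs standing in for the two LLM calls.
def ask_ai_for_code(question: str) -> str:
    return "result = 2 + 2"  # pretend this came from the model

def ask_ai_to_inspect(output) -> str:
    return "looks fine" if output is not None else "suspicious: empty output"

question = "what is 2 + 2?"
code = ask_ai_for_code(question)   # 1. ask AI to generate code
scope = {}
exec(code, scope)                  # 2. execute the code
output = scope["result"]
print(output)                      # 3. display the output
print(ask_ai_to_inspect(output))   # 4. let AI inspect the output
# 5. the human reviews both the output and the AI verdict
```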
AI can help us move faster.
But in data analysis, the real question is not:
Did the code run?
It is:
Does the output make sense?