I'm working on an AI Data Analyst in MLJAR Studio.
The idea is simple: you ask a question in natural language, AI writes Python code, executes it, and shows the result.
But recently I found a small example that reminded me why AI data analysis needs more than code generation.
The code worked
I was testing a medical data analysis use case with a diabetes CSV file.
The first task was simple:
load data from this URL
AI generated Pandas code with read_csv().
The code executed without errors.
The dataframe was displayed.
The shape looked correct: 768 rows and 9 columns.
So everything looked fine.
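The generated code was essentially a one-line read_csv(). A minimal stand-in looks like this (the inline CSV below replaces the real URL, which I'm not reproducing here):

```python
import io
import pandas as pd

# Stand-in for the real file: the generated code called pd.read_csv()
# on a remote URL; here a small inline CSV plays that role.
csv_text = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (3, 9) here; the real file gave (768, 9)
print(df.head())
```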
But then I looked at the dataframe.
148 pregnancies?
In the first row, the Pregnancies column had value 148.
That immediately looked wrong.
Values like 0, 1, 2, 6, or 8 make sense for number of pregnancies.
But 148?
No.
Then I noticed more strange things:
- Pregnancies had values like 148, 85, 183
- Age had values like 0 and 1
- Outcome was empty
- the whole dataframe looked shifted
The code worked, but the data was wrong.
AI also checked the output
In MLJAR Studio, my AI Data Analyst is not a one-step workflow.
It doesn't only:
- generate code
- execute code
- show result
After the code is executed, there is another step. The LLM analyzes the generated output.
So AI doesn't only ask:
Did the code run?
It also asks:
Does the output make sense?
In this case, AI also noticed that something was wrong. It found suspicious values, missing values in the last column, and strange statistics.
This was very useful because Pandas didn't raise an error. The dataframe was created. But the output was incorrect.
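MLJAR Studio does this step with an LLM, so the snippet below is only a hand-rolled analogy of the idea: a few plausibility rules over the dataframe. The ranges are my own assumptions for this kind of medical data, not something the product ships with:

```python
import io
import pandas as pd

def sanity_report(df: pd.DataFrame) -> list[str]:
    """Collect simple red flags: fully empty columns and out-of-range values."""
    warnings = []
    # A completely empty column often means the data was shifted.
    for col in df.columns:
        if df[col].isna().all():
            warnings.append(f"column '{col}' is completely empty")
    # Hypothetical plausibility ranges for a diabetes dataset.
    ranges = {"Pregnancies": (0, 20), "Age": (18, 120)}
    for col, (lo, hi) in ranges.items():
        if col in df.columns:
            bad = df[col].dropna()
            bad = bad[(bad < lo) | (bad > hi)]
            if not bad.empty:
                warnings.append(
                    f"column '{col}': {len(bad)} values outside [{lo}, {hi}]"
                )
    return warnings

# A shifted frame like the one I saw: glucose under Pregnancies,
# outcomes under Age, Outcome empty.
shifted = pd.read_csv(io.StringIO(
    "Pregnancies,Glucose,Age,Outcome\n148,72,1,\n85,66,0,\n"
))
for warning in sanity_report(shifted):
    print(warning)
```

Three warnings fire here: the empty Outcome column, the impossible Pregnancies values, and the impossible Age values.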
What happened?
The CSV had a small formatting issue: a stray comma that left every data row with one field more than the header.
When Pandas sees that mismatch, it assumes the first value in each row is the dataframe index.
So all the columns were shifted left by one.
The value 148 was not the number of pregnancies. It was the glucose value.
That is why glucose values appeared under Pregnancies, outcome values appeared under Age, and the real Outcome column was empty.
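I can't share the original file, so here is a reconstruction of the same failure mode: a stray trailing comma on each data row gives Pandas one field more than there are header names, and it silently promotes the first value to the index. Passing index_col=False is one way to stop that inference:

```python
import io
import pandas as pd

# Nine column names, but every data row ends with a stray comma,
# so each row tokenizes into ten fields.
broken = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
6,148,72,35,0,33.6,0.627,50,1,
1,85,66,29,0,26.6,0.351,31,0,"""

shifted = pd.read_csv(io.StringIO(broken))
print(shifted["Pregnancies"].tolist())  # [148, 85] -- these are glucose values
print(shifted["Outcome"].isna().all())  # True -- real outcomes landed under Age

# index_col=False tells Pandas not to turn the extra field into an index.
fixed = pd.read_csv(io.StringIO(broken), index_col=False)
print(fixed["Pregnancies"].tolist())    # [6, 1]
print(fixed["Outcome"].tolist())        # [1, 0]
```

No exception anywhere in the broken version, which is exactly why it is so easy to miss.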
The lesson
This example is small, but the lesson is important.
AI-generated code can look correct.
The notebook can run without errors.
The dataframe can be displayed.
And the data can still be wrong.
That is why AI data analysis needs output verification.
We need a human in the loop, because humans can use common sense. In this case, 148 pregnancies was clearly impossible.
But AI in the loop is helpful as well. AI can scan the output, check basic statistics, and warn us about suspicious values.
For me, the best workflow is:
- ask AI to generate code
- execute the code
- display the output
- let AI inspect the output
- let the human review the result
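As a sketch, that loop could look like this. The two LLM calls are replaced by made-up stubs (ask_ai_for_code and ask_ai_to_inspect are hypothetical names, not MLJAR Studio APIs):

```python
# Hypothetical stubs standing in for the two LLM calls.
def ask_ai_for_code(question: str) -> str:
    return "result = 2 + 2"  # pretend this came from the model

def ask_ai_to_inspect(output) -> str:
    return "looks fine" if output is not None else "suspicious: empty output"

question = "what is 2 + 2?"
code = ask_ai_for_code(question)   # 1. ask AI to generate code
scope = {}
exec(code, scope)                  # 2. execute the code
output = scope["result"]
print(output)                      # 3. display the output
print(ask_ai_to_inspect(output))   # 4. let AI inspect the output
# 5. the human reviews both the output and the AI verdict
```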
AI can help us move faster.
But in data analysis, the real question is not:
Did the code run?
It is:
Does the output make sense?