Most AI models perform well on benchmarks.
But what happens when you test them on messy, real-world input?
I created a small challenge on VibeCode Arena to find out…
And the results were surprising.
🧪 The Challenge: Prescription Confusion Trap
Here’s the input I gave:
“Doctor ne bola din mein 2 baar dawa lena hai, par maine sirf ek hi li. Kal se chest pain halka hai, par breathing problem nahi hai. Mere papa ko heart problem hai. Maine ibuprofen li thi kal raat.”

(Translation: “The doctor said to take the medicine twice a day, but I took it only once. Since yesterday I’ve had mild chest pain, but no breathing problem. My father has a heart problem. I took ibuprofen last night.”)
🎯 The Task
Convert the input above into strict JSON with these fields:
- patient_info
- active_symptoms
- negative_symptoms
- family_history_notes
- medication_taken
- dosage_misuse_flag (true/false)

## ⚠️ Why This Is Tricky
This is NOT just extraction.
It tests real-world understanding:
- 👉 Doctor said 2 times, patient took only once
- 👉 “No breathing problem” (negative symptom)
- 👉 Father has heart problem (not the patient)
- 👉 Medication taken: ibuprofen

## ❗ The Critical Question
Can your AI detect this?
👉 dosage_misuse_flag = true
Many models miss this completely.
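For reference, here is one plausible target output, sketched as a Python dict. The field values are my own reading of the prompt, not an official answer key from VibeCode Arena:

```python
import json

# One plausible target output for the challenge above.
# Field values are an interpretation of the input text,
# not an official answer key.
expected = {
    "patient_info": {"relation_to_reporter": "self"},
    "active_symptoms": ["mild chest pain since yesterday"],
    "negative_symptoms": ["no breathing problem"],
    "family_history_notes": ["father has a heart problem"],
    "medication_taken": ["ibuprofen (last night)"],
    # Doctor prescribed twice daily; the patient took only one dose.
    "dosage_misuse_flag": True,
}

print(json.dumps(expected, indent=2))
```

The key point is the last field: the mismatch between the prescribed dose (twice daily) and the actual intake (once) is what most models fail to surface.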
🔥 Try It Yourself
I’ve made this challenge public here:
👉 https://vibecodearena.ai/duel/5a70b6a3-20b7-4bb2-94f6-bd494f5d60c2
⚔️ Challenge Rules
- Use your favorite AI (ChatGPT, Claude, Gemini, etc.)
- Generate the JSON output
- Comment your result
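Before commenting your result, you can sanity-check a model’s output with a few lines of Python. This is a quick sketch, not an official grader; the key list simply mirrors the fields in the task above:

```python
import json

# Field names taken from the task description above.
REQUIRED_KEYS = {
    "patient_info", "active_symptoms", "negative_symptoms",
    "family_history_notes", "medication_taken", "dosage_misuse_flag",
}

def validate(raw: str) -> list[str]:
    """Return a list of problems found in a model's JSON output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        return [f"not valid JSON: {e}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]
    problems = []
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if not isinstance(data.get("dosage_misuse_flag"), bool):
        problems.append("dosage_misuse_flag must be true or false")
    elif data["dosage_misuse_flag"] is not True:
        problems.append("dosage_misuse_flag should be true for this input")
    return problems
```

An empty list means the output at least has the right shape and catches the dosage misuse; anything else tells you what the model got wrong structurally.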