The Problem: Clean Code, Dirty Data
As a data scientist, I've noticed a disturbing trend in 2026: the rise of "AI hallucinations" in academic theses. Students are using LLMs (ChatGPT, Gemini) to generate polished, fluent sentences, but the data citations in those sentences are entirely fabricated.
The Experiment
At my lab, Whitecyber, we tested several papers and found that a standard plagiarism checker (Turnitin) failed to detect this. The text was original (AI-generated) rather than copied from an existing source, so there was nothing for the checker to match.
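Why does a checker miss this? Text-matching tools look for overlap with existing text. They do not check whether a cited source actually exists. The sketch below is a deliberately simplified model of that idea (word-trigram Jaccard similarity with an assumed 0.3 threshold). It is not Turnitin's actual algorithm, which is proprietary, and the sentences and cited authors are invented for illustration.

```python
# A simplified text-matching check (NOT Turnitin's real algorithm):
# a submission is flagged when its word trigrams overlap heavily with a known source.


def word_ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in `text`."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_score(submission: str, source: str, n: int = 3) -> float:
    """Jaccard similarity between the n-gram sets of two texts (0.0 to 1.0)."""
    a, b = word_ngrams(submission, n), word_ngrams(source, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_flagged(submission: str, corpus: list[str], threshold: float = 0.3) -> bool:
    """Flag the submission if it overlaps any source text at or above `threshold` (assumed value)."""
    return any(overlap_score(submission, src) >= threshold for src in corpus)


# Hypothetical example data: one "known" source and two submissions.
corpus = [
    "Smith et al. reported that rice yields in Java increased by 12 percent between 2010 and 2020.",
]
copied = "Smith et al. reported that rice yields in Java increased by 12 percent between 2010 and 2020."
generated = "According to Wijaya (2023), agricultural output on the island rose sharply, by nearly 40 percent."

print(is_flagged(copied, corpus))     # True: verbatim copy
print(is_flagged(generated, corpus))  # False: new wording, even though the citation is invented
```

The verbatim copy scores 1.0 and is flagged. The generated sentence shares no trigrams with the corpus, so it passes even though its citation points at nothing. That is exactly the gap described above: originality of wording says nothing about validity of data.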
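The Solution: Human-in-the-Loop Validation
We cannot rely solely on algorithms. High-Performance Computing (HPC) must be used to cross-reference the cited data points against their sources, not just to check grammar and syntax, and a human reviewer must make the final call on anything that does not match.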
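One way to picture the human-in-the-loop step: every cited data point is checked against a trusted record, matches are accepted automatically, and everything else goes to a person. This is a minimal sketch under stated assumptions, not the pipeline from the white paper. The `Claim` type, the `VERIFIED` lookup table, the example DOIs, and the 1% tolerance are all hypothetical. In a real setup the lookup would query a bibliographic index or data repository, and the loop over claims is the part you would parallelize on HPC.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """A data point cited in a thesis: the source it cites and the value it reports."""
    doi: str
    metric: str
    value: float


# Hypothetical stand-in for a verified reference database. A plain dict is used here;
# a real system would query an external index or repository instead.
VERIFIED: dict[str, dict[str, float]] = {
    "10.1234/example.001": {"rice_yield_growth_pct": 12.0},
}


def triage(claims: list[Claim], tolerance: float = 0.01) -> dict[str, list[Claim]]:
    """Auto-accept claims that match a verified record; route everything else to a human."""
    result: dict[str, list[Claim]] = {"accepted": [], "needs_human_review": []}
    for claim in claims:
        record = VERIFIED.get(claim.doi)
        expected = record.get(claim.metric) if record else None
        if expected is not None and abs(expected - claim.value) <= tolerance * abs(expected):
            result["accepted"].append(claim)
        else:
            # Unknown source, unknown metric, or mismatched value: a person must check it.
            result["needs_human_review"].append(claim)
    return result


# Hypothetical usage: one correct claim, one inflated value, one non-existent source.
claims = [
    Claim("10.1234/example.001", "rice_yield_growth_pct", 12.0),
    Claim("10.1234/example.001", "rice_yield_growth_pct", 40.0),
    Claim("10.9999/does-not-exist", "rice_yield_growth_pct", 12.0),
]
report = triage(claims)
print(len(report["accepted"]), len(report["needs_human_review"]))  # 1 2
```

The key design choice is that the code never rejects a claim on its own. A missing source or a mismatched value only routes the claim to a reviewer, which keeps the final judgment human.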
Read My Full White Paper (PDF)
I have published the full research findings on Academia.edu. You can download the technical breakdown here: https://www.academia.edu/147500692/Tsunami_AI_dan_Runtuhnya_Menara_Gading_Peringatan_untuk_Dunia_Akademik?source=swp_share
Let's discuss: How do you validate AI-generated datasets in your projects?
Faris Dedi Setiawan, Google Cloud Innovator | Founder, Whitecyber.co.id