Large Language Models (LLMs) have become part of everyday academic and technical writing. But there is a problem the academic community has been flagging for a while, and many of us have encountered it firsthand: LLMs are very good at inventing citations. These citations look plausible and almost match real papers, yet they confidently point to work that does not exist at all. The academic community has taken to calling them Ghost References.
Closing out my year with a journal editor shocker 🧵 Checking new manuscripts today I reviewed a paper attributing 2 papers to me I did not write. A daft thing for an author to do of course. But intrigued I web searched up one of the titles and that's when it got real weird...
— Ben Williamson (@benpatrickwill.bsky.social) 2025-12-19T17:20:04.127Z
As Professor Ben Williamson and Aaron Tay explained, the root problem is deep-seated:
"The ghost reference problem is a chronic condition that has become acute. The infection predates GenAI; the technology has simply lowered our immune response while accelerating transmission."
The issue is compounded because LLMs with general web search capabilities can fail to reliably verify references: the web itself contains fake citations, creating a dangerous feedback loop. The more these fabricated sources appear in published literature, the more widely they are assumed to be authentic, despite being wrong. For instance, one of the Ghost References to Prof. Williamson's work has accumulated 43 citations in Google Scholar.
Addressing the Reviewer's Burden
Peer reviewers are already stretched thin, and now, with fake references proliferating, they have to manually copy and paste every single reference into a search engine to verify that it exists.
This is a tedious, low-reward task often skipped in favor of focusing on the paper's actual content. But this "verification gap" is exactly where ghost references can slip through.
When It Happened to Me
That abstract concern became a concrete problem worth addressing when I discovered that my own work had been incorrectly cited in a published paper.
Seeing the flawed metadata published in a journal was a wake-up call that led me to build CERCA, an open-source tool designed to help researchers, reviewers, and editors quickly verify the accuracy of references. It was developed to improve trust, transparency, and reliability in academic writing.
What Is CERCA?
CERCA stands for Citation Extraction & Reference Checking Assistant.
Here's what it looks like in action:
In seconds, CERCA:
- Scans a PDF and extracts the references
- Queries OpenAlex, Crossref, and Zenodo
- Flags potentially invalid citations with confidence scores
- Shows you which metadata fields don't match
Instead of copy-pasting each reference manually, you get a verification report you can review in minutes. CERCA automates the tedious process of checking whether the papers cited in a PDF actually exist and whether their metadata is accurate.
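To give a sense of what the verification step looks like under the hood, here is a minimal, self-contained Java sketch that sends one extracted citation to Crossref's public works API (query.bibliographic is Crossref's documented parameter for free-form citation text). This illustrates the general approach rather than reproducing CERCA's actual code, and the User-Agent contact address is a placeholder:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class CrossrefLookup {
    public static void main(String[] args) throws Exception {
        // Free-form citation text, as extracted from a PDF's reference list.
        String citation = "Assessing Software Practitioners' Work Engagement "
                + "and Job Satisfaction in a Large Software Company";

        // query.bibliographic accepts unstructured citation text; rows caps the hits.
        String url = "https://api.crossref.org/works?query.bibliographic="
                + URLEncoder.encode(citation, StandardCharsets.UTF_8) + "&rows=3";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                // Crossref asks polite clients to identify themselves;
                // this contact address is a placeholder.
                .header("User-Agent", "cerca-sketch/0.1 (mailto:you@example.org)")
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        // The JSON body lists candidate works; a real checker would parse it
        // and fuzzy-compare titles and authors against the extracted reference.
        System.out.println(response.body());
    }
}
```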
Development Insights
Building CERCA required solving a few interesting engineering challenges, particularly around fuzzy matching and bibliographic parsing.
Academic citations are messy. They come in dozens of formats (APA, MLA, IEEE, ACM, Vancouver, etc.). Creating a parser that could reliably extract these references without false positives was the first hurdle. I used Cermine, a Java library, to handle the heavy lifting of PDF parsing and metadata extraction.
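For context, here is roughly what that extraction step looks like. The sketch below follows CERMINE's documented ContentExtractor API (setPDF / getReferences returning BibEntry objects), but treat it as an illustration under that assumption and check it against the CERMINE version you use:

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.List;

import pl.edu.icm.cermine.ContentExtractor;
import pl.edu.icm.cermine.bibref.model.BibEntry;

public class ReferenceExtraction {
    public static void main(String[] args) throws Exception {
        ContentExtractor extractor = new ContentExtractor();

        // All parsing happens locally; the PDF never leaves the machine.
        try (InputStream pdf = new FileInputStream("manuscript.pdf")) {
            extractor.setPDF(pdf);

            // CERMINE locates the bibliography and parses each entry into
            // structured fields (authors, title, year, ...).
            List<BibEntry> references = extractor.getReferences();
            for (BibEntry reference : references) {
                System.out.println(reference.getText()); // raw citation string
            }
        }
    }
}
```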
The second challenge was the verification logic. I used fuzzy matching to decide whether a citation is close enough to a real record to be a typo, or far enough away to be a hallucination (a minimal sketch of this comparison follows the example below). Here's what the tool can detect:
Cerqueira, M.; Tavares, A.; Couto, C.; Maciel, R.; Santos, D.; Figueira, A. "Assessing software practitioners' work engagement and job satisfaction." [Example of Ghost Citation]
CERCA detects:
⚠️ Author list mismatch (6 fabricated, 9 omitted)
⚠️ Title incomplete
⚠️ First author name inconsistency
Correct paper reference:
Cerqueira, L., Nunes, L., Guerra, R., Malheiros, V., Freire, S., Carneiro, G., ... & Mendonça, M. (2025). Assessing Software Practitioners’ Work Engagement and Job Satisfaction in a Large Software Company—What We Have Learned. SN Computer Science, 6(3), 273.
🗃️ CERCA queries trusted repositories (OpenAlex, Crossref, Zenodo) and uses fuzzy matching to catch these discrepancies, saving reviewers from manually checking each citation.
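To make the "typo vs. hallucination" decision concrete, here is a minimal sketch of the idea using a hand-rolled normalized Levenshtein similarity. CERCA itself relies on JavaWuzzy for this, so the function and the 0.9/0.5 thresholds below are illustrative stand-ins, not the tool's actual settings:

```java
public class TitleSimilarity {

    // Classic dynamic-programming Levenshtein edit distance.
    static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    // Similarity in [0, 1] after lowercasing and stripping punctuation.
    static double similarity(String a, String b) {
        String x = a.toLowerCase().replaceAll("[^a-z0-9 ]", " ").trim();
        String y = b.toLowerCase().replaceAll("[^a-z0-9 ]", " ").trim();
        int max = Math.max(x.length(), y.length());
        return max == 0 ? 1.0 : 1.0 - (double) levenshtein(x, y) / max;
    }

    public static void main(String[] args) {
        // The ghost citation's title vs. the real paper's full title.
        String cited = "Assessing software practitioners' work engagement and job satisfaction.";
        String found = "Assessing Software Practitioners' Work Engagement and Job "
                + "Satisfaction in a Large Software Company - What We Have Learned";

        double score = similarity(cited, found);
        // Illustrative thresholds: very high similarity suggests a typo;
        // mid-range suggests a truncated or partly mismatched title;
        // low similarity suggests the citation may not match any real work.
        if (score > 0.9) {
            System.out.printf("OK (similarity %.2f)%n", score);
        } else if (score > 0.5) {
            System.out.printf("WARN: title incomplete or mismatched (similarity %.2f)%n", score);
        } else {
            System.out.printf("FLAG: possible ghost reference (similarity %.2f)%n", score);
        }
    }
}
```

On this pair, the score lands in the mid range, matching the "Title incomplete" warning shown above: the cited title is a truncated version of the real one rather than an exact match or a complete fabrication.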
🔍 Manual Fallback: If automatic search fails, you can right-click to search for reference titles manually.
🔐 Because reviewers handle confidential manuscripts, privacy comes first by design: PDFs are never uploaded and never leave your machine. All PDF parsing and reference extraction are performed locally.
Tech Stack
- Java + JavaFX – Cross-platform desktop application
- Cermine – PDF parsing and metadata extraction
- OpenAlex, Crossref, Zenodo APIs – Reference verification
- JavaWuzzy – Handles citation variations and typos
I chose Java with JavaFX so the desktop app runs cross-platform on Windows, Mac, and Linux.
Why Open Source?
A tool whose purpose is to restore trust must be transparent itself. Besides, this is a collective problem. By making CERCA open source, I'm inviting the community to audit the code, improve the parsers, and integrate more databases.
It is licensed under the GNU Affero General Public License (AGPL-3.0).
Who Can Use CERCA?
CERCA is useful for anyone working on scholarly or technical writing. In particular, it is intended for:
- Researchers performing final manuscript checks
- Reviewers assessing reference consistency
- Editors supporting editorial quality control
- Meta-research and reproducibility workflows
Join the Effort
Ghost references are threatening scholarly trust. CERCA is a start, but it needs your expertise.
Try it now:
📥 Download CERCA
(Windows | Mac | Linux)
CERCA does not solve the problem of ghost references, and it is not yet finished. It is a small, practical step. If it helps a researcher catch one incorrect reference, saves a reviewer time, or encourages more critical engagement with AI-generated text, then it is already serving its purpose. But you can help improve it:
- 🐛 Found an edge case?
- 💡 Have ideas?
- 🔧 Want to contribute?
👉🏾 Download the tool and explore the repository here
This project is a work in progress and an invitation to the research and developer communities to experiment, evaluate, and build better tools together.
Share your results: Did CERCA catch a ghost reference in your work? I'd love to hear about it in the comments.
