arXiv Bans Papers With Hallucinated LLM References for One Year


arXiv changed the rules for how you can use a language model in a research paper. The preprint server now imposes a one-year submission ban when a paper contains incontrovertible evidence of unchecked LLM output — most commonly hallucinated citations or fabricated results that no one verified before the paper went live.

The policy doesn't ban LLM-assisted writing. It punishes laziness. If you ran a draft through a model, accepted its made-up reference list, and submitted without checking, you're now blocked from posting any preprint for twelve months. That's a real cost, especially for grad students and early-career researchers who use arXiv as a timestamp for priority claims.

What the policy actually targets

The trigger isn't AI use. It's verifiable error left in the manuscript. Three patterns get papers flagged:

Hallucinated citations — references that don't exist, or that exist but say something different from what the paper claims they say. The most common failure mode for ChatGPT, Claude, and Gemini when asked for sources.
Fabricated experimental results — numbers in tables that don't appear in any code or dataset the authors can produce, figures generated to illustrate a story rather than describe data.
Phantom prior work — claims about what a competing paper does or doesn't show, where the cited paper does no such thing.

You don't get banned for clean LLM-assisted prose. You get banned when a moderator can open a citation, see it doesn't exist, and conclude the author never opened it either.

The ban applies per author, not per paper. If you co-authored a flagged submission, your name carries the suspension across other manuscripts you'd otherwise post during that year. Verify what your co-authors contribute before anyone submits, because your account is on the line too.

Why hallucinated citations slipped past so many drafts

The reason this is a policy and not just a guideline is volume. Reviewers and moderators have been reporting flagged submissions where the bibliography has the right shape — plausible journal names, real-looking DOIs, author lists that include genuine researchers — but the specific paper doesn't exist. The format is correct because the model learned what citations look like. The content is wrong because the model has no retrieval guarantee for the specific work cited.

When you paste a related-work section into a chat model and ask it to "add citations," you get strings that look like references. They are not references. The model is producing a sequence of tokens that match the statistical pattern of a bibliography. Some of those will be real. Some will be combinations of real authors with real-sounding titles attached to real journals — and they won't exist anywhere.

Three habits make this worse:

Copying entries from the model's bibliography into your reference manager without DOI resolution. If your reference manager can't find the DOI, the paper probably doesn't exist.
Trusting "I'll check it later" for citation accuracy. Later is submission day. Submission day is when you ship the hallucination.
Skipping the "open the PDF" step for every cited claim. If you can't point to the paragraph in the cited paper that supports your claim, you can't defend the citation in review.

A verification workflow that actually works

The fix isn't a single tool. It's a workflow that closes the loop between every claim in your draft and a verifiable source. Here's what catches hallucinations before submission:

Step 1: resolve every citation by DOI. Run your reference list through Crossref or a reference manager that resolves DOIs. Any citation that doesn't resolve is suspect. If you can't find it on any of Google Scholar, Semantic Scholar, or Crossref, treat it as hallucinated until proven otherwise.
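
As a rough sketch of this step, the script below looks up a DOI through the public Crossref REST API (`https://api.crossref.org/works/<doi>`) and compares the registered title with the title in your bibliography. The function names and the 0.9 similarity threshold are illustrative assumptions, not part of any arXiv tooling.

```python
import json
import re
import urllib.error
import urllib.parse
import urllib.request
from difflib import SequenceMatcher
from typing import Optional

CROSSREF_WORKS = "https://api.crossref.org/works/"  # public Crossref REST endpoint


def normalize_title(title: str) -> str:
    """Lowercase and collapse punctuation/whitespace so titles compare fairly."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def titles_match(cited: str, registered: str, threshold: float = 0.9) -> bool:
    """True when two titles are near-identical after normalization.

    The 0.9 threshold is an arbitrary starting point, not a calibrated value.
    """
    ratio = SequenceMatcher(
        None, normalize_title(cited), normalize_title(registered)
    ).ratio()
    return ratio >= threshold


def fetch_crossref_title(doi: str, timeout: float = 10.0) -> Optional[str]:
    """Return the title Crossref has registered for `doi`, or None if unknown."""
    url = CROSSREF_WORKS + urllib.parse.quote(doi, safe="/")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            message = json.load(response)["message"]
    except urllib.error.HTTPError as err:
        if err.code == 404:  # Crossref has no record of this DOI
            return None
        raise
    titles = message.get("title") or []
    return titles[0] if titles else None


def check_citation(doi: str, cited_title: str) -> str:
    """Classify one entry as 'ok', 'title-mismatch', or 'unresolved'."""
    registered = fetch_crossref_title(doi)
    if registered is None:
        return "unresolved"
    return "ok" if titles_match(cited_title, registered) else "title-mismatch"
```

Call `check_citation(doi, title)` for each entry in your reference list. An "unresolved" result only means Crossref has no record: DOIs registered elsewhere (arXiv's own DOIs go through DataCite, for example) need a separate lookup, so treat the result as a prompt to check by hand rather than a verdict.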

Step 2 — for every cited claim, link to the supporting passage. Use a research workspace that lets you attach annotated PDFs to each claim. Notion, Obsidian, and Zotero with annotation plugins all work — the point is the discipline, not the tool. If a cited passage doesn't exist in the source, that's the citation to delete.
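
If you prefer a plain file over a workspace tool, the same discipline can be sketched as a small data structure: each claim carries the passage that backs it, and anything without a passage gets listed. The class, field names, and sample entries below are hypothetical, for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CitedClaim:
    """One sentence in the draft and the evidence that backs it."""
    claim: str                    # the sentence as written in your draft
    cite_key: str                 # bibliography key, e.g. a BibTeX key
    quote: Optional[str] = None   # passage copied from the cited PDF
    page: Optional[int] = None    # page where the passage appears


def missing_evidence(claims: List[CitedClaim]) -> List[CitedClaim]:
    """Return claims that have no quoted passage or no page attached yet."""
    return [
        c for c in claims
        if not (c.quote and c.quote.strip()) or c.page is None
    ]


# Hypothetical entries, for illustration only.
draft = [
    CitedClaim("Method A needs labeled data.", "smith2020",
               quote="requires labels", page=4),
    CitedClaim("Method B scales linearly.", "lee2021"),
]

for c in missing_evidence(draft):
    print(f"No supporting passage recorded for [{c.cite_key}]: {c.claim}")
```

With the sample data, only the `lee2021` claim is reported, because it has neither a quote nor a page. The check confirms that you recorded a passage, not that the passage says what you claim; that part still requires reading it.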

Step 3 — run a separate model pass that questions citations rather than generates them. Feed your bibliography and your claims into a second model and ask: "for each citation, what evidence in the cited paper supports the claim?" If the model can't answer, the citation is probably wrong, or your claim is overstated.
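
One way to set up that questioning pass is to build the prompt programmatically, so every claim is paired with its reference and the model is told not to invent new ones. This is a minimal sketch: the input shapes and the `build_audit_prompt` name are assumptions, and no particular model API is implied.

```python
from typing import Dict, List, Tuple

QUESTION = "For each citation, what evidence in the cited paper supports the claim?"


def build_audit_prompt(
    claims: List[Tuple[str, str]], bibliography: Dict[str, str]
) -> str:
    """Assemble a prompt that asks a model to question citations, not create them.

    `claims` pairs each draft sentence with a citation key; `bibliography`
    maps keys to full reference strings. Both shapes are assumptions.
    """
    lines = [
        QUESTION,
        "Do not suggest new references. "
        "If you cannot point to evidence, answer 'cannot verify'.",
        "",
    ]
    for number, (sentence, key) in enumerate(claims, start=1):
        reference = bibliography.get(key, "MISSING FROM BIBLIOGRAPHY")
        lines.append(f"{number}. Claim: {sentence}")
        lines.append(f"   Citation [{key}]: {reference}")
    return "\n".join(lines)
```

Send the returned string to whichever model you use for the second pass. The answers are leads, not proof: a model can invent supporting evidence as easily as it invents references, so confirm each answer against the PDF.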

Step 4 — diff your bibliography against your own pre-LLM search. If you searched for related work yourself before the model helped you write, compare what you found to what's in the final bibliography. Citations that appeared only after the LLM touched the section get extra scrutiny.
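
If you keep both versions of the bibliography as BibTeX, the diff can be sketched as a set difference over citation keys. The function names are hypothetical, and the regex is a loose approximation of BibTeX syntax rather than a full parser.

```python
import re
from typing import List, Set

# Matches the type and key in entries such as "@article{smith2020,".
ENTRY = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
NON_ENTRIES = {"comment", "string", "preamble"}  # BibTeX blocks that are not references


def bibtex_keys(bibtex_source: str) -> Set[str]:
    """Collect the citation keys that appear in a BibTeX string."""
    return {
        key
        for kind, key in ENTRY.findall(bibtex_source)
        if kind.lower() not in NON_ENTRIES
    }


def added_after_llm(before: str, after: str) -> List[str]:
    """Keys present in the final bibliography but absent from your own search."""
    return sorted(bibtex_keys(after) - bibtex_keys(before))
```

Pass the contents of your pre-LLM `.bib` file as `before` and the final one as `after`; every key returned is a citation to verify by hand. Comparing keys is fragile if entries were renamed along the way, so comparing DOIs or normalized titles may be more robust.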

The fastest tell for a hallucinated citation: the DOI doesn't resolve. Before you do anything else, paste every DOI in your bibliography into doi.org and confirm each one redirects to the paper you cited, not just to some real paper. This single check catches the majority of LLM-generated reference errors.
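
For a long bibliography, pasting DOIs one at a time gets tedious. The sketch below pulls DOI-looking strings out of a reference list and asks doi.org about each one without following the redirect. It assumes doi.org answers a known DOI with a 3xx redirect and an unknown one with 404; the regex and function names are illustrative.

```python
import http.client
import re
import urllib.parse
from typing import List

# Loose DOI pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>{}]+')


def extract_dois(text: str) -> List[str]:
    """Pull DOI-looking strings out of a bibliography, in order, without duplicates."""
    seen = {}
    for match in DOI_PATTERN.findall(text):
        doi = match.rstrip(".,;)")  # trailing punctuation is rarely part of a DOI
        seen.setdefault(doi.lower(), doi)  # DOIs are case-insensitive
    return list(seen.values())


def doi_resolves(doi: str, timeout: float = 10.0) -> bool:
    """Ask doi.org about one DOI without following the redirect.

    Assumes a known DOI gets a 3xx response and an unknown one gets 404;
    any other status is treated as "not confirmed".
    """
    connection = http.client.HTTPSConnection("doi.org", timeout=timeout)
    try:
        connection.request("HEAD", "/" + urllib.parse.quote(doi, safe="/"))
        status = connection.getresponse().status
    finally:
        connection.close()
    return 300 <= status < 400
```

Run `extract_dois` over your `.bib` or reference text, then call `doi_resolves` on each result and review anything that returns `False`. A `True` only shows that the DOI exists, so you still need to open the target and confirm it is the paper named in your citation.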

What this means for AI-assisted research writing

The policy shifts the accountability bar in a productive direction. You can still use models to draft, summarize, restructure, and edit. You cannot use them to produce references you haven't read or numbers you haven't computed. That distinction is straightforward to honor, and most researchers were already on the right side of it.

The hard cases are subtler: claims about what a cited paper "shows" that drift from what the paper actually argues, paraphrased findings that flip a sign, or summaries of a method that omit the constraint that makes the comparison meaningful. Those errors don't always trigger the ban, because they're invisible to automated checks. But they're the failures the policy is gesturing at. Treat citation hygiene as a first-class part of your writing workflow, not an end-stage chore.

Originally published at pickuma.com.