The problem isn't AI. It's us.
Every day, developers ask ChatGPT, Claude, and Perplexity for code samples, architecture patterns, and technical explanations. We copy the answer. We ship it. We move on.
But here's what we don't ask: Where did that come from?
AI generates answers that sound authoritative—fluent, confident, well-structured. It does not tell you where the information originated. And when you ask for citations, it confidently generates ones that don't exist.
This isn't a bug in AI. It's a feature of how language models work. They predict the next most likely word based on patterns in training data. When they don't have a fact, they guess. And they guess convincingly: MIT research in 2025 found models are 34% more confident when lying than when telling the truth.
The stakes are real.
In 2025, Deloitte submitted a $440,000 report to the Australian government—complete with fabricated academic sources. In November 2025, a $1.6 million health plan for Newfoundland & Labrador was discovered to contain at least four citations to non-existent research papers. In September 2025, a lawyer in San Francisco was sanctioned by a federal judge for submitting AI-hallucinated case citations to the court.
Over 700 legal cases in 2025 alone involved hallucinated AI-generated content. In academic publishing, NeurIPS 2025 accepted 4,841 papers, and GPTZero identified more than 100 hallucinated citations across 53 of them, despite rigorous peer review.
For developers: A hallucination in your architecture recommendation doesn't get you sued. But it does get copied into production, into tutorials, into the next person's codebase. The technical debt compounds.
You already know how to solve this. You just don't realize it yet.
Librarians have been evaluating sources for centuries. Long before Google, before citation indexes, before the internet itself, they built frameworks to determine: Is this source trustworthy? Where did this come from? Who benefits if I believe it?
These frameworks are still the gold standard for information evaluation. And they work perfectly for AI-generated content.
The Librarian's Framework: CRAAP
The most widely taught evaluation method in libraries is CRAAP:
- Currency — When was this published or last updated? Is it current for my use?
- Relevance — Does it actually address my question?
- Authority — Who created this? What are their credentials?
- Accuracy — Can I verify the claims? Are there citations? Can I cross-check them?
- Purpose — Why does this exist? Who benefits from me believing it?
When you ask AI for a code sample, you're asking it to be a source. Apply CRAAP:
Currency? AI training data has a knowledge cutoff. If you ask ChatGPT about a library update from last month, you're asking it to guess.
Relevance? AI often answers the question you asked, not the question you need answered. It optimizes for plausibility, not precision.
Authority? An AI has no credentials, no affiliation, no reputation on the line. It's predicting words. When authority matters—cryptographic best practices, HIPAA compliance, security-critical algorithms—you need a source that can be wrong and suffer consequences.
Accuracy? A Columbia Journalism Review analysis found that when asked to identify the original source of news excerpts, ChatGPT hallucinated citations 67% of the time; Grok-3, 94% of the time.
Purpose? AI has no purpose beyond the next token. It's not trying to help you or mislead you. It's generating statistically likely text. That neutrality doesn't make it reliable.
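If it helps to make the checklist concrete, here's a minimal sketch of CRAAP as a structured record you could attach to any AI-assisted decision; the type names and example are my own framing, not part of the original framework:

```typescript
// CRAAP as a structured checklist for AI-assisted decisions.
// Type names and the example record are illustrative, not canonical.

type CheckResult = "pass" | "fail" | "unverified";

interface CraapEvaluation {
  source: string;         // URL, paper title, or "AI answer, no source given"
  currency: CheckResult;  // current enough for this use?
  relevance: CheckResult; // answers the actual question?
  authority: CheckResult; // identifiable, accountable author?
  accuracy: CheckResult;  // claims independently verified?
  purpose: CheckResult;   // free of agenda or training-data skew?
  notes: string;
}

const evaluation: CraapEvaluation = {
  source: "AI-suggested caching strategy, no citation provided",
  currency: "unverified",
  relevance: "pass",
  authority: "fail", // no accountable author stands behind the claim
  accuracy: "unverified",
  purpose: "unverified",
  notes: "Cross-check against the official docs before shipping.",
};
```

Anything still marked "unverified" on a high-stakes change is a signal to keep checking, not to ship.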
Three Real Hallucinations (and What They Cost)
Example 1: The Fabricated Legal Citation
In September 2025, attorney Katherine Cervantes submitted a brief to U.S. District Court citing a case that was completely invented. The judge sanctioned her—and later sanctioned her supervising partner for insufficient oversight of AI use.
For developers: If AI recommends a library, verify it exists on npm. Run npm view <library>. Check GitHub. Look at the commit history. A hallucinated library recommendation won't get you sued, but it will get copy-pasted.
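If you want that check in script form, here's a minimal sketch against the public npm registry, which returns a 404 for names that were never published (the file name and the tsx runner are just assumptions about your setup):

```typescript
// verify-package.ts -- check whether an AI-recommended npm package exists.
// Uses the public npm registry (registry.npmjs.org); needs Node 18+ for global fetch.

async function packageExists(name: string): Promise<boolean> {
  // encodeURIComponent also percent-encodes the slash in scoped names like @scope/pkg
  const res = await fetch(`https://registry.npmjs.org/${encodeURIComponent(name)}`);
  return res.ok; // 200 = published package, 404 = likely hallucinated
}

const name = process.argv[2];
if (!name) {
  console.error("usage: npx tsx verify-package.ts <package-name>");
  process.exit(1);
}

packageExists(name).then((exists) => {
  console.log(
    exists ? `"${name}" exists on npm` : `"${name}" not found -- verify before trusting`
  );
});
```

Existence is the floor, not the ceiling: a package that exists can still be abandoned or typosquatted, so check the commit history too, as above.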
Example 2: The Government Report with Fake Sources
Deloitte's 2025 report to the Australian government included several invented academic references. The $440,000 contract is now under review, and every other AI-generated deliverable now faces scrutiny.
For developers: If you use AI to write documentation, architecture decisions, or threat models—verify every external claim. Don't assume the AI knows the difference between "standard practice" and "thing I hallucinated."
Example 3: The Predatory Journal Flooded with AI Hallucinations
In 2025–2026, lower-tier academic journals published hundreds of papers with AI-generated citations and fabricated data summaries. Many passed peer review. Why? Reviewers didn't have tools to detect hallucinations at scale.
For developers: Your code reviews catch logic errors. You need a different check for AI-generated components: Does every external claim have a verifiable source?
How to Evaluate AI Sources: The Practical Workflow
Step 1: Assume it's wrong until proven right.
When AI gives you an answer, don't ask "Does this look right?" Ask "Can I verify this independently?" Hallucinations look right. They're fluent, confident, well-structured. Your job is to override that instinct.
Step 2: Check the citation (the ACCURACY check).
If AI provides a source, verify it exists (a scriptable version of this check follows the list):
- Copy the exact claim into Google Scholar
- Search for the exact paper title
- If it doesn't exist, it's hallucinated
- If it exists but says something different, it's misattributed
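Google Scholar has no official API, but the public Crossref API indexes most DOI-registered papers, which makes the existence check scriptable. A minimal sketch, with an arbitrary real paper title as the example query:

```typescript
// Cross-check a citation's title against the public Crossref API.
// No close match in Crossref (or in Google Scholar) is a strong hallucination signal.

interface CrossrefWork {
  title?: string[];
  DOI: string;
}

async function findPaper(title: string): Promise<CrossrefWork[]> {
  const url =
    `https://api.crossref.org/works?query.bibliographic=${encodeURIComponent(title)}&rows=3`;
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Crossref request failed: ${res.status}`);
  const body = await res.json();
  return body.message.items as CrossrefWork[];
}

findPaper("Attention Is All You Need").then((items) => {
  if (items.length === 0) {
    console.log("No match found: treat the citation as suspect.");
    return;
  }
  for (const item of items) {
    console.log(`${item.title?.[0] ?? "(untitled)"} -- https://doi.org/${item.DOI}`);
  }
});
```

Crossref returns the closest matches, not exact ones, so still confirm that the title, authors, and venue line up with what the AI claimed.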
A study tested eight AI assistants on identifying original news sources:
- Perplexity: 37% hallucination rate
- ChatGPT: 67% hallucination rate
- Grok-3: 94% hallucination rate
None expressed uncertainty despite being wrong most of the time.
Step 3: Use lateral reading (the AUTHORITY check).
Open a new browser tab and search for the topic independently. Cross-reference multiple sources. When you read three independent sources, any disagreement is impossible to miss.
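One lateral source you can query programmatically is Wikipedia. This sketch uses the public MediaWiki search API as a single independent cross-check, not a replacement for reading several sources:

```typescript
// Pull independent search results from the public MediaWiki API
// to read laterally against an AI's answer.

interface WikiHit {
  title: string;
  snippet: string; // contains HTML highlight tags
}

async function wikiSearch(topic: string): Promise<WikiHit[]> {
  const url =
    "https://en.wikipedia.org/w/api.php?action=query&list=search&format=json" +
    `&srlimit=3&srsearch=${encodeURIComponent(topic)}`;
  const res = await fetch(url);
  const body = await res.json();
  return body.query.search as WikiHit[];
}

wikiSearch("CRAAP test source evaluation").then((hits) => {
  for (const hit of hits) {
    console.log(`${hit.title}: ${hit.snippet.replace(/<[^>]+>/g, "")}`);
  }
});
```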
Step 4: Check the purpose (the PURPOSE check).
Ask: where in its training data did this information likely come from, and what assumptions are baked into that data? If the AI recommends a popular framework, check whether that's because it's genuinely better or simply because it's more common in the training data.
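You can at least quantify the "more common in training data" suspicion for package recommendations with npm's public downloads API; the package names below are placeholders for the AI's pick and an alternative it didn't mention:

```typescript
// Compare monthly download counts via npm's public downloads API.
// A large popularity gap suggests the recommendation may reflect
// training-data frequency rather than technical fit.

async function monthlyDownloads(pkg: string): Promise<number> {
  const res = await fetch(`https://api.npmjs.org/downloads/point/last-month/${pkg}`);
  if (!res.ok) throw new Error(`No download data for "${pkg}"`);
  const body = await res.json();
  return body.downloads as number;
}

async function compare(recommended: string, alternative: string): Promise<void> {
  const [a, b] = await Promise.all([
    monthlyDownloads(recommended),
    monthlyDownloads(alternative),
  ]);
  console.log(`${recommended}: ${a.toLocaleString()} downloads/month`);
  console.log(`${alternative}: ${b.toLocaleString()} downloads/month`);
}

// Placeholder names: swap in the AI's suggestion and the option it ignored.
compare("express", "fastify");
```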
Step 5: Verify currency (the CURRENCY check).
Always ask the AI what its knowledge cutoff date is. Then treat anything it says about the last 3–6 months as unreliable, and anything after the cutoff as pure guesswork.
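The developer version of the currency check is to ask the registry instead of the model. The npm package document includes a `time` map whose `modified` field records the most recent publish:

```typescript
// Check how current an AI-recommended package actually is.
// The npm registry's package document includes `time.modified`,
// the timestamp of the most recent publish event.

async function lastPublished(name: string): Promise<Date> {
  const res = await fetch(`https://registry.npmjs.org/${encodeURIComponent(name)}`);
  if (!res.ok) throw new Error(`"${name}" not found on npm`);
  const body = await res.json();
  return new Date(body.time.modified);
}

lastPublished("react").then((date) => {
  const days = Math.round((Date.now() - date.getTime()) / 86_400_000);
  console.log(`Last publish: ${date.toISOString().slice(0, 10)} (${days} days ago)`);
  // If releases postdate the model's cutoff, its advice may describe an older API.
});
```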
Where AI Actually Fails (and When to Trust It More)
It fails on:
- Recent events or updates (knowledge cutoff)
- Citations and attribution (fabrication by design)
- Niche or specialized domains (sparse training data)
- Things that only exist in paywalled sources
You can trust it more on:
- Writing and editing (LLMs are good at language)
- Brainstorming and ideation (generating options, not facts)
- Summarization of content you provide
- Refactoring and code style
- Explaining concepts you already partially understand
The difference: Generative tasks are safer than retrieval tasks. Generate code from your spec. Don't retrieve "best practices" without verification.
The Tool That Does This Automatically
A librarian evaluates a source by looking at who created it, when, where, and for what purpose. They spot inconsistencies. They verify citations. They integrate multiple signals into a judgment call.
That's what Sabia does.
Sabia evaluates any URL in 30–60 seconds using librarian-grade criteria:
- Authorship: Who wrote this? What are their credentials?
- Publication: Where did this come from? Is it peer-reviewed, editorial, self-published?
- Currency: When was it published?
- Accuracy: Are claims supported by citations?
- Objectivity: Does the source have a clear bias or agenda?
Feed Sabia a URL that an AI recommended—a tutorial, a research paper, a documentation page—and it tells you: Is this trustworthy? Who should trust this? What's the catch? Can I cite this?
It's what a librarian would do in real time. Except Sabia works while you code.
Why This Matters Beyond Not Getting Sued
A hallucinated architecture recommendation gets copied into production. The next developer inherits it. They don't know it came from an AI, so they treat it as established practice. Months later, when performance degrades or security issues arise, the investigation starts with "This is how we've always done it."
You wouldn't ship code without code review. Don't ship AI-generated information without information review.
The Framework You Already Have
You know how to do this. You do it every day in code review:
- Authority: Does this PR come from someone who understands the system?
- Accuracy: Are the changes correct?
- Currency: Is this solution current, or are we using an outdated pattern?
- Relevance: Does this solve the actual problem?
- Purpose: What's the intent here? Is there a hidden cost?
These are the exact questions a librarian asks about sources. Apply that same rigor to AI-generated sources. It's not a new skill—it's a skill you already have, applied to a new problem.
Start Here
- Next time you ask AI a question: Screenshot the answer and the source.
- Verify one claim: Use Google Scholar. Does the cited paper exist?
- Cross-check laterally: Search for the topic independently.
- Keep a scorecard: How often does AI get this right? (A minimal tracking sketch follows this list.)
- Use Sabia for high-stakes sources: Try it at sabialibrarian.com.
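The scorecard is easy to make concrete; here's a minimal sketch, where the logged entries are hypothetical examples:

```typescript
// Track verification outcomes to see how often AI answers actually check out.

interface ScoreEntry {
  claim: string;
  verified: boolean;
}

const scorecard: ScoreEntry[] = [];

function record(claim: string, verified: boolean): void {
  scorecard.push({ claim, verified });
  const hits = scorecard.filter((e) => e.verified).length;
  console.log(`AI accuracy so far: ${hits}/${scorecard.length}`);
}

// Hypothetical entries for illustration:
record("recommended package `fast-json-cache` exists on npm", false);
record("RFC 7519 defines the registered JWT claim names", true);
```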
Information literacy in the age of AI isn't about distrusting AI. It's about trusting yourself to be the filter AI can't be.