D. Ceabron Williams

Posted on Jun 21

I Scored My Own AI Tool 35/100. Here's What I Fixed.

#edtech #ai #webdev #education

The Score: 35 Out of 100

I ran sabialibrarian.com through our own evaluator — the same tool we give to school librarians evaluating AI-generated student work — and scored 35 out of 100.

That hurt.

Not just because the score was low, but because I'd been telling people for months that our evaluator was credible, well-reasoned, and built on actual librarian methodology. Then I turned it on ourselves and watched it hand back a 35.

Here's the thing about self-evaluation: it's easy to skip. It's easier to ship and say "we'll audit later." I was about to do that. Then I read the criteria breakdown and realized some things were genuinely broken — not just mediocre, but broken in ways that would confuse or mislead the librarians using the tool.

This is what I found, and what I fixed.

Why Self-Evaluating Your Own Tool Is Not Optional

If you're building a credibility tool and you don't test it against your own website, you have a credibility problem.

We built the Sabia evaluator so librarians could apply the CRAAP framework (Currency, Relevance, Authority, Accuracy, Purpose) to AI-generated content and web sources. The whole premise is: use a structured, transparent methodology to surface what's wrong with a source.

If that tool scores sabialibrarian.com at 35/100, I have to ask: what are the librarians who use this tool seeing when they evaluate their own students' work? And the answer is — whatever we shipped. Unaudited.

A tool you don't test against yourself is a tool you're not in control of.

The 35/100 Score: What Failed

Running the evaluator against sabialibrarian.com returned the following results across the five CRAAP criteria:

Criterion	Status	Reason
Currency	PASS	Site live and maintained; current domain registration
Relevance	FAIL	Content not matched to evaluator; no clear statement of who the tool is for
Authority	FAIL	No author credentials visible on the site; /about page returned 404
Accuracy	FAIL	Evaluator returning errors on the homepage; results not displaying
Purpose	PASS	Clear intent to help librarians evaluate sources

Three of the five criteria failed. Relevance and Authority failed on content; Accuracy failed because the evaluator itself was broken at the code level. Purpose passed, but only because the intent was legible even when the execution wasn't.
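The table above is easier to reason about as data. Here is a minimal sketch, assuming a hypothetical `CriterionResult` record (this post doesn't describe the evaluator's real internal types), that encodes the five results and counts the failures. It deliberately doesn't recompute the 35/100 score, because the post doesn't say how the criteria are weighted.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CriterionResult:
    """One row of a CRAAP breakdown (hypothetical shape, for illustration)."""
    criterion: str
    passed: bool
    reason: str


# The sabialibrarian.com results, summarized from the table.
results = [
    CriterionResult("Currency", True, "Site live and maintained; current domain registration"),
    CriterionResult("Relevance", False, "No clear statement of who the tool is for"),
    CriterionResult("Authority", False, "No author credentials visible; /about returned 404"),
    CriterionResult("Accuracy", False, "Evaluator errored on the homepage; no results displayed"),
    CriterionResult("Purpose", True, "Clear intent to help librarians evaluate sources"),
]

# Collect the names of the criteria that did not pass, in table order.
failed = [r.criterion for r in results if not r.passed]
print(f"{len(failed)} of {len(results)} criteria failed: {', '.join(failed)}")
```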

What Was Actually Broken

1. The evaluator was crashing on sabialibrarian.com

The scoring logic had a bug in the hero section evaluation — the code was calling a function that didn't exist in the current deployment. When a user landed on the homepage and ran an evaluation, the tool would error out before returning a result. This wasn't a low score. It was a broken tool.

2. The criteria breakdown wasn't connected

Even when evaluations ran, the per-criterion pass/fail breakdown, which is the most important part for librarian users, wasn't displaying in the results. The score showed, but not the reasoning. That's like handing someone a report card with a final average and no subject grades.

3. The /about page returned a 404

This one is embarrassing to write, but: the /about page — the single most important page for establishing who we are and why a librarian should trust us — was not actually deployed. It returned a 404. We'd been talking about the "about the founder" page for weeks. It was never on the server.

4. No methodology page existed

The evaluator asks users to trust its reasoning. But if you can't explain how it evaluates sources, the output is just an AI making claims with no accountability. We had no public methodology page — nothing showing what the CRAAP framework means in the context of the tool, what counts as a pass or fail, or how the scoring works.

What I Shipped to Fix It

Evaluator crash fixed (May 27)

Replaced the broken function call in the hero evaluation module. Added a guard clause for pages with dynamic content. Evaluations now complete without throwing errors.
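The post doesn't include the hero-module code, so the following is a hypothetical sketch of the pattern it describes: look the scoring function up before calling it, and return a structured error instead of raising when the function isn't deployed or the page content is empty. The `SCORERS` registry, `evaluate_section`, and the result dictionary are illustrative names, not the evaluator's real API.

```python
from __future__ import annotations

from typing import Callable, Optional

# Hypothetical registry of section scorers shipped in the current deployment.
SCORERS: dict[str, Callable[[str], int]] = {
    "hero": lambda html: 10 if "<h1>" in html else 0,
}


def evaluate_section(name: str, html: Optional[str]) -> dict:
    """Score one page section, returning an error result instead of raising."""
    # Guard: dynamically rendered pages may hand us empty content.
    if not html or not html.strip():
        return {"section": name, "ok": False, "error": "no content to evaluate"}
    # Guard: calling a scorer that isn't deployed is the kind of bug that crashed the tool.
    scorer = SCORERS.get(name)
    if scorer is None:
        return {"section": name, "ok": False, "error": f"no scorer for '{name}'"}
    return {"section": name, "ok": True, "score": scorer(html)}


print(evaluate_section("hero", "<h1>Sabia</h1>"))  # {'section': 'hero', 'ok': True, 'score': 10}
print(evaluate_section("hero", ""))  # {'section': 'hero', 'ok': False, 'error': 'no content to evaluate'}
```

Returning an error result keeps a single failing section from taking down the whole evaluation, so the user still gets a response they can act on.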

Criteria breakdown wired live (May 28)

The per-criterion pass/fail results now display correctly. Users see why the score is what it is — which criteria passed, which failed, and what evidence the tool found. This is the feature that should have existed at launch.
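As a rough illustration of "wired live," here is a sketch in which the score and the per-criterion reasoning are rendered together, and a bare score with no breakdown is refused outright. The `render_report` function and the tuple layout are hypothetical; the score is passed in rather than computed, since the post doesn't specify the weighting.

```python
from __future__ import annotations


def render_report(score: int, breakdown: list[tuple[str, bool, str]]) -> str:
    """Render a score together with its per-criterion pass/fail reasoning."""
    if not breakdown:
        # Refuse to show a bare score: the reasoning is the point.
        raise ValueError("breakdown is required to render a report")
    lines = [f"Score: {score}/100"]
    for criterion, passed, reason in breakdown:
        status = "PASS" if passed else "FAIL"
        lines.append(f"  {criterion:<10} {status}  {reason}")
    return "\n".join(lines)


report = render_report(35, [
    ("Currency", True, "Site live and maintained"),
    ("Authority", False, "/about page returned 404"),
])
print(report)
```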

/about page deployed (May 28)

The /about page is now live at sabialibrarian.com/about. It includes my background as a public and academic librarian, my M.L.I.S. credentials, and the specific experience that grounds the evaluator's methodology. Librarians evaluating the tool should be able to evaluate us too.
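A 404 on /about is the kind of failure a small post-deploy check can catch before a reader does. This hypothetical sketch takes the status lookup as a parameter so it runs here against a stub; the page list and function names are assumptions, not part of the Sabia codebase.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical list of pages that must exist after every deploy.
REQUIRED_PAGES = ("/", "/about", "/methodology")


def missing_pages(base_url: str, get_status: Callable[[str], int],
                  pages: tuple[str, ...] = REQUIRED_PAGES) -> list[str]:
    """Return the required pages that don't answer with HTTP 200."""
    missing = []
    for path in pages:
        url = base_url.rstrip("/") + path
        if get_status(url) != 200:
            missing.append(path)
    return missing


# Stub standing in for a real HTTP client: /about is not deployed.
deployed = {
    "https://sabialibrarian.com/": 200,
    "https://sabialibrarian.com/methodology": 200,
}
result = missing_pages("https://sabialibrarian.com", lambda url: deployed.get(url, 404))
print(result)  # ['/about']
```

In a real pipeline, `get_status` could wrap `urllib.request.urlopen`, reading the status from the response, or from `urllib.error.HTTPError.code` when the server returns an error.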

/methodology page added (May 29)

A new methodology page explains exactly how the evaluator applies the CRAAP framework: what each criterion measures, what counts as a pass or fail, and what the score means in practice. Transparent methodology is not optional for a tool that claims to evaluate credibility.

CRAAP reference sheet (May 29)

A print-ready CRAAP reference PDF is available on the /resources page. This gives librarians something to use with students even if they're not using the full tool.

Honest Limitations That Remain

The 35/100 score came from the evaluator — not from me. I can explain what it found, but I need to be straight about what it doesn't cover yet:

No real-time web crawling: The evaluator analyzes content signals and structural patterns but doesn't do live page verification. Citations and domain registration are assessed from what the evaluator sees during the run; they aren't verified against live external records.
Multilingual evaluation is partial: The tool works in English, Spanish, and Portuguese — but not equally well across all content types.
No confidence calibration for edge cases: Sources that are borderline pass/fail still get scored without indicating uncertainty. A "38/100" score doesn't communicate that the evaluator was close to a different call on the Authority criterion.
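One way to surface that uncertainty, sketched under assumptions: if each criterion had an internal numeric score compared against a pass threshold, the evaluator could mark any criterion whose score lands within a small margin of that threshold. The 0 to 1 scale, the 0.5 threshold, and the 0.05 margin are hypothetical values for illustration; the post doesn't describe how the evaluator decides pass or fail internally.

```python
def classify(score: float, threshold: float = 0.5, margin: float = 0.05) -> str:
    """Return PASS or FAIL, marked borderline when the score is near the threshold."""
    verdict = "PASS" if score >= threshold else "FAIL"
    # Close calls get flagged so the reader knows the verdict could have flipped.
    if abs(score - threshold) < margin:
        return f"{verdict} (borderline)"
    return verdict


# Hypothetical internal scores for two criteria.
for name, s in {"Authority": 0.48, "Purpose": 0.9}.items():
    print(name, classify(s))
```

A flag like this wouldn't change the score itself; it would only tell the librarian which calls deserve a second look.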

These are not excuses. They're known gaps, and they point to the next round of improvements.

Try It on Your Own Site

If you're a school librarian — or anyone building an AI literacy tool — run your own site through the evaluator at sabialibrarian.com.

If it scores low, that's useful information. Not all low scores mean "bad." Sometimes they mean "unfinished." I'd rather know which one I'm looking at than guess.

D. Ceabron Williams, M.L.I.S., is a retired public and academic librarian. The Sabia Librarian evaluator is built on the CRAAP framework and designed for school librarians evaluating AI-generated content and web sources. Try it at sabialibrarian.com.

DEV Community