If you've ever studied for a CompTIA exam with free practice questions online, you already know the problem: most "practice tests" are recycled exam dumps. The same leaked pool, copy-pasted across a dozen sites, with a worrying number of "correct" answers that are subtly wrong and no indication of where any of it came from.
For a student, that's worse than useless. You can memorize a wrong answer with total confidence and walk into the exam having actively trained the mistake.
I wanted to build a practice-test engine that made the opposite promise to students: every answer you see can be traced back to a primary source. Not "I checked most of them." Not "a reviewer will probably catch the bad ones." A hard guarantee, enforced by code. This post is how I turned that promise into an actual pipeline — and what it cost.
The guarantee, stated as a rule
The whole engine is built around one deliberately unreasonable constraint:
If a question's correct answer can't be traced to a primary source, the pipeline isn't allowed to publish it.
That single rule is what separates a study tool a student can trust from a dump that quietly teaches them wrong. Everything below is the machinery that makes the rule true by default instead of by good intentions.
Correctness shouldn't be a vibe
The usual way AI-generated quiz content gets made is: prompt a model, get 50 questions, skim them, ship. The correctness of any given question is a judgment someone makes (or forgets to make) at review time. That doesn't scale, and it isn't reproducible — the same question might pass on Monday and fail on Tuesday depending on who's looking.
I wanted correctness to be a property of the engine, not a per-question opinion. You don't "review" whether your tests pass; they go green or they don't. In the same way, a question either carries proof or gets killed automatically.
That proof is what I call a source receipt.
The receipt
Every question that survives generation has to carry a verbatim excerpt from the primary source material that justifies its correct answer. Not a vibe, not "the model said so" — an actual quoted span, stored on the question object, that a human or a script can check.
A question ends up looking roughly like this:
```json
{
  "id": "netplus-1-0012",
  "objective": "1.2",
  "stem": "Which transport protocol establishes a session before data transfer?",
  "answer": "TCP",
  "evidence": {
    "excerpt": "...connection-oriented transport establishes a session prior to exchange, in contrast to connectionless transport...",
    "source": "official exam objectives, domain 1.2"
  }
}
```
That evidence block is the whole game. It's a durable, stored receipt — it travels with the question forever, so a student (or I) can audit correctness months later, not just at the moment of generation.
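One way a script could audit a receipt is to confirm that the stored excerpt really occurs in the source text. The sketch below is a minimal illustration, not the engine's actual code: it assumes the source material is available as a plain string, treats `...` in an excerpt as an elision between quoted fragments, and ignores case and line wrapping. The names `normalize` and `receiptIsVerbatim` are hypothetical, and the source string in the usage lines is a made-up stand-in.

```javascript
// Hypothetical helper: collapse whitespace and ignore case so that line
// wrapping in the source does not cause false misses.
function normalize(text) {
  return text.replace(/\s+/g, " ").trim().toLowerCase();
}

// Returns true only if every fragment of the excerpt appears, in order,
// in the source text. "..." (or "…") separates fragments.
function receiptIsVerbatim(question, sourceText) {
  const excerpt = question.evidence && question.evidence.excerpt;
  if (typeof excerpt !== "string") return false;

  const fragments = excerpt
    .split(/\.{3}|…/)
    .map(normalize)
    .filter((fragment) => fragment.length > 0);
  if (fragments.length === 0) return false; // an empty receipt proves nothing

  const haystack = normalize(sourceText);
  let cursor = 0;
  for (const fragment of fragments) {
    const found = haystack.indexOf(fragment, cursor);
    if (found === -1) return false;
    cursor = found + fragment.length;
  }
  return true;
}

// Usage with an invented source string (not real objective text).
const sampleSource =
  "Connection-oriented transport establishes a session prior to exchange,\n" +
  "in contrast to connectionless transport, which does not.";
const sampleQuestion = {
  id: "netplus-1-0012",
  evidence: {
    excerpt:
      "...connection-oriented transport establishes a session prior to exchange, in contrast to connectionless transport...",
  },
};
console.log(receiptIsVerbatim(sampleQuestion, sampleSource)); // true
```

A check like this only proves the quote exists. Whether the quote actually supports the marked answer is a separate question that string matching cannot settle.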
There's a hard rule layered on top that matters both pedagogically and legally: approximate and explain, never reproduce. The receipt grounds the question in the real objective, but the question and explanation are written fresh. I'm not republishing a copyrighted exam pool — the engine generates original questions that are provably aligned to public objectives. That distinction is the entire reason a student can use this without studying leaked material.
The gates
Generation is the easy part. The interesting engineering is everything that tries to stop a bad question from ever reaching a student.
- The adversarial verifier. After a question is drafted, a second pass plays prosecutor instead of author. Its only job is to attack: does the stored excerpt actually support the marked answer? Is there a more correct option? Is the excerpt being stretched to cover a claim it doesn't make? The generator wants to ship; the verifier wants to reject. Correctness lives in the gap between them.
- check-mocks.mjs. A script that validates structural integrity across the whole bank: every question has a receipt, every receipt is non-empty, every answer maps to a real option, and there are no orphaned references. It runs in CI and fails the build if anything's off.
- The blueprint-sum gate. CompTIA publishes domain weightings (e.g. domain 1 is X% of the exam). This gate checks that the generated distribution actually matches the official blueprint, so a student's practice set mirrors the real exam shape instead of over-indexing on whatever was easy to generate.
- The NO-GO gate — the part I'm most proud of. When verification confidence drops below threshold, the question is cut. I deliberately tuned this gate to be trigger-happy: it currently runs at roughly a 24% false-cut rate, meaning about a quarter of the questions it kills were probably fine. That sounds like a bug. It's the most important design decision in the system. Shipping a confidently-wrong answer to a student costs them real money and a failed exam. Dropping a good question costs me nothing but a little generation budget. The failure modes are wildly asymmetric, so I tuned the gate toward the cheap failure. For a student-facing tool, over-cutting is a feature.
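The asymmetry argument behind the NO-GO gate can be written down as a small sketch. Everything here is an assumption for illustration: the cost numbers are arbitrary units, `confidence` is taken to be a calibrated probability in [0, 1] that the marked answer is correct, and `thresholdFromCosts` and `noGoGate` are hypothetical names rather than the engine's real functions.

```javascript
// Hypothetical costs in arbitrary units: shipping a wrong answer is assumed
// to be far more expensive than discarding a good question.
const COST_WRONG_SHIPPED = 9;
const COST_GOOD_DROPPED = 1;

// Shipping is worth it when its expected cost is below the cost of cutting:
//   (1 - c) * costWrong < c * costDropped
// which rearranges to c > costWrong / (costWrong + costDropped).
function thresholdFromCosts(costWrong, costDropped) {
  return costWrong / (costWrong + costDropped);
}

// Splits candidates into shipped and cut. Anything below the threshold is
// cut, and so is anything without a usable confidence (fail closed).
function noGoGate(candidates, threshold) {
  const shipped = [];
  const cut = [];
  for (const candidate of candidates) {
    const c = candidate.confidence;
    const passes = Number.isFinite(c) && c >= threshold;
    (passes ? shipped : cut).push(candidate);
  }
  return { shipped, cut };
}

const threshold = thresholdFromCosts(COST_WRONG_SHIPPED, COST_GOOD_DROPPED);
const result = noGoGate(
  [
    { id: "q1", confidence: 0.97 },
    { id: "q2", confidence: 0.62 },
    { id: "q3" }, // no verifier verdict recorded
  ],
  threshold
);
console.log(result.shipped.map((q) => q.id)); // [ 'q1' ]
console.log(result.cut.map((q) => q.id)); // [ 'q2', 'q3' ]
```

The more lopsided the costs, the closer the threshold moves to 1 and the more good questions get cut. Note that the share of candidates cut in a run (45 of 187 in the sample output, about 24%) is a different quantity from the false-cut rate; estimating the latter means hand-auditing a sample of cut questions and counting how many were actually fine.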
```text
$ node check-mocks.mjs
scanned: 187 candidates
verified: 142
NO-GO: 45 (confidence < threshold)
→ 142 shipped, receipts attached
```
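For a sense of what the bank-level gates might check, here is a minimal sketch of a structural pass and a blueprint pass. It is not the real check-mocks.mjs. It assumes each question carries an `options` array (the sample question above does not show one), that the domain is the part of `objective` before the dot, and that blueprint weights are given as percentages. The function names and the tolerance parameter are hypothetical, and the weights in the usage comment are placeholders rather than real CompTIA figures.

```javascript
// Structural pass: returns a list of human-readable problems (empty = OK).
function structuralProblems(bank) {
  const problems = [];
  const seenIds = new Set();
  for (const q of bank) {
    const label = q.id || "(missing id)";
    if (!q.id || seenIds.has(q.id)) problems.push(`${label}: missing or duplicate id`);
    seenIds.add(q.id);
    const excerpt = q.evidence && q.evidence.excerpt;
    if (typeof excerpt !== "string" || excerpt.trim() === "") {
      problems.push(`${label}: missing or empty receipt`);
    }
    // `options` is an assumed field, see the lead-in.
    if (!Array.isArray(q.options) || !q.options.includes(q.answer)) {
      problems.push(`${label}: answer is not one of the options`);
    }
  }
  return problems;
}

// Blueprint pass: the weights must sum to 100, and each domain's share of
// the bank must sit within `tolerancePct` percentage points of its weight.
// Objective "1.2" counts toward domain "1".
function blueprintProblems(bank, weights, tolerancePct) {
  const total = Object.values(weights).reduce((sum, w) => sum + w, 0);
  if (total !== 100) return [`blueprint weights sum to ${total}, expected 100`];
  if (bank.length === 0) return ["bank is empty"];

  const counts = {};
  for (const q of bank) {
    const domain = String(q.objective).split(".")[0];
    counts[domain] = (counts[domain] || 0) + 1;
  }
  const problems = [];
  for (const domain of Object.keys(counts)) {
    if (!(domain in weights)) problems.push(`domain ${domain}: not in blueprint`);
  }
  for (const [domain, target] of Object.entries(weights)) {
    const actual = (100 * (counts[domain] || 0)) / bank.length;
    if (Math.abs(actual - target) > tolerancePct) {
      problems.push(`domain ${domain}: ${actual.toFixed(1)}% vs target ${target}%`);
    }
  }
  return problems;
}

// In CI, any non-empty result would fail the build, for example:
//   const problems = [...structuralProblems(bank),
//                     ...blueprintProblems(bank, { 1: 50, 2: 50 }, 5)];
//   if (problems.length > 0) process.exitCode = 1;
```

Returning a list of problems instead of throwing on the first one means a single CI run reports everything that is wrong with the bank.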
The deliberately boring stack
None of this needs a heavy framework, and reaching for one would've been a mistake. The site is a vanilla JS quiz engine on Cloudflare Pages — static, fast, no build step to babysit, free to host. A weekly CI job re-runs the gates and flags staleness when objectives change. The discipline is in the pipeline, not the runtime.
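The staleness check could be as simple as storing a fingerprint of the objective text each question was generated against and comparing it on every weekly run. The sketch below assumes a hypothetical `objectiveFingerprint` field on each question and a hypothetical `currentObjectives` map from objective id to its current text; it uses a small FNV-1a hash over UTF-16 code units, which is fine for change detection but is not a cryptographic hash.

```javascript
// FNV-1a (32-bit) over UTF-16 code units: a tiny, dependency-free
// fingerprint. Non-cryptographic; used here only to detect edits.
function fingerprint(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Returns the ids of questions whose objective has been removed or whose
// text no longer matches the fingerprint stored at generation time.
function findStale(bank, currentObjectives) {
  return bank
    .filter((q) => {
      const text = currentObjectives[q.objective];
      return typeof text !== "string" || fingerprint(text) !== q.objectiveFingerprint;
    })
    .map((q) => q.id);
}

// Usage with placeholder objective text (not real exam objectives).
const generatedAgainst = { "1.2": "objective text v1" };
const bankSnapshot = [
  {
    id: "netplus-1-0012",
    objective: "1.2",
    objectiveFingerprint: fingerprint(generatedAgainst["1.2"]),
  },
];
console.log(findStale(bankSnapshot, generatedAgainst)); // []
console.log(findStale(bankSnapshot, { "1.2": "objective text v2" })); // [ 'netplus-1-0012' ]
```

Flagged questions would then go back through the gates instead of being patched by hand, which keeps the receipt and the question in sync.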
It's live, and free for students
The engine currently feeds six CompTIA tracks — Network+, Security+, A+ Core 1 & 2, CySA+, and PenTest+ — all free, no signup, no paywall. You can try it here:
certpracticelab
Core1
Core2
Network+
Security+
CySA+
PenTest+
Every answer a student sees carries a receipt behind it. That's the promise, and it's enforced by code rather than by my good intentions.