DEV Community

Ted
Ted

Posted on • Originally published at tedagentic.com

The Database Was Fixed Months Ago. The Website Disagreed.

I run a travel site whose core dataset is legal-status data: whether cannabis is recreational, medical-only, decriminalized, or illegal in 213 countries and all 50 US states. Getting one of those entries wrong isn't a typo. Someone plans a trip around it.

The problem is that this kind of data rots quietly. Laws moved constantly through 2025 and 2026: one US state opened its first dispensaries three weeks before I ran this audit, a Caribbean nation decriminalized this year, and one country famous for legalizing reversed course almost entirely. A dataset that was accurate when written is a dataset that is wrong now — you just don't know where.

So I gave an AI agent (Claude Code, in my case) a blunt instruction: go through every state and every country, compare each stored status against current law, and tell me what's wrong. No edits until I approve.

I expected a list of stale entries. I got that — plus two other categories of wrong I wasn't expecting, and they were both worse. I thought I was auditing facts. It took three kinds of wrong to see I was auditing a system.

Way #1 to be wrong: the facts moved

The boring, expected kind. Around a dozen entries where the law simply changed after the data was written.

Two details from this pile worth keeping:

  • Recency cuts both ways. The most out-of-date entry wasn't old — it described a legal regime that ended three weeks earlier.
  • Some "stale" data was never right. One US state was labeled decriminalized-only. It has had a medical program since 2013. That error sat on the page through every previous review, because nobody audits the entries that look plausible.

Fine. This is why you audit. If the story ended here it wouldn't be worth writing up.

Way #2 to be wrong: invented by a default

Around thirty small countries and territories — Pacific micro-states, Caribbean islands, a handful of French and Dutch territories — all showed the same status: Medical.

Almost none of them has any medical program. Several are among the strictest prohibition jurisdictions on Earth.

The tell was the uniformity. These entries had been bulk-generated at some point, and whoever — or whatever — generated them needed a value for places it knew nothing about. It picked "Medical." Plausible-sounding, middle-of-the-road, and wrong in the one direction that matters: permissive.

Think about the asymmetry. If you label a legal place "Illegal," the cost is a missed trip and maybe an annoyed email. If you label a strict-prohibition island "Medical," the cost is someone packing something they should never have packed. A default that errs permissive isn't a data bug, it's a liability generator.

The rule I took from this: defaults are policy. If a generator has to fill in an unknown, it must fill in the strictest value the schema allows — and ideally flag it as unverified. "Unknown" should never round to "kind of allowed."

Way #3 to be wrong: fixed, but the page never heard about it

This is the one that made the audit worth it.

The database had correct rows for five Caribbean nations — statuses corrected months ago in an earlier data pass. The live page still showed the old, wrong values for three of them. Not stale data. Correct data that never rendered.

The page builds its country list by merging a static fallback array with the database at runtime. The match was by URL slug or exact name:

const dbCountry = dbCountries.find(
  (db) => db.slug === staticCountry.slug ||
          db.name.toLowerCase() === staticCountry.name.toLowerCase()
);
Enter fullscreen mode Exit fullscreen mode

Now look at the actual rows:

Database says Static entry says Match?
Saint Lucia St Lucia
Antigua and Barbuda Antigua & Barbuda
Saint Vincent St Vincent & the Grenadines

"Saint" vs "St". "and" vs "&". Same countries, different spellings, zero matches. When the lookup missed, the code did the reasonable-looking thing — fell back to the static entry. No error. No log line. No way to notice unless you compared what was stored with what was displayed, country by country.

The database corrections had been shipped, reviewed, and celebrated. The page disagreed for months, silently.

The fix was two parts. First, normalize both sides before comparing — lowercase, strip accents, treat "&" as "and" and "St" as "Saint" — plus an alias map for the two names normalization can't bridge. Second, and more importantly, a test: run every database row against every static entry and print the pairs. Exactly the five broken pairs matched, and nothing else did. A join you haven't tested isn't a join, it's a hope.

The real disease: one fact, four copies

Digging into all this surfaced the structural problem underneath. The site held each country's legal status in four places:

  1. A static fallback array inside the page component
  2. The database (source of truth, in theory)
  3. A "rich profiles" data file used for detail rendering
  4. The site-search index, with its own copy of every status chip

Nobody ever decided "let's keep four copies." Each copy was added for a locally reasonable reason — a fallback for resilience, an index for speed, a profile file for richer fields. That's how it always happens. And with four copies and no reconciliation, drift isn't a risk. It's a schedule.

The audit method that actually worked, if you want to reproduce it:

  1. Dump every source. All four copies, into flat files.
  2. Simulate the merge. Compute what each entry displays, not what any store contains. The displayed value is the only value.
  3. Diff against verified current law. Web-verify each contested claim against primary sources — more on that below.
  4. Fix all copies in one pass, plus the join, plus the tests, and make any headline counts ("legal in N countries") derived from the data at build time, never hand-written. Hardcoded counts are just a fifth copy.

Don't let the auditor grade its own answer key

One more thing, because it's the part people skip: during verification, the agent's own recollection of the law was flat-out wrong at least once. It "remembered" that a certain country had enacted a medical bill. Web search showed the bill was introduced and never passed — the database's boring "illegal" was right, and the confident model memory was the error.

Which means the audit loop has to be: model proposes, primary sources dispose. An LLM auditing data against its own training memory isn't an audit. It's the same failure mode as the template default in Way #2 — confident filler — wearing a reviewer's badge.

What I'd carry to any dataset

  • Audit what renders, not what's stored. Simulate the merge. The user never sees your database.
  • Defaults are policy. Unknown must round to the strictest value, never the most plausible one.
  • Every copy of a fact is a liability. Count your copies. You have more than you think, and the search index is always one of them.
  • A fix without a render test isn't a fix. The Caribbean corrections were "done" for months. Done means displayed.
  • Verify against sources, not model memory — especially when the model is the one doing the verifying.

Total cost of the whole thing — audit, verification, code fixes, database corrections, tests, deploy — was one long session with the agent doing the dumps, diffs, and searches. The only genuinely human part was the judgment calls: what "medical" should even mean to a traveler. Everything else was exactly the kind of work agents should be doing: patient, exhaustive, and immune to the plausibility that let these errors sit unnoticed for months.

Top comments (0)