Japanese version: Zenn (Japanese)
Sequel to: I Wrote 82 Regex Replacements to Parse 6,933 Time Format Variations
The Setup
In the previous article, I had Claude Code build a parser for Japan's emergency contraception pharmacy dataset — free-text business hours, 6,933 formats, 82 regex replacements, 97% coverage.
The most important thing that came out of the project wasn't code. It was a design principle that Claude established and I approved:
Missing info > Wrong info.
If the parser can't handle an entry, show the raw text. Don't guess. For a tool that helps people find emergency medication, a wrong answer is worse than no answer.
Claude wrote this into the project's design docs. Claude followed it. And Claude used it to justify something neither of us caught at the time.
What Claude Did
The parser encountered data like this:
月-金:9:00-18:00(除く水曜)
"Monday to Friday 9:00-18:00 (excluding Wednesday)."
Claude's normalization pipeline stripped the parenthetical (除く水曜) and parsed the rest: Monday through Friday, 9 to 18. Wednesday included in the output.
From Claude's perspective, this was principled. The exclusion data was complex to parse, so it was dropped. The base schedule was preserved. Missing info > wrong info. Move on.
I reviewed the code. The logic seemed reasonable — parsing parenthetical exclusions reliably across dozens of formats is genuinely hard. I trusted the judgment and shipped it.
But here's the thing: the tool now actively displays Wednesday as a working day with hours 9:00-18:00. A user checks the schedule, sees Wednesday is listed, goes on Wednesday. The pharmacy is closed.
That's not "missing info." That's wrong info. Showing Mon-Fri including Wednesday when Wednesday is excluded is an incorrect schedule. Claude was generating the exact category of error the principle was supposed to prevent — and using the principle itself as justification.
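A minimal sketch of this failure mode, not the project's actual code. It assumes a single hypothetical base pattern (`BASE`) and a hypothetical `lossy_parse` helper, and it handles only the one format shown above. The point is that once the parenthetical is stripped, nothing downstream can know Wednesday was excluded.

```python
import re

DAYS = ["月", "火", "水", "木", "金", "土", "日"]  # Monday .. Sunday

# Hypothetical pattern for the single format "月-金:9:00-18:00".
BASE = re.compile(
    r"([月火水木金土日])-([月火水木金土日]):(\d{1,2}:\d{2})-(\d{1,2}:\d{2})"
)


def lossy_parse(text: str) -> dict:
    """Strip every parenthetical, then parse what is left (the failure mode)."""
    stripped = re.sub(r"[((][^))]*[))]", "", text)  # the exclusion is dropped here
    match = BASE.fullmatch(stripped)
    if match is None:
        return {"raw": text}  # cannot parse: show the raw text
    start, end, opens, closes = match.groups()
    days = DAYS[DAYS.index(start) : DAYS.index(end) + 1]
    return {day: (opens, closes) for day in days}


schedule = lossy_parse("月-金:9:00-18:00(除く水曜)")
print("水" in schedule)  # True: Wednesday is displayed as open 9:00-18:00
```

The output is a clean, confident schedule, which is exactly why it is wrong rather than missing.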
How I Caught It
After shipping, I checked the tool on a Saturday afternoon. The search returned 50 pharmacies, most of them closed but all still listed. Someone who actually needs emergency contraception would be scrolling through closed pharmacies one by one. So one evening I had Claude build an "Open Now" filter.
The filter made the error impossible to miss. A schedule grid showing Wednesday hours is easy to overlook. A filter declaring "Open" on a Wednesday when the pharmacy is closed on Wednesdays — that's a binary, definitive wrong answer.
Three gaps surfaced:
1. Closed-day data discarded. The (除く水曜) example. Claude's normalization stripped it, and the principle I had approved gave it cover. This was the design-level error.
2. Holiday flag not wired in. Claude had already extracted holidayClosed: true from 日祝休み ("closed on holidays") but didn't connect it to the filter logic. Data existed, plumbing didn't. I didn't catch this during review either.
3. Cache ignoring date context. Same text, different correct answer on holidays vs regular days. The cache keyed on text alone. Another thing I should have caught.
Bugs 2 and 3 were implementation oversights, the kind of thing that happens in any codebase. Bug 1 was different. It was an AI applying a design principle in a way the human who approved it didn't intend, and the human trusting the AI enough not to notice.
The Fix: A Principle Claude Can't Misapply
The old principle had two categories: Missing and Wrong. The gap between them was big enough for Claude to drive a truck through. "I dropped information but kept the base structure" fits comfortably in "Missing" if you squint — and Claude squinted.
The revised principle:
Correct > Correct with caveat > Unknown > Wrong
Four levels. The critical addition is "Correct with caveat" — which means: if you have information you can't fully parse, don't throw it away. Show the base schedule and attach the unparseable part as a visible note.
Applied to the Wednesday problem:
| Level | What it looks like |
|---|---|
| Correct | Parse the exclusion. Remove Wednesday. Show Mon/Tue/Thu/Fri. |
| Correct with caveat | Can't parse the exclusion reliably, but show the schedule with a note: "※Closed Wednesdays" |
| Unknown | Can't parse any of it. Show raw text. |
| Wrong | Strip the exclusion. Show Mon-Fri as if Wednesday is normal. |
The old principle jumped from "can't parse it perfectly" straight to "drop it" — skipping the middle option entirely. The new principle forces that middle option to exist.
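One way to make the four levels hard to blur is to encode them as an ordered type and classify each parse result explicitly. The sketch below is an assumption about how that could look. The `Outcome` enum and the `classify` function with its four boolean inputs are hypothetical names, not something the project is known to contain.

```python
from enum import IntEnum


class Outcome(IntEnum):
    """Higher is better: Correct > Correct with caveat > Unknown > Wrong."""
    WRONG = 0
    UNKNOWN = 1
    CORRECT_WITH_CAVEAT = 2
    CORRECT = 3


def classify(base_parsed: bool, has_exclusion: bool,
             exclusion_applied: bool, exclusion_noted: bool) -> Outcome:
    """Map what the parser did with an entry onto the four levels."""
    if not base_parsed:
        return Outcome.UNKNOWN  # raw text is shown instead of a schedule
    if not has_exclusion or exclusion_applied:
        return Outcome.CORRECT
    if exclusion_noted:
        return Outcome.CORRECT_WITH_CAVEAT  # schedule plus a visible note
    return Outcome.WRONG  # input information was silently dropped


# The original behaviour: base schedule parsed, exclusion present but discarded.
print(classify(True, True, False, False).name)  # WRONG
```

Under this encoding, "dropped during processing" has no path to any level other than `WRONG`, which is the loophole the two-level principle left open.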
Implementation: a pre-normalization phase extracts closed-day info before the regex pipeline transforms the text. Clear exclusions get applied to the schedule. Ambiguous ones become visible notes (amber strip above the schedule grid). Nothing gets silently discarded.
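A rough sketch of such a pre-normalization phase, under stated assumptions. The article mentions five closed-day patterns without listing them, so the single `CLEAR_EXCLUSION` pattern here is hypothetical, as are the `pre_normalize` name and its return shape. The second sample input is made up to illustrate the ambiguous case.

```python
import re

PAREN = re.compile(r"[((]([^))]*)[))]")
# Hypothetical "clear" pattern: 除く + one weekday, e.g. 除く水曜 or 除く水曜日.
CLEAR_EXCLUSION = re.compile(r"除く([月火水木金土日])(?:曜日?)?")


def pre_normalize(text: str):
    """Return (text without parentheticals, closed days, notes to display)."""
    closed_days: set = set()
    notes: list = []
    for inner in PAREN.findall(text):
        match = CLEAR_EXCLUSION.fullmatch(inner)
        if match:
            closed_days.add(match.group(1))  # clear exclusion: apply it
        else:
            notes.append(inner)  # anything else: keep it as a visible note
    return PAREN.sub("", text), closed_days, notes


base, closed, notes = pre_normalize("月-金:9:00-18:00(除く水曜)")
open_days = [d for d in "月火水木金" if d not in closed]
print(open_days)  # ['月', '火', '木', '金']

# A made-up ambiguous entry: the parenthetical survives as a note.
print(pre_normalize("月-金:9:00-18:00(第2水曜は午後休み)")[2])  # ['第2水曜は午後休み']
```

Every parenthetical ends up in exactly one of two places, the closed-day set or the notes list, so there is no code path that discards it.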
The Broader Lesson About AI + Design Principles
Claude Code is good at following rules literally. Give it "missing > wrong" and it will optimize for that — aggressively, consistently, without second-guessing. That's the value of AI coding. It's also the risk.
A human developer, encountering (除く水曜), might have thought: "Wait, I'm showing Wednesday as open. Is that really 'missing' or is that 'wrong'?" Claude didn't have that hesitation. The principle said missing > wrong, the exclusion was hard to parse, so dropping it was the principled thing to do. Each step followed from the principle as written, and the conclusion was still wrong.
What I should have done:
- Written the principle more precisely. "Missing" should have been defined as "genuinely absent from the output," not "present in the input but dropped during processing."
- Reviewed the normalization pipeline output, not just the code. I read the code and it looked reasonable. I should have looked at specific examples of what the code was producing and asked: "Is this output correct?"
- Spot-checked the AI's application of design principles instead of trusting it. Code review for AI-generated code needs to include output review.
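Output review can be partly mechanical. The sketch below is one possible approach, not what the project does. The marker list, the parsed-record keys (`closed_days`, `notes`, `raw`), and both function names are assumptions for illustration. It flags entries whose raw text mentions a closure while the parsed output carries no trace of one, and draws a reproducible random sample for manual reading.

```python
import random

# Hypothetical markers that suggest the raw text talks about closures.
CLOSURE_MARKERS = ("除く", "休")


def flag_dropped_closures(pairs: list) -> list:
    """Return (raw, parsed) pairs where a closure mention left no trace in the output."""
    flagged = []
    for raw, parsed in pairs:
        mentions_closure = any(marker in raw for marker in CLOSURE_MARKERS)
        keeps_closure = (
            bool(parsed.get("closed_days"))
            or bool(parsed.get("notes"))
            or bool(parsed.get("holidayClosed"))
            or "raw" in parsed  # fell back to raw text, so nothing was hidden
        )
        if mentions_closure and not keeps_closure:
            flagged.append((raw, parsed))
    return flagged


def sample_for_review(pairs: list, k: int = 20, seed: int = 0) -> list:
    """Pick a reproducible random sample of pairs to read by eye."""
    return random.Random(seed).sample(pairs, min(k, len(pairs)))


pairs = [
    ("月-金:9:00-18:00(除く水曜)", {"days": "月火水木金"}),                        # exclusion dropped
    ("月-金:9:00-18:00(除く水曜)", {"days": "月火木金", "closed_days": ["水"]}),  # exclusion applied
    ("月-金:9:00-18:00", {"days": "月火水木金"}),                                  # nothing to drop
]
print(len(flag_dropped_closures(pairs)))  # 1
```

A check like this would not prove the output correct, but it turns "did anything get silently discarded?" into a list of concrete entries to inspect.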
Numbers
| Metric | Original article | After fixes |
|---|---|---|
| Pharmacy coverage | 97.1% (9,659/9,951) | 98.2% (9,768/11,734) |
| Medical clinics | — | 88.3% (2,739/3,107) |
| Closed-day extraction | none | 5 patterns |
| "Open now" filter | none | holiday-aware, cache-safe |
Takeaways
AI follows principles literally. If your principle has a loophole, AI will find it — not maliciously, but because literal interpretation is what it does. Write principles that are hard to misapply.
"Missing" and "wrong" need a clear boundary. Data you had and threw away is not "missing." It's information loss that produces incorrect output. Adding "correct with caveat" as a middle tier forces the question: "Am I truly missing this info, or am I choosing to discard it?"
Review AI output, not just AI code. I reviewed the normalization logic and it was reasonable. I didn't look at what it produced for edge cases. The gap between "the code looks right" and "the output is right" is where AI errors hide.
Binary features are accidental audits. A schedule grid tolerates small inaccuracies because context is visible. A yes/no filter strips all context. If you want to find the holes in your AI-generated data pipeline, build a feature that makes binary decisions from its output.
Repository: odakin/mhlw-ec-pharmacy-finder. Emergency contraception pharmacy finder based on official MHLW data (Japan).
Previous article: I Wrote 82 Regex Replacements to Parse 6,933 Time Format Variations