DEV Community

SEN LLC

I Built a Japanese Poetry Quiz and the Web Speech API Showed Me Its Teeth

An 800-year-old poetry anthology, a browser speech API from the 2010s, and modern TypeScript: what could go wrong?

Answer: the historical Japanese spelling, the browser-specific voice availability, and the queue-that-never-clears behaviour of SpeechSynthesis. I built a tiny quiz around the 百人一首 (the One Hundred Poets, One Poem Each — a canonical anthology compiled in the 13th century), and writing it surfaced three things I wouldn't have guessed from the MDN page alone.

📦 GitHub: https://github.com/sen-ltd/hyakunin-isshu
🔗 Demo: https://sen.ltd/portfolio/hyakunin-isshu/

Hyakunin-isshu quiz UI. Minchō-font display of the kami (opening phrase), hiragana reading, read-aloud button, four shimo (closing phrase) choices below.

What it does:

  • Shows the kami (first 5-7-5 phrase) of a waka poem. You pick the correct shimo (closing 7-7) from four options.
  • Alternative mode: "who wrote this?" — the choices become four poet names.
  • Optional read-aloud (kami only, or kami + shimo after answering) via the browser's Web Speech API, with a ja-JP voice.
  • 30 of the 100 poems curated as an MVP — everything else is a pure-data extension of src/data/poems.ts.
  • Vue 3 + Vite + TypeScript, no runtime deps beyond Vue.

The rest of this post is about the three walls I hit.

Wall 1: "as written" and "what the TTS wants" are different strings

Classical Japanese poetry uses historical kana that most modern TTS engines handle inconsistently. Take Shikishi Naishinnō's famous poem:

玉の緒よ絶えなばたえねながらへば

The historical spelling ながらへば transliterates as na-ga-ra-he-ba, but it is pronounced na-ga-ra-e-ba (ながらえば in modern spelling). What happens when you feed the literal string to SpeechSynthesis?

  • Chrome + Google Japanese voice: reads it literally — na-ga-ra-he-ba
  • macOS Safari + Kyoko: silently normalizes to na-ga-ra-e-ba

Each is defensible by its own logic, but the result is that the same poem sounds different depending on the browser, and one of the two readings is wrong.

The fix is to store two strings per poem: one for display, one for speech.

{
  kami: '玉の緒よ絶えなばたえねながらへば',          // display (historical kana preserved)
  kamiYomi: 'たまのおよ たえなばたえね ながらえば',  // speech (modern kana, spaced)
}

kami goes into the DOM, where a minchō serif font shows off the kanji + kana mix. kamiYomi goes into SpeechSynthesisUtterance.text, where what matters is that every engine gets the same phonemes. The spaces between phrases in kamiYomi also give the synthesizer slightly more natural phrasing than one long run-on string.

This isn't just a poetry problem. Any JP historical content — Edo-period literature, pre-war newspapers, Heian-era place names — will hit the same fork.
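
One way to keep the display/speech split from leaking into the UI is to resolve the speech string in a single helper. The sketch below is an assumption about how that might look, not the repository's actual code: the `PoemText` interface, the `shimoYomi` field, and `speechTextFor` are hypothetical names, and only `kami` and `kamiYomi` come from the snippet above.

```typescript
// Hypothetical shape: only `kami` and `kamiYomi` appear in the post;
// `shimo` / `shimoYomi` mirror them for the closing phrase (assumption).
interface PoemText {
  kami: string;      // display, historical kana preserved
  kamiYomi: string;  // speech, modern kana, spaced by phrase
  shimo: string;
  shimoYomi: string;
}

// Returns the string to hand to SpeechSynthesisUtterance.
// It never falls back to the display strings, so historical spellings
// cannot reach the TTS engine by accident.
function speechTextFor(poem: PoemText, includeShimo: boolean): string {
  return includeShimo ? `${poem.kamiYomi} ${poem.shimoYomi}` : poem.kamiYomi;
}

const tamaNoO: PoemText = {
  kami: '玉の緒よ絶えなばたえねながらへば',
  kamiYomi: 'たまのおよ たえなばたえね ながらえば',
  shimo: '忍ぶることの弱りもぞする',
  shimoYomi: 'しのぶることの よわりもぞする',
};
```

Calling `speechTextFor(tamaNoO, false)` before the answer and `speechTextFor(tamaNoO, true)` after it would cover both read-aloud modes with one code path.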

Wall 2: voice enumeration is async and some browsers never finish

You'd think window.speechSynthesis.getVoices() returns voices. On Chrome + macOS it does, on first call, instantly. On most other combinations it returns an empty array and you have to wait for the voiceschanged event.

export function loadJaVoices(): Promise<SpeechSynthesisVoice[]> {
  if (!('speechSynthesis' in window)) return Promise.resolve([]);
  return new Promise((resolve) => {
    const pick = () =>
      window.speechSynthesis.getVoices().filter((v) => v.lang.startsWith('ja'));

    const first = pick();
    if (first.length > 0) {
      resolve(first);
      return;
    }
    const onChange = () => {
      clearTimeout(timer); // the event arrived, so the fallback is no longer needed
      window.speechSynthesis.removeEventListener('voiceschanged', onChange);
      resolve(pick());
    };
    window.speechSynthesis.addEventListener('voiceschanged', onChange);

    // Some browsers (Firefox in certain OS combos) never fire the event.
    // Time out and resolve with whatever is listed by then (possibly nothing)
    // so the UI doesn't wait forever.
    const timer = setTimeout(() => {
      window.speechSynthesis.removeEventListener('voiceschanged', onChange);
      resolve(pick());
    }, 1500);
  });
}

The 1.5-second safety timeout is load-bearing. Without it, a user on Firefox + Linux will see a permanently disabled read-aloud button because voiceschanged simply never fires. I spent an embarrassing amount of time debugging this one in a VM before I realized the API just goes silent.

Feature detection goes a step further: I disable the button up front if the API isn't present and change the label to "(Voice unsupported)" (音声非対応 in the Japanese UI) so the user understands it's not their fault:

<button class="speak-btn" :disabled="!speechSupported" @click="speakKami">
  {{ speechSupported ? '♪ Listen to kami' : '(Voice unsupported)' }}
</button>
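
The `speechSupported` flag in the template has to come from somewhere. A minimal sketch of one possible check follows; `isSpeechSupported` and its `scope` parameter are hypothetical names, and the repository may compute the flag differently.

```typescript
// Checks a global-like object for the two pieces the read-aloud feature needs:
// the speechSynthesis controller and the SpeechSynthesisUtterance constructor.
// `scope` defaults to globalThis; passing a stub object makes the check
// unit-testable outside a browser (the parameter is an assumption of this sketch).
function isSpeechSupported(scope: object = globalThis): boolean {
  return 'speechSynthesis' in scope && 'SpeechSynthesisUtterance' in scope;
}
```

In a component this could be evaluated once, for example `const speechSupported = isSpeechSupported();`, since API presence does not change during a session.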

Wall 3: speak() queues. cancel() is mandatory.

Press the read-aloud button twice in Chrome and the second press doesn't replay — it enqueues a second utterance behind the first. The user will wait through the entire first read, then hear the second one. This is technically correct but unambiguously wrong UX for a quiz.

The fix is boilerplate: cancel before speaking.

export async function speak(text: string, options: SpeakOptions = {}) {
  if (!isSupported()) throw new Error('Web Speech API is not supported.');
  window.speechSynthesis.cancel();   // ← clear anything queued

  const utter = new SpeechSynthesisUtterance(text);
  utter.lang = options.lang ?? 'ja-JP';
  utter.rate = options.rate ?? 0.85;

  return new Promise<void>((resolve, reject) => {
    utter.onend = () => resolve();
    utter.onerror = (e) => reject(new Error(e.error ?? 'speech error'));
    window.speechSynthesis.speak(utter);
  });
}

A second cancel() is needed when advancing to the next question. Otherwise a user who clicks next mid-utterance keeps hearing the previous poem while the new one is on screen.

function nextPoem() {
  selected.value = null;
  cancelAll();            // ← prevent stale read-alouds
  current.value = pickQuestion(POEMS, seen.value);
}

rate: 0.85 is a light slowdown for waka. At the default speed the syllables run together and the 5-7-5 cadence becomes inaudible.
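
One side effect of cancelling aggressively: per the Web Speech API specification, an utterance removed by cancel() can report an error code of "canceled" (it had not started) or "interrupted" (it was mid-speech), so the promise returned by the speak() above may reject even though nothing went wrong. Browsers differ in whether they fire the error at all. The sketch below shows one way a caller might tell the two cases apart; `isBenignSpeechError` is a hypothetical helper, and it assumes the rejection message is the raw error code, as in the speak() shown earlier.

```typescript
// Error codes the spec assigns to utterances removed by cancel().
const BENIGN_SPEECH_ERRORS: ReadonlySet<string> = new Set(['canceled', 'interrupted']);

// True when a rejection from speak() only means "a newer request replaced this one".
// Assumes the Error message carries the SpeechSynthesisErrorEvent.error code.
function isBenignSpeechError(err: unknown): boolean {
  return err instanceof Error && BENIGN_SPEECH_ERRORS.has(err.message);
}
```

A click handler could then wrap the call as `speak(text).catch((e) => { if (!isBenignSpeechError(e)) throw e; })` so that only real failures surface.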

Testable quiz logic: pull it out of Vue

Vue single-file components are nice for markup but painful to unit-test without @vue/test-utils. The quiz logic (pick a target poem, generate distractor choices, shuffle) is pure TypeScript, so I kept it in src/quiz.ts separately:

export function pickQuestion(
  poems: Poem[],
  seen: Set<number>,
  rng: () => number = Math.random,
): Question | null {
  const available = poems.filter((p) => !seen.has(p.number));
  if (available.length === 0) return null;

  const target = pickOne(available, rng);
  const distractors = sampleWithout(poems, target.number, 3, rng);
  const choices = shuffle([target.shimo, ...distractors.map((d) => d.shimo)], rng);
  return { poem: target, choices, correctIndex: choices.indexOf(target.shimo) };
}

Passing rng in means tests can inject a seeded PRNG for determinism:

function seeded(seed: number): () => number {
  let s = seed;
  return () => {
    s = (s * 1664525 + 1013904223) & 0xffffffff;
    return (s >>> 0) / 0x100000000;
  };
}

it('spreads the correct answer across every index', () => {
  const counts = [0, 0, 0, 0];
  for (let seed = 0; seed < 400; seed++) {
    const q = pickQuestion(POEMS, new Set(), seeded(seed))!;
    counts[q.correctIndex]!++;
  }
  for (const c of counts) expect(c).toBeGreaterThan(50);
});

This exact test caught a Fisher-Yates bug early on where the shuffle was biased toward the last slot. If you're shipping any "four random choices, one correct" UX, keep this test in your pocket.
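
For reference, an unbiased Fisher-Yates shuffle with an injectable RNG can be sketched as below. This is not the repository's `shuffle` (which the post does not show), and the post does not say what the original bug was. A common way to introduce bias is drawing `j` from the whole array instead of from `0..i`, which the loop here avoids.

```typescript
// Returns a shuffled copy; the input array is left untouched.
// `rng` must return a number in [0, 1), like Math.random.
function shuffle<T>(items: readonly T[], rng: () => number = Math.random): T[] {
  const out = items.slice();
  // Walk from the last slot down, swapping each slot with a uniformly
  // chosen slot at or below it (j is in 0..i inclusive).
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    const tmp = out[i] as T;
    out[i] = out[j] as T;
    out[j] = tmp;
  }
  return out;
}
```

Because `rng` is a parameter, the same seeded generator used in the test above can drive this function deterministically.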

Data trivia: classical Japanese has formulaic closings

While curating the 30 poems I noticed a subtle data problem: multiple poems end with near-identical phrases. Tenji Tennō (#1) ends with わが衣手は露にぬれつつ ("my sleeves are wet with dew"). Kōkō Tennō (#15) ends with わが衣手に雪は降りつつ ("snow falls on my sleeves"). Both start with わが衣手 (waga-koromode, "my sleeves") and end with -tsutsu.

If both appear in the same four-choice set, the quiz gets genuinely hard, arguably too hard, because the answer then hinges on a snow-vs-dew detail rather than on knowing the poem. The next iteration should penalize distractors that are edit-distance-close to the correct shimo when building choices, not just ban exact duplicates. For now the dataset test enforces uniqueness-by-string, which is necessary but not sufficient.
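
A rough sketch of what that near-duplicate check could look like, using plain Levenshtein distance over characters. Everything here is hypothetical (`editDistance`, `isTooSimilar`, and the 0.5 threshold are illustrative choices, not something the project ships), and a real version might compare the kana readings instead of the display strings.

```typescript
// Levenshtein distance over Unicode code points, keeping only two rows.
function editDistance(a: string, b: string): number {
  const s = Array.from(a);
  const t = Array.from(b);
  let prev: number[] = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j]! + 1, curr[j - 1]! + 1, prev[j - 1]! + cost));
    }
    prev = curr;
  }
  return prev[t.length]!;
}

// Flags a distractor whose distance to the correct answer is small relative
// to the longer string. maxRatio = 0.5 is an arbitrary starting point.
function isTooSimilar(candidate: string, correct: string, maxRatio = 0.5): boolean {
  const longest = Math.max(Array.from(candidate).length, Array.from(correct).length);
  if (longest === 0) return true;
  return editDistance(candidate, correct) / longest <= maxRatio;
}
```

With this threshold the two わが衣手 closings above count as too similar to each other, so a distractor sampler could skip one when the other is the correct answer.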

Takeaways

  1. For historical Japanese content, carry two strings per entry: the one humans see, and the one TTS hears.
  2. speechSynthesis.getVoices() may return an empty list until voiceschanged fires, and on some browsers the event never fires. Always wrap voiceschanged in a promise with a timeout.
  3. speechSynthesis.cancel() before every speak(), and again before any UI navigation, or you'll queue utterances for free.
  4. Keep quiz picking logic out of your component tree. A pure function + a seeded RNG is cheap, fast, and correctness-testable.
  5. "Near-duplicate" data in your content is a distinct problem from "exact-duplicate" data. Edit distance, not string equality.

Repository: https://github.com/sen-ltd/hyakunin-isshu
