DEV Community

SEN LLC
A Japanese Kana Converter With Hepburn, Kunrei, and Nihon-shiki Romanization

Hiragana → Katakana is one Unicode offset (+0x60). Kana → Romaji is a lookup table, but which table? Japan has three major romanization systems: Hepburn (what most foreigners see), Kunrei-shiki (taught in Japanese schools), and Nihon-shiki (historical, strictest). "shi" vs "si", "tsu" vs "tu", "chi" vs "ti": each is correct, depending on which system you mean.

Japanese text conversion sounds trivial but opens a surprising set of questions about romanization standards, half-width katakana, and the one-to-many mapping problem of converting back from romaji.

🔗 Live demo: https://sen.ltd/portfolio/kana-converter/
📦 GitHub: https://github.com/sen-ltd/kana-converter

Features:

  • Hiragana ↔ Katakana
  • Hiragana / Katakana ↔ Romaji (3 systems)
  • Half-width ↔ Full-width katakana
  • Live conversion
  • Swap direction button
  • Japanese / English UI
  • Zero dependencies, 73 tests

Hiragana to Katakana: one offset

Hiragana range: U+3041-U+3096. Katakana range: U+30A1-U+30F6. The difference: exactly 0x60.

export function hiraganaToKatakana(text) {
  return [...text].map(c => {
    const code = c.charCodeAt(0);
    if (code >= 0x3041 && code <= 0x3096) {
      return String.fromCharCode(code + 0x60);
    }
    return c;
  }).join('');
}

The reverse conversion subtracts 0x60 instead. For example, あ (U+3042) + 0x60 = ア (U+30A2). The Unicode Consortium aligned the two kana blocks intentionally, which is what makes this conversion trivial.
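The reverse direction can be sketched the same way. The function name `katakanaToHiragana` is an assumption (only the hiragana-to-katakana half is shown above); it subtracts 0x60 for code points in U+30A1 to U+30F6 and leaves everything else alone, including the long-vowel mark ー (U+30FC), which has no hiragana twin.

```javascript
// Hypothetical counterpart to hiraganaToKatakana; the name is assumed for this sketch.
function katakanaToHiragana(text) {
  return [...text].map(c => {
    const code = c.codePointAt(0);
    // U+30A1 (ァ) .. U+30F6 (ヶ) mirror U+3041 (ぁ) .. U+3096 (ゖ)
    if (code >= 0x30A1 && code <= 0x30F6) {
      return String.fromCodePoint(code - 0x60);
    }
    // ー, punctuation, Latin text, etc. pass through unchanged
    return c;
  }).join('');
}

console.log(katakanaToHiragana('カタカナ')); // かたかな
console.log(katakanaToHiragana('ラーメン')); // らーめん
```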

Three romanization systems

Kana-to-romaji needs a lookup table. The three major systems disagree on several characters:

Kana   Hepburn   Kunrei   Nihon
し     shi       si       si
ち     chi       ti       ti
つ     tsu       tu       tu
ふ     fu        hu       hu
じ     ji        zi       zi
ぢ     ji        zi       di
づ     zu        zu       du
しゃ   sha       sya      sya

Hepburn is what Japanese train stations use and what most English speakers see. Kunrei-shiki is what Japanese elementary schools teach — more phonetically consistent but less intuitive for English speakers. Nihon-shiki is the strictest, distinguishing homophones like じ and ぢ (both pronounced "ji") by the kana column they come from.

For the converter, each system gets its own lookup table:

const HEPBURN = { 'し': 'shi', 'ち': 'chi', 'つ': 'tsu', ... };
const KUNREI  = { 'し': 'si',  'ち': 'ti',  'つ': 'tu',  ... };
const NIHON   = { 'し': 'si',  'ち': 'ti',  'つ': 'tu',  'ぢ': 'di', 'づ': 'du', ... };
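To see how the choice of table changes the output, here is a minimal sketch of a lookup driven by a system name. The names `BASE`, `TABLES`, and `kanaToRomaji` are assumptions for illustration, and the tables hold only a handful of entries; the project's real tables and function signatures may differ.

```javascript
// Illustrative entries only; a real table covers the whole syllabary.
const BASE = { 'か': 'ka', 'み': 'mi' };
const TABLES = {
  hepburn: { ...BASE, 'し': 'shi', 'ち': 'chi', 'つ': 'tsu', 'ぢ': 'ji', 'づ': 'zu', 'しゃ': 'sha' },
  kunrei:  { ...BASE, 'し': 'si',  'ち': 'ti',  'つ': 'tu',  'ぢ': 'zi', 'づ': 'zu', 'しゃ': 'sya' },
  nihon:   { ...BASE, 'し': 'si',  'ち': 'ti',  'つ': 'tu',  'ぢ': 'di', 'づ': 'du', 'しゃ': 'sya' },
};

// Hypothetical helper: romanize hiragana using the table for the given system.
function kanaToRomaji(text, system = 'hepburn') {
  const table = TABLES[system];
  if (!table) throw new Error(`Unknown system: ${system}`);
  let out = '';
  let i = 0;
  while (i < text.length) {
    // Try a two-kana digraph such as しゃ before falling back to one kana.
    const pair = text.slice(i, i + 2);
    if (pair.length === 2 && table[pair]) {
      out += table[pair];
      i += 2;
    } else {
      out += table[text[i]] ?? text[i]; // unknown characters pass through
      i += 1;
    }
  }
  return out;
}

console.log(kanaToRomaji('つづみ', 'hepburn')); // tsuzumi
console.log(kanaToRomaji('つづみ', 'kunrei'));  // tuzumi
console.log(kanaToRomaji('つづみ', 'nihon'));   // tudumi
```

つづみ is a handy test word because all three systems disagree on it: only Nihon-shiki keeps the づ/ず distinction visible.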

ん before vowels

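A subtle Hepburn rule: ん before a vowel or y is written as "n'" with an apostrophe to prevent ambiguity. 禁煙 (きんえん) is "kin'en", which keeps it distinct from 記念 (きねん, "kinen"), and 本屋 (ほんや) is "hon'ya", not "honya". ん before any other consonant needs no apostrophe: 案内 (あんない) is simply "annai".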

// Detect ん followed by あいうえお or やゆよ and replace it with "n'"
result = result.replace(/ん([あいうえおやゆよ])/g, "n'$1");

The regex runs on the hiragana source before conversion — that way it sees the actual ん character and can check the following character.
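A minimal sketch of that two-step flow, assuming a tiny demo table: the pre-pass rewrites ん on the kana source, then the table lookup converts the rest. `KANA` and `romanizeWithApostrophe` are hypothetical names, not taken from the project.

```javascript
// Minimal demo table; ん maps to plain "n" unless the pre-pass already rewrote it.
const KANA = { 'き': 'ki', 'ね': 'ne', 'え': 'e', 'ほ': 'ho', 'や': 'ya', 'ん': 'n' };

// Hypothetical helper showing the order of operations.
function romanizeWithApostrophe(hiragana) {
  // Step 1: on the kana source, ん before a vowel or y-row kana becomes "n'".
  const marked = hiragana.replace(/ん([あいうえおやゆよ])/g, "n'$1");
  // Step 2: table lookup; the Latin "n" and "'" from step 1 pass through unchanged.
  return [...marked].map(c => KANA[c] ?? c).join('');
}

console.log(romanizeWithApostrophe('きんえん')); // kin'en
console.log(romanizeWithApostrophe('きねん'));   // kinen
console.log(romanizeWithApostrophe('ほんや'));   // hon'ya
```

Running the pre-pass after the table lookup would be too late: by then きんえん and きねん have both collapsed to "kinen".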

Half-width katakana

Half-width katakana (ｱｲｳｴｵ) live in a different Unicode block, Halfwidth and Fullwidth Forms, at U+FF66-U+FF9F. They date back to 8-bit character sets such as JIS X 0201, whose first edition was standardized in 1969, and are still used in some legacy systems (ATMs, older printers).

The quirk: dakuten and handakuten are separate characters in half-width. ガ is one character in full-width (U+30AC) but two in half-width: ｶ (U+FF76) + ﾞ (U+FF9E).

const FULL_TO_HALF = {
  'カ': 'ｶ', 'ガ': 'ｶﾞ', 'ザ': 'ｻﾞ', 'パ': 'ﾊﾟ', 'ヴ': 'ｳﾞ', ...
};

So converting ガガ (2 characters) produces ｶﾞｶﾞ (4 characters). The string length doubles, so the conversion is inherently not length-preserving.
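The length change is easy to check with a few sample entries. In this sketch the half-width values are written as `\u` escapes so the code points stay explicit; `FULL_TO_HALF_SAMPLE` and `toHalfWidth` are hypothetical names. The round trip uses the standard `String.prototype.normalize('NFKC')`, which is a swapped-in shortcut and not necessarily how the project converts back.

```javascript
// Sample entries only; a full table covers every katakana.
const FULL_TO_HALF_SAMPLE = {
  'カ': '\uFF76',        // half-width ka
  'ガ': '\uFF76\uFF9E',  // half-width ka + half-width dakuten
  'ハ': '\uFF8A',        // half-width ha
  'パ': '\uFF8A\uFF9F',  // half-width ha + half-width handakuten
};

// Hypothetical helper: map each full-width kana, pass everything else through.
function toHalfWidth(text) {
  return [...text].map(c => FULL_TO_HALF_SAMPLE[c] ?? c).join('');
}

const half = toHalfWidth('ガガ');
console.log(half.length);                       // 4
console.log(half.normalize('NFKC') === 'ガガ'); // true
```

NFKC folds the half-width base and its mark back into one composed full-width character, but it also rewrites other compatibility characters (full-width Latin letters, for instance), so a dedicated reverse table gives tighter control.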

Romaji to Hiragana: greedy matching

Going back from romaji requires greedy longest-match:

const TABLE = [
  ['shi', 'し'], ['chi', 'ち'], ['tsu', 'つ'],
  ['sha', 'しゃ'], ['shu', 'しゅ'], ['sho', 'しょ'],
  ['ka', 'か'], ['ki', 'き'], ...
];

export function romajiToHiragana(text) {
  let result = '';
  let i = 0;
  while (i < text.length) {
    let matched = false;
    // TABLE is pre-sorted longest-first, so the first hit is the longest match
    for (const [rom, kana] of TABLE) {
      if (text.slice(i, i + rom.length) === rom) {
        result += kana;
        i += rom.length;
        matched = true;
        break;
      }
    }
    if (!matched) { result += text[i]; i++; }
  }
  return result;
}

The TABLE is sorted so longer keys come first. Whenever one key is a prefix of another, the longer one wins: shi is matched as a whole before any shorter key gets a chance, and sha becomes the single digraph しゃ instead of s + ha.
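One way to guarantee that ordering is to sort the entries by key length at load time instead of maintaining the order by hand. The sketch below assumes a small sample list; `ENTRIES`, `SORTED`, and `romajiToHiraganaSketch` are hypothetical names. It also accepts both "shi" and "si" for し, which is the easy half of the one-to-many problem.

```javascript
// Unsorted sample entries; a real table covers every syllable and spelling variant.
const ENTRIES = [
  ['n', 'ん'], ['ka', 'か'], ['na', 'な'], ['si', 'し'],
  ['shi', 'し'], ['sha', 'しゃ'], ['ha', 'は'],
];

// Longest keys first, so "na" is tried before "n" and "sha" before any two-letter key.
const SORTED = [...ENTRIES].sort((a, b) => b[0].length - a[0].length);

function romajiToHiraganaSketch(text) {
  let result = '';
  let i = 0;
  while (i < text.length) {
    // First entry whose key matches at position i; longest wins thanks to the sort.
    const hit = SORTED.find(([rom]) => text.startsWith(rom, i));
    if (hit) {
      result += hit[1];
      i += hit[0].length;
    } else {
      result += text[i]; // no key matches: copy the character through
      i += 1;
    }
  }
  return result;
}

console.log(romajiToHiraganaSketch('kana'));  // かな
console.log(romajiToHiraganaSketch('kasha')); // かしゃ
console.log(romajiToHiraganaSketch('sika'));  // しか
```

With the unsorted order, "kana" would match `n` before `na` and come out as かんa, which is exactly the failure the sort prevents.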

Series

This is entry #84 in my 100+ public portfolio series.
