A Japanese Kana Converter With Hepburn, Kunrei, and Nihon-shiki Romanization
Hiragana → Katakana is one Unicode offset (+0x60). Kana → Romaji is a lookup table, but which table? Japan has three major romanization systems: Hepburn (what most foreigners see), Kunrei-shiki (taught in Japanese schools), and Nihon-shiki (historical, strictest). "shi" vs "si", "tsu" vs "tu", "chi" vs "ti": all are correct, depending on which system you mean.
Japanese text conversion sounds trivial but opens a surprising set of questions about romanization standards, half-width katakana, and the one-to-many mapping problem of converting back from romaji.
🔗 Live demo: https://sen.ltd/portfolio/kana-converter/
📦 GitHub: https://github.com/sen-ltd/kana-converter
Features:
- Hiragana ↔ Katakana
- Hiragana / Katakana ↔ Romaji (3 systems)
- Half-width ↔ Full-width katakana
- Live conversion
- Swap direction button
- Japanese / English UI
- Zero dependencies, 73 tests
Hiragana to Katakana: one offset
Hiragana range: U+3041-U+3096. Katakana range: U+30A1-U+30F6. The difference: exactly 0x60.
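export function hiraganaToKatakana(text) {
  // Spread iterates by code point, so surrogate pairs stay intact
  return [...text].map(c => {
    const code = c.charCodeAt(0);
    // Shift only characters inside the hiragana range
    if (code >= 0x3041 && code <= 0x3096) {
      return String.fromCharCode(code + 0x60);
    }
    return c;
  }).join('');
}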
The same conversion works in reverse: subtract 0x60. For example, あ (U+3042) + 0x60 = ア (U+30A2). The two kana blocks are laid out in the same order in Unicode, which is what reduces the conversion to a single addition.
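The reverse direction can be sketched the same way. The name `katakanaToHiragana` is an assumption for illustration and may not match the repository; the function subtracts 0x60 from the katakana range that has hiragana counterparts and leaves everything else, including the long-vowel mark ー, untouched.

```javascript
// Hypothetical mirror of hiraganaToKatakana (name assumed for illustration).
function katakanaToHiragana(text) {
  return [...text].map(c => {
    const code = c.codePointAt(0);
    // U+30A1..U+30F6 is the katakana range that lines up with hiragana.
    if (code >= 0x30A1 && code <= 0x30F6) {
      return String.fromCodePoint(code - 0x60);
    }
    // Everything else (ー, punctuation, Latin text) passes through.
    return c;
  }).join('');
}

console.log(katakanaToHiragana('カタカナ')); // かたかな
console.log(katakanaToHiragana('コーヒー')); // こーひー
```

Characters outside the shared range are deliberately left alone, so mixed text survives the round trip.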
Three romanization systems
Kana-to-romaji needs a lookup table. The three major systems disagree on several characters:
| Kana | Hepburn | Kunrei | Nihon |
|---|---|---|---|
| し | shi | si | si |
| ち | chi | ti | ti |
| つ | tsu | tu | tu |
| ふ | fu | hu | hu |
| じ | ji | zi | zi |
| ぢ | ji | zi | di |
| づ | zu | zu | du |
| しゃ | sha | sya | sya |
Hepburn is what Japanese train stations use and what most English speakers see. Kunrei-shiki is what Japanese elementary schools teach — more phonetically consistent but less intuitive for English speakers. Nihon-shiki is the strictest, distinguishing homophones like じ and ぢ (both pronounced "ji") by the kana column they come from.
For the converter, each system gets its own lookup table:
const HEPBURN = { 'し': 'shi', 'ち': 'chi', 'つ': 'tsu', ... };
const KUNREI = { 'し': 'si', 'ち': 'ti', 'つ': 'tu', ... };
const NIHON = { 'し': 'si', 'ち': 'ti', 'つ': 'tu', 'ぢ': 'di', 'づ': 'du', ... };
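Here is a minimal sketch of how per-system tables might be consulted. The `TABLES` object, the `kanaToRomaji` name, and the tiny subsets of entries are assumptions for illustration; a real table would cover the full kana set. Two-character digraphs such as しゃ are tried before single kana.

```javascript
// Illustrative subsets only; real tables cover every kana and digraph.
const TABLES = {
  hepburn: new Map([['しゃ', 'sha'], ['し', 'shi'], ['ち', 'chi'], ['つ', 'tsu'], ['づ', 'zu'], ['く', 'ku']]),
  kunrei:  new Map([['しゃ', 'sya'], ['し', 'si'],  ['ち', 'ti'],  ['つ', 'tu'],  ['づ', 'zu'], ['く', 'ku']]),
  nihon:   new Map([['しゃ', 'sya'], ['し', 'si'],  ['ち', 'ti'],  ['つ', 'tu'],  ['づ', 'du'], ['く', 'ku']]),
};

function kanaToRomaji(text, system = 'hepburn') {
  const table = TABLES[system];
  if (!table) throw new Error(`Unknown system: ${system}`);
  let out = '';
  let i = 0;
  while (i < text.length) {
    let matched = false;
    // Try a two-kana digraph first, then a single kana.
    for (const len of [2, 1]) {
      const chunk = text.slice(i, i + len);
      if (chunk.length === len && table.has(chunk)) {
        out += table.get(chunk);
        i += len;
        matched = true;
        break;
      }
    }
    // Unknown characters are copied through unchanged.
    if (!matched) { out += text[i]; i += 1; }
  }
  return out;
}

console.log(kanaToRomaji('つづく', 'hepburn')); // tsuzuku
console.log(kanaToRomaji('つづく', 'kunrei'));  // tuzuku
console.log(kanaToRomaji('つづく', 'nihon'));   // tuduku
```

Keeping one table per system means the conversion loop itself never needs to know which system is active.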
ん before vowels
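A subtle Hepburn rule: ん before a vowel or y is written as "n'" with an apostrophe to prevent ambiguity. 禁煙 (きんえん) is "kin'en", which keeps it distinct from 記念 (きねん) "kinen", and 本屋 (ほんや) is "hon'ya". 案内 (あんない) stays "annai" because the ん is followed by な, not by a vowel or y.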
// Detect ん followed by あいうえお or やゆよ and emit "n'" in its place
result = result.replace(/ん([あいうえおやゆよ])/g, "n'$1");
The regex runs on the hiragana source before conversion — that way it sees the actual ん character and can check the following character.
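To make the rule concrete, here is a small self-contained sketch. The `MINI_HEPBURN` table and the `toHepburn` name are assumptions for illustration, and the table holds only the kana needed for the demo. This variant uses a lookahead instead of a capture group, so the following kana is left in place for the table lookup.

```javascript
// Minimal table for the demo; a real converter would use the full Hepburn table.
const MINI_HEPBURN = { 'き': 'ki', 'ね': 'ne', 'え': 'e', 'ん': 'n', 'ほ': 'ho', 'や': 'ya' };

function toHepburn(hiragana) {
  // Step 1: on the kana source, turn ん into "n'" when a vowel or y-row kana follows.
  const marked = hiragana.replace(/ん(?=[あいうえおやゆよ])/g, "n'");
  // Step 2: convert the remaining kana one character at a time;
  // the Latin "n" and the apostrophe are not in the table and pass through.
  return [...marked].map(c => MINI_HEPBURN[c] ?? c).join('');
}

console.log(toHepburn('きんえん')); // kin'en
console.log(toHepburn('きねん'));   // kinen
console.log(toHepburn('ほんや'));   // hon'ya
```

Without step 1, きんえん and きねん would both come out as "kinen".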
Half-width katakana
Half-width katakana (ｱｲｳｴｵ) lives in a different Unicode block: U+FF66-U+FF9F. These forms come from older 8-bit character sets such as JIS X 0201 and are still used in some legacy systems (ATMs, older printers).
The quirk: dakuten and handakuten are separate characters in half-width. ガ is one char in full-width (U+30AC) but two chars in half-width: ｶ (U+FF76) + ﾞ (U+FF9E).
const FULL_TO_HALF = {
  'ア': 'ｱ', 'ガ': 'ｶﾞ', 'ザ': 'ｻﾞ', 'パ': 'ﾊﾟ', 'ヴ': 'ｳﾞ', ...
};
So converting ガガ (2 characters) produces ｶﾞｶﾞ (4 characters). The string length doubles, so the conversion is inherently not length-preserving.
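The length change, and the lookahead needed on the way back, can be sketched as follows. The table is a small illustrative subset and the `fullToHalf` / `halfToFull` names are assumptions. Half-width characters are written as `\u` escapes so they cannot be confused with their full-width twins.

```javascript
// Illustrative subset; voiced and semi-voiced entries map one character to two.
const FULL_TO_HALF = new Map([
  ['ア', '\uFF71'],       // ｱ
  ['カ', '\uFF76'],       // ｶ
  ['ガ', '\uFF76\uFF9E'], // ｶ + dakuten
  ['ハ', '\uFF8A'],       // ﾊ
  ['パ', '\uFF8A\uFF9F'], // ﾊ + handakuten
]);
// Reverse lookup: half-width sequence -> full-width character.
const HALF_TO_FULL = new Map([...FULL_TO_HALF].map(([full, half]) => [half, full]));

function fullToHalf(text) {
  return [...text].map(c => FULL_TO_HALF.get(c) ?? c).join('');
}

function halfToFull(text) {
  let out = '';
  let i = 0;
  while (i < text.length) {
    // Try base + (han)dakuten first, then the base character alone.
    const pair = text.slice(i, i + 2);
    if (pair.length === 2 && HALF_TO_FULL.has(pair)) {
      out += HALF_TO_FULL.get(pair);
      i += 2;
    } else {
      out += HALF_TO_FULL.get(text[i]) ?? text[i];
      i += 1;
    }
  }
  return out;
}

const half = fullToHalf('ガガ');
console.log(half.length);      // 4
console.log(halfToFull(half)); // ガガ
```

JavaScript's built-in `normalize('NFKC')` also folds half-width katakana into full-width, but it rewrites other compatibility characters as well, so an explicit table keeps the conversion scoped to kana.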
Romaji to Hiragana: greedy matching
Going back from romaji requires greedy longest-match:
const TABLE = [
  ['shi', 'し'], ['chi', 'ち'], ['tsu', 'つ'],
  ['sha', 'しゃ'], ['shu', 'しゅ'], ['sho', 'しょ'],
  ['ka', 'か'], ['ki', 'き'], ...
];
export function romajiToHiragana(text) {
  let result = '';
  let i = 0;
  while (i < text.length) {
    let matched = false;
    // Try longer keys first
    for (const [rom, kana] of TABLE) {
      if (text.slice(i, i + rom.length) === rom) {
        result += kana;
        i += rom.length;
        matched = true;
        break;
      }
    }
    // No key matched here: copy the character through unchanged
    if (!matched) { result += text[i]; i++; }
  }
  return result;
}
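The TABLE is sorted so longer keys come first, which means the first hit in the loop is always the longest key that matches at the current position. sha is consumed as the single digraph しゃ rather than being split by a shorter match, and input that matches no key is copied through unchanged.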
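The effect of the ordering is easiest to see with a deliberately unsorted table. The entries and the helper names `sortByKeyLength` and `convert` below are hypothetical; the sketch compares the same greedy loop run against an unsorted and a length-sorted copy of the table.

```javascript
// Hypothetical, deliberately unsorted subset: the bare "n" comes first.
const ENTRIES = [
  ['n', 'ん'], ['na', 'な'], ['ya', 'や'], ['nya', 'にゃ'], ['ko', 'こ'],
];

// Copy and sort so that longer romaji keys are tried first.
const sortByKeyLength = entries =>
  [...entries].sort((a, b) => b[0].length - a[0].length);

function convert(text, table) {
  let result = '';
  let i = 0;
  while (i < text.length) {
    // First entry whose key matches at position i wins.
    const hit = table.find(([rom]) => text.startsWith(rom, i));
    if (hit) { result += hit[1]; i += hit[0].length; }
    else { result += text[i]; i += 1; }
  }
  return result;
}

console.log(convert('nyanko', ENTRIES));                  // んやんこ
console.log(convert('nyanko', sortByKeyLength(ENTRIES))); // にゃんこ
```

Sorting once when the table is built keeps the per-character loop simple, since the loop can stop at the first match.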
Series
This is entry #84 in my 100+ public portfolio series.
- 📦 Repo: https://github.com/sen-ltd/kana-converter
- 🌐 Live: https://sen.ltd/portfolio/kana-converter/
- 🏢 Company: https://sen.ltd/
