
sea-turt1e

Originally published at zenn.dev

【kanjiconv】A Python Conversion Library for Converting "Kanji" to "Kana/Romanized" Text, Supporting Proper Nouns

Introduction

When processing Japanese text, you often need to convert kanji into hiragana, katakana, or romanized text, for instance when creating educational materials for Japanese learners or preprocessing text for speech synthesis. To make this easier, I developed the Python library "kanjiconv," which retrieves the readings and pronunciations of kanji with a few method calls.

"kanjiconv" is built on the morphological analysis engine SudachiPy and its dictionary SudachiDict. This allows for highly accurate readings, including proper nouns.

Previously, I created a similar library based on mecab-unidic-neologd. However, since neologd has stopped being updated, I developed kanjiconv as an alternative.

Note: kanjiconv is licensed under the Apache License 2.0.

GitHub Repository

The GitHub repository for this library is here

Installation

Installing kanjiconv

First, install "kanjiconv" using pip:

pip install kanjiconv

Usage

Import and Instance Generation

First, import the library and create an instance of KanjiConv. You can optionally specify a separator (default is "/").

# Import
from kanjiconv import KanjiConv
# Create an instance and specify the separator (e.g., '/')
kanji_conv = KanjiConv(separator="/")

Convert Kanji to Hiragana

text = "幽☆遊☆白書は、最高の漫画デス。"
print(kanji_conv.to_hiragana(text))
# Output: ゆうゆうはくしょ/は/、/さいこう/の/まんが/です/。

Convert Kanji to Katakana

print(kanji_conv.to_katakana(text))
# Output: ユウユウハクショ/ハ/、/サイコウ/ノ/マンガ/デス/。
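
The hiragana and katakana outputs above differ only in script. As a rough illustration of why, the sketch below maps katakana to hiragana using the fixed offset of 0x60 between the two Unicode blocks. This is a standalone example and not necessarily how kanjiconv performs the conversion internally; the function name `katakana_to_hiragana` is hypothetical and not part of the library.

```python
# Illustrative sketch only: `katakana_to_hiragana` is a hypothetical helper,
# not a kanjiconv function.

# Katakana ァ..ヶ (U+30A1..U+30F6) sit exactly 0x60 code points above
# hiragana ぁ..ゖ (U+3041..U+3096).
KATAKANA_TO_HIRAGANA_OFFSET = 0x60


def katakana_to_hiragana(text: str) -> str:
    """Convert katakana characters to hiragana and leave everything else as-is."""
    converted = []
    for ch in text:
        if "ァ" <= ch <= "ヶ":
            converted.append(chr(ord(ch) - KATAKANA_TO_HIRAGANA_OFFSET))
        else:
            # Separators, punctuation, and the long-vowel mark "ー" pass through.
            converted.append(ch)
    return "".join(converted)


print(katakana_to_hiragana("ユウユウハクショ/ハ/、/サイコウ/ノ/マンガ/デス/。"))
```

Applied to the katakana output shown above, this produces the same string as the hiragana example.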

Convert Kanji to Romanized Text

print(kanji_conv.to_roman(text))
# Output: yuuyuuhakusho/ha/, /saikou/no/manga/desu/.
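
If you want to reuse the romanized output as an ASCII identifier (for example, in a filename or URL slug), a small post-processing step can drop the punctuation tokens. The sketch below assumes the output format shown above, with "/" as the separator; `romaji_to_slug` is a hypothetical helper written for this example, not part of kanjiconv.

```python
# Illustrative sketch only: `romaji_to_slug` is a hypothetical helper,
# not a kanjiconv function.

def romaji_to_slug(romanized: str, separator: str = "/") -> str:
    """Join the word tokens of a separator-delimited romanized string with hyphens."""
    # Tokens such as ", " carry surrounding spaces, so strip each one first.
    tokens = (token.strip() for token in romanized.split(separator))
    # Keep only tokens containing at least one letter or digit,
    # which discards punctuation-only tokens like "," and ".".
    words = [token for token in tokens if any(ch.isalnum() for ch in token)]
    return "-".join(words)


print(romaji_to_slug("yuuyuuhakusho/ha/, /saikou/no/manga/desu/."))
```

For the example string this prints `yuuyuuhakusho-ha-saikou-no-manga-desu`.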

Change the Separator

# Set the separator to '_'
kanji_conv = KanjiConv(separator="_")
print(kanji_conv.to_hiragana(text))
# Output: ゆうゆうはくしょ_は_、_さいこう_の_まんが_です_。

# Use an empty string to output the reading without separators
kanji_conv = KanjiConv(separator="")
print(kanji_conv.to_hiragana(text))
# Output: ゆうゆうはくしょは、さいこうのまんがです。
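
A non-empty separator also makes it possible to recover the individual tokens from the converted string afterwards. The sketch below is a minimal example that only relies on the output format shown above; `split_readings` is a hypothetical helper, not part of kanjiconv.

```python
# Illustrative sketch only: `split_readings` is a hypothetical helper,
# not a kanjiconv function.

def split_readings(converted: str, separator: str = "/") -> list[str]:
    """Split a separator-delimited conversion result back into its tokens."""
    if not separator:
        # With an empty separator the token boundaries are no longer recoverable.
        raise ValueError("separator must be a non-empty string")
    return converted.split(separator)


tokens = split_readings("ゆうゆうはくしょ_は_、_さいこう_の_まんが_です_。", separator="_")
print(tokens)
```

Note that this only works reliably when the separator does not already occur in the input text, so it may be worth choosing a character that is unlikely to appear in your data.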

Conclusion

With "kanjiconv," you can easily convert kanji to hiragana, katakana, or romanized text. It's a handy tool for various applications, such as preprocessing Japanese text or supporting language learning. Give it a try!

License

  • kanjiconv: Apache License 2.0
  • SudachiPy: Apache License 2.0
  • SudachiDict: Apache License 2.0
