Introduction
When processing Japanese text, you often need to convert kanji into hiragana, katakana, or romanized text, for instance when creating educational materials for Japanese learners or preprocessing text for speech synthesis. To make this easier, I developed the Python library "kanjiconv," which retrieves the readings and pronunciations of kanji with a few simple calls.
"kanjiconv" is built on the morphological analysis engine SudachiPy and its dictionary SudachiDict. This allows for highly accurate readings, including proper nouns.
Previously, I created a similar library based on mecab-unidic-neologd. However, since neologd is no longer being updated, I developed kanjiconv as an alternative.
Note: kanjiconv is licensed under the Apache License 2.0.
GitHub Repository
The source code for this library is available on GitHub.
Installation
Installing kanjiconv
First, install "kanjiconv" using pip:
pip install kanjiconv
Usage
Import and Instance Creation
First, import the library and create an instance of KanjiConv. You can optionally specify a separator (default is "/").
# Import
from kanjiconv import KanjiConv
# Create an instance and specify the separator (e.g., '/')
kanji_conv = KanjiConv(separator="/")
Convert Kanji to Hiragana
text = "幽☆遊☆白書は、最高の漫画デス。"
print(kanji_conv.to_hiragana(text))
# Output: ゆうゆうはくしょ/は/、/さいこう/の/まんが/です/。
Convert Kanji to Katakana
print(kanji_conv.to_katakana(text))
# Output: ユウユウハクショ/ハ/、/サイコウ/ノ/マンガ/デス/。
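As a side note on how the hiragana and katakana outputs relate: in Unicode, the katakana block sits exactly 0x60 code points above the hiragana block, so one form can be derived from the other with a simple offset. The sketch below illustrates that relationship using only the standard library. The function name `kata_to_hira` is hypothetical and is not part of kanjiconv, and this is not necessarily how kanjiconv performs the conversion internally.

```python
# Hypothetical helper (not part of kanjiconv): shift katakana to hiragana.
# Katakana U+30A1..U+30F6 map to hiragana U+3041..U+3096 (offset 0x60).
KATAKANA_START = 0x30A1
KATAKANA_END = 0x30F6
OFFSET = 0x60


def kata_to_hira(text: str) -> str:
    """Return text with katakana letters replaced by hiragana.

    Characters outside the katakana letter range (separators,
    punctuation, the long-vowel mark, Latin letters) are left unchanged.
    """
    converted = []
    for ch in text:
        code = ord(ch)
        if KATAKANA_START <= code <= KATAKANA_END:
            converted.append(chr(code - OFFSET))
        else:
            converted.append(ch)
    return "".join(converted)


katakana = "ユウユウハクショ/ハ/、/サイコウ/ノ/マンガ/デス/。"
print(kata_to_hira(katakana))
# Output: ゆうゆうはくしょ/は/、/さいこう/の/まんが/です/。
```

In practice you would simply call to_hiragana or to_katakana as shown above; the sketch is only meant to show why the two outputs line up token for token.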
Convert Kanji to Romanized Text
print(kanji_conv.to_roman(text))
# Output: yuuyuuhakusho/ha/, /saikou/no/manga/desu/.
Change the Separator
# Set the separator to '_'
kanji_conv = KanjiConv(separator="_")
print(kanji_conv.to_hiragana(text))
# Output: ゆうゆうはくしょ_は_、_さいこう_の_まんが_です_。
# Use an empty string to remove the separator entirely
kanji_conv = KanjiConv(separator="")
print(kanji_conv.to_hiragana(text))
# Output: ゆうゆうはくしょは、さいこうのまんがです。
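Because the separator marks the boundaries between words, the converted string can be split back into a list of tokens for further processing. The following is a minimal sketch that works on the output strings shown above using only standard string methods. The helper name `split_reading` is hypothetical and is not part of kanjiconv.

```python
# Hypothetical helper (not part of kanjiconv): split a converted string
# back into tokens using the same separator passed to KanjiConv.
def split_reading(reading: str, separator: str = "/") -> list[str]:
    """Split a separator-delimited reading into a list of tokens."""
    if not separator:
        # With an empty separator the token boundaries are not recoverable,
        # so the whole string is returned as a single item.
        return [reading]
    return reading.split(separator)


reading = "ゆうゆうはくしょ_は_、_さいこう_の_まんが_です_。"
print(split_reading(reading, "_"))
# Output: ['ゆうゆうはくしょ', 'は', '、', 'さいこう', 'の', 'まんが', 'です', '。']
```

One design consideration: if the input text itself may contain the separator character, the split becomes ambiguous, so it is safer to choose a separator that does not occur in your text.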
Conclusion
With "kanjiconv," you can easily convert kanji to hiragana, katakana, or romanized text. It's a handy tool for various applications, such as preprocessing Japanese text or supporting language learning. Give it a try!
License
- kanjiconv: Apache License 2.0
- SudachiPy: Apache License 2.0
- SudachiDict: Apache License 2.0