Korean text is fundamentally different from English: particles and endings attach directly to stems, so whitespace tokenization alone gives poor results. If you're building AI agents that process Korean text, here's how to do it properly.
Why Korean NLP Is Hard
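Korean is an agglutinative language. The word "학교에서" means "at school" or "from school", but it is a single space-delimited unit made of two morphemes: the noun 학교 (school) and the particle 에서. Split on whitespace alone and every noun-plus-particle combination becomes its own token, which fragments the vocabulary and degrades embeddings.
Most English NLP pipelines assume that spaces separate words. Korean doesn't work that way.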
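To make the problem concrete, here is a minimal sketch in plain Python. The three sample sentences are illustrative assumptions, not data from any real corpus. Whitespace splitting treats each particle-attached form of 학교 as a different token, and the bare noun never appears in the vocabulary at all.

```python
# Three sample sentences that all mention 학교 (school) with a different particle attached.
sentences = [
    "저는 학교에서 공부했습니다",  # school + 에서 (at/from)
    "저는 학교를 좋아합니다",      # school + 를 (object marker)
    "학교는 집에서 가깝습니다",    # school + 는 (topic marker)
]


def whitespace_tokens(text):
    """Naive tokenizer: split on whitespace, as an English pipeline would."""
    return text.split()


# Build the vocabulary a whitespace tokenizer would produce.
vocab = set()
for sentence in sentences:
    vocab.update(whitespace_tokens(sentence))

# Every surface form of "school" becomes its own vocabulary entry.
school_forms = sorted(t for t in vocab if t.startswith("학교"))
print(len(school_forms), "distinct tokens for one noun:", school_forms)
print("bare noun in vocab:", "학교" in vocab)
```

A morpheme analyzer would map all three forms back to the same noun, which is what the rest of this post relies on.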
Morpheme Analysis
import requests

API_KEY = "your-key"

# Analyze Korean text into morphemes
resp = requests.post(
    "https://api.lazy-mac.com/korean-nlp/morphemes",
    json={"text": "저는 오늘 학교에서 공부했습니다"},
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=10,  # avoid hanging forever on a stalled connection
)
resp.raise_for_status()  # raise on HTTP errors instead of parsing an error body
result = resp.json()
# {
#   "morphemes": [
#     {"token": "저", "pos": "NP", "meaning": "I/me"},
#     {"token": "는", "pos": "JX", "meaning": "topic marker"},
#     {"token": "오늘", "pos": "MAG", "meaning": "today"},
#     {"token": "학교", "pos": "NNG", "meaning": "school"},
#     {"token": "에서", "pos": "JKB", "meaning": "from/at"},
#     {"token": "공부", "pos": "NNG", "meaning": "study"},
#     {"token": "했습니다", "pos": "XSV+EP+EF", "meaning": "did (formal)"}
#   ]
# }
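Once you have the morphemes, a common next step is to keep only the ones that carry content before embedding or indexing. The sketch below works on a copy of the sample response shown above, so it runs without a network call. The helper name `tokens_with_pos` and the choice of prefixes are assumptions for illustration; they are not part of the API.

```python
# Sample response copied from the morpheme example (no network call needed).
result = {
    "morphemes": [
        {"token": "저", "pos": "NP", "meaning": "I/me"},
        {"token": "는", "pos": "JX", "meaning": "topic marker"},
        {"token": "오늘", "pos": "MAG", "meaning": "today"},
        {"token": "학교", "pos": "NNG", "meaning": "school"},
        {"token": "에서", "pos": "JKB", "meaning": "from/at"},
        {"token": "공부", "pos": "NNG", "meaning": "study"},
        {"token": "했습니다", "pos": "XSV+EP+EF", "meaning": "did (formal)"},
    ]
}


def tokens_with_pos(morphemes, prefixes):
    """Return tokens whose POS tag starts with any of the given prefixes.

    Hypothetical helper: relies only on the "token" and "pos" fields
    that appear in the sample response.
    """
    return [m["token"] for m in morphemes if m["pos"].startswith(prefixes)]


# Tags starting with "NN" are nouns; tags starting with "J" are particles.
nouns = tokens_with_pos(result["morphemes"], ("NN",))
particles = tokens_with_pos(result["morphemes"], ("J",))
print("nouns:", nouns)
print("particles:", particles)
```

Matching on tag prefixes rather than exact tags keeps the filter short, at the cost of treating compound tags such as "XSV+EP+EF" by their first component only.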
Sentiment Analysis
Standard sentiment models trained on English have no notion of Korean formality markers, so they misread them. Korean has multiple politeness levels, and the level a writer chooses changes the emotional register of the text.
resp = requests.post(
    "https://api.lazy-mac.com/korean-nlp/sentiment",
    json={
        "text": "이 제품 정말 별로네요",  # "This product is really not great"
        "include_formality": True,
    },
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=10,
)
resp.raise_for_status()
sentiment = resp.json()
# {
#   "score": -0.72,
#   "label": "negative",
#   "formality": "polite",
#   "confidence": 0.91
# }
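One way an agent might act on this response is to combine the label with the formality level, since a politely worded complaint is easy to miss if you only skim the tone. The following is a rough sketch: the field names come from the sample response above, but the function `triage`, the 0.8 confidence threshold, and the returned category names are all hypothetical choices.

```python
def triage(sentiment, min_confidence=0.8):
    """Map a sentiment response to a coarse routing category.

    Hypothetical helper: the threshold and category names are illustrative.
    """
    # Low-confidence predictions go to a human instead of being trusted.
    if sentiment["confidence"] < min_confidence:
        return "needs_review"
    # Polite wording can mask a clearly negative opinion.
    if sentiment["label"] == "negative" and sentiment.get("formality") == "polite":
        return "polite_complaint"
    return sentiment["label"]


# Sample response copied from the sentiment example.
sample = {"score": -0.72, "label": "negative", "formality": "polite", "confidence": 0.91}
print(triage(sample))
```

Using `.get("formality")` keeps the function usable when the request was sent without `include_formality`.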
Keyword Extraction for Korean Content
For SEO and content analysis, extract meaningful Korean keywords:
long_article_text = "..."  # placeholder: replace with the article text to analyze

resp = requests.post(
    "https://api.lazy-mac.com/korean-nlp/keywords",
    json={
        "text": long_article_text,
        "top_n": 10,
        "pos_filter": ["NNG", "NNP"],  # common and proper nouns only
    },
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=10,
)
resp.raise_for_status()
keywords = resp.json()["keywords"]
# [{"word": "인공지능", "frequency": 12, "tfidf_score": 0.89}, ...]
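To build intuition for what `pos_filter` and `top_n` do, here is a small local sketch that counts noun morphemes with `collections.Counter`. It is not how the API computes its results: it only reproduces the frequency part, because a TF-IDF score needs a reference corpus. The morpheme list is hand-written sample data and `top_keywords` is a hypothetical helper.

```python
from collections import Counter

# Hand-written sample morphemes (illustrative data, not real API output).
morphemes = [
    {"token": "인공지능", "pos": "NNG"},
    {"token": "은", "pos": "JX"},
    {"token": "서울", "pos": "NNP"},
    {"token": "에서", "pos": "JKB"},
    {"token": "인공지능", "pos": "NNG"},
    {"token": "을", "pos": "JKO"},
    {"token": "연구", "pos": "NNG"},
]


def top_keywords(morphemes, pos_filter, top_n):
    """Count tokens whose POS tag is in pos_filter and return the top_n."""
    counts = Counter(m["token"] for m in morphemes if m["pos"] in pos_filter)
    return counts.most_common(top_n)


print(top_keywords(morphemes, {"NNG", "NNP"}, 2))
```

Without the POS filter, particles such as 은 and 을 would compete with real keywords, which is why filtering to nouns happens before counting.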
Named Entity Recognition
resp = requests.post(
    "https://api.lazy-mac.com/korean-nlp/ner",
    json={"text": "삼성전자 이재용 회장이 서울 본사에서 발표했습니다"},
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=10,
)
resp.raise_for_status()
entities = resp.json()["entities"]
# [
#   {"text": "삼성전자", "type": "ORG"},
#   {"text": "이재용", "type": "PERSON"},
#   {"text": "서울", "type": "LOC"}
# ]
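Downstream code usually wants entities grouped by type rather than as a flat list. A minimal sketch, assuming the response shape shown in the sample above (the helper name `group_by_type` is made up for this example):

```python
from collections import defaultdict

# Sample entities copied from the NER example (no network call needed).
entities = [
    {"text": "삼성전자", "type": "ORG"},
    {"text": "이재용", "type": "PERSON"},
    {"text": "서울", "type": "LOC"},
]


def group_by_type(entities):
    """Group entity texts under their entity type, preserving input order."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["type"]].append(entity["text"])
    return dict(grouped)


by_type = group_by_type(entities)
print(by_type)
```

From here an agent can look up `by_type.get("ORG", [])` without scanning the whole list each time.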