LexicRo Phase 1 is live — Romanian NLP API now callable

Peter Abolins — Sun, 10 May 2026 11:54:53 +0000

A few weeks ago I wrote about building the Romanian NLP API that should already exist. Today Phase 1 is live.

What's callable right now:

GET /conjugate/{verb} — full conjugation table across all moods and tenses, including perfect simplu and viitor I
GET /lookup/{word} — definitions from DEXonline (DEX '09, MDA2, DLRLC), HTML stripped, source attributed
GET /inflect/{word} — basic inflection info extracted from dictionary headers
POST /difficulty — word validation against standard Romanian dictionaries
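Here is a minimal standard-library Python sketch for calling the Phase 1 GET endpoints from code instead of the Swagger UI. It makes two assumptions, so check both against the OpenAPI spec. First, the paths sit directly under https://api.lexicro.com (the host serving the docs) with no version prefix. Second, /conjugate accepts the bare infinitive without the particle "a". The sketch does not assume any response schema, and it leaves out API-key handling because the post does not say how the key is sent. The detail worth getting right is percent-encoding: Romanian diacritics must be UTF-8 encoded in the path.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

# Assumed: endpoints live directly under the host that serves /docs, with no version prefix.
BASE_URL = "https://api.lexicro.com"


def endpoint_url(endpoint: str, word: str) -> str:
    """Build e.g. https://api.lexicro.com/lookup/c%C3%A2ine from ("lookup", "câine")."""
    # quote() UTF-8 encodes diacritics; safe="" also escapes "/" so a word stays one path segment.
    return f"{BASE_URL}/{endpoint}/{urllib.parse.quote(word, safe='')}"


def get_json(endpoint: str, word: str, timeout: float = 10.0) -> Any:
    """GET one of the Phase 1 endpoints and decode the JSON body, whatever its shape."""
    request = urllib.request.Request(
        endpoint_url(endpoint, word), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


print(endpoint_url("conjugate", "cânta"))  # https://api.lexicro.com/conjugate/c%C3%A2nta
```

Calling `get_json("lookup", "câine")` performs the real network request. Building the URL separately keeps the encoding step easy to check offline.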

Free tier: 1,000 requests/day with an API key, no credit card. Anonymous use needs no account but is capped at 10 requests/day.
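When a batch job shares one key, a small client-side guard can keep it under the quota so it does not find the limit through failed calls. This sketch assumes the server counts requests per calendar day. The post does not say which timezone the reset follows, so `today` is injectable; the default, `date.today`, uses local time. The class is illustrative and not part of any LexicRo SDK.

```python
from datetime import date
from typing import Callable, Optional

# Quotas from the announcement: 1,000 requests/day with a key, 10 without.
KEYED_DAILY_LIMIT = 1_000
ANONYMOUS_DAILY_LIMIT = 10


class DailyBudget:
    """Client-side request counter that resets when the calendar date changes."""

    def __init__(self, api_key: Optional[str] = None, today: Callable[[], date] = date.today):
        self.limit = KEYED_DAILY_LIMIT if api_key else ANONYMOUS_DAILY_LIMIT
        self._today = today
        self._day = today()
        self.used = 0

    def try_acquire(self) -> bool:
        """Count one request if budget remains; return False once the day's quota is spent."""
        current = self._today()
        if current != self._day:  # a new day: start a fresh count
            self._day = current
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


budget = DailyBudget(api_key=None)
if budget.try_acquire():
    print("ok to call the API;", budget.limit - budget.used, "calls left today")
```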

Try it: https://api.lexicro.com/docs

The interactive Swagger UI is live — you can call every endpoint directly from the browser without writing any code.

What's next (Phase 2): Fine-tuning bert-base-romanian-cased-v1 for morphological analysis — the POST /analyze endpoint that returns lemma, POS, case, gender, number, person, and tense per token. That's the hard part. Building in public — feedback still very welcome.

→ Original announcement: https://dev.to/peterabolins/building-the-romanian-nlp-api-that-should-already-exist-2gg7
→ GitHub: https://github.com/LexicRo/lexicro

Building the Romanian NLP API that should already exist

Peter Abolins — Sat, 18 Apr 2026 11:47:45 +0000

If you've tried to do anything programmatic with Romanian text, you've probably hit the same wall I did.

There's no clean API for it. You end up scraping DEXonline, wrestling with incomplete library support, or calling a general-purpose LLM and hoping it gets the grammar right. None of that is good enough for production.

The specific gap

Given an arbitrary Romanian sentence, return for each token: its lemma, part of speech, grammatical case, number, gender, person, and tense.
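One possible shape for a per-token result, written as a Python dataclass, makes that contract concrete. The field names and value vocabulary (UD-style tags such as "Nom", "Masc", "Sing") are hypothetical and not LexicRo's published schema. Fields that don't apply to a part of speech are left as None.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TokenAnalysis:
    """One token's analysis; fields that do not apply to its part of speech stay None."""
    token: str
    lemma: str
    pos: str                      # e.g. a UD UPOS tag such as "NOUN" or "VERB"
    case: Optional[str] = None    # nominal words: nouns, pronouns, adjectives
    gender: Optional[str] = None
    number: Optional[str] = None
    person: Optional[str] = None  # verbs and personal pronouns
    tense: Optional[str] = None   # finite verbs


def to_json(tokens: list[TokenAnalysis]) -> str:
    # ensure_ascii=False keeps Romanian diacritics readable instead of \u escapes.
    return json.dumps([asdict(t) for t in tokens], ensure_ascii=False)


# "Câinele doarme." ("The dog sleeps.")
sentence = [
    TokenAnalysis("Câinele", "câine", "NOUN", case="Nom", gender="Masc", number="Sing"),
    TokenAnalysis("doarme", "dormi", "VERB", number="Sing", person="3", tense="Pres"),
]
print(to_json(sentence))
```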

This is what spaCy does for English, French, and German in a pip install. For Romanian, no equivalent exists as a callable REST API.

Romanian NLP tooling sits at roughly 15% of what exists for English. The academic resources are there — DEXonline (313k+ lemmas), RoLEX (330k morphosyntactic entries), the Universal Dependencies Romanian Treebank — they're just not packaged in a way developers can actually use.
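The UD Romanian Treebank is distributed in CoNLL-U format. Each word line has ten tab-separated columns. Morphology sits in the FEATS column as `Name=Value` pairs joined by `|`, and multiple values for one feature are comma-separated. The sketch below extracts the per-token fields listed above from a single line. The sample line was constructed for illustration, not copied from the treebank, and the output dict's keys are hypothetical.

```python
from typing import Optional

# The ten CoNLL-U columns, in the order Universal Dependencies defines them.
COLUMNS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def parse_feats(feats: str) -> dict[str, list[str]]:
    """Split 'Case=Acc,Nom|Number=Sing' into {'Case': ['Acc', 'Nom'], 'Number': ['Sing']}."""
    if feats == "_":  # underscore marks an empty field
        return {}
    parsed: dict[str, list[str]] = {}
    for pair in feats.split("|"):
        name, values = pair.split("=", 1)
        parsed[name] = values.split(",")
    return parsed


def parse_word_line(line: str) -> Optional[dict]:
    """Map one CoNLL-U word line to per-token fields; return None for non-word lines."""
    if not line.strip() or line.startswith("#"):
        return None  # blank sentence separator or sentence-level comment
    cells = dict(zip(COLUMNS, line.rstrip("\n").split("\t")))
    if "-" in cells["id"] or "." in cells["id"]:
        return None  # multiword-token range (e.g. "3-4") or empty node (e.g. "5.1")
    feats = parse_feats(cells["feats"])
    record = {"token": cells["form"], "lemma": cells["lemma"], "pos": cells["upos"]}
    for feature in ("Case", "Gender", "Number", "Person", "Tense"):
        record[feature.lower()] = feats.get(feature)  # None when the word lacks it
    return record


sample = "1\tCâinele\tcâine\tNOUN\t_\tCase=Acc,Nom|Definite=Def|Gender=Masc|Number=Sing\t2\tnsubj\t_\t_"
print(parse_word_line(sample))
```

Romanian uses one form for both nominative and accusative, so a definite noun like câinele carries both case values until context (here, its subject relation) disambiguates it.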

But why not just use ChatGPT?

Fair question. The short answer: for production, an LLM is not infrastructure; it's an oracle. The definite accusative singular of câine is always câinele, regardless of the LLM's mood that day. LexicRo returns deterministic, structured linguistic data by contract.

Beyond that, LLM costs at scale are unpredictable; a dedicated conjugation endpoint responds in under 50ms, versus 500ms–3s for an LLM call; and getting structured output from an LLM requires prompt engineering, validation, and retry logic. LexicRo returns clean JSON, every time.
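The sketch below shows what "validation and retry logic" looks like in practice: a wrapper that an LLM-based pipeline typically ends up needing. `ask_llm` is a hypothetical stand-in for whatever client you use (prompt in, raw text out), and the required keys are illustrative. The wrapper has a clear limit: it checks that the JSON has the right shape, but it cannot check that the grammar inside it is correct.

```python
import json
from typing import Callable

# Illustrative keys a caller might insist on; not a real schema.
REQUIRED_KEYS = {"lemma", "pos", "case", "number"}


def ask_with_retries(ask_llm: Callable[[str], str], prompt: str, max_attempts: int = 3) -> dict:
    """Re-ask until the reply is a JSON object with REQUIRED_KEYS, else raise ValueError."""
    last_error = "no attempts made"
    for attempt in range(1, max_attempts + 1):
        raw = ask_llm(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = f"attempt {attempt}: invalid JSON ({exc.msg})"
            continue
        if not isinstance(data, dict):
            last_error = f"attempt {attempt}: expected a JSON object"
            continue
        missing = REQUIRED_KEYS - set(data)
        if missing:
            last_error = f"attempt {attempt}: missing {sorted(missing)}"
            continue
        return data  # structurally valid; the linguistic content is still unverified
    raise ValueError(f"LLM output never validated: {last_error}")


# Offline demo with a fake model that only produces usable output on the third try.
replies = iter([
    "Sure! Here is the analysis:",
    '{"lemma": "câine"}',
    '{"lemma": "câine", "pos": "NOUN", "case": "Acc", "number": "Sing"}',
])
print(ask_with_retries(lambda prompt: next(replies), "Analyse 'câinele' as JSON"))
```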

What I'm building

LexicRo — an open-core, hosted API platform covering the endpoints Romanian developers actually need:

POST /analyze
→ lemma, POS, case, gender, number, person, tense per token

GET /conjugate/{verb}
→ full conjugation table — all moods and tenses including
  perfect simplu and viitor I (both tested at B1+)

GET /inflect/{word}
→ all inflected forms across cases, numbers, genders

GET /lookup/{word}
→ lexical data from DEXonline: definition, gender, plural, etymology

POST /difficulty
→ CEFR level scoring (A1–C2), calibrated to Romanian B1/B2 exams
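To show what "all inflected forms" covers for a single noun, here is a hand-written paradigm for câine, the example used earlier, stored as a plain lookup table. The forms are standard Romanian. The key layout and the `inflect` helper are hypothetical and are not LexicRo's response format. Romanian shares one form between nominative and accusative and another between genitive and dative, so the four cases collapse into two columns. The vocative is left out to keep the table short.

```python
# Hand-written paradigm for the masculine noun câine ("dog"), keyed by
# (number, definiteness, case group). Vocative forms are omitted.
CAINE = {
    ("Sing", "Indef", "Nom/Acc"): "câine",
    ("Sing", "Indef", "Gen/Dat"): "câine",     # used after "unui"
    ("Sing", "Def", "Nom/Acc"): "câinele",
    ("Sing", "Def", "Gen/Dat"): "câinelui",
    ("Plur", "Indef", "Nom/Acc"): "câini",
    ("Plur", "Indef", "Gen/Dat"): "câini",     # used after "unor"
    ("Plur", "Def", "Nom/Acc"): "câinii",
    ("Plur", "Def", "Gen/Dat"): "câinilor",
}

# Nominative/accusative and genitive/dative each share a single form.
CASE_GROUPS = {"Nom": "Nom/Acc", "Acc": "Nom/Acc", "Gen": "Gen/Dat", "Dat": "Gen/Dat"}


def inflect(paradigm: dict[tuple[str, str, str], str], number: str, definite: str, case: str) -> str:
    """Look up one form; the same inputs always give the same output."""
    if case not in CASE_GROUPS:
        raise ValueError(f"unsupported case: {case!r}")
    return paradigm[(number, definite, CASE_GROUPS[case])]


print(inflect(CAINE, "Sing", "Def", "Acc"))  # câinele
print(inflect(CAINE, "Plur", "Def", "Dat"))  # câinilor
```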

Technical approach

Not starting from scratch — the data and models are there:

Base model: bert-base-romanian-cased-v1 fine-tuned for morphological tagging
Conjugation: verbecc Romanian XML templates, extended with full B1+ tense coverage
Lexical: DEXonline database dump + RoLEX dataset
Infrastructure: FastAPI, Docker, full OpenAPI spec, Python and JS SDKs
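The template idea behind the conjugation tables fits in a few lines. This is not verbecc's XML format or API. It is a hypothetical minimal version that covers only the two tenses called out above, and only for regular first-conjugation verbs with an -a infinitive (a cânta, a lucra). The perfect simplu adds fixed endings to the stem. Viitor I pairs the auxiliary voi/vei/va/vom/veți/vor with the infinitive. Irregular verbs such as a da and a sta, and -ea/-ia infinitives, follow other patterns, so the sketch rejects them instead of guessing.

```python
# Person order: eu, tu, el/ea, noi, voi, ei/ele.
PRONOUNS = ["eu", "tu", "el/ea", "noi", "voi", "ei/ele"]

# Perfect simplu endings for regular first-conjugation (-a) verbs.
PERFECT_SIMPLU_ENDINGS = ["ai", "ași", "ă", "arăm", "arăți", "ară"]

# Viitor I: literary auxiliary forms followed by the bare infinitive.
VIITOR_I_AUX = ["voi", "vei", "va", "vom", "veți", "vor"]

# Known exceptions this sketch refuses rather than conjugating wrongly.
IRREGULAR = {"da", "sta"}


def conjugate_regular_a(infinitive: str) -> dict[str, list[str]]:
    """Return perfect simplu and viitor I forms for a regular -a verb (bare infinitive)."""
    if infinitive in IRREGULAR or infinitive.endswith(("ea", "ia")) or not infinitive.endswith("a"):
        raise ValueError(f"{infinitive!r} is outside this sketch's regular -a pattern")
    stem = infinitive[:-1]  # cânta -> cânt
    return {
        "perfect simplu": [stem + ending for ending in PERFECT_SIMPLU_ENDINGS],
        "viitor I": [f"{aux} {infinitive}" for aux in VIITOR_I_AUX],
    }


for pronoun, form in zip(PRONOUNS, conjugate_regular_a("cânta")["perfect simplu"]):
    print(pronoun, form)
```

A production table also needs stem alternations and the other conjugation classes, which is why a template library is the starting point rather than hand-written rules.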

Licence and access

Code: MIT (self-hostable)
Model weights: CC BY-NC 4.0 (free for research/non-commercial, commercial use via hosted API)
Free tier: 1,000 req/day, no credit card, all endpoints
Paid tiers from €9/month for production use

Phase 1 ships first

The conjugation and lexical lookup endpoints are the straightforward part: wrapping verbecc and DEXonline cleanly. That's what ships first (~3 months). The morphological analyser (the hard part, requiring fine-tuned BERT) follows in Phase 2.

What I'm looking for

I'm in pre-development and genuinely looking for:

Feedback on the endpoint design — does this cover what you'd actually need?
Early users working with Romanian text at any scale
Academic connections — pursuing EU grant funding (Horizon Europe, CEF Digital)
Anyone who's built adjacent to this — what did you learn?

Links: lexicro.com · github.com/LexicRo · contact@lexicro.com

Romanian deserves the same NLP infrastructure as French or German. Building it in public — feedback welcome.

DEV Community: Peter Abolins