TL;DR. Every open-source LMS treats internationalization as one problem. It's three. UI strings, user-generated content, and canonical artifacts each need a different mechanism. Most codebases collapse them into one — that's the bug.
We've been building an open-source Bible school LMS for about 5 months. It runs in Russian and English (Ukrainian coming), with auto-translated user content and canonical-text preservation for scripture quotes. Building this forced me to look at how Moodle, Open edX, Canvas LMS, and Chamilo handle the same problem.
The pattern is the same in all of them — and was the same in ours when we started.
Mistake #1: User-generated content treated like UI strings
Every LMS has solid gettext-style infrastructure for UI strings. "Sign in", "Course catalog", "Submit assignment": these live in PHP language files (Moodle), gettext .po files managed through Transifex (Open edX), or i18n-js plus Rails YAML locale files (Canvas). A translator translates the file once. Done.
User-generated content is a different problem entirely. When a teacher authors a course in Russian, the title — "Введение в Послание к Римлянам" — is a row in courses.title. An English-speaking student opens the catalog and sees Cyrillic. The UI is translated. The content isn't.
| | UI strings | User-generated content |
|---|---|---|
| Examples | "Sign in", "Submit" | Course title, lesson body, quiz question |
| Source | Fixed catalog of values | Unbounded user input |
| Tool | gettext / Transifex / YAML | Runtime translation (Google, DeepL, Gemini) |
| When | Build / release time | Runtime (lazy or eager) |
| Storage | Locale file | Separate cache table |
| Common bug | None major | Treated like UI strings → only one language ever shown |
Moodle's workaround is the multilang filter: you wrap content in <span lang="ru" class="multilang">…</span><span lang="en" class="multilang">…</span> and the filter shows the span matching the viewer's language. This is a 2008 solution. It puts the entire burden on the teacher, who must author every piece of content once per language. Most don't, and the platform falls back to "show whatever the teacher wrote first."
The shape of the right answer is a separate translation cache:
```sql
CREATE TABLE content_translations (
    entity_type TEXT NOT NULL,        -- 'course', 'lesson', 'quiz_question'
    entity_id   UUID NOT NULL,
    field       TEXT NOT NULL,        -- 'title', 'description', 'body'
    locale      TEXT NOT NULL,        -- 'ru', 'en', 'uk', 'es'
    content     TEXT NOT NULL,
    source      TEXT NOT NULL,        -- 'human' | 'machine' | 'canonical'
    cached_at   TIMESTAMPTZ NOT NULL,
    PRIMARY KEY (entity_type, entity_id, field, locale)
);
```
Teacher authors in one language. A translation worker fills in the others lazily (first request) or eagerly (on publish). The course-detail endpoint joins on (entity_type, entity_id, field, viewer_locale).
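The lazy path can be sketched roughly as follows. This is a minimal illustration, not our production code: it assumes SQLite instead of Postgres (so `entity_id` and `cached_at` are stored as text), and `get_localized`, `SCHEMA`, and the `translate` callable are hypothetical names standing in for whatever your stack provides.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable

# SQLite rendition of the cache table above (assumption: text columns for UUID/timestamp).
SCHEMA = """
CREATE TABLE content_translations (
    entity_type TEXT NOT NULL,
    entity_id   TEXT NOT NULL,
    field       TEXT NOT NULL,
    locale      TEXT NOT NULL,
    content     TEXT NOT NULL,
    source      TEXT NOT NULL,
    cached_at   TEXT NOT NULL,
    PRIMARY KEY (entity_type, entity_id, field, locale)
)
"""


def get_localized(
    db: sqlite3.Connection,
    entity_type: str,
    entity_id: str,
    field: str,
    original: str,
    source_locale: str,
    viewer_locale: str,
    translate: Callable[[str, str, str], str],
) -> str:
    """Return a field in the viewer's locale, translating and caching on a miss."""
    if viewer_locale == source_locale:
        return original  # the author's own text needs no translation

    key = (entity_type, entity_id, field, viewer_locale)
    row = db.execute(
        "SELECT content FROM content_translations "
        "WHERE entity_type = ? AND entity_id = ? AND field = ? AND locale = ?",
        key,
    ).fetchone()
    if row is not None:
        return row[0]  # cache hit: human or machine translation already stored

    # Cache miss: translate once and store the result as a machine translation.
    translated = translate(original, source_locale, viewer_locale)
    db.execute(
        "INSERT OR REPLACE INTO content_translations "
        "VALUES (?, ?, ?, ?, ?, 'machine', ?)",
        (*key, translated, datetime.now(timezone.utc).isoformat()),
    )
    db.commit()
    return translated
```

The eager variant would call the same function for every supported locale when the teacher hits publish, so the first student never waits on the translation API.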
The architectural decision: content is not a UI string and cannot live in the same system. If your LMS uses gettext for "Submit assignment" and the same mechanism for course titles, that's the bug.
Mistake #2: No accommodation for length variance
English is dense. Russian runs ~25–30% longer for the same meaning. German is comparable. Finnish is worse. Arabic is shorter — and right-to-left, its own category.
Most LMS UIs are designed in English. Buttons that fit "Save" don't fit "Сохранить". Nav tabs that fit "Courses" wrap to two lines for "Курсы и обучение". Mobile breaks first.
Part of the fix is CSS:
```css
/* Reserve space against the longer-language baseline */
.action-button {
  min-width: 8rem;
  white-space: nowrap;
  overflow: hidden;
  text-overflow: ellipsis;
}

/* Tabular content: don't let translation reflow the grid */
.lesson-list-cell {
  min-height: 4.5rem; /* fits 2-line Russian titles without shift */
}
```
But the real fix is testing every screen in your longest-content language, not just English. Most teams hire designers who only see the English mock and don't realize their "Continue" button is broken in Russian until a user reports it.
What helped us: a Storybook locale switcher defaulting to Russian (not English), and a Playwright snapshot suite that screenshots both locales. The first commit that breaks Russian layout is caught in CI, not by a user.
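A cheaper complement to screenshot tests is a length check over the UI string catalogs themselves. The sketch below is an assumption, not something in our repo: `find_overlong` is a hypothetical helper, the catalogs are plain dicts, and the 1.3 threshold is just the rough 25-30% figure from above. Character count is only a proxy for rendered width, so treat hits as "look at this screen", not as failures.

```python
def find_overlong(
    source: dict[str, str],
    target: dict[str, str],
    max_ratio: float = 1.3,
    min_length: int = 4,
) -> list[tuple[str, float]]:
    """Return (key, ratio) pairs where the translation is much longer than the source.

    Strings shorter than min_length are skipped because tiny labels
    produce noisy ratios.
    """
    flagged = []
    for key, src_text in source.items():
        dst_text = target.get(key)
        if dst_text is None or len(src_text) < min_length:
            continue
        ratio = len(dst_text) / len(src_text)
        if ratio > max_ratio:
            flagged.append((key, round(ratio, 2)))
    # Worst offenders first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


en = {"save": "Save", "continue": "Continue", "courses": "Courses"}
ru = {"save": "Сохранить", "continue": "Продолжить", "courses": "Курсы"}
print(find_overlong(en, ru))  # [('save', 2.25)]
```

Short labels are where the ratio blows up, which matches where we saw buttons break first.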
Mistake #3: Canonical content forced through translation
This is the bug that motivated our entire content-translation rewrite.
A teacher in Russian writes: "Послание к Римлянам 8:28 говорит, что все содействует ко благу". The course gets auto-translated to English. Naively you send the whole string to Google Translate or DeepL, and you get back: "Romans 8:28 says that everything works together for good." That's almost a real Bible verse — but it isn't quoting any actual translation. It's a translation of a translation.
You don't want that. You want the actual KJV (or NIV, or ESV) text of Romans 8:28 spliced in.
Same problem exists everywhere there's canonical content:
- Programming courses with code samples (don't translate the variables!)
- Math curricula with formulas
- Legal courses citing statutes
- Medical courses citing clinical-trial registries
- Literature courses with original-language quotes
Most LMSes don't separate "translatable prose" from "canonical artifact" — so when they auto-translate, the canonical content gets mangled or invented.
Our pattern: parse the source for canonical references, replace each with a placeholder token, translate the surrounding prose, then substitute the canonical text back from a separate lookup.
```python
def translate_with_canonical_preservation(
    text: str, source_lang: str, target_lang: str
) -> str:
    # extract_bible_refs, translate, and lookup_canonical are defined elsewhere.

    # 1. Find canonical references written in the source language
    refs = extract_bible_refs(text, lang=source_lang)
    # [{"raw": "Послание к Римлянам 8:28", "book": "ROM", "chapter": 8, "verses": [28]}]

    # 2. Replace each with a unique placeholder
    placeholders = {}
    for i, ref in enumerate(refs):
        token = f"⟦CANON_{i}⟧"
        placeholders[token] = ref
        text = text.replace(ref["raw"], token, 1)

    # 3. Translate the token-bearing string
    translated = translate(text, source_lang, target_lang)

    # 4. Substitute the canonical text back, looked up in the target translation
    for token, ref in placeholders.items():
        canonical_text = lookup_canonical(ref, target_lang)  # KJV for en, Synodal for ru
        translated = translated.replace(token, canonical_text)
    return translated
```
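One thing worth guarding between steps 3 and 4: a translation engine may drop, duplicate, or rewrite an unfamiliar token. The check below is a sketch of such a guard, assuming the ⟦CANON_n⟧ token format used above; `check_tokens` is a hypothetical helper name.

```python
import re

# Matches the placeholder format used in step 2.
TOKEN_RE = re.compile(r"⟦CANON_(\d+)⟧")


def check_tokens(translated: str, expected: int) -> list[str]:
    """Return a description of every placeholder that does not appear exactly once."""
    found = TOKEN_RE.findall(translated)
    problems = []
    for i in range(expected):
        count = found.count(str(i))
        if count != 1:
            problems.append(f"⟦CANON_{i}⟧ appears {count} times")
    return problems
```

If the list is non-empty, one option is to skip caching and fall back to showing the original text rather than storing a translation with a missing quote.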
How extract_bible_refs handles the cross-language book-name matrix
Detection is a regex over a normalized book-name dictionary. Each canonical book has an entry like:
```python
BOOKS = {
    "ROM": {
        "en": ["Romans", "Rom", "Rom."],
        "ru": ["Послание к Римлянам", "Римлянам", "Рим", "Рим."],
        "uk": ["Послання до Римлян", "Римлян", "Рим"],
    },
    # ... 65 more books
}
```
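The regex is built per request. In simplified form it is (book_alias_1|book_alias_2|...)\s+(\d+):(\d+)(?:[-–](\d+))? and it accepts Romans 8:28, Послание к Римлянам 8:28, Рим 8:28, Рим. 8:28, and Romans 8:28-29. As written it matches only the Romans 8:28 part of Romans 8:28a; the letter suffix needs extra handling.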
Sundry edge cases: chapter-only refs (Romans 8 — whole-chapter), letter suffixes (8:28a — first half of verse), em-dash vs hyphen, non-breaking spaces. Each lives in unit tests.
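To make the detection step concrete, here is a compact sketch that builds the pattern from a book dictionary and covers the edge cases listed above. It is an illustration, not our actual `extract_bible_refs`: the names `build_ref_pattern` and `extract_refs`, the extra `suffix` key, and the choice to allow only suffix letters a-c are assumptions.

```python
import re

BOOKS = {
    "ROM": {
        "en": ["Romans", "Rom", "Rom."],
        "ru": ["Послание к Римлянам", "Римлянам", "Рим", "Рим."],
    },
}


def build_ref_pattern(books: dict, lang: str) -> tuple[re.Pattern, dict[str, str]]:
    """Compile a reference regex for one language and return it with an alias lookup."""
    alias_to_book = {
        alias: code
        for code, names in books.items()
        for alias in names.get(lang, [])
    }
    # Longest alias first so "Rom." is tried before "Rom".
    aliases = sorted(alias_to_book, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?P<book>" + "|".join(map(re.escape, aliases)) + r")"
        r"\s+(?P<chapter>\d+)"                   # \s also matches non-breaking spaces
        r"(?::(?P<start>\d+)(?P<suffix>[a-c])?"  # optional verse and letter suffix
        r"(?:[-\u2013\u2014](?P<end>\d+))?)?"    # hyphen, en dash, or em dash range
        r"(?!\w)"
    )
    return pattern, alias_to_book


def extract_refs(text: str, books: dict, lang: str) -> list[dict]:
    """Find references in text; chapter-only references get an empty verse list."""
    pattern, alias_to_book = build_ref_pattern(books, lang)
    refs = []
    for m in pattern.finditer(text):
        start = int(m["start"]) if m["start"] else None
        end = int(m["end"]) if m["end"] else start
        refs.append({
            "raw": m.group(0),
            "book": alias_to_book[m["book"]],
            "chapter": int(m["chapter"]),
            "verses": list(range(start, end + 1)) if start is not None else [],
            "suffix": m["suffix"],
        })
    return refs
```

Escaping each alias with `re.escape` matters because abbreviations like "Rom." contain a dot, which would otherwise match any character.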
lookup_canonical pulls from a canonical text table keyed by (book, chapter, verse_start, verse_end, translation). Cache the final translated string keyed by (entity_id, field, locale) per Mistake #1.
This is ~400 lines we wouldn't have written if any of the LMSes we looked at had solved this for us.
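For a sense of the lookup side, here is a small sketch of what a `lookup_canonical`-style function could look like. It simplifies the real key to one row per verse (instead of verse_start and verse_end), uses an in-memory dict in place of the table, and fills it with placeholder strings rather than actual verse text. `lookup_canonical_sketch`, `DEFAULT_TRANSLATION`, and `VERSES` are hypothetical names.

```python
from typing import Optional

# Assumed default translation per locale (KJV for en, Synodal for ru, as above).
DEFAULT_TRANSLATION = {"en": "KJV", "ru": "SYNODAL"}

# (book, chapter, verse, translation) -> verse text.
# The values are placeholders, not real quotations.
VERSES = {
    ("ROM", 8, 28, "KJV"): "<KJV text of Romans 8:28>",
    ("ROM", 8, 29, "KJV"): "<KJV text of Romans 8:29>",
    ("ROM", 8, 28, "SYNODAL"): "<Synodal text of Romans 8:28>",
}


def lookup_canonical_sketch(ref: dict, target_lang: str) -> Optional[str]:
    """Return the joined verse text for a reference, or None if any verse is missing."""
    translation = DEFAULT_TRANSLATION.get(target_lang)
    if translation is None:
        return None  # no canonical translation configured for this locale
    texts = []
    for verse in ref["verses"]:
        text = VERSES.get((ref["book"], ref["chapter"], verse, translation))
        if text is None:
            return None  # refuse to return a partial quotation
        texts.append(text)
    return " ".join(texts) if texts else None
```

Returning None instead of a partial quote lets the caller decide what to do, for example leaving the reference itself in place without the verse text.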
What I want to hear back
These are the three patterns I keep seeing. They're not the only ones (cache invalidation on edits, RTL layout, Slavic plural forms — separate posts), but they're the ones every general-purpose LMS skips.
If you've shipped i18n in an LMS, education platform, or any content product:
- Do you separate UI strings from content? What's your storage shape?
- How do you handle length variance? Do you test the long-language layout in CI?
- Do you have canonical content that mustn't be translated? Code samples, equations, citations? How are you handling it?
Curious where teams have landed. The patterns above are our current best, not our last word.
The project that drove all of this is open source:
ArVaViT/equip: Free, open-source LMS for Bible schools, ministries, and nonprofit educational programs. React + FastAPI + Supabase.