LingoComm is a Telegram bot that auto-translates every group message into each member's preferred language — text and voice — powered by Lingo.dev SDK. Live on Render, built to solve a daily frustration.
## The Frustration That Started This
I'm a college student in India. My friend circle is... linguistically chaotic.
Swayam types in Hindi. Sakura replies in Japanese. Erwin writes in English. Diego sends voice notes in Spanish. Everyone sort of understands each other, but when someone wants to say something with nuance — jokes, sarcasm, heartfelt stuff — they switch to their native language. That's when everyone else gets lost.
Our group chat had an unspoken ritual:
- Someone sends a message in their language
- Three people copy it
- Three people open Google Translate
- Three people paste, read, forget context, scroll back up
- Someone replies — 40 seconds late — and the conversation has already moved on
I watched this happen every single day. And one evening after losing track of a conversation for the third time in ten minutes, I thought: what if the group chat itself just... translated?
Not a translation app. Not a browser extension. Not "click to translate." Just — you send a message, and everyone reads it in their own language. Automatically. Instantly. In the same chat thread.
That's LingoComm.
## What Does LingoComm Actually Do?
Text translation:
- Someone sends a message in any language
- Bot detects the language, looks up every group member's preferred locale
- Translates the message into all required languages using Lingo.dev SDK
- Posts translations as a threaded reply — clean, contextual, no clutter
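As a rough sketch of how such a threaded reply could be assembled — the flag map, `escapeHtml`, and `formatReply` names here are illustrative, not taken from the actual codebase:

```javascript
// Hypothetical helper: format one threaded reply containing all translations.
// Flag map abridged; the real bot presumably covers every supported locale.
const FLAGS = { hi: "🇮🇳", ja: "🇯🇵", en: "🇬🇧", es: "🇪🇸" };

// Escape for Telegram's HTML parse_mode (& must be replaced first).
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// translations: { locale: translatedText } — one line per target language.
function formatReply(translations) {
  return Object.entries(translations)
    .map(([locale, text]) => `${FLAGS[locale] ?? locale} ${escapeHtml(text)}`)
    .join("\n");
}

// Sent as a reply to the original message, e.g.:
// ctx.reply(formatReply(t), { parse_mode: "HTML", reply_to_message_id: msgId });
```

Grouping all translations into one reply keeps the chat to a single extra message per original, instead of one per language.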
Voice translation:
- Reply to any voice/audio message with `/analyze`
- Bot transcribes speech (Deepgram Nova-2), translates the transcript (Lingo.dev), and generates audio playback in your language (Google TTS WaveNet)
- Interactive buttons: 📝 Original transcript, 🌐 Translated text, 🔊 Listen
User identity:
- Each user sets their preferred language once (`/lang ja`, `/lang hi`, etc.)
- Preference follows them across every group the bot is in
- First-time users get auto-detected based on what they type
Production features:
- Per-user cooldown + burst rate limiting
- Code blocks and URLs preserved through translation (placeholder extraction)
- Retry logic with exponential backoff for Telegram API failures
- HTML-safe escaping for all outgoing messages
- Auto-deleting `/lang` messages in groups (keeps chat clean)
## Architecture: How a Message Flows
The full lifecycle:
- Telegraf receives a message event
- Input guards skip bots, commands, short messages, URL-only/emoji-only content
- Rate limiter applies cooldowns (500ms per-user, 10-message burst window)
- Code/URL preservation — code blocks and links get replaced with placeholders before translation
- Language detection — Unicode script analysis + keyword heuristics (more on this below)
- Target resolution — query MongoDB for all group members, collect distinct locales, exclude source
- Fan-out translation — `batchLocalizeText()` sends one API call to Lingo.dev for all target locales
- Threaded reply — translations posted as a reply to the original message
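The guard and rate-limiting steps at the front of this pipeline can be sketched roughly as follows. Names like `shouldSkip` and `RateLimiter` are mine, not the repo's; the 500ms cooldown and 10-message/5-second burst window are the figures from the article:

```javascript
const URL_ONLY = /^\s*https?:\/\/\S+\s*$/i;

// Input guards: drop messages that should never enter the translation pipeline.
function shouldSkip(msg) {
  if (!msg || msg.from?.is_bot) return true; // ignore other bots
  const text = msg.text || "";
  if (text.startsWith("/")) return true;     // commands handled elsewhere
  if (text.trim().length < 3) return true;   // too short to translate
  if (URL_ONLY.test(text)) return true;      // URL-only content
  return false;
}

// Per-user cooldown + burst window, kept in memory.
class RateLimiter {
  constructor({ cooldownMs = 500, burstLimit = 10, burstWindowMs = 5000 } = {}) {
    this.cooldownMs = cooldownMs;
    this.burstLimit = burstLimit;
    this.burstWindowMs = burstWindowMs;
    this.state = new Map(); // userId -> { last, timestamps }
  }
  allow(userId, now = Date.now()) {
    const s = this.state.get(userId) ?? { last: -Infinity, timestamps: [] };
    if (now - s.last < this.cooldownMs) return false; // per-user cooldown
    s.timestamps = s.timestamps.filter((t) => now - t < this.burstWindowMs);
    if (s.timestamps.length >= this.burstLimit) return false; // burst window full
    s.timestamps.push(now);
    s.last = now;
    this.state.set(userId, s);
    return true;
  }
}
```

In a Telegraf setup these would run as early middleware, so rejected messages never touch detection or the translation API.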
## Tech Stack
| Layer | Tool | Why |
|---|---|---|
| Bot framework | Telegraf 4.16 | Event-driven, clean middleware API for Telegram |
| Translation engine | Lingo.dev SDK | `batchLocalizeText()` for one-to-many fan-out in a single call |
| Database | MongoDB + Mongoose | Durable user preferences + group membership |
| Speech-to-text | Deepgram Nova-2 | Fast, accurate auto-language-detection STT |
| Text-to-speech | Google Cloud TTS | WaveNet neural voices for natural playback |
| Deployment | Render | Single web service running bot + Express API |
## The Hard Part: Language Detection Without External APIs
Here's a decision I made early: don't call an external API just to detect language.
Why? In a group chat, messages arrive fast. If every message triggers a detection API call before translation even starts, latency doubles. And if that detection call hangs (which happened during testing — more on that later), the entire pipeline stalls.
So I built a local detection engine. Two layers:
### Layer 1: Unicode Script Analysis
Most non-Latin languages have distinct Unicode ranges. Japanese has Hiragana/Katakana, Korean has Hangul, Arabic has its own block. Checking these ranges is deterministic, instant, and nearly always right — the one edge case is kanji-only Japanese, which falls through to the Chinese check:
```javascript
function detectByUnicodeScript(text) {
  if (/[\u3040-\u30FF]/u.test(text)) return "ja"; // Japanese
  if (/[\uAC00-\uD7AF]/u.test(text)) return "ko"; // Korean
  if (/[\u0600-\u06FF]/u.test(text)) return "ar"; // Arabic
  if (/[\u0400-\u04FF]/u.test(text)) return "ru"; // Russian
  if (/[\u0B00-\u0B7F]/u.test(text)) return "or"; // Odia
  if (/[\u0900-\u097F]/u.test(text)) return "hi"; // Hindi (Devanagari)
  if (/[\u0980-\u09FF]/u.test(text)) return "bn"; // Bengali
  if (/[\u0B80-\u0BFF]/u.test(text)) return "ta"; // Tamil
  if (/[\u0C00-\u0C7F]/u.test(text)) return "te"; // Telugu
  if (/[\u4E00-\u9FFF]/u.test(text)) return "zh"; // Chinese
  return null;
}
```
10 scripts covered. Zero API calls. Sub-millisecond.
### Layer 2: Keyword Heuristics (Including Hinglish)
But what about romanized text? A Hindi speaker typing in Latin script — "aaj mujhe bahut neend aa rahi hai" — looks like English to a script detector.
This is Hinglish, and my friend circle uses it constantly. So I built a keyword scoring engine with 40+ common Hindi words in Latin script:
```javascript
const hinglishMarkers = [
  "namaste", "kaise", "kya", "kyu", "hain", "hai", "nahi", "mera", "meri",
  "tum", "aap", "hum", "mujhe", "bahut", "bohot", "yaar", "bhai", "karna",
  "chalo", "jaldi", "kal", "aaj", "abhi", "phir", "samjho",
];

const score = hinglishMarkers.reduce(
  (acc, word) => (new RegExp(`\\b${word}\\b`, "i").test(text) ? acc + 1 : acc),
  0,
);

if (score >= 2) return "hi";
```
If two or more Hinglish markers appear, the message is classified as Hindi. Same pattern extends to 8 other languages (Spanish, French, German, Portuguese, Italian, Turkish, Indonesian, Vietnamese) with their own keyword sets.
Why this matters: most translation bots just pass everything through a cloud detection API and hope for the best. This approach is faster, offline-capable for common cases, and I can tune it based on what my actual users type.
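Composed, the two layers might look something like this minimal sketch. The marker lists are heavily abridged and the `detectLanguage` name is illustrative — the real engine covers 10 scripts and 9 keyword sets:

```javascript
// Layer 1: Unicode script ranges (abridged to three of the ten scripts).
const SCRIPT_RANGES = [
  [/[\u3040-\u30FF]/u, "ja"], // Hiragana/Katakana
  [/[\uAC00-\uD7AF]/u, "ko"], // Hangul
  [/[\u0900-\u097F]/u, "hi"], // Devanagari
];

// Layer 2: romanized keyword markers (abridged example sets).
const KEYWORD_MARKERS = {
  hi: ["kya", "nahi", "yaar", "bahut", "aaj"],
  es: ["hola", "gracias", "porque", "pero"],
};

function detectLanguage(text) {
  // Script analysis is cheapest and most certain, so it runs first.
  for (const [re, locale] of SCRIPT_RANGES) {
    if (re.test(text)) return locale;
  }
  // Fall back to keyword scoring: two or more markers classify the text.
  for (const [locale, words] of Object.entries(KEYWORD_MARKERS)) {
    const score = words.reduce(
      (acc, w) => (new RegExp(`\\b${w}\\b`, "i").test(text) ? acc + 1 : acc),
      0,
    );
    if (score >= 2) return locale;
  }
  return "en"; // default when nothing matches
}
```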
## Lingo.dev: The Core Translation Engine
After detection, the message goes to Lingo.dev for actual translation. Here's the initialization:
```javascript
import { LingoDotDevEngine } from "lingo.dev/sdk";

const lingo = new LingoDotDevEngine({
  apiKey: process.env.LINGODOTDEV_API_KEY,
});
```
And the core fan-out function that makes everything work:
```javascript
export async function translateToMany(text, sourceLocale, targetLocales) {
  const filtered = targetLocales.filter((l) => l !== sourceLocale);
  if (filtered.length === 0) return {};

  // Attempt 1: batch (single API call — fastest)
  try {
    const result = await Promise.race([
      lingo.batchLocalizeText(text, {
        sourceLocale: sourceLocale,
        targetLocales: filtered,
      }),
      new Promise((_, reject) =>
        setTimeout(() => reject(new Error("batch timeout")), 15000),
      ),
    ]);
    // ... map results to locales
  } catch (err) {
    // Attempt 2: parallel individual calls (fallback)
    const settled = await Promise.allSettled(
      filtered.map((targetLocale) =>
        Promise.race([
          lingo.localizeText(text, { sourceLocale, targetLocale }),
          new Promise((_, reject) =>
            setTimeout(() => reject(new Error("timeout")), 10000),
          ),
        ]),
      ),
    );
    // ... collect results, use original text as fallback for failures
  }
}
```
Two things I want to highlight:

1. **Batch-first, parallel-fallback.** `batchLocalizeText()` sends one HTTP request for all target languages. If that fails (network hiccup, timeout), the function falls back to `Promise.allSettled()` with individual `localizeText()` calls. The user always gets something back.
2. **Explicit `sourceLocale`, always.** Early in development, I tried passing `null` to let the SDK auto-detect. It hung. No error, no timeout, just... waiting. Once I switched to always providing an explicit source locale from my local detection engine, the pipeline became rock-solid. This was the single biggest reliability fix in the project.
## Why Lingo.dev Specifically?
- `batchLocalizeText()` is exactly the API shape a group chat bot needs. One message → many languages → one API call. Most translation APIs don't offer this.
- The JavaScript SDK was clean to integrate in an event-driven Node.js architecture. No wrappers, no adapters.
- Fast enough for real-time chat. Translation comes back before the user scrolls past the original message.
## Voice Pipeline: Speech → Text → Translation → Audio
Voice is where this project got ambitious.
The flow: download the audio file → Deepgram transcribes it with automatic language detection → Lingo.dev translates the transcript to the user's preferred language → Google Cloud TTS generates a WaveNet voice in that language → bot sends an interactive response with three buttons.
Results are cached in-memory for 1 hour with automatic cleanup. TTS is optional — if Google credentials aren't configured, the bot still works for transcription and translation, just without the audio playback button.
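The 1-hour in-memory cache with cleanup could be as simple as the sketch below. The `TtlCache` name and the shape of cached entries (transcript/translation/audio) are my assumptions, not the repo's actual implementation:

```javascript
// Minimal TTL cache: entries expire after ttlMs; cleanup() sweeps stale keys.
class TtlCache {
  constructor(ttlMs = 60 * 60 * 1000) { // 1 hour, matching the article
    this.ttlMs = ttlMs;
    this.store = new Map(); // key -> { value, expiresAt }
  }
  set(key, value, now = Date.now()) {
    this.store.set(key, { value, expiresAt: now + this.ttlMs });
  }
  get(key, now = Date.now()) {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (now >= entry.expiresAt) {
      this.store.delete(key); // lazy eviction on read
      return undefined;
    }
    return entry.value;
  }
  cleanup(now = Date.now()) {
    for (const [k, e] of this.store) {
      if (now >= e.expiresAt) this.store.delete(k);
    }
  }
}

// Periodic sweep so unread entries don't accumulate, e.g.:
// setInterval(() => cache.cleanup(), 10 * 60 * 1000).unref();
```

Caching by file ID means a second `/analyze` on the same voice note returns instantly instead of re-running transcription and translation.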
## Data Model: One User, Many Groups
```javascript
const UserSchema = new mongoose.Schema(
  {
    telegramId: { type: Number, required: true, unique: true },
    locale: { type: String, default: "en" },
    manuallySet: { type: Boolean, default: false },
    groups: [{ type: String }],
    messageCount: { type: Number, default: 0 },
  },
  { timestamps: true },
);
```
The key insight: identity is global, membership is list-based.
A user sets their language once. That preference follows them to every group. The `groups[]` array tracks which groups they're active in, so the bot knows who to translate for in each group.

`manuallySet` is important — if true, the bot respects the user's explicit choice and never auto-overrides it. If false, the bot can update the detected language as the user types more messages.
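A plausible shape for the `/lang` upsert against this schema — `buildLangUpdate` is a hypothetical helper of mine, shown as a pure function so the update document is easy to inspect:

```javascript
// Build the filter/update pair a Mongoose upsert would use, e.g.:
//   await User.findOneAndUpdate(filter, update, { upsert: true, new: true });
function buildLangUpdate(telegramId, locale, groupId) {
  const update = {
    // manuallySet: true records an explicit choice, so auto-detection
    // never overrides it later.
    $set: { locale, manuallySet: true },
  };
  if (groupId) {
    // Register group membership idempotently (schema stores groups as strings).
    update.$addToSet = { groups: String(groupId) };
  }
  return { filter: { telegramId }, update };
}
```

Using `$addToSet` (rather than `$push`) keeps repeated `/lang` calls in the same group from duplicating the membership entry.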
## Real Demo: 3 Users, 3 Languages, Zero Friction
Setup:

- Swayam → `/lang hi` (Hindi)
- Sakura → `/lang ja` (Japanese)
- Erwin → `/lang en` (English)
What happens:
| Who sends | What they type | Sakura sees | Erwin sees | Swayam sees |
|---|---|---|---|---|
| Swayam | "aaj ka din bahut acha raha" | 🇯🇵 今日はとても良い一日でした | 🇬🇧 Today was a really good day | (original) |
| Sakura | "明日映画を見に行きましょう" | (original) | 🇬🇧 Let's go see a movie tomorrow | 🇮🇳 कल चलो फिल्म देखने चलते हैं |
| Erwin | "Sounds great, what time?" | 🇯🇵 いいですね、何時? | (original) | 🇮🇳 बढ़िया, कितने बजे? |
No copy-paste. No app switching. No delay. The conversation just... flows.
## Things That Broke (and How I Fixed Them)
### 1. The Hanging Detection Bug
**What happened:** Early on, I let the Lingo.dev SDK auto-detect the source language by passing `sourceLocale: null`. It worked... usually. But every few messages, the call would just hang. No error, no timeout, no response. The bot would silently stop translating until I restarted it.

**What I learned:** Passing `null` as `sourceLocale` triggered an internal detection path in the SDK that didn't have a timeout. My user-facing experience was a bot that randomly "went silent."

**The fix:** Built the local Unicode + keyword detection engine described above, and always pass an explicit `sourceLocale` to every Lingo.dev call. The pipeline hasn't hung once since.
### 2. Ghost Users in Group Translation
**What happened:** A user would set their language, join a new group, and... their translations wouldn't appear. Other group members saw translations, but this user was invisible to the fan-out.

**Root cause:** The user's `groups[]` array didn't include the new group ID. Their preference existed globally, but the translation target resolution (`User.find({ groups: groupId })`) didn't find them.

**The fix:** I strengthened group registration at every entry point — `/start` in a group, `/lang` in a group, `handleNewMember()` on join, and even on first message (auto-registration). Now `$addToSet: { groups: groupId }` runs at every touchpoint.
### 3. Google TTS Credentials on Cloud
**What happened:** TTS worked perfectly on my laptop (using `GOOGLE_APPLICATION_CREDENTIALS` pointing to a JSON file). Deployed to Render — instant crash. The credentials file doesn't exist on cloud.

**The fix:** Added `GOOGLE_CREDENTIALS_BASE64` support. The JSON credentials get base64-encoded into an env var and decoded at startup. And I made TTS fully optional — if credentials are missing, the voice pipeline still transcribes and translates, just without the "Listen" button.
```javascript
if (process.env.GOOGLE_CREDENTIALS_BASE64) {
  const credentialsJson = Buffer.from(
    process.env.GOOGLE_CREDENTIALS_BASE64,
    "base64",
  ).toString("utf8");
  ttsClient = new googleTextToSpeech.TextToSpeechClient({
    credentials: JSON.parse(credentialsJson),
  });
}
```
## Production Hardening
Things I added that aren't glamorous but keep the bot alive:
- Retry wrapper with exponential backoff (300ms → 600ms → 1200ms) for Telegram API calls
- 15-second timeout on batch translation, 10-second timeout on individual calls — if Lingo.dev is slow, the bot doesn't freeze
- Burst rate limiting: max 10 messages per 5-second window per user — prevents translation spam
- Code block preservation: regex extracts inline `code` and fenced blocks before translation, restores them after. Code should never be "translated."
- URL preservation: same pattern — URLs get placeholder-swapped so they survive translation intact
- HTML escaping on all outgoing text to prevent parse_mode injection
- Graceful degradation: if MongoDB is down → users get defaults. If TTS is down → voice still transcribes. If batch API fails → parallel fallback kicks in.
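The code/URL placeholder extraction could be sketched like this. The `__PH_n__` token format and the function names are my assumptions — the key property is only that the tokens pass through translation untouched:

```javascript
// Match fenced blocks, inline code, and URLs — anything translation must not touch.
const PRESERVE = /```[\s\S]*?```|`[^`\n]+`|https?:\/\/\S+/g;

// Swap protected spans for opaque tokens before sending text to translation.
function extractPlaceholders(text) {
  const saved = [];
  const masked = text.replace(PRESERVE, (match) => {
    saved.push(match);
    return `__PH_${saved.length - 1}__`; // token survives translation verbatim
  });
  return { masked, saved };
}

// Swap the tokens back into the translated text.
function restorePlaceholders(text, saved) {
  return text.replace(/__PH_(\d+)__/g, (m, i) => saved[Number(i)] ?? m);
}
```

The round trip is: `extractPlaceholders` → translate the masked text → `restorePlaceholders` on the result, so code and links come back byte-for-byte identical.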
## What I'd Do Differently
1. **Start with the data model, not the bot commands.** The `telegramId + locale + groups[]` schema was the best early decision I made. Everything else plugged in cleanly because the data model was right.
2. **Don't trust external detection for real-time use cases.** Building local detection felt like over-engineering at first. It turned out to be the single biggest performance and reliability win.
3. **Translation UX > translation quality.** Even a slightly less accurate translation, delivered instantly in-thread, is more useful than a perfect translation that requires copy-paste-wait.
4. **Solve your own problem.** I built this because I was annoyed every day. That frustration gave me better instincts about what mattered (speed, threading, preserving context) than any feature spec could.
## What's Next
- Usage-based cost optimization (cache frequent translations)
- Admin controls for large communities (moderation, analytics dashboards)
- Smarter detection for code-mixed languages beyond Hinglish
- Richer voice interaction (auto-translate voice in real-time, not just on-demand)
- Optional Discord bridge
## Links
- GitHub: LingoComm repository
- LingoComm Live Link: Click Here
- Lingo.dev: https://lingo.dev
- LinkedIn: Swayam Jethi
I built LingoComm because translation shouldn't be a task — it should be invisible. Lingo.dev made that possible.

