LingoComm is a Telegram bot that auto-translates every group message into each member's preferred language — text and voice — powered by Lingo.dev SDK. Live on Render, built to solve a daily frustration.
## The Frustration That Started This
I'm a college student in India. My friend circle is... linguistically chaotic.
Swayam types in Hindi. Sakura replies in Japanese. Erwin writes in English. Diego sends voice notes in Spanish. Everyone sort of understands each other, but when someone wants to say something with nuance — jokes, sarcasm, heartfelt stuff — they switch to their native language. That's when everyone else gets lost.
Our group chat had an unspoken ritual:
- Someone sends a message in their language
- Three people copy it
- Three people open Google Translate
- Three people paste, read, forget context, scroll back up
- Someone replies — 40 seconds late — and the conversation has already moved on
I watched this happen every single day. And one evening after losing track of a conversation for the third time in ten minutes, I thought: what if the group chat itself just... translated?
Not a translation app. Not a browser extension. Not "click to translate." Just — you send a message, and everyone reads it in their own language. Automatically. Instantly. In the same chat thread.
That's LingoComm.
## What Does LingoComm Actually Do?
Text translation:
- Someone sends a message in any language
- Bot detects the language, looks up every group member's preferred locale
- Translates the message into all required languages using Lingo.dev SDK
- Posts translations as a threaded reply — clean, contextual, no clutter
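As a rough sketch of how such a threaded reply could be assembled — the flag map, `escapeHtml`, and `formatReply` names here are illustrative, not taken from the actual codebase:

```javascript
// Hypothetical helper: format one threaded reply containing all translations.
// Flag map abridged; the real bot presumably covers every supported locale.
const FLAGS = { hi: "🇮🇳", ja: "🇯🇵", en: "🇬🇧", es: "🇪🇸" };

// Escape for Telegram's HTML parse_mode (& must be replaced first).
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// translations: { locale: translatedText } — one line per target language.
function formatReply(translations) {
  return Object.entries(translations)
    .map(([locale, text]) => `${FLAGS[locale] ?? locale} ${escapeHtml(text)}`)
    .join("\n");
}

// Sent as a reply to the original message, e.g.:
// ctx.reply(formatReply(t), { parse_mode: "HTML", reply_to_message_id: msgId });
```

Grouping all translations into one reply keeps the chat to a single extra message per original, instead of one per language.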
Voice translation:
- Reply to any voice/audio message with `/analyze`
- Bot transcribes speech (Deepgram Nova-2), translates the transcript (Lingo.dev), and generates audio playback in your language (Google TTS WaveNet)
- Interactive buttons: 📝 Original transcript, 🌐 Translated text, 🔊 Listen
User identity:
- Each user sets their preferred language once (`/lang ja`, `/lang hi`, etc.)
- Preference follows them across every group the bot is in
- First-time users get auto-detected based on what they type
Production features:
- Per-user cooldown + burst rate limiting
- Code blocks and URLs preserved through translation (placeholder extraction)
- Retry logic with exponential backoff for Telegram API failures
- HTML-safe escaping for all outgoing messages
- Auto-deleting `/lang` messages in groups (keeps chat clean)
## Architecture: How a Message Flows
The full lifecycle:
- Telegraf receives a message event
- Input guards skip bots, commands, short messages, URL-only/emoji-only content
- Rate limiter applies cooldowns (500ms per-user, 10-message burst window)
- Code/URL preservation — code blocks and links get replaced with placeholders before translation
- Language detection — Unicode script analysis + keyword heuristics (more on this below)
- Target resolution — query MongoDB for all group members, collect distinct locales, exclude source
- Fan-out translation — `batchLocalizeText()` sends one API call to Lingo.dev for all target locales
- Threaded reply — translations posted as a reply to the original message
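The guard and rate-limiting steps at the front of this pipeline can be sketched roughly as follows. Names like `shouldSkip` and `RateLimiter` are mine, not the repo's; the 500ms cooldown and 10-message/5-second burst window are the figures from the article:

```javascript
const URL_ONLY = /^\s*https?:\/\/\S+\s*$/i;

// Input guards: drop messages that should never enter the translation pipeline.
function shouldSkip(msg) {
  if (!msg || msg.from?.is_bot) return true; // ignore other bots
  const text = msg.text || "";
  if (text.startsWith("/")) return true;     // commands handled elsewhere
  if (text.trim().length < 3) return true;   // too short to translate
  if (URL_ONLY.test(text)) return true;      // URL-only content
  return false;
}

// Per-user cooldown + burst window, kept in memory.
class RateLimiter {
  constructor({ cooldownMs = 500, burstLimit = 10, burstWindowMs = 5000 } = {}) {
    this.cooldownMs = cooldownMs;
    this.burstLimit = burstLimit;
    this.burstWindowMs = burstWindowMs;
    this.state = new Map(); // userId -> { last, timestamps }
  }
  allow(userId, now = Date.now()) {
    const s = this.state.get(userId) ?? { last: -Infinity, timestamps: [] };
    if (now - s.last < this.cooldownMs) return false; // per-user cooldown
    s.timestamps = s.timestamps.filter((t) => now - t < this.burstWindowMs);
    if (s.timestamps.length >= this.burstLimit) return false; // burst window full
    s.timestamps.push(now);
    s.last = now;
    this.state.set(userId, s);
    return true;
  }
}
```

In a Telegraf setup these would run as early middleware, so rejected messages never touch detection or the translation API.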
## Tech Stack
| Layer | Tool | Why |
|---|---|---|
| Bot framework | Telegraf 4.16 | Event-driven, clean middleware API for Telegram |
| Translation engine | Lingo.dev SDK | `batchLocalizeText()` for one-to-many fan-out in a single call |
| Database | MongoDB + Mongoose | Durable user preferences + group membership |
| Speech-to-text | Deepgram Nova-2 | Fast, accurate auto-language-detection STT |
| Text-to-speech | Google Cloud TTS | WaveNet neural voices for natural playback |
| Deployment | Render | Single web service running bot + Express API |
## The Hard Part: Language Detection Without External APIs
Here's a decision I made early: don't call an external API just to detect language.
Why? In a group chat, messages arrive fast. If every message triggers a detection API call before translation even starts, latency doubles. And if that detection call hangs (which happened during testing — more on that later), the entire pipeline stalls.
So I built a local detection engine. Two layers:
### Layer 1: Unicode Script Analysis
Most non-Latin languages have distinct Unicode ranges. Japanese has Hiragana/Katakana, Korean has Hangul, Arabic has its own block. Checking these ranges is deterministic, instant, and nearly always right — the one edge case is kanji-only Japanese, which falls through to the Chinese check:
```javascript
function detectByUnicodeScript(text) {
  if (/[\u3040-\u30FF]/u.test(text)) return "ja"; // Japanese
  if (/[\uAC00-\uD7AF]/u.test(text)) return "ko"; // Korean
  if (/[\u0600-\u06FF]/u.test(text)) return "ar"; // Arabic
  if (/[\u0400-\u04FF]/u.test(text)) return "ru"; // Russian
  if (/[\u0B00-\u0B7F]/u.test(text)) return "or"; // Odia
  if (/[\u0900-\u097F]/u.test(text)) return "hi"; // Hindi (Devanagari)
  if (/[\u0980-\u09FF]/u.test(text)) return "bn"; // Bengali
  if (/[\u0B80-\u0BFF]/u.test(text)) return "ta"; // Tamil
  if (/[\u0C00-\u0C7F]/u.test(text)) return "te"; // Telugu
  if (/[\u4E00-\u9FFF]/u.test(text)) return "zh"; // Chinese
  return null;
}
```
10 scripts covered. Zero API calls. Sub-millisecond.
### Layer 2: Keyword Heuristics (Including Hinglish)
But what about romanized text? A Hindi speaker typing in Latin script — "aaj mujhe bahut neend aa rahi hai" — looks like English to a script detector.
This is Hinglish, and my friend circle uses it constantly. So I built a keyword scoring engine with 40+ common Hindi words in Latin script:
```javascript
const hinglishMarkers = [
  "namaste", "kaise", "kya", "kyu", "hain", "hai", "nahi", "mera", "meri",
  "tum", "aap", "hum", "mujhe", "bahut", "bohot", "yaar", "bhai", "karna",
  "chalo", "jaldi", "kal", "aaj", "abhi", "phir", "samjho",
];

const score = hinglishMarkers.reduce(
  (acc, word) => (new RegExp(`\\b${word}\\b`, "i").test(text) ? acc + 1 : acc),
  0,
);

if (score >= 2) return "hi";
```
If two or more Hinglish markers appear, the message is classified as Hindi. Same pattern extends to 8 other languages (Spanish, French, German, Portuguese, Italian, Turkish, Indonesian, Vietnamese) with their own keyword sets.
Why this matters: most translation bots just pass everything through a cloud detection API and hope for the best. This approach is faster, offline-capable for common cases, and I can tune it based on what my actual users type.
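Composed, the two layers might look something like this minimal sketch. The marker lists are heavily abridged and the `detectLanguage` name is illustrative — the real engine covers 10 scripts and 9 keyword sets:

```javascript
// Layer 1: Unicode script ranges (abridged to three of the ten scripts).
const SCRIPT_RANGES = [
  [/[\u3040-\u30FF]/u, "ja"], // Hiragana/Katakana
  [/[\uAC00-\uD7AF]/u, "ko"], // Hangul
  [/[\u0900-\u097F]/u, "hi"], // Devanagari
];

// Layer 2: romanized keyword markers (abridged example sets).
const KEYWORD_MARKERS = {
  hi: ["kya", "nahi", "yaar", "bahut", "aaj"],
  es: ["hola", "gracias", "porque", "pero"],
};

function detectLanguage(text) {
  // Script analysis is cheapest and most certain, so it runs first.
  for (const [re, locale] of SCRIPT_RANGES) {
    if (re.test(text)) return locale;
  }
  // Fall back to keyword scoring: two or more markers classify the text.
  for (const [locale, words] of Object.entries(KEYWORD_MARKERS)) {
    const score = words.reduce(
      (acc, w) => (new RegExp(`\\b${w}\\b`, "i").test(text) ? acc + 1 : acc),
      0,
    );
    if (score >= 2) return locale;
  }
  return "en"; // default when nothing matches
}
```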
## Lingo.dev: The Core Translation Engine
After detection, the message goes to Lingo.dev for actual translation. Here's the initialization:
```javascript
import { LingoDotDevEngine } from "lingo.dev/sdk";

const lingo = new LingoDotDevEngine({
  apiKey: process.env.LINGODOTDEV_API_KEY,
});
```
And the core fan-out function that makes everything work:
```javascript
export async function translateToMany(text, sourceLocale, targetLocales) {
  const filtered = targetLocales.filter((l) => l !== sourceLocale);
  if (filtered.length === 0) return {};

  // Attempt 1: batch (single API call — fastest)
  try {
    const result = await Promise.race([
      lingo.batchLocalizeText(text, {
        sourceLocale: sourceLocale,
        targetLocales: filtered,
      }),
      new Promise((_, reject) =>
        setTimeout(() => reject(new Error("batch timeout")), 15000),
      ),
    ]);
    // ... map results to locales
  } catch (err) {
    // Attempt 2: parallel individual calls (fallback)
    const settled = await Promise.allSettled(
      filtered.map((targetLocale) =>
        Promise.race([
          lingo.localizeText(text, { sourceLocale, targetLocale }),
          new Promise((_, reject) =>
            setTimeout(() => reject(new Error("timeout")), 10000),
          ),
        ]),
      ),
    );
    // ... collect results, use original text as fallback for failures
  }
}
```
Two things I want to highlight:

1. **Batch-first, parallel-fallback.** `batchLocalizeText()` sends one HTTP request for all target languages. If that fails (network hiccup, timeout), the function falls back to `Promise.allSettled()` with individual `localizeText()` calls. The user always gets something back.
2. **Explicit `sourceLocale`, always.** Early in development, I tried passing `null` to let the SDK auto-detect. It hung. No error, no timeout, just... waiting. Once I switched to always providing an explicit source locale from my local detection engine, the pipeline became rock-solid. This was the single biggest reliability fix in the project.
## Why Lingo.dev Specifically?
- `batchLocalizeText()` is exactly the API shape a group chat bot needs. One message → many languages → one API call. Most translation APIs don't offer this.
- The JavaScript SDK was clean to integrate in an event-driven Node.js architecture. No wrappers, no adapters.
- Fast enough for real-time chat. Translation comes back before the user scrolls past the original message.
## Voice Pipeline: Speech → Text → Translation → Audio
Voice is where this project got ambitious.
The flow: download the audio file → Deepgram transcribes it with automatic language detection → Lingo.dev translates the transcript to the user's preferred language → Google Cloud TTS generates a WaveNet voice in that language → bot sends an interactive response with three buttons.
Results are cached in-memory for 1 hour with automatic cleanup. TTS is optional — if Google credentials aren't configured, the bot still works for transcription and translation, just without the audio playback button.
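The 1-hour in-memory cache with cleanup could be as simple as the sketch below. The `TtlCache` name and the shape of cached entries (transcript/translation/audio) are my assumptions, not the repo's actual implementation:

```javascript
// Minimal TTL cache: entries expire after ttlMs; cleanup() sweeps stale keys.
class TtlCache {
  constructor(ttlMs = 60 * 60 * 1000) { // 1 hour, matching the article
    this.ttlMs = ttlMs;
    this.store = new Map(); // key -> { value, expiresAt }
  }
  set(key, value, now = Date.now()) {
    this.store.set(key, { value, expiresAt: now + this.ttlMs });
  }
  get(key, now = Date.now()) {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (now >= entry.expiresAt) {
      this.store.delete(key); // lazy eviction on read
      return undefined;
    }
    return entry.value;
  }
  cleanup(now = Date.now()) {
    for (const [k, e] of this.store) {
      if (now >= e.expiresAt) this.store.delete(k);
    }
  }
}

// Periodic sweep so unread entries don't accumulate, e.g.:
// setInterval(() => cache.cleanup(), 10 * 60 * 1000).unref();
```

Caching by file ID means a second `/analyze` on the same voice note returns instantly instead of re-running transcription and translation.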
## Data Model: One User, Many Groups
```javascript
const UserSchema = new mongoose.Schema(
  {
    telegramId: { type: Number, required: true, unique: true },
    locale: { type: String, default: "en" },
    manuallySet: { type: Boolean, default: false },
    groups: [{ type: String }],
    messageCount: { type: Number, default: 0 },
  },
  { timestamps: true },
);
```
The key insight: identity is global, membership is list-based.
A user sets their language once. That preference follows them to every group. The `groups[]` array tracks which groups they're active in, so the bot knows who to translate for in each group.

`manuallySet` is important — if true, the bot respects the user's explicit choice and never auto-overrides it. If false, the bot can update the detected language as the user types more messages.
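A plausible shape for the `/lang` upsert against this schema — `buildLangUpdate` is a hypothetical helper of mine, shown as a pure function so the update document is easy to inspect:

```javascript
// Build the filter/update pair a Mongoose upsert would use, e.g.:
//   await User.findOneAndUpdate(filter, update, { upsert: true, new: true });
function buildLangUpdate(telegramId, locale, groupId) {
  const update = {
    // manuallySet: true records an explicit choice, so auto-detection
    // never overrides it later.
    $set: { locale, manuallySet: true },
  };
  if (groupId) {
    // Register group membership idempotently (schema stores groups as strings).
    update.$addToSet = { groups: String(groupId) };
  }
  return { filter: { telegramId }, update };
}
```

Using `$addToSet` (rather than `$push`) keeps repeated `/lang` calls in the same group from duplicating the membership entry.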
## Real Demo: 3 Users, 3 Languages, Zero Friction
Setup:

- Swayam → `/lang hi` (Hindi)
- Sakura → `/lang ja` (Japanese)
- Erwin → `/lang en` (English)
What happens:
| Who sends | What they type | Sakura sees | Erwin sees | Swayam sees |
|---|---|---|---|---|
| Swayam | "aaj ka din bahut acha raha" | 🇯🇵 今日はとても良い一日でした | 🇬🇧 Today was a really good day | (original) |
| Sakura | "明日映画を見に行きましょう" | (original) | 🇬🇧 Let's go see a movie tomorrow | 🇮🇳 कल चलो फिल्म देखने चलते हैं |
| Erwin | "Sounds great, what time?" | 🇯🇵 いいですね、何時? | (original) | 🇮🇳 बढ़िया, कितने बजे? |
No copy-paste. No app switching. No delay. The conversation just... flows.
## Things That Broke (and How I Fixed Them)
### 1. The Hanging Detection Bug
**What happened:** Early on, I let the Lingo.dev SDK auto-detect the source language by passing `sourceLocale: null`. It worked... usually. But every few messages, the call would just hang. No error, no timeout, no response. The bot would silently stop translating until I restarted it.

**What I learned:** Passing `null` as `sourceLocale` triggered an internal detection path in the SDK that didn't have a timeout. My user-facing experience was a bot that randomly "went silent."

**The fix:** Built the local Unicode + keyword detection engine described above, and always pass an explicit `sourceLocale` to every Lingo.dev call. The pipeline hasn't hung once since.
### 2. Ghost Users in Group Translation
**What happened:** A user would set their language, join a new group, and... their translations wouldn't appear. Other group members saw translations, but this user was invisible to the fan-out.

**Root cause:** The user's `groups[]` array didn't include the new group ID. Their preference existed globally, but the translation target resolution (`User.find({ groups: groupId })`) didn't find them.

**The fix:** I strengthened group registration at every entry point — `/start` in a group, `/lang` in a group, `handleNewMember()` on join, and even on first message (auto-registration). Now `$addToSet: { groups: groupId }` runs at every touchpoint.
### 3. Google TTS Credentials on Cloud
**What happened:** TTS worked perfectly on my laptop (using `GOOGLE_APPLICATION_CREDENTIALS` pointing to a JSON file). Deployed to Render — instant crash. The credentials file doesn't exist on cloud.

**The fix:** Added `GOOGLE_CREDENTIALS_BASE64` support. The JSON credentials get base64-encoded into an env var and decoded at startup. And I made TTS fully optional — if credentials are missing, the voice pipeline still transcribes and translates, just without the "Listen" button.
```javascript
if (process.env.GOOGLE_CREDENTIALS_BASE64) {
  const credentialsJson = Buffer.from(
    process.env.GOOGLE_CREDENTIALS_BASE64,
    "base64",
  ).toString("utf8");
  ttsClient = new googleTextToSpeech.TextToSpeechClient({
    credentials: JSON.parse(credentialsJson),
  });
}
```
## Production Hardening
Things I added that aren't glamorous but keep the bot alive:
- Retry wrapper with exponential backoff (300ms → 600ms → 1200ms) for Telegram API calls
- 15-second timeout on batch translation, 10-second timeout on individual calls — if Lingo.dev is slow, the bot doesn't freeze
- Burst rate limiting: max 10 messages per 5-second window per user — prevents translation spam
- Code block preservation: regex extracts inline `code` and fenced blocks before translation, restores them after. Code should never be "translated."
- URL preservation: same pattern — URLs get placeholder-swapped so they survive translation intact
- HTML escaping on all outgoing text to prevent parse_mode injection
- Graceful degradation: if MongoDB is down → users get defaults. If TTS is down → voice still transcribes. If batch API fails → parallel fallback kicks in.
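The code/URL placeholder extraction could be sketched like this. The `__PH_n__` token format and the function names are my assumptions — the key property is only that the tokens pass through translation untouched:

```javascript
// Match fenced blocks, inline code, and URLs — anything translation must not touch.
const PRESERVE = /```[\s\S]*?```|`[^`\n]+`|https?:\/\/\S+/g;

// Swap protected spans for opaque tokens before sending text to translation.
function extractPlaceholders(text) {
  const saved = [];
  const masked = text.replace(PRESERVE, (match) => {
    saved.push(match);
    return `__PH_${saved.length - 1}__`; // token survives translation verbatim
  });
  return { masked, saved };
}

// Swap the tokens back into the translated text.
function restorePlaceholders(text, saved) {
  return text.replace(/__PH_(\d+)__/g, (m, i) => saved[Number(i)] ?? m);
}
```

The round trip is: `extractPlaceholders` → translate the masked text → `restorePlaceholders` on the result, so code and links come back byte-for-byte identical.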
## What I'd Do Differently
1. **Start with the data model, not the bot commands.** The `telegramId + locale + groups[]` schema was the best early decision I made. Everything else plugged in cleanly because the data model was right.
2. **Don't trust external detection for real-time use cases.** Building local detection felt like over-engineering at first. It turned out to be the single biggest performance and reliability win.
3. **Translation UX > translation quality.** Even a slightly less accurate translation, delivered instantly in-thread, is more useful than a perfect translation that requires copy-paste-wait.
4. **Solve your own problem.** I built this because I was annoyed every day. That frustration gave me better instincts about what mattered (speed, threading, preserving context) than any feature spec could.
## What's Next
- Usage-based cost optimization (cache frequent translations)
- Admin controls for large communities (moderation, analytics dashboards)
- Smarter detection for code-mixed languages beyond Hinglish
- Richer voice interaction (auto-translate voice in real-time, not just on-demand)
- Optional Discord bridge
## Links
- GitHub: LingoComm repository
- LingoComm Live Link: Click Here
- Lingo.dev: https://lingo.dev
- LinkedIn: Swayam Jethi
I built LingoComm because translation shouldn't be a task — it should be invisible. Lingo.dev made that possible.

