Koushikxd

Posted on Feb 19 • Edited on Feb 28

Lingo Bolt: Removing the Language Barrier From Open Source

#opensource #nextjs #ai #webdev

A few months ago I tried to contribute to a Japanese open source project. The codebase looked interesting, the issues seemed approachable, but everything was in Japanese. The README had a small English section, but half the steps were wrong or missing. I spent an hour just trying to get it running locally and eventually gave up.

I'm a native English speaker. If I had that much trouble, what chance does someone in Brazil or Korea have when they're trying to contribute to a project that's entirely in English?

That's the problem I wanted to fix.

The Problem

Open source is supposed to be for everyone, but in practice it runs almost entirely in English. This creates two groups of people who get left out:

Contributors who don't write English well. They might be great developers, but they can't properly describe their bug, or they can't understand a maintainer's review comment well enough to respond. So they stop trying.

Maintainers who get issues in other languages. Someone's project gets shared in a Brazilian developer community and suddenly there are 5 issues in Portuguese. The maintainer has no idea what they say so they close them or ignore them.

Both sides lose. The project loses contributors and the contributors feel unwelcome.

What I Built

Lingo Bolt is a web app plus a GitHub bot that tries to fix this. You sign in with GitHub, connect a repo, and you get a few tools that let you work with it in any language.

The GitHub Bot

This is the main feature and the part that requires zero effort from contributors. You install the lingo-bolt GitHub App on your account or org, configure your default language, and that's it. The bot starts listening to your repos through GitHub webhooks.

How the webhook flow works

When GitHub sends an event to our webhook endpoint, the app verifies the signature and routes it to the right handler. There are three events we care about:

issues.opened         --> new issue was filed
pull_request.opened   --> new PR was opened
issue_comment.created --> someone commented on an issue or PR
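Here's a minimal sketch of what the verify-and-route step could look like. It is not the actual Lingo Bolt code. It assumes GitHub's standard `X-Hub-Signature-256` header (an HMAC-SHA256 of the raw request body, prefixed with `sha256=`), and the handler names are hypothetical placeholders.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical handler names; the real handlers are not shown in this post.
type HandlerName =
  | "handleIssueOpened"
  | "handlePullRequestOpened"
  | "handleIssueComment";

// Recompute the HMAC-SHA256 of the raw body and compare it with the header value.
function verifySignature(
  rawBody: string,
  signatureHeader: string | undefined,
  secret: string,
): boolean {
  if (!signatureHeader) return false;
  const expected =
    "sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex");
  const a = Buffer.from(expected);
  const b = Buffer.from(signatureHeader);
  // timingSafeEqual throws on length mismatch, so check the length first.
  return a.length === b.length && timingSafeEqual(a, b);
}

// The event name comes from the X-GitHub-Event header, the action from the payload.
function routeEvent(event: string, action: string | undefined): HandlerName | null {
  switch (`${event}.${action}`) {
    case "issues.opened":
      return "handleIssueOpened";
    case "pull_request.opened":
      return "handlePullRequestOpened";
    case "issue_comment.created":
      return "handleIssueComment";
    default:
      return null; // every other event is ignored
  }
}
```

The signature has to be computed over the raw body string, before any JSON parsing, otherwise the digest won't match what GitHub sent.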

Auto language detection and labeling

When a new issue comes in, the bot uses GPT-4o-mini to detect the language it's written in, creates a label such as lang:portuguese or lang:chinese, and applies it to the issue automatically.

This is useful even without translation. Maintainers can filter issues by language label and at least know what they're dealing with.
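The labeling step can be sketched roughly like this. The `LabelClient` interface and both function names are hypothetical stand-ins for whatever GitHub client the bot uses, and the sketch assumes the detector returns an English language name such as "Portuguese".

```typescript
// Hypothetical minimal client; in practice this would wrap a GitHub API client.
interface LabelClient {
  createLabel(name: string): Promise<void>; // should tolerate "label already exists"
  addLabel(issueNumber: number, name: string): Promise<void>;
}

// Turn a detected language name into a label like "lang:portuguese".
// Assumes an English, ASCII language name.
function languageLabel(languageName: string): string {
  const slug = languageName
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse spaces and punctuation into dashes
    .replace(/^-+|-+$/g, ""); // drop leading and trailing dashes
  return `lang:${slug}`;
}

// Make sure the label exists, then attach it to the issue.
async function labelIssueLanguage(
  client: LabelClient,
  issueNumber: number,
  languageName: string,
): Promise<string> {
  const label = languageLabel(languageName);
  await client.createLabel(label);
  await client.addLabel(issueNumber, label);
  return label;
}
```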

Auto translation

If auto-translate is enabled, the bot detects the language of the content. If it differs from the maintainer's default language, the bot translates the content and posts the translation as a comment. It skips posting when the detected language already matches the default, so you don't get noise when issues are already in the right language.

const detected = await detectLanguageCode(text);
if (detected === locale || detected === locale.split("-")[0]) return;

const translated = await engine.localizeText(text, {
  sourceLocale: detected,
  targetLocale: locale,
});

await postComment(octokit, owner, repo, issueNumber,
  `**Auto-translated to ${defaultLanguage}:**\n\n${translated}`
);

Manual commands

Beyond the automatic stuff, anyone can trigger the bot manually by mentioning @lingo-bolt in a comment:

@lingo-bolt translate to spanish    --> translates the issue body to Spanish
@lingo-bolt summarize               --> summarizes in the maintainer's default language
@lingo-bolt summarize in french     --> summarizes in French specifically
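Parsing these mentions could look something like the sketch below. The `BotCommand` type and `parseCommand` function are hypothetical, and the sketch assumes language names are written in plain ASCII letters, as in the examples above.

```typescript
// Hypothetical command shape; language is null when the maintainer's default should be used.
type BotCommand =
  | { kind: "translate"; language: string }
  | { kind: "summarize"; language: string | null };

// Group 1: target of "translate to <lang>". Group 2: target of "summarize in <lang>".
const MENTION =
  /@lingo-bolt\s+(?:translate\s+to\s+([a-z-]+)|summarize\b(?:\s+in\s+([a-z-]+))?)/i;

function parseCommand(comment: string): BotCommand | null {
  const match = MENTION.exec(comment);
  if (!match) return null; // no mention, or an unknown command

  const translateTarget = match[1];
  if (translateTarget) {
    return { kind: "translate", language: translateTarget.toLowerCase() };
  }

  const summarizeTarget = match[2];
  return {
    kind: "summarize",
    language: summarizeTarget ? summarizeTarget.toLowerCase() : null,
  };
}
```

Returning `null` for anything that doesn't match means ordinary comments pass through without the bot reacting.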

The summarize command is really useful for long threads. It uses GPT to write a concise summary of the issue, then if the target language isn't English it translates that summary using Lingo.dev before posting it as a comment.

const { text: summary } = await generateText({
  model: openai("gpt-4o-mini"),
  prompt: `Summarize the following GitHub issue/PR content concisely. Keep it clear and actionable.\n\n${text}`,
});

// The summary is written in English; translate only when another language is requested.
let finalText = summary;
if (locale !== "en") {
  finalText = await engine.localizeText(summary, {
    sourceLocale: "en",
    targetLocale: locale,
  });
}

Per-repo overrides

The bot dashboard lets you configure settings at both the installation level and per individual repo. So if you have an org with 10 repos but only want auto-translate on 3 of them, you can set that up without it affecting the rest.
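One way to model this is a settings object at the installation level plus an optional partial override per repo. The field names below are hypothetical, picked only to illustrate the merge.

```typescript
// Hypothetical settings shape; the real schema is not shown in this post.
interface BotSettings {
  defaultLanguage: string;
  autoLabel: boolean;
  autoTranslate: boolean;
}

// A repo only stores the fields it wants to change.
type RepoOverride = Partial<BotSettings>;

// Repo-level values win; anything left unset falls back to the installation.
function resolveSettings(
  installation: BotSettings,
  override: RepoOverride = {},
): BotSettings {
  return {
    // ?? only falls back on null/undefined, so an explicit `false` override is kept.
    defaultLanguage: override.defaultLanguage ?? installation.defaultLanguage,
    autoLabel: override.autoLabel ?? installation.autoLabel,
    autoTranslate: override.autoTranslate ?? installation.autoTranslate,
  };
}

const orgDefaults: BotSettings = {
  defaultLanguage: "en",
  autoLabel: true,
  autoTranslate: false,
};

// Auto-translate is switched on for this repo only; everything else is inherited.
const docsRepo = resolveSettings(orgDefaults, { autoTranslate: true });
```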

RAG Indexing + AI Chat

Once you connect a repo, you can index it. The app clones it, chunks the code, embeds it using text-embedding-3-small, and stores it in Qdrant. After that you can chat with the repo in any language.
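To give a feel for the chunking step, here is a simple line-based chunker with overlap. This is an illustrative sketch, not the app's actual chunking strategy: the `CodeChunk` shape, the default sizes, and the function name are all assumptions. Embedding and the Qdrant upsert are left out.

```typescript
// Hypothetical chunk shape; line numbers let answers point back to exact code locations.
interface CodeChunk {
  path: string;
  startLine: number; // 1-based, inclusive
  endLine: number; // 1-based, inclusive
  text: string;
}

// Split a file into windows of `maxLines` lines that overlap by `overlap` lines,
// so code spanning a chunk boundary still shows up whole in at least one chunk.
function chunkFile(
  path: string,
  source: string,
  maxLines = 40,
  overlap = 5,
): CodeChunk[] {
  if (overlap >= maxLines) {
    throw new Error("overlap must be smaller than maxLines");
  }
  const lines = source.split("\n");
  const chunks: CodeChunk[] = [];
  for (let start = 0; start < lines.length; start += maxLines - overlap) {
    const end = Math.min(start + maxLines, lines.length);
    chunks.push({
      path,
      startLine: start + 1,
      endLine: end,
      text: lines.slice(start, end).join("\n"),
    });
    if (end === lines.length) break; // last window reached the end of the file
  }
  return chunks;
}
```

Each chunk's `text` would then be embedded and stored along with its path and line range as metadata.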

But it's more than just a Q&A bot. The chat has tools that can fetch live data from GitHub: open issues, pull requests, and issue details. So you can ask things like:

"show me open issues that look easy to fix"
"what does issue #42 need, explain it to me"
"how does the authentication work in this codebase"
"where should I start if I want to add dark mode"

And because the chat also has access to the indexed codebase, it can point you to the exact files and functions relevant to whatever issue you're looking at. You ask about an issue, it finds the relevant code, explains what's happening, and tells you where to look.

For someone who's not confident in English, being able to ask all of this in their own language and get answers that reference actual code in the repo removes two blockers at once: the language barrier and the "I don't know where to start" barrier that most new contributors hit.

const SUGGESTED_PROMPTS = [
  { key: "chat.prompt.showOpenIssues" },
  { key: "chat.prompt.recommendEasyIssue" },
  { key: "chat.prompt.listOpenPrs" },
  { key: "chat.prompt.searchEntryPoint" },
];

MCP Server (IDE Integration)

This one I'm quite happy with. Lingo Bolt exposes an MCP server so you can use its tools directly inside Cursor, VS Code, or any IDE that supports MCP. No separate build step is needed, because it runs as a remote HTTP endpoint inside the web app at /api/mcp.

Add this to opencode/opencode.json or .cursor/mcp.json:

{
  "mcpServers": {
    "lingo-bolt": {
      "type": "remote",
      "url": "http://localhost:3000/api/mcp"
    }
  }
}

Then you can ask your IDE things like:

Show me open issues for facebook/react in Spanish
Get issue #42 from vercel/next.js in Hindi
Translate the README of expressjs/express to Japanese
Search the codebase of my-org/my-repo for authentication logic

The available tools are:

| Tool | What it does |
| --- | --- |
| `list_issues` | List issues with titles translated to your language |
| `get_issue` | Fetch a full issue with comments, translated |
| `translate_doc` | Translate any file (README, CONTRIBUTING, etc.) |
| `translate_text` | Translate arbitrary text between languages |
| `search_codebase` | Semantic search across an indexed repo |
| `get_onboarding` | Fetch AI-generated onboarding docs for a repo |

The list_issues and get_issue tools auto-detect the repo from your current git directory if you don't specify owner/repo. So inside a repo you can just say "show me open issues in Japanese" and it figures out the rest.
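The auto-detection boils down to turning a git remote URL into an owner/repo pair. Here's a rough sketch of that parsing, assuming the URL comes from something like `git remote get-url origin`. The function name is hypothetical and the real tools may do this differently.

```typescript
// Extract owner and repo from common GitHub remote URL forms:
//   git@github.com:owner/repo.git
//   https://github.com/owner/repo(.git)
function parseGitHubRemote(
  remoteUrl: string,
): { owner: string; repo: string } | null {
  const match = /github\.com[:/]([^/\s]+)\/([^/\s]+?)(?:\.git)?\/?$/.exec(
    remoteUrl.trim(),
  );
  if (!match) return null; // not a GitHub remote

  const [, owner, repo] = match;
  if (!owner || !repo) return null;
  return { owner, repo };
}
```

When this returns `null`, the tool would need the caller to pass owner/repo explicitly.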

There's also a standalone stdio-based MCP package (packages/mcp) if you'd rather not run the web app. Same tools, same idea, just runs as a local process.

AI Onboarding Docs

When you index a repo, the app reads through the codebase using RAG and generates proper onboarding documentation: what the project does, how to set it up, and the key concepts. You can then translate those docs into any supported language.

So a new contributor from Japan can get onboarding docs in Japanese, based on the actual code and not just whatever the original author wrote (or didn't write).

const ragResults = await Promise.all(
  RAG_QUERIES.map((query) => queryRepository({ query, repositoryId, limit: 8 }))
);

if (locale !== "en") {
  const translated = await engine.localizeText(accumulated, {
    sourceLocale: "en",
    targetLocale: locale,
    fast: true,
  });
}

Markdown Translation

A lot of repos have multiple markdown files beyond just the README: contributing guides, architecture docs, API references. Lingo Bolt clones the repo, finds all the markdown files, and translates the whole set using the Lingo.dev SDK. You can download them individually or as a zip.

The Whole App UI is Translated

The app UI adapts to your preferred language. Instead of shipping JSON translation files for every language, all UI strings are stored in English and translated on demand using engine.localizeObject() from the Lingo.dev SDK. The result is cached, so the translation is only fetched once per session.

const translated = await engine.localizeObject(UI_MESSAGES_EN, {
  sourceLocale: "en",
  targetLocale: body.targetLocale,
  fast: true,
});

This means we support 15 languages without maintaining any translation files manually.
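The once-per-session caching can be sketched as a small in-memory cache keyed by locale. This is an assumption about how such a cache might look, not the app's actual code. The `translate` parameter stands in for the `engine.localizeObject()` call, and all names here are hypothetical.

```typescript
type Messages = Record<string, string>;

// Stand-in for the real translation call.
type Translate = (messages: Messages, targetLocale: string) => Promise<Messages>;

// Returns a getter that translates each locale at most once and reuses the result.
function createMessageCache(source: Messages, translate: Translate) {
  const cache = new Map<string, Promise<Messages>>();

  return function getMessages(locale: string): Promise<Messages> {
    if (locale === "en") return Promise.resolve(source); // source strings are already English

    let pending = cache.get(locale);
    if (!pending) {
      // Cache the promise itself so concurrent callers share one request.
      pending = translate(source, locale).catch((error) => {
        cache.delete(locale); // don't keep failures around; allow a retry
        throw error;
      });
      cache.set(locale, pending);
    }
    return pending;
  };
}
```

Storing the promise rather than the resolved value means two components asking for the same locale at the same time trigger a single translation call.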

Architecture

User
 |
Next.js App
 |
 +-- tRPC routers (repo, bot, user)
 |
 +-- Route handlers
      |
      +-- /api/mcp              --> MCP server (IDE integration)
      +-- /api/repository/index --> clone + embed chunks --> Qdrant
      +-- /api/chat             --> RAG + GitHub tools + GPT stream
      +-- /api/onboarding/generate  --> RAG + GPT stream
      +-- /api/onboarding/translate --> Lingo.dev
      +-- /api/markdown/translate   --> clone + Lingo.dev
      +-- /api/ui/translate         --> Lingo.dev localizeObject
      +-- /api/webhook          --> GitHub App events
           |
           +-- issues.opened        --> auto-label, auto-translate
           +-- pull_request.opened  --> auto-label, auto-translate
           +-- issue_comment.created --> @lingo-bolt command parsing

PostgreSQL (repos, docs, translations, bot installations)
Qdrant (code chunks for semantic search)

How Lingo.dev Is Used

| Feature | SDK method |
| --- | --- |
| GitHub issue/PR translation (bot) | `localizeText` |
| AI chat responses | `localizeText` |
| Onboarding doc translation | `localizeText` |
| Markdown file translation | `localizeText` |
| MCP tool responses | `localizeText` |
| UI string translation | `localizeObject` |