DEV Community

HarinezumIgel
Speaking the Corpus's Language: How Multilingual RAG Stays Coherent Across Turns

This article assumes you already run a multi-turn RAG pipeline and have query rewriting enabled.

  • Multilingual RAG breaks in more places than just query translation
  • Query rewriting can re‑introduce the wrong language
  • Treat rewritten queries as raw input and re‑detect + re‑translate
  • Cheap detection + caching makes this practical

Most RAG tutorials quietly assume this scenario just works. In production, it fails in ways that look like “bad retrieval” but are actually language drift.

  • A German user asks a question in German.
  • The corpus is in English.
  • A query rewriter pulls English entity names out of earlier answers.

Somewhere in the middle, translation has to happen — twice.

This article walks through the two translation passes that keep a multilingual RAG pipeline working correctly across multiple turns, why each one is necessary, and what breaks without them. It also documents the places where things still go wrong — because they do.


The setup

A typical local RAG stack:

  • Embeddings + vector store: An embedding model primarily optimized for English, but with usable multilingual similarity + Chroma DB
  • Retrieval: Hybrid (vector + BM25), fused with Reciprocal Rank Fusion
  • Generation: local LLM via Ollama
  • Query rewriting: instruction-style rewriting of follow-up queries into self-contained questions
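The fusion step in that stack is simple enough to sketch. The following is an illustrative Reciprocal Rank Fusion implementation, not the repository's code; the document IDs and k = 60 are assumptions for the example.

```python
# Illustrative RRF: each retriever contributes 1/(k + rank) per document;
# documents that rank well in several lists accumulate the highest score.

def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc IDs into one RRF-scored ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical vector ranking
bm25_hits = ["doc_b", "doc_d", "doc_a"]     # hypothetical BM25 ranking
fused = rrf_fuse([vector_hits, bm25_hits])  # doc_b wins: high in both lists
```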

The corpus is English. Users are not.

That single asymmetry — English internals, multilingual edges — forces translation into more places than people expect.


Pass 1 — Translating the user query (every turn)

A user might ask:

Was für Tiere sind beschrieben?

Embeddings, BM25, and the corpus all operate in English. Sending the raw German query straight to retrieval yields poor results for a simple reason: BM25 sees no shared tokens, and an English-optimized embedding model matches a German query far more weakly than its English equivalent.

Each turn therefore runs:

detect language → translate to English → retrieve

Language detection happens on every turn, not once per session. Users frequently code-switch inside a single query, especially when technical terms or entity names appear in previous answers.

The detected source language is stored in the session so the final LLM call can be instructed to answer in the user's language.
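Concretely, the per-turn flow can be sketched as follows. This is a hedged illustration: `detect_language`, `TOY_MT`, and `pass1` are toy stand-ins invented for this sketch (a real system would use an actual detector and MT model), not the repository's API.

```python
# Illustrative Pass 1 sketch. Only the control flow mirrors the pipeline
# described above; the detector and translation table are toys.

GERMAN_HINTS = {"was", "für", "sind", "ist", "welches"}  # toy heuristic only

TOY_MT = {  # stand-in for a real translation model
    "was für tiere sind beschrieben?": "What animals are described?",
}

def detect_language(text: str) -> str:
    """Toy detector: flags common German function words."""
    return "de" if set(text.lower().split()) & GERMAN_HINTS else "en"

def translate_to_english(text: str, lang: str) -> str:
    if lang == "en":
        return text  # short-circuit: no translation call at all
    return TOY_MT.get(text.lower(), text)

def pass1(query: str, session: dict) -> str:
    lang = detect_language(query)   # runs on EVERY turn, not per session
    session["user_lang"] = lang     # the final answer is produced in this language
    return translate_to_english(query, lang)
```

With this in place, `pass1("Was für Tiere sind beschrieben?", session)` returns the English query while recording `de` in the session for the final answer.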

Here is what that first turn looks like in the running system:

💬 Your query>  what animals are described?

🔵 UserQuery      Original user query: 'what animals are described?'
🔵 LangDetect     Detected language: English (en) — confidence: 100%
🔵 QueryRewrite   No conversation history — skipping rewrite
🔵 FinalQuery     Final query for retrieval: 'what animals are described?' (unchanged)

💡 ### Answer
   The following distinct animal species are described in the context:
   - Hedgehogs, Cats, Fish, Dogs, Lions, Apes (Gorillas and Orangutans),
     Elephants, Pferde (Horses)

   ### Sources
   Hedgehogs.pdf · Cats.md · Fish.txt · Dogs.png · Lions.pptx
   Apes.docx · Elephants.jpg · Pferde.pdf

No rewrite happens on turn 1 — there is no history yet, so the query goes straight through.


Pass 2 — Re-translation after rewriting

Multi-turn RAG breaks on pronouns. A query rewrite step fixes that, but introduces a subtler problem.

When entity names are copied from English history into a non-English sentence template, the rewritten query can be mixed-language:

Sind elephants, hedgehogs Säugetiere?

That output is internally consistent. It is also unusable by an English-only retriever.

A concrete turn-by-turn:

1. User (DE):       Was für Tiere sind beschrieben?
2. Translated:      What animals are described?
3. Assistant (EN):  Cats, horses, hedgehogs, dogs, fish, apes, elephants, lions.
4. User (EN):       Are they mammals?
5. Rewritten:       Are Hedgehogs, Cats, Fish, Dogs, Lions, Gorillas,
                    Orangutans, Elephants, and Horses mammals?
6. Re-translated:   (already English — pass short-circuits)
7. Retrieval:       Are Hedgehogs, Cats, Fish, Dogs, Lions, Gorillas,
                    Orangutans, Elephants, and Horses mammals?

Step 5 is what makes retrieval work. The pronoun "they" is gone; explicit entity names from the previous answer replace it. Here is the log for turn 2:

💬 Your query>  are they mammals?

🔵 UserQuery      Original user query: 'are they mammals?'
🔵 LangDetect     Detected language: English (en) — confidence: 100%
🔵 QueryRewrite   Rewriting query using 1 history turns
🔵 QueryRewrite   'are they mammals?' → 'are Hedgehogs, Cats, Fish, Dogs,
↳                 Lions, Gorillas, Orangutans, Elephants, and Horses mammals?'
🔵 QueryRewrite   Final query: are Hedgehogs, Cats, Fish, Dogs, Lions,
↳                 Gorillas, Orangutans, Elephants, and Horses mammals?
🔵 LangDetect     Detected language: English (en) — confidence: 100%
🔵 FinalQuery     Final query for retrieval: 'are Hedgehogs, Cats, Fish,
↳                 Dogs, Lions, Gorillas, Orangutans, Elephants, and Horses mammals?'
                  (was: 'are they mammals?')

💡 ### Answer
   Hedgehogs, Cats, Dogs, Lions, Gorillas, Orangutans, and Elephants are
   mammals. The context is silent on whether Horses are explicitly stated
   as mammals in the provided text.

The rewriter resolves the coreference correctly. Retrieval gets a self-contained query with real entity names.

The pattern holds across deeper turns too. On turn 3, "these mammals" gets resolved against the confirmed mammal list from the previous answer:

💬 Your query>  give me details you find about these mammals

🔵 QueryRewrite   Rewriting query using 2 history turns
🔵 QueryRewrite   'give me details you find about these mammals' →
↳                 'give me details about Hedgehogs, Cats, Dogs, Lions,
                  Gorillas, Orangutans, and Elephants'

Why the second translate pass is worth running

It is reasonable to ask whether a second translation pass is really necessary. After all, the user query was already translated to English before the rewriter ran — so the rewriter should have only English material to work with.

In practice, two things keep this from holding:

  1. The rewriter's context is mixed. It reads recent stored turns as USER: … / ASSISTANT: … pairs, which means it sees the user's original-language wording and the assistant's English answers. Whichever side it copies from, the result can land in either language.
  2. Rewriting is not translation. Rewriting models are tuned to resolve references and stay close to the user's phrasing, not to enforce a target language. When entity names from one language are dropped into a sentence template from another, the output is faithful to its inputs but no longer monolingual.

A short detect-and-translate pass on the rewritten query is cheap when the output is already English (detection runs; translation skips) and rescues the mixed-language case when it is not.

The fix is simple but often skipped:

Treat rewritten queries as raw input and run language detection and translation again.
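In code, Pass 2 is literally the same step run again on the rewriter's output. A minimal sketch, with toy stand-ins for the real detector and translator (`detect_language`, `TOY_MT`, and `pass2` are illustrative names, not the repository's API):

```python
# Illustrative Pass 2 sketch: the rewritten query is treated as raw input
# and re-detected. The detector and translation table are toys.

GERMAN_HINTS = {"sind", "säugetiere?", "was", "für"}

TOY_MT = {  # stand-in for a real translation model
    "sind elephants, hedgehogs säugetiere?": "Are elephants, hedgehogs mammals?",
}

def detect_language(text: str) -> str:
    return "de" if set(text.lower().split()) & GERMAN_HINTS else "en"

def pass2(rewritten: str) -> str:
    if detect_language(rewritten) == "en":
        return rewritten  # no-op: English output passes straight through
    return TOY_MT.get(rewritten.lower(), rewritten)  # mixed-language rescue
```

The English short-circuit is why this pass is cheap: detection runs, translation does not.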

Known failure mode: the rewriter over-triggers

The rewriter assumes that any query arriving alongside chat history must contain a reference worth resolving. Small LLMs do not reliably detect the absence of coreference, so a self-contained query — one that never actually pointed at anything — can still get entity-substituted from history.

Concrete example. A new chat session starts. The user's first message is "Was für Tiere sind beschrieben?" ("What animals are described?"). One history turn exists from a previous session whose answer listed hedgehogs, cats, fish, and so on. (This is a realistic condition in any setup where history is stored externally — shared session stores, reconnecting clients, or persistent memory features — rather than being cleared when a browser tab closes.) The rewriter reads that turn, finds entity candidates, and produces:

What animals are Hedgehogs, Cats, Fish, Dogs, Lions, Apes, and Elephants?

The query is now semantically broken: instead of asking the corpus what animals it mentions, it asks whether a specific hardcoded list of animals are animals. The LLM answers as a taxonomy question and ignores the documents entirely.

Mitigations and their tradeoffs

| Approach | Prevents over-triggering? | Downsides |
|---|---|---|
| `use_chat_context: false` | ✅ fully — no history, no rewrite | Loses all coreference resolution |
| `max_history_turns: 1` | ⚠️ partially — only if the bad turn falls outside the 1-turn window | If the prior answer is the immediately preceding turn, the material is still there |
| Jaccard overlap gate | ⚠️ partially — very similar queries (overlap ≥ threshold) are skipped | False-skips valid coreference on translated queries: German "welches sind Säugetiere?" → "What are mammals?" after the translation model drops the determiner → overlap ≈ 0 → the gate fires and skips a query that genuinely needs rewriting |
| Explicit follow-up phrasing | ✅ user-side — "which of those are mammals?" always has a clear referent | Requires user discipline |

The Jaccard gate was the original gating mechanism in this codebase and was removed precisely because of the translated-query false-skip problem. It solves the over-triggering case but breaks the core use case the rewriter exists for. There is no free lunch here; the right setting depends on whether your users mix sessions or always start fresh.
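The false-skip in the Jaccard row is easy to verify numerically: after translation, a German query and its English equivalent share no tokens, so token-level Jaccard overlap drops to zero and the gate misfires.

```python
# Worked example of the false-skip: token-level Jaccard overlap between a
# German query and its English translation is zero, so an overlap gate
# wrongly concludes "unrelated query, skip the rewrite".

def jaccard(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

german = "welches sind säugetiere?"
english = "what are mammals?"
overlap = jaccard(german, english)  # 0.0: no shared tokens after translation
```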


The full pipeline

If you don’t read flowcharts fluently, you can skip this and continue — the conclusions don’t depend on it.

[Figure: Multilingual RAG pipeline flowchart]

The three blue nodes are translation steps. The rewrite node (green) sits between them — it is what makes pass 2 necessary.


Putting it together

In the worst case, a single multi-turn query involves:

  • one user-query translation (Pass 1)
  • an LLM rewrite using stored history
  • an optional re-translation of the rewritten query (Pass 2)
  • the final LLM answer, returned in the user's original language

Careful detection and caching keep the total overhead manageable. When the query and the rewritten query are both already English, both translation calls short-circuit immediately.
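A sketch of what that caching can look like, assuming a memoized translation wrapper. `cached_translate` and `slow_mt_call` are illustrative names, not the repository's API; `functools.lru_cache` is one simple way to do it.

```python
# Memoized translation: repeated queries hit the cache, and already-English
# input returns immediately without touching the MT model at all.

from functools import lru_cache

@lru_cache(maxsize=1024)
def cached_translate(text: str, source: str, target: str = "en") -> str:
    if source == target:
        return text  # short-circuit: no MT call for already-English input
    return slow_mt_call(text, source, target)  # placeholder for the real MT model
```

A real system might key the cache on (text, source, target) explicitly and bound it by bytes rather than entries; the short-circuit branch is the part that matters here.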


The retrieval–generation asymmetry

The two translation passes described here keep the query in a language the vector store and reranker can match against. They say nothing about the documents.

A corpus document in a language different from English (say, a PDF written entirely in German) will still be retrieved correctly — the embedding model handles multilingual similarity — but the generation LLM may silently skip or ignore that chunk when composing its answer in English. The retrieval log will show the chunk selected; the answer will omit the information.

This is visible in practice: with all other test documents in English and one German document (Pferde.pdf), the query "what animals are described?" retrieves three Pferde.pdf chunks above threshold, yet the answer lists every other animal and omits horses entirely.

The root cause is the asymmetry between:

  • Retrieval — handled by embedding and reranking models that were trained multilingually and score cross-lingual similarity well
  • Generation — handled by the chat LLM, which is given chunks in whatever language they were indexed in and is expected to synthesize them into a coherent answer

The two fixes are:

| Approach | Effect | Cost |
|---|---|---|
| Translate documents to English at ingestion time | Retrieval and generation both work | Permanent content transformation; original language lost in storage |
| Add an explicit instruction to the system prompt to handle multilingual chunks | LLM is reminded to use all chunks regardless of language | Costs tokens; not guaranteed; LLM may still miss content |
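One possible phrasing of that prompt-level instruction, as an illustration rather than the repository's actual system prompt:

```python
# Hypothetical system-prompt fragment for the multilingual-chunk mitigation.
# The wording is an assumption; tune it against your own generation model.

MULTILINGUAL_CHUNK_INSTRUCTION = (
    "Some retrieved context chunks may be written in a language other than "
    "English. Treat them as equally authoritative: translate their content "
    "internally and include their facts in your answer. Never skip a chunk "
    "because of its language."
)
```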

The translation passes in this article do not solve this — they fix the query path. The document path is a separate problem.


Takeaways

  • Detect first; translate only when needed — language detection on every turn, not once per session
  • Do not assume rewriting preserves language — rewritten queries must be treated as raw input
  • The rewriter's context is mixed-language by nature — users write in one language, answers reference entities from another
  • A second translate pass is cheap — it is a no-op for English output and a correctness fix for mixed output

Pronoun and coreference rewriting is essential for multi-turn chat — without it, follow-up questions like "are they mammals?" lose their referent the moment they are translated. The two translation passes described above are what keep that rewrite step working once users stop typing in English.


Source code

The complete implementation is available here:

github.com/HarinezumIgel/RAG-LCC

If you find the approach useful and think it could help other people building multilingual RAG systems, leaving a star on the repository can make it easier for them to discover it.
