This article assumes you already run a multi-turn RAG pipeline and have query rewriting enabled.
- Multilingual RAG breaks in more places than just query translation
- Query rewriting can re‑introduce the wrong language
- Treat rewritten queries as raw input and re‑detect + re‑translate
- Cheap detection + caching makes this practical
Most RAG tutorials quietly assume multilingual input just works. In production, it fails in ways that look like “bad retrieval” but are actually language drift.
- A German user asks a question in German.
- The corpus is in English.
- A query rewriter pulls English entity names out of earlier answers.
Somewhere in the middle, translation has to happen — twice.
This article walks through the two translation passes that keep a multilingual RAG pipeline working correctly across multiple turns, why each one is necessary, and what breaks without them. It also documents the places where things still go wrong — because they do.
## The setup
A typical local RAG stack:
- Embeddings + vector store: An embedding model primarily optimized for English, but with usable multilingual similarity + Chroma DB
- Retrieval: Hybrid (vector + BM25), fused with Reciprocal Rank Fusion
- Generation: local LLM via Ollama
- Query rewriting: instruction-style rewriting of follow-up queries into self-contained questions
The corpus is English. Users are not.
That single asymmetry — English internals, multilingual edges — forces translation into more places than people expect.
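Reciprocal Rank Fusion, mentioned in the stack above, is simple enough to show in full. Here is a minimal sketch (not the repository's actual code) that fuses a vector-search ranking with a BM25 ranking using the standard score sum 1/(k + rank), with the conventional k = 60; the document IDs are made up for illustration:

```python
def rrf_fuse(rankings, k=60):
    """Fuse multiple ranked lists with Reciprocal Rank Fusion.

    rankings: list of ranked document-id lists (best first).
    Each document scores sum(1 / (k + rank)) across all lists
    it appears in; documents found by both retrievers rise.
    """
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical hit lists from the two retrievers:
vector_hits = ["doc_elephants", "doc_horses", "doc_cats"]
bm25_hits = ["doc_horses", "doc_cats", "doc_dogs"]
print(rrf_fuse([vector_hits, bm25_hits]))
# → ['doc_horses', 'doc_cats', 'doc_elephants', 'doc_dogs']
```

Note that `doc_horses` wins despite topping only one list: appearing near the top of both rankings beats appearing first in one.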
## Pass 1 — Translating the user query (every turn)
A user might ask:
Was für Tiere sind beschrieben? ("What animals are described?")
Embeddings, BM25, and the corpus all operate in English. Sending the raw German query straight to retrieval yields poor results for a simple reason: there is no overlap signal.
Each turn therefore runs:
detect language → translate to English → retrieve
Language detection happens on every turn, not once per session. Users frequently code-switch inside a single query, especially when technical terms or entity names appear in previous answers.
The detected source language is stored in the session so the final LLM call can be instructed to answer in the user's language.
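A sketch of what this per-turn pass can look like. The detector and translator below are toy stand-ins (the real pipeline would call a language-ID model and a translation LLM); the function names, the hint-word heuristic, and the session dict are illustrative assumptions, not the repository's API:

```python
from functools import lru_cache

# Toy stand-in for a real language detector (e.g. fastText / langdetect).
GERMAN_HINTS = {"was", "für", "sind", "ist", "der", "die", "das"}

def detect_language(query: str) -> str:
    words = set(query.lower().rstrip("?!.").split())
    return "de" if words & GERMAN_HINTS else "en"

@lru_cache(maxsize=1024)  # repeated queries skip the translation call entirely
def translate_to_english(query: str) -> str:
    # Placeholder lookup; the real system would call a translation model here.
    translations = {"Was für Tiere sind beschrieben?": "What animals are described?"}
    return translations.get(query, query)

def prepare_query(query: str, session: dict) -> str:
    lang = detect_language(query)    # runs every turn, never cached per session
    session["user_language"] = lang  # remembered so the answer can be localized
    return translate_to_english(query) if lang != "en" else query

session = {}
print(prepare_query("Was für Tiere sind beschrieben?", session))
# → What animals are described?
print(session["user_language"])  # → de
```

The `lru_cache` on the translator is the cheap caching the TL;DR refers to: identical queries across turns never hit the translation model twice.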
Here is what that first turn looks like in the running system:
```
💬 Your query> what animals are described?
🔵 UserQuery Original user query: 'what animals are described?'
🔵 LangDetect Detected language: English (en) — confidence: 100%
🔵 QueryRewrite No conversation history — skipping rewrite
🔵 FinalQuery Final query for retrieval: 'what animals are described?' (unchanged)
💡 ### Answer
The following distinct animal species are described in the context:
- Hedgehogs, Cats, Fish, Dogs, Lions, Apes (Gorillas and Orangutans),
Elephants, Pferde (Horses)
### Sources
Hedgehogs.pdf · Cats.md · Fish.txt · Dogs.png · Lions.pptx
Apes.docx · Elephants.jpg · Pferde.pdf
```
No rewrite happens on turn 1 — there is no history yet, so the query goes straight through.
## Pass 2 — Re-translation after rewriting
Multi-turn RAG breaks on pronouns. A query rewrite step fixes that, but introduces a subtler problem.
When entity names are copied from English history into a non-English sentence template, the rewritten query can be mixed-language:
Sind elephants, hedgehogs Säugetiere? ("Are elephants, hedgehogs mammals?")
That output is internally consistent. It is also unusable by an English-only retriever.
A concrete turn-by-turn:
1. User (DE): Was für Tiere sind beschrieben?
2. Translated: What animals are described?
3. Assistant (EN): Cats, horses, hedgehogs, dogs, fish, apes, elephants, lions.
4. User (EN): Are they mammals?
5. Rewritten: Are Hedgehogs, Cats, Fish, Dogs, Lions, Gorillas,
Orangutans, Elephants, and Horses mammals?
6. Re-translated: (already English — pass short-circuits)
7. Retrieval: Are Hedgehogs, Cats, Fish, Dogs, Lions, Gorillas,
Orangutans, Elephants, and Horses mammals?
Step 5 is what makes retrieval work. The pronoun "they" is gone; explicit entity names from the previous answer replace it. Here is the log for turn 2:
```
💬 Your query> are they mammals?
🔵 UserQuery Original user query: 'are they mammals?'
🔵 LangDetect Detected language: English (en) — confidence: 100%
🔵 QueryRewrite Rewriting query using 1 history turns
🔵 QueryRewrite 'are they mammals?' → 'are Hedgehogs, Cats, Fish, Dogs,
↳ Lions, Gorillas, Orangutans, Elephants, and Horses mammals?'
🔵 QueryRewrite Final query: are Hedgehogs, Cats, Fish, Dogs, Lions,
↳ Gorillas, Orangutans, Elephants, and Horses mammals?
🔵 LangDetect Detected language: English (en) — confidence: 100%
🔵 FinalQuery Final query for retrieval: 'are Hedgehogs, Cats, Fish,
↳ Dogs, Lions, Gorillas, Orangutans, Elephants, and Horses mammals?'
(was: 'are they mammals?')
💡 ### Answer
Hedgehogs, Cats, Dogs, Lions, Gorillas, Orangutans, and Elephants are
mammals. The context is silent on whether Horses are explicitly stated
as mammals in the provided text.
```
The rewriter resolves the coreference correctly. Retrieval gets a self-contained query with real entity names.
The pattern holds across deeper turns too. On turn 3, "these mammals" gets resolved against the confirmed mammal list from the previous answer:
```
💬 Your query> give me details you find about these mammals
🔵 QueryRewrite Rewriting query using 2 history turns
🔵 QueryRewrite 'give me details you find about these mammals' →
↳ 'give me details about Hedgehogs, Cats, Dogs, Lions,
Gorillas, Orangutans, and Elephants'
```
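A sketch of how such an instruction-style rewrite prompt can be assembled. The prompt wording and history format below are illustrative assumptions, not the repository's exact template; the key detail is that the history pairs mix the user's original-language wording with English answers:

```python
def build_rewrite_prompt(query, history, max_turns=2):
    """Assemble an instruction-style rewrite prompt.

    history: list of (user_text, assistant_text) tuples, oldest first.
    Mixed-language hazard: user_text is stored in the user's original
    language while assistant_text is English.
    """
    lines = [
        "Rewrite the final user question so it is self-contained.",
        "Resolve pronouns using the conversation below.",
        "Return only the rewritten question.",
        "",
    ]
    for user_text, assistant_text in history[-max_turns:]:
        lines.append(f"USER: {user_text}")
        lines.append(f"ASSISTANT: {assistant_text}")
    lines.append(f"USER: {query}")
    return "\n".join(lines)

prompt = build_rewrite_prompt(
    "are they mammals?",
    [("Was für Tiere sind beschrieben?",
      "Cats, horses, hedgehogs, dogs, fish, apes, elephants, lions.")])
print(prompt)
```

Whatever the LLM behind this prompt copies out of the German USER line or the English ASSISTANT line lands verbatim in its output, which is exactly why the rewritten query cannot be trusted to stay in one language.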
## Why the second translate pass is worth running
It is reasonable to ask whether a second translation pass is really necessary. After all, the user query was already translated to English before the rewriter ran — so the rewriter should have only English material to work with.
In practice, two things keep this from holding:
- The rewriter's context is mixed. It reads recent stored turns as USER: … / ASSISTANT: … pairs, which means it sees the user's original-language wording and the assistant's English answers. Whichever side it copies from, the result can land in either language.
- Rewriting is not translation. Rewriting models are tuned to resolve references and stay close to the user's phrasing, not to enforce a target language. When entity names from one language are dropped into a sentence template from another, the output is faithful to its inputs but no longer monolingual.
A short detect-and-translate pass on the rewritten query is cheap when the output is already English (detection runs; translation skips) and rescues the mixed-language case when it is not.
The fix is simple but often skipped:
Treat rewritten queries as raw input and run language detection and translation again.
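In code, the whole pass is a few lines. A minimal sketch, with `detect` and `translate` as stand-in callables for whatever models the pipeline actually uses:

```python
def finalize_query(rewritten: str, detect, translate) -> str:
    """Pass 2: treat the rewritten query exactly like raw user input.

    Detection always runs; translation only runs when the rewrite
    drifted out of English, so the common all-English case is cheap.
    """
    if detect(rewritten) == "en":
        return rewritten          # short-circuit: nothing to do
    return translate(rewritten)   # rescue the mixed-language case

# Toy stand-ins for the real detector/translator, illustration only:
detect = lambda q: "de" if "Säugetiere" in q else "en"
translate = lambda q: "Are elephants, hedgehogs mammals?"

print(finalize_query("Are Elephants and Horses mammals?", detect, translate))
# → Are Elephants and Horses mammals?
print(finalize_query("Sind elephants, hedgehogs Säugetiere?", detect, translate))
# → Are elephants, hedgehogs mammals?
```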
## Known failure mode: the rewriter over-triggers
The rewriter assumes that any query arriving alongside chat history must contain a reference worth resolving. Small LLMs do not reliably detect the absence of coreference, so a self-contained query — one that never actually pointed at anything — can still get entity-substituted from history.
Concrete example. A new chat session starts. The user's first message is "Was für Tiere sind beschrieben?" ("What animals are described?"). One history turn exists from a previous session whose answer listed hedgehogs, cats, fish, and so on. (This is a realistic condition in any setup where history is stored externally — shared session stores, reconnecting clients, or persistent memory features — rather than being cleared when a browser tab closes.) The rewriter reads that turn, finds entity candidates, and produces:
What animals are Hedgehogs, Cats, Fish, Dogs, Lions, Apes, and Elephants?
The query is now semantically broken: instead of asking the corpus what animals it mentions, it asks whether a specific hardcoded list of animals are animals. The LLM answers as a taxonomy question and ignores the documents entirely.
## Mitigations and their tradeoffs
| Approach | Prevents over-triggering? | Downsides |
|---|---|---|
| `use_chat_context: false` | ✅ fully — no history, no rewrite | Loses all coreference resolution |
| `max_history_turns: 1` | ⚠️ partially — only if the bad turn falls outside the 1-turn window | If the prior answer is the immediately preceding turn, the material is still there |
| Jaccard overlap gate | ⚠️ partially — very similar queries (overlap ≥ threshold) are skipped | False-skips valid coreference on translated queries: German "welches sind Säugetiere?" → "What are mammals?" after the translation model drops the determiner → overlap ≈ 0 → the gate fires and skips a query that genuinely needs rewriting |
| Explicit follow-up phrasing | ✅ user-side — "which of those are mammals?" always has a clear referent | Requires user discipline |
The Jaccard gate was the original gating mechanism in this codebase and was removed precisely because of the translated-query false-skip problem. It solves the over-triggering case but breaks the core use case the rewriter exists for. There is no free lunch here; the right setting depends on whether your users mix sessions or always start fresh.
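For reference, the overlap measure behind that gate is plain token-set Jaccard, and the false-skip case from the table is easy to reproduce. A minimal sketch (the exact gating threshold and comparison targets are implementation details not shown here):

```python
def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard overlap between two queries."""
    ta = set(a.lower().rstrip("?!.").split())
    tb = set(b.lower().rstrip("?!.").split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# The false-skip from the table: the original German wording and its
# English translation share zero tokens, so any overlap-based gate
# sees "completely unrelated" on a query that needs rewriting.
print(jaccard("welches sind Säugetiere?", "What are mammals?"))  # → 0.0

# Whereas two genuinely related English queries overlap substantially:
print(jaccard("are they mammals?", "are hedgehogs mammals?"))  # → 0.5
```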
## The full pipeline
If you don’t read flowcharts fluently, you can skip this and continue—the conclusions don’t depend on it.
The three blue nodes are translation steps. The rewrite node (green) sits between them — it is what makes pass 2 necessary.
## Putting it together
In the worst case, a single multi-turn query involves:
- one user-query translation (Pass 1)
- an LLM rewrite using stored history
- an optional re-translation of the rewritten query (Pass 2)
- the final LLM answer, returned in the user's original language
Careful detection and caching keep the total overhead manageable. When the query and the rewritten query are both already English, both translation calls short-circuit immediately.
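The worst-case flow above can be sketched as a single turn handler. All model calls are injected as stand-in callables, and every name below is illustrative rather than the repository's API:

```python
def handle_turn(query, session, *, detect, translate, rewrite, answer):
    """Worst-case multi-turn flow: Pass 1, rewrite, Pass 2, answer."""
    session["user_language"] = detect(query)          # Pass 1: detect every turn
    q = translate(query) if session["user_language"] != "en" else query
    if session.get("history"):                        # rewrite only with history
        q = rewrite(q, session["history"])
        if detect(q) != "en":                         # Pass 2: re-check the rewrite
            q = translate(q)
    reply = answer(q, target_language=session["user_language"])
    session.setdefault("history", []).append((query, reply))
    return reply

# Toy stand-ins for the four model calls, illustration only:
detect = lambda q: "de" if any(w in q.lower() for w in ("sind", "säugetiere", "was")) else "en"
translate = lambda q: "Are they mammals?"
rewrite = lambda q, h: q.replace("they", "hedgehogs and cats")
answer = lambda q, target_language: f"[{target_language}] answer to: {q}"

session = {"history": [("Was für Tiere sind beschrieben?", "Hedgehogs and cats.")]}
reply = handle_turn("Sind sie Säugetiere?", session,
                    detect=detect, translate=translate,
                    rewrite=rewrite, answer=answer)
print(reply)  # → [de] answer to: Are hedgehogs and cats mammals?
```

When both detections return English, neither `translate` call runs, which is the short-circuit path the paragraph above describes.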
## The retrieval–generation asymmetry
The two translation passes described here keep the query in a language the vector store and reranker can match against. They say nothing about the documents.
A corpus document in a language other than English (say, a PDF written entirely in German) will still be retrieved correctly — the embedding model handles multilingual similarity — but the generation LLM may silently skip or ignore that chunk when composing its answer in English. The retrieval log will show the chunk selected; the answer will omit the information.
This is visible in practice: with all other test documents in English and one German document (Pferde.pdf), the query "what animals are described?" retrieves three Pferde.pdf chunks above threshold, yet the answer lists every other animal and omits horses entirely.
The root cause is the asymmetry between:
- Retrieval — handled by embedding and reranking models that were trained multilingually and score cross-lingual similarity well
- Generation — handled by the chat LLM, which is given chunks in whatever language they were indexed in and is expected to synthesize them into a coherent answer
The two fixes are:
| Approach | Effect | Cost |
|---|---|---|
| Translate documents to English at ingestion time | Retrieval and generation both work | Permanent content transformation; original language lost in storage |
| Add an explicit instruction to the system prompt to handle multilingual chunks | LLM is reminded to use all chunks regardless of language | Costs tokens; not guaranteed; LLM may still miss content |
The translation passes in this article do not solve this — they fix the query path. The document path is a separate problem.
## Takeaways
- Detect first; translate only when needed — language detection on every turn, not once per session
- Do not assume rewriting preserves language — rewritten queries must be treated as raw input
- The rewriter's context is mixed-language by nature — users write in one language, answers reference entities from another
- A second translate pass is cheap — it is a no-op for English output and a correctness fix for mixed output
Pronoun and coreference rewriting is essential for multi-turn chat — without it, follow-up questions like "are they mammals?" lose their referent the moment they are translated. The two translation passes described above are what keep that rewrite step working once users stop typing in English.
## Source code
The complete implementation is available here:
github.com/HarinezumIgel/RAG-LCC
If you find the approach useful and think it could help other people building multilingual RAG systems, leaving a star on the repository can make it easier for them to discover it.
