I built a bilingual bot. It had an identity crisis every twenty messages.
A retail chain needed a customer support bot for its Bangladeshi and expat customers. The requirement sounded simple: the bot should speak English and Bangla fluently and switch languages whenever the customer does.
I thought this would be trivial. Modern models handle multilingual text easily. I built it in two days, tested it, and everything looked fine.
Then we deployed it.
Week One: The Complaints
Customers started reporting strange behavior.
The bot replied in Banglish. It mixed English and Bangla in the same sentence. Sometimes it answered in Bangla even though the user never left English.
I pulled the logs.
And it was worse than I expected.
What Was Happening
Conversations started clean.
The user asked in English. The bot replied in English. Prices, availability, all correct.
Then, suddenly, mid-conversation, the bot's replies would fracture.
Half English. Half Bangla. One sentence switching languages mid-thought.
In Bangla conversations, English words leaked in awkwardly. In English conversations, Bangla sentences appeared out of nowhere.
Users were confused. Some asked why the bot changed languages. Others just left.
The Pattern
The problem didn’t appear immediately.
For the first ten messages, everything looked normal.
After fifteen or twenty turns, the bot started mixing languages. Sentence structure from one language, vocabulary from another. It felt like the bot was slowly forgetting what language it was supposed to use.
That was the clue.
Why This Happened
My prompt said to match the customer’s language. That sounded reasonable.
But I ignored how context actually works.
The bot’s context window wasn’t just the user messages. It also included retrieved product data, policy text, and internal notes.
And my knowledge base was full of Bangla.
Product names were stored in Bangla. Policy titles were Bangla. Internal descriptions mixed both languages.
Every retrieval injected Bangla text into the context, even when the user spoke only English.
After enough turns, the context became a soup of English and Bangla. The model assumed the conversation itself was code-switching.
So it started code-switching too.
My First Failed Fix
I tried forcing the bot to always respond in the user’s current language.
That didn’t work. By the time the context was polluted, the bot couldn’t reliably tell what the user’s language was anymore.
My Second Failed Fix
I tried detecting the language of the last user message and responding only in that language.
That helped, but it broke something else.
Bangladeshi users naturally mix English words into Bangla sentences. Words like “price,” “stock,” and “size” are normal. The detector flagged these as mixed input and forced English responses, which felt unnatural.
The bot wasn’t wrong. But it wasn’t right either.
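For context, here's a simplified sketch of the kind of per-message detector I was relying on. The regexes, threshold logic, and sample sentence are illustrative, not my production code:

```python
import re

BANGLA = re.compile(r"[\u0980-\u09FF]")  # Bangla (Bengali) Unicode block
LATIN = re.compile(r"[A-Za-z]")

def detect_last_message_language(message: str) -> str:
    """Classify a single message by raw character counts."""
    bangla = len(BANGLA.findall(message))
    latin = len(LATIN.findall(message))
    if bangla and latin:
        return "mixed"  # mixed input forced an English reply
    return "bn" if bangla else "en"

# A perfectly normal Bangla question with one borrowed English word
# ("What is the price of these shoes?"):
print(detect_last_message_language("এই জুতার price কত?"))  # -> "mixed"
```

One borrowed noun, and a clearly Bangla question gets treated as mixed input.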
The Real Solution: Language State Management
I stopped relying on implicit detection and added explicit state.
Every response now starts by determining the dominant language of the user’s last message. Borrowed words are ignored. Only structure and grammar matter.
That decision sets a language state for the conversation.
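A minimal sketch of that rule, assuming a small allowlist of borrowed words and a simple token-majority threshold. Both are illustrative; the real allowlist came from our chat logs:

```python
import re
from dataclasses import dataclass

BANGLA_CHAR = re.compile(r"[\u0980-\u09FF]")

# Borrowed English words that are normal inside Bangla sentences.
BORROWED = {"price", "stock", "size", "delivery", "offer", "cash"}

def dominant_language(message: str) -> str:
    """Pick the language carrying the sentence structure.

    Borrowed words are dropped before counting, so a Bangla sentence
    with a few English nouns still reads as Bangla."""
    tokens = message.split()
    meaningful = [t for t in tokens if t.lower().strip("?.,!") not in BORROWED]
    if not meaningful:
        meaningful = tokens
    bangla = sum(1 for t in meaningful if BANGLA_CHAR.search(t))
    return "bn" if bangla * 2 >= len(meaningful) else "en"

@dataclass
class ConversationState:
    language: str = "en"  # explicit state, not inferred from context

    def update(self, user_message: str) -> str:
        self.language = dominant_language(user_message)
        return self.language

state = ConversationState()
print(state.update("এই জুতার price কত?"))             # -> "bn"
print(state.update("Do you have size 42 in stock?"))  # -> "en"
```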
Once the language is set, everything else follows it.
Knowledge base content is retrieved and translated into the target language before it ever reaches the response generator. Raw mixed-language text is never injected directly.
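In pipeline terms it looks roughly like this. The translate callable stands in for whatever translation model or API you wire in, and the document shape is illustrative:

```python
from typing import Callable

def prepare_context(
    docs: list[dict],
    target_lang: str,
    translate: Callable[[str, str], str],  # (text, target_lang) -> text
) -> str:
    """Translate retrieved snippets into the conversation language
    before they enter the prompt, so raw mixed-language text never
    reaches the response generator."""
    parts = []
    for doc in docs:
        text = doc["text"]
        if doc.get("lang") != target_lang:
            text = translate(text, target_lang)
        parts.append(text)
    return "\n\n".join(parts)

# Demo with a stand-in translator:
docs = [
    {"text": "চামড়ার জুতা, দাম ৩২০০ টাকা", "lang": "bn"},  # "Leather shoes, 3200 taka"
    {"text": "Return policy: 7 days with receipt", "lang": "en"},
]
print(prepare_context(docs, "en", lambda text, lang: f"[{lang}] {text}"))
```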
The response is generated strictly in one language. One hundred percent English or one hundred percent Bangla. No exceptions.
Before sending, the response is checked. If any unintended language leakage is detected, it is rewritten.
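The check itself is mostly script-based, which is cheap and works well for this particular language pair. A sketch, with the rewrite step stubbed out; rewrite_in is a placeholder for a strict re-prompt of the model:

```python
import re

BANGLA = re.compile(r"[\u0980-\u09FF]")
LATIN_WORD = re.compile(r"[A-Za-z]{2,}")

# English terms allowed inside a Bangla reply (illustrative subset).
ALLOWED_IN_BANGLA = {"price", "stock", "size", "delivery"}

def has_leakage(response: str, language: str) -> bool:
    """Detect unintended language mixing in a drafted reply."""
    if language == "en":
        return bool(BANGLA.search(response))  # any Bangla script is a leak
    words = {w.lower() for w in LATIN_WORD.findall(response)}
    return bool(words - ALLOWED_IN_BANGLA)  # non-borrowed English is a leak

def rewrite_in(response: str, language: str) -> str:
    """Placeholder: re-prompt the model to rewrite entirely in `language`."""
    raise NotImplementedError

def finalize(response: str, language: str) -> str:
    if has_leakage(response, language):
        response = rewrite_in(response, language)
    return response

print(has_leakage("Your order ships tomorrow.", "en"))  # False
print(has_leakage("আপনার order কাল ship হবে।", "en"))    # True
```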
What Changed
English conversations stayed English, even after twenty messages.
Bangla conversations stayed Bangla, with natural borrowed English terms where appropriate.
If a user switched languages mid-conversation, the bot detected it and switched cleanly on the next reply.
The bot stopped drifting.
Cleaning the Knowledge Base
Fixing the prompt wasn’t enough.
I restructured the knowledge base so every field had language-specific values. Product names, policies, and descriptions were stored separately for English and Bangla.
Now the bot could retrieve clean data without polluting its context.
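The restructured records look roughly like this (field names are illustrative); retrieval selects exactly one side based on the language state:

```python
# One record, with language-specific values stored side by side.
product = {
    "id": "SKU-1042",
    "name": {"en": "Leather loafers", "bn": "চামড়ার লোফার"},
    "description": {
        "en": "Handmade leather loafers, brown.",
        "bn": "হাতে তৈরি চামড়ার লোফার, বাদামি।",
    },
    "price_bdt": 3200,
}

def snippet(record: dict, lang: str) -> str:
    """Build a retrieval snippet in exactly one language."""
    return f'{record["name"][lang]}: {record["description"][lang]}'

print(snippet(product, "en"))  # Leather loafers: Handmade leather loafers, brown.
```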
The Results
Before the fix, language-mixing complaints were constant. Users abandoned conversations because they found the bot confusing. Satisfaction was low.
After the fix, complaints dropped almost to zero. Conversations became smoother. Users praised the bilingual experience instead of criticizing it.
The bot finally sounded confident instead of confused.
What I Learned
Context pollution is real. Every word you inject affects future responses.
Multilingual capability doesn’t mean language control. You must manage it explicitly.
Knowledge base structure matters as much as prompt wording.
Code-switching needs rules. Some mixing is natural. Random mixing is not.
State tracking beats vague instructions every time.
The Principle
Multilingual AI needs explicit language management, not implicit hope.
Don’t assume the model will figure it out. Track the language. Enforce it. Verify it.
Your Turn
Have you seen multilingual bots start mixing languages unexpectedly? How do you manage language state in bilingual systems? What rules do you use for code-switching?
Written by FARHAN HABIB FARAZ
Senior Prompt Engineer and Team Lead at PowerInAI
Building bilingual AI that remembers which language it’s speaking
Tags: multilingual, codeswitching, bangla, bilingualai, languageprocessing, promptengineering