Saquib Shahid
I Tested Gemma 4 and GPT-4o-mini on Indian Language Tasks — The Results Surprised Me


This is a submission for the Gemma 4 Challenge: Write About Gemma 4


I asked GPT-4o-mini to explain artificial intelligence in Bengali, as if teaching a 15-year-old student.

It sent me back the original Bengali prompt. Word for word.

That single moment — a major AI model completely failing a language spoken by 230 million people — is what this article is about.

India has 1.4 billion people, 22 officially recognised languages, and hundreds of dialects. Yet nearly every AI benchmark is English-first. When developers talk about model quality, they are almost always talking about performance on Western language tasks. So I ran a direct, practical comparison: Gemma 4 (31B via OpenRouter) vs GPT-4o-mini (via ChatGPT) — five prompts, five real Indian-language scenarios, zero cherry-picking.

Here is what actually happened.


The Setup

I used Gemma 4 31B through OpenRouter's free tier and GPT-4o-mini through the standard ChatGPT free account. Same prompts, same conditions, both tested within the same hour to avoid any model update differences. I evaluated each response on three things: script correctness (did it use the right writing system?), linguistic naturalness (does it sound like how people actually speak?), and cultural fit (does it understand Indian context, or does it just translate words?).
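For anyone who wants to reproduce the Gemma side of this setup, here is a minimal sketch of calling a model through OpenRouter's OpenAI-compatible chat endpoint. The model slug below is a placeholder, not the real identifier (check OpenRouter's model catalogue for the exact slug), and you need your own API key.

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat payload (OpenRouter accepts the same shape)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(api_key: str, model: str, prompt: str) -> str:
    """Send one prompt to OpenRouter and return the model's reply text."""
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(build_request(model, prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Running the same prompt text through this and through the ChatGPT web UI within the same hour keeps the comparison as close to apples-to-apples as a free-tier test allows.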

I also want to be upfront: I am a native Hindi speaker, and I used my own judgment for Hindi/Hinglish responses. For Bengali, I evaluated based on script correctness and structural coherence — and that Bengali failure in Prompt 5 is visible to anyone, native speaker or not.


Prompt 1: Hinglish — the real test of Indian language understanding

The prompt:

Mujhe ek chhoti si poem likhni hai apne best friend ke liye jo kal foreign ja raha hai. Poem mein thoda emotion ho, thoda humor bhi. Help karo yaar, Hindi-English mix mein theek hai.

This was my first real test. Hinglish — the casual Hindi-English code-switching that dominates Indian WhatsApp conversations, college campuses, and everyday speech — is arguably the most important register for any model serving Indian users. It is not "broken Hindi." It is a distinct, expressive communication style with its own rhythm.

GPT-4o-mini's response:

Tu ja raha hai door, par yaari wahi rahegi,
Time zone alag hoga, par bakchodi daily chalegi. 😄
Kal tak jo chai pe duniya solve karte the,
Ab tu foreign jaake "bro accent" mein bolega shayad re.
...
Distance sirf maps pe hota hai mere bhai,
Best friend toh wahi rehta hai… chahe duniya ke kisi bhi side chala jaye. ❤️

Gemma 4's response:

Kal ek naya chapter shuru hoga teri life ka,
Tu ja raha hai door, apne sapno ki raah par.
Proud hoon main tera, ki tune yeh mukaam paaya,
Par dil thoda udaas hai, kyunki tu door chala gaya.
...
Fly high, stay humble, and shine like a star,
Hamesha yaad rakhna, I'm always here, no matter how far.

Verdict: Both models handled Hinglish naturally and confidently — neither switched to formal Hindi or Devanagari, and neither produced stilted translated-sounding text. The difference is tone. GPT-4o-mini leaned into banter and humour ("bakchodi daily chalegi", "bro accent" mein bolega). Gemma 4 went emotional and poetic. Both are genuinely good Hinglish.

This is a tie, but it reveals something important: both models have absorbed enough real Indian internet text to produce natural code-switching. That was not guaranteed.


Prompt 2: Formal Hindi — where the gap becomes obvious

The prompt:

Please write a professional email in Hindi to a senior colleague (addressing them as "Aap") requesting a 30-minute meeting to discuss a project update. The tone should be respectful but not overly formal — natural office Hindi.

This is a practical task that millions of Indian office workers face daily. The key test: does the model understand that "formal Hindi" means Devanagari script, proper honorifics, and a structured format — not a transliteration?

GPT-4o-mini's response:

Namaste Sir/Ma'am,
Aasha karta hoon aap theek honge.
Main project ke latest updates aur kuch important points discuss karna chahta tha. Agar aapke schedule mein possible ho, toh kya hum is week 30-minute ki ek meeting schedule kar sakte hain?
Dhanyavaad.

It wrote the entire email in Roman script. This is a significant failure. If you send a "Hindi email" written in Roman transliteration to a senior colleague in an Indian corporate environment, it does not read as professional — it reads as an informal WhatsApp message. The structure is fine. The language is fine. But the script choice undermines the entire purpose.
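This particular failure mode is easy to catch automatically. Here is a small helper of my own (not part of either model's tooling) that measures what fraction of a response's letters fall inside a given Unicode block, which is enough to distinguish Devanagari output from Roman transliteration:

```python
def script_ratio(text: str, lo: int, hi: int) -> float:
    """Fraction of letter characters in text whose codepoint is in [lo, hi]."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(lo <= ord(c) <= hi for c in letters) / len(letters)

# Unicode block ranges for the two scripts tested in this article
DEVANAGARI = (0x0900, 0x097F)
BENGALI = (0x0980, 0x09FF)
```

A "formal Hindi" response with a near-zero Devanagari ratio, like GPT-4o-mini's email above, fails the script-correctness check before a human even reads it.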

Gemma 4's response:

Subject: Project [Project Name] के updates के संबंध में मीटिंग के लिए अनुरोध

Dear [Colleague's Name] जी,

आशा है कि आप ठीक होंगे।

मैं आपसे Project [Project Name] की current progress के बारे में चर्चा करना चाहता/चाहती हूँ। मुझे कुछ updates साझा करने हैं और आगे के steps के लिए आपका feedback और मार्गदर्शन चाहिए।

क्या आपके पास इस हफ्ते 30 मिनट का समय होगा? आप अपनी सुविधा के अनुसार कोई भी दिन और समय बता दें, मैं उस हिसाब से calendar invite भेज दूँगा/दूँगी।

धन्यवाद

Proper Devanagari script. A subject line. The natural Hindi-English hybrid that Indian offices actually use ("current progress", "feedback" embedded in Hindi sentences). The gender-aware "दूँगा/दूँगी" without being prompted. This is what a real professional Hindi email looks like.

Verdict: Gemma 4 wins clearly. It understood that "formal Hindi" is a script and register choice, not just a tone choice.


Prompt 3: News Translation — and an unexpected insight

The prompt:

Translate the following paragraph into Hindi. The translation should sound natural to a native Hindi speaker, not like a word-for-word translation: "The Supreme Court today dismissed the petition challenging the new data protection law, stating that the government had followed due process. The ruling is expected to impact millions of internet users across the country."

Both models produced accurate, readable translations. But Gemma 4 did something GPT-4o-mini did not: it explained its linguistic choices.

GPT-4o-mini:

आज सुप्रीम कोर्ट ने नए डेटा प्रोटेक्शन कानून को चुनौती देने वाली याचिका खारिज कर दी। कोर्ट ने कहा कि सरकार ने इस कानून को लागू करने में सभी जरूरी प्रक्रियाओं का पालन किया है। माना जा रहा है कि इस फैसले का असर देशभर के करोड़ों इंटरनेट उपयोगकर्ताओं पर पड़ेगा।

Clean. Accurate. Natural.

Gemma 4:

सुप्रीम कोर्ट ने आज नए डेटा प्रोटेक्शन कानून को चुनौती देने वाली याचिका को यह कहते हुए खारिज कर दिया कि सरकार ने निर्धारित कानूनी प्रक्रिया का पालन किया है। इस फैसले का असर देश भर के करोड़ों इंटरनेट यूज़र्स पर पड़ने की संभावना है।

Then it added something remarkable — a breakdown of why specific word choices were made:

  • "खारिज कर दिया" — standard legal term for "dismissed" in Hindi journalism
  • "निर्धारित कानूनी प्रक्रिया" — captures "due process" professionally rather than literally
  • "करोड़ों" — noting that "millions" in English maps to "crores" in Indian cultural reference, not "लाख"

That last observation is genuinely sharp. A direct translation of "millions" to "दस लाख" would be technically correct but would feel completely wrong to a native reader, who counts in crores. Gemma 4 caught that without being asked.

Verdict: Gemma 4 wins — same translation quality, but with deeper linguistic self-awareness.


Prompt 4: Cultural context — the WhatsApp wedding message

The prompt:

Write a heartfelt wedding congratulations message in Hindi for a close friend. It should feel warm and genuine — the kind of message an Indian person would actually send on WhatsApp, not a translated greeting card.

GPT-4o-mini gave one warm, natural Hinglish message. Good tone, genuine feel, appropriate for close friends.

Gemma 4 gave three options — labelled "Deeply Emotional", "Sweet and Simple", and "Playful/Desi" — with a tip section that included culturally specific advice: use "Mere bhai" for a guy friend, "Meri jaan" for a girl friend, and add a personal memory line like recalling who said they'd get married first.

The "Desi" option included the phrase "finally tu bhi 'phass' gaya" — which is authentic Indian friend-group humour that no greeting card would ever produce. That level of cultural specificity is hard to fake.

Verdict: Gemma 4 wins — not just for giving more options, but for demonstrating that it understands how Indian friendships actually communicate.


Prompt 5: Bengali — and the failure that says everything

The prompt was written in Bengali script, asking both models to explain AI to a 15-year-old student in simple Bengali.

GPT-4o-mini's response: It returned the original Bengali prompt, with the English translation below it. No explanation. No attempt. Just the input, echoed back.

This is not a minor miss. Bengali is the seventh most spoken language in the world. It is the official language of West Bengal and Bangladesh. Asking an AI model to respond in Bengali is not an exotic edge case — it is a basic requirement for serving a massive, real user population.

Gemma 4's response was a detailed, properly structured Bengali explanation covering what AI is, how machine learning works (using the analogy of teaching a child to recognise cats), and real-world examples including YouTube recommendations and ChatGPT. Written entirely in proper Bengali script. Age-appropriate. Clear.

Here is a direct excerpt from Gemma 4's Bengali response:

এটি কীভাবে কাজ করে? চলো একটা উদাহরণের মাধ্যমে বুঝি: মনে করো, তুমি একটা ছোট বাচ্চাকে শেখাচ্ছ যে 'বিড়াল' দেখতে কেমন। তুমি তাকে অনেকগুলো বিড়ালের ছবি দেখালে এবং বললে, "দেখো, এটা বিড়াল।" AI-এর কাজ করার পদ্ধতিও ঠিক তেমন।

A cat-recognition analogy to explain machine learning, in Bengali, for a teenager. That is genuinely good pedagogy.

Verdict: Gemma 4 wins by default — GPT-4o-mini did not attempt the task.


The Overall Scorecard

| Task | Gemma 4 | GPT-4o-mini |
| --- | --- | --- |
| Hinglish poem | Natural, emotional | Natural, humorous |
| Formal Hindi email | Devanagari script, structured ✅ | Roman script only ❌ |
| News translation | Accurate + linguistic insight ✅ | Accurate only |
| WhatsApp wedding message | 3 culturally aware options ✅ | 1 good option |
| Bengali explanation | Excellent, full response ✅ | Failed — echoed prompt ❌ |

Gemma 4: 4 wins, 1 tie. GPT-4o-mini: 0 wins, 1 tie, 2 clear failures.


Why does Gemma 4 perform better here?

This is informed speculation, not a technical deep-dive.

Google has been building Indian language products for years — Google Translate, Google Search in Indic scripts, Google Assistant in Hindi. That corpus and institutional knowledge almost certainly shaped Gemma's training data in ways a model primarily optimised for English-language internet text would not replicate.

Gemma 4's choice to use Devanagari without being asked, its "करोड़ों vs लाख" observation, and its ability to produce three contextually distinct registers in a single response all point to a model that has deeply absorbed how Indian languages actually function — not just what Indian language text looks like.


The practical verdict

Use Gemma 4 when:

  • Your users communicate in Hindi, Bengali, or other Indic languages
  • You need correct script output, not just Roman transliteration
  • Cultural nuance matters — greetings, office communication, social messages
  • You are building any product for Indian users

GPT-4o-mini still holds its own for:

  • Casual Hinglish content — it is genuinely good here
  • English-first tasks with occasional Hindi
  • Contexts where Roman script is acceptable

The most important finding is not that one model "won." It is that the gap is large enough to matter for real product decisions. If you are building a Hindi chatbot, a regional language assistant, or anything targeting Indian users, defaulting to GPT-4o-mini because it is familiar could mean your users get their own prompts echoed back at them.

That is not a UX problem. That is a trust problem.
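If you do ship a multilingual product on any model, the prompt-echo failure from Prompt 5 is worth guarding against at runtime. A minimal sketch of such a guard (my own heuristic, not a standard library or API):

```python
import re

def _norm(s: str) -> str:
    """Collapse whitespace so formatting differences don't hide an echo."""
    return re.sub(r"\s+", " ", s).strip()

def is_echo(prompt: str, response: str) -> bool:
    """Flag a response that is mostly just the prompt repeated back:
    the normalised prompt appears verbatim and makes up at least half
    of the normalised response."""
    p, r = _norm(prompt), _norm(response)
    return bool(p) and p in r and len(p) >= 0.5 * len(r)
```

When `is_echo` fires, a product can retry, fall back to another model, or at least avoid showing the user their own words as an "answer".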


Getting started with Gemma 4

All of this testing was done at zero cost: Gemma 4 31B through OpenRouter's free tier, and GPT-4o-mini through a free ChatGPT account.

The Gemma 4 Challenge runs through May 24 — if this sparked an idea for something you want to build for Indian language users, now is the time.


All responses in this article were captured from actual model runs. Nothing was edited or paraphrased — you are reading what the models actually produced.
