
Avinash Seethalam


Maatru: An agentic Telugu literacy app for kids, built with Gemma 4


This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

I'm from Hyderabad, India. My wife and I both grew up speaking Telugu — it's our mother tongue. We have two daughters, three and one and a half. They're at a daycare where the teachers cover English letters and rhymes, sometimes Hindi, but not Telugu. And here's the part that's harder to admit: my wife and I haven't read or written Telugu in any real way for years. We can speak it fluently, but we lost the script somewhere along the way. When I mentioned this to colleagues at work, several of them said the same thing about their own kids. This is what Maatru is for.

Maatru is a small app for parents like me who can't comfortably teach the script themselves. A kid taps Start and hears a Telugu letter spoken aloud. Four letter buttons appear on screen and they tap the one that matches the sound. After five letters the session ends and they see a "Great job" card. Later, a parent can open a separate dashboard — gated by a PIN — and read a short English paragraph about what their kid practiced that day, which letters they got right, and which ones they're still working on.

My first design had the kid write the letter on paper and photograph it; Gemma 4 would compare it to the target and give feedback. I tested that capability on Day 1 before building anything around it. The moment I knew it wouldn't work was when Gemma 4 confidently misread అ, a clean, typed vowel on a white background, as completely different characters: ౦ on the cloud model and ని locally. Gemma 4 E4B running locally got 1 of 20 right, and the 31B variant via OpenRouter got 4 of 20. Both models failed even on typed reference images, which should have been the easy case. The vision capability wasn't there yet for Indic scripts, at least not reliably enough to be the foundation of a literacy tool.

What I ended up with uses Gemma 4 for the two things it does reliably for Telugu — generating curriculum content in plain text, and making pedagogical decisions about what a specific kid should practice next. The kid loop itself doesn't call the model. The parent dashboard does, once per visit. I'll walk through both in the 'How I Used Gemma 4' section below.

Demo

Code

ai-with-avinash / maatru

Telugu literacy app for kids. Built with an agentic Gemma 4 planner that adapts to each child's progress.

Maatru

A Telugu literacy companion for kids whose parents can speak the language but lost the script.

What this is

Maatru is a small web app for parents like me — Telugu speakers raising kids in English-medium schools, who can't comfortably teach the Telugu writing system because we've lost it ourselves. A kid taps Start, hears a Telugu letter spoken aloud, and taps the matching letter from four options. After five letters they see a "Great job" card. A separate parent dashboard (PIN-gated) shows what the kid practiced — in English — alongside the AI's pedagogical reasoning for why those letters were chosen.

Built for the dev.to Gemma 4 Challenge: Build with Gemma 4.

Architecture at a glance

Two layers, deliberately separated:

  • Layer 1 (kid loop): Deterministic, zero Gemma 4 calls during a session. Renders from a cached SessionPlan. Sub-second response on every tap.
  • Layer 2 (planner): One agentic…

How I Used Gemma 4

I shipped on Gemma 4 31B Dense, accessed via OpenRouter's free tier (model string: google/gemma-4-31b-it:free). I'd tested the E4B variant locally first, running through Ollama on an M4 MacBook Air with 16GB of RAM, and the latency wasn't workable for a kid-facing flow: E4B took 8 to 46 seconds on generation prompts, with two requests hitting the hard 60s timeout. The 31B Dense via OpenRouter wasn't fast either (median 5.3s, p95 15.4s in a 50-request stress test), but its function-calling reliability is what made the rest of the architecture buildable: in a smoke test, it produced valid tool calls on 8 out of 8 invocations across two runs. The free tier comes with constraints, namely a hard cap of 20 requests per minute and upstream 502s from Google AI Studio on roughly 36% of requests, and the architecture was designed around them from day one.

Gemma 4 runs in exactly two places. When a parent opens the dashboard, one call generates an English-language summary of what their kid did that day — practiced letters, where they were confident, where they struggled, what to expect next. When a kid taps Start, one agentic call decides what the session will contain: which letters to target, what distractors to pair them with, what feedback strings to show on correct and wrong taps, and a short paragraph explaining the pedagogical reasoning. The kid loop itself, between session start and end-card, makes zero calls to Gemma 4.

The architectural decision that shapes everything else is bundling. The naive design would have called Gemma 4 once per kid interaction: one call to generate three distractors when a letter appears, another to produce feedback when the kid taps. For a 5-letter session that's roughly fourteen Gemma 4 calls, each taking 5 to 15 seconds on the free tier, and any of them could fail with a 502. Instead, the planner makes one agentic call at session start and returns a SessionPlan that bundles everything the kid loop needs for the whole session: per-step distractors, per-step feedback variants, and the session-level reasoning. The kid loop then runs deterministically from the cached plan, and Gemma 4 stays out of the critical path entirely. Roughly fourteen calls become one.
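To make the bundling idea concrete, here is a minimal Python sketch of what a cached plan and a model-free tap handler could look like. The field names (`target`, `distractors`, `feedback_correct`, `feedback_wrong`, `reasoning`), the `handle_tap` helper, and the sample feedback strings are assumptions for illustration; the actual SessionPlan in the Maatru repo may be shaped differently and may not be written in Python.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One letter in the session (hypothetical field names)."""
    target: str                   # the letter the kid hears
    distractors: tuple[str, ...]  # wrong options shown next to the target
    feedback_correct: str         # shown after a right tap
    feedback_wrong: str           # shown after a wrong tap


@dataclass(frozen=True)
class SessionPlan:
    """Everything the kid loop needs, produced once at session start."""
    steps: tuple[Step, ...]
    reasoning: str                # shown to the parent on the dashboard


def handle_tap(plan: SessionPlan, step_index: int, tapped: str) -> tuple[bool, str]:
    """Resolve one tap from the cached plan alone; no model call happens here."""
    step = plan.steps[step_index]
    correct = tapped == step.target
    return correct, step.feedback_correct if correct else step.feedback_wrong


# Illustrative plan with two steps; a real session would have five.
plan = SessionPlan(
    steps=(
        Step("అ", ("ఆ", "ఇ", "ఈ"), "Well done!", "Listen once more."),
        Step("ఇ", ("అ", "ఈ", "ఉ"), "Great!", "Almost. Try again."),
    ),
    reasoning="Starting with the first foundational vowels.",
)

print(handle_tap(plan, 0, "అ"))
print(handle_tap(plan, 1, "అ"))
```

Because every lookup is an in-memory read of the plan, tap handling stays fast regardless of how slow or flaky the upstream model is.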

The planner is given three read-only SQLite tools — get_recent_sessions, get_letter_accuracy, get_curriculum — and Gemma 4 calls them via function calling before proposing a plan. The output is a SessionPlan that bundles the kid loop's content and a short paragraph of reasoning that explains the pedagogical choice. That reasoning is visible to the parent on the dashboard. After session 1 (a cold start: "starting with the first five foundational vowels"), session 2 read the history and adapted on its own: "The child did well in the first session but struggled slightly with ఇ. We are reinforcing the first five vowels with medium difficulty and introducing ఊ to expand their knowledge." That second paragraph wasn't templated — it was Gemma 4 reading two sessions of attempts and making a teaching call.
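As a rough illustration of what one of these read-only tools could look like, the Python sketch below implements a `get_letter_accuracy` function over an in-memory SQLite database and a small dispatcher that turns a tool name requested by the model into a JSON result. The `attempts` table, its columns, the return shape, and the `run_tool` helper are all assumptions; the post names the three tools but does not show their schema or signatures.

```python
from __future__ import annotations

import json
import sqlite3

# Hypothetical schema: one row per tap. The real Maatru tables may differ.
SCHEMA = """
CREATE TABLE attempts (
    session_id INTEGER NOT NULL,
    letter     TEXT    NOT NULL,
    correct    INTEGER NOT NULL  -- 1 for a right tap, 0 for a wrong one
);
"""


def get_letter_accuracy(conn: sqlite3.Connection) -> list[dict]:
    """Read-only tool: per-letter attempt count and accuracy."""
    rows = conn.execute(
        "SELECT letter, COUNT(*), AVG(correct) FROM attempts "
        "GROUP BY letter ORDER BY letter"
    ).fetchall()
    return [
        {"letter": letter, "attempts": n, "accuracy": round(acc, 2)}
        for letter, n, acc in rows
    ]


# Only one of the three tools is sketched; the others would register the same way.
TOOLS = {"get_letter_accuracy": get_letter_accuracy}


def run_tool(conn: sqlite3.Connection, name: str) -> str:
    """Run a tool the model asked for and return JSON text to send back to it."""
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    return json.dumps(TOOLS[name](conn), ensure_ascii=False)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO attempts VALUES (?, ?, ?)",
    [(1, "అ", 1), (1, "ఇ", 0), (1, "ఇ", 1), (1, "అ", 1)],
)
print(run_tool(conn, "get_letter_accuracy"))
```

Returning an error object for unknown tool names, rather than raising, gives the caller a chance to treat a malformed tool call as a retryable failure.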

The planner call is wrapped in a retry with backoff (1s, 3s, 9s; three attempts) that handles 429s, 502s, timeouts, and malformed tool calls. If the retries are exhausted or the model returns something unparseable, the planner falls back to a deterministic curriculum heuristic: same SessionPlan shape, hand-written letter selection logic, no AI involved. The dashboard knows the difference: planner-driven sessions show their reasoning paragraph, and fallback sessions hide it. A 45-second server-side timeout caps the whole thing so the kid never waits more than ~46 seconds after tapping Start. During verification, the fallback path fired often enough (sometimes 25-40% of the time under heavy upstream instability) that I treat it as a normal operating mode, not an exception.
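The retry-then-fallback pattern can be sketched in a few lines of Python. This is one possible reading, not the project's code: `RetryableError`, `plan_with_fallback`, and the injectable `sleep` parameter are hypothetical names. The sketch sleeps only between attempts, so with three attempts it uses the 1s and 3s delays; how the 9s delay fits into the real schedule is not stated in the post. The overall 45-second server-side timeout is not modeled here.

```python
from __future__ import annotations

import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Stand-in for a 429, a 502, a timeout, or a malformed tool call."""


def plan_with_fallback(
    call_planner: Callable[[], T],
    fallback: Callable[[], T],
    delays: Sequence[float] = (1, 3, 9),
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[T, bool]:
    """Return (plan, used_fallback). Sleeps only between attempts."""
    for attempt in range(max_attempts):
        try:
            return call_planner(), False
        except RetryableError:
            if attempt < max_attempts - 1:
                sleep(delays[attempt])
    # All attempts failed: use the deterministic heuristic instead.
    return fallback(), True


# Usage with a fake planner that fails twice, then succeeds.
outcomes = iter([RetryableError("502"), RetryableError("429"), "model plan"])
waits: list[float] = []  # records requested delays instead of really sleeping


def flaky_planner() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result = plan_with_fallback(flaky_planner, lambda: "heuristic plan", sleep=waits.append)
print(result, waits)
```

The `used_fallback` flag in the return value is what would let a dashboard decide whether to show or hide the reasoning paragraph for a given session.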
