This is a submission for the Gemma 4 Challenge: Build with Gemma 4
My 6-year-old asks me four hundred questions a day — about clouds, his shadow, whether ants have birthdays. I love it, but I can't always stop what I'm doing, and the usual fallbacks (Google, YouTube, a generic chatbot) are either too dense, too distracting, or too unsafe to hand a small child. Curio Kid is the app I built so my son can keep asking — and actually get warm, kid-friendly answers — without me worrying about what he sees next.
What I Built
Curio Kid is a kid-safe Android app where a child asks anything — by typing, snapping a photo, attaching an image, or just talking — and gets a warm, age-appropriate answer from Luna, an AI tutor powered by Gemma 4. Answers are short on purpose: 2–5 sentences, an everyday analogy (Lego, swings, fruit), and a follow-up question to keep the curiosity loop running.
Designing it for my own kid forced some opinionated choices:
- He can't reliably read or type yet, but he can talk and point a camera. Voice and camera are first-class inputs, not afterthoughts.
- He will absolutely test the safety rails. Kids ask wild things ("what happens if I drink poison?", "why do people fight in wars?") — Luna has to handle them gracefully every single time.
- I want to know what he's curious about, not spy on him. Hence the Curiosity Digest — a daily themed summary, not a chat log.
What makes it more than "yet another chatbot wrapper":
- Multimodal input — text, gallery image, live camera, on-device speech-to-text.
- Safety as a hard requirement — locked-down system prompt + Gemini safety thresholds pinned to LOW_AND_ABOVE across harassment, hate, sexually explicit, and dangerous content; unsafe topics get a fixed redirect to "a trusted adult."
- Parent Dashboard — PIN-gated, with a one-tap Curiosity Digest: themes, highlights with quotes, dinner-table conversation starters, and an "anything to flag?" section.
- Privacy-first — API key + PIN in EncryptedSharedPreferences (AES-256); question history in a local Room DB, excluded from cloud backup; the only network call is to the model endpoint with the user's own key.
- Three interchangeable Gemma 4 back-ends — not every family phone can host a multi-gigabyte model on-device, so Google AI Studio (default, free tier, multimodal), OpenRouter, and a scaffolded on-device path are all swappable from Settings.
- Output cleaning — Gemma 4 sometimes thinks out loud ("Final Polish:", "Let me revise…"); a post-processor strips those leaks so the child only sees the final answer.
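To make the output-cleaning idea concrete, here is a minimal sketch in plain Kotlin. It is not the actual cleanLunaReply from the repo: the function name stripPlanning, the paragraph-based heuristic, and the emphasis regex are all assumptions for illustration. Only the trigger phrases ("Final Polish:", "Self-Correction:", "Let me revise", "Let me rewrite") come from this post.

```kotlin
// Label that, in this sketch, marks where the final answer starts.
val finalLabel = "Final Polish:"

// Paragraph openers treated as planning chatter (phrases quoted in the post).
val planningPrefixes = listOf("Self-Correction:", "Let me revise", "Let me rewrite")

// Matches *text* and **text**, capturing the inner text.
val emphasis = Regex("""\*{1,2}([^*\n]+)\*{1,2}""")

fun stripPlanning(raw: String): String {
    // Paragraphs are separated by one or more blank lines.
    val paragraphs = raw.trim().split(Regex("""\n\s*\n""")).map { it.trim() }

    // If a "Final Polish:" paragraph exists, everything before it is planning.
    val finalIdx = paragraphs.indexOfLast { it.startsWith(finalLabel, ignoreCase = true) }
    val candidates = if (finalIdx >= 0) {
        listOf(paragraphs[finalIdx].substring(finalLabel.length).trim()) +
            paragraphs.drop(finalIdx + 1)
    } else {
        paragraphs
    }

    return candidates
        // Drop leftover planning paragraphs; "Let me think of..." does not match, so it survives.
        .filter { p -> p.isNotEmpty() && planningPrefixes.none { p.startsWith(it, ignoreCase = true) } }
        // Strip markdown emphasis markers but keep the emphasised words.
        .joinToString("\n\n") { emphasis.replace(it, "\$1") }
}
```

Matching on whole-paragraph prefixes rather than on any occurrence of "Let me" is what keeps a friendly opener like "Let me think of a fun example!" intact in this sketch.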
Demo
https://raw.githubusercontent.com/sann3/curio-kid/main/demo/Home.png
https://raw.githubusercontent.com/sann3/curio-kid/main/demo/1i.png
https://raw.githubusercontent.com/sann3/curio-kid/main/demo/2i.png
https://raw.githubusercontent.com/sann3/curio-kid/main/demo/3i.png
https://raw.githubusercontent.com/sann3/curio-kid/main/demo/4i.png
https://raw.githubusercontent.com/sann3/curio-kid/main/demo/5i.png
https://raw.githubusercontent.com/sann3/curio-kid/main/demo/6i.png
https://raw.githubusercontent.com/sann3/curio-kid/main/demo/final.mp4
Code
GitHub: github.com/sann3/curio-kid.
How I Used Gemma 4
Curio Kid exposes two Gemma 4 variants in the model picker, and the choice is intentional.
gemma-4-26b-a4b-it — 26B Mixture-of-Experts (default)
The daily driver. A kid-facing chat app needs three things at once: multimodal input, fast first-token latency, and enough depth to teach. MoE hits all three: only a slice of the experts fires per token, so latency feels ~4B-class while depth stays 26B-class. In practice:
- A child holding up a beetle to the camera gets an answer in a couple of seconds, not ten.
- Streaming starts almost instantly, so chat bubbles fill in live (and incidentally dodge the Gemini SDK's hard-coded 80s socket timeout — Curio Kid uses generateContentStream for exactly this reason).
- The 256K context window means the whole day's history fits into a single Curiosity Digest call — no RAG, no summarisation tricks.
- Same model handles "Why is the sky blue?" and a photo of a moth.
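The streaming behaviour described in the list above can be sketched without the SDK. This is an illustration under stated assumptions: a plain Kotlin Sequence stands in for the stream that generateContentStream returns, StoppedEarly is a hypothetical stand-in for the SDK's stop exception, and collectReply and StreamResult are made-up names.

```kotlin
// Hypothetical stand-in for an early stop signalled by the SDK (e.g. a MAX_TOKENS stop).
class StoppedEarly(message: String) : Exception(message)

// Result of draining a stream: the text that arrived and whether it ended cleanly.
data class StreamResult(val text: String, val complete: Boolean)

// Collects chunks as they arrive, calling onChunk so the UI can fill the bubble live.
// If the stream stops early, the text received so far is returned instead of lost.
fun collectReply(chunks: Sequence<String>, onChunk: (String) -> Unit = {}): StreamResult {
    val soFar = StringBuilder()
    return try {
        for (chunk in chunks) {
            soFar.append(chunk)
            onChunk(soFar.toString())
        }
        StreamResult(soFar.toString(), complete = true)
    } catch (e: StoppedEarly) {
        StreamResult(soFar.toString(), complete = false)
    }
}
```

Because each chunk is handed to onChunk as soon as it arrives, the child sees text long before the full answer exists, and an early stop still leaves a readable partial reply.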
Dense is overkill for "explain photosynthesis in three sentences"; E2B/E4B don't yet match 31B-class reasoning on the harder "why" questions kids love. MoE is the right middle.
gemma-4-31b-it — 31B Dense (optional "thinker" mode)
For genuinely hard questions ("Why do mirrors flip left-and-right but not up-and-down?"). Slower and pricier per call, but noticeably better on multi-step or counterintuitive reasoning. Same persona, same safety, same UI — just a heavier brain when the curiosity warrants it.
Why not E2B / E4B by default?
On-device is fully wired up via MediaPipe LLM Inference — Settings → On-device downloads a vision-capable Gemma 4 .task (resumable, sha256-checked, metered-network aware) and runs it through a process-wide LlmInference singleton with addImage for the camera path. But cloud stays the default because:
- Not every phone can run Gemma 4 locally. Multi-GB models need RAM and storage the hand-me-down tablet a kid actually uses doesn't have. Gating first launch behind "Pixel 8 Pro + 1.6 GB cellular download" defeats the point.
- Quality > offline for a six-year-old. Being told "the moon is made of cheese" by an under-cooked tiny model is worse than waiting two seconds over Wi-Fi.
So Google AI Studio is the zero-friction default, OpenRouter is the alt-cloud, and on-device is one Settings tap away for capable phones — same LlmBackend interface, same prompts, same cleaner.
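As a rough picture of how three back-ends can sit behind one interface, here is a sketch. The post names LlmBackend but does not show its members, so the answer signature, the BackendChoice enum, and the FakeBackend stubs below are assumptions, with no network or model calls involved.

```kotlin
// Hypothetical shape of the shared interface; the real members are not shown in the post.
interface LlmBackend {
    val label: String
    fun answer(systemPrompt: String, question: String): String
}

// Stub back-end standing in for the real cloud and on-device implementations.
class FakeBackend(override val label: String) : LlmBackend {
    override fun answer(systemPrompt: String, question: String): String =
        "[$label] answer to: $question"
}

// The three options a parent could pick in Settings.
enum class BackendChoice { AI_STUDIO, OPEN_ROUTER, ON_DEVICE }

// Settings picks the back-end; callers only ever see LlmBackend,
// so prompts and post-processing stay identical across all three.
fun backendFor(choice: BackendChoice): LlmBackend = when (choice) {
    BackendChoice.AI_STUDIO -> FakeBackend("Google AI Studio")
    BackendChoice.OPEN_ROUTER -> FakeBackend("OpenRouter")
    BackendChoice.ON_DEVICE -> FakeBackend("On-device")
}
```

The design point is that the chat screen depends only on the interface, so switching tiers is a Settings change rather than a code path change.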
Where Gemma 4 actually does the work
- The chat. Multimodal (image + history + question) → kid-friendly paragraph. The system prompt is strict (2–5 sentences, analogies, ≤2 emojis, one follow-up, no markdown) and Gemma 4 follows it remarkably well.
- Safety reasoning. Instead of a blocklist, Luna reasons about whether a topic is age-appropriate and produces a fixed redirect line — Gemma 4 is instruction-faithful enough to honour a "ONLY reply with this exact sentence" clause while still engaging naturally with the 99% of fine questions.
- The Curiosity Digest. Day's transcript → structured markdown summary (themes / highlights / conversation starters / flags) in one shot — long-context + structured-output, no orchestration framework.
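The one-shot digest call could be assembled roughly like this. The sketch assumes a simple Exchange record and a hand-written instruction line; the real Room entity and the real prompt wording are not shown in this post, and buildDigestPrompt is a hypothetical name. Only the four section themes come from the text above.

```kotlin
// Hypothetical record of one exchange; the app's actual storage schema may differ.
data class Exchange(val question: String, val answer: String)

// Section names follow the post: themes, highlights, conversation starters, flags.
val digestSections = listOf("Themes", "Highlights", "Conversation starters", "Anything to flag?")

// Packs the whole day's transcript into one prompt, relying on the long
// context window instead of chunking or retrieval.
fun buildDigestPrompt(day: List<Exchange>): String = buildString {
    appendLine("Summarise today's questions for a parent as markdown with these sections:")
    digestSections.forEach { appendLine("## $it") }
    appendLine()
    appendLine("Transcript:")
    day.forEachIndexed { i, ex ->
        appendLine("${i + 1}. Child: ${ex.question}")
        appendLine("   Luna: ${ex.answer}")
    }
}.trimEnd()
```

The resulting string would be sent as a single request, which is the "no orchestration framework" point: the structure lives in the prompt, not in a pipeline.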
Bits I had to engineer around Gemma 4's quirks
- Chain-of-thought leakage. Gemma 4 occasionally emits "Final Polish:" / "Self-Correction:" / "Let me rewrite…" before its real answer. cleanLunaReply (LunaAI.kt) detects anchors, drops planning paragraphs, and strips markdown emphasis — without nuking legit phrases like "Let me think of a fun example!".
- MAX_TOKENS stops. The Gemini SDK throws ResponseStoppedException instead of returning partial text; I catch it on both one-shot and streaming paths and surface what already arrived.
- 80s socket timeout. Hard-coded in the Kotlin SDK with no RequestOptions override. Streaming resets the read timer per chunk, so slow first-byte doesn't kill the request.
- Friendly errors. One friendlyError() mapper turns every 4xx/5xx/safety/quota/network failure into one short, kid-readable sentence ("Wow, so many questions today! Let's wait a minute and try again."), while logging the raw exception to a debug ring buffer.
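A sketch of the friendly-error idea, under loud assumptions: the Failure categories, the DebugRing class, the buffer size of 50, and the mapping of quota errors to HTTP 429 are all hypothetical, and the real friendlyError() in the repo inspects actual SDK and network exceptions. Only the "so many questions" sentence is quoted from the post; the other messages are invented placeholders.

```kotlin
// Hypothetical failure categories used only for this illustration.
sealed class Failure {
    data class Http(val status: Int) : Failure()
    object Safety : Failure()
    object Network : Failure()
}

// Bounded in-memory log: keeps only the most recent `capacity` raw entries.
class DebugRing(val capacity: Int) {
    private val entries = ArrayDeque<String>()
    fun add(entry: String) {
        if (entries.size == capacity) entries.removeFirst()
        entries.addLast(entry)
    }
    fun snapshot(): List<String> = entries.toList()
}

val debugLog = DebugRing(capacity = 50)

// Logs the raw failure for debugging, then returns one short kid-readable sentence.
fun friendlyError(failure: Failure): String {
    debugLog.add(failure.toString())
    return when (failure) {
        is Failure.Http ->
            // Assumption: quota exhaustion surfaces as HTTP 429.
            if (failure.status == 429) "Wow, so many questions today! Let's wait a minute and try again."
            else "Oops, Luna got a bit mixed up. Let's try that again!"
        Failure.Safety -> "Let's ask a trusted adult about that one."
        Failure.Network -> "Luna can't reach the internet right now. Let's try again soon."
    }
}
```

Keeping the mapping in one function means the child never sees a stack trace or status code, while the ring buffer still holds the raw details for a parent or developer to inspect.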
Gemma 4 unlocked something I couldn't have shipped a year ago: a multimodal, instruction-faithful, locally-routable model smart enough to teach a six-year-old about black holes, safe enough to hand to that six-year-old, and efficient enough to be the default tier of a free app.
Thanks to the DEV team and Google for the challenge!