This is a submission for the Gemma 4 Challenge: Build with Gemma 4
What I Built
AULA is a complete AI tutoring platform that runs Google's Gemma 4 entirely inside the browser — no server, no account, no internet required after the first 1.5 GB download. It is designed for the 65+ million Latin American students living in areas where reliable internet is the exception, not the norm.
The premise is simple: if Gemma 4 can run on a Raspberry Pi 5, it can run on a teacher's laptop in rural Boyacá, Colombia. With WebGPU and MediaPipe, this is now possible — and AULA is what that looks like as a finished product.
The problem AULA solves
In Latin America, ~40% of students live with unreliable, capped, or non-existent connectivity. ChatGPT, Gemini, Khan Academy's AI tutor — all require a stable connection. The very tools that could close the global education gap are inaccessible exactly where they are needed most.
AULA flips this: the AI runs on the student's device, not on a server thousands of miles away.
What AULA does — offline (100% local, Gemma 4 E2B)
After loading once, these features work with WiFi off, in airplane mode, in a rural school with no signal:
- 🎓 Conversational tutor — chat with Gemma 4 in natural language. Full LaTeX rendering for math and science. ~15 tokens/sec on a mid-range laptop GPU.
- 🧮 Scientific calculator that teaches — visual keypad with trig functions, exponents, roots. Gemma 4 doesn't just solve. It explains the why.
- 🎙️ Voice tutoring (bidirectional) — ask by speaking, listen to the response. Optional hands-free mode chains them together.
- 🦉 Socratic mode — Gemma 4 stops giving answers and only asks guiding questions. Pedagogy-first.
- 🤔 "Explain it simpler" — three escalating reformulation levels on demand.
- 💡 Conceptual error detection — Gemma 4 diagnoses which concept the student misunderstood, not just "wrong, try again".
- 📚 Persistent study sessions in IndexedDB. No cloud sync ever.
- ♿ Accessibility first — high contrast, large text, easy reading mode (for dyslexia), auto-read responses.
- 🌍 Spanish ↔ English — full i18n. System prompts translate, not just the labels.
- 🏆 Local gamification — XP, levels, streak, achievements. All in the browser.
What AULA does — Cloud Boost (optional, Gemma 4 26B-A4B)
For features that require strict structured output (which is beyond what a 2B-parameter model can do reliably), AULA routes through the user's own free Google AI Studio API key:
- ✍️ Handwritten whiteboard — draw equations with finger or mouse, Gemma 4 reads and solves.
- 📷 Photo OCR + reasoning — point camera at a printed exercise, get a step-by-step solution.
- ♾️ Infinite adaptive practice — exercises that never repeat, with difficulty calibrated dynamically.
- 🎯 Interactive student quiz — self-assessment with scoring and per-error conceptual review.
- 👩🏫 Teacher mode with PDF export — generate quizzes, export student/teacher PDFs ready to print.
- 🎨 SVG illustrations — Gemma 4 generates educational diagrams.
- 🗺️ Mermaid mind maps — concept diagrams rendered interactively, downloadable as PNG/SVG.
Critical: Cloud Boost is always opt-in. AULA never sends data without an explicit API key configured by the user. The core educational experience never requires the internet.
Demo
🎥 Watch the 2-minute walkthrough: https://youtu.be/d0jN8Kw_Cz4
🔗 Live demo: https://aula.run (or local: pnpm dev -p 3100 after cloning)
Key screenshots
Chat tutor running 100% locally with full LaTeX rendering

Mermaid mind maps generated by Gemma 4 — click to enlarge, download as PNG

SVG illustrations — educational diagrams generated by Gemma 4

Scientific calculator that explains, powered locally

Teacher mode with PDF export — ready for classroom

Accessibility built-in: high contrast mode

Code
🔗 Repository: https://github.com/jpablortiz96/aula
The repo includes a comprehensive README with architecture diagrams, hardware benchmarks across devices (Raspberry Pi 5 to RTX 3050 to MacBook M3), full tech stack documentation, and a roadmap for v1.1 through v3.0.
License: MIT
How I Used Gemma 4
AULA uses a dual-engine architecture with intentional model selection for each tier:
| Model | Size / architecture | Where it runs | What it powers |
|---|---|---|---|
| Gemma 4 E2B-IT | ~1.5 GB (q4f16 quantized) | Browser, via MediaPipe + WebGPU | All offline features |
| Gemma 4 26B-A4B-IT | Mixture-of-experts (MoE) | Cloud, via Gemini API | Structured-output features |
Why Gemma 4 E2B for local
The E2B variant is the only Gemma 4 model that fits realistically on consumer hardware while preserving the multimodal capability path. It runs at:
- ~15 tokens/sec on an NVIDIA RTX 3050 laptop
- ~20-25 tokens/sec on a MacBook M3
- ~7 tokens/sec on a Raspberry Pi 5 (CPU fallback)
This range covers the devices a Latin American student or teacher is likely to have access to, from an $80 single-board computer to a school laptop. The 31B Dense model would never fit in a browser tab; the 26B MoE requires server-grade resources. E2B is the only viable choice for the rural offline use case, and that's exactly why I picked it.
Why Gemma 4 26B-A4B for cloud-enhanced features
Some features in AULA require strict structured output: JSON for quiz exercises, syntactically valid Mermaid for mind maps, coherent SVG for illustrations. Small models are unreliable for this: they're brilliant at conversation but tend to add prose around JSON, produce malformed SVG, or break Mermaid syntax.
Rather than fight this limitation or hide it, AULA makes the routing explicit and visible to the user. Every screen shows which engine answered: green badge for local, blue badge for cloud. The 26B-A4B variant gives me near-31B quality at substantially lower latency thanks to its mixture-of-experts architecture — ideal for short structured outputs.
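The routing rule described above can be sketched in a few lines. This is an illustration only: the names `Feature`, `Route`, `CLOUD_FEATURES`, and `pickEngine`, and the short feature list, are assumptions made for the example and are not taken from the AULA repository.

```typescript
// Hypothetical sketch of explicit engine routing. All names are illustrative.
type Engine = "local" | "cloud";
type Badge = "green" | "blue";

// A small, illustrative subset of features (not AULA's full list).
type Feature = "chat" | "calculator" | "quiz" | "illustrator" | "mindmap";

interface Route {
  engine: Engine;
  badge: Badge; // green = answered locally, blue = answered by the cloud model
}

// Features that need strict structured output (JSON, SVG, Mermaid).
const CLOUD_FEATURES: ReadonlySet<Feature> = new Set<Feature>([
  "quiz",
  "illustrator",
  "mindmap",
]);

// Returns the route for a feature, or null when the feature needs the cloud
// but the user has not configured an API key (Cloud Boost is opt-in).
function pickEngine(feature: Feature, apiKey: string | null): Route | null {
  if (!CLOUD_FEATURES.has(feature)) {
    return { engine: "local", badge: "green" };
  }
  if (apiKey === null || apiKey.trim() === "") {
    return null; // never send anything without an explicit key
  }
  return { engine: "cloud", badge: "blue" };
}
```

Returning `null` instead of silently falling back keeps the opt-in rule visible in the UI: the caller has to decide what to show when a cloud feature is unavailable.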
Technical challenges I solved
1. transformers.js was not viable on NVIDIA Optimus laptops.
My first prototype used transformers.js + WebGPU. On an RTX 3050 laptop, I got 2 tokens/sec because dispatch was routing through the integrated GPU (iGPU) instead of the discrete NVIDIA GPU. Migrating to MediaPipe's WebGPU delegate raised that to 14-16 tokens/sec on the same hardware, roughly a 7-8x improvement. MediaPipe is Google's official runtime for Gemma 4 on edge.
2. Concurrency on LlmInference is exclusive.
A single MediaPipe LlmInference instance processes one prompt at a time. When /chat and /practice competed for the same singleton, the model locked up with the error "Previous invocation or loading is still ongoing." I implemented a FIFO queue with abort propagation across navigations, plus a forceReset() recovery path.
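A minimal sketch of such a queue, assuming each inference call is wrapped in a job that accepts an `AbortSignal`. The class name `InferenceQueue` and its methods are hypothetical, and the sketch does not cover the forceReset() recovery path.

```typescript
// Hypothetical sketch: a FIFO queue that runs one job at a time and lets a
// navigation cancel everything in flight. Names are illustrative only.
type Job<T> = (signal: AbortSignal) => Promise<T>;

class InferenceQueue {
  // Settles when the most recently enqueued job has settled.
  private tail: Promise<void> = Promise.resolve();
  private controllers = new Set<AbortController>();

  enqueue<T>(job: Job<T>): Promise<T> {
    const controller = new AbortController();
    this.controllers.add(controller);

    const run = async (): Promise<T> => {
      try {
        // A job cancelled while waiting never touches the model.
        if (controller.signal.aborted) {
          throw new Error("aborted before start");
        }
        return await job(controller.signal);
      } finally {
        this.controllers.delete(controller);
      }
    };

    // Chain onto the tail so jobs run strictly in arrival order.
    const result = this.tail.then(run);
    // Swallow the outcome on the chain so one failure does not block later jobs.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }

  // Call on route change: signals the running job and rejects waiting ones.
  abortAll(): void {
    for (const controller of this.controllers) {
      controller.abort();
    }
  }
}
```

The running job only stops if it checks `signal.aborted` (or listens for the `abort` event) itself; the queue guarantees ordering and exclusivity, not preemption.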
3. Gemma 4 26B does not support streamGenerateContent reliably.
This took an afternoon of DevTools debugging to identify: calling :streamGenerateContent returned 400, while :generateContent (no streaming) worked perfectly. The fix was creating a separate cloudNoStream.ts helper for Practice, Illustrator, and Mermaid — features that don't benefit from streaming anyway since the user is waiting for one complete response.
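A non-streaming helper along these lines might look like the sketch below. It is not the actual cloudNoStream.ts: the function names and the `FetchLike` type are assumptions for illustration, and the model id is left to the caller because the post does not state it. The endpoint shape, the `x-goog-api-key` header, and the `candidates[].content.parts[].text` response layout follow the public Gemini REST API.

```typescript
// Hypothetical sketch of a non-streaming Gemini API call. Names are illustrative.
const API_BASE = "https://generativelanguage.googleapis.com/v1beta";

interface GenerateResponse {
  candidates?: { content?: { parts?: { text?: string }[] } }[];
}

interface RequestInitLike {
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Minimal fetch shape so the helper can be exercised without a network.
type FetchLike = (
  url: string,
  init: RequestInitLike,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

function buildGenerateRequest(model: string, apiKey: string, prompt: string) {
  const init: RequestInitLike = {
    method: "POST",
    headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
    body: JSON.stringify({ contents: [{ role: "user", parts: [{ text: prompt }] }] }),
  };
  // ":generateContent" returns one complete JSON body instead of an SSE stream.
  return { url: `${API_BASE}/models/${model}:generateContent`, init };
}

// Joins the text parts of the first candidate; returns "" if there are none.
function extractText(response: GenerateResponse): string {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  return parts.map((part) => part.text ?? "").join("");
}

async function generateOnce(
  fetchImpl: FetchLike,
  model: string,
  apiKey: string,
  prompt: string,
): Promise<string> {
  const { url, init } = buildGenerateRequest(model, apiKey, prompt);
  const res = await fetchImpl(url, init);
  if (!res.ok) {
    throw new Error(`generateContent failed: HTTP ${res.status}`);
  }
  return extractText((await res.json()) as GenerateResponse);
}
```

In the browser, the global `fetch` can be passed as `fetchImpl`; injecting it keeps the helper easy to test offline.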
4. Easy Reading Mode is more than a CSS toggle.
For students with dyslexia or reading difficulties, AULA changes both the visual presentation (letter spacing, line height, max-width) and the system prompt sent to Gemma 4 ("Short sentences. Simple vocabulary. One idea per line."). This is the kind of accessibility that AI uniquely enables — the model adapts its output style, not just the typography.
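As a rough sketch of how one setting can drive both layers: only the quoted instruction text comes from this post, while the function name `applyReadingMode`, the settings shape, and the `easy-reading` CSS class are hypothetical.

```typescript
// Hypothetical sketch: one toggle changes both the prompt and the styling hook.
const EASY_READING_INSTRUCTION =
  "Short sentences. Simple vocabulary. One idea per line.";

interface ReadingSettings {
  easyReading: boolean;
}

interface TutorPresentation {
  systemPrompt: string; // sent to the model
  cssClass: string; // applied to the response container (assumed class name)
}

function applyReadingMode(
  basePrompt: string,
  settings: ReadingSettings,
): TutorPresentation {
  if (!settings.easyReading) {
    return { systemPrompt: basePrompt, cssClass: "" };
  }
  return {
    // Append the style instruction so the model itself writes more simply.
    systemPrompt: `${basePrompt}\n\n${EASY_READING_INSTRUCTION}`,
    // The class would carry letter spacing, line height, and max-width rules.
    cssClass: "easy-reading",
  };
}
```

Deriving both values from the same setting avoids the case where the typography changes but the model keeps producing dense text.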
What Gemma 4 unlocked that wasn't possible 18 months ago
Browser-native inference at this quality was genuinely impossible until WebGPU stabilized. AULA is only buildable in 2026. The combination of Gemma 4's open weights, WebGPU's GPU access, and MediaPipe's optimized runtime is what makes a Pi-friendly AI tutor a real thing, not a thought experiment.
For the 65+ million students in Latin America who have been excluded from the AI revolution, this matters more than I can describe in this post.
Tech stack: Next.js 15, TypeScript strict, Tailwind v4, MediaPipe LLM Inference, WebGPU, Gemini API (REST + SSE), Zustand, IndexedDB, jsPDF, Mermaid, tesseract.js, Web Speech API.
Built solo in 11 days for the DEV.to Gemma 4 Challenge.
AULA is open source under MIT. Fork it, run it in your school, contribute to it. If you're a teacher in a low-connectivity region and want help deploying AULA, open an issue on GitHub.
🇨🇴 Made in LATAM, for the students the world forgot.