This is a submission for the Gemma 4 Challenge: Build with Gemma 4
My cousin has a learning disability.
Not the kind people notice immediately. She holds a conversation fine. She laughs at the right moments. She is sharp in ways that matter.
But put her in front of a dense webpage, a medical article, a GitHub README, or a LinkedIn thread, and something shifts. The words blur. The structure overwhelms. She closes the tab and calls me.
For two years, I was her human filter for the internet.
The Problem Nobody Talks About
The internet assumes you can:
- Read fast
- Parse dense structure
- Context-switch without losing the thread
- Understand jargon on sight
A lot of people cannot. And nobody is building for them.
My cousin is not alone. People with dyslexia, ADHD, processing disorders, or low digital literacy, along with non-native English speakers, all hit the same wall every day. They just hit it quietly.
I got tired of being the workaround. So I built Aura.
What Aura Is
Aura is a Chrome extension that puts Gemma 4 directly on every webpage.
No tab switching. No copy-pasting into ChatGPT. No context lost.
You click the floating orb. A panel slides in. You pick what you need:
- Summarize Page: get the key points in seconds
- Explain Code: understand what it does and why
- Draft Reply: write LinkedIn replies that match the tone of the conversation
- Create Post: turn any article into LinkedIn post ideas
- Highlight & Ask: select any text on the page and ask Aura anything about it
The AI lives on the page with you. You never leave.
The Demo
Why I Migrated from Llama to Gemma 4
I originally built Aura with Llama 3.1 8B via Cloudflare Workers AI.
It worked. Responses came back. Features ran.
But when I swapped to Gemma 4 31B, I felt the difference in the first response.
Llama told me what the code did. Gemma 4 told me why it was written that way.
Llama drafted a generic professional reply. Gemma 4 read the tone of the conversation and matched it.
For a general tool, that gap is a nice-to-have. For a tool built for people who struggle with comprehension, that gap is everything.
Why Gemma 4 31B Specifically
Gemma 4 comes in three variants. I did not pick 31B by default. I picked it deliberately.
| Model | Why I didn't pick it |
|---|---|
| 2B / 4B | Too shallow for the reasoning depth Aura needs across wildly different content types |
| 26B MoE | Great for edge inference but Aura needs consistent quality across all content types, not specialized routing |
| 31B Dense | ✅ Full parameter activation. Maximum reasoning quality. Consistent across every content type. |
Here is why dense architecture matters for Aura specifically:
MoE models route tokens through specialized subnetworks, activating only some of their parameters depending on the input. That is efficient. But Aura handles a GitHub README, a LinkedIn thread, a medical article, and a Stack Overflow answer, sometimes in the same session.
Dense models activate all parameters for every token. Gemma 4 31B does not guess which expert to wake up. It brings everything it knows to every single interaction.
For a tool where the content changes every tab and the user cannot afford an inconsistent experience, that consistency is not optional.
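To make the distinction concrete, here is a toy sketch of the two styles. It is an illustration only: the "experts" are plain functions rather than network layers, the names `denseForward` and `moeForward` are made up for this example, and nothing here reflects how Gemma 4 is actually implemented.

```javascript
// Toy illustration only: tiny "experts" are plain functions, not real network layers.
const experts = [
  (x) => x * 2,
  (x) => x + 10,
  (x) => x * x,
  (x) => -x,
];

// Dense style: every expert (all parameters) contributes to every input.
function denseForward(x) {
  return experts.reduce((sum, expert) => sum + expert(x), 0) / experts.length;
}

// MoE style: a router scores the experts and only the top-k of them run.
function moeForward(x, routerScores, k = 2) {
  const topK = routerScores
    .map((score, index) => ({ score, index }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
  // Softmax over the selected scores gives the mixing weights.
  const exps = topK.map(({ score }) => Math.exp(score));
  const total = exps.reduce((a, b) => a + b, 0);
  const output = topK.reduce(
    (sum, { index }, i) => sum + (exps[i] / total) * experts[index](x),
    0
  );
  return { output, activeExperts: topK.map(({ index }) => index) };
}
```

In the MoE version, which experts run depends on the router scores for that input. In the dense version, nothing is skipped.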
The Technical Implementation
Aura is plain HTML, CSS, and JavaScript. No framework. No backend. No server.
The Gemma 4 API call lives directly in the content script:
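```javascript
const GEMMA_API_URL = 'https://generativelanguage.googleapis.com/v1beta/models/gemma-4-31b-it:generateContent';

// GEMMA_API_KEY, currentPageContent, and conversationHistory are defined
// elsewhere in the content script.
async function callNova(prompt) {
  // Simulated system prompt: a user turn carrying the instructions and page context.
  const systemTurn = {
    role: 'user',
    parts: [{ text: `You are Aura, a helpful AI assistant in a browser extension.
The user is on: ${window.location.href}.
Page context:\n\n${currentPageContent}\n\n
Respond concisely and directly. Never introduce yourself.
Never mention you are an AI. Output only the final answer.` }]
  };
  // Model acknowledgment that closes the simulated system exchange.
  const systemAck = {
    role: 'model',
    parts: [{ text: 'Understood.' }]
  };
  // The new question goes last, after any earlier turns from this session.
  const userTurn = { role: 'user', parts: [{ text: prompt }] };

  const response = await fetch(`${GEMMA_API_URL}?key=${GEMMA_API_KEY}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      contents: [systemTurn, systemAck, ...conversationHistory, userTurn],
      generationConfig: {
        temperature: 0.7,
        // The Gemini API documents thinkingConfig as a field of generationConfig.
        thinkingConfig: { thinkingBudget: 0 }
      }
    }),
  });
  if (!response.ok) {
    return `Request failed (${response.status}).`;
  }
  const data = await response.json();
  return data.candidates?.[0]?.content?.parts?.[0]?.text || 'No response.';
}
```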
A few things worth noting:
The system turn pattern — Gemma 4's API does not have a native system role. I simulate it by injecting a user turn with the system context, followed by a model acknowledgment. This grounds the model before the actual conversation starts.
thinkingBudget: 0 — Gemma 4 31B is a reasoning model. Left unconstrained, it outputs its full reasoning trace — tasks, constraints, drafts, self-checks — before the final answer. Setting thinkingBudget: 0 suppresses that and returns only the final output to the user.
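As a defensive complement to the budget setting, the response parser can also skip any part flagged as a thought. The Gemini API marks such parts with a boolean `thought` field. Whether Gemma 4 responses always carry that flag is an assumption here, and `extractFinalText` is a hypothetical helper, not part of Aura's code.

```javascript
// Sketch: join the text of every part that is not flagged as a thought.
function extractFinalText(data) {
  const parts = data?.candidates?.[0]?.content?.parts ?? [];
  const text = parts
    .filter((part) => !part.thought && typeof part.text === 'string')
    .map((part) => part.text)
    .join('')
    .trim();
  // Same fallback string the main API call uses.
  return text || 'No response.';
}
```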
Page content extraction — Aura reads the page using a priority selector chain before falling back to document.body.innerText, capped at 4000 characters to stay within context limits.
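A minimal sketch of that extraction step might look like the following. The selector list is a guess at a sensible priority order, not the one Aura ships with, and `extractPageContent` is a hypothetical name. Only the fallback to `document.body.innerText` and the 4000-character cap come from the description above.

```javascript
// Hypothetical selector order: most specific content containers first.
const CONTENT_SELECTORS = ['article', 'main', '[role="main"]', '#content'];
const MAX_PAGE_CHARS = 4000;

// Returns trimmed text from the first selector that matches non-empty content,
// falling back to the full body text. The result is capped at maxChars.
function extractPageContent(doc, selectors = CONTENT_SELECTORS, maxChars = MAX_PAGE_CHARS) {
  for (const selector of selectors) {
    const text = doc.querySelector(selector)?.innerText?.trim();
    if (text) {
      return text.slice(0, maxChars);
    }
  }
  return (doc.body?.innerText ?? '').trim().slice(0, maxChars);
}
```

In a content script this would be called as `extractPageContent(document)`. Passing the document in as an argument keeps the function easy to test outside the browser.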
Conversation history — Follow-up chat is supported. Every user and model turn is stored in conversationHistory and injected into the next request, giving Aura memory within a session.
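The bookkeeping behind that memory can be sketched in a few lines. The helper names `recordExchange` and `buildContents` are hypothetical; the turn shape (`role` plus `parts`) is the same one the API call uses.

```javascript
// Appends one completed exchange (user question, model answer) to the history.
function recordExchange(history, userText, modelText) {
  history.push({ role: 'user', parts: [{ text: userText }] });
  history.push({ role: 'model', parts: [{ text: modelText }] });
}

// Builds the next request's contents: preamble turns, past turns, then the new question.
function buildContents(preamble, history, nextUserText) {
  return [...preamble, ...history, { role: 'user', parts: [{ text: nextUserText }] }];
}

// Usage: the history starts empty and grows by two turns per exchange.
const sessionHistory = [];
recordExchange(sessionHistory, 'Summarize this page.', 'It covers three points.');
const nextContents = buildContents([], sessionHistory, 'Expand on the second point.');
```

Because the history lives in memory, it resets when the page reloads, which matches the "memory within a session" behavior.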
What Changed When I Migrated
| Feature | Llama 3.1 8B | Gemma 4 31B |
|---|---|---|
| Page summarization | ✅ Decent bullets | ✅ Structured, context-aware |
| Code explanation | ✅ Describes what code does | ✅ Explains why it was written that way |
| Reply drafting | ⚠️ Generic professional tone | ✅ Matches the tone of the actual conversation |
| LinkedIn post creation | ⚠️ Template-like output | ✅ Distinct voice per post |
| Highlight & Ask | ✅ Works | ✅ Deeper reasoning on complex selections |
The migration took under 10 minutes. One endpoint. One model string. The quality difference was not subtle.
Who This Is Actually For
- People with dyslexia who need content restructured instantly
- People with ADHD who lose the thread switching tabs
- Non-native English speakers navigating professional content
- Elderly users overwhelmed by dense web pages
- Anyone the internet was not designed for
My cousin does not need a faster browser. She needs the information to come to her in a form she can hold.
Aura does that. Gemma 4 31B makes it good enough to actually help.
What Is Next
The next version adds multimodal support: sending a page screenshot alongside the text so Gemma 4 can reason about charts, diagrams, and images, not just words.
My cousin once sent me a screenshot of a medical form she could not understand. I read it to her over the phone.
Aura will eventually do that too.
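As a rough sketch of what the screenshot request could look like, the Gemini API accepts image data as an `inline_data` part next to a text part. The helper name `buildScreenshotTurn` is hypothetical, and whether the Gemma 4 endpoint Aura uses accepts image parts in this form is an assumption that still needs testing.

```javascript
// Builds one user turn that pairs a text prompt with a screenshot.
// Accepts either raw base64 or a data URL such as "data:image/png;base64,AAAA".
function buildScreenshotTurn(promptText, screenshot, mimeType = 'image/png') {
  // Base64 never contains a comma, so a comma means a data URL prefix is present.
  const base64 = screenshot.includes(',') ? screenshot.split(',')[1] : screenshot;
  return {
    role: 'user',
    parts: [
      { text: promptText },
      { inline_data: { mime_type: mimeType, data: base64 } },
    ],
  };
}
```

In an extension, the data URL would likely come from `chrome.tabs.captureVisibleTab` in the background service worker, then be passed to the content script in a message.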
Links
🔗 GitHub: https://github.com/Valentinetemi/Aura
Aura was originally built for the Airia AI Agents Hackathon. The Gemma 4 migration was done for this challenge and, honestly, it should have been Gemma from the start.