FocusForge — An On-Device Agentic Learning System for the Attention Crisis Generation

MEHBOOB ELAHI — Sat, 23 May 2026 10:35:08 +0000

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built


Built for students whose attention span was shortened by reels, and kids with ADHD who learn differently — not slower.

The Problem

There is a quiet crisis in education that no one talks about loudly enough.

The average teenager today switches between apps every 19 seconds. Attention spans measured in clinical studies have shortened over the past decade. Teachers report that students who were perfectly capable learners three years ago now struggle to read a full paragraph without losing focus. And for the estimated 366 million adults worldwide with symptoms of ADHD, this was already the reality long before TikTok arrived.

The standard response from EdTech has been to make content "more engaging" — prettier slides, gamified quizzes, animated mascots. But that treats attention like a muscle that just needs entertainment. The science says something different: attention is a skill that needs scaffolding, not just stimulation.

FocusForge is a complete on-device learning system that takes this science seriously and puts Gemma 4 at the center of every step.

System Architecture

FocusForge is structured as a five-tool agentic pipeline orchestrated by Gemma 4 E2B, followed by a complete ADHD-first delivery layer that most EdTech products skip entirely.

┌─────────────────────────────────────────────────────────┐
│                   STUDENT INPUT                         │
│              (document / notes / text)                  │
└──────────────────────┬──────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────┐
│              AGENTIC ORCHESTRATOR                       │
│                  Gemma 4 E2B                            │
│    dispatches tools via native function calling         │
└──┬──────────┬──────────┬──────────┬──────────┬──────────┘
   │          │          │          │          │
   ▼          ▼          ▼          ▼          ▼
Parser   MindMap   Evaluator   SM-2     Feynman
(JSON)   (graph)   (semantic)  (sched)  (TTS)
   │          │          │          │          │
   └──────────┴──────────┴──────────┴──────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────┐
│            ADHD-FIRST DELIVERY LAYER                    │
│                                                         │
│  1. RSVP Reader       one phrase at a time, 250 WPM     │
│  2. Scaffolded Warmup fill-in-the-blank, 70%+ pass rate │
│  3. Feynman (spoken)  pyttsx3 TTS, fully offline        │
│  4. Streak Badges     dopamine reward loop              │
│  5. Session Cap       5 concepts / 8 minutes, then stop │
└─────────────────────────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────┐
│               GRADIO 6-TAB DEMO                         │
│    Parse → Mind Map → RSVP → Warmup → Feynman → SM-2    │
└─────────────────────────────────────────────────────────┘

The Five Tools (What Gemma 4 Is Actually Doing)

Tool 1 — Document Parser

Gemma 4 E2B reads any pasted text — a biology chapter, lecture notes, a Wikipedia article — and extracts discrete teachable concepts as structured JSON. Each concept has a title, body, prerequisite list, and difficulty rating. Long documents are chunked at 6,000 characters to stay within safe context limits, with results merged across chunks.

This is not summarisation. Gemma is performing curriculum design: identifying what is worth teaching, in what order, and what must be understood first.
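The chunk-and-merge step described above can be sketched in plain Python. This is a minimal sketch, not the notebook's code. It assumes each chunk's reply from Gemma is a JSON array of concept objects with a `title` field (plus the body, prerequisites and difficulty described above). The helper names `chunk_text`, `parse_concepts` and `merge_concepts` are hypothetical. The "JSON fallback" here simply pulls the outermost bracketed array out of a reply that wraps it in prose.

```python
import json
import re

CHUNK_CHARS = 6_000  # chunk size quoted in the write-up


def chunk_text(text: str, limit: int = CHUNK_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` chars, preferring paragraph breaks."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        # Hard-split any single paragraph that is longer than the limit.
        while len(para) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:limit])
            para = para[limit:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def parse_concepts(reply: str) -> list[dict]:
    """Parse one chunk's reply into concept dicts, tolerating prose around the JSON."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        match = re.search(r"\[.*\]", reply, re.DOTALL)  # fallback: outermost [...]
        if not match:
            return []
        try:
            data = json.loads(match.group(0))
        except json.JSONDecodeError:
            return []
    if not isinstance(data, list):
        return []
    return [c for c in data if isinstance(c, dict) and isinstance(c.get("title"), str)]


def merge_concepts(per_chunk: list[list[dict]]) -> list[dict]:
    """Merge chunk results, keeping the first concept seen for each title."""
    merged: dict[str, dict] = {}
    for concepts in per_chunk:
        for concept in concepts:
            merged.setdefault(concept["title"].strip().lower(), concept)
    return list(merged.values())
```

Splitting on paragraph breaks first keeps a concept's explanation in one chunk where possible. Deduplicating by lower-cased title is one simple way to merge concepts that straddle a chunk boundary.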

Tool 2 — Fog-of-War Mind Map

The extracted concepts become nodes in a directed graph rendered with matplotlib and networkx on a dark #0D1117 background. Nodes are colour-coded by status: green for completed, blue for unlocked, dark for locked (fog of war). A mastery arc — a green arc drawn around the node proportional to the student's score — shows progress at a glance.

The fog-of-war mechanic is the key insight here: a student who has never studied photosynthesis sees only the root concept unlocked. As they master each node, the next tier appears. This transforms an overwhelming syllabus into a series of achievable steps — exactly what ADHD learners need.
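The fog-of-war rule can be expressed as a small function over the prerequisite lists the parser produces. This is a minimal sketch with two assumptions. First, a concept is unlocked once every prerequisite is completed, so root concepts with no prerequisites start unlocked. Second, the arc's sweep is simply the score times 360 degrees. `node_status`, `mastery_arc_degrees` and the hex colours are hypothetical, and the matplotlib/networkx drawing itself is left out.

```python
def node_status(prereqs: dict[str, list[str]], completed: set[str]) -> dict[str, str]:
    """Classify each concept as 'completed', 'unlocked' or 'locked'.

    `prereqs` maps every concept title to the titles it depends on.
    """
    status = {}
    for title, needs in prereqs.items():
        if title in completed:
            status[title] = "completed"
        elif all(need in completed for need in needs):
            status[title] = "unlocked"  # every prerequisite mastered
        else:
            status[title] = "locked"    # still under the fog of war
    return status


def mastery_arc_degrees(score: float) -> float:
    """Sweep of the green mastery arc: score x 360 degrees, clamped to [0, 360]."""
    return 360.0 * min(max(score, 0.0), 1.0)


# Hypothetical palette for the dark #0D1117 background.
STATUS_COLOURS = {"completed": "#2EA043", "unlocked": "#1F6FEB", "locked": "#30363D"}
```

When drawing, the sweep angle could be passed as `theta2` of a `matplotlib.patches.Arc` centred on each node.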

Tool 3 — Semantic Recall Evaluator

Traditional quiz systems match keywords. Gemma 4 evaluates semantic understanding. A student who writes "plants grab sunlight and turn water into glucose" can earn a score comparable to one who writes "photosynthesis is the process of converting light energy into chemical energy stored as sugar", because both demonstrate understanding, even though the first answer shares almost none of the textbook's keywords.

The evaluator returns {"correct": true/false, "score": 0.0–1.0, "feedback": "one sentence"} in a single pass, enabling real-time feedback inside the Feynman loop.
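Because the Feynman loop depends on this JSON arriving intact, the parsing side benefits from a defensive wrapper. This sketch rests on a few assumptions. The field names match the schema above. The 0.5 threshold applied when `correct` is missing or not a boolean is hypothetical. `parse_evaluation` and `Evaluation` are illustrative names, not the notebook's API.

```python
import json
import math
import re
from dataclasses import dataclass


@dataclass
class Evaluation:
    correct: bool
    score: float
    feedback: str


FALLBACK = Evaluation(False, 0.0, "Sorry, I couldn't grade that answer. Try again?")


def parse_evaluation(reply: str) -> Evaluation:
    """Turn the evaluator's raw reply into a validated Evaluation.

    Extracts the outermost {...} (models sometimes wrap JSON in prose or code
    fences), clamps the score to [0, 1], and returns a neutral fallback instead
    of raising, so one malformed reply cannot crash the Feynman loop.
    """
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if not match:
        return FALLBACK
    try:
        data = json.loads(match.group(0))
        score = float(data.get("score", 0.0))
    except (json.JSONDecodeError, TypeError, ValueError, AttributeError):
        return FALLBACK
    if math.isnan(score):
        return FALLBACK
    score = min(max(score, 0.0), 1.0)
    correct = data.get("correct")
    if not isinstance(correct, bool):
        correct = score >= 0.5  # hypothetical pass threshold
    return Evaluation(correct, score, str(data.get("feedback", "")).strip())
```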

Tool 4 — SM-2 Spaced Repetition Engine

After every session, Gemma's semantic scores feed into a standard SM-2 algorithm that calculates the next review date for each concept. A student who scores 0.95 on photosynthesis for a second consecutive review sees it again in 6 days. A student who scores 0.30 sees it tomorrow. The results are visualised as a two-panel matplotlib dashboard: a mastery bar chart on the left, a session scorecard on the right.
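A sketch of the widely used SM-2 update, with a linear mapping from Gemma's 0.0 to 1.0 score onto SM-2's 0 to 5 quality grade. The mapping is an assumption, and the response-time component mentioned in the implementation table is omitted. `CardState`, `score_to_quality`, `sm2_review` and `next_review` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class CardState:
    repetitions: int = 0      # consecutive successful reviews
    interval_days: int = 0    # gap until the next review
    easiness: float = 2.5     # SM-2 easiness factor (EF), floor 1.3


def score_to_quality(score: float) -> int:
    """Assumed linear map from a 0.0-1.0 semantic score to SM-2 quality 0-5."""
    return round(min(max(score, 0.0), 1.0) * 5)


def sm2_review(state: CardState, quality: int) -> CardState:
    """Apply one SM-2 review with quality 0 (blackout) to 5 (perfect)."""
    if quality >= 3:
        if state.repetitions == 0:
            interval = 1
        elif state.repetitions == 1:
            interval = 6
        else:
            interval = round(state.interval_days * state.easiness)
        repetitions = state.repetitions + 1
    else:
        repetitions, interval = 0, 1  # lapse: start again from tomorrow
    d = 5 - quality
    easiness = max(1.3, state.easiness + 0.1 - d * (0.08 + d * 0.02))
    return CardState(repetitions, interval, easiness)


def next_review(state: CardState, today: date) -> date:
    """Calendar date of the next review."""
    return today + timedelta(days=state.interval_days)
```

In standard SM-2 the first successful review schedules the next one for the following day and the second for 6 days later. After that, intervals grow by the easiness factor, and any grade below 3 restarts the cycle at 1 day.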

Tool 5 — Feynman Mode with Text-to-Speech

This is where FocusForge diverges most sharply from every existing study app.

The Feynman Technique is one of the most evidence-backed learning methods: explain a concept in simple terms, identify gaps, go back to the source, repeat. But it works only if the student already has something to explain. Standard implementations throw a blank question at a student who has never properly read the material and call it "active recall."

FocusForge solves this with a three-stage entry: RSVP reading first, scaffolded warmup second, open Feynman question third. By the time Gemma asks "What happens to the electron that chlorophyll absorbs?", the student has already read the concept phrase by phrase and succeeded at a confidence-building fill-in-the-blank. The psychological state going into the open question is completely different.

Gemma's Feynman questions are spoken aloud via pyttsx3 — fully offline, no API key — at 155 WPM, slightly slower than conversational speech. For a student with ADHD or reading difficulty, hearing the question rather than reading it removes one more cognitive barrier at precisely the moment when the barrier matters most.
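The "thread-safe lazy init" noted for pyttsx3 in the implementation table can be sketched as a small wrapper. This assumes the standard `pyttsx3.init()`, `setProperty("rate", ...)`, `say()` and `runAndWait()` calls. The `LazySpeaker` class and its `engine_factory` parameter are hypothetical; the parameter exists so the logic can be exercised without an audio device.

```python
import threading


class LazySpeaker:
    """Thread-safe, lazily initialised wrapper around a pyttsx3 engine.

    The engine is created on the first speak() call, so importing the notebook
    never touches the audio driver, and a single lock stops concurrent Gradio
    callbacks from driving the engine at the same time.
    """

    def __init__(self, rate_wpm: int = 155, engine_factory=None):
        self._rate = rate_wpm
        self._factory = engine_factory
        self._engine = None
        self._lock = threading.Lock()

    def speak(self, text: str) -> None:
        with self._lock:
            if self._engine is None:
                if self._factory is None:
                    import pyttsx3  # offline TTS, imported only when first needed
                    self._factory = pyttsx3.init
                self._engine = self._factory()
                self._engine.setProperty("rate", self._rate)  # words per minute
            self._engine.say(text)
            self._engine.runAndWait()  # blocks until the utterance finishes
```

A Gradio callback could call `speaker.speak(question)` right after Gemma generates each Feynman question. Holding the lock for the whole utterance stops two questions from being read over each other.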

The ADHD-First Delivery Layer

This is the piece that most EdTech projects, including most hackathon submissions, do not build. They extract content, maybe quiz the student, and call it done. FocusForge treats how the content reaches the brain as a first-class engineering problem.

RSVP Reader (Rapid Serial Visual Presentation)

The concept body is split into 7-word phrases and shown one at a time. The student clicks "Next Phrase" at their own pace. This eliminates visual wandering — the single most common reason ADHD readers lose their place. Studies from the early 2000s through recent work on RSVP and ADHD consistently show this delivery method improves reading comprehension for attention-impaired learners by 15–30%.
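Splitting a concept body into 7-word phrases is a one-liner, and the sketch below adds an optional display duration for a target reading speed. Treating the 250 WPM figure from the architecture diagram as an auto-advance pace is an assumption, since the reader described here advances on a click. Both function names are hypothetical.

```python
def rsvp_phrases(body: str, words_per_phrase: int = 7) -> list[str]:
    """Split a concept body into consecutive phrases of up to N words."""
    words = body.split()
    return [" ".join(words[i:i + words_per_phrase])
            for i in range(0, len(words), words_per_phrase)]


def phrase_seconds(phrase: str, wpm: int = 250) -> float:
    """How long a phrase would stay on screen at a given reading speed."""
    return len(phrase.split()) * 60.0 / wpm
```

At 250 WPM a full 7-word phrase works out to 7 × 60 / 250 = 1.68 seconds on screen.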

Scaffolded Warmup

Gemma generates a fill-in-the-blank question targeting a 70%+ success rate. The prompt instructs Gemma to replace only the single most important word and to construct distractors that are plausible but clearly wrong to a student who understood the RSVP phrases. The correct answer is always normalised to choice A in code, regardless of where Gemma places it in the raw JSON, eliminating a common source of false negatives.

Starting a study session with success, even small success, is not a pedagogical nicety. For ADHD learners, the first 90 seconds of a task determine whether dopamine reinforces engagement or aversion. A warmup designed to deliver a first win sets the entire session on a different neurological footing.
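The "correct answer is always A" normalisation can be sketched as follows. The raw JSON shape is an assumption about what the warmup prompt asks Gemma to return: a `question`, a `choices` object keyed by letter, and an `answer` that may be either a letter or the answer text. `normalise_warmup` is a hypothetical name.

```python
import json


def normalise_warmup(raw: str) -> dict:
    """Reorder a generated warmup so the correct choice is always 'A'."""
    data = json.loads(raw)
    choices: dict = data["choices"]
    answer = str(data["answer"]).strip()
    # Accept either a letter ("C") or the answer text itself ("chlorophyll").
    if answer.upper() in choices:
        correct_text = choices[answer.upper()]
    else:
        matches = [t for t in choices.values() if t.strip().lower() == answer.lower()]
        if not matches:
            raise ValueError("answer does not match any choice")
        correct_text = matches[0]
    ordered = [correct_text] + [t for t in choices.values() if t != correct_text]
    return {
        "question": data["question"],
        "choices": dict(zip("ABCDEFGH", ordered)),
        "answer": "A",
    }
```

Normalising to A keeps grading trivial. If the interface displays choices in this stored order, shuffling them at render time would keep students from learning that A is always right.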

Streak Badges

Every correct Feynman response triggers an immediate badge: "Nice start!" → "2 in a row!" → "On fire!" → "Unstoppable!" These are not points that accumulate invisibly. They appear in the chat window and in the Gradio streak counter in real time. Deferred rewards (grades, leaderboards, end-of-week summaries) do not engage the ADHD reward system. Immediate, specific, surprising rewards do.
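The badge sequence can be sketched as a streak counter. Two behaviours are assumptions: an incorrect answer resets the streak, and streaks longer than four repeat the top badge. `Streak` and `BADGES` are hypothetical names.

```python
from typing import Optional

BADGES = ["Nice start!", "2 in a row!", "On fire!", "Unstoppable!"]


class Streak:
    """Tracks consecutive correct Feynman answers and picks the badge to show."""

    def __init__(self) -> None:
        self.count = 0

    def record(self, correct: bool) -> Optional[str]:
        if not correct:
            self.count = 0  # assumption: a miss resets the streak, no badge
            return None
        self.count += 1
        # Streaks longer than the badge list keep showing the top badge.
        return BADGES[min(self.count, len(BADGES)) - 1]
```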

Session Cap

The session ends after 5 concepts or 8 minutes, whichever comes first. This is not a limitation; it is a feature. A system that says "you can stop now, you did great" is far more likely to be used again tomorrow than one that keeps asking for more until the student closes the tab in frustration. The SM-2 engine handles the rest.
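The "whichever comes first" rule is a two-condition check. This is a minimal sketch using a monotonic clock. `SessionCap` and its injectable `clock` parameter, which makes the timing testable, are hypothetical.

```python
import time
from typing import Callable


class SessionCap:
    """Ends a session after max_concepts or max_seconds, whichever comes first."""

    def __init__(self, max_concepts: int = 5, max_seconds: float = 8 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_concepts = max_concepts
        self.max_seconds = max_seconds
        self._clock = clock
        self._start = clock()  # monotonic: unaffected by wall-clock changes
        self.concepts_done = 0

    def finish_concept(self) -> None:
        self.concepts_done += 1

    def should_stop(self) -> bool:
        elapsed = self._clock() - self._start
        return self.concepts_done >= self.max_concepts or elapsed >= self.max_seconds
```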

Technical Implementation

Component	Library	Notes
Model	Gemma 4 E2B via HF Hub / Kaggle input path	float16, no quantisation needed at this size
Inference	`AutoModelForCausalLM` + `TextIteratorStreamer`	Non-blocking, interruptible
Document parsing	Gemma 4 E2B structured output	Chunked at 6K chars, JSON fallback
Mind map render	matplotlib + networkx	Dark theme, mastery arcs, fog-of-war
SM-2 algorithm	Pure Python	Standard SM-2, quality derived from semantic score + response time
Text-to-Speech	pyttsx3	100% offline, thread-safe lazy init
Recall evaluation	Gemma 4 E2B	Single-pass JSON, semantic not keyword
Warmup generation	Gemma 4 E2B	Answer normalised to A in post-processing
RSVP reader	Pure Python	7-word phrases, variable WPM
Interactive demo	Gradio 4	6 tabs, shared state, public URL
Environment	Kaggle T4 (15 GB VRAM)	~5 GB used, 10 GB headroom

All inference goes through a single gemma_chat() function that wraps TextIteratorStreamer in a Thread, calls torch.cuda.empty_cache() and gc.collect() after every call, and supports configurable temperature and max tokens per use case.
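A minimal sketch of such a helper, assuming a Hugging Face `model` and `tokenizer` already loaded with `AutoModelForCausalLM` and `AutoTokenizer`. The signature, the `on_token` callback and the injectable `streamer_cls` are assumptions rather than the notebook's actual code; `streamer_cls` exists only so the function can be exercised without a GPU. `generate()` runs on a worker thread while the calling thread consumes decoded text from the streamer.

```python
import gc
import threading


def _free_gpu_memory() -> None:
    """Collect Python garbage and, when CUDA is present, release PyTorch's cache."""
    gc.collect()
    try:
        import torch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def gemma_chat(model, tokenizer, messages, max_new_tokens=512, temperature=0.7,
               streamer_cls=None, on_token=None) -> str:
    """Stream one chat completion from a Hugging Face causal LM and return the text."""
    if streamer_cls is None:
        from transformers import TextIteratorStreamer
        streamer_cls = TextIteratorStreamer
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=True,
        return_dict=True, return_tensors="pt",
    ).to(model.device)
    streamer = streamer_cls(tokenizer, skip_prompt=True, skip_special_tokens=True)
    gen_kwargs = dict(**inputs, streamer=streamer, max_new_tokens=max_new_tokens)
    if temperature > 0:
        gen_kwargs.update(do_sample=True, temperature=temperature)
    else:
        gen_kwargs["do_sample"] = False  # greedy decoding
    # generate() blocks, so it runs on a worker thread while we read tokens here.
    worker = threading.Thread(target=model.generate, kwargs=gen_kwargs, daemon=True)
    worker.start()
    pieces = []
    try:
        for piece in streamer:
            pieces.append(piece)
            if on_token is not None:
                on_token(piece)  # e.g. push partial text to the Gradio UI
    finally:
        worker.join()
        _free_gpu_memory()
    return "".join(pieces)
```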

Results

Recall Evaluator Benchmark

10 test responses covering correct understanding, partial understanding, and incorrect statements were evaluated against human-rated pass/fail judgements and mastery scores.

Pass/Fail Accuracy: the semantic evaluator agreed with human raters on 8 of 10 cases
Mean Absolute Error vs human scores: 0.08 (on the 0.0–1.0 score scale)
The 2 mismatches were borderline cases where the human rater and the model both had reasonable interpretations
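For reference, the two headline metrics reduce to simple arithmetic. The function names are hypothetical, and the values in any example call are illustrative, not the benchmark data. Mean absolute error averages unsigned gaps, so it is always a single non-negative number.

```python
from statistics import mean


def pass_fail_agreement(model_pass: list[bool], human_pass: list[bool]) -> float:
    """Fraction of cases where the evaluator's correct/incorrect call matches the human's."""
    if len(model_pass) != len(human_pass) or not model_pass:
        raise ValueError("need two equal-length, non-empty lists")
    return sum(m == h for m, h in zip(model_pass, human_pass)) / len(model_pass)


def mean_absolute_error(model_scores: list[float], human_scores: list[float]) -> float:
    """Average absolute gap between model and human mastery scores."""
    if len(model_scores) != len(human_scores) or not model_scores:
        raise ValueError("need two equal-length, non-empty lists")
    return mean(abs(m - h) for m, h in zip(model_scores, human_scores))
```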

Multilingual Support

Recall evaluation was tested in Urdu, Arabic, Spanish, and French with no additional training or prompting beyond the standard eval template. All four languages produced semantically coherent correct/score/feedback JSON in a single pass.

Session Performance

A simulated 3-concept session completed in under 3 minutes on a Kaggle T4, including RSVP delivery, warmup generation, Feynman question generation, and semantic evaluation — well within the 8-minute session cap for real students.

Roadmap

The current version is complete and functional. These are the next three meaningful steps, in priority order.

1. Speech-to-Text Input (fastest path to full voice mode)

faster-whisper (the CTranslate2 port of OpenAI Whisper) runs on CPU in under 200ms for short utterances. Adding it as a Gradio audio input component would make the Feynman loop fully voice-operated: Gemma speaks the question, the student answers aloud, Whisper transcribes, Gemma evaluates. No typing required. This is the single change that would make FocusForge accessible to students with dyslexia or motor difficulties.
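A sketch of the transcription step, assuming faster-whisper's `WhisperModel` running on CPU with int8 weights. The `base` model size, the helper names `transcribe_answer` and `_load_whisper`, and the optional `model` argument (used to swap in a stand-in for testing) are assumptions. This sketch does not imply any latency figure.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def _load_whisper(model_size: str = "base"):
    """Load the model once and reuse it across calls."""
    from faster_whisper import WhisperModel
    return WhisperModel(model_size, device="cpu", compute_type="int8")


def transcribe_answer(audio_path: str, model=None) -> str:
    """Transcribe a spoken Feynman answer to plain text."""
    model = model if model is not None else _load_whisper()
    segments, _info = model.transcribe(audio_path)
    # `segments` is lazy: decoding actually happens while we iterate over it.
    return " ".join(segment.text.strip() for segment in segments).strip()
```

In the Gradio app, an audio input such as `gr.Audio(sources=["microphone"], type="filepath")` would hand this function a file path. The returned text could then go straight into the existing recall evaluator.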

2. Real Gaze Biofeedback via LiteRT

The current gaze cell is a documented design mock. The production path uses Gemma 4 E4B exported to a .tflite model via LiteRT, running on-device at 5 FPS using MediaPipe Face Mesh as input. When the student's gaze leaves the screen for more than 3 seconds, the RSVP reader pauses and a gentle nudge appears. When gaze returns, the last 10 words replay. This closes the feedback loop that no existing study app has: the system knows when the student stopped paying attention and responds without human intervention.
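The pause-and-replay behaviour is a small state machine that is independent of how gaze is detected. This sketch assumes the gaze model emits `(timestamp, looking_at_screen)` samples. `GazeGate` and its methods are hypothetical; the 3-second and 10-word values come from the plan above.

```python
from typing import Optional


class GazeGate:
    """Pause RSVP when gaze is away longer than away_limit; replay recent words on return."""

    def __init__(self, away_limit: float = 3.0, replay_words: int = 10) -> None:
        self.away_limit = away_limit
        self.replay_words = replay_words
        self.paused = False
        self._away_since: Optional[float] = None
        self._shown: list[str] = []

    def words_shown(self, phrase: str) -> None:
        """Record the words the RSVP reader has displayed so far."""
        self._shown.extend(phrase.split())

    def update(self, t: float, looking: bool) -> Optional[str]:
        """Feed one gaze sample (e.g. at ~5 FPS); returns text to replay on resume."""
        if not looking:
            if self._away_since is None:
                self._away_since = t
            if t - self._away_since > self.away_limit:
                self.paused = True  # the reader should stop and show a nudge
            return None
        self._away_since = None
        if self.paused:
            self.paused = False
            return " ".join(self._shown[-self.replay_words:])
        return None
```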

3. Parent and Teacher Dashboard

SM-2 data, session charts, and concept mastery maps are already generated per session. Persisting them to a simple JSON store and exposing a read-only Gradio dashboard for parents and teachers would give adults visibility into learning patterns without requiring them to understand the underlying system. "Leo struggles with the Calvin cycle but has mastered light reactions" is actionable information that a parent can discuss over dinner.
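Persisting per-session scores and turning them into a plain-language sentence could look like this. Several details are assumptions: the file layout (one JSON object mapping each student to a list of `{concept: score}` sessions), the 0.8 and 0.5 thresholds, and the function names.

```python
import json
from pathlib import Path


def append_session(store: Path, student: str, scores: dict[str, float]) -> None:
    """Append one session's per-concept scores to a JSON file (created if missing)."""
    data = json.loads(store.read_text()) if store.exists() else {}
    data.setdefault(student, []).append(scores)
    store.write_text(json.dumps(data, indent=2))


def summarise(store: Path, student: str, mastered: float = 0.8, struggling: float = 0.5) -> str:
    """One sentence for a parent, based on each concept's most recent score."""
    sessions = json.loads(store.read_text()).get(student, []) if store.exists() else []
    latest: dict[str, float] = {}
    for session in sessions:
        latest.update(session)  # later sessions overwrite earlier scores
    weak = [c for c, s in latest.items() if s < struggling]
    strong = [c for c, s in latest.items() if s >= mastered]
    parts = []
    if weak:
        parts.append(f"struggles with {', '.join(weak)}")
    if strong:
        parts.append(f"has mastered {', '.join(strong)}")
    return f"{student} " + (" but ".join(parts) if parts else "has no scored concepts yet") + "."
```

With illustrative scores of 0.3 for the Calvin cycle and a latest 0.9 for light reactions, `summarise` produces the kind of sentence quoted above.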

On Digital Equity

FocusForge runs entirely on a free Kaggle GPU notebook. No subscription. No cloud API costs beyond the Kaggle session. No calls to a third-party inference API.

Gemma 4 E2B is small enough that it should run on a Pixel 9 via LiteRT once the mobile export path matures. That means a student in Lahore, Lagos, or Lima with a mid-range Android phone could run the entire learning system locally, in their native language, with no internet connection after the initial model download.

This is what "digital equity" actually means in practice — not a cheaper subscription tier, but a system that works without a subscription at all.

What Gemma 4 Made Possible

Every meaningful capability in FocusForge depends on Gemma 4 specifically:

The orchestrator works because Gemma 4 E2B follows function-calling instructions reliably without fine-tuning
The warmup generator works because Gemma 4 produces valid JSON with the correct structure in a single pass at temperature 0.3
The Feynman evaluator works because Gemma 4 understands semantic equivalence across paraphrases, not just keyword overlap
The multilingual support works because Gemma 4 was trained on genuinely multilingual data, not English with a translation layer
The entire system fits on a T4 because Gemma 4 E2B was designed for edge deployment from the ground up

A larger model would have produced marginally better individual responses. A different small model would not have produced reliable structured output without fine-tuning. Gemma 4 E2B sits at exactly the right point on the capability-efficiency curve for what FocusForge needs.

Try It

The notebook is available on Kaggle. Run Cell 16 and open the Gradio public URL. Paste any text — a Wikipedia paragraph, a page of your notes, anything — and walk through the six tabs in order. By Tab 5 you will have read the concept phrase by phrase, answered a warmup question, and heard Gemma ask you to explain it back.

That is the experience. No slides. No multiple choice grids. No mascot. Just a system that takes attention seriously as an engineering problem and uses Gemma 4 to solve it.

Built for the Build With Gemma 4 Hackathon on DEV.to / Kaggle.
Model: google/gemma-4-e2b-it | Hardware: Kaggle T4 (free tier) | All inference on-device.

Demo

https://youtu.be/7Gnh1ier69U?si=G8hxtMiQBgKfpQLn

Code

https://www.kaggle.com/code/mehboobelahi/focusforge-for-adaptive-study-sessions/notebook

How I Used Gemma 4

The Gemma 4 family spans three very different deployment targets, so choosing between them was not trivial.

Gemma 4 27B would have produced richer responses but requires hardware that no student owns. A system that only works on a data center GPU solves nothing for digital equity.

Gemma 4 9B is a reasonable middle ground, but its float16 weights alone (roughly 18 GB) exceed a Kaggle T4's 15 GB of VRAM. It would need 4-bit quantisation just to load, trading away quality and leaving less headroom for the KV cache that a multi-turn Feynman dialogue requires.

Gemma 4 E2B (the 2B edge model) was the intentional choice. On a T4 it loads in float16 and uses roughly 4–5 GB of VRAM, leaving 10 GB free for activations, the agentic orchestrator's tool-call loop, and six concurrent Gradio sessions. This is not a compromise — it is a design decision. A model that a student can run on a free Kaggle notebook, a school Chromebook with GPU access, or eventually a Pixel phone via LiteRT is a model that can actually reach the students who need it.
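The trade-off above comes down to back-of-envelope arithmetic: parameters times bytes per parameter. This sketch covers float16 weights only (2 bytes each) and ignores the KV cache, activations and framework overhead. It treats each model name as its nominal parameter count, which for E2B is the "effective" 2B figure, so the result is a lower bound rather than a measurement.

```python
T4_VRAM_GB = 15.0  # Kaggle T4, as quoted in the write-up


def weight_memory_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes); float16 = 2 bytes/param."""
    return params_billions * bytes_per_param


for name, params in [("27B", 27), ("9B", 9), ("E2B (~2B effective)", 2)]:
    gb = weight_memory_gb(params)
    print(f"{name:>20}: ~{gb:.0f} GB of float16 weights, fits on a T4: {gb < T4_VRAM_GB}")
```

This gives roughly 54 GB, 18 GB and 4 GB respectively. Only the smallest model's weights fit in float16 on a 15 GB T4.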

What Gemma 4 E2B specifically unlocked for this project:

Native function calling — the orchestrator dispatches tools (parser, mind map, evaluator, SM-2 scheduler) without fine-tuning
Reliable structured JSON output — the warmup generator, concept extractor, and recall evaluator all return parseable JSON in a single pass
128K context window — the Feynman dialogue keeps the full conversation history without truncation anxiety
TextIteratorStreamer compatibility — responses stream token by token, keeping the UI responsive and inference interruptible
Multilingual understanding out of the box — Urdu, Arabic, Spanish, and French recall evaluation with no additional training
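Native function calling ultimately means the model emits a structured call that application code executes. This is a minimal sketch of the application side. It assumes the model's tool call has already been extracted as JSON of the form `{"name": ..., "arguments": {...}}`. That shape is an illustration, not Gemma 4's documented tool-call format, which is defined by its chat template. `tool`, `TOOLS`, `dispatch` and `render_mind_map` are hypothetical.

```python
import json
from typing import Any, Callable

TOOLS: dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so the orchestrator can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def render_mind_map(concepts: list) -> str:
    # Stand-in for the real matplotlib/networkx renderer.
    return f"rendered {len(concepts)} nodes"


def dispatch(tool_call_json: str) -> Any:
    """Execute one tool call of the assumed form {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call_json)
    name = call.get("name")
    if name not in TOOLS:
        raise KeyError(f"model requested unknown tool {name!r}")
    return TOOLS[name](**call.get("arguments", {}))
```

Keeping an explicit registry means the model can only trigger functions that were deliberately exposed. An unknown tool name raises an error rather than being guessed at.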