Memoria - A Local AI Reading Companion Powered by Gemma 4

Santhosh L — Sat, 23 May 2026 13:09:21 +0000

Memoria — A Local AI Reading Companion Powered by Gemma 4

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

Reading long books can be difficult even for people who love reading.

Readers forget characters, lose track of earlier events, struggle with dense prose, or return to a book after a break and feel disconnected from the story. For readers with ADHD, memory difficulties, cognitive fatigue, or accessibility needs, this becomes even harder.

Memoria is a local AI reading companion powered by Gemma 4 that helps readers stay connected to books through spoiler-safe recaps, contextual Q&A, character memory, speaker attribution, and text simplification — all while running locally on the user’s machine.

The app combines an EPUB reader with AI-powered reading support features including:

Spoiler-safe chapter recaps
Character memory tracking
Speaker attribution for dialogue
Contextual book Q&A
Passage explanations
Text simplification for difficult prose
Retrieval-based memory of earlier chapters

Everything runs locally using Gemma 4 through llama.cpp, so readers do not need a paid AI subscription or constant internet access.

Demo

Features shown in the demo

Uploading and processing EPUB books
AI-generated chapter recaps
Character tracking across chapters
Context-aware Q&A
Highlight-to-explain workflow
Text simplification for difficult passages
Spoiler-safe retrieval limited to completed chapters

Code

GitHub Repository: https://github.com/Santhoshl2312/Gemma_book_reader

Main technologies used

Gemma 4 E2B
llama.cpp
FastAPI
SQLite
ChromaDB
Vanilla JavaScript
HTML/CSS

How I Used Gemma 4

Memoria uses Gemma 4 as the core local reasoning engine for the entire reading experience.

I used the Gemma 4 E2B model through a local llama.cpp OpenAI-compatible server, allowing the application to run fully offline without relying on cloud APIs.

Why Gemma 4 E2B?

I specifically chose Gemma 4 E2B because it was the best fit for a responsive local reading assistant.

The project needed:

Fast inference speeds
Low VRAM usage
Good reasoning quality
Reliable structured outputs
Practical local deployment on consumer hardware

Gemma 4 E2B delivered the right balance between speed and capability, making it possible to provide near real-time responses for recaps, contextual Q&A, text simplification, and chapter processing while still running locally through llama.cpp.

This was especially important because the app performs many smaller AI tasks continuously in the background while the user reads.

What Gemma 4 Powers

Spoiler-Safe Recaps

Gemma summarizes chapter chunks into structured summaries and key events that help readers quickly reconnect with the story.

Character Memory

The model updates persistent character descriptions and remembers important events tied to each character across chapters.

Speaker Attribution

Gemma helps identify ambiguous dialogue speakers when rule-based systems fail.

Contextual Q&A

Readers can ask questions about the story, and Gemma answers using chapter-aware retrieval that avoids future spoilers.

Text Simplification

Selected passages can be rewritten into clearer modern English while preserving meaning and tone.

Technical Architecture

The frontend is a lightweight EPUB reader built with vanilla HTML, CSS, and JavaScript. It handles book uploads, chapter navigation, reading controls, themes, typography settings, and the AI interaction panel.

The backend is built with FastAPI and SQLite. It manages books, chapters, summaries, embeddings, character memory, retrieval, and streaming responses.

The AI stack runs fully locally using llama.cpp:

Gemma 4 E2B runs as the local chat and reasoning model
Nomic embeddings power semantic retrieval
ChromaDB stores vector embeddings per book
Background processing pipelines analyze chapters incrementally

The app processes books chapter-by-chapter instead of trying to load entire novels into context at once. Intermediate artifacts like summaries, character memory, embeddings, and speaker metadata are stored and reused throughout the reading experience.

This pipeline-first design makes the system faster, more grounded, and more practical for long-form reading.

Spoiler-Safe Retrieval

One of the biggest design goals was preventing accidental spoilers.

When a reader asks a question, Memoria retrieves only information from chapters the user has already completed. The retrieval system filters vector search results using reading progress before sending context to Gemma 4.

This allows the app to help readers remember earlier story details without revealing future events.

Challenges

Handling Long Books

Full novels are too large to send directly into a local model context window. I solved this by chunking chapters into smaller sections while carrying forward rolling summaries and character memory.

Structured Output Reliability

Local models sometimes wrap JSON outputs in extra formatting or explanations. To make the pipeline reliable, prompts were heavily constrained and the backend extracts valid JSON blocks safely before processing.

Speaker Attribution

Dialogue attribution in fiction is difficult because speakers are often implied instead of explicitly named. I used a hybrid approach where rules handle obvious cases while Gemma handles ambiguous dialogue using broader context.

Fully Local Deployment

The project depends on multiple services including Gemma 4, embedding models, Python environments, and vector databases. I automated the setup process using launcher scripts so the app can be started locally with minimal manual configuration.

Why Local AI Matters

One of the main goals of this project was accessibility and digital equity.

Readers should not need:

expensive subscriptions
cloud AI services
constant internet access
external data collection

By combining Gemma 4 with llama.cpp and local retrieval, Memoria creates a fully local AI reading companion that respects reader privacy while remaining accessible on consumer hardware.

This makes the project useful not only for individual readers, but also for classrooms, libraries, care settings, and offline learning environments.

Conclusion

Memoria demonstrates how Gemma 4 can power practical, privacy-friendly accessibility tools beyond chatbots.

Instead of replacing reading, the goal is to support readers — helping them stay connected to stories, remember context, and reduce cognitive load while preserving the experience of reading itself.

By combining Gemma 4 E2B, llama.cpp, retrieval, and structured processing pipelines, Memoria turns static EPUB books into adaptive reading experiences that can run entirely offline.

DEV Community: Santhosh L

Memoria - A Local AI Reading Companion Powered by Gemma 4

Memoria — A Local AI Reading Companion Powered by Gemma 4

What I Built

Demo

Features shown in the demo

Code

Main technologies used

How I Used Gemma 4

Why Gemma 4 E2B?

What Gemma 4 Powers

Spoiler-Safe Recaps

Character Memory

Speaker Attribution

Contextual Q&A

Text Simplification

Technical Architecture

Spoiler-Safe Retrieval

Challenges

Handling Long Books

Structured Output Reliability

Speaker Attribution

Fully Local Deployment

Why Local AI Matters

Conclusion