<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Santhosh L</title>
    <description>The latest articles on DEV Community by Santhosh L (@santhosh2312).</description>
    <link>https://dev.to/santhosh2312</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3922941%2F16d595ea-9b1f-420e-bf55-a2c7e9fbad48.png</url>
      <title>DEV Community: Santhosh L</title>
      <link>https://dev.to/santhosh2312</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/santhosh2312"/>
    <language>en</language>
    <item>
      <title>Memoria - A Local AI Reading Companion Powered by Gemma 4</title>
      <dc:creator>Santhosh L</dc:creator>
      <pubDate>Sat, 23 May 2026 13:09:21 +0000</pubDate>
      <link>https://dev.to/santhosh2312/memoria-a-local-ai-reading-companion-powered-by-gemma-4-46l3</link>
      <guid>https://dev.to/santhosh2312/memoria-a-local-ai-reading-companion-powered-by-gemma-4-46l3</guid>
      <description>&lt;h1&gt;
  
  
  Memoria — A Local AI Reading Companion Powered by Gemma 4
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Build with Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;Reading long books can be difficult even for people who love reading.&lt;/p&gt;

&lt;p&gt;Readers forget characters, lose track of earlier events, struggle with dense prose, or return to a book after a break and feel disconnected from the story. For readers with ADHD, memory difficulties, cognitive fatigue, or accessibility needs, this becomes even harder.&lt;/p&gt;

&lt;p&gt;Memoria is a local AI reading companion powered by Gemma 4 that helps readers stay connected to books through spoiler-safe recaps, contextual Q&amp;amp;A, character memory, speaker attribution, and text simplification — all while running locally on the user’s machine.&lt;/p&gt;

&lt;p&gt;The app combines an EPUB reader with AI-powered reading support features including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spoiler-safe chapter recaps&lt;/li&gt;
&lt;li&gt;Character memory tracking&lt;/li&gt;
&lt;li&gt;Speaker attribution for dialogue&lt;/li&gt;
&lt;li&gt;Contextual book Q&amp;amp;A&lt;/li&gt;
&lt;li&gt;Passage explanations&lt;/li&gt;
&lt;li&gt;Text simplification for difficult prose&lt;/li&gt;
&lt;li&gt;Retrieval-based memory of earlier chapters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything runs locally using Gemma 4 through llama.cpp, so readers do not need a paid AI subscription or constant internet access.&lt;/p&gt;




&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/54wqOLpItHk"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  Features shown in the demo
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Uploading and processing EPUB books&lt;/li&gt;
&lt;li&gt;AI-generated chapter recaps&lt;/li&gt;
&lt;li&gt;Character tracking across chapters&lt;/li&gt;
&lt;li&gt;Context-aware Q&amp;amp;A&lt;/li&gt;
&lt;li&gt;Highlight-to-explain workflow&lt;/li&gt;
&lt;li&gt;Text simplification for difficult passages&lt;/li&gt;
&lt;li&gt;Spoiler-safe retrieval limited to completed chapters&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;GitHub Repository: &lt;a href="https://github.com/Santhoshl2312/Gemma_book_reader" rel="noopener noreferrer"&gt;https://github.com/Santhoshl2312/Gemma_book_reader&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Main technologies used
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Gemma 4 E2B&lt;/li&gt;
&lt;li&gt;llama.cpp&lt;/li&gt;
&lt;li&gt;FastAPI&lt;/li&gt;
&lt;li&gt;SQLite&lt;/li&gt;
&lt;li&gt;ChromaDB&lt;/li&gt;
&lt;li&gt;Vanilla JavaScript&lt;/li&gt;
&lt;li&gt;HTML/CSS&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  How I Used Gemma 4
&lt;/h2&gt;

&lt;p&gt;Memoria uses Gemma 4 as the core local reasoning engine for the entire reading experience.&lt;/p&gt;

&lt;p&gt;I used the Gemma 4 E2B model through a local llama.cpp OpenAI-compatible server, allowing the application to run fully offline without relying on cloud APIs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Gemma 4 E2B?
&lt;/h3&gt;

&lt;p&gt;I specifically chose Gemma 4 E2B because it was the best fit for a responsive local reading assistant.&lt;/p&gt;

&lt;p&gt;The project needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fast inference speeds&lt;/li&gt;
&lt;li&gt;Low VRAM usage&lt;/li&gt;
&lt;li&gt;Good reasoning quality&lt;/li&gt;
&lt;li&gt;Reliable structured outputs&lt;/li&gt;
&lt;li&gt;Practical local deployment on consumer hardware&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Gemma 4 E2B delivered the right balance between speed and capability, making it possible to provide near real-time responses for recaps, contextual Q&amp;amp;A, text simplification, and chapter processing while still running locally through llama.cpp.&lt;/p&gt;

&lt;p&gt;This was especially important because the app performs many smaller AI tasks continuously in the background while the user reads.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Gemma 4 Powers
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Spoiler-Safe Recaps
&lt;/h4&gt;

&lt;p&gt;Gemma summarizes chapter chunks into structured summaries and key events that help readers quickly reconnect with the story.&lt;/p&gt;

&lt;h4&gt;
  
  
  Character Memory
&lt;/h4&gt;

&lt;p&gt;The model updates persistent character descriptions and remembers important events tied to each character across chapters.&lt;/p&gt;

&lt;h4&gt;
  
  
  Speaker Attribution
&lt;/h4&gt;

&lt;p&gt;Gemma helps identify ambiguous dialogue speakers when rule-based systems fail.&lt;/p&gt;

&lt;h4&gt;
  
  
  Contextual Q&amp;amp;A
&lt;/h4&gt;

&lt;p&gt;Readers can ask questions about the story, and Gemma answers using chapter-aware retrieval that avoids future spoilers.&lt;/p&gt;

&lt;h4&gt;
  
  
  Text Simplification
&lt;/h4&gt;

&lt;p&gt;Selected passages can be rewritten into clearer modern English while preserving meaning and tone.&lt;/p&gt;




&lt;h2&gt;
  
  
  Technical Architecture
&lt;/h2&gt;

&lt;p&gt;The frontend is a lightweight EPUB reader built with vanilla HTML, CSS, and JavaScript. It handles book uploads, chapter navigation, reading controls, themes, typography settings, and the AI interaction panel.&lt;/p&gt;

&lt;p&gt;The backend is built with FastAPI and SQLite. It manages books, chapters, summaries, embeddings, character memory, retrieval, and streaming responses.&lt;/p&gt;

&lt;p&gt;The AI stack runs fully locally using llama.cpp:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gemma 4 E2B runs as the local chat and reasoning model&lt;/li&gt;
&lt;li&gt;Nomic embeddings power semantic retrieval&lt;/li&gt;
&lt;li&gt;ChromaDB stores vector embeddings per book&lt;/li&gt;
&lt;li&gt;Background processing pipelines analyze chapters incrementally&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The app processes books chapter-by-chapter instead of trying to load entire novels into context at once. Intermediate artifacts like summaries, character memory, embeddings, and speaker metadata are stored and reused throughout the reading experience.&lt;/p&gt;

&lt;p&gt;This pipeline-first design makes the system faster, more grounded, and more practical for long-form reading.&lt;/p&gt;




&lt;h2&gt;
  
  
  Spoiler-Safe Retrieval
&lt;/h2&gt;

&lt;p&gt;One of the biggest design goals was preventing accidental spoilers.&lt;/p&gt;

&lt;p&gt;When a reader asks a question, Memoria retrieves only information from chapters the user has already completed. The retrieval system filters vector search results using reading progress before sending context to Gemma 4.&lt;/p&gt;

&lt;p&gt;This allows the app to help readers remember earlier story details without revealing future events.&lt;/p&gt;




&lt;h2&gt;
  
  
  Challenges
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Handling Long Books
&lt;/h3&gt;

&lt;p&gt;Full novels are too large to send directly into a local model context window. I solved this by chunking chapters into smaller sections while carrying forward rolling summaries and character memory.&lt;/p&gt;

&lt;h3&gt;
  
  
  Structured Output Reliability
&lt;/h3&gt;

&lt;p&gt;Local models sometimes wrap JSON outputs in extra formatting or explanations. To make the pipeline reliable, prompts were heavily constrained and the backend extracts valid JSON blocks safely before processing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Speaker Attribution
&lt;/h3&gt;

&lt;p&gt;Dialogue attribution in fiction is difficult because speakers are often implied instead of explicitly named. I used a hybrid approach where rules handle obvious cases while Gemma handles ambiguous dialogue using broader context.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fully Local Deployment
&lt;/h3&gt;

&lt;p&gt;The project depends on multiple services including Gemma 4, embedding models, Python environments, and vector databases. I automated the setup process using launcher scripts so the app can be started locally with minimal manual configuration.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Local AI Matters
&lt;/h2&gt;

&lt;p&gt;One of the main goals of this project was accessibility and digital equity.&lt;/p&gt;

&lt;p&gt;Readers should not need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;expensive subscriptions&lt;/li&gt;
&lt;li&gt;cloud AI services&lt;/li&gt;
&lt;li&gt;constant internet access&lt;/li&gt;
&lt;li&gt;external data collection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By combining Gemma 4 with llama.cpp and local retrieval, Memoria creates a fully local AI reading companion that respects reader privacy while remaining accessible on consumer hardware.&lt;/p&gt;

&lt;p&gt;This makes the project useful not only for individual readers, but also for classrooms, libraries, care settings, and offline learning environments.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Memoria demonstrates how Gemma 4 can power practical, privacy-friendly accessibility tools beyond chatbots.&lt;/p&gt;

&lt;p&gt;Instead of replacing reading, the goal is to support readers — helping them stay connected to stories, remember context, and reduce cognitive load while preserving the experience of reading itself.&lt;/p&gt;

&lt;p&gt;By combining Gemma 4 E2B, llama.cpp, retrieval, and structured processing pipelines, Memoria turns static EPUB books into adaptive reading experiences that can run entirely offline.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
    </item>
  </channel>
</rss>
