DEV Community

Vinay Guda

Building NousyBooks: How I Used 6 Gemini Models to Create Personalized Children's Storybooks

Their face. Their story. Their book.


Every parent knows the magic of a bedtime story. But what if the hero looked exactly like your child?

As a parent of a 7-year-old daughter, I've spent years exploring children's books with her — from Usborne's lift-the-flap books to interactive ebook apps. One thing became clear: children light up most when the story feels like theirs. Not a generic hero — them.

Existing personalized book services cost $50-100+, take weeks to deliver, and use crude template swaps. The result feels manufactured, not magical.

Then Google released Nano Banana 2 (gemini-3.1-flash-image-preview) with character consistency — the ability to maintain a person's likeness across multiple generated images using reference photos. The idea clicked instantly. I could build a tool where a child's actual face appears on every page, in any art style, generated in minutes.

That's how NousyBooks was born.

Try it live: https://nousybooks-hackathon-218423701961.us-central1.run.app
Source code: https://github.com/vinayguda/nousybooks-hackathon

NousyBooks Landing Page


What NousyBooks Does

NousyBooks generates personalized children's storybooks end-to-end:

  1. Upload photos of your child as character references
  2. Talk to Nousy — a voice assistant that guides story creation through natural conversation
  3. Choose a language — 14 languages supported, from English to Telugu to Japanese
  4. Generate — Gemini writes the story, paints character-consistent illustrations, and narrates it as an audiobook
  5. Export — Download a print-ready PDF, video with karaoke subtitles, or share via link

The entire flow — from photo upload to finished storybook — takes about 3-5 minutes.

Screenshots:

  • Landing page: features overview, how it works, art styles
  • Book creation: story creation dashboard (characters, style, and topic ready), generation in progress
  • Reading your story: reading view (page 1), audio narration with word highlighting
  • Your library: story library grid


The Architecture: 6 Gemini Models, One Pipeline

The most interesting part of NousyBooks is how it orchestrates six different Gemini model capabilities into a single seamless experience:

Phase            | Model                                    | What It Does
Voice Assistant  | gemini-2.5-flash-native-audio (Live API) | "Nousy": guides story creation with 12 function tools
Topic Generation | gemini-2.5-flash                         | Auto-generates story ideas from character details
Story Writing    | gemini-2.5-flash                         | JSON schema-enforced narrative with per-page visual prompts
Illustration     | gemini-3.1-flash-image-preview           | Character-consistent image generation
Image Editing    | gemini-3.1-flash-image-preview           | Natural language illustration edits
Audiobook        | gemini-2.5-flash-preview-tts             | Narration with real-time word highlighting

All AI calls happen client-side via the @google/genai SDK — no backend AI server needed.

https://github.com/vinayguda/nousybooks-hackathon?tab=readme-ov-file#ai-generation-pipeline
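The story-writing phase relies on schema-enforced JSON, so here is a minimal sketch of what such a configuration and a defensive parser might look like. The field names (title, pages, text, visualPrompt) are my assumptions for illustration, not the project's actual schema, and the uppercase type names are written as plain strings that mirror the values of the SDK's Type enum.

```typescript
// Hypothetical story shape; the real project's field names may differ.
interface StoryPage {
  text: string;         // narrative text shown on the page
  visualPrompt: string; // per-page prompt handed to the illustration model
}

interface Story {
  title: string;
  pages: StoryPage[];
}

// Generation config in the shape the post describes: a JSON MIME type plus a schema.
const storyGenerationConfig = {
  responseMimeType: "application/json",
  responseSchema: {
    type: "OBJECT",
    properties: {
      title: { type: "STRING" },
      pages: {
        type: "ARRAY",
        items: {
          type: "OBJECT",
          properties: {
            text: { type: "STRING" },
            visualPrompt: { type: "STRING" },
          },
          required: ["text", "visualPrompt"],
        },
      },
    },
    required: ["title", "pages"],
  },
};

// Defensive parse: check the payload before the rest of the app trusts it.
function parseStory(raw: string): Story {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("Story response is not a JSON object");
  }
  const { title, pages } = data as { title?: unknown; pages?: unknown };
  if (typeof title !== "string" || !Array.isArray(pages)) {
    throw new Error("Story response is missing title or pages");
  }
  const parsedPages = pages.map((page: unknown, index: number): StoryPage => {
    const p = page as { text?: unknown; visualPrompt?: unknown } | null;
    if (!p || typeof p.text !== "string" || typeof p.visualPrompt !== "string") {
      throw new Error(`Page ${index + 1} is malformed`);
    }
    return { text: p.text, visualPrompt: p.visualPrompt };
  });
  return { title, pages: parsedPages };
}
```

Keeping the parser separate from the request makes it easy to surface a clear error if a response is ever truncated or a field is missing.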


The Hard Part: Character Consistency

The biggest technical challenge was making the child look the same on every page. Early attempts produced completely different characters — different clothing, hair color, even skin tone.

I solved this with what I call the Anchor Image Pattern:

  1. Page 1 generates first, using only the child's reference photos
  2. Pages 2-4 generate in parallel, each receiving the original references PLUS page 1's illustration as an "anchor"
  3. The anchor image carries an explicit consistency instruction: "The character MUST have the same face, hair, skin tone, and clothing as shown in this anchor image"

This was the breakthrough. No fine-tuning. No LoRA. Just reference photos + an anchor image + explicit instructions. The result: your child looks like themselves on every page — in watercolor, 3D animation, anime, paper cutout, any style.

This pattern could be useful beyond storybooks for any sequential AI illustration task.
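The ordering in the Anchor Image Pattern can be sketched in a few lines. This is an illustration under assumptions, not the project's code: `generate` is a hypothetical stand-in for whatever call produces an illustration, and `ImageRef` is a placeholder for however images are represented. Only the consistency instruction text comes from the steps above.

```typescript
// All names here are illustrative; `generate` stands in for the real image call.
type ImageRef = string; // e.g. a base64 payload or a storage URL

interface PageRequest {
  prompt: string;
  references: ImageRef[]; // the child's reference photos
  anchor?: ImageRef;      // page 1's illustration, absent for page 1 itself
  instruction?: string;
}

type GeneratePage = (request: PageRequest) => Promise<ImageRef>;

const ANCHOR_INSTRUCTION =
  "The character MUST have the same face, hair, skin tone, and clothing as shown in this anchor image";

async function illustrateStory(
  prompts: string[],
  references: ImageRef[],
  generate: GeneratePage,
): Promise<ImageRef[]> {
  const firstPrompt = prompts[0];
  if (firstPrompt === undefined) return [];

  // Step 1: page 1 is generated alone, from the reference photos only.
  const anchor = await generate({ prompt: firstPrompt, references });

  // Step 2: the remaining pages run in parallel, each seeing references plus the anchor.
  const rest = await Promise.all(
    prompts.slice(1).map((prompt) =>
      generate({ prompt, references, anchor, instruction: ANCHOR_INSTRUCTION }),
    ),
  );

  // Promise.all preserves input order, so pages come back in reading order.
  return [anchor, ...rest];
}
```

The cost of this design is that page 1 sits on the critical path: nothing else can start until the anchor exists.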

https://github.com/vinayguda/nousybooks-hackathon?tab=readme-ov-file#character-consistency-pattern

Character Consistency Across Pages

Building the Voice Assistant

Nousy is powered by the Gemini Live API with native audio — bidirectional streaming over WebSocket with raw PCM audio (16kHz input, 24kHz output).
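Browser audio APIs hand you floating-point samples, so raw PCM streaming needs a conversion step in each direction. Here is a rough sketch of that step, assuming 16-bit signed samples (the sample rates are from the setup above; the bit depth and all function names are my assumptions).

```typescript
const INPUT_SAMPLE_RATE = 16000;  // microphone audio sent to the model
const OUTPUT_SAMPLE_RATE = 24000; // audio returned by the model

// Convert Web Audio style floats in [-1, 1] to 16-bit signed PCM.
function floatToPcm16(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i] ?? 0));
    // The negative range is one step wider than the positive range in two's complement.
    pcm[i] = clamped < 0 ? Math.round(clamped * 0x8000) : Math.round(clamped * 0x7fff);
  }
  return pcm;
}

// Convert 16-bit signed PCM back to floats for playback.
function pcm16ToFloat(pcm: Int16Array): Float32Array {
  const samples = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    const value = pcm[i] ?? 0;
    samples[i] = value < 0 ? value / 0x8000 : value / 0x7fff;
  }
  return samples;
}

// Duration of a chunk in milliseconds, useful when scheduling playback.
function chunkDurationMs(sampleCount: number, sampleRate: number): number {
  return (sampleCount / sampleRate) * 1000;
}
```

Because input and output use different rates, the same number of samples represents different durations depending on direction, which is why the rate is passed explicitly.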

The voice assistant has 12 function calling tools that control the entire app:

  • addCharacter / removeCharacter — manage story characters
  • selectArtStyle — choose from 8 illustration styles
  • setStoryTopic — set the narrative theme
  • setStoryLanguage — switch between 14 languages
  • setPageCount — customize story length (2-26 pages)
  • startGeneration — trigger the full pipeline
  • editIllustration / editPageText — refine the result
  • And more...

Say "Add a character named Shreya, she's a 7-year-old who loves butterflies" and Nousy creates the character. Say "Make it in Hindi" and it switches the language. When you're ready, just say "Let's generate the story" and the entire pipeline kicks off.
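On the app side, function calling boils down to routing a tool name and its arguments to a handler. Here is a minimal sketch with three of the tools; the tool names are the real ones, but the state shape, argument names, and return strings are all hypothetical.

```typescript
// Hypothetical app state; only the tool names come from the list above.
interface StoryDraft {
  characters: string[];
  language: string;
  pageCount: number;
}

interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

type ToolHandler = (draft: StoryDraft, args: Record<string, unknown>) => string;

const toolHandlers = new Map<string, ToolHandler>([
  ["addCharacter", (draft, args) => {
    const characterName = String(args["name"] ?? "");
    if (!characterName) return "error: name is required";
    draft.characters.push(characterName);
    return `added ${characterName}`;
  }],
  ["setStoryLanguage", (draft, args) => {
    draft.language = String(args["language"] ?? draft.language);
    return `language set to ${draft.language}`;
  }],
  ["setPageCount", (draft, args) => {
    const requested = Number(args["count"]);
    // The 2-26 range matches the limits listed for setPageCount.
    if (!Number.isInteger(requested) || requested < 2 || requested > 26) {
      return "error: page count must be between 2 and 26";
    }
    draft.pageCount = requested;
    return `page count set to ${requested}`;
  }],
]);

// Route one tool call to its handler; the returned string would be sent back to the model.
function handleToolCall(draft: StoryDraft, call: ToolCall): string {
  const handler = toolHandlers.get(call.name);
  return handler ? handler(draft, call.args) : `error: unknown tool ${call.name}`;
}
```

Returning an error string instead of throwing lets the assistant hear what went wrong and ask the user to try again.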

The trickiest part was a race condition: the WebSocket could disconnect while the connection was still being set up, leaving the UI stuck in a stale "listening" state. I solved this with a sessionRef guard pattern that checks the session is still the active one before updating state.
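To make the guard idea concrete, here is a framework-free sketch. It is not the app's actual code: a plain object stands in for a React ref, and `LiveSession`, `VoiceConnection`, and `openSession` are hypothetical names.

```typescript
// Stand-ins for the real Live API session and UI state; names are hypothetical.
interface LiveSession {
  close(): void;
}
type AssistantStatus = "idle" | "connecting" | "listening";

class VoiceConnection {
  status: AssistantStatus = "idle";
  // Mirrors a React ref: holds a token for the connection attempt that is currently valid.
  private sessionRef: { current: symbol | null } = { current: null };
  private session: LiveSession | null = null;
  private openSession: () => Promise<LiveSession>;

  constructor(openSession: () => Promise<LiveSession>) {
    this.openSession = openSession;
  }

  async connect(): Promise<void> {
    const attempt = Symbol("attempt");
    this.sessionRef.current = attempt;
    this.status = "connecting";
    let session: LiveSession;
    try {
      session = await this.openSession();
    } catch (error) {
      // Only reset state if this attempt is still the current one.
      if (this.sessionRef.current === attempt) this.disconnect();
      throw error;
    }
    // Guard: if disconnect() or a newer connect() ran while we awaited, drop this result.
    if (this.sessionRef.current !== attempt) {
      session.close();
      return;
    }
    this.session = session;
    this.status = "listening";
  }

  disconnect(): void {
    this.sessionRef.current = null; // invalidates any in-flight connect()
    this.session?.close();
    this.session = null;
    this.status = "idle";
  }
}
```

The important detail is that the check happens after the await, since that is the only point where the world could have changed underneath the pending connection.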


Multi-Language: 14 Languages, One Codebase

Stories can be generated and narrated in 14 languages: English, Spanish, French, Hindi, Mandarin, Japanese, Korean, Arabic, Portuguese, German, Italian, Russian, Telugu, and Tamil.

The TTS narration auto-detects the story's language for native pronunciation. Word-level highlighting works seamlessly across all scripts, including Indic languages like Telugu and Tamil.
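One simple way to drive word highlighting when the audio comes without word timestamps is to estimate them from the text. The sketch below is that approximation, not necessarily how NousyBooks does it: each word gets a share of the clip proportional to its length, and all names are hypothetical.

```typescript
interface WordTiming {
  word: string;
  startMs: number;
  endMs: number;
}

// Estimate word timings by giving each word a share of the clip proportional to its length.
function estimateWordTimings(text: string, durationMs: number): WordTiming[] {
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0);
  // Array.from counts code points, so characters outside the BMP are not double counted.
  const lengths = words.map((w) => Array.from(w).length);
  const total = lengths.reduce((sum, n) => sum + n, 0);
  let elapsed = 0;
  return words.map((word, i) => {
    const share = ((lengths[i] ?? 0) / total) * durationMs;
    const timing = { word, startMs: elapsed, endMs: elapsed + share };
    elapsed += share;
    return timing;
  });
}

// Index of the word to highlight at a playback position, or -1 if none matches.
function activeWordIndex(timings: WordTiming[], positionMs: number): number {
  return timings.findIndex((t) => positionMs >= t.startMs && positionMs < t.endMs);
}
```

Splitting on whitespace works for space-delimited scripts such as Telugu and Tamil; languages written without spaces, like Japanese, would need a proper word segmenter instead.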

I had incredible fun generating storybooks in my native language, Telugu. Hearing my daughter's story narrated in Telugu with proper pronunciation was a genuinely emotional moment.


How I Built It

The Stack

  • Frontend: React 19 + TypeScript 5.8 + Vite 6 + Tailwind CSS 4
  • AI: Gemini API via @google/genai SDK (all client-side)
  • Backend: Express server (static files + runtime config injection)
  • Auth & Storage: Supabase (auth, database, file storage)
  • Deployment: Google Cloud Run + GitHub Actions CI/CD
  • Prototyping: Google AI Studio + Antigravity (code export)
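Since the backend's main job beyond static files is runtime config injection, here is a small sketch of how that step could work as a pure function. The config fields and function names are hypothetical; only window.__CONFIG__ comes from the app.

```typescript
// Hypothetical config shape; only the window.__CONFIG__ name comes from the app.
interface RuntimeConfig {
  geminiApiKey: string;
  supabaseUrl: string;
}

// Serialize config so a value cannot terminate the surrounding script tag early.
function serializeConfig(config: RuntimeConfig): string {
  return JSON.stringify(config).replace(/</g, "\\u003c");
}

// Insert the config script just before </head> in the built index.html.
function injectRuntimeConfig(html: string, config: RuntimeConfig): string {
  const script = `<script>window.__CONFIG__ = ${serializeConfig(config)};</script>`;
  // A replacer function avoids special handling of "$" sequences in the replacement text.
  return html.includes("</head>")
    ? html.replace("</head>", () => `${script}</head>`)
    : script + html;
}
```

An Express route could read the built index.html once at startup and respond with `injectRuntimeConfig(html, config)` built from environment variables, so the same image runs in any environment without a rebuild.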

The Process

I started in Google AI Studio to prototype — testing story generation prompts, image generation with reference photos, and TTS. AI Studio let me validate the concept before writing code.

Once it worked, I exported with Antigravity and built the full application using Claude Code as a development partner. The entire development was AI-assisted — what people call "vibe coding." The key insight: AI assistance works best when you have a clear vision of WHAT to build and let the AI help with HOW.

Key Technical Decisions

  • JSON Schema Enforcement: Story generation sets responseMimeType: "application/json" together with a strict response schema, so the model returns structured output that parses straight into the story data
  • Supabase Storage: Images are stored as files, not base64, reducing the story payload from ~14MB to ~2KB
  • Runtime Config Injection: API keys are injected via window.__CONFIG__ at request time, never baked into the JS bundle
  • Exponential Backoff: All image generation is wrapped in retry logic for production resilience
  • Pipelined TTS: The next page's audio pre-generates while the current page plays, eliminating delays between pages
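The pipelined TTS idea is easy to show in isolation. In this sketch, `synthesize` and `play` are hypothetical stand-ins for the TTS request and audio playback; the point is only the ordering of the two.

```typescript
// `synthesize` and `play` are hypothetical stand-ins for the TTS call and audio playback.
async function narratePages<Clip>(
  pages: string[],
  synthesize: (text: string) => Promise<Clip>,
  play: (clip: Clip) => Promise<void>,
): Promise<void> {
  const first = pages[0];
  if (first === undefined) return;

  let pending: Promise<Clip> = synthesize(first);
  for (let i = 0; i < pages.length; i++) {
    const clip = await pending;
    const upcoming = pages[i + 1];
    if (upcoming !== undefined) {
      // Kick off the next page's audio before playback of this page starts.
      pending = synthesize(upcoming);
      // Mark the promise as handled; a failure still surfaces at the next `await pending`.
      pending.catch(() => undefined);
    }
    await play(clip);
  }
}
```

Only one page is ever prefetched, so memory use stays flat and a reader who stops early wastes at most one TTS request.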

Challenges

Model Deprecation Mid-Build: gemini-2.0-flash was deprecated and started returning 404 errors, so I had to migrate quickly to gemini-2.5-flash. Lesson: build with model flexibility in mind.
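One lightweight way to get that flexibility is to keep every model name in a single map that can be overridden at runtime. This is a sketch of the idea rather than the project's code; the phase keys and function name are hypothetical, while the model names are the ones from the pipeline table.

```typescript
type Phase = "voice" | "topic" | "story" | "illustration" | "imageEdit" | "audiobook";

// Defaults taken from the pipeline table; everything else here is illustrative.
const DEFAULT_MODELS: Record<Phase, string> = {
  voice: "gemini-2.5-flash-native-audio",
  topic: "gemini-2.5-flash",
  story: "gemini-2.5-flash",
  illustration: "gemini-3.1-flash-image-preview",
  imageEdit: "gemini-3.1-flash-image-preview",
  audiobook: "gemini-2.5-flash-preview-tts",
};

// Merge runtime overrides (for example from injected config) over the defaults.
function resolveModels(overrides: Partial<Record<Phase, string>> = {}): Record<Phase, string> {
  const resolved = { ...DEFAULT_MODELS };
  for (const phase of Object.keys(overrides) as Phase[]) {
    const model = overrides[phase];
    if (model) resolved[phase] = model; // ignore empty or undefined overrides
  }
  return resolved;
}
```

With this in place, swapping a deprecated model becomes a config change instead of a search through the codebase.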

Rate Limits — Parallel generation of 4 illustrations often hit rate limits. Tuning retry delays to balance speed vs. reliability took iteration.
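For reference, a retry wrapper along these lines might look like the sketch below. The option names and the jitter strategy are my choices for illustration, not the values NousyBooks uses; `wait` is injectable so the logic can be exercised without real delays.

```typescript
interface RetryOptions {
  maxAttempts: number; // total tries, including the first
  baseDelayMs: number; // upper bound for the delay before the first retry
  maxDelayMs: number;  // cap so delays do not grow without bound
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function withBackoff<T>(
  task: () => Promise<T>,
  options: RetryOptions,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < options.maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt === options.maxAttempts - 1) break; // out of attempts
      const exponential = options.baseDelayMs * 2 ** attempt;
      // Random jitter spreads out parallel retries so they do not collide again.
      const delay = Math.random() * Math.min(options.maxDelayMs, exponential);
      await wait(delay);
    }
  }
  throw lastError;
}
```

Jitter matters most in exactly this situation: several image requests fail at the same moment, and without it they would all retry at the same moment too.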

First Cloud Deployment: This was my first time deploying anything to Google Cloud. Docker multi-stage builds, Cloud Run, and CI/CD came with a steep learning curve, but I now have a fully automated pipeline that deploys on every push to main.


My Daughter's Reaction

The moment that made this project worth it: when I generated a story starring my daughter, she was thrilled. She immediately wanted more. She asked to try different art styles. She wanted stories in Telugu so she could share them with her grandparents.

Seeing her flip through a PDF storybook where she was the hero — that's the magic of NousyBooks.


What's Next

  • Age-appropriate modes — Adjust vocabulary and complexity for toddlers vs. early readers
  • Story templates — Pre-built structures for common themes
  • Print-on-demand — Ship physical hardcover books
  • Collaborative storytelling — Parent + child brainstorm together with the voice assistant
  • Object references — Upload photos of toys, pets, and places to appear in illustrations

Try It

Live App: https://nousybooks-hackathon-218423701961.us-central1.run.app
GitHub: https://github.com/vinayguda/nousybooks-hackathon

Built for the Gemini Live Agent Challenge. Built for my daughter. Built for every parent who wants to give their child a story where they're the hero.


Built with Google AI Studio, Antigravity, Gemini API, Google Cloud Run, and Claude Code.

By Vinay Guda
