Vinay Guda

Posted on Mar 16

Building NousyBooks: How I Used 6 Gemini Models to Create Personalized Children's Storybooks

#geminiliveagentchallenge #geminiapi

Their face. Their story. Their book.

Every parent knows the magic of a bedtime story. But what if the hero looked exactly like your child?

As a parent of a 7-year-old daughter, I've spent years exploring children's books with her — from Usborne's lift-the-flap books to interactive ebook apps. One thing became clear: children light up most when the story feels like theirs. Not a generic hero — them.

Existing personalized book services cost $50-100+, take weeks to deliver, and use crude template swaps. The result feels manufactured, not magical.

Then Google released Nano Banana 2 (gemini-3.1-flash-image-preview) with character consistency — the ability to maintain a person's likeness across multiple generated images using reference photos. The idea clicked instantly. I could build a tool where a child's actual face appears on every page, in any art style, generated in minutes.

That's how NousyBooks was born.

Try it live: https://nousybooks-hackathon-218423701961.us-central1.run.app
Source code: https://github.com/vinayguda/nousybooks-hackathon

What NousyBooks Does

NousyBooks generates personalized children's storybooks end-to-end:

Upload photos of your child as character references
Talk to Nousy — a voice assistant that guides story creation through natural conversation
Choose a language — 14 languages supported, from English to Telugu to Japanese
Generate — Gemini writes the story, paints character-consistent illustrations, and narrates it as an audiobook
Export — Download a print-ready PDF, video with karaoke subtitles, or share via link

The entire flow — from photo upload to finished storybook — takes about 3-5 minutes.

[Screenshots: Book Creation, Reading Your Story, Your Library]

The Architecture: 6 Gemini Models, One Pipeline

The most interesting part of NousyBooks is how it orchestrates six Gemini capabilities, served by four distinct models, into a single seamless experience:

Phase	Model	What It Does
Voice Assistant	`gemini-2.5-flash-native-audio` (Live API)	"Nousy" — guides story creation with 12 function tools
Topic Generation	`gemini-2.5-flash`	Auto-generates story ideas from character details
Story Writing	`gemini-2.5-flash`	JSON schema-enforced narrative with per-page visual prompts
Illustration	`gemini-3.1-flash-image-preview`	Character-consistent image generation
Image Editing	`gemini-3.1-flash-image-preview`	Natural language illustration edits
Audiobook	`gemini-2.5-flash-preview-tts`	Narration with real-time word highlighting

All AI calls happen client-side via the @google/genai SDK — no backend AI server needed.
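For the story-writing row above, schema enforcement shapes the response on the model side, but a cheap client-side check can still catch truncated or off-shape output before any images are requested. A minimal sketch, assuming a hypothetical story shape (`title`, `pages`, `text`, and `visualPrompt` are illustrative names, not taken from the NousyBooks source):

```typescript
// Hypothetical shape of the structured story; field names are assumptions.
interface StoryPage {
  text: string;
  visualPrompt: string;
}

interface Story {
  title: string;
  pages: StoryPage[];
}

// Narrow an unknown value to a plain object with string keys.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Parse the model's JSON text and verify it matches the expected shape.
// Returns null instead of throwing so the caller can decide to retry.
function parseStory(jsonText: string, expectedPages: number): Story | null {
  let data: unknown;
  try {
    data = JSON.parse(jsonText);
  } catch {
    return null; // not valid JSON at all
  }
  if (!isRecord(data)) return null;

  const title = data.title;
  const rawPages = data.pages;
  if (typeof title !== "string" || !Array.isArray(rawPages)) return null;

  const pages: StoryPage[] = [];
  for (const rawPage of rawPages) {
    if (!isRecord(rawPage)) return null;
    const text = rawPage.text;
    const visualPrompt = rawPage.visualPrompt;
    if (typeof text !== "string" || typeof visualPrompt !== "string") return null;
    pages.push({ text, visualPrompt });
  }
  // A page-count mismatch is treated as a failed generation.
  return pages.length === expectedPages ? { title, pages } : null;
}
```

Returning `null` rather than throwing keeps the retry decision with the caller, which fits a pipeline that already retries image requests.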

https://github.com/vinayguda/nousybooks-hackathon?tab=readme-ov-file#ai-generation-pipeline

The Hard Part: Character Consistency

The biggest technical challenge was making the child look the same on every page. Early attempts produced completely different characters — different clothing, hair color, even skin tone.

I solved this with what I call the Anchor Image Pattern:

Page 1 generates first, using only the child's reference photos
The remaining pages (2-4 in a four-page book) generate in parallel, each receiving the original references PLUS page 1's illustration as an "anchor"
The anchor image carries an explicit consistency instruction: "The character MUST have the same face, hair, skin tone, and clothing as shown in this anchor image"

This was the breakthrough. No fine-tuning. No LoRA. Just reference photos + an anchor image + explicit instructions. The result: your child looks like themselves on every page — in watercolor, 3D animation, anime, paper cutout, any style.
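The ordering described above (page 1 first, then the rest in parallel with the anchor attached) can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: `generatePage` is a hypothetical stand-in for the image-model call, and images are treated as opaque strings.

```typescript
// Images are opaque here (for example base64 strings); this alias is an assumption.
type EncodedImage = string;

// Hypothetical signature for one image-model call: a prompt plus reference images.
type GeneratePage = (prompt: string, references: EncodedImage[]) => Promise<EncodedImage>;

// Consistency instruction quoted from the post.
const ANCHOR_INSTRUCTION =
  "The character MUST have the same face, hair, skin tone, and clothing as shown in this anchor image";

async function illustrateStory(
  pagePrompts: string[],
  childPhotos: EncodedImage[],
  generatePage: GeneratePage,
): Promise<EncodedImage[]> {
  if (pagePrompts.length === 0) return [];

  // Step 1: page 1 sees only the child's reference photos.
  const anchor = await generatePage(pagePrompts[0], childPhotos);

  // Step 2: the remaining pages run in parallel, each with the original
  // references plus the page 1 illustration and the explicit instruction.
  const rest = await Promise.all(
    pagePrompts.slice(1).map((prompt) =>
      generatePage(`${prompt}\n${ANCHOR_INSTRUCTION}`, [...childPhotos, anchor]),
    ),
  );
  return [anchor, ...rest];
}
```

The cost of this design is that page 1 sits on the critical path: nothing else can start until the anchor exists.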

This pattern could be useful beyond storybooks for any sequential AI illustration task.

https://github.com/vinayguda/nousybooks-hackathon?tab=readme-ov-file#character-consistency-pattern

Building the Voice Assistant

Nousy is powered by the Gemini Live API with native audio: bidirectional streaming over a WebSocket, carrying raw PCM audio (16 kHz input, 24 kHz output).
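Browser audio APIs typically deliver 32-bit float samples, while raw PCM streams are commonly 16-bit integers, so a conversion step usually sits on both sides of the socket. A minimal sketch of that conversion, assuming 16-bit samples (the post states only the sample rates); the function names are illustrative:

```typescript
// Convert float samples in the range -1..1 to signed 16-bit PCM for upload.
function floatTo16BitPcm(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot wrap around.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // The negative range reaches -32768, the positive range stops at 32767.
    pcm[i] = clamped < 0 ? Math.round(clamped * 0x8000) : Math.round(clamped * 0x7fff);
  }
  return pcm;
}

// Inverse step for playback: 16-bit PCM back to floats.
function pcm16ToFloat(pcm: Int16Array): Float32Array {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    out[i] = pcm[i] / 0x8000;
  }
  return out;
}
```

Resampling between the device rate and the 16 kHz / 24 kHz stream rates is a separate step that this sketch leaves out.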

The voice assistant has 12 function-calling tools that control the entire app:

addCharacter / removeCharacter — manage story characters
selectArtStyle — choose from 8 illustration styles
setStoryTopic — set the narrative theme
setStoryLanguage — switch between 14 languages
setPageCount — customize story length (2-26 pages)
startGeneration — trigger the full pipeline
editIllustration / editPageText — refine the result
And more...

Say "Add a character named Shreya, she's a 7-year-old who loves butterflies" and Nousy creates the character. Say "Make it in Hindi" and it switches the language. When you're ready, just say "Let's generate the story" and the entire pipeline kicks off.
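On the app side, each function call from the model has to be routed to a state change. A rough sketch of such a dispatcher follows; the tool names come from the list above, while the argument names (`name`, `language`, `count`) and the `StoryDraft` shape are assumptions made for illustration.

```typescript
// App state touched by the voice tools; the field names are assumptions.
interface StoryDraft {
  characters: string[];
  language: string;
  pageCount: number;
}

// A function call as delivered by the model: a tool name plus loosely typed args.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Handlers return a new draft instead of mutating, which suits React state.
type ToolHandler = (draft: StoryDraft, args: Record<string, unknown>) => StoryDraft;

const addCharacter: ToolHandler = (draft, args) => {
  const characterName = args.name;
  return typeof characterName === "string"
    ? { ...draft, characters: [...draft.characters, characterName] }
    : draft;
};

const setStoryLanguage: ToolHandler = (draft, args) => {
  const language = args.language;
  return typeof language === "string" ? { ...draft, language } : draft;
};

const setPageCount: ToolHandler = (draft, args) => {
  const count = Number(args.count);
  // The post states that stories run from 2 to 26 pages.
  return Number.isInteger(count) && count >= 2 && count <= 26
    ? { ...draft, pageCount: count }
    : draft;
};

const toolHandlers: Record<string, ToolHandler> = { addCharacter, setStoryLanguage, setPageCount };

function applyToolCall(draft: StoryDraft, call: ToolCall): { draft: StoryDraft; handled: boolean } {
  // The own-property check keeps names like "toString" from hitting inherited members.
  if (!Object.prototype.hasOwnProperty.call(toolHandlers, call.name)) {
    return { draft, handled: false };
  }
  return { draft: toolHandlers[call.name](draft, call.args), handled: true };
}
```

Validating arguments inside each handler matters here because the values originate from speech, so an out-of-range page count or a missing name is a normal case rather than an exceptional one.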

The trickiest part was handling a race condition: the WebSocket could disconnect during connection setup, leaving the UI in a stale "listening" state. I solved this with a sessionRef guard pattern that checks the connection is still active before updating state.
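One possible shape of such a guard is sketched below. It is an assumption about how the pattern might look, not the project's code: `connect`, `setStatus`, and the attempt counter are hypothetical, and `Ref` stands in for the object React's `useRef` returns.

```typescript
// Minimal stand-in for the object returned by React's useRef.
interface Ref<T> {
  current: T;
}

type UiStatus = "idle" | "connecting" | "listening";

interface LiveSession {
  close(): void;
}

let nextAttemptId = 0;

// connect is a hypothetical function that opens the live session.
async function startListening(
  sessionRef: Ref<number | null>,
  connect: () => Promise<LiveSession>,
  setStatus: (status: UiStatus) => void,
): Promise<void> {
  const attemptId = ++nextAttemptId; // identity of this connection attempt
  sessionRef.current = attemptId;
  setStatus("connecting");

  const session = await connect();

  // Guard: if the user stopped, or a newer attempt started, while this one
  // was connecting, it is stale and must not touch UI state.
  if (sessionRef.current !== attemptId) {
    session.close();
    return;
  }
  setStatus("listening");
}

function stopListening(sessionRef: Ref<number | null>, setStatus: (status: UiStatus) => void): void {
  sessionRef.current = null; // invalidates any in-flight attempt
  setStatus("idle");
}
```

A ref works better than component state for this check because its current value is readable inside the async callback without waiting for a re-render.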

Multi-Language: 14 Languages, One Codebase

Stories can be generated and narrated in 14 languages: English, Spanish, French, Hindi, Mandarin, Japanese, Korean, Arabic, Portuguese, German, Italian, Russian, Telugu, and Tamil.

The TTS narration auto-detects the story's language for native pronunciation. Word-level highlighting works seamlessly across all scripts, including Indic languages like Telugu and Tamil.

I had incredible fun generating storybooks in my native language, Telugu. Hearing my daughter's story narrated in Telugu with proper pronunciation was a genuinely emotional moment.

How I Built It

The Stack

Frontend: React 19 + TypeScript 5.8 + Vite 6 + Tailwind CSS 4
AI: Gemini API via @google/genai SDK (all client-side)
Backend: Express server (static files + runtime config injection)
Auth & Storage: Supabase (auth, database, file storage)
Deployment: Google Cloud Run + GitHub Actions CI/CD
Prototyping: Google AI Studio + Antigravity (code export)
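The runtime config injection in the backend line could look roughly like this: the server rewrites the built `index.html` on each request so that configuration lands on `window.__CONFIG__` instead of inside the bundle. A minimal sketch as a pure function, with the Express wiring left out; the `RuntimeConfig` field names are assumptions.

```typescript
// Values exposed to the browser; the field names are illustrative assumptions.
interface RuntimeConfig {
  geminiApiKey: string;
  supabaseUrl: string;
}

// Serialize config for embedding inside a script tag. Escaping "<" keeps a
// value containing a closing script tag from ending the element early.
function serializeConfig(config: RuntimeConfig): string {
  return JSON.stringify(config).replace(/</g, "\\u003c");
}

// Insert the config script just before </head> in the built index.html.
function injectConfig(indexHtml: string, config: RuntimeConfig): string {
  const script = `<script>window.__CONFIG__ = ${serializeConfig(config)};</script>`;
  // A replacer function avoids special handling of "$" in replacement strings.
  return indexHtml.replace("</head>", () => `${script}</head>`);
}
```

In an Express handler this would amount to reading the built `index.html` once and sending `injectConfig(html, config)` for page requests. Values injected this way are still readable in the browser, so restricting the key on the provider side remains worthwhile.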

The Process

I started in Google AI Studio to prototype — testing story generation prompts, image generation with reference photos, and TTS. AI Studio let me validate the concept before writing code.

Once it worked, I exported with Antigravity and built the full application using Claude Code as a development partner. The entire development was AI-assisted — what people call "vibe coding." The key insight: AI assistance works best when you have a clear vision of WHAT to build and let the AI help with HOW.

Key Technical Decisions

JSON Schema Enforcement — Story generation uses responseMimeType: "application/json" with a strict schema, guaranteeing valid structured output every time
Supabase Storage — Images stored as files, not base64, reducing story load from ~14MB to ~2KB
Runtime Config Injection — API keys injected via window.__CONFIG__ at request time, never baked into the JS bundle
Exponential Backoff — All image generation wrapped in retry logic for production resilience
Pipelined TTS — Next page's audio pre-generates while current page plays, eliminating delays
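The pipelined TTS item above is essentially a one-page lookahead. A minimal sketch, assuming hypothetical `synthesize` and `play` functions that stand in for the TTS request and audio playback:

```typescript
// synthesize and play are hypothetical stand-ins for the TTS call and playback.
async function narratePages<Clip>(
  pages: string[],
  synthesize: (text: string) => Promise<Clip>,
  play: (clip: Clip) => Promise<void>,
): Promise<void> {
  if (pages.length === 0) return;

  // Request audio for the first page up front.
  let next: Promise<Clip> = synthesize(pages[0]);

  for (let i = 0; i < pages.length; i++) {
    const current = await next;

    // Start the following page before playback begins, so synthesis
    // overlaps with listening time instead of adding to it.
    if (i + 1 < pages.length) {
      next = synthesize(pages[i + 1]);
      // Mark the promise as handled; a failure still surfaces at the await above.
      next.catch(() => {});
    }
    await play(current);
  }
}
```

With this shape, only the first page pays the full synthesis latency, provided each clip plays for longer than the next one takes to generate.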

Challenges

Model Deprecation Mid-Build — gemini-2.0-flash got deprecated and started returning 404 errors. Had to quickly migrate to gemini-2.5-flash. Lesson: build with model flexibility in mind.
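One way to act on that lesson is to name every model in a single place and allow overrides at runtime. The sketch below is illustrative: the model names are the ones listed in this post, while the task keys and the `resolveModel` helper are assumptions.

```typescript
// One place that names every model, keyed by the task it serves.
const DEFAULT_MODELS = {
  story: "gemini-2.5-flash",
  illustration: "gemini-3.1-flash-image-preview",
  narration: "gemini-2.5-flash-preview-tts",
} as const;

type ModelTask = keyof typeof DEFAULT_MODELS;

// Overrides could come from runtime config, so a deprecated model can be
// swapped without rebuilding the bundle. Blank overrides are ignored.
function resolveModel(task: ModelTask, overrides: Partial<Record<ModelTask, string>> = {}): string {
  const override = overrides[task]?.trim();
  return override ? override : DEFAULT_MODELS[task];
}
```

Call sites then ask for `resolveModel("story")` rather than hard-coding a model string, which turns a deprecation into a one-line config change.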

Rate Limits — Parallel generation of 4 illustrations often hit rate limits. Tuning retry delays to balance speed vs. reliability took iteration.
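A generic retry wrapper with exponential backoff might look like the sketch below. It is not the project's implementation: the option names are assumptions, the random jitter is an addition of this sketch, and a real version would likely retry only on rate-limit or transient errors rather than on every failure.

```typescript
interface RetryOptions {
  maxAttempts: number; // total tries, including the first
  baseDelayMs: number; // delay before the second try
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function withBackoff<T>(task: () => Promise<T>, options: RetryOptions): Promise<T> {
  const sleep = options.sleep ?? defaultSleep;
  const attempts = Math.max(1, options.maxAttempts);
  let lastError: unknown;

  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt === attempts - 1) break; // no sleep after the final try
      // The delay doubles on each attempt; up to 25% random jitter keeps
      // parallel requests from retrying in lockstep.
      const delay = options.baseDelayMs * 2 ** attempt;
      await sleep(delay + Math.random() * delay * 0.25);
    }
  }
  throw lastError;
}
```

Jitter is relevant when several illustrations are requested in parallel: without it, requests that failed together would all retry at the same instant and could hit the limit again.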

First Cloud Deployment — This was my first time deploying anything to Google Cloud. Learning Docker multi-stage builds, Cloud Run, and CI/CD was a significant learning curve — but now I have a fully automated pipeline that deploys on every push to main.

My Daughter's Reaction

The moment that made this project worth it: when I generated a story starring my daughter, she was thrilled. She immediately wanted more. She asked to try different art styles. She wanted stories in Telugu so she could share them with her grandparents.

Seeing her flip through a PDF storybook where she was the hero — that's the magic of NousyBooks.

What's Next

Age-appropriate modes — Adjust vocabulary and complexity for toddlers vs. early readers
Story templates — Pre-built structures for common themes
Print-on-demand — Ship physical hardcover books
Collaborative storytelling — Parent + child brainstorm together with the voice assistant
Object references — Upload photos of toys, pets, and places to appear in illustrations

Try It

Live App: https://nousybooks-hackathon-218423701961.us-central1.run.app
GitHub: https://github.com/vinayguda/nousybooks-hackathon

Built for the Gemini Live Agent Challenge. Built for my daughter. Built for every parent who wants to give their child a story where they're the hero.

Built with Google AI Studio, Antigravity, Gemini API, Google Cloud Run, and Claude Code.


DEV Community