DEV Community

Sylus Abel

How We Built Alicia Storybook with Gemini + Google Cloud

Alicia Storybook is a multimodal AI storytelling app that helps kids and first-time writers turn ideas into complete, illustrated stories.

This content was created specifically for the purpose of entering the Gemini Live Agent Challenge hackathon.

Why we built Alicia

Kids often have incredible imagination but struggle with the blank page. Most tools either feel too rigid or generate everything for the user, which removes creative ownership.

We wanted to build an experience where AI acts like a supportive co-creator:

  • helping users write page by page,
  • giving contextual feedback,
  • generating matching illustrations,
  • and keeping momentum through a full 12-page story arc.

What Alicia does

Alicia guides users through a complete storytelling workflow:

  1. Story setup (character, setting, idea)
  2. AI-assisted page writing
  3. Live coaching (text + voice)
  4. Illustration generation per page
  5. Completion of a 12-page storybook draft

The key product principle: AI should amplify creativity, not replace it.

How we built it

Frontend

  • Next.js + React + TypeScript
  • Tailwind/shadcn for UI

AI and Models

  • Gemini for story coaching and generation
  • Gemini image generation for page illustrations
  • Gemini Live for real-time conversational interaction

Google Cloud Services

  • Firebase Authentication (sign-in)
  • Firestore (story/project/user state)
  • Firebase Storage (generated images)
  • Firebase-backed infrastructure for secure app workflows

Product architecture (high level)

  • Client app handles onboarding, editor, creator flow, and live UX
  • API routes orchestrate Gemini requests for chat/image/live token flows
  • Firestore persists story state and progression
  • Storage persists generated visual assets
  • Auth protects user-level workflows and project ownership
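To make the persistence and ownership pieces concrete, here is a minimal sketch of what a story document and its progression helpers could look like. The field names (`ownerUid`, `pages`, `illustrationPath`) and both helper functions are hypothetical, chosen for illustration rather than taken from Alicia's actual schema, and the Firestore and Storage calls themselves are left out.

```typescript
// Hypothetical story model; names are assumptions, not Alicia's real schema.
const TOTAL_PAGES = 12;

interface StoryPage {
  pageNumber: number;          // 1-based page index
  text: string;                // what the user wrote for this page
  illustrationPath?: string;   // Storage path of the generated image, if any
}

interface StoryState {
  ownerUid: string;            // Auth uid of the user who owns the project
  title: string;
  character: string;
  setting: string;
  idea: string;
  pages: StoryPage[];
}

// Returns the first page that still needs text, or null when all 12 are written.
function nextPageToWrite(story: StoryState): number | null {
  for (let n = 1; n <= TOTAL_PAGES; n++) {
    const page = story.pages.find((p) => p.pageNumber === n);
    if (!page || page.text.trim() === "") return n;
  }
  return null;
}

// Ownership check: only the signed-in owner may edit the story.
function canEdit(story: StoryState, uid: string | null): boolean {
  return uid !== null && uid === story.ownerUid;
}
```

Keeping progression as a pure function of the stored document means the client and the API routes can both derive "which page is next" from the same data instead of tracking it separately.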

Challenges we faced

1) Balancing freedom with structure

We wanted users to feel unlimited creativity, but also needed enough guardrails so they could actually finish a story.

2) Prompt consistency across modes

Text coaching, illustration generation, and live interactions must feel coherent. We iterated heavily on prompt structure to keep tone and outcomes aligned.
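One way to keep the three modes aligned is to build every prompt from the same shared preamble and story context, and vary only the task line. The sketch below illustrates that idea; `buildPrompt`, `SHARED_TONE`, and the exact wording are assumptions made for this example, not the prompts Alicia ships with.

```typescript
// Hypothetical prompt builder; wording and names are illustrative only.
type Mode = "coach" | "illustrate" | "live";

interface StoryContext {
  character: string;
  setting: string;
  idea: string;
  pageNumber: number;
  pageText: string;
}

// One tone statement shared by every mode, so outputs feel like the same co-creator.
const SHARED_TONE =
  "You are a warm, encouraging co-creator for young writers. Never write the story for them.";

// Mode-specific instructions appended after the shared context.
const TASKS: Record<Mode, string> = {
  coach: "Give one short, specific suggestion to improve this page.",
  illustrate: "Describe a single illustration that matches this page.",
  live: "Respond conversationally and ask one question that moves the story forward.",
};

function buildPrompt(mode: Mode, ctx: StoryContext): string {
  const shared = [
    SHARED_TONE,
    `Character: ${ctx.character}`,
    `Setting: ${ctx.setting}`,
    `Idea: ${ctx.idea}`,
    `Page ${ctx.pageNumber}: ${ctx.pageText}`,
  ].join("\n");
  return `${shared}\n\nTask: ${TASKS[mode]}`;
}
```

Because the shared block lives in one place, a tone change only has to be made once and reaches text coaching, illustration, and live prompts together.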

3) Cost and access control

Generative features can be expensive if abused. We added a gated reviewer flow so premium AI-heavy routes are controlled while still enabling judges to test the full experience.
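As an illustration of what such a gate might look like, the sketch below decides whether a caller may hit a given route. The route paths, the `isReviewer` flag, and the `authorize` function are all hypothetical, and the example assumes the flag is derived on the server (for instance from a verified sign-in) rather than trusted from the client.

```typescript
// Hypothetical access gate; route names and the reviewer flag are assumptions.
interface Caller {
  uid: string | null;    // null when the request is not signed in
  isReviewer: boolean;   // assumed to be resolved server-side
}

interface Decision {
  allowed: boolean;
  status: 200 | 401 | 403;
  reason: string;
}

// Routes that trigger expensive generation and therefore need reviewer access.
const GATED_ROUTES: ReadonlySet<string> = new Set(["/api/image", "/api/live-token"]);

function authorize(route: string, caller: Caller): Decision {
  if (caller.uid === null) {
    return { allowed: false, status: 401, reason: "sign-in required" };
  }
  if (GATED_ROUTES.has(route) && !caller.isReviewer) {
    return { allowed: false, status: 403, reason: "reviewer access required" };
  }
  return { allowed: true, status: 200, reason: "ok" };
}
```

An API route would call a check like this before contacting any model, so a rejected request costs nothing beyond the lookup.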

4) Endings and pacing

Many users can start a story but struggle to land the ending. We tuned coaching to push toward a satisfying page-12 ending (or a clear continuation hook).
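A simple way to make coaching pacing-aware is to map the current page to a story stage and pick the hint from that stage. The stage boundaries below (pages 1 to 3 as opening, 10 and 11 as climax, 12 as ending) and the hint wording are assumptions for this sketch, not the tuning Alicia actually uses.

```typescript
// Hypothetical pacing helper; stage boundaries and hint text are illustrative.
const STORY_LENGTH = 12;

type Stage = "opening" | "middle" | "climax" | "ending";

function stageForPage(page: number, total: number = STORY_LENGTH): Stage {
  if (!Number.isInteger(page) || page < 1 || page > total) {
    throw new RangeError(`page must be an integer between 1 and ${total}`);
  }
  if (page === total) return "ending";                  // page 12
  if (page >= total - 2) return "climax";               // pages 10 and 11
  if (page <= Math.ceil(total / 4)) return "opening";   // pages 1 to 3
  return "middle";                                      // pages 4 to 9
}

const HINTS: Record<Stage, string> = {
  opening: "Help the writer introduce the character and what they want.",
  middle: "Encourage a new obstacle or surprise that raises the stakes.",
  climax: "Nudge the writer toward the biggest moment of the story.",
  ending: "This is the last page. Help the writer land a satisfying ending, or leave a clear hook for a sequel.",
};

// Returns the coaching hint to include in the prompt for the given page.
function coachingHint(page: number, total: number = STORY_LENGTH): string {
  return HINTS[stageForPage(page, total)];
}
```

The returned hint can be added to the coaching prompt so the model knows where the writer is in the arc without being told to write the ending itself.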

What worked really well

  • Immediate visual reward after writing (generate illustration per page)
  • Structured 12-page flow that keeps users moving
  • Multimodal interaction loop (write → coach → generate → reflect)
  • Strong “I made this” feeling because user authorship stays central

What we learned

  • Prompt design is product design.
  • Fast feedback loops are everything in creative tools.
  • Clear narrative boundaries increase completion rates.
  • Multimodal UX needs careful state handling and fallback logic.

What’s next

  • Stronger server-side entitlement checks on all AI routes
  • Better analytics on completion and learning progress
  • Collaboration mode (parent/teacher/co-author workflows)
  • Export and publishing improvements for storybook sharing

Final note

Building Alicia showed us how powerful Gemini + Google Cloud can be when used to support human creativity instead of replacing it.

If you share this build post on social media, use: #GeminiLiveAgentChallenge
