I just wrapped up my submission for the Google Nano Banana Hackathon, and I'm incredibly excited to share what I built: NanoManga Studio. It's an AI-powered web app that lets you generate entire, visually-consistent manga stories from a simple idea.
The biggest problem with AI image generation for storytelling is consistency. How do you make sure your hero has the same hairstyle on page 3 as they did on page 1? I decided to tackle this head-on.
🚀 Live Demo: nanomanga-studio.vercel.app
💻 GitHub Repo (Stars are appreciated! ⭐): github.com/Abubakr-Alsheikh/nanomanga-studio
The Tech Stack
I wanted a modern, fast, and type-safe stack that would let me iterate quickly for the hackathon.
- Framework: Next.js 15 (App Router)
- UI: shadcn/ui & Tailwind CSS
- State Management: Simple React `useState` lifted to the root component (a quick sketch follows this list)
- AI: Google AI JavaScript SDK (`@google/generative-ai`)
- Deployment: Vercel
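State management deserves a quick note: with a single-screen workflow, plain `useState` at the root was all I needed. Here's a minimal sketch of that lifted-state pattern — the type names and props are illustrative, not the repo's actual code:

```tsx
// file: app/page.tsx — illustrative sketch of the lifted-state pattern (not the repo's exact code)
"use client";

import { useState } from "react";

// Hypothetical shapes; the real project's types may differ
interface Asset { id: string; imageUrl: string; }
interface MangaPage { pageNumber: number; imageUrl: string; }

export default function Home() {
  // All shared state lives at the root and flows down to children as props
  const [pages, setPages] = useState<MangaPage[]>([]);
  const [assets, setAssets] = useState<Asset[]>([]);

  return (
    <main>
      {/* e.g. <PageGenerator pages={pages} allAssets={assets}
            onPageGenerated={(page) => setPages((prev) => [...prev, page])} /> */}
    </main>
  );
}
```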
The Core Innovation: Giving the AI a "Visual Memory"
The magic of this project is in the multi-modal prompting. Instead of just sending text, I created a rich context package for the `gemini-2.5-flash-image-preview` (or "Nano Banana") model for every new page generation.
Here's the breakdown of the `fetch` call from the `PageGenerator` component:
```tsx
// file: app/components/page-generator.tsx
const handleGeneratePage = async () => {
  // ... state checks and loading indicators

  // 1. Get previous pages and selected assets for this scene
  const previousPages = pages.slice(0, currentPageNumber - 1);
  const selectedAssets = allAssets.filter(asset => selectedAssetIds.has(asset.id));

  // 2. Craft a highly specific text prompt
  const fullPrompt = `
**Manga Page Generation**
**Page Number:** ${currentPageNumber}
**Page Description:** ${pagePrompt} // e.g., "Panel 1: Close-up on Kenji..."
**INSTRUCTIONS FOR IMAGE REFERENCES:**
- The FIRST ${previousPages.length} images are previous pages for continuity.
- The REMAINING ${selectedAssets.length} images are specific assets for THIS page.
`.trim();

  // 3. Assemble the visual context array (THE KEY PART!)
  // We extract the base64 data from our data URLs
  const pageImages = previousPages.map(page => page.imageUrl.split(',')[1]);
  const assetImages = selectedAssets.map(asset => asset.imageUrl.split(',')[1]);

  // Previous pages go FIRST to establish context
  const baseImages = [...pageImages, ...assetImages];

  // 4. Make the API call
  const response = await fetch('/api/generate', {
    method: 'POST',
    body: JSON.stringify({ prompt: fullPrompt, baseImages }),
  });

  // ... handle response
};
```
Explicitly telling the AI how to interpret the sequence of images lets it maintain character appearance, clothing, and even damage across multiple pages.
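On the server side, the `/api/generate` route just has to unpack that payload and forward it to the model as one multi-modal request: the text prompt first, then every reference image as an inline-data part. Here's a minimal sketch of what such a route could look like with the `@google/generative-ai` SDK — the actual handler in the repo may differ, and the `GEMINI_API_KEY` env var and `image/png` mime type are assumptions:

```ts
// file: app/api/generate/route.ts — illustrative sketch, not the repo's exact handler
import { GoogleGenerativeAI } from "@google/generative-ai";
import { NextResponse } from "next/server";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);

export async function POST(request: Request) {
  const { prompt, baseImages } = await request.json();

  const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash-image-preview" });

  // Text prompt first, then every reference image as an inlineData part,
  // in the same order the client assembled them (previous pages, then assets)
  const parts = [
    { text: prompt },
    ...baseImages.map((data: string) => ({
      inlineData: { data, mimeType: "image/png" },
    })),
  ];

  const result = await model.generateContent(parts);

  // The generated page comes back as a base64 image part
  const imagePart = result.response.candidates?.[0]?.content?.parts?.find(
    (part) => "inlineData" in part
  );

  if (!imagePart || !("inlineData" in imagePart)) {
    return NextResponse.json({ error: "No image returned" }, { status: 502 });
  }

  return NextResponse.json({
    imageUrl: `data:${imagePart.inlineData.mimeType};base64,${imagePart.inlineData.data}`,
  });
}
```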
AI as an Art Director and Story Editor
Before even generating images, I use `gemini-2.5-flash` with persona-based prompting to structure the entire project.
- Story Planning: I ask the AI to act as a "master manga editor" and return a complete story plan in a strict JSON format. This plan includes character descriptions, environments, and a page-by-page plot that follows a classic narrative arc.
- Asset Design: When generating asset ideas, the AI takes on two roles:
  - A "character concept artist" that designs a full-body character sheet on a neutral background.
  - A "background artist" that designs an atmospheric, character-free environment shot.

This ensures the generated assets are clean and perfect for use as references later on.
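For the curious, here's a rough sketch of what the story-planning call might look like; the persona prompt wording and the JSON shape below are simplified illustrations, not the exact ones the app uses:

```ts
// Illustrative story-planning call (prompt wording and schema are simplified assumptions)
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);

export async function planStory(idea: string) {
  const model = genAI.getGenerativeModel({
    model: "gemini-2.5-flash",
    // Constrain the output to JSON so it can be parsed directly
    generationConfig: { responseMimeType: "application/json" },
  });

  const prompt = `
You are a master manga editor. Given the story idea below, return a complete
story plan as strict JSON with this shape:
{
  "title": string,
  "characters": [{ "name": string, "description": string }],
  "environments": [{ "name": string, "description": string }],
  "pages": [{ "pageNumber": number, "summary": string }]
}

Story idea: ${idea}
`.trim();

  const result = await model.generateContent(prompt);
  return JSON.parse(result.response.text());
}
```

Forcing JSON output via `responseMimeType` means the plan can be parsed and fed straight into the asset and page generators without any brittle string cleanup.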
What I Learned
This project was a blast. It hammered home that the future of generative AI isn't just about single, powerful prompts. It's about building systems that maintain context, create feedback loops, and allow for true human-AI collaboration. The multi-modal capabilities of models like Gemini are the key to unlocking this.
I'd love for you to try it out and see what you can create! Let me know what you think in the comments. What would you build with this kind of "visual memory"?
If you want to read my full technical write-up on Kaggle:
https://www.kaggle.com/competitions/banana/writeups/nanomanga-studio
Happy coding!