Abubakr Alsheikh
I Built an AI Manga Creator with Next.js and Gemini's "Visual Memory"

I just wrapped up my submission for the Google Nano Banana Hackathon, and I'm incredibly excited to share what I built: NanoManga Studio. It's an AI-powered web app that lets you generate entire, visually-consistent manga stories from a simple idea.

The biggest problem with AI image generation for storytelling is consistency. How do you make sure your hero has the same hairstyle on page 3 as they did on page 1? I decided to tackle this head-on.

🚀 Live Demo: nanomanga-studio.vercel.app
💻 GitHub Repo (Stars are appreciated! ⭐): github.com/Abubakr-Alsheikh/nanomanga-studio

The Tech Stack

I wanted a modern, fast, and type-safe stack that would let me iterate quickly for the hackathon.

  • Framework: Next.js 15 (App Router)
  • UI: shadcn/ui & Tailwind CSS
  • State Management: Simple React useState lifted to the root component.
  • AI: Google AI JavaScript SDK (@google/generative-ai)
  • Deployment: Vercel

The Core Innovation: Giving the AI a "Visual Memory"

The magic of this project is in the multi-modal prompting. Instead of just sending text, I created a rich context package for the gemini-2.5-flash-image-preview (or "Nano Banana") model for every new page generation.

Here's the breakdown of the fetch call from the PageGenerator component:

// file: app/components/page-generator.tsx

const handleGeneratePage = async () => {
  // ... state checks and loading indicators

  // 1. Get previous pages and selected assets for this scene
  const previousPages = pages.slice(0, currentPageNumber - 1);
  const selectedAssets = allAssets.filter(asset => selectedAssetIds.has(asset.id));

  // 2. Craft a highly specific text prompt
  // (pagePrompt is e.g. "Panel 1: Close-up on Kenji...")
  const fullPrompt = `
    **Manga Page Generation**
    **Page Number:** ${currentPageNumber}
    **Page Description:** ${pagePrompt}

    **INSTRUCTIONS FOR IMAGE REFERENCES:**
    - The FIRST ${previousPages.length} images are previous pages for continuity.
    - The REMAINING ${selectedAssets.length} images are specific assets for THIS page.
  `.trim();

  // 3. Assemble the visual context array (THE KEY PART!)
  // We extract the base64 data from our data URLs
  const pageImages = previousPages.map(page => page.imageUrl.split(',')[1]);
  const assetImages = selectedAssets.map(asset => asset.imageUrl.split(',')[1]);

  // Previous pages go FIRST to establish context
  const baseImages = [...pageImages, ...assetImages];

  // 4. Make the API call
  const response = await fetch('/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt: fullPrompt, baseImages }),
  });

  // ... handle response
};

Because the prompt explicitly tells the model how to interpret the sequence of images, it can maintain character appearance, clothing, and even battle damage across multiple pages.
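On the server side, those base64 strings have to become the multi-modal "parts" array that the Google AI SDK's `generateContent()` accepts: one text part followed by one `inlineData` part per image. Here's a minimal sketch of that assembly step; the helper name and PNG mime type are my assumptions, not necessarily what the actual `/api/generate` route does.

```typescript
// Hypothetical helper (names are mine, not from the repo) showing how
// the server route might build the multi-modal request payload.
type Part =
  | { text: string }
  | { inlineData: { data: string; mimeType: string } };

function buildContentParts(prompt: string, baseImages: string[]): Part[] {
  return [
    { text: prompt },
    // Array order is preserved: previous-page images first, then asset
    // images, matching the "FIRST N images" instructions in the prompt.
    ...baseImages.map((data) => ({
      inlineData: { data, mimeType: "image/png" },
    })),
  ];
}

// Example: one previous page plus one character-sheet asset.
const parts = buildContentParts("**Manga Page Generation** ...", [
  "iVBORw0KGgo...", // base64 payload, data-URL prefix already stripped
  "iVBORw0KGgo...",
]);
console.log(parts.length); // 3 — one text part + two image parts
```

The resulting array can be passed straight to `model.generateContent(parts)` in the route handler.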

AI as an Art Director and Story Editor

Before even generating images, I use gemini-2.5-flash with persona-based prompting to structure the entire project.

  • Story Planning: I ask the AI to act as a "master manga editor" and return a complete story plan in a strict JSON format. This plan includes character descriptions, environments, and a page-by-page plot that follows a classic narrative arc.

  • Asset Design: When generating asset concepts, the AI takes on two roles:

    • A "character concept artist" that designs a full-body character sheet on a neutral background.
    • A "background artist" that designs an atmospheric, character-free environment shot.

This ensures the generated assets are clean and perfect for use as references later on.
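For the strict-JSON story plan to be usable in code, the response needs a typed contract and a defensive parsing step. Here's a minimal sketch of what that might look like; the field names and the `parseStoryPlan` helper are my assumptions, not the project's actual schema.

```typescript
// Hypothetical story-plan contract — field names are assumptions,
// not the repo's actual schema.
interface StoryPlan {
  title: string;
  characters: { name: string; description: string }[];
  environments: string[];
  pages: { pageNumber: number; description: string }[];
}

// Models sometimes wrap JSON in markdown fences even when told not to,
// so strip them defensively before parsing.
function parseStoryPlan(raw: string): StoryPlan {
  const cleaned = raw
    .trim()
    .replace(/^```(?:json)?\s*/, "")
    .replace(/\s*```$/, "");
  return JSON.parse(cleaned) as StoryPlan;
}

const plan = parseStoryPlan(
  '```json\n{"title":"Ronin Dawn","characters":[],"environments":[],"pages":[]}\n```'
);
console.log(plan.title); // "Ronin Dawn"
```

Newer SDK versions can also enforce this with `responseMimeType: "application/json"` in the generation config, which makes the fence-stripping a belt-and-suspenders fallback.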

What I Learned

This project was a blast. It hammered home that the future of generative AI isn't just about single, powerful prompts. It's about building systems that maintain context, create feedback loops, and allow for true human-AI collaboration. The multi-modal capabilities of models like Gemini are the key to unlocking this.

I'd love for you to try it out and see what you can create! Let me know what you think in the comments. What would you build with this kind of "visual memory"?

If you want to read my full technical write-up on Kaggle:
https://www.kaggle.com/competitions/banana/writeups/nanomanga-studio

Happy coding!

#react #nextjs #ai #google #webdev #typescript #hackathon #generativeai
