How We Built an AI Manga Studio with Google Gemini in a Week

We created this post as an entry for the Gemini Live Agent Challenge. #GeminiLiveAgentChallenge

Enpitsu (鉛筆) — AI Manga Studio

Created for the Gemini Live Agent Challenge · #GeminiLiveAgentChallenge


The Idea

Manga is one of the world's most expressive storytelling formats — but creating it requires years of artistic training. We wanted to build something that lets anyone with a story idea produce a real, illustrated manga with consistent characters and cinematic black-and-white panels.

The result is Enpitsu (鉛筆 — Japanese for pencil): a full AI manga studio powered by Google Gemini. You type a story idea, pick a genre, and Enpitsu generates a complete manga — script, character art, and illustrated panels — exported as a PDF.


The Pipeline: Four Steps From Prompt to Manga

Step 1 — Story Script

The first step uses Gemini 2.5 Flash with structured JSON output. We pass the genre and story prompt and get back a full manga script: title, Japanese title, synopsis, characters with visual descriptions, and per-panel scene descriptions with dialogue.

The key Gemini feature here is response_mime_type: "application/json" with a Pydantic response_schema — Gemini returns valid, directly-usable JSON every time, no fragile parsing needed.

from google import genai
from google.genai.types import GenerateContentConfig

client = genai.Client()  # reads the Gemini API key from the environment

# Structured output: Gemini fills the StoryResponse schema directly
response = await client.aio.models.generate_content(
    model="gemini-2.5-flash",
    contents=user_prompt,
    config=GenerateContentConfig(
        system_instruction=SYSTEM_PROMPT,
        response_mime_type="application/json",
        response_schema=StoryResponse,
    ),
)
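
For reference, the response schema is a nested set of Pydantic models along these lines (the field names here are illustrative; the actual schema lives in the repo):

from pydantic import BaseModel

# Illustrative sketch of the schema; the repo's real models may differ slightly.
class Character(BaseModel):
    name: str
    visual_description: str

class Panel(BaseModel):
    panel_number: int
    scene_description: str
    dialogue: list[str]
    characters_present: list[str]

class Page(BaseModel):
    page_number: int
    panels: list[Panel]

class StoryResponse(BaseModel):
    title: str
    japanese_title: str
    synopsis: str
    characters: list[Character]
    pages: list[Page]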

Step 2 — Character Model Sheets (Settei)

For each character, Gemini's image models generate a professional settei (設定), the kind of character reference sheet used in real anime production. Each sheet shows front, 3/4, and side views plus a range of facial expressions, all in clean linework on a white background.

We implemented a three-model fallback chain across available Gemini image preview models so generation degrades gracefully rather than failing.
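
Here's a minimal sketch of what that kind of fallback chain can look like with the Google GenAI SDK (the model IDs and helper below are placeholders, not necessarily what Enpitsu ships):

from google import genai
from google.genai import types

client = genai.Client()

# Placeholder model IDs; swap in whichever Gemini image preview models are available.
IMAGE_MODEL_CHAIN = [
    "gemini-2.5-flash-image-preview",
    "gemini-2.0-flash-preview-image-generation",
    "gemini-2.0-flash-exp",
]

async def generate_image_with_fallback(prompt: str) -> bytes:
    last_error: Exception | None = None
    for model_id in IMAGE_MODEL_CHAIN:
        try:
            response = await client.aio.models.generate_content(
                model=model_id,
                contents=prompt,
                config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
            )
            # Return the first inline image found in the response parts
            for part in response.candidates[0].content.parts:
                if part.inline_data is not None:
                    return part.inline_data.data
        except Exception as exc:  # quota, safety block, model unavailable, ...
            last_error = exc
    raise RuntimeError("all image models in the fallback chain failed") from last_error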


Step 3 — Panel Generation (The Hard Part)

Generating a single panel is easy. Generating 20+ panels where the same character looks consistent across all of them — that's the hard problem in AI manga generation.

Our solution: pass every character's settei sheet as a multimodal image reference in every single panel generation call, labelled as either "IN THIS PANEL" or "reference only." We also pass the previous panel for visual continuity. This is the core technique that makes Enpitsu work.

for char_name, sheet_bytes in character_sheets.items():
    contents.append(types.Part.from_bytes(data=sheet_bytes, mime_type="image/png"))
    if char_name in present_set:
        contents.append(types.Part.from_text(
            text=f"[CHARACTER REFERENCE — IN THIS PANEL] {char_name} — match this design EXACTLY."
        ))
    else:
        contents.append(types.Part.from_text(
            text=f"[CHARACTER REFERENCE — NOT IN PANEL] {char_name} — provided for style consistency."
        ))
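
The previous-panel reference works the same way. A small illustrative sketch (panel_prompt here stands in for the panel's scene and dialogue prompt):

# Continuity anchor: let Gemini see the panel that came right before this one
if previous_panel_bytes is not None:
    contents.append(types.Part.from_bytes(data=previous_panel_bytes, mime_type="image/png"))
    contents.append(types.Part.from_text(
        text="[PREVIOUS PANEL] Keep character designs, art style, and lighting consistent with this panel."
    ))

# Finally, the panel's own scene description and dialogue
contents.append(types.Part.from_text(text=panel_prompt))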

The result is panels where characters stay recognizable from page 1 to page 10.


Step 4 — Reader & PDF Export

The completed manga is displayed in a reader UI and exported as a PDF using html2canvas + jsPDF.


Streaming With SSE

Generating 20+ panels takes time. Instead of making users stare at a spinner, we used Server-Sent Events (SSE) so both character sheets and panels stream to the UI as they're generated — users watch their manga being drawn in real time.

The FastAPI backend yields SSE events from an async generator:

import base64

from fastapi.responses import StreamingResponse

# Inside the FastAPI generation endpoint: yield one SSE event per finished panel
async def event_stream():
    previous_panel_bytes: bytes | None = None

    for page in request.pages:
        previous_panel_bytes = None  # reset at each new page

        for panel in page.panels:
            png_bytes = await generate_panel(
                panel=panel,
                character_sheets=character_sheet_bytes,
                previous_panel_bytes=previous_panel_bytes,
                ...
            )
            b64 = base64.b64encode(png_bytes).decode("ascii")
            event = PanelGenerationEvent(
                page_number=page.page_number,
                panel_number=panel.panel_number,
                image_base64=b64,
                status="complete",
            )
            previous_panel_bytes = png_bytes
            yield f"data: {event.model_dump_json()}\n\n"

    yield "data: [DONE]\n\n"

return StreamingResponse(
    event_stream(),
    media_type="text/event-stream",
    headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
)

Two things worth noting: previous_panel_bytes is threaded through each call so Gemini can see the previous panel and maintain visual continuity. And X-Accel-Buffering: no is essential when running behind a reverse proxy (Cloud Run, Render) — without it, the proxy buffers the stream and the real-time effect disappears.


Generated Output Examples

Here's a look at what Enpitsu actually produces end-to-end.

Character Model Sheets (Settei)

Gemini generates professional reference sheets showing front, 3/4, and side views alongside expression samples — exactly as used in anime production pipelines. Here's the settei for Alien Minion, one of the antagonists from a sci-fi shōnen story:

Alien Minion character model sheet

Panel Artwork

And here's what the storyboard step produces — full manga pages with speech bubbles, screentone shading, speed lines, and dynamic panel layouts:

KRAKOOM storyboard page

Sketch-to-Manga Conversion

One feature we didn't cover in the main pipeline: Enpitsu can take your own hand-drawn sketches and convert them into polished manga artwork while preserving your original composition. Here's a page before and after:

Hand-drawn storyboard (before) · Converted manga page (after)

The model preserves the two-panel split, the character poses, and the battle damage — while adding screentone shading, sharp linework, and manga-style line weight. You bring the rough composition; Gemini handles the inking.
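
For the curious, here's a rough sketch of what that conversion call can look like with the GenAI SDK (the prompt wording and variable names are illustrative, not Enpitsu's exact implementation):

# sketch_png_bytes: the user's uploaded drawing; image_model: whichever image preview model is in use
contents = [
    types.Part.from_bytes(data=sketch_png_bytes, mime_type="image/png"),
    types.Part.from_text(text=(
        "Convert this rough pencil sketch into a finished black-and-white manga panel. "
        "Preserve the original composition, panel layout, and character poses exactly. "
        "Add clean ink linework, screentone shading, and dynamic line weight."
    )),
]

response = await client.aio.models.generate_content(
    model=image_model,
    contents=contents,
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)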

The Sketch Flow in Action

Upload your sketches and the agent gets to work — extracting characters, generating settei sheets, then converting each sketch to polished manga.

Planning & Settei
Sketch flow - character extraction and settei

Results across three sketches:

Sketch conversion - battle scene Sketch conversion - smoker girl
Sketch conversion - glasses girl

Rough pencil lines in, publication-ready manga out.


Tech Stack

Layer        Technology
Frontend     Next.js 16, React 19, TypeScript, Tailwind CSS 4
Backend      Python, FastAPI, Uvicorn
Deployment   Google Cloud Run (backend), Render (frontend)
AI           Google Gemini 2.5 Flash + Gemini Image Models (Google GenAI SDK)
Auth         Firebase Authentication + Firebase Admin SDK
Export       html2canvas + jsPDF

What We Learned

  1. Gemini's multimodal input is a powerful consistency tool. Treating character sheets as "visual anchors" passed to every generation call is a reusable pattern for any project needing consistent AI characters — and it extends naturally to new input modes like sketch-to-manga, where user drawings become the anchor instead.
  2. Structured JSON output is underrated. response_schema with Pydantic means zero post-processing of Gemini's text output.
  3. SSE is the right protocol for streaming AI results — simpler than WebSockets for server-to-client streaming, and it works cleanly for both character sheet generation and panel generation as independent streams.
  4. Auto mode changes the product entirely. Letting users skip the step-by-step flow and generate a full manga in one click felt like a different app — and it opened up the question of how much creative control users actually want vs. how much they want magic.

What's Next

  • Project persistence — save and revisit generated manga from a personal dashboard
  • Panel regeneration — re-roll individual panels you're not happy with without redoing the whole manga

Source code: github.com/MobileMage/manga-gen


Top comments (13)

Swift

This is awesome. Do you have examples of generated outputs? Would be nice to include in the post if you have time to grab some!

Olamiposi

Thanks so much! I've updated the post with a dedicated examples section: you can see the character model sheets Gemini generates, full storyboard pages with screentone and speech bubbles, and a sketch-to-manga before/after. The sketch conversion is honestly one of my favourite parts: you draw the rough composition and Gemini handles the inking, screentones, and line weight.

Swift

Wow! It's even better than I was expecting! Great work

Olamiposi

Thank you very much!🚀

Ronnie

"This is amazing work! I love how you handled character consistency with Gemini’s multimodal input — the sketch-to-manga feature is especially impressive. Really inspiring approach for anyone wanting to generate full manga with AI."

Olamiposi

Thanks so much! The multimodal input was honestly the key insight: passing character sheets as image references on every panel call is what made consistency actually work. Glad it's inspiring!

Willem van Heemstra

Hi, great stuff you are building! Your GitHub repository is not to be found anymore, unfortunately… Can you share, please?

Olamiposi

Hello, thank you!
Sorry about that, it was set to private initially. You can check it now.

Michael Wirth

Very creative use of Gemini! Did you consider using a platform to make calling multiple AI models easier?

Olamiposi

Thanks! We did consider orchestration frameworks like LangChain early on, but decided against them for this project. Since we're going deep on a single provider (Gemini) rather than swapping models, the abstraction layer didn't buy us much — and the Google GenAI SDK's native async support and multimodal Part API were exactly what we needed for the character sheet consistency technique. Going direct also meant one less dependency and easier debugging when image generation behaved unexpectedly. For a multi-provider setup it'd be a different call though!

Final

This really resonated with me.

Olamiposi

Really appreciate you saying that! It was a fun week of building; hope it sparks some ideas for your own projects.

Shane Coughlan

It was very interesting to read about some of the transformations and adjustments for consistency you used. I wanted to make one suggestion: adding a license to the GitHub code so that it is clear if and how others can use it. I didn’t see a license when I checked it out, but please do let me know if I missed it. MIT seems to be a popular default for many AI-related projects.