Olamiposi
How We Built an AI Manga Studio with Google Gemini in a Week

We created this project as an entry for the Gemini Live Agent Challenge. #GeminiLiveAgentChallenge

Enpitsu (鉛筆) — AI Manga Studio



The Idea

Manga is one of the world's most expressive storytelling formats — but creating it requires years of artistic training. We wanted to build something that lets anyone with a story idea produce a real, illustrated manga with consistent characters and cinematic black-and-white panels.

The result is Enpitsu (鉛筆 — Japanese for pencil): a full AI manga studio powered by Google Gemini. You type a story idea, pick a genre, and Enpitsu generates a complete manga — script, character art, and illustrated panels — exported as a PDF.


The Pipeline: Four Steps From Prompt to Manga

Step 1 — Story Script

The first step uses Gemini 2.5 Flash with structured JSON output. We pass the genre and story prompt and get back a full manga script: title, Japanese title, synopsis, characters with visual descriptions, and per-panel scene descriptions with dialogue.

The key Gemini feature here is response_mime_type: "application/json" with a Pydantic response_schema — Gemini returns valid, directly-usable JSON every time, no fragile parsing needed.

response = await client.aio.models.generate_content(
    model="gemini-2.5-flash",
    contents=user_prompt,
    config=GenerateContentConfig(
        system_instruction=SYSTEM_PROMPT,
        response_mime_type="application/json",
        response_schema=StoryResponse,
    ),
)
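The post doesn't show the schema itself; a minimal sketch of what a Pydantic StoryResponse could look like, based on the fields listed above (field and class names here are illustrative, not the project's actual code):

```python
from pydantic import BaseModel


class Character(BaseModel):
    name: str
    visual_description: str


class Panel(BaseModel):
    scene_description: str
    dialogue: list[str]


class StoryResponse(BaseModel):
    title: str
    japanese_title: str
    synopsis: str
    characters: list[Character]
    panels: list[Panel]
```

Passing a model like this as response_schema constrains Gemini's output, so the response text validates directly into typed objects.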

Step 2 — Character Model Sheets (Settei)

For each character, Gemini's image models generate a professional settei (設定) — the character reference sheets used in real anime production. Each sheet shows front, 3/4, and side views plus emotion expressions, with clean linework on white.

We implemented a three-model fallback chain across available Gemini image preview models so generation degrades gracefully rather than failing.


Step 3 — Panel Generation (The Hard Part)

Generating a single panel is easy. Generating 20+ panels where the same character looks consistent across all of them — that's the hard problem in AI manga generation.

Our solution: pass every character's settei sheet as a multimodal image reference in every single panel generation call, labelled as either "IN THIS PANEL" or "reference only." We also pass the previous panel for visual continuity. This is the core technique that makes Enpitsu work.

for char_name, sheet_bytes in character_sheets.items():
    contents.append(types.Part.from_bytes(data=sheet_bytes, mime_type="image/png"))
    if char_name in present_set:
        contents.append(types.Part.from_text(
            text=f"[CHARACTER REFERENCE — IN THIS PANEL] {char_name} — match this design EXACTLY."
        ))
    else:
        contents.append(types.Part.from_text(
            text=f"[CHARACTER REFERENCE — NOT IN PANEL] {char_name} — provided for style consistency."
        ))

The result is panels where characters stay recognizable from page 1 to page 10.
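Putting the pieces of Step 3 together, the ordering of references and prompt can be sketched as a pure function (names and the dict shapes are illustrative; the real code builds genai Part objects as shown above):

```python
def build_panel_contents(panel_prompt, character_sheets, present_set, previous_panel=None):
    """Assemble the multimodal request: every character's settei sheet
    with an in-panel/reference-only label, then the previous panel for
    continuity, then the scene prompt."""
    contents = []
    for name, sheet_bytes in character_sheets.items():
        contents.append({"image": sheet_bytes})
        label = "IN THIS PANEL" if name in present_set else "NOT IN PANEL"
        contents.append({"text": f"[CHARACTER REFERENCE — {label}] {name}"})
    if previous_panel is not None:
        contents.append({"image": previous_panel})
        contents.append({"text": "[PREVIOUS PANEL] Maintain visual continuity."})
    contents.append({"text": panel_prompt})
    return contents
```

Keeping the scene prompt last means the model reads all visual anchors before the instruction it must satisfy.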


Step 4 — Reader & PDF Export

The completed manga is displayed in a reader UI and exported as a PDF using html2canvas + jsPDF.


Streaming With SSE

Generating 20+ panels takes time. Instead of making users stare at a spinner, we used Server-Sent Events (SSE) so panels stream to the UI as they're generated — users watch their manga being drawn in real time.

The FastAPI backend yields SSE events from an async generator:

async def event_stream():
    for panel in panels:
        png_bytes = await generate_panel(panel, ...)
        event = PanelGenerationEvent(image_base64=base64.b64encode(png_bytes).decode())
        yield f"data: {event.model_dump_json()}\n\n"
    yield "data: [DONE]\n\n"

return StreamingResponse(event_stream(), media_type="text/event-stream")
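On the client side, each event arrives as a `data:` line. A minimal parser for this framing (a sketch for illustration; in the browser, EventSource handles this automatically):

```python
import json


def parse_sse_stream(raw_lines):
    """Extract JSON payloads from `data:` lines, stopping at the
    [DONE] sentinel used by the backend above."""
    events = []
    for line in raw_lines:
        if not line.startswith("data: "):
            continue  # skip blank separator lines
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break
        events.append(json.loads(payload))
    return events
```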

Tech Stack

| Layer | Technology |
| --- | --- |
| Frontend | Next.js 16, React 19, TypeScript, Tailwind CSS 4 |
| Backend | Python, FastAPI, Uvicorn |
| AI | Google Gemini 2.5 Flash + Gemini image models (Google GenAI SDK) |
| Auth | Firebase Authentication + Firebase Admin SDK |
| Export | html2canvas + jsPDF |

What We Learned

  1. Gemini's multimodal input is a powerful consistency tool. Treating character sheets as "visual anchors" passed to every generation call is a reusable pattern for any project needing consistent AI characters.
  2. Structured JSON output is underrated. response_schema with Pydantic means zero post-processing of Gemini's text output.
  3. SSE is the right protocol for streaming AI results — simpler than WebSockets for server-to-client streaming of generation progress.

What's Next

Phase 2 is LiveKit integration — describe a scene with your voice and watch it generate in real time. Project persistence and panel regeneration are also on the roadmap.

Source code: github.com/MobileMage/manga-gen
