Olamiposi
How We Built an AI Manga Studio with Google Gemini in a Week

We created this project as an entry for the Gemini Live Agent Challenge. #GeminiLiveAgentChallenge

Enpitsu (鉛筆) — AI Manga Studio



The Idea

Manga is one of the world's most expressive storytelling formats — but creating it requires years of artistic training. We wanted to build something that lets anyone with a story idea produce a real, illustrated manga with consistent characters and cinematic black-and-white panels.

The result is Enpitsu (鉛筆 — Japanese for pencil): a full AI manga studio powered by Google Gemini. You type a story idea, pick a genre, and Enpitsu generates a complete manga — script, character art, and illustrated panels — exported as a PDF.


The Pipeline: Four Steps From Prompt to Manga

Step 1 — Story Script

The first step uses Gemini 2.5 Flash with structured JSON output. We pass the genre and story prompt and get back a full manga script: title, Japanese title, synopsis, characters with visual descriptions, and per-panel scene descriptions with dialogue.

The key Gemini feature here is response_mime_type: "application/json" with a Pydantic response_schema — Gemini returns valid, directly-usable JSON every time, no fragile parsing needed.

response = await client.aio.models.generate_content(
    model="gemini-2.5-flash",
    contents=user_prompt,
    config=GenerateContentConfig(
        system_instruction=SYSTEM_PROMPT,
        response_mime_type="application/json",
        response_schema=StoryResponse,
    ),
)
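The post doesn't show the schema itself; a minimal sketch of what a Pydantic StoryResponse could look like, based on the fields listed above (field and class names here are illustrative, not the project's actual code):

```python
from pydantic import BaseModel


class Character(BaseModel):
    name: str
    visual_description: str


class Panel(BaseModel):
    scene_description: str
    dialogue: list[str]


class StoryResponse(BaseModel):
    title: str
    japanese_title: str
    synopsis: str
    characters: list[Character]
    panels: list[Panel]
```

Passing a model like this as response_schema constrains Gemini's output, so the response text validates directly into typed objects.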

Step 2 — Character Model Sheets (Settei)

For each character, Gemini's image models generate a professional settei (設定) — the character reference sheets used in real anime production. Each sheet shows front, 3/4, and side views plus emotion expressions, with clean linework on white.

We implemented a three-model fallback chain across available Gemini image preview models so generation degrades gracefully rather than failing.


Step 3 — Panel Generation (The Hard Part)

Generating a single panel is easy. Generating 20+ panels where the same character looks consistent across all of them — that's the hard problem in AI manga generation.

Our solution: pass every character's settei sheet as a multimodal image reference in every single panel generation call, labelled as either "IN THIS PANEL" or "reference only." We also pass the previous panel for visual continuity. This is the core technique that makes Enpitsu work.

for char_name, sheet_bytes in character_sheets.items():
    contents.append(types.Part.from_bytes(data=sheet_bytes, mime_type="image/png"))
    if char_name in present_set:
        contents.append(types.Part.from_text(
            text=f"[CHARACTER REFERENCE — IN THIS PANEL] {char_name} — match this design EXACTLY."
        ))
    else:
        contents.append(types.Part.from_text(
            text=f"[CHARACTER REFERENCE — NOT IN PANEL] {char_name} — provided for style consistency."
        ))

The result is panels where characters stay recognizable from page 1 to page 10.
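Putting the pieces of Step 3 together, the ordering of references and prompt can be sketched as a pure function (names and the dict shapes are illustrative; the real code builds genai Part objects as shown above):

```python
def build_panel_contents(panel_prompt, character_sheets, present_set, previous_panel=None):
    """Assemble the multimodal request: every character's settei sheet
    with an in-panel/reference-only label, then the previous panel for
    continuity, then the scene prompt."""
    contents = []
    for name, sheet_bytes in character_sheets.items():
        contents.append({"image": sheet_bytes})
        label = "IN THIS PANEL" if name in present_set else "NOT IN PANEL"
        contents.append({"text": f"[CHARACTER REFERENCE — {label}] {name}"})
    if previous_panel is not None:
        contents.append({"image": previous_panel})
        contents.append({"text": "[PREVIOUS PANEL] Maintain visual continuity."})
    contents.append({"text": panel_prompt})
    return contents
```

Keeping the scene prompt last means the model reads all visual anchors before the instruction it must satisfy.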


Step 4 — Reader & PDF Export

The completed manga is displayed in a reader UI and exported as a PDF using html2canvas + jsPDF.


Streaming With SSE

Generating 20+ panels takes time. Instead of making users stare at a spinner, we used Server-Sent Events (SSE) so panels stream to the UI as they're generated — users watch their manga being drawn in real time.

The FastAPI backend yields SSE events from an async generator:

async def event_stream():
    for panel in panels:
        png_bytes = await generate_panel(panel, ...)
        event = PanelGenerationEvent(image_base64=base64.b64encode(png_bytes).decode())
        yield f"data: {event.model_dump_json()}\n\n"
    yield "data: [DONE]\n\n"

return StreamingResponse(event_stream(), media_type="text/event-stream")
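On the client side, each event arrives as a `data:` line. A minimal parser for this framing (a sketch for illustration; in the browser, EventSource handles this automatically):

```python
import json


def parse_sse_stream(raw_lines):
    """Extract JSON payloads from `data:` lines, stopping at the
    [DONE] sentinel used by the backend above."""
    events = []
    for line in raw_lines:
        if not line.startswith("data: "):
            continue  # skip blank separator lines
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break
        events.append(json.loads(payload))
    return events
```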

Tech Stack

| Layer | Technology |
| --- | --- |
| Frontend | Next.js 16, React 19, TypeScript, Tailwind CSS 4 |
| Backend | Python, FastAPI, Uvicorn |
| AI | Google Gemini 2.5 Flash + Gemini image models (Google GenAI SDK) |
| Auth | Firebase Authentication + Firebase Admin SDK |
| Export | html2canvas + jsPDF |

What We Learned

  1. Gemini's multimodal input is a powerful consistency tool. Treating character sheets as "visual anchors" passed to every generation call is a reusable pattern for any project needing consistent AI characters.
  2. Structured JSON output is underrated. response_schema with Pydantic means zero post-processing of Gemini's text output.
  3. SSE is the right protocol for streaming AI results — simpler than WebSockets for server-to-client streaming of generation progress.

What's Next

Phase 2 is LiveKit integration — describe a scene with your voice and watch it generate in real time. Project persistence and panel regeneration are also on the roadmap.

Source code: github.com/MobileMage/manga-gen
