Enpitsu (鉛筆) — AI Manga Studio
Created for the Gemini Live Agent Challenge · #GeminiLiveAgentChallenge
The Idea
Manga is one of the world's most expressive storytelling formats — but creating it requires years of artistic training. We wanted to build something that lets anyone with a story idea produce a real, illustrated manga with consistent characters and cinematic black-and-white panels.
The result is Enpitsu (鉛筆 — Japanese for pencil): a full AI manga studio powered by Google Gemini. You type a story idea, pick a genre, and Enpitsu generates a complete manga — script, character art, and illustrated panels — exported as a PDF.
The Pipeline: Four Steps From Prompt to Manga
Step 1 — Story Script
The first step uses Gemini 2.5 Flash with structured JSON output. We pass the genre and story prompt and get back a full manga script: title, Japanese title, synopsis, characters with visual descriptions, and per-panel scene descriptions with dialogue.
The key Gemini feature here is response_mime_type: "application/json" with a Pydantic response_schema — Gemini returns valid, directly-usable JSON every time, no fragile parsing needed.
```python
from google import genai
from google.genai.types import GenerateContentConfig

response = await client.aio.models.generate_content(
    model="gemini-2.5-flash",
    contents=user_prompt,
    config=GenerateContentConfig(
        system_instruction=SYSTEM_PROMPT,
        response_mime_type="application/json",
        response_schema=StoryResponse,
    ),
)
```
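The post doesn't show the `StoryResponse` schema itself, but a minimal Pydantic sketch of what it might look like (field names here are illustrative, not the actual schema) makes the structured-output idea concrete:

```python
from pydantic import BaseModel

class Character(BaseModel):
    name: str
    visual_description: str  # later fed to the image model as the character brief

class Panel(BaseModel):
    panel_number: int
    scene_description: str
    dialogue: list[str]

class StoryResponse(BaseModel):
    title: str
    japanese_title: str
    synopsis: str
    characters: list[Character]
    panels: list[Panel]
```

Passing a class like this as `response_schema` both constrains generation and gives you a typed object back via `response.parsed`, so there is no string munging between the model and the rest of the pipeline.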
Step 2 — Character Model Sheets (Settei)
For each character, Gemini's image models generate a professional settei (設定) — the character reference sheets used in real anime production. Each sheet shows front, 3/4, and side views plus emotion expressions, with clean linework on white.
We implemented a three-model fallback chain across available Gemini image preview models so generation degrades gracefully rather than failing.
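The exact model chain isn't listed in the post, but the fallback logic can be sketched as a small helper; the model names below are hypothetical placeholders, and `call_model` stands in for whatever coroutine actually hits the API:

```python
async def generate_with_fallback(prompt, call_model, models=None):
    """Try each image model in order; return the first success.

    `call_model(model, prompt)` is the coroutine that performs the real API
    call. The model names here are illustrative, not the actual chain.
    """
    models = models or [
        "gemini-image-preview-a",  # hypothetical primary
        "gemini-image-preview-b",  # hypothetical fallback
        "gemini-image-preview-c",  # hypothetical last resort
    ]
    last_error = None
    for model in models:
        try:
            return await call_model(model, prompt)
        except Exception as exc:  # e.g. quota exhaustion or model unavailable
            last_error = exc
    raise RuntimeError("all image models failed") from last_error
```

Because the chain only advances on failure, the common case costs one call, and a dead preview model degrades quality rather than breaking generation outright.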
Step 3 — Panel Generation (The Hard Part)
Generating a single panel is easy. Generating 20+ panels where the same character looks consistent across all of them — that's the hard problem in AI manga generation.
Our solution: pass every character's settei sheet as a multimodal image reference in every single panel generation call, labelled as either "IN THIS PANEL" or "reference only." We also pass the previous panel for visual continuity. This is the core technique that makes Enpitsu work.
```python
for char_name, sheet_bytes in character_sheets.items():
    contents.append(types.Part.from_bytes(data=sheet_bytes, mime_type="image/png"))
    if char_name in present_set:
        contents.append(types.Part.from_text(
            text=f"[CHARACTER REFERENCE — IN THIS PANEL] {char_name} — match this design EXACTLY."
        ))
    else:
        contents.append(types.Part.from_text(
            text=f"[CHARACTER REFERENCE — NOT IN PANEL] {char_name} — provided for style consistency."
        ))
```
The result is panels where characters stay recognizable from page 1 to page 10.
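Putting the pieces together, the full payload for one panel call orders the settei sheets first, then the previous panel, then the scene prompt. A simplified sketch, using plain `(kind, value)` tuples in place of the SDK's `Part` objects (the helper name and label wording are illustrative):

```python
def build_panel_contents(scene_prompt, character_sheets, present_set,
                         previous_panel_bytes=None):
    """Assemble the multimodal payload for one panel as (kind, value) pairs.

    Stand-in for the SDK's Part objects; the ordering mirrors the real call:
    all character sheets with their labels, then the previous panel for
    continuity, then the scene text.
    """
    contents = []
    for name, sheet in character_sheets.items():
        contents.append(("image", sheet))
        tag = "IN THIS PANEL" if name in present_set else "NOT IN PANEL"
        contents.append(("text", f"[CHARACTER REFERENCE — {tag}] {name}"))
    if previous_panel_bytes is not None:
        contents.append(("image", previous_panel_bytes))
        contents.append(("text", "[PREVIOUS PANEL] Keep visual continuity."))
    contents.append(("text", scene_prompt))
    return contents
```

Keeping this assembly in one place means every panel call sees the same anchors in the same order, which is what makes the consistency technique reproducible.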
Step 4 — Reader & PDF Export
The completed manga is displayed in a reader UI and exported as a PDF using html2canvas + jsPDF.
Streaming With SSE
Generating 20+ panels takes time. Instead of making users stare at a spinner, we used Server-Sent Events (SSE) so both character sheets and panels stream to the UI as they're generated — users watch their manga being drawn in real time.
The FastAPI backend yields SSE events from an async generator:
```python
async def event_stream():
    previous_panel_bytes: bytes | None = None
    for page in request.pages:
        previous_panel_bytes = None  # reset continuity at each new page
        for panel in page.panels:
            png_bytes = await generate_panel(
                panel=panel,
                character_sheets=character_sheet_bytes,
                previous_panel_bytes=previous_panel_bytes,
                ...
            )
            b64 = base64.b64encode(png_bytes).decode("ascii")
            event = PanelGenerationEvent(
                page_number=page.page_number,
                panel_number=panel.panel_number,
                image_base64=b64,
                status="complete",
            )
            previous_panel_bytes = png_bytes
            yield f"data: {event.model_dump_json()}\n\n"
    yield "data: [DONE]\n\n"

return StreamingResponse(
    event_stream(),
    media_type="text/event-stream",
    headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
)
```
Two things worth noting: previous_panel_bytes is threaded through each call so Gemini can see the previous panel and maintain visual continuity. And X-Accel-Buffering: no is essential when running behind a reverse proxy (Cloud Run, Render) — without it, the proxy buffers the stream and the real-time effect disappears.
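On the receiving end, each frame is just a `data:` line followed by a blank line. The actual frontend consumes this in the browser, but the framing is simple enough to show as a minimal parser sketch for the stream format above, including the `[DONE]` sentinel:

```python
import json

def parse_sse_events(raw: str):
    """Split a decoded SSE stream into JSON events.

    Minimal sketch: handles only the single-line `data:` frames emitted by
    the endpoint above, stopping at the [DONE] sentinel.
    """
    events = []
    for frame in raw.split("\n\n"):
        frame = frame.strip()
        if not frame.startswith("data: "):
            continue
        payload = frame[len("data: "):]
        if payload == "[DONE]":
            break
        events.append(json.loads(payload))
    return events
```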
Generated Output Examples
Here's a look at what Enpitsu actually produces end-to-end.
Character Model Sheets (Settei)
Gemini generates professional reference sheets showing front, 3/4, and side views alongside expression samples — exactly as used in anime production pipelines. Here's the settei for Alien Minion, one of the antagonists from a sci-fi shōnen story:
Panel Artwork
And here's what the storyboard step produces — full manga pages with speech bubbles, screentone shading, speed lines, and dynamic panel layouts:
Sketch-to-Manga Conversion
One feature we didn't cover in the main pipeline: Enpitsu can take your own hand-drawn sketches and convert them into polished manga artwork while preserving your original composition. Here's a page before and after:
| Hand-drawn storyboard (before) | Converted manga page (after) |
|---|---|
| *(image: rough pencil storyboard)* | *(image: inked manga page)* |
The model preserves the two-panel split, the character poses, and the battle damage — while adding screentone shading, sharp linework, and manga-style weight. You bring the rough composition; Gemini handles the inking.
The Sketch Flow in Action
Upload your sketches and the agent gets to work: extracting characters, generating settei sheets, then converting each sketch to polished manga.
| Planning & Settei |
|---|
| *(screenshot: planning and settei generation in progress)* |
Results across three sketches:
*(images: three converted manga pages)*
Rough pencil lines in, publication-ready manga out.
Tech Stack
| Layer | Technology |
|---|---|
| Frontend | Next.js 16, React 19, TypeScript, Tailwind CSS 4 |
| Backend | Python, FastAPI, Uvicorn |
| Deployment | Google Cloud Run (backend), Render (frontend) |
| AI | Google Gemini 2.5 Flash + Gemini Image Models (Google GenAI SDK) |
| Auth | Firebase Authentication + Firebase Admin SDK |
| Export | html2canvas + jsPDF |
What We Learned
- Gemini's multimodal input is a powerful consistency tool. Treating character sheets as "visual anchors" passed to every generation call is a reusable pattern for any project needing consistent AI characters — and it extends naturally to new input modes like sketch-to-manga, where user drawings become the anchor instead.
- Structured JSON output is underrated. `response_schema` with Pydantic means zero post-processing of Gemini's text output.
- SSE is the right protocol for streaming AI results: simpler than WebSockets for server-to-client streaming, and it works cleanly for both character sheet generation and panel generation as independent streams.
- Auto mode changes the product entirely. Letting users skip the step-by-step flow and generate a full manga in one click felt like a different app — and it opened up the question of how much creative control users actually want vs. how much they want magic.
What's Next
- Project persistence — save and revisit generated manga from a personal dashboard
- Panel regeneration — re-roll individual panels you're not happy with without redoing the whole manga
Source code: github.com/MobileMage/manga-gen
Top comments (13)
This is awesome. Do you have examples of generated outputs? Would be nice to include in the post if you have time to grab some!
Thanks so much! I've updated the post with a dedicated examples section, you can see the character model sheets Gemini generates, full storyboard pages with screentone and speech bubbles, and a sketch-to-manga before/after. The sketch conversion is honestly one of my favourite parts: you draw the rough composition and Gemini handles the inking, screentones, and line weight.
Wow! It's even better than I was expecting! Great work
Thank you very much!🚀
"This is amazing work! I love how you handled character consistency with Gemini’s multimodal input — the sketch-to-manga feature is especially impressive. Really inspiring approach for anyone wanting to generate full manga with AI."
Thanks so much! The multimodal input was honestly the key insight, passing character sheets as image references on every panel call is what made consistency actually work. Glad it's inspiring!
Hi, great stuff you are building! Your GitHub repository is not to be found anymore, unfortunately… Can you share, please?
Hello, thank you!
Sorry about that, it was set to private initially. You can check it now.
Very creative use of Gemini! Did you consider using a platform to make calling multiple AI models easier?
Thanks! We did consider orchestration frameworks like LangChain early on, but decided against them for this project. Since we're going deep on a single provider (Gemini) rather than swapping models, the abstraction layer didn't buy us much — and the Google GenAI SDK's native async support and multimodal Part API were exactly what we needed for the character sheet consistency technique. Going direct also meant one less dependency and easier debugging when image generation behaved unexpectedly. For a multi-provider setup it'd be a different call though!
This really resonated with me.
Really appreciate you saying that! It was a fun week of building, hope it sparks some ideas for your own projects.
It was very interesting to read about some of the transformations and adjustments for consistency you used. I wanted to make one suggestion: adding a license to the GitHub code so that it is clear if and how others can use it. I didn’t see a license when I checked it out, but please do let me know if I missed it. MIT seems to be a popular default for many AI-related projects.