
What if anyone could walk in, type a story idea, and walk out with a fully illustrated, personalized comic book powered entirely by AI?
That was the challenge I set for myself at the NVIDIA Hackathon. The result: Magical Comic Book, a GenAI-powered web app that turns natural language prompts into illustrated comic panels in real time. And we won. 🏆
The Idea
The concept was simple on the surface: let users describe a story, and have AI generate both the narrative and the visuals. But building it end-to-end in hackathon time with production-quality output was a different beast entirely.
The Tech Stack
- Frontend: Next.js + React + Redux for a fast, reactive UI with panel-by-panel story rendering
- Backend: Node.js with RESTful APIs connecting the frontend to AI inference pipelines
- Story Generation: NVIDIA Nemotron LLM for narrative text generation and prompt engineering
- Image Synthesis: Stable Diffusion XL for generating comic-style panel illustrations
- Deployment: Vercel for scalable, zero-config frontend deployment
How It Works
- User enters a story prompt: e.g., "A young girl discovers a dragon living in her school library"
- Nemotron generates the story: broken into comic panels with scene descriptions and dialogue
- SDXL renders each panel: using the scene descriptions as image generation prompts
- The UI assembles the comic: panels flow into a readable, styled comic book layout in real time
The Engineering Challenges
Prompt Engineering at Speed
Getting Nemotron to output structured, panel-ready story content consistently required careful prompt design. I built a prompt template system that enforced JSON-structured output β panel number, scene description, character dialogue β so the frontend could render without extra parsing logic.
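To make the idea concrete, here is a minimal sketch of what such a prompt template and validator could look like. The function and field names (`buildStoryPrompt`, `parsePanels`, `scene`, `dialogue`) are hypothetical stand-ins, not the actual project code:

```typescript
// Hypothetical sketch: the system prompt pins the LLM to a JSON schema,
// and a validator rejects anything the frontend couldn't render.
interface Panel {
  panel: number;    // 1-based panel index
  scene: string;    // visual description, reusable as the image prompt
  dialogue: string; // character dialogue for the speech bubble
}

function buildStoryPrompt(idea: string, panelCount = 6): string {
  return [
    "You are a comic book writer. Respond with ONLY a JSON array.",
    `Each element must have keys "panel" (number), "scene" (string), "dialogue" (string).`,
    `Write a ${panelCount}-panel story for: ${idea}`,
  ].join("\n");
}

function parsePanels(raw: string): Panel[] {
  const data = JSON.parse(raw);
  if (!Array.isArray(data)) throw new Error("expected a JSON array of panels");
  return data.map((p, i) => {
    if (typeof p.scene !== "string" || typeof p.dialogue !== "string") {
      throw new Error(`panel ${i + 1} is missing scene or dialogue`);
    }
    return { panel: Number(p.panel ?? i + 1), scene: p.scene, dialogue: p.dialogue };
  });
}
```

Validating at the parse boundary means a malformed LLM response fails loudly in one place, rather than surfacing as a broken panel in the UI.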
Latency vs. Quality
SDXL image generation is not instant. I implemented a streaming panel-reveal approach β panels load progressively as they're generated β so the user experience feels responsive even while the pipeline runs.
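The progressive-reveal idea can be sketched as an async generator that yields panels in reading order as each image finishes. `generatePanelImage` is an assumed wrapper around the SDXL call, not a real API:

```typescript
// Sketch of progressive panel reveal. All image generations start at once,
// but panels are yielded in reading order, so panel 3 never appears
// before panel 2.
type PanelSpec = { panel: number; scene: string };
type RenderedPanel = PanelSpec & { imageUrl: string };

async function* streamPanels(
  specs: PanelSpec[],
  generatePanelImage: (scene: string) => Promise<string>, // assumed SDXL wrapper
): AsyncGenerator<RenderedPanel> {
  const pending = specs.map((s) => generatePanelImage(s.scene));
  for (let i = 0; i < specs.length; i++) {
    yield { ...specs[i], imageUrl: await pending[i] };
  }
}
```

A consumer (a React component or a server-sent-events route) can then append each yielded panel to state, so the comic fills in while later panels are still rendering.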
Reusable GenAI Pipeline Components
I designed the backend as a set of composable pipeline steps: prompt formatting → LLM inference → image prompt extraction → image generation → panel assembly. Each step is decoupled and independently testable, making the architecture easy to extend post-hackathon.
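One way to express that decoupling is to type each stage as an async function from one payload shape to the next and chain them with a generic compose helper. This is an illustrative sketch under that assumption; the stage bodies are stand-ins, not the project's real inference calls:

```typescript
// Each pipeline stage is an async function from one payload to the next.
type Step<In, Out> = (input: In) => Promise<Out>;

// compose() chains two stages into one, preserving the types end to end.
function compose<A, B, C>(first: Step<A, B>, second: Step<B, C>): Step<A, C> {
  return async (input) => second(await first(input));
}

// Stand-in stages (a real pipeline would call Nemotron / SDXL here).
const formatPrompt: Step<string, string> = async (idea) => `PROMPT: ${idea}`;
const extractImagePrompts: Step<string, string[]> = async (story) => [story];

const firstTwoStages = compose(formatPrompt, extractImagePrompts);
```

Because every stage shares the same shape, each one can be unit-tested with plain inputs and swapped out (e.g., a different image model) without touching its neighbors.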
What I Learned
Building a GenAI application under time pressure teaches you things no tutorial can. A few takeaways:
- Structured outputs from LLMs are non-negotiable for any downstream automation. Freeform text is the enemy of reliable pipelines.
- User experience design matters as much as model quality. A slow but beautiful loading experience beats a fast but jarring one.
- Model orchestration is its own engineering discipline. Chaining LLMs and diffusion models reliably requires thinking carefully about error handling, retries, and fallbacks.
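The retry-and-fallback pattern mentioned above can be captured in a small generic wrapper. This is a minimal sketch, assuming any async model call; the name `withRetry` is hypothetical:

```typescript
// Retry an async model call a bounded number of times, then fall back
// (e.g., to a cheaper model or a cached result) before giving up.
async function withRetry<T>(
  call: () => Promise<T>,
  retries = 2,
  fallback?: () => Promise<T>,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await call();
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  if (fallback) return fallback();
  throw lastError;
}
```

Wrapping both the LLM and the diffusion calls this way keeps transient failures from killing an entire comic mid-generation.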
What's Next
I'm exploring adding:
- User accounts and a comic library to save and share creations
- Style selection (manga, superhero, watercolor) to guide SDXL outputs
- Voice narration using a TTS model for an immersive reading experience
If you're curious about the code, check out the GitHub repo. I'd love to hear from other GenAI builders β what challenges have you hit when chaining LLMs with image models?
Drop a comment below 👇