Mee Mee Alainmar

Posted on Jun 15

Building Perri: A Comic Strip Generator

#huggingface #ai #python

Meet Perri Comic Generator, a lightweight, single-panel comic creator that merges LLM-driven storytelling with real-time diffusion models. By pairing a Gradio frontend with a serverless backend, Perri orchestrates a seamless pipeline: it takes a simple story seed, structures it into a panel description, generates the art, and burns the dialogue right onto the final image.

The best part? It achieves all of this without massive, resource-heavy infrastructure. Every AI model under Perri's hood has fewer than 32 billion parameters (the largest has 8 billion), proving that you don't need giant, compute-heavy models to build something amazing.

Here is a look inside the architecture and tech stack that powers Perri.

The Technical Architecture

Perri is built using a clean separation of concerns, splitting the heavy lifting of generation away from the user interface.

1. The Frontend (`app.py`)

Built using Gradio 6.16.0, the frontend provides a sleek, user-friendly interface for inputting story seeds. To match the creative spirit of comics, the UI utilizes a custom theme, incorporating a vintage aesthetic complete with star-twinkle CSS overlays.

The frontend's main jobs are:

Capturing the user's initial prompt.
Shipping the payload to the backend infrastructure via secure API requests.
Decoding the backend's response—a Base64-encoded JPEG—and rendering it within the Gradio image component.
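
The round trip described above can be sketched with the standard library alone. This is a minimal sketch, not Perri's actual `app.py`: the JSON field names (`seed`, `image_b64`), the bearer-token header, and the helper names `build_request`, `decode_panel`, and `fetch_panel` are all assumptions made for illustration.

```python
import base64
import json
import urllib.request

JPEG_MAGIC = b"\xff\xd8\xff"  # every JPEG file starts with these bytes


def build_request(endpoint_url: str, story_seed: str, token: str) -> urllib.request.Request:
    """Package the story seed as a JSON POST (field names are hypothetical)."""
    body = json.dumps({"seed": story_seed}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def decode_panel(response_json: str) -> bytes:
    """Extract and decode the Base64 JPEG from the backend's JSON reply."""
    payload = json.loads(response_json)
    image_bytes = base64.b64decode(payload["image_b64"], validate=True)
    if not image_bytes.startswith(JPEG_MAGIC):
        raise ValueError("backend response is not a JPEG image")
    return image_bytes


def fetch_panel(endpoint_url: str, story_seed: str, token: str) -> bytes:
    """Send the seed to the backend and return the raw JPEG bytes."""
    request = build_request(endpoint_url, story_seed, token)
    with urllib.request.urlopen(request, timeout=120) as response:
        return decode_panel(response.read().decode("utf-8"))
```

Checking the JPEG magic bytes before rendering is a cheap way to surface a backend error as a readable message instead of a broken image. The returned bytes could then be opened with Pillow and handed to the Gradio image component.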

2. The Backend Orchestrator (`orchestrator.py`)

The orchestrator acts as the brain of the operation, executing three distinct phases in the lifecycle of a single comic panel:

Script Generation: It refines the user's raw prompt into a highly structured visual script and dialogue snippet using meta-llama/Meta-Llama-3-8B-Instruct.
Image Generation: It passes the visual description to stabilityai/sdxl-turbo to synthesize the retro comic art.
Dialogue Overlay Composition: Instead of relying on separate text captions, the orchestrator dynamically draws the generated dialogue directly onto the JPEG image, ensuring an authentic comic book feel.
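
The three phases above can be read as a simple function chain. The sketch below is an illustration of that control flow only, not the contents of `orchestrator.py`: `PanelScript`, `make_panel`, and the three callables are hypothetical names, and the real model calls are left to whatever the caller injects.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PanelScript:
    visual_description: str  # prompt handed to the image model
    dialogue: str            # text drawn onto the finished panel


def make_panel(
    story_seed: str,
    write_script: Callable[[str], PanelScript],
    render_image: Callable[[str], bytes],
    overlay_dialogue: Callable[[bytes, str], bytes],
) -> bytes:
    """Run the three phases in order and return the final image bytes."""
    script = write_script(story_seed)                    # phase 1: LLM scripting
    raw_image = render_image(script.visual_description)  # phase 2: image generation
    return overlay_dialogue(raw_image, script.dialogue)  # phase 3: dialogue overlay
```

Passing the phases in as callables keeps each one swappable, so the chain can be exercised with stub functions before any GPU-backed model is wired in.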

The Lean AI Stack (Under 32B Parameters)

Modern AI development often leans toward massive foundational models, but Perri prioritizes speed, efficiency, and cost-effectiveness by utilizing specialized models that punch well above their weight class.

| Model Role | Model Used | Parameter Size | Why It Was Chosen |
| --- | --- | --- | --- |
| Story & Scripting | `meta-llama/Meta-Llama-3-8B-Instruct` | 8 billion | Delivers precise, structured instruction-following for scripting without the latency of larger LLMs. |
| Art Generation | `stabilityai/sdxl-turbo` | ~3.5 billion | A diffusion model distilled with adversarial training so it can generate an image in a single step, which keeps image generation fast. |

By keeping all models well under the 32B threshold, the entire pipeline can run on highly optimized, consumer-accessible cloud GPUs, keeping latency low and the user experience snappy.
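
A back-of-the-envelope estimate shows why this budget matters. The sketch below assumes 16-bit weights (2 bytes per parameter) and counts weights only; activations, the LLM's KV cache, and framework overhead are not included, so real usage will be higher. The helper name `weights_gb` is made up for this example.

```python
BYTES_PER_PARAM_FP16 = 2  # assumption: 16-bit weights, no quantization


def weights_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# Parameter counts taken from the table above.
models = {
    "meta-llama/Meta-Llama-3-8B-Instruct": 8e9,
    "stabilityai/sdxl-turbo": 3.5e9,
}

for name, params in models.items():
    print(f"{name}: ~{weights_gb(params):.1f} GB")
print(f"total: ~{sum(weights_gb(p) for p in models.values()):.1f} GB")
```

Under these assumptions the weights come to roughly 16 GB for the LLM and 7 GB for the image model, about 23 GB combined.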

Deployment & Infrastructure

Perri runs in the cloud on a decoupled infrastructure, with the frontend and backend hosted separately:

Hugging Face Spaces: Hosts the Gradio frontend under an open-source MIT license, providing an easily shareable link for the community.
Modal Labs (MODAL_ENDPOINT_URL): Powers the backend worker pool. Modal allows the image generation and orchestrator logic to scale to zero when not in use, saving compute costs while offering rapid cold-start times when a user requests a comic.

Environment Variables Required

To bridge the frontend and backend securely, the application relies on two key environment secrets:

HF_TOKEN: For authenticating requests to the Hugging Face Hub and Spaces.
MODAL_ENDPOINT_URL: Directs the frontend UI to the serverless backend worker.
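
One way to read these two secrets at startup is to fail fast when either is missing. This is a hedged sketch rather than Perri's own code: `load_secrets` and `REQUIRED_SECRETS` are hypothetical names, and only the two variable names come from the list above.

```python
import os
from typing import Mapping

REQUIRED_SECRETS = ("HF_TOKEN", "MODAL_ENDPOINT_URL")


def load_secrets(environ: Mapping[str, str] = os.environ) -> dict:
    """Return the required secrets, or raise with every missing name listed."""
    missing = [name for name in REQUIRED_SECRETS if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_SECRETS}
```

Reporting all missing names in one error avoids the fix-one-rerun-fix-another loop when a fresh deployment has no secrets configured yet.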

Running Perri Locally

Want to experiment with the theme or modify the layout? You can spin up the frontend locally in just a few steps.

Clone the repository and install your dependencies (including gradio).
Set up your .env file with your MODAL_ENDPOINT_URL and HF_TOKEN.
Launch the application:

python app.py

Wrapping Up

Perri Comic Generator demonstrates how small, specialized models can be chained together to build rich, creative applications. By leveraging an 8B LLM for structuring thoughts and a fast Turbo diffusion model for generation, Perri delivers a nostalgic, automated comic-creation experience without the overhead of massive enterprise AI infrastructure.
