Struggling with Research Figures? Here's How Multi-Agent Collaboration Gets It Right
The Problem Every Researcher Knows Too Well
Anyone who's done research knows this pain: creating a single figure from concept to completion can be more exhausting than writing the actual paper. You need logical structure, data precision, and style compliance—miss any one of these, and you're back to the drawing board.
Single-model AI generation tools often produce beautiful images with broken logic, or logically sound diagrams that look terrible, or worst of all—figures where all the proportions are completely off.
PaperBanana solved this problem, and it works remarkably well. The key insight? Break the task into multiple roles and let an AI team collaborate.
Why Traditional AI Falls Short
Many assume that throwing a large language model at the problem should work. But research figures aren't ordinary illustrations—they need to express logic accurately, preserve data precision, and meet the aesthetic standards of academic venues.
A single model can't nail all three at once. In practice, something always gives: the logic, the aesthetics, or the numerical proportions.
This is the core pain point of research figure generation, and exactly why solutions like PaperBanana emerged.
PaperBanana's Five-Role Collaboration
PaperBanana's design philosophy is simple: Split the generation task into five specialized roles, let each focus on what they do best, then collaborate iteratively.
The Visual Workflow
1. Retriever — The Inspiration Board
The Retriever searches through a curated reference database to find the most relevant examples.
It focuses on visual structure matching, ensuring that subsequent generation has reliable layout references to work from.
Think of it as a designer browsing templates before starting to sketch.
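To make the idea concrete, here is a minimal retrieval sketch. This is not PaperBanana's actual implementation: the `Reference` type, the tag-based representation, and the Jaccard scoring are illustrative assumptions standing in for whatever embedding-based matching the real system uses.

```python
from dataclasses import dataclass

@dataclass
class Reference:
    name: str
    tags: set  # coarse structural features, e.g. {"pipeline", "left-to-right"}

def retrieve(query_tags, library, top_k=2):
    """Rank reference figures by tag overlap with the query (Jaccard similarity)."""
    def score(ref):
        union = query_tags | ref.tags
        return len(query_tags & ref.tags) / len(union) if union else 0.0
    return sorted(library, key=score, reverse=True)[:top_k]

library = [
    Reference("transformer-arch", {"pipeline", "blocks", "left-to-right"}),
    Reference("bar-chart", {"chart", "bars", "axes"}),
    Reference("agent-loop", {"pipeline", "cycle", "feedback"}),
]

# A query about a feedback-driven pipeline surfaces the structurally closest layouts.
best = retrieve({"pipeline", "feedback"}, library)
```

The point is only that retrieval ranks by *structural* similarity, so the downstream roles start from layouts that already resemble the target figure.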
2. Planner — The Skeleton Designer
The Planner is the core brain. It transforms paper descriptions and figure objectives into detailed figure plans, including:
Figure components (nodes/modules)
Logical relationships and arrow directions between components
Spatial layout suggestions
Labels, annotations, etc.
The Planner's core job is to provide the skeleton, preventing the generation from going off the rails.
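A figure plan like the one described above can be sketched as a small data structure. The class and field names here are hypothetical, but they capture the essentials: nodes, directed edges, layout hints, and a sanity check that keeps the plan internally consistent.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    label: str
    position: tuple  # suggested (col, row) slot in a coarse layout grid

@dataclass
class Edge:
    source: str   # node id the arrow starts from
    target: str   # node id the arrow points to
    label: str = ""

@dataclass
class FigurePlan:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def validate(self):
        """Every edge must connect two declared nodes—no dangling arrows."""
        ids = {n.id for n in self.nodes}
        return all(e.source in ids and e.target in ids for e in self.edges)

plan = FigurePlan(
    nodes=[Node("enc", "Encoder", (0, 0)), Node("dec", "Decoder", (1, 0))],
    edges=[Edge("enc", "dec", "latent z")],
)
```

Having an explicit, validatable plan is what lets the later roles (and the Critic) check the figure against the intended logic instead of eyeballing pixels.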
3. Stylist — The Aesthetic Director
With the skeleton in place, the Stylist handles the aesthetics.
It extracts colors, fonts, line weights, and shapes from reference examples, optimizing the Planner's output to meet journal standards.
NeurIPS and Nature have different figure styles—the Stylist ensures generated figures comply with academic norms.
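One simple way to model venue-specific styling is a shared default overlaid with per-venue overrides. The specific fonts, line weights, and palette values below are made up for illustration, not taken from any venue's real style guide.

```python
# Shared defaults every figure starts from.
BASE_STYLE = {"font": "Helvetica", "line_width": 1.0,
              "palette": ["#1f77b4", "#ff7f0e"]}

# Hypothetical venue overrides; only the keys a venue cares about are listed.
VENUE_OVERRIDES = {
    "nature": {"font": "Arial", "line_width": 0.75},
    "neurips": {"palette": ["#4c72b0", "#dd8452", "#55a868"]},
}

def style_for(venue):
    """Overlay a venue's overrides on the shared defaults."""
    return {**BASE_STYLE, **VENUE_OVERRIDES.get(venue, {})}
```

This keeps the aesthetic decisions in one declarative place, so switching a figure from one venue's look to another's is a one-line change rather than a redraw.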
4. Visualizer — The Executor
The Visualizer generates figures based on the standardized plan:
Method figures → Rendered using high-quality image generation models
Data charts → Outputs reproducible Matplotlib code
This means generated figures aren't just pretty—they're directly usable as research materials, reproducible and modifiable.
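The "reproducible Matplotlib code" idea can be sketched as a generator that emits a standalone script rather than a static image. The function name and template below are assumptions, but they show why code output matters: the result can be re-run, restyled, and version-controlled.

```python
def render_chart_code(title, labels, values):
    """Emit a self-contained Matplotlib script that regenerates the chart."""
    return f'''import matplotlib.pyplot as plt

labels = {labels!r}
values = {values!r}

fig, ax = plt.subplots(figsize=(4, 3))
ax.bar(labels, values)
ax.set_title({title!r})
fig.tight_layout()
fig.savefig("figure.pdf")
'''

# The output is Python source, not pixels—edit the numbers and re-run.
script = render_chart_code("Accuracy by method", ["baseline", "ours"], [71.2, 84.5])
```

Because the artifact is source code, a reviewer's "change the y-axis" request becomes a one-line edit instead of a regeneration gamble.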
5. Critic — The QA/Feedback Loop
The Critic is key to closing the loop. It checks whether the figure faithfully reflects the text, whether it's clear, and whether it meets style specifications.
If unsatisfied, it provides revision suggestions, prompting the Planner/Visualizer to iterate. Usually 2–3 rounds produce high-quality figures.
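The render-critique-revise loop described above fits in a few lines. The function below is a generic sketch of such a loop, with toy stand-ins for the render and critique steps; it is not PaperBanana's code, and `max_rounds=3` mirrors the 2–3 rounds mentioned in the text.

```python
def generate_with_critique(plan, render, critique, max_rounds=3):
    """Render, critique, and revise until the critic passes or rounds run out."""
    figure = render(plan)
    for _ in range(max_rounds):
        verdict = critique(figure)
        if verdict["passed"]:
            break
        # Fold the feedback into a revised plan, then re-render.
        plan = {**plan, "revisions": plan.get("revisions", 0) + 1}
        figure = render(plan)
    return figure

# Toy stand-ins: the critic fails until one revision has been applied.
render = lambda plan: {"plan": plan}
critique = lambda fig: {"passed": fig["plan"].get("revisions", 0) >= 1,
                        "notes": "arrow directions unclear"}

result = generate_with_critique({"nodes": ["A", "B"]}, render, critique)
```

The bounded loop is the important design choice: quality becomes controllable without risking an endless revision cycle.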
Why Multi-Role Collaboration Works
Compared to single-model end-to-end generation, PaperBanana has three major advantages:
Reference-driven: The Retriever provides structural and stylistic examples, making generation more reliable
Clear division of labor: Logic, style, and rendering are separated, avoiding the chaos of black-box generation
Closed-loop self-checking: Critic + iteration makes figure quality controllable
In other words, this is a process innovation for AI-assisted research figure creation. In experiments, PaperBanana significantly outperformed baselines in fidelity, readability, and aesthetics.
Beyond Academic Figures
This multi-role collaboration pattern isn't limited to academic illustrations.
For flowcharts, experimental design diagrams, teaching demonstrations, automated data visualization, and even complex tasks like code generation and decision planning, multi-agent collaboration proves more reliable.