The first version of our hairstyle preview tool made 8 separate gpt-image-2 API calls — one per hairstyle. It worked. It was also $0.32 per preview, took 40 seconds, and the faces drifted between calls (each generation re-derived the face from the prompt + uploaded image).
This post is about how we cut that to a single API call producing a 9-grid (1 reference + 8 variants) — same face, lower cost, faster, and weirdly easier to prompt.
The 8-call problem
Naive architecture:
for hairstyle in ['crew cut', 'mid fade', ...]:
    img = gpt_image_2.generate(
        prompt=f"User's face with {hairstyle} hairstyle",
        reference=user_selfie,
    )
    grid.add(img)
Three problems compound:
Cost. 8 calls × $0.04 each = $0.32. We sell at $0.99/test, so the margin holds, but API costs eat it fast at scale.
Latency. 8 sequential calls = ~40s. Running them in parallel cuts that to ~5s, but rate limits and queue priority make parallelization unreliable. Either way, users see a spinner.
Face drift. Each call independently interprets "user's face with X." The model re-imagines facial proportions slightly differently each time. Side-by-side, the 8 outputs don't look like the same person. UX killer for a "compare hairstyles on YOUR face" tool.
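Before abandoning the multi-call approach, the obvious mitigation for latency is fanning the 8 calls out in parallel. A minimal sketch of that pattern, with the image call stubbed out (`generate_variant` is a placeholder name; the post doesn't show the real gpt-image-2 client):

```python
from concurrent.futures import ThreadPoolExecutor

HAIRSTYLES = ['crew cut', 'mid fade', 'wavy side part', 'caesar cut',
              'long straight', 'quiff', 'surfer waves', 'buzz cut']

def generate_variant(hairstyle):
    # Placeholder for the real gpt_image_2.generate(...) call.
    return f"image:{hairstyle}"

def generate_all_parallel(styles):
    # Fan out one request per style. executor.map preserves input order,
    # which the UI relies on to label each cell correctly.
    with ThreadPoolExecutor(max_workers=len(styles)) as pool:
        return list(pool.map(generate_variant, styles))
```

Note this only attacks latency: it does nothing for cost, and nothing for face drift, since each call still re-derives the face independently.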
The single-call fix
We rewrote the prompt to request a 9-grid in one shot:
A 3x3 grid showing the same person with 9 different hairstyles.
Grid positions:
[1] reference: original photo, unchanged
[2] Crew Cut
[3] Mid Fade
[4] Wavy Side Part
[5] Caesar Cut
[6] Long Straight
[7] Quiff
[8] Surfer Waves
[9] Buzz Cut
Constraints:
- Same person in all 9 cells (consistent face, age, skin)
- Same lighting and angle across cells
- Only hair varies between cells
- Each cell separated by a thin white border
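In practice we don't hand-write that prompt; it's assembled from the style list. A minimal sketch of the assembly, assuming a fixed 8-style set (`build_grid_prompt` is a hypothetical helper name, not from the post):

```python
HAIRSTYLES = ['Crew Cut', 'Mid Fade', 'Wavy Side Part', 'Caesar Cut',
              'Long Straight', 'Quiff', 'Surfer Waves', 'Buzz Cut']

def build_grid_prompt(styles):
    # Slot [1] is always the unchanged reference photo; variants fill [2]-[9].
    slots = ["[1] reference: original photo, unchanged"]
    slots += [f"[{i}] {style}" for i, style in enumerate(styles, start=2)]
    return (
        "A 3x3 grid showing the same person with 9 different hairstyles.\n"
        "Grid positions:\n" + "\n".join(slots) + "\n"
        "Constraints:\n"
        "- Same person in all 9 cells (consistent face, age, skin)\n"
        "- Same lighting and angle across cells\n"
        "- Only hair varies between cells\n"
        "- Each cell separated by a thin white border"
    )
```

Keeping the slot numbering in code rather than prose means the UI labels and the prompt can never drift out of sync.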
Three benefits:
1 API call = $0.04, not $0.32. 8x cost reduction.
~6s vs ~40s. Single-call latency, no parallel-queue gambling.
Face consistency by construction. The model treats all 9 cells as one coherent image, so facial features stay identical. No drift.
Prompt-engineering challenges
It wasn't free. Three things we had to work out:
Layout discipline. Without an explicit "3x3 grid" plus "separate cells" instruction, gpt-image-2 would blend styles together or overlap cells. The thin white border instruction was crucial.
Cell ordering. Our first attempt, "list hairstyles in row-major order", produced random placement. Switching to "Grid positions: [N] hairstyle" with numbered slots gave deterministic placement, which we needed for the UI to label cells correctly.
Hairstyle distinctiveness. Some styles (Crew Cut vs Buzz Cut) look similar at 1/9th of an image. We had to swap in more visually-distinct sets so user choices were meaningful.
What we'd do differently
The 9-grid is locked at 8 variants. If the model could reliably handle "show me 16 styles", we'd offer that. The current cap is real: gpt-image-2 maintains identity well at 9 cells, less reliably at 16+. (The model is doing more work in less canvas space per cell.)
Long-term: per-cell quality + identity preservation will improve as models scale. For now, 8 is the sweet spot.
Try it
If you want to see what 9-grid hairstyle previews look like in practice, AI Omoggle is the tool — single test from $0.99, no photos stored.
I'd love to hear from anyone doing similar single-call multi-variant prompts. The "compose in one image, slice in UI" pattern feels like it generalizes to other AI image use cases.
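The "slice in UI" half of that pattern is just grid arithmetic: given the image dimensions, compute row-major crop boxes for each cell. A minimal sketch (`cell_boxes` is a hypothetical helper name; the boxes feed straight into something like Pillow's `Image.crop`):

```python
def cell_boxes(width, height, rows=3, cols=3):
    # Row-major (left, top, right, bottom) crop boxes, one per grid cell.
    # Box 0 is the reference cell; boxes 1-8 are the hairstyle variants.
    cw, ch = width // cols, height // rows
    return [
        (c * cw, r * ch, (c + 1) * cw, (r + 1) * ch)
        for r in range(rows)
        for c in range(cols)
    ]
```

Because the numbered-slot prompt makes placement deterministic, box index N always corresponds to grid position [N+1], so labeling the sliced cells is a straight lookup.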