DEV Community

汪小春

One gpt-image-2 call, 9 hairstyle variants: prompt engineering for grid layouts

The first version of our hairstyle preview tool made 8 separate gpt-image-2 API calls — one per hairstyle. It worked. It was also $0.32 per preview, took 40 seconds, and the faces drifted between calls (each generation re-derived the face from the prompt + uploaded image).

This post is about how we cut that to a single API call producing a 9-grid (1 reference + 8 variants) — same face, lower cost, faster, and weirdly easier to prompt.

The 8-call problem

Naive architecture:

grid = Grid()
for hairstyle in ['crew cut', 'mid fade', ...]:  # 8 styles total
    # One generate call per hairstyle: 8 calls, and 8 chances
    # for the model to re-derive the face slightly differently
    img = gpt_image_2.generate(
        prompt=f"User's face with {hairstyle} hairstyle",
        reference=user_selfie,
    )
    grid.add(img)

Three problems compound:

Cost. 8 calls × $0.04 each = $0.32 per preview. We sell at $0.99/test, so the margin holds, but it erodes fast at scale.

Latency. 8 sequential calls take ~40s. Running them in parallel cuts that to ~5s, but rate limits and queue priority make parallelization unreliable. Users see a spinner.

Face drift. Each call independently interprets "user's face with X." The model re-imagines facial proportions slightly differently each time. Side-by-side, the 8 outputs don't look like the same person. UX killer for a "compare hairstyles on YOUR face" tool.

The single-call fix

We rewrote the prompt to request a 9-grid in one shot:

A 3x3 grid showing the same person with 9 different hairstyles.

Grid positions:
[1] reference: original photo, unchanged
[2] Crew Cut
[3] Mid Fade
[4] Wavy Side Part
[5] Caesar Cut
[6] Long Straight
[7] Quiff
[8] Surfer Waves
[9] Buzz Cut

Constraints:
- Same person in all 9 cells (consistent face, age, skin)
- Same lighting and angle across cells
- Only hair varies between cells
- Each cell separated by a thin white border
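A prompt like this is easy to generate programmatically, which keeps the hairstyle list in one place. A minimal sketch, where `build_grid_prompt` and `HAIRSTYLES` are illustrative names rather than our actual codebase:

```python
# Ordered variant list; slot [1] is always the unchanged reference photo.
HAIRSTYLES = [
    "Crew Cut", "Mid Fade", "Wavy Side Part", "Caesar Cut",
    "Long Straight", "Quiff", "Surfer Waves", "Buzz Cut",
]

def build_grid_prompt(hairstyles):
    """Build the 9-grid prompt from an ordered list of 8 variants."""
    if len(hairstyles) != 8:
        raise ValueError("expected exactly 8 variant hairstyles")
    lines = [
        "A 3x3 grid showing the same person with 9 different hairstyles.",
        "",
        "Grid positions:",
        "[1] reference: original photo, unchanged",
    ]
    # Slots 2-9 get the variants in order, so placement is deterministic.
    lines += [f"[{i}] {name}" for i, name in enumerate(hairstyles, start=2)]
    lines += [
        "",
        "Constraints:",
        "- Same person in all 9 cells (consistent face, age, skin)",
        "- Same lighting and angle across cells",
        "- Only hair varies between cells",
        "- Each cell separated by a thin white border",
    ]
    return "\n".join(lines)
```

Because the prompt and the UI read from the same ordered list, reordering hairstyles is a one-line change.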

Three benefits:

1 API call = $0.04, not $0.32. 8x cost reduction.

~6s vs ~40s. Single-call latency, no parallel-queue gambling.

Face consistency by construction. The model treats all 9 cells as one coherent image, so facial features stay identical. No drift.
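Since the model returns one composite image, the UI only has to slice it into cells. A sketch of the crop-box math, assuming equal cell sizes and ignoring the thin white borders for simplicity:

```python
def cell_boxes(width, height, rows=3, cols=3):
    """Map cell number (1..rows*cols, row-major) to a (left, top, right, bottom)
    pixel box, suitable for e.g. Pillow's Image.crop()."""
    cw, ch = width // cols, height // rows
    boxes = {}
    for idx in range(rows * cols):
        r, c = divmod(idx, cols)  # row-major: cell 1 is top-left
        boxes[idx + 1] = (c * cw, r * ch, (c + 1) * cw, (r + 1) * ch)
    return boxes
```

For a 1536×1536 output, cell 1 (the reference) is the top-left 512×512 region, and cell 9 is the bottom-right one.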

Prompt-engineering challenges

It wasn't free. Three things we had to work out:

Layout discipline. Without an explicit "3x3 grid" plus "separate cells" instruction, gpt-image-2 would blend styles together or overlap cells. The thin-white-border instruction was crucial.

Cell ordering. Our first attempt just listed hairstyles "in row-major order" and got seemingly random placement. Switching to numbered slots ("Grid positions: [N] hairstyle") gave deterministic placement, which we needed for the UI to label cells correctly.
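With numbered slots, the UI labels can be derived from the same ordered list the prompt uses, so the prompt and the UI can never disagree. A hypothetical sketch (`slot_labels` is an illustrative name):

```python
def slot_labels(hairstyles):
    """Map grid slot number to its UI label.
    Slot 1 is always the unchanged reference photo; slots 2-9 follow
    the same order as the 'Grid positions' section of the prompt."""
    labels = {1: "Original"}
    labels.update({i: name for i, name in enumerate(hairstyles, start=2)})
    return labels
```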

Hairstyle distinctiveness. Some styles (Crew Cut vs. Buzz Cut) look nearly identical at 1/9th of the image. We had to swap in more visually distinct sets so user choices were meaningful.

What we'd do differently

The 9-grid is locked at 8 variants. If the model could handle "show me 16 styles", we'd offer that. The current cap is real: gpt-image-2 maintains identity well at 9 cells, but less reliably at 16+ (the model is doing more work in less canvas space per cell).

Long-term: per-cell quality + identity preservation will improve as models scale. For now, 8 is the sweet spot.

Try it

If you want to see what 9-grid hairstyle previews look like in practice, AI Omoggle is the tool — single test from $0.99, no photos stored.

I'd love to hear from anyone doing similar single-call multi-variant prompts. The "compose in one image, slice in UI" pattern feels like it generalizes to other AI image use cases.
