App Store screenshots are the highest-leverage marketing asset an app has — and the most painful to maintain. Now multiply that pain by 39 languages and 3 device classes. Doing that by hand is not "tedious," it's impossible to keep in sync.
So I built a pipeline that turns one command into ~800 finished, captioned, device-correct screenshots for Cadento, my SwiftUI focus timer. Here's the architecture.
The scale problem
The output target:
| Device | Shots per language | Languages | Total |
|---|---|---|---|
| iPhone 6.9" | 10 | 39 | 390 |
| iPad 13" | 8 | 39 | 312 |
| Apple Watch | 3 | 39 | 117 |
That's 819 images, each needing the right language UI and the right localized caption. Change one screen design and every image counted above has to be regenerated. Hand-editing is off the table; the only sane answer is "rebuild everything from source on demand."
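As a quick sanity check on the table, the totals can be reproduced in a few lines of Python. The names `SHOTS_PER_LANGUAGE`, `LANGUAGE_COUNT`, and `total_images` are made up for this sketch; only the numbers come from the table.

```python
# Shots per language for each device class, copied from the table above.
SHOTS_PER_LANGUAGE = {'iPhone 6.9"': 10, 'iPad 13"': 8, "Apple Watch": 3}
LANGUAGE_COUNT = 39


def total_images(shots_per_language: dict[str, int], languages: int) -> int:
    """Sum over devices of (shots per language x number of languages)."""
    return sum(shots * languages for shots in shots_per_language.values())


# Per-device totals, matching the "Total" column.
per_device = {
    device: shots * LANGUAGE_COUNT
    for device, shots in SHOTS_PER_LANGUAGE.items()
}
```

Adding one language costs 21 more images (10 + 8 + 3), which is why the count grows faster than it first appears.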
The pipeline, end to end
XCUITest (per language) → raw localized PNGs
↓
extract from .xcresult
↓
Python + Pillow: compose background + device frame + caption
↓
AppStore画像/<device>/<lang>/1..N.png (exact store dimensions)
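Five stages in total: the four steps in the diagram, plus a separate path for Live Activity and Watch shots that feeds the same compositor. Each is independently re-runnable.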
Stage 1 — Capture real localized screens with XCUITest
The key insight: don't fake screenshots, drive the real app. A UI test launches the app, forces a specific language/locale, navigates to each screen, and snapshots it.
Language and locale come in as environment variables so one test file covers every language:
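// Inside an XCTestCase test method.
let app = XCUIApplication()
// Language and locale arrive through the test runner's environment.
let lang = ProcessInfo.processInfo.environment["SHOT_LANG"] ?? "en"
let locale = ProcessInfo.processInfo.environment["SHOT_LOCALE"] ?? "en_US"
// Override the app's language and region for this launch only.
app.launchArguments += ["-AppleLanguages", "(\(lang))"]
app.launchArguments += ["-AppleLocale", locale]
app.launch()
// navigate + snapshot each screen
let shot = XCTAttachment(screenshot: app.screenshot())
shot.lifetime = .keepAlways  // keep the attachment even when the test passes
add(shot)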
A shell loop runs this once per language. Because it's the actual app, the screenshots are guaranteed to match what users see — including RTL flips for Arabic/Hebrew and text expansion in German.
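The per-language loop could look roughly like this, written in Python rather than shell so it matches the rest of the tooling. This is a sketch under assumptions: the scheme name, simulator name, `results` directory, and the four-language subset are placeholders, not values from the real project. It relies on `xcodebuild` forwarding environment variables prefixed with `TEST_RUNNER_` to the test runner with the prefix stripped, which is worth confirming against `man xcodebuild` for your Xcode version.

```python
import os
import subprocess

# Placeholder values, not taken from the real project.
SCHEME = "CadentoUITests"
DESTINATION = "platform=iOS Simulator,name=iPhone 16 Pro Max"
LANGUAGES = {"en": "en_US", "de": "de_DE", "ja": "ja_JP", "ar": "ar_SA"}


def build_test_command(lang: str, result_dir: str = "results") -> list[str]:
    """Build the xcodebuild invocation for one language."""
    return [
        "xcodebuild", "test",
        "-scheme", SCHEME,
        "-destination", DESTINATION,
        "-resultBundlePath", os.path.join(result_dir, f"{lang}.xcresult"),
    ]


def build_test_env(lang: str, locale: str) -> dict[str, str]:
    """Copy the current environment and add the two variables the UI test
    reads. The TEST_RUNNER_ prefix is expected to be stripped by xcodebuild,
    so the test sees SHOT_LANG and SHOT_LOCALE."""
    env = dict(os.environ)
    env["TEST_RUNNER_SHOT_LANG"] = lang
    env["TEST_RUNNER_SHOT_LOCALE"] = locale
    return env


def capture_all(languages: dict[str, str]) -> None:
    """Run the UI test once per language, stopping at the first failure."""
    for lang, locale in languages.items():
        subprocess.run(
            build_test_command(lang),
            env=build_test_env(lang, locale),
            check=True,
        )
```

Calling `capture_all(LANGUAGES)` would produce one `.xcresult` bundle per language. Since `xcodebuild` refuses to overwrite an existing result bundle, a real loop would probably delete the old bundle first.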
Stage 2 — Extract PNGs from the .xcresult
XCUITest buries screenshots inside an .xcresult bundle. A small Python script walks the result and pulls out the raw PNGs into a flat per-language folder. Nothing clever — just plumbing so the next stage has clean inputs.
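The flattening half of that plumbing might look like the sketch below. It assumes the attachments have already been exported from the `.xcresult` into a directory (for example with `xcrun xcresulttool`, whose exact subcommands vary by Xcode version), and that attachment file names sort in capture order. `flatten_pngs` and both directory arguments are hypothetical names.

```python
import shutil
from pathlib import Path


def flatten_pngs(export_dir: Path, out_dir: Path) -> list[Path]:
    """Copy every PNG under export_dir into out_dir as 1.png, 2.png, ...

    Files are ordered by file name, so this assumes names sort in capture
    order (for example "01_home.png", "02_timer.png"). out_dir should live
    outside export_dir so re-runs do not pick up earlier output.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    sources = sorted(export_dir.rglob("*.png"), key=lambda p: p.name)
    copied = []
    for index, source in enumerate(sources, start=1):
        target = out_dir / f"{index}.png"
        shutil.copyfile(source, target)
        copied.append(target)
    return copied
```

Re-running it overwrites the numbered files in place, so the stage stays safe to repeat.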
Stage 3 — Compose with Python + Pillow
This is where raw screens become marketing. For each shot, Pillow:
- Draws the branded background (generated separately, app-themed gradients)
- Places the device frame
- Drops the raw screenshot into the frame at the correct offset
- Renders the localized caption on top — pulled from a per-language strings map
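A minimal sketch of those four steps with Pillow follows. Everything named here (`Box`, `fit_inside`, `compose`, `screen_area`, the font-size and caption-position ratios) is an assumption for illustration. It also assumes the device frame is a PNG the same size as the canvas with a transparent screen cutout, so the frame is layered over the screenshot rather than under it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: int
    top: int
    width: int
    height: int


def fit_inside(src_w: int, src_h: int, area: Box) -> Box:
    """Scale (src_w, src_h) to fit inside area, keeping the aspect ratio,
    and center it. Returns the target box in canvas pixels."""
    scale = min(area.width / src_w, area.height / src_h)
    width, height = round(src_w * scale), round(src_h * scale)
    return Box(
        left=area.left + (area.width - width) // 2,
        top=area.top + (area.height - height) // 2,
        width=width,
        height=height,
    )


def compose(background_path, frame_path, shot_path, caption, font_path,
            screen_area: Box, out_path) -> None:
    """Layer background, screenshot, device frame, and caption into one PNG."""
    # Imported here so the layout math above works without Pillow installed.
    from PIL import Image, ImageDraw, ImageFont

    canvas = Image.open(background_path).convert("RGBA")
    shot = Image.open(shot_path).convert("RGBA")
    frame = Image.open(frame_path).convert("RGBA")

    # Scale the raw screenshot into the frame's screen area and paste it.
    box = fit_inside(shot.width, shot.height, screen_area)
    shot = shot.resize((box.width, box.height), Image.LANCZOS)
    canvas.paste(shot, (box.left, box.top))
    # The frame goes on top so its bezel hides the screenshot's square corners.
    canvas.alpha_composite(frame)

    draw = ImageDraw.Draw(canvas)
    font = ImageFont.truetype(font_path, size=canvas.width // 14)
    # anchor="ma": x is the horizontal middle, y is the top of the text.
    draw.text((canvas.width // 2, canvas.height // 20), caption,
              font=font, fill="white", anchor="ma")
    canvas.convert("RGB").save(out_path)
```

Keeping the geometry in `fit_inside` separate from the drawing makes the offsets easy to check per device without rendering anything.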
The caption text is itself localized (39 languages of ASO copy), so the marketing message reads natively, not just the UI underneath it. Font fallback matters here: CJK, Arabic, Hebrew, Thai, and Devanagari all need the right font or you get tofu (□□□).
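One simple way to handle fallback is an explicit language-to-font table, sketched below. The script names, the Noto file names, and `font_for` are placeholders chosen for illustration; they are not the fonts this pipeline actually ships with.

```python
# Placeholder font files, one per script that the default font cannot cover.
FONT_BY_SCRIPT = {
    "arabic": "NotoSansArabic-Bold.ttf",
    "hebrew": "NotoSansHebrew-Bold.ttf",
    "thai": "NotoSansThai-Bold.ttf",
    "devanagari": "NotoSansDevanagari-Bold.ttf",
    "japanese": "NotoSansJP-Bold.ttf",
    "korean": "NotoSansKR-Bold.ttf",
    "chinese_simplified": "NotoSansSC-Bold.ttf",
    "chinese_traditional": "NotoSansTC-Bold.ttf",
}

SCRIPT_BY_LANGUAGE = {
    "ar": "arabic",
    "he": "hebrew",
    "th": "thai",
    "hi": "devanagari",
    "ja": "japanese",
    "ko": "korean",
    "zh-Hans": "chinese_simplified",
    "zh-Hant": "chinese_traditional",
}

DEFAULT_FONT = "NotoSans-Bold.ttf"


def font_for(lang: str) -> str:
    """Pick a font file for a language code.

    Tries the full code first ("zh-Hant"), then the base language ("ar" for
    "ar-SA"), and falls back to the default font otherwise.
    """
    script = SCRIPT_BY_LANGUAGE.get(lang) or SCRIPT_BY_LANGUAGE.get(lang.split("-")[0])
    return FONT_BY_SCRIPT.get(script, DEFAULT_FONT)
```

Picking the right font file is only half of it for Arabic and Devanagari: those scripts also need text shaping, which in Pillow means a build with the Raqm layout engine available.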
Stage 4 — Live Activity & Watch shots
Live Activity (Dynamic Island / lock screen) and Apple Watch screens are generated through their own paths and folded into the same compositor, so the final set is consistent across all surfaces.
Stage 5 — Output to exact store dimensions
Everything lands in a predictable tree at the exact pixel sizes App Store Connect requires:
AppStore画像/iPhone_6.9/<lang>/1..10.png (1320×2868)
AppStore画像/iPad_13/<lang>/1..8.png (2064×2752)
AppStore画像/AppleWatch/<lang>/1..3.png (410×502)
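Since App Store Connect rejects images with the wrong dimensions, a cheap pre-upload check is worth having. The sketch below reads the width and height straight from each PNG's IHDR chunk using only the standard library. `png_size`, `wrong_sizes`, and `EXPECTED_SIZES` are hypothetical names; the folder names and sizes are the ones listed above.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Required size per device folder, from the output tree above.
EXPECTED_SIZES = {
    "iPhone_6.9": (1320, 2868),
    "iPad_13": (2064, 2752),
    "AppleWatch": (410, 502),
}


def png_size(path: Path) -> tuple[int, int]:
    """Read (width, height) from a PNG's IHDR chunk without decoding it."""
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a valid PNG")
    # Width and height are big-endian 32-bit integers at bytes 16..23.
    return struct.unpack(">II", header[16:24])


def wrong_sizes(root: Path) -> list[Path]:
    """Return every PNG under root/<device>/ whose size is not the expected one."""
    bad = []
    for device, expected in EXPECTED_SIZES.items():
        for png in sorted((root / device).rglob("*.png")):
            if png_size(png) != expected:
                bad.append(png)
    return bad
```

Running `wrong_sizes(Path("AppStore画像"))` and failing the build on a non-empty result catches a mis-sized frame before it reaches the upload step.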
From here it's a straight upload (I drive App Store Connect's API to swap a single device's set without touching the others — but that's another post).
Lessons from running it for real
- Drive the real app, don't mock. The whole value is that screenshots can't lie about what the UI does in each language.
- Environment variables > 39 test targets. One parameterized UI test beats copy-pasted code every time.
- Font fallback is not optional. Test the hardest scripts (Arabic, Thai, Hindi, CJK) early or you'll ship boxes.
- Make every stage idempotent. A design change should be one command away from 819 fresh images, not a weekend.
- Separate UI capture from caption rendering. Redesign the screen? Re-run stage 1. Rewrite the marketing copy? Re-run stage 3. They shouldn't be coupled.
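The last two lessons can be sketched as a tiny stage runner: name the earliest stage whose inputs changed, and everything from there onward re-runs. The stage names and the `stages_from` and `run` helpers are hypothetical, and the list is condensed to the four steps in the pipeline diagram.

```python
from collections.abc import Callable

# Hypothetical stage names, in pipeline order.
STAGES = ["capture", "extract", "compose", "export"]


def stages_from(start: str) -> list[str]:
    """Stages to run when `start` is the earliest stage whose inputs changed."""
    if start not in STAGES:
        raise ValueError(f"unknown stage: {start!r}")
    return STAGES[STAGES.index(start):]


def run(start: str, actions: dict[str, Callable[[], None]]) -> list[str]:
    """Run each stage from `start` onward, in order, and return what ran."""
    ran = []
    for name in stages_from(start):
        actions[name]()
        ran.append(name)
    return ran
```

With this shape, a screen redesign is `run("capture", actions)` and a copy rewrite is `run("compose", actions)`, which skips the slow simulator runs entirely.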
The payoff: when I change a screen or a tagline, I'm not dreading a manual marathon. I run the pipeline, and the entire localized store presence updates itself.
I'm a solo iOS developer from Japan building small, deeply localized apps. Cadento (focus timer, 39 languages) is on the App Store. Ask me anything about the pipeline in the comments.