Local iPhone Stable Diffusion 1.5 Benchmark - ridiculously fast generations!

#coreml #ai #stablediffusion #aiart

I have been testing how far Stable Diffusion 1.5 can be pushed when running locally on an iPhone 17, and how good the results are.

For this benchmark I used PhoneDiffusion with three SD 1.5 model packs:

CyberRealistic
DreamShaper 8 LCM
Realistic Vision V5.1 Hyper

Each pack has different assumptions baked into its recommended settings, scheduler and step count. That makes them a good small test set for local image generation, because they represent three different modes: quality pass, fast iteration and very fast draft generation.

The Benchmark

The test was intentionally simple.

I used three prompts:

a cyberpunk penthouse
an anime forest elder
a futuristic cybertruck

Each model generated three images per prompt, for 27 images in total (3 models x 3 prompts x 3 images). For the final comparison, I selected the strongest visual result from each model/prompt pair, giving 9 selected outputs.
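The shape of the test matrix can be sketched in a few lines of Python. This is only an illustration of the counting, not PhoneDiffusion code: the variable names are made up for this example, and the selection of the "strongest" image was done by eye, so it is not modeled here.

```python
from itertools import product

# Model packs and prompts exactly as listed in this post.
MODELS = ["CyberRealistic", "DreamShaper 8 LCM", "Realistic Vision V5.1 Hyper"]
PROMPTS = ["a cyberpunk penthouse", "an anime forest elder", "a futuristic cybertruck"]
IMAGES_PER_PAIR = 3

# Every model/prompt pair; one image per pair was hand-picked for the comparison.
pairs = list(product(MODELS, PROMPTS))

# Every individual generation: each pair was run IMAGES_PER_PAIR times.
runs = [(model, prompt, attempt)
        for model, prompt in pairs
        for attempt in range(IMAGES_PER_PAIR)]

print(len(runs))   # 27 generated images
print(len(pairs))  # 9 selected outputs
```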

All images were generated at 512 x 512. The timings came from warm PhoneDiffusion generation metadata, with the model packs already installed and prepared. Generation ran on CPU + Neural Engine.

This is not a cold-start benchmark. It is closer to what a user sees after the app is open, the model is ready, and they are iterating on prompts.

Why the Step Counts Are Different

A common way to benchmark models is to force every model through the same settings.

That would make this comparison less useful.

CyberRealistic, DreamShaper LCM and Realistic Vision Hyper are meant to run differently. If every model were tested at the same step count and CFG, the result would mostly measure how poorly those shared settings fit each model, not what each model can do.
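One way to picture the difference is a per-model settings table instead of a single shared preset. The sketch below uses the step counts and CFG values reported in this post; the `Settings` class and the `UNIFORM` preset are hypothetical names for illustration and are not part of PhoneDiffusion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    steps: int
    cfg: float


# Per-model settings as used in this benchmark.
PER_MODEL = {
    "CyberRealistic": Settings(steps=30, cfg=7.0),
    "DreamShaper 8 LCM": Settings(steps=10, cfg=2.0),
    "Realistic Vision V5.1 Hyper": Settings(steps=6, cfg=1.5),
}

# Hypothetical "same settings for everyone" preset, for comparison only.
UNIFORM = Settings(steps=30, cfg=7.0)

# Models that would run outside their intended settings under the uniform preset.
mismatched = [name for name, s in PER_MODEL.items() if s != UNIFORM]
print(mismatched)
```

With this particular uniform preset, the two distilled models (LCM and Hyper) would be the ones running far from their intended step count and CFG.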

Model Benchmark for iPhone 17

CyberRealistic behaves like the quality-oriented model in this group.

At 30 steps / CFG 7, it averaged 13.7s across the selected outputs. That is not instant, but the extra time shows up in the images. The cyberpunk room has more structure and surface detail than the faster models' versions. The cybertruck result looks more intentionally designed.

DreamShaper 8 LCM is the most balanced result.

At 10 steps / CFG 2, it averaged 4.7s on selected outputs. That is fast enough to keep trying prompts without losing momentum, while still producing images that are visually coherent. This is the one I would use for most exploration work.

Realistic Vision V5.1 Hyper is the speed configuration.

At 6 steps / CFG 1.5, it averaged 3.2s for selected outputs. At that speed the interaction changes: you can test prompt wording, composition and subject placement quickly, then switch to a slower model once the direction is right.
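The three averages can be put on a common footing with some simple arithmetic. The sketch below derives seconds per step, images per minute and speedup relative to CyberRealistic from the numbers reported above. Two assumptions to keep in mind: "seconds per step" here is just total time divided by step count, so it also absorbs fixed work such as text encoding and image decoding, and "images per minute" assumes back-to-back generations at the average time. The function name `summarize` is made up for this example.

```python
# (steps, average seconds per image) as reported in this post.
RESULTS = {
    "CyberRealistic": (30, 13.7),
    "DreamShaper 8 LCM": (10, 4.7),
    "Realistic Vision V5.1 Hyper": (6, 3.2),
}


def summarize(results, baseline="CyberRealistic"):
    """Derive per-step time, throughput and speedup from the averages."""
    base_seconds = results[baseline][1]
    summary = {}
    for name, (steps, seconds) in results.items():
        summary[name] = {
            "s_per_step": seconds / steps,        # includes fixed overhead
            "images_per_min": 60.0 / seconds,     # back-to-back generations
            "speedup": base_seconds / seconds,    # relative to the baseline
        }
    return summary


stats = summarize(RESULTS)
for name, s in stats.items():
    print(f"{name}: {s['s_per_step']:.2f} s/step, "
          f"{s['images_per_min']:.1f} images/min, "
          f"{s['speedup']:.1f}x vs CyberRealistic")
```

Read this way, the per-step figure is similar across the three models (roughly 0.46 to 0.53 seconds), so most of the speed difference comes from the step count. DreamShaper 8 LCM lands at about 2.9x and Realistic Vision V5.1 Hyper at about 4.3x the speed of the 30-step CyberRealistic run.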