I have been testing how far Stable Diffusion 1.5 can be pushed when running locally on an iPhone 17, and how good the results are.
For this benchmark I used PhoneDiffusion with three SD 1.5 model packs:
- CyberRealistic
- DreamShaper 8 LCM
- Realistic Vision V5.1 Hyper
Each pack has different assumptions baked into its recommended settings, scheduler and step count. That makes them a good small test set for local image generation, because they cover three distinct modes: quality pass, fast iteration and very fast draft generation.
The Benchmark
The test was intentionally simple.
I used three prompts:
- a cyberpunk penthouse
- an anime forest elder
- a futuristic cybertruck
Each model generated three images per prompt, for 27 images in total (3 models × 3 prompts × 3 images). For the final comparison, I selected the strongest visual result from each model/prompt pair, giving 9 selected outputs.
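The run matrix and the selection step can be sketched as follows. This is only an illustration of the counting, not how PhoneDiffusion works internally: the function names are hypothetical, and `rating` is a placeholder standing in for the manual visual judgment used to pick each winner.

```python
from collections import defaultdict
from itertools import product

# Model packs and prompts as listed in this post.
MODELS = ["CyberRealistic", "DreamShaper 8 LCM", "Realistic Vision V5.1 Hyper"]
PROMPTS = ["a cyberpunk penthouse", "an anime forest elder", "a futuristic cybertruck"]
IMAGES_PER_PROMPT = 3


def build_runs():
    """Enumerate every (model, prompt, image index) generation in the matrix."""
    return list(product(MODELS, PROMPTS, range(IMAGES_PER_PROMPT)))


def select_best(runs, rating):
    """Keep one run per (model, prompt) pair.

    `rating` is a hypothetical callable standing in for manual visual
    judgment; the run with the highest rating wins.
    """
    groups = defaultdict(list)
    for model, prompt, idx in runs:
        groups[(model, prompt)].append((model, prompt, idx))
    return {pair: max(candidates, key=rating) for pair, candidates in groups.items()}


runs = build_runs()
# Placeholder rating: simply prefers the last image index in each group.
selected = select_best(runs, rating=lambda run: run[2])
print(len(runs), len(selected))  # 27 9
```

Grouping by `(model, prompt)` before picking is what turns the 27 generations into exactly 9 comparison images, one per cell of the model/prompt grid.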
All images were generated at 512 x 512. Timings come from PhoneDiffusion's generation metadata for warm runs, with the model packs already installed and prepared. Compute units were set to CPU + Neural Engine.
This is not a cold-start benchmark. It is closer to what a user sees after the app is open, the model is ready, and they are iterating on prompts.
Why the Step Counts Are Different
A common way to benchmark models is to force every model through the same settings.
That would make this comparison less useful.
CyberRealistic, DreamShaper LCM and Realistic Vision Hyper are tuned to run differently. If every model were tested at the same step count and CFG, the result would mostly measure bad configuration choices rather than the models themselves.
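One way to express "each model at its own settings" is a per-model settings table. The sketch below uses the step counts and CFG values reported in the per-model results of this post; the `GenSettings` type, the `settings_for` helper and the `SHARED` configuration are hypothetical and only illustrate the shared-settings alternative. Schedulers are left out because the post does not name them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenSettings:
    steps: int
    cfg: float
    width: int = 512
    height: int = 512


# Per-model settings used in this benchmark (steps and CFG as reported in the post).
PER_MODEL = {
    "CyberRealistic": GenSettings(steps=30, cfg=7.0),
    "DreamShaper 8 LCM": GenSettings(steps=10, cfg=2.0),
    "Realistic Vision V5.1 Hyper": GenSettings(steps=6, cfg=1.5),
}

# Hypothetical alternative this post avoids: one shared configuration for all models.
SHARED = GenSettings(steps=30, cfg=7.0)


def settings_for(model: str, use_recommended: bool = True) -> GenSettings:
    """Return the settings a model runs with: its own, or the shared fallback."""
    return PER_MODEL[model] if use_recommended else SHARED


print(settings_for("DreamShaper 8 LCM"))
print(settings_for("DreamShaper 8 LCM", use_recommended=False))
```

With the shared configuration, the LCM pack would run at 30 steps and CFG 7, far from the 10 steps and CFG 2 it is set up for, which is exactly the kind of mismatch a same-settings benchmark would end up measuring.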
Model Benchmark for iPhone 17
CyberRealistic behaves like the quality-oriented model in this group.
At 30 steps / CFG 7, it averaged 13.7s across the selected outputs. That is not instant, but the extra time shows up in the images. The cyberpunk room has more structure and surface detail. The cybertruck result looks more intentionally designed.
DreamShaper 8 LCM is the most balanced result.
At 10 steps / CFG 2, it averaged 4.7s across the selected outputs. That is fast enough to keep trying prompts without losing momentum, while still producing images that are visually coherent. This is the one I would use for most exploration work.
Realistic Vision V5.1 Hyper is the speed configuration.
At 6 steps / CFG 1.5, it averaged 3.2s across the selected outputs. That enables a different way of working: you can test prompt wording, composition and subject placement quickly, then switch to a slower model once the direction is right.
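A quick back-of-the-envelope check on the three averages is to divide each total by its step count. This is a rough sketch: dividing the whole generation time by the number of steps also spreads fixed per-image work (such as text encoding and VAE decoding) across the steps, so the per-step figure is only an approximation, and it is inflated most for the low-step models.

```python
# Average warm timings (seconds) for the selected outputs, as reported in this post.
RESULTS = {
    "CyberRealistic": (30, 13.7),
    "DreamShaper 8 LCM": (10, 4.7),
    "Realistic Vision V5.1 Hyper": (6, 3.2),
}


def seconds_per_step(steps: int, total_s: float) -> float:
    """Naive per-step cost: total generation time divided by denoising steps.

    Fixed per-image work is folded into every step, so this overstates the
    true cost of a single step, especially at low step counts.
    """
    return total_s / steps


baseline = RESULTS["CyberRealistic"][1]
for model, (steps, total) in RESULTS.items():
    print(f"{model}: {seconds_per_step(steps, total):.2f} s/step, "
          f"speedup vs CyberRealistic: {baseline / total:.1f}x")
```

This prints about 0.46, 0.47 and 0.53 s/step, with speedups of 1.0x, 2.9x and 4.3x. The per-step cost stays in a narrow band, which is consistent with most of the speed difference coming from the lower step counts rather than from cheaper steps.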
If you want to try it yourself, you can learn how to run Stable Diffusion on your iPhone here:
Download PhoneDiffusion here: https://apps.apple.com/us/app/phonediffusion/id6762061991