Apple's Neural Engine can process 35 trillion operations per second on the A17 Pro. Most of that power sits unused while you pay monthly subscriptions to ask questions on someone else's server.
Off Grid is a free, open-source app that runs large language models directly on your iPhone. No internet after the first download. No iCloud. No Apple Intelligence required. Just your phone and a model.
What You Need
Minimum: iPhone 12 or newer (A14 chip), iOS 17+, 4GB+ RAM. Smaller models (0.6B to 1B) will run fine.
Recommended: iPhone 15 Pro or newer (A17 Pro or later), 8GB RAM. This is where on-device AI gets genuinely useful. 3B to 7B models run smoothly with hardware acceleration via Metal and the Apple Neural Engine.
Storage note: iPhones don't have expandable storage. Models range from 80MB to 4GB+. A 64GB iPhone with lots of photos might not have room for multiple large models. Check your available storage before downloading.
What Off Grid Can Do on iPhone
Six AI capabilities in one app, all running on your phone's silicon:
Text generation. Run Qwen 3, Llama 3.2, Gemma 3, Phi-4, or any GGUF model. Uses llama.cpp via Metal for GPU acceleration. Streaming responses with markdown rendering. 15 to 30 tokens per second on A17 Pro and later.
Image generation. On-device Stable Diffusion through Apple's ml-stable-diffusion pipeline with Core ML and Neural Engine acceleration. 8 to 15 seconds per image on iPhone 15 Pro. 20+ models available.
Vision AI. Attach a photo or use your camera and ask questions about what you see. SmolVLM and Qwen3-VL supported.
Voice transcription. On-device Whisper speech to text. Real-time partial transcription as you speak. No audio leaves your phone.
Tool calling. The model can chain web search, calculator, date/time, and device info together in an automatic loop. Works with models that support function calling format.
Document analysis. Attach PDFs, code files, CSVs, and more to your conversations.
Onboarding
|
Text Generation
|
Image Generation
|
Vision
|
Attachments
|
Which Models to Use on iPhone
Off Grid's model browser filters by your device so you never download something that won't run. Here's what works:
iPhone 12/13 (4GB RAM): Qwen 3 0.6B or SmolLM3 360M. Expect 8 to 15 tokens per second. Good for short answers and simple tasks.
iPhone 14/15 (6GB RAM): Qwen 3 1.5B or Phi-4 Mini. Noticeably better quality. 15 to 25 tokens per second with Metal acceleration.
iPhone 15 Pro/16 Pro (8GB RAM): The sweet spot. Llama 3.2 3B, Qwen 3 4B, or Gemma 3 run well. Quality at this size is genuinely useful for drafting, summarization, coding help, and analysis. 20 to 30+ tokens per second.
Quantization: Q4_K_M gives you the best balance of size, speed, and quality. Don't go below Q3 unless storage is very tight.
How iOS Hardware Acceleration Works
Apple's chips have three compute paths and Off Grid uses them automatically:
Metal (GPU): Available on all modern iPhones. Handles general purpose parallel computation. This is what llama.cpp uses for GPU-accelerated text inference.
Apple Neural Engine (ANE): A dedicated AI accelerator. Extremely fast and power efficient. Core ML targets the ANE directly for image generation.
CPU: Always available as a fallback. Slower but works for smaller models.
The advantage of iOS over Android: Apple's hardware and software stack is tightly integrated. If Off Grid works on one iPhone 15 Pro, it works on all of them. No fragmentation.
The KV Cache Trick That Triples Your Speed
Off Grid lets you configure KV cache quantization in settings. The KV cache stores your conversation context. By default it uses f16 (16-bit). Switching to q4_0 (4-bit) roughly triples inference speed with minimal quality impact.
The app nudges you to optimize after your first generation. This is the single biggest performance improvement you can make.
Memory Management on iOS
iOS is more aggressive about killing background apps than Android. Off Grid handles this with lifecycle-independent services. Text and image generation continue running even when you navigate away from the chat screen. But if you leave the app for a long time and iOS reclaims memory, you may need to reload the model.
The RAM budget is tighter on iOS. On an 8GB iPhone, you realistically have 4 to 5GB available. A 7B Q4 model needs about 5.5GB at runtime. It will fit but just barely.
Practical advice: start with a 1.5B to 3B model. If it runs smoothly, try the next size up. If the app closes unexpectedly, the model is too large for your device.
Privacy: Stronger Than Apple Intelligence
Apple Intelligence uses Private Cloud Compute for tasks that exceed on-device capability. Apple says it's end to end encrypted. You're trusting Apple.
Off Grid is private in a stronger sense. There is no cloud component. The computation happens entirely on your phone. No network requests after model download. Verify it yourself: turn on airplane mode and everything works. The code is open source, MIT licensed. No analytics, no telemetry, no accounts.
For people handling sensitive data (medical, legal, financial, proprietary business information, personal journaling), the difference between "a company promises privacy" and "there is no server to send data to" matters.
Getting Started
- Install Off Grid from the App Store
- Browse recommended models filtered for your device
- Download a model over WiFi
- Enable airplane mode to verify offline capability
- Start chatting
Switch KV cache quantization to q4_0 in settings for the best speed. The quality difference is negligible for most conversations.
What's Coming
Apple's Neural Engine gets more powerful with every chip generation. The A18 has a 16-core Neural Engine. Off Grid ships updates weekly, with tool calling, configurable KV cache, and vision support all added in the last month.
Check the GitHub for the latest releases and the roadmap.
The gap between a 3B model on your iPhone and a 70B model in the cloud is real today. But for the tasks you actually do on your phone, local models are already good enough. And they're getting better every quarter.

Onboarding
Text Generation
Image Generation
Vision
Attachments
Top comments (0)