Mohammed Ali Chherawalla
How to Run Gemma 4 on Your iPhone in 2026 (Completely Offline, No Subscription)

Google released Gemma 4 on April 2, 2026 — their most capable open model yet. Built on the same research as Gemini 3, released under Apache 2.0, with native multimodal understanding. The E2B variant is specifically designed for phones and edge devices. You can run it locally on your iPhone.

Off Grid is a free, open-source app that runs Gemma 4 and other GGUF models entirely on your iPhone. No cloud. No subscription. No data leaving your device.

App Store | GitHub

Which Gemma 4 Model Fits Your iPhone


Gemma 4 E2B (Q4_K_M) — ~1.3GB download. The edge variant. Designed for phones. Fits on any iPhone with 6GB+ RAM (iPhone 13 Pro and newer). 12 to 18 tokens per second with Metal GPU acceleration.

Gemma 4 E4B (Q4_K_M) — ~2.5GB download. Needs 8GB RAM (iPhone 15 Pro, 16 Pro). Noticeably better reasoning and output quality. 10 to 15 tokens per second.
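The RAM rule of thumb above boils down to a simple decision: 8GB+ gets E4B, 6GB gets E2B, anything less can't run either comfortably. A minimal sketch of that logic (the variant names here are illustrative file labels, not official download identifiers):

```python
def pick_gemma4_variant(ram_gb):
    """Pick a Gemma 4 GGUF variant from device RAM, per the rule of thumb above."""
    if ram_gb >= 8:
        return "gemma-4-e4b-q4_k_m"  # ~2.5GB download, better reasoning
    if ram_gb >= 6:
        return "gemma-4-e2b-q4_k_m"  # ~1.3GB download, edge variant
    return None  # below 6GB, neither quant fits alongside the OS

# iPhone 16 Pro has 8GB of RAM:
print(pick_gemma4_variant(8))  # gemma-4-e4b-q4_k_m
```

The ~2x gap between download size and required RAM is normal: beyond the weights, the runtime needs room for the KV cache and activations, and iOS itself keeps a large share of memory.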

Why Gemma 4

Built from Gemini 3 research. Best intelligence-per-parameter ratio of any open model.

Native multimodal. Processes text, images, and audio natively — not through bolted-on adapters.

256K context window. Long conversations and full document analysis.

140+ languages. Broad multilingual support.

Apache 2.0. Fully open. No restrictions.

How This Compares to Apple Intelligence

Apple Intelligence uses Apple's proprietary models and routes some tasks to its cloud servers. You can't choose which model to run or verify what stays on device.

Off Grid runs Gemma 4 entirely on your iPhone using Metal GPU acceleration. You pick the model. The code is open source. Everything stays local. And the same app works on Android and macOS.

Getting Started

  1. Install Off Grid from the App Store
  2. Open the model browser — Gemma 4 variants appear in the recommended section
  3. Pick E2B for 6GB iPhones, E4B for 8GB+ iPhones
  4. Download over WiFi
  5. Start chatting
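If a WiFi download gets interrupted, a truncated model file is the usual culprit behind a failed load. Every GGUF file starts with the 4-byte magic `GGUF`, so a quick sanity check is possible on any platform; here is a minimal sketch (the helper name is ours, not part of Off Grid):

```python
def is_gguf(path):
    """Check the 4-byte magic that every valid GGUF model file begins with."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"
```

This only verifies the header, not the whole file, but it catches the common case of a download that never started writing model data.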

Switch the KV cache type to q4_0 in settings to roughly triple inference speed. Off Grid also runs Qwen 3.5, Llama 3.2, image generation, vision, voice transcription, tool calling, and document analysis — all on device.
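The q4_0 trick works because the KV cache shrinks from 16 bits per element (f16) to about 4.5 bits (4-bit values plus a per-block scale), and on-device inference is mostly memory-bandwidth-bound. A back-of-the-envelope sketch, using hypothetical architecture numbers (Gemma 4's exact layer/head configuration is not given in this post):

```python
def kv_cache_bytes(n_layers, n_ctx, n_kv_heads, head_dim, bits_per_elem):
    """K and V tensors: 2 per layer, one (n_kv_heads * head_dim) row per position."""
    elems = 2 * n_layers * n_ctx * n_kv_heads * head_dim
    return elems * bits_per_elem / 8

# Hypothetical shape: 30 layers, 8 KV heads, head_dim 256, 8192-token context
f16  = kv_cache_bytes(30, 8192, 8, 256, 16)   # default f16 cache
q4_0 = kv_cache_bytes(30, 8192, 8, 256, 4.5)  # q4_0: 4-bit + per-block scale
print(f"{f16 / 2**20:.0f} MiB -> {q4_0 / 2**20:.0f} MiB")  # 1920 MiB -> 540 MiB
```

That is roughly a 3.5x reduction in cache memory traffic, which lines up with the "roughly triple" speedup claim for long contexts on bandwidth-limited hardware.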

Check the GitHub for the latest releases.
