Qwen 3.5 is the most capable open-weight small model family available right now. The small series (0.8B, 2B, 4B, 9B) was designed from the ground up for on-device deployment — not distilled from a larger model. It uses a hybrid architecture combining Gated Delta Networks with sparse Mixture-of-Experts, supports 200+ languages, and handles a 262K context window. You can run it locally on your Android phone.
Off Grid is a free, open-source app that runs Qwen 3.5 and other GGUF models entirely on your phone's hardware. No cloud. No account. No data leaving your device.
Which Qwen 3.5 Model Fits Your Phone
Qwen3.5-0.8B (Q4_K_M) — ~500MB download. Fits on any phone with 4GB+ RAM. Designed for edge deployment. Expect around 12 tokens per second on mid-range hardware and up to 25 on flagship chips. Surprisingly capable for its size thanks to the MoE architecture, which activates only a fraction of its parameters per token.
Qwen3.5-2B (Q4_K_M) — ~1.2GB download. The sweet spot for 6 to 8GB RAM phones. Significantly better reasoning than 0.8B. 10 to 18 tokens per second on recent Snapdragon chips. Good for writing, code, and multilingual tasks.
Qwen3.5-4B (Q4_K_M) — ~2.5GB download. Needs 8GB+ RAM. This is where Qwen 3.5 gets seriously impressive. The hybrid architecture means it punches well above its parameter count. Multi-step reasoning, detailed analysis, code generation.
Qwen3.5-9B (Q4_K_M) — ~5GB download. Needs 12GB+ RAM. Rivals models with 20x more parameters on benchmarks. If your phone has the RAM, this is the best open model you can run locally.
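The guidance above boils down to a simple lookup. Here is a minimal sketch of it; the RAM thresholds and download sizes come from the list above, while the function name and structure are purely illustrative, not Off Grid's actual selection logic.

```python
# Hypothetical helper mirroring the guidance above: pick the largest
# Qwen 3.5 Q4_K_M variant whose RAM floor the device meets.
MODELS = [
    # (name, download_gb, min_ram_gb) — values from the list above
    ("Qwen3.5-9B",   5.0, 12),
    ("Qwen3.5-4B",   2.5,  8),
    ("Qwen3.5-2B",   1.2,  6),
    ("Qwen3.5-0.8B", 0.5,  4),
]

def pick_model(ram_gb):
    """Return the largest variant that fits, or None under 4GB."""
    for name, _size_gb, min_ram in MODELS:
        if ram_gb >= min_ram:
            return name
    return None

print(pick_model(8))  # → Qwen3.5-4B
```

An 8GB phone lands on the 4B, a 12GB flagship on the 9B, and anything under 4GB gets no recommendation.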
Why Qwen 3.5 Over Other Models
Hybrid architecture. Qwen 3.5 Small isn't a shrunken version of a big model. It combines Gated Delta Networks with Mixture-of-Experts, which means it activates only the parameters it needs per token. More intelligence per byte of RAM than traditional dense models.
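To make "activates only the parameters it needs" concrete, here is a back-of-envelope sketch of sparse MoE activation. Every number in it (expert count, experts routed per token, share of weights living in expert layers) is a hypothetical stand-in, not Qwen 3.5's published configuration.

```python
# Back-of-envelope sketch of sparse MoE activation.
# All numbers are illustrative, not Qwen 3.5's actual config.
def active_fraction(num_experts, top_k, expert_share):
    """Fraction of total weights touched per token when only top_k of
    num_experts expert FFNs run. expert_share is the portion of all
    parameters in expert layers; the rest (attention, embeddings,
    router) is shared and always active."""
    shared = 1.0 - expert_share
    return shared + expert_share * (top_k / num_experts)

# e.g. 64 experts, 4 routed per token, 75% of weights in experts:
# 0.25 + 0.75 * 4/64 ≈ 0.30 → ~30% of parameters active per token
print(round(active_fraction(64, 4, 0.75), 3))  # → 0.297
```

The trade-off: all weights still occupy RAM, but per-token compute and memory traffic scale with the active fraction, which is what makes MoE attractive on phone-class chips.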
200+ languages. The broadest language support of any open model at this size. If you work in Hindi, Arabic, Spanish, Japanese, or any non-English language, Qwen 3.5 is the best option for mobile.
262K context window. Previous small models topped out at 8K to 32K context. Qwen 3.5 Small handles 262K tokens, which means you can feed it entire documents and have long conversations without losing context.
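Long context isn't free: the KV cache grows linearly with context length. A rough sizing sketch follows; the layer count, KV-head count, and head dimension are hypothetical stand-ins (a grouped-query-attention layout), not Qwen 3.5's published architecture.

```python
# Rough KV-cache size for a given context length. Layer/head numbers
# below are hypothetical, not Qwen 3.5's actual architecture.
def kv_cache_bytes(ctx_len, n_layers, n_kv_heads, head_dim,
                   bytes_per_elem=2.0):  # 2.0 = f16
    # Factor of 2: both keys and values are cached per layer.
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# e.g. 28 layers, 4 KV heads (grouped-query attention), head_dim 128:
gb = kv_cache_bytes(262_144, 28, 4, 128) / 2**30
print(f"{gb:.1f} GiB at f16")  # → 14.0 GiB at f16
```

With numbers like these, a full 262K-token cache at f16 is far beyond phone RAM, which is exactly why KV-cache quantization (covered in the performance section) matters and why practical on-device contexts are shorter.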
Thinking mode. Toggle between fast responses and step-by-step reasoning. Thinking mode shows the model's chain of thought, which produces better results on complex problems.
Multimodal. Qwen 3.5 natively understands text, images, and audio. Point your camera at something and ask about it.
Performance on Real Hardware
| Model | Snapdragon 8 Gen 3 | Snapdragon 8 Gen 2 | Mid-range |
|---|---|---|---|
| Qwen3.5-0.8B | ~25 tok/s | ~20 tok/s | ~12 tok/s |
| Qwen3.5-2B | ~18 tok/s | ~14 tok/s | ~8 tok/s |
| Qwen3.5-4B | ~12 tok/s | ~10 tok/s | ~5 tok/s |
| Qwen3.5-9B | ~8 tok/s | ~6 tok/s | — |
Switch KV cache to q4_0 in settings to roughly triple these speeds.
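The arithmetic behind that setting: GGUF q4_0 packs 32 values into 18 bytes (16 bytes of 4-bit quants plus a 2-byte scale), i.e. 4.5 bits per value versus 16 for f16. On phone chips where decoding is memory-bandwidth-bound, the smaller cache means less data moved per token, which is where the speedup comes from.

```python
# KV-cache footprint per element: f16 vs q4_0.
# GGUF q4_0 block: 32 values in 18 bytes (16B nibbles + 2B f16 scale).
F16_BITS = 16
Q4_0_BITS = 18 * 8 / 32  # = 4.5 bits per value

ratio = F16_BITS / Q4_0_BITS
print(f"q4_0 KV cache is {ratio:.2f}x smaller than f16")
# → q4_0 KV cache is 3.56x smaller than f16
```

The ~3.6x reduction in cache traffic lines up with the "roughly triple" figure above, though real-world gains depend on how bandwidth-bound a given chip actually is.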
Setting Up Qwen 3.5 on Off Grid
- Install Off Grid from the Play Store
- Open the model browser — Qwen 3.5 variants are in the recommended section
- Pick the size that matches your RAM
- Download over WiFi
- Start chatting
Off Grid automatically uses QNN NPU acceleration on Snapdragon 8 Gen 1+, Adreno GPU via OpenCL on older chips, or CPU on everything else.
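That fallback chain can be sketched as a simple priority check. The function and capability names below are illustrative only, not Off Grid's real API.

```python
# Illustrative sketch of the backend fallback described above;
# names are hypothetical, not Off Grid's actual implementation.
def choose_backend(has_qnn_npu, has_opencl_gpu):
    if has_qnn_npu:        # Snapdragon 8 Gen 1 and newer
        return "qnn-npu"
    if has_opencl_gpu:     # older chips with an Adreno GPU
        return "opencl-gpu"
    return "cpu"           # universal fallback

print(choose_backend(False, True))  # → opencl-gpu
```

The ordering matters: the NPU path is fastest and most power-efficient when available, the GPU path covers older Snapdragons, and the CPU path guarantees every device can still run inference.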
Privacy
Every Qwen 3.5 inference runs on your phone's processor. After the initial download, Off Grid makes zero network requests. Turn on airplane mode and verify. Everything works.
No analytics. No telemetry. No accounts. Open source and MIT licensed.
Off Grid also runs image generation, vision AI, voice transcription, tool calling, and document analysis — all on device. Check the GitHub for the latest releases.