You are already paying for the hardware that can run AI models better than most cloud services. You just might not be using it yet.
If you have a Mac with Apple Silicon or a PC with a decent GPU, you can run Qwen 3.5 9B locally. This is a model that outperforms OpenAI's GPT-OSS-120B on multiple benchmarks while being 13 times smaller. It runs on a MacBook Air. It costs nothing after the initial download. No subscription. No API key. No data leaving your machine.
But here is the problem. You set up LM Studio on your laptop. You download a model. You chat with it. And then you get up from your desk, pull out your phone, and you are back to paying OpenAI $20 a month. The model is still running. Your laptop is still on. You just have no good way to reach it.
Off Grid fixes that. It auto-discovers LM Studio servers on your network and lets you use them from your phone. No IP addresses. No port numbers. No configuration.
What you need
On your computer:
- LM Studio installed (free, runs on Mac, Windows, Linux)
- At least one model downloaded (I recommend Qwen 3.5 9B if your machine has 16GB+ RAM)
On your phone:
- Off Grid installed (Android APK or iOS App Store)
- Connected to the same WiFi network as your computer
That is the full list.
Step 1: Start the LM Studio server
Open LM Studio. Click the Developer tab in the left sidebar. Load the model you want to use. Toggle the server to "Start."
Here is the key part: check the box that says "Serve on Local Network." This makes your LM Studio instance accessible to other devices on your WiFi. LM Studio handles everything else - no firewall rules, no environment variables, no terminal commands.
You will see a URL at the top of the Developer tab. Something like http://192.168.1.x:1234. You do not need to write this down. Off Grid will find it automatically.
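If you want to confirm the server is reachable before opening the app, LM Studio exposes an OpenAI-compatible API, so any device on the network can ask it which models are loaded via the /v1/models endpoint. A minimal sketch in Python (the address is a placeholder - use the URL LM Studio shows you):

```python
import json
import urllib.request

def parse_model_ids(payload: dict) -> list[str]:
    """Pull the model IDs out of an OpenAI-style /v1/models response."""
    return [m["id"] for m in payload.get("data", [])]

def list_models(base_url: str) -> list[str]:
    """Query an OpenAI-compatible server for its available models."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
        return parse_model_ids(json.load(resp))

# Replace with the URL from LM Studio's Developer tab, e.g.:
# print(list_models("http://192.168.1.x:1234"))
```

If that call returns your model list, Off Grid will find the server too.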
Step 2: Open Off Grid and scan
Open Off Grid on your phone. Go to the Remote Models section. Tap "Scan Network."
Off Grid scans your local network for active servers on known ports. When it finds your LM Studio instance, it pulls the list of every model you have loaded and displays them. Tap one. Start chatting. Responses stream in token by token, just like they do in LM Studio's own interface.
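If you are curious what a scan like this involves, it boils down to probing every address on the subnet for a known port and asking whatever answers for its model list. This is not Off Grid's actual implementation, just a sketch of the idea, assuming a /24 network and LM Studio's default port 1234:

```python
import socket
from concurrent.futures import ThreadPoolExecutor

LM_STUDIO_PORT = 1234  # LM Studio's default server port

def candidate_hosts(subnet: str = "192.168.1") -> list[str]:
    """Enumerate every host address on a /24 subnet."""
    return [f"{subnet}.{i}" for i in range(1, 255)]

def is_port_open(host: str, port: int = LM_STUDIO_PORT,
                 timeout: float = 0.3) -> bool:
    """Attempt a TCP connect; success means something is listening."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def scan(subnet: str = "192.168.1") -> list[str]:
    """Probe the subnet in parallel; return hosts with the port open."""
    hosts = candidate_hosts(subnet)
    with ThreadPoolExecutor(max_workers=64) as pool:
        hits = list(pool.map(is_port_open, hosts))
    return [h for h, open_ in zip(hosts, hits) if open_]
```

The short per-host timeout and the thread pool are what keep a 254-address sweep down to a second or two.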
The entire setup takes less time than signing up for a ChatGPT account.
What this actually feels like in practice
The difference between a 3B model running on your phone and a 9B model running on your Mac is not subtle. It is the difference between an assistant that can handle basic questions and one that can reason through complex problems, write well, and understand nuance.
Qwen 3.5 9B on a MacBook Pro with M2 or newer runs at 30-50 tokens per second. That is fast enough that responses feel instant. And because everything stays on your local network, latency is measured in milliseconds, not the hundreds of milliseconds you get with cloud APIs.
A few things that work well over the network but struggle on-device:
Long document analysis. Paste a 10-page contract or a research paper and ask for a summary. The 9B model has a 262,000 token context window. Your phone's 3B model would choke on this.
Code review. Share a function and ask what could go wrong. The 9B model catches edge cases and suggests improvements that smaller models miss.
Writing feedback. Draft an email or a pitch, paste it in, and ask if the tone is right. The larger model picks up on subtlety that smaller ones flatten.
Multilingual work. Qwen 3.5 supports 201 languages. If you work across languages, the 9B model handles translation and cross-lingual reasoning significantly better than anything that fits in your phone's RAM.
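Because everything speaks the OpenAI-compatible chat completions API, tasks like the document summary above are one POST to the server. A sketch, assuming LM Studio's /v1/chat/completions endpoint and a placeholder model name:

```python
import json
import urllib.request

def extract_reply(response: dict) -> str:
    """Pull the assistant's text out of a chat completion response."""
    return response["choices"][0]["message"]["content"]

def summarize(base_url: str, model: str, document: str) -> str:
    """Ask an OpenAI-compatible server to summarize a long document."""
    body = json.dumps({
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Summarize the user's document in five bullet points."},
            {"role": "user", "content": document},
        ],
    }).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return extract_reply(json.load(resp))

# e.g. summarize("http://192.168.1.x:1234", "qwen3.5-9b", contract_text)
```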
Why not just use the LM Studio chat interface?
You can. LM Studio has a great chat UI on your desktop. But your desktop is in one room of your house, and your phone is always with you.
The point is not to replace LM Studio. The point is to extend it. Your Mac becomes a private AI server. Your phone becomes the remote interface. You get the quality of a desktop model with the convenience of a mobile app.
And because Off Grid also runs models directly on your phone, you have a fallback. Walk out of WiFi range and the on-device model takes over. Come back home and the LM Studio models are available again. You can even switch models mid-conversation - start with the on-device 2B while commuting, then hand off to the 9B on your Mac when you get home, all in the same chat thread.
Beyond chat, Off Grid now supports projects with a built-in knowledge base and RAG. Attach your documents to a project and any model - whether it is running on your phone or on your Mac - can search through them when generating answers. There is also tool calling: models that support function calling can chain together web search, calculator, date/time, and device info automatically. All running locally, all private.
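Tool calling works the same way over the network: function-calling-capable models accept an OpenAI-style `tools` array in the chat completion request, and respond with structured calls the client executes. A sketch of what such a request looks like - the `calculator` tool here is purely illustrative, not Off Grid's actual tool schema:

```python
def build_tool_call_request(model: str, user_message: str) -> dict:
    """Assemble an OpenAI-style chat completion request offering the
    model a single illustrative calculator tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "calculator",  # hypothetical tool name
                "description": "Evaluate a basic arithmetic expression.",
                "parameters": {
                    "type": "object",
                    "properties": {"expression": {"type": "string"}},
                    "required": ["expression"],
                },
            },
        }],
    }

# POST this as JSON to http://<your-server>:1234/v1/chat/completions;
# if the model decides to use the tool, the response carries a
# structured tool call instead of plain text.
payload = build_tool_call_request("qwen3.5-9b", "What is 17 * 43?")
```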
You paid for this hardware. It is sitting on your desk. You should be able to use it from anywhere in your house without jumping through hoops. That is the whole idea.
The math on what you are saving
A ChatGPT Plus subscription costs $20 per month. That is $240 per year for access to models that are, for many everyday tasks, not dramatically better than what Qwen 3.5 9B can do locally.
A Claude Pro subscription is $20 per month. Same math.
If you already own a Mac with Apple Silicon (and if you bought one in the last three years, you do), the marginal cost of running local AI is zero. The electricity cost of leaving your laptop on is negligible. The model download is free. LM Studio is free. Off Grid is free and open source.
You are not buying a "Mac Mini AI server." You are not setting up a homelab. You are using the laptop you already own and the phone you already carry. That is it.
Where this is heading
We are building Off Grid toward something bigger than a chat app that happens to run locally. The vision is a personal AI operating system - one that uses whatever compute is available to you, whether that is your phone's processor, your laptop's GPU, or a machine on your network - and keeps everything private by default.
Network discovery is one piece of that. On-device models are another. Tool calling, document analysis, vision AI, voice transcription - these are all already in Off Grid today. The next step is making all of these capabilities work together seamlessly, across every device you own, without any of your data ever touching someone else's server.
If that sounds like something you want to be part of, we have a community building this together. Join the Off Grid Slack from our GitHub - feature requests, bug reports, model recommendations, and conversations about where local AI is heading.
Try it
Off Grid is free, open source, and MIT licensed.
- GitHub (1,000+ stars, 10,000+ downloads in 4 weeks)
- Android: grab the latest APK from GitHub Releases
- iOS: available on the App Store (search "Off Grid AI")
If you have not set up LM Studio yet, it takes about five minutes. Download it, search for "Qwen 3.5 9B" in the model browser, download, load, and start the server. Then open Off Grid and scan. You will be chatting with a model that rivals cloud AI - from your phone, on your own network, with zero data leaving your house.
The hardware you already own is enough. You just need the software to unlock it.
Off Grid is built by the team at Wednesday Solutions, a product engineering company that helps founders go from idea to launched product. 4.8/5.0 on Clutch across 23 reviews.