Your home network probably has more AI compute sitting idle than you think.
If you have a desktop or laptop running Ollama or LM Studio, you already have a local AI server. You have models downloaded. You have inference running. But you can only use it sitting at that one machine, staring at that one screen.
That is a waste.
Your desktop GPU can run a 70B parameter model. Your phone cannot. But your phone is the device you actually have with you when you are walking around the house, lying on the couch, sitting in a coffee shop on your home VPN, or standing in the kitchen trying to remember a recipe. The compute is ten feet away. You just have no way to reach it.
Every guide out there for solving this involves the same painful dance. Set environment variables. Configure OLLAMA_HOST to 0.0.0.0. Open firewall ports. Find your local IP address. Restart services. Hope nothing breaks.
We thought that was absurd. So we built auto-discovery into Off Grid.
What Off Grid does on your network
Off Grid is an open-source app that runs AI on your phone. We have written about running LLMs locally on Android and on iPhone before. Those articles cover running small models directly on your phone's hardware.
This is different. This is about using the big models running on your PC - from your phone.
Off Grid now auto-scans your local network for any OpenAI-compatible server. That means Ollama, LM Studio, LocalAI, or anything else that exposes that API. When it finds one, it pulls the list of available models and lets you pick which one to talk to. Streaming responses, just like you would get sitting at the desktop.
No IP address to type in. No environment variables. No port numbers. You open the app, and your network's models are there.
How to set it up
Step 1: Make sure your server is accessible on the network
For Ollama, you need to set one environment variable so it listens beyond localhost:
On Mac/Linux, run:

```shell
OLLAMA_HOST=0.0.0.0 ollama serve
```
On Windows, add OLLAMA_HOST as a system environment variable with value 0.0.0.0, then restart Ollama.
For LM Studio, it is even simpler. Open the app, go to the Developer tab, start the server, and check the box that says "Serve on Local Network." That is it.
Step 2: Open Off Grid on your phone
Make sure your phone is on the same WiFi network as the machine running your models. Open Off Grid, go to the Remote Models section, and tap Scan Network.
Off Grid scans for active servers on common ports (11434 for Ollama, 1234 for LM Studio). When it finds one, it shows you every model available on that server.
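Under the hood, Ollama and OpenAI-style servers report their models in slightly different JSON shapes, so discovery has to normalize both. A minimal sketch of that normalization, in Python rather than the app's actual React Native code (the response shapes follow Ollama's `/api/tags` and the OpenAI `/v1/models` format; the function name is ours):

```python
def extract_model_names(payload: dict) -> list[str]:
    """Normalize a model-list response from either server type.

    OpenAI-compatible servers (LM Studio, LocalAI) answer /v1/models
    with {"data": [{"id": ...}]}; Ollama answers /api/tags with
    {"models": [{"name": ...}]}.
    """
    if "data" in payload:          # OpenAI-style /v1/models
        return [m["id"] for m in payload["data"]]
    if "models" in payload:        # Ollama /api/tags
        return [m["name"] for m in payload["models"]]
    return []                      # unknown shape: report nothing
```

Either way, the app ends up with a flat list of model names to show you.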
Step 3: Pick a model and start chatting
Tap a model. Start typing. Responses stream in over your local network. No internet involved. No data leaves your house.
That is the entire setup.
Why this matters more than it sounds
Running a 3B model on your phone gives you a capable personal assistant that works offline. Running a 70B model on your desktop GPU gives you something that rivals cloud AI for most tasks. But until now, you had to choose between portability and power.
Network discovery removes that tradeoff. Your phone becomes the interface. Your desktop becomes the engine. You get the portability of mobile with the raw power of a proper GPU.
A few scenarios where this changes things:
You are reviewing a long document. You are on the couch with your phone. Your Mac has Qwen 3.5 9B loaded. You paste the text, ask it to summarize the key points, and get a response that would have been impossible with the 2B model running on the phone itself.
You are writing and need a sounding board. You are in bed, thinking through an email or a pitch or a blog post. Your desktop has a model running that can give you real feedback on tone and structure. You do not need to get up and walk to your office.
You want complete privacy for sensitive questions. Medical, legal, financial, personal. Nothing leaves your network. No cloud provider sees your prompts. The model runs on your hardware, the phone talks to it over your WiFi, and that is the end of the data trail.
You do not need to buy anything
This is not a "build a homelab" article. You do not need a Mac Mini server. You do not need a cloud GPU subscription. You do not need new hardware.
If you bought a Mac in the last three years, you have Apple Silicon. That is enough to run Qwen 3.5 9B - a model released in March 2026 that outperforms OpenAI's GPT-OSS-120B on reasoning, language understanding, and visual tasks while being 13 times smaller. It runs on a MacBook Air with 16GB of RAM. At 30-50 tokens per second on Apple Silicon, responses feel instant.
If you have a PC with a decent NVIDIA GPU and 16GB+ of VRAM, you can run larger models still, in quantized form: Llama 3.1 70B with some layers offloaded to system RAM, Qwen 3.5 in various sizes, DeepSeek's distilled variants. This is where local AI becomes indistinguishable from cloud AI for most tasks.
The point is: the hardware sitting on your desk right now is already powerful enough. You just need to make it reachable. That is what Off Grid does.
Off Grid does not care what model you load on the server. If the server exposes it through the API, Off Grid discovers it and lets you use it.
Local models plus remote models in one app
The thing that makes this different from just using a web UI for Ollama is that Off Grid already runs models on your phone too. So you have both. And you can switch between them mid-conversation.
Start a chat with the 2B model on your phone while you are out. Get home, switch to the 9B model on your Mac for a deeper follow-up, in the same chat. The context carries over. You are not starting from scratch every time you change models.
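That switch works because the OpenAI-style chat API is stateless: the full message history travels with every request, so pointing the next request at a different server and model is all it takes. A sketch of the idea (the URLs and model names here are illustrative, not Off Grid's internals):

```python
def chat_request(base_url: str, model: str, messages: list[dict]) -> dict:
    """Build the same chat payload for any OpenAI-compatible server."""
    return {
        "url": f"{base_url}/v1/chat/completions",
        "json": {"model": model, "messages": messages, "stream": True},
    }

history = [{"role": "user", "content": "Summarize this draft..."}]
# On the go: a small model served on-device (hypothetical names).
on_phone = chat_request("http://127.0.0.1:8080", "gemma-2b", history)
# Back home: the same history, a bigger model on the desktop.
on_desktop = chat_request("http://192.168.1.20:1234", "qwen-9b", history)
```

Same history, different engine. Nothing about the conversation has to be rebuilt.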
Off Grid also supports projects, a built-in knowledge base with RAG, and tool calling. Attach documents to a project, and any model - local or remote - can search through them when answering your questions. Tool calling means models that support function calling can chain together web search, calculator, date/time, and device info automatically. All of this works with both on-device and remote models.
You paid for this hardware. It is on your network. You should be able to use all of it, from anywhere in your house, without friction. That is the entire philosophy.
Walking to the store and your WiFi drops? The on-device model takes over. Back on your home network? The 9B model is available again. You do not need two apps or two workflows. One app, and the intelligence scales up or down based on what compute is available.
How we built this
I am going to get a bit technical here because I think the implementation is interesting.
Network discovery uses mDNS (multicast DNS) and port scanning on known service ports. When Off Grid finds an active server, it hits the /v1/models or /api/tags endpoint to pull the model list. Each server gets tracked with its IP, port, and available models. The app polls periodically, so if you load a new model on your desktop, it shows up on your phone within seconds.
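Stripped of the app's bookkeeping, the port-scan half of discovery reduces to attempting TCP connections with a short timeout. A simplified Python sketch (the timeout value is illustrative, not the app's actual setting):

```python
import socket

def probe(host: str, port: int, timeout: float = 0.3) -> bool:
    """Return True if something is listening on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:               # refused, timed out, or unreachable
        return False

def scan_host(host: str, ports=(11434, 1234)) -> list[int]:
    """Check the well-known Ollama and LM Studio ports on one host."""
    return [p for p in ports if probe(host, p)]
```

Run that across the addresses on your subnet and any hit gets followed up with a model-list request to confirm it is really an inference server.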
Streaming uses Server-Sent Events over the OpenAI-compatible chat completions endpoint. This means responses come in token by token, just like they do in LM Studio's own interface. There is no batching or waiting for the full response.
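Stripped down, an SSE consumer for that endpoint just filters `data:` lines, stops at the `[DONE]` sentinel, and pulls the content delta out of each chunk. A minimal Python sketch (the payload shape follows the OpenAI streaming format; error handling omitted):

```python
import json

def sse_tokens(lines):
    """Yield content tokens from OpenAI-style chat-completion SSE lines."""
    for line in lines:
        if not line.startswith("data: "):
            continue                      # skip blanks and keep-alives
        data = line[len("data: "):]
        if data.strip() == "[DONE]":      # end-of-stream sentinel
            break
        delta = json.loads(data)["choices"][0]["delta"]
        if "content" in delta:            # first chunk may carry only a role
            yield delta["content"]
```

Each yielded token gets appended to the chat bubble as it arrives, which is what makes the response feel live.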
API keys, if you have set them up on your server, are stored in your phone's secure keychain. Not in plain text, not in shared preferences. The same way you would expect a banking app to store credentials.
We built Off Grid with React Native, and this feature works on both iOS and Android. The networking layer is shared, the discovery logic is shared, and the UI adapts to whatever models are available, whether they are running locally on the phone or remotely on your network.
At Wednesday Solutions, Off Grid is what we build when we are building for ourselves. The same team that ships products for startups and enterprises - from Rapido's rider app rewrite to fintech platforms processing billions of API calls - built this because we wanted it to exist. It is the best proof we know of that we care about craft, not just client work.
Where this is heading
Network discovery is one piece of something bigger. We are building Off Grid toward a personal AI operating system - one that uses whatever compute is available to you, whether that is your phone's processor, your laptop's GPU, or a machine on your network - and keeps everything private by default.
On-device inference, remote model discovery, tool calling, vision AI, voice transcription, document analysis - all of these are in Off Grid today. The next step is seamless handoff between devices, automatic routing based on task complexity, and shared context across everything you own.
If you want to help shape what this looks like, we are building it in the open. Join the Off Grid Slack from our GitHub - feature requests, model recommendations, and conversations about where personal AI is heading.
Try it
Off Grid is free, open source, and MIT licensed.
- GitHub (1,000+ stars, 10,000+ downloads in 4 weeks)
- Android: grab the latest APK from GitHub Releases
- iOS: available on the App Store (search "Off Grid AI")
If you already have Ollama or LM Studio running, you can be using your desktop models from your phone in under two minutes.
If you do not have a local AI server yet, the earlier guides on running LLMs on Android and on iPhone cover how to get started with on-device models first. Then come back here when you are ready to add your network's compute to the mix.
The hardware you already own is enough. Off Grid just makes it reachable.
Off Grid is built by the team at Wednesday Solutions. We are a product engineering company that helps founders go from idea to launched product. If you are building something and need a technical partner, our Clutch reviews (4.8/5.0 across 23 reviews) tell the story better than we can.
