This article was originally published on runaihome.com
TL;DR: LM Studio 0.4.16 (June 4, 2026) shipped Locally for iPhone/iPad and LM Link — an end-to-end encrypted Tailscale mesh that lets your phone run the models on your home Mac or RTX rig over any network. Inference stays on the rig; only a device list touches LM Studio's servers. It works, it's private, and on a Mac Studio M4 Max a 7B model streams at ~87 tok/s — far faster than you read on a phone.
What you'll be able to do after this guide:
- Chat with a 70B model running on your home machine from your phone on cellular, with chats stored only on your devices
- Connect across CGNAT, hotel Wi-Fi, and corporate firewalls with zero port forwarding
- Know which model sizes actually feel responsive over a remote link, and which don't
Honest take: If you already own a capable home rig and a Mac/PC running LM Studio, LM Link is the cleanest remote-access setup we've tested — no reverse proxy, no exposed ports, no SSH tunnel. The catch is it's account-gated and LM Studio-only on both ends, so it's a convenience layer for existing LM Studio users, not a general remote-inference server.
What LM Studio actually shipped
LM Studio 0.4.16 landed on June 4, 2026 (build 1) with two connected pieces:
- Locally — a first-party iPhone and iPad app. LM Studio acquired the standalone Locally AI app in April 2026 and rebuilt it as the official mobile front end. It's the chat client; it does not run large models itself.
- LM Link — the transport. It connects devices you own that are signed into the same LM Studio account, then lets a "client" device (your phone) load and use models running on a "host" device (your Mac or PC) as if they were local.
Build 2, on June 8, 2026, removed the request-gated waitlist so LM Link is open to everyone, and bumped the default context length to 8k tokens. Launch is iPhone and iPad only — Android has not been announced.
The important architectural detail: inference never moves. Your phone is a thin client. The model weights load into your rig's VRAM or unified memory, tokens generate there, and only the text streams back to the phone. That's the entire reason this is interesting for home-lab owners — your phone's 8GB of RAM was never going to run a 70B model, but your Mac Studio or 24GB GPU can, and now you can reach it from the couch or a train.
How LM Link works under the hood
LM Link is built on Tailscale, the WireGuard-based mesh VPN, but you don't install or configure Tailscale yourself. LM Studio embeds tsnet — a userspace library version of Tailscale that runs entirely inside the app. That matters for three reasons:
-
No kernel changes, no admin rights.
tsnetis a userspace Go program that adds mesh networking without touching kernel sockets, system routing tables, or global VPN settings. Installing LM Link doesn't reroute your other traffic. - NAT traversal without port forwarding. The tunnel punches through CGNAT, corporate firewalls, and double-NAT home routers. Two devices find each other through Tailscale's coordination servers and then connect directly. You never open a port on your router or expose anything to the public internet.
- End-to-end encryption via WireGuard. Prompts, responses, model listings, and hardware info travel only between your devices. Per LM Studio, neither Tailscale nor LM Studio's backend can read the contents — the only thing that touches LM Studio's servers is your device discovery list, so the two devices can find each other.
On the host, remote models are still served through the standard localhost:1234 OpenAI-compatible endpoint, which is why the same setup works with any tool that already talks to LM Studio's local server. If you've read our local AI privacy audit, this is the rare remote-access feature that doesn't blow a hole in the "data stays on my machine" promise: the inference and the chat history both stay on hardware you own.
Setup: phone to rig in about five minutes
You need LM Studio 0.4.16 or later on the host (Mac, Windows, or Linux), an LM Studio account, and the Locally app on an iPhone or iPad. Both devices sign into the same account — that shared identity is what authenticates the link. No API keys, no static tokens.
Step 1 — Update and prep the host
Update LM Studio to 0.4.16+ on the machine that has the GPU or Apple Silicon. Sign in to your LM Studio account (top-right). Download at least one model you want to reach remotely — a 7B–14B model is the sweet spot for phone use; more on why below.
Step 2 — Enable LM Link on the host
Open LM Studio settings and toggle LM Link on. Since build 2, there's no waitlist. The host registers itself in your account's device list. Load the model you want available — LM Link exposes whatever is currently loaded (or loadable) on the host through the link.
# Sanity-check the local server is up before going remote.
# On the host, LM Studio serves an OpenAI-compatible endpoint:
$ curl http://localhost:1234/v1/models
{"data":[{"id":"qwen2.5-7b-instruct","object":"model", ... }],"object":"list"}
If that returns your model, LM Studio's server is healthy and LM Link has something to serve.
Step 3 — Install Locally and sign in
Install Locally from the App Store on your iPhone or iPad. Sign in with the same LM Studio account, then enable LM Link in Locally's settings.
Step 4 — Let the devices discover each other
With LM Link on at both ends, the host and phone discover each other over the mesh — regardless of which networks they're on. Your home Mac on Wi-Fi and your phone on LTE will still find each other. In Locally, pick the host device, pick the model, and start chatting. The first connection can take a few seconds while the tunnel establishes; after that it's persistent.
Step 5 — Verify it's actually remote
The honest test: put your phone on cellular only (turn off Wi-Fi), then send a prompt. If it streams a reply, you're running a model on your home rig from the cellular network with no ports open and nothing exposed to the internet. That's the whole point.
Real-world latency: what actually feels fast
This is where expectations need calibrating. "Latency" over LM Link has two parts, and only one of them is the network.
Part 1 — the network round trip. Because LM Link uses Tailscale, the best case is a direct WireGuard connection between your phone and rig, where added latency is just the cellular/Wi-Fi round trip — typically tens of milliseconds on LTE, lower on 5G. If a direct path can't be established (some restrictive carrier or corporate NATs), Tailscale falls back to a relay, which adds more latency. Either way, this is a one-time cost on connection plus a small per-message overhead. It is not the thing you'll notice.
Part 2 — token generation speed. This dominates the experience, and it's entirely determined by your host hardware, not the link. Streaming hides network latency well: as long as the rig generates tokens faster than you read them, the reply feels live.
So the real question isn't "is LTE fast enough" — it's "is your host fast enough." Here's the calibration that matters. Comfortable reading speed is roughly 7–10 tokens per second. On a phone, where you read in shorter bursts, anything above ~15 tok/s feels essentially instant. Now map that to verified Apple Silicon numbers:
| Model on host | Mac Studio M4 Max (546 GB/s) | Feels like on a phone |
|---|---|---|
| Qwen2.5 7B Q4_K_M | ~87 tok/s | Instant — text appears faster than you can read |
| 14B Q4_K_M | ~40–50 tok/s | Instant for reading |
| Llama 3.3 70B Q4_K_M | ~20–28 tok/s | Comfortable; faster than reading speed |
| 70B at long context | drops toward ~18 tok/s | Still readable, slight wait on first token |
The takeaway: on a [Mac Studio M4 Ma
Top comments (0)