Google released an iPhone app this week that runs Gemma 4 models locally. That's notable on its own. But the more interesting part is what they built on top of it: a skills system that lets the model call JavaScript functions.
The app is called Google AI Edge Gallery. Terrible name. Really interesting execution.
On-Device, Actually Useful
The E2B model is a 2.54GB download, and it runs fast enough to feel genuinely useful: 2.4 seconds for the skills demo Simon Willison tested. It's also the first time I've seen a local model vendor ship an official iPhone app for trying their models. Not a third-party wrapper, not a web demo. An actual app you install and use.
The app supports image analysis and audio transcription (up to 30 seconds) with the smaller Gemma 4 models. But the skills demo is where it gets interesting.
Agent Skills in Your Pocket
The skills demo shows the model calling tools: interactive-map, kitchen-adventure, calculate-hash, text-spinner, mood-tracker, mnemonic-password, query-wikipedia, and qr-code.
Each skill is an HTML page. The model decides which one to call based on your prompt. Ask "Show me the Castro Theatre on a map" and it calls the interactive-map skill, pulling up an embedded Google Map with the location pinned.
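The app's internals aren't documented, but the pattern described above — the model picks a skill by name and the app routes to the matching HTML page — can be sketched roughly like this. Everything here is an assumption for illustration: the skill names come from the demo, but the page URLs, argument shapes, and `dispatch` function are invented, not the app's actual API.

```javascript
// Hypothetical skill registry: maps a skill name (as the model would emit it
// in a structured "function call") to the HTML page that implements it.
// Page filenames and argument shapes are invented for illustration.
const skills = {
  "interactive-map": (args) => `map.html?q=${encodeURIComponent(args.query)}`,
  "calculate-hash": (args) => `hash.html?text=${encodeURIComponent(args.text)}`,
  "qr-code": (args) => `qr.html?data=${encodeURIComponent(args.data)}`,
};

// The model's output is assumed to be a skill name plus JSON arguments;
// the app's job is just to resolve that to a page and render it.
function dispatch(call) {
  const skill = skills[call.name];
  if (!skill) throw new Error(`unknown skill: ${call.name}`);
  return skill(call.args); // URL of the HTML page to embed
}

// "Show me the Castro Theatre on a map" → the model picks interactive-map:
console.log(dispatch({ name: "interactive-map", args: { query: "Castro Theatre" } }));
// → map.html?q=Castro%20Theatre
```

The point of the sketch is how little machinery is needed: if skills are just HTML pages, the "tool call" layer reduces to name resolution plus URL construction.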
This is the same pattern we've seen in desktop agents—MCP servers, function calling, tool use. But now it's running on a phone, with a 2.5GB model, in 2.4 seconds.
The Missing Piece: Persistent Memory
The app has one glaring omission: conversations are ephemeral. No permanent logs. You can't build on previous interactions. The skills work, but they don't accumulate.
That's probably the right v1 tradeoff. Persistent memory on-device creates privacy and storage complexity most users aren't ready for. But it also means the app is a playground, not a productivity tool.
Why This Matters
Two years ago, running a frontier-quality model on a phone meant compromising on capability or waiting for cloud inference. Now we're seeing the early stages of agents that don't need the cloud for basic operations.
The skills demo is rough—it crashed on follow-up prompts in testing. But it's a proof of concept for what mobile agent architectures might look like: local model for inference, lightweight HTML/JS skills for tool calling, and the phone's native capabilities for I/O.
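That three-part architecture — local inference, HTML/JS skills, native I/O — suggests a very small agent loop. Here's a minimal sketch of what that loop might look like, with the on-device model stubbed out; none of these function names are from the app, and a real implementation would run Gemma inference instead of the regex stand-in.

```javascript
// Stub for the on-device model. A real version would run local inference
// and return either plain text or a structured tool call.
function fakeLocalModel(prompt) {
  if (/\bmap\b/i.test(prompt)) {
    return { type: "tool_call", name: "interactive-map", args: { query: prompt } };
  }
  return { type: "text", content: "(model reply)" };
}

// The agent loop: one inference pass, then either show text or hand off
// to a skill renderer (in the app's case, an embedded HTML page).
function runAgent(prompt, renderSkill) {
  const out = fakeLocalModel(prompt);
  if (out.type === "tool_call") {
    return renderSkill(out.name, out.args);
  }
  return out.content;
}

console.log(
  runAgent("Show me the Castro Theatre on a map",
    (name, args) => `[render ${name} for: ${args.query}]`)
);
```

The notable design choice, if this is roughly how it works, is that the cloud never enters the loop: the model, the dispatcher, and the skill pages all live on the phone.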
The gap between "runs on device" and "actually useful" is narrowing faster than I expected.