Daniel Nwaneri

Posted on May 20

MCP Just Landed on Your Phone: What Google AI Edge Gallery Actually Does

#googleiochallenge #gemma #mcp #devchallenge

This is a submission for the Google I/O Writing Challenge

I was already running MCP servers on my desktop — connected to Claude, wired into my daily workflow — when Google announced at I/O 2026 that AI Edge Gallery now supports MCP connections on Android. I pulled out my Pixel and started testing.

First attempt: "No eligible devices." The app requires capable hardware. Second device — it opened.

What I found is more interesting than the announcement.

What Google AI Edge Gallery Is

Google AI Edge Gallery is an open-source Android app from Google Research. Large language models run entirely on your device — no internet required for inference, no data leaves your phone. Every prompt, every image, every audio clip stays local.

That part isn't new. What changed at I/O 2026: the app now supports agents. Not a chat interface with a web search button — a proper agent runtime with toggleable skills, calendar integration, scheduled reminders, and experimental MCP connections. The same protocol the rest of the serious agent ecosystem runs on.

The Model List Surprised Me

Opening the Models panel, I expected a Gemma showcase. That's not what this is.

| Model | Size | Notes |
| --- | --- | --- |
| Gemma-4-E2B-it | 2.6 GB | Recommended across most use cases, 32K context |
| Gemma-4-E4B-it | 3.7 GB | Multi-modal, 32K context |
| Gemma-3n-E2B-it | 3.7 GB | Text, vision, audio, 4096 context |
| Gemma-3n-E4B-it | 4.9 GB | Text, vision, audio, 4096 context |
| Gemma3-1B-IT | 584 MB | 4-bit quantized |
| Qwen2.5-1.5B-Instruct | 1.6 GB | Alibaba's model, LiteRT-LM ready |
| DeepSeek-R1-Distill-Qwen-1.5B | 1.8 GB | Reasoning model, fully on-device |
| TinyGarden-270M | 289 MB | Fine-tuned FunctionGemma, task automation |
| MobileActions-270M | 289 MB | Fine-tuned FunctionGemma, device control |

DeepSeek and Qwen sitting in a Google app isn't accidental. The actual product here is LiteRT-LM — Google's mobile inference runtime — not Gemma. Which model you run on top is your choice.

Gemma 4: The Number That Matters

Gemma 4 E2B runs in 2.6 GB and opens a 32K context window. Gemma 3n had 4096 tokens. Multi-modal input — text, images, and audio — in one model.

That context gap is what makes the agent use case real. Tool call outputs, calendar data, conversation history: there's room to feed all of it back to the model without truncating. Running a 32K context window offline on a phone wasn't viable when the previous generation of on-device models topped out at 4K.
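The budgeting argument can be sketched in a few lines. This is a rough illustration, not a measurement: token counts are estimated at about 4 characters per token (a common rule of thumb, not Gemma's tokenizer), 32K is taken to mean 32,768 tokens, and the payload sizes are invented.

```python
# Rough context-budget check. CHARS_PER_TOKEN is a heuristic assumption,
# not Gemma's real tokenizer, and the payload sizes below are made up.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate a token count from character length (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits(parts: dict[str, str], context_window: int, reserve_for_reply: int = 512) -> bool:
    """Return True if every part plus a reply reserve fits in the window."""
    used = sum(estimate_tokens(p) for p in parts.values())
    return used + reserve_for_reply <= context_window


# Hypothetical agent turn; sizes are chosen only to illustrate the gap.
turn = {
    "system_and_tool_definitions": "x" * 8_000,   # about 2,000 tokens
    "conversation_history": "x" * 12_000,         # about 3,000 tokens
    "tool_outputs": "x" * 16_000,                 # about 4,000 tokens
}

print(fits(turn, context_window=4_096))    # False
print(fits(turn, context_window=32_768))   # True
```

Under these assumed sizes a single tool-heavy turn already overflows a 4096-token window, while the 32K window still has headroom.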

All models run through LiteRT-LM, Google's LLM runtime built on LiteRT (the renamed and meaningfully upgraded TensorFlow Lite).

Agent Skills: What It Actually Looks Like

This is the part I came to test. Here's what's actually inside.

The Skills Architecture

Agent Skills runs on 12 skills total — split into two tiers:

Built-in skills (Google's own):

calculate-hash — hash a given text
create-calendar-event — write to OS calendar
interactive-map — show a map view for a location

Community skills (user-created, same interface):

read-calendar-events — read OS calendar for a specific date
schedule-notification — schedule a one-time or repeating daily notification
query-wikipedia — pull a Wikipedia summary on a topic
qr-code — generate a QR code for a URL
mood-tracker — stores daily mood and comments, tracks history
send-email — send an email
learn-something-new — daily learning companion with image card and scheduled notification
kitchen-adventure — dungeon master RPG set in a world of sentient kitchen appliances
text-spinner — "Spin the given text on my head"

Two things worth noticing. First: calendar read and calendar write are separate skills with separate toggles. That's a granular permissions model — the agent can write events without having read access, or vice versa. Second: send-email means an offline on-device model can send emails through a skill. That's not a demo capability.

Every skill has a "View" button — inspect the full skill definition before enabling it. Each is individually toggleable. The chat bar shows a live count: Skills 8 | MCP 0. Skills and MCP are tracked separately in the same toolbar.
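To make the per-skill toggles and the toolbar counter concrete, here is a minimal sketch of how that state could be modeled. `SkillRegistry` and its methods are hypothetical names for illustration, not the app's actual code; only the skill names and the `Skills N | MCP N` label come from what the app shows.

```python
from dataclasses import dataclass, field


@dataclass
class SkillRegistry:
    """Hypothetical model of per-skill toggles; not the app's real classes."""
    enabled: dict[str, bool] = field(default_factory=dict)
    mcp_servers: list[str] = field(default_factory=list)

    def toggle(self, skill: str, on: bool) -> None:
        self.enabled[skill] = on

    def can_use(self, skill: str) -> bool:
        # A skill is off unless it was explicitly switched on.
        return self.enabled.get(skill, False)

    def toolbar_label(self) -> str:
        active = sum(self.enabled.values())
        return f"Skills {active} | MCP {len(self.mcp_servers)}"


registry = SkillRegistry()
registry.toggle("create-calendar-event", True)   # write access granted
registry.toggle("read-calendar-events", False)   # read access withheld

print(registry.can_use("create-calendar-event"))  # True
print(registry.can_use("read-calendar-events"))   # False
print(registry.toolbar_label())                   # Skills 1 | MCP 0
```

Defaulting unknown skills to off is a design choice in this sketch: a newly imported skill would not be callable until someone flips its toggle.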

Four ways to add skills:

Featured list — curated community contributions
Load from URL — any web-hosted skill directory
Import local skill — from the device directly, no server needed
GitHub Discussions — browse the full community

Local import is the developer detail. You can build and test a skill entirely on-device without hosting anything. The iteration loop for custom skill development doesn't require a deployed server.

Agent Skills only accepts two models — both Gemma 4. Gemma 3, DeepSeek, Qwen: available for AI Chat, locked out of agents. Google drew a capability line and stuck to it.

The MCP Setup

Tap the MCP counter in the toolbar → empty state with a single button: + Add MCP server.

The dialog asks for:

Your MCP server URL
Authorization: None, Request header, or OAuth (WIP)

OAuth isn't ready yet. That's the honest limitation for anyone planning to connect enterprise or authenticated MCP servers: for now you're working with either no auth or a static request header, such as a bearer token. Public MCP servers work today. Servers that require an OAuth flow will have to wait for it to ship.
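The two authorization modes that work today could be modeled roughly as follows. This is a sketch under assumptions: the mode strings and `build_headers` are hypothetical, the token is a placeholder, and the `Accept` header reflects the MCP Streamable HTTP transport, which I am assuming the app uses.

```python
from typing import Optional


def build_headers(auth_mode: str,
                  header_name: Optional[str] = None,
                  header_value: Optional[str] = None) -> dict[str, str]:
    """Build HTTP headers for an MCP request under the chosen auth mode."""
    headers = {
        "Content-Type": "application/json",
        # Streamable HTTP clients accept both plain JSON and SSE responses.
        "Accept": "application/json, text/event-stream",
    }
    if auth_mode == "none":
        return headers
    if auth_mode == "request_header":
        if not header_name or not header_value:
            raise ValueError("request_header mode needs a name and a value")
        headers[header_name] = header_value
        return headers
    if auth_mode == "oauth":
        # Mirrors the dialog: the option is listed but marked WIP.
        raise NotImplementedError("OAuth is marked WIP in the dialog")
    raise ValueError(f"unknown auth mode: {auth_mode}")


public = build_headers("none")
private = build_headers("request_header", "Authorization", "Bearer example-token")
print(sorted(private))  # ['Accept', 'Authorization', 'Content-Type']
```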

Worth noting what this architecture means even without OAuth: tool-selection logic runs on-device, and only the structured API call leaves the phone. For healthcare or legal tooling that can't send raw queries to a server, that's a meaningful trust boundary — not a workaround.
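A small sketch of that trust boundary, assuming a standard MCP `tools/call` request over JSON-RPC 2.0. The tool name `find_provider`, its arguments, and the user prompt are all invented for illustration.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request (JSON-RPC 2.0)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# The raw user text stays in local memory; it is never put in the payload.
user_prompt = "I've had chest pain since Tuesday, find a cardiologist near me"

# Hypothetical result of on-device tool selection for that prompt.
outgoing = build_tool_call(1, "find_provider", {"specialty": "cardiology", "radius_km": 10})

print(outgoing)
```

The server sees the tool name and the structured arguments, not the sentence that produced them. The arguments still carry whatever the model extracted, so how much is exposed depends on each tool's input schema.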

For developers already running public MCP servers: enter the URL, and the app fetches the tool manifest and loads the tool definitions into the system prompt alongside your active skills. The model handles invocation from there.
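In MCP terms, fetching the manifest corresponds to a `tools/list` request, whose result carries each tool's `name`, `description`, and `inputSchema`. The sketch below shows one way the result could be merged with active skills into prompt text. The prompt layout and the `get_weather` tool are assumptions; the app's real template is not shown anywhere in the UI I tested.

```python
import json


def build_tools_list_request(request_id: int) -> str:
    """Serialize an MCP tools/list request (JSON-RPC 2.0)."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})


def render_tool_prompt(skills: list[dict], mcp_response: dict) -> str:
    """Merge active skills and MCP tools into one system-prompt section."""
    lines = ["You can call the following tools:"]
    for skill in skills:
        lines.append(f"- {skill['name']}: {skill['description']}")
    for tool in mcp_response["result"]["tools"]:
        # Joining a dict iterates its keys, i.e. the parameter names.
        params = ", ".join(tool["inputSchema"].get("properties", {}))
        lines.append(f"- {tool['name']}({params}): {tool.get('description', '')}")
    return "\n".join(lines)


# The skill text comes from the skills list; the MCP tool is invented.
active_skills = [{"name": "query-wikipedia", "description": "pull a Wikipedia summary on a topic"}]
sample_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"tools": [{
        "name": "get_weather",
        "description": "Current weather for a city",
        "inputSchema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }]},
}

print(build_tools_list_request(1))
print(render_tool_prompt(active_skills, sample_response))
```

Every tool definition rendered this way spends context tokens on each turn, which is one more reason the 32K window matters for agents.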

The Rest of the App

Each use case — AI Chat, Ask Image, Audio Scribe, Prompt Lab — links directly to API documentation and example code from its own screen. This is a teaching environment, not just a demo. Google built it for developers to read, fork, and build on.

AI Chat supports Thinking Mode — Gemma 4's step-by-step reasoning exposed inline. Useful before you wire a model into anything production-facing.

Mobile Actions runs on MobileActions-270M — 289 MB, fully offline. A 270M parameter model doing device automation. For context, Gemma 4 E2B is roughly 10x that size and handles general reasoning. The argument being made with that design: narrow fine-tunes at sub-300MB can do discrete tasks better than a general model, and they fit anywhere.

Tiny Garden — 289 MB, natural language gardening game — is the same point made playfully. Watch how function-calling works on-device in a consequence-free environment.

Getting Started

1. Install Google AI Edge Gallery (Play Store)
   Requires capable Android hardware — tested on Pixel

2. Download Gemma-4-E2B-it (2.6 GB)
   Only Gemma 4 models run Agent Skills

3. Open Agent Skills → tap the Skills or MCP button in the toolbar

4. For built-in skills: toggle on what you need
   For MCP: tap MCP → Add MCP server → enter URL + auth

5. Start chatting — the model sees your active skills and connected tools

Source code: github.com/google-ai-edge/gallery. Community skills shareable via GitHub Discussions.

What Developers Should Take From This

The Gemma 4-only lock on Agent Skills is the tell. This isn't a checkbox feature. Google shipped agentic tool use where the model can handle it reliably, and locked out weaker models until that changes. That's a better decision than letting anything run and degrading silently.

The OAuth (WIP) flag on MCP auth is the other honest signal. Public MCP servers work today. Enterprise-grade authenticated connections aren't there yet. That's not a failure — it's a preview of where this is going, with the current edges visible rather than hidden.

The 270M fine-tuned models are the underrated part of this release. MobileActions and TinyGarden are evidence of a different architecture: specialized micro-models for narrow tasks, general models for reasoning, LiteRT-LM as the runtime connecting them. At 289 MB each, those models fit anywhere.
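That split can be pictured as a simple router. The task categories and the routing rule here are hypothetical; the app does not expose anything like this, and only the model names come from the model list.

```python
# Hypothetical router: narrow fine-tunes for discrete tasks, a general
# model for everything else. Task kinds are invented for illustration.
NARROW_MODELS = {
    "device_control": "MobileActions-270M",
    "garden_game": "TinyGarden-270M",
}
GENERAL_MODEL = "Gemma-4-E2B-it"


def pick_model(task_kind: str) -> str:
    """Return the narrow model for a known task kind, else the general one."""
    return NARROW_MODELS.get(task_kind, GENERAL_MODEL)


print(pick_model("device_control"))     # MobileActions-270M
print(pick_model("summarize_article"))  # Gemma-4-E2B-it
```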

MCP being here matters because it's the same protocol across Claude, Cursor, VS Code extensions, and now Google's on-device runtime. Build a tool as an MCP server once and every compatible client picks it up. That's not small.

Verdict

AI Edge Gallery is the most developer-forward release from Google I/O 2026. Not a consumer product — a reference implementation of what on-device agents look like when an open protocol, a capable model family, and a mobile inference runtime land in the same place.

If you're building with MCP today, install the app and point it at your existing server. Your tools already work. That's not a coincidence — it's what a protocol looks like when it actually wins.

Written with assistance from Claude (Anthropic). Hands-on testing on Pixel: model list, Agent Skills interface, MCP setup flow, and skills management observed directly. Gemma-4-E2B-it downloaded; model inference and chat results not included in this article.

Top comments (16)

leob • May 21 • Edited

Google is pushing things pretty hard - but this is for the AI enthusiasts, my phone is 4 (5? 6? I'm not sure) years old, I'm not gonna buy a new one just to run this stuff :-)

But, the fact that you can run LLMs locally on your phone - that's really impressive ... nice piece of research you did!

Daniel Nwaneri • May 21

Hardware wall is real. I hit it too on my first device ("no eligible devices"). Ran it on a Pixel in the end.

The interesting part isn't the flagship models though. There's a 584 MB, 4-bit quantized Gemma3-1B-IT in the list, which sets a lower floor. Google's building toward it.

What would you actually want an offline on-device agent to do, if the hardware wasn't the constraint?

Said • May 22

To continue where I left off in something I was doing previously. For example, if I was researching X, it would pick up where I left off.

If I was reading or listening, it would continue where I left off. All the extra steps needed to get back on topic would be managed by the AI.

The experience would be something like continuing a story with a friend after a few days of not talking.

leob • May 21

Well currently nothing lol, I have no use cases - but just the fact that it's possible to run an LLM on local phone hardware, that's pretty baffling when you come to think about it :-)

Daniel Nwaneri • May 21

"No use cases" is actually the honest position. Most people with one don't know they have it yet.

The baffling part is the right instinct — 584MB doing function calling and calendar writes, offline, on hardware from 2020. That's not a demo. That's the capability existing before the killer app does.

Usually happens in that order...

Syed Ahmer Shah • May 24

Focusing on local, on-device execution through the AI Edge Gallery is a massive win for privacy and latency. It completely changes the game for mobile devs who want to implement MCP without inheriting the cost and lag of constant cloud roundtrips.

Daniel Nwaneri • May 24

The OAuth (WIP) flag complicates the privacy story slightly - authenticated servers have to wait. Public MCP servers are the privacy win today.

S M Tahosin • May 24

Having the Model Context Protocol running locally on Android opens up some wild possibilities. Integrating desktop-level MCP servers directly into a mobile environment means we can finally bypass cloud-API limits for mobile agents. Have you tried chaining it with any local vision tools on the Pixel yet?

Daniel Nwaneri • May 24

Haven't chained it with vision tools. That's still on the list.

Worth knowing though: Ask Image and Agent Skills are separate tiles in the app with separate model instances. Cross-tool chaining isn't built in yet. The orchestration layer to bridge them doesn't exist in the current build.

The "bypass cloud-API limits" framing is also worth unpacking. MCP over Streamable HTTP means the server call still goes out; it's the tool-selection logic that stays local. Not the same as fully air-gapped, which matters depending on what you're trying to bypass.

Manuel Bruña • Jun 15

Phone-based MCP is exciting, but the UX for consent has to be sharper than desktop. Sensors, contacts, files, and local apps have very different blast radius. I’d prefer small, revocable scopes over one broad “let the agent use my phone” permission.

Suny Choudhary • May 21

This is an interesting direction because MCP on mobile changes the mental model.

On desktop, MCP usually feels like a developer or workstation layer: files, terminals, repos, browsers, databases, cloud tools. On phones, the context is different. The device has sensors, camera, location, apps, notifications, identity, and a lot of personal data sitting very close to the user.

That makes the edge/local part important. If the model and tool layer can run locally, it opens up useful workflows without sending everything to a cloud model by default.

But it also raises the same question every agent platform eventually hits: what tools should the model actually be allowed to call, and how visible is that to the user?

MCP on phones could be powerful, but the permission model will matter more than the demo.

Daniel Nwaneri • May 21

The permission model question is already showing up in the design. Calendar read and write are split into separate skills — two toggles, not one. That's a deliberate permission boundary baked into the architecture.

But send-email is also in the list. Same interface, same toggle. The model can be enabled to send email offline, with no visible indication to the user when it decides to fire.

The skill-level toggle is the current answer to your question. Whether it's a sufficient answer is a different thing.

Max Quimby • May 24

The "tool-selection logic runs on-device, only the structured API call leaves the phone" pattern is the part of this that's most underrated. It flips the usual cloud-LLM trust model: instead of "we trust the model provider with the full conversation context," you're only exposing the specific tool invocation. For enterprise/health/financial agent use cases that's a meaningful difference.

The separate calendar-read and calendar-write toggles also caught my eye. Most MCP server configs I've seen treat permissions at the server level — connect the server, get all its tools. Splitting capabilities into individually-grantable skills is closer to how mobile OS permissions actually work, and probably the right primitive for non-developer end users who shouldn't have to reason about server URLs.

One concern: 32K context with a 2.6 GB model plus tool outputs is going to feel tight for any multi-turn agent flow. Have you tried it on longer conversations to see how quickly it has to start dropping history? That's the realistic ceiling for "useful mobile agent" right now in my experience.

Daniel Nwaneri • May 26

The trust model framing is sharper than what I put in the article. "Exposing only the specific tool invocation" rather than the full conversation context. That's the actual privacy primitive, and it matters for regulated use cases in a way that "on-device = private" doesn't quite capture on its own.

On permissions: the View button per skill adds another layer — users can inspect the full skill definition before enabling it. Not just what it can do, but how it's implemented. Closer to informed consent than typical mobile permissions.

On the context ceiling — haven't run multi-turn inference yet (footer says so). But the concern is architecturally right. Tool outputs are verbose: Wikipedia summaries, calendar dumps, MCP response payloads eat tokens faster than conversation turns do. The 32K number behaves smaller than it looks in practice.

Benjamin Nguyen • May 21

neat! I have Gemini on my phone already.

Melvin Great • May 25

Nice research you did there. It was an interesting read.
