Gemini Nano On-Device Function Calling for Android

---
title: "Gemini Nano Function Calling: Building Offline AI Agents on Android"
published: true
description: "A hands-on guide to architecting offline AI agents on Android using Gemini Nano's on-device function calling, structured JSON output, and a WorkManager + Room sync pipeline."
tags: android, kotlin, architecture, mobile
canonical_url: https://blog.mvpfactory.co/gemini-nano-function-calling-offline-ai-agents-android
---

## What We Will Build

Let me show you a pattern I use in every project that needs on-device intelligence. We are going to architect an offline-capable AI agent on Android using Gemini Nano's function calling and structured JSON output — the features Google expanded at I/O 2026. By the end, you will have a validation pipeline that tames on-device hallucinations and a WorkManager + Room queue that executes agent actions when connectivity returns.

## Prerequisites

- Android Studio with Kotlin
- Familiarity with Room and WorkManager
- Access to Gemini Nano on-device APIs (AI Core)
- A device or emulator supporting on-device inference

## Step 1: Respect the 32K Token Budget

Cloud Gemini Flash gives you 1M+ tokens. Gemini Nano gives you roughly 32K on-device. That budget covers your system prompt, tool definitions, conversation history, *and* the response. Most teams get this wrong by porting cloud schemas directly.

Compare a schema ported straight from a cloud integration with one trimmed for Nano's budget:

```kotlin
// Tool and Param are illustrative types; substitute the schema types your SDK provides.

// Bad: verbose schema that eats your token budget
val cloudSchema = Tool(
    name = "create_calendar_event",
    description = "Creates a new calendar event with the specified title, " +
        "date, time, duration, location, attendees, recurrence pattern, " +
        "reminder settings, and optional notes...",
    parameters = listOf(/* 12 parameters with long descriptions */)
)

// Good: minimal schema optimized for on-device budget
val nanoSchema = Tool(
    name = "cal_create",
    description = "Create event",
    parameters = listOf(
        Param("title", "string", required = true),
        Param("iso_time", "string", required = true),
        Param("dur_min", "int", required = true)
    )
)
```


A trimmed schema set of 5 tools consumes roughly 800–1,200 tokens, leaving headroom for conversation context. A verbose 15-tool schema can eat 4,000+ tokens before a single user message. Budget 1,200 tokens maximum for tool definitions. Use short names, minimal descriptions, and cap at 5 tools per agent context. Swap tool sets dynamically based on user intent rather than loading everything at once.
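One way to enforce that budget is to check it in code before the tools ever reach the model. The sketch below is illustrative only: `Tool` and `Param` are redefined here as plain data classes, the `note_add` tool and the intent labels are made up, and the 4-characters-per-token estimate is a rough assumption rather than Nano's real tokenizer.

```kotlin
// Hypothetical schema types mirroring the Tool/Param shape used above.
data class Param(val name: String, val type: String, val required: Boolean = false)
data class Tool(val name: String, val description: String, val parameters: List<Param>)

// Budget limits taken from the guidance above.
val maxTools = 5
val maxToolTokens = 1_200

// Rough heuristic (assumption): about 4 characters per token, rounded up.
// Swap in a real token count if your SDK exposes one.
fun estimateTextTokens(text: String): Int = (text.length + 3) / 4

fun estimateToolTokens(tool: Tool): Int =
    estimateTextTokens(tool.name) + estimateTextTokens(tool.description) +
        tool.parameters.sumOf { estimateTextTokens(it.name) + estimateTextTokens(it.type) + 1 }

// Tool sets keyed by a coarse intent label; only one set is loaded per request.
val toolSets: Map<String, List<Tool>> = mapOf(
    "calendar" to listOf(
        Tool(
            "cal_create", "Create event",
            listOf(
                Param("title", "string", required = true),
                Param("iso_time", "string", required = true),
                Param("dur_min", "int", required = true)
            )
        )
    ),
    "notes" to listOf(
        Tool("note_add", "Add note", listOf(Param("text", "string", required = true)))
    )
)

// Returns the tools for an intent, failing fast if the set would blow the budget.
fun toolsFor(intent: String): List<Tool> {
    val tools = toolSets[intent].orEmpty()
    require(tools.size <= maxTools) { "Too many tools for '$intent': ${tools.size}" }
    val cost = tools.sumOf { estimateToolTokens(it) }
    require(cost <= maxToolTokens) { "Tool schemas for '$intent' cost ~$cost tokens" }
    return tools
}
```

Failing fast here turns the silent context blowout into a loud error at development time, which is much easier to debug than incoherent model output.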

## Step 2: Build a Three-Layer Validation Pipeline

A quantized on-device model hallucinates more than its cloud counterpart. For function calling, this shows up as malformed JSON, invented parameter names, or calls to tools that do not exist in your schema.

The docs do not mention this, but Layer 1 below (JSON extraction) alone catches roughly half of all failures: the model returns a valid function call but wraps it in explanatory text or markdown.

```kotlin
// JsonExtractor, toolRegistry, and semanticValidator are app-defined helpers.
fun parseAgentAction(raw: String): AgentAction? {
    // Layer 1: Extract JSON from response (model may wrap it in markdown)
    val json = JsonExtractor.findFirst(raw) ?: return null

    // Layer 2: Validate against registered tool schemas
    val parsed = try {
        toolRegistry.parse(json)
    } catch (e: SchemaValidationException) {
        null
    }

    // Layer 3: Semantic bounds checking
    return parsed?.takeIf { action ->
        semanticValidator.isReasonable(action)
        // e.g., dur_min in 1..480, title.length < 200
    }
}
```
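The `JsonExtractor.findFirst` call in Layer 1 is not a library function, so you have to write it yourself. Here is a minimal sketch, assuming the tool call is a single JSON object: it scans for the first balanced `{...}` span while ignoring braces inside string literals. It only balances braces and does not prove the span is valid JSON, which is why Layer 2 still has to parse the result.

```kotlin
object JsonExtractor {
    // Returns the first balanced {...} object in the text, or null if none is found.
    fun findFirst(raw: String): String? {
        var start = raw.indexOf('{')
        while (start >= 0) {
            matchObject(raw, start)?.let { return it }
            // This opening brace never closed; try the next candidate.
            start = raw.indexOf('{', start + 1)
        }
        return null
    }

    // Walks forward from an opening brace and returns the span up to its matching close.
    private fun matchObject(raw: String, start: Int): String? {
        var depth = 0
        var inString = false
        var escaped = false
        for (i in start until raw.length) {
            val c = raw[i]
            when {
                escaped -> escaped = false               // character after a backslash
                inString && c == '\\' -> escaped = true  // start of an escape sequence
                c == '"' -> inString = !inString         // enter or leave a string literal
                inString -> Unit                         // braces inside strings do not count
                c == '{' -> depth++
                c == '}' -> {
                    depth--
                    if (depth == 0) return raw.substring(start, i + 1)
                }
            }
        }
        return null
    }
}
```

Because the scan starts at the first `{`, any leading prose or code-fence markers are skipped without special handling, and trailing chatter after the closing brace is dropped.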


## Step 3: Wire Up the WorkManager + Room Offline Queue

Where on-device function calling really earns its keep is offline operation. A user on an airplane says "schedule a team sync for Tuesday at 2pm." Gemini Nano parses the intent locally, but the calendar API requires connectivity.

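```kotlin
import androidx.room.Entity
import androidx.room.PrimaryKey
import androidx.work.Constraints
import androidx.work.NetworkType
import androidx.work.OneTimeWorkRequestBuilder
import androidx.work.WorkManager
import androidx.work.workDataOf

@Entity(tableName = "agent_actions")
data class AgentAction(
    @PrimaryKey(autoGenerate = true) val id: Long = 0,
    val toolName: String,
    val paramsJson: String,
    val status: ActionStatus = ActionStatus.PENDING,
    val createdAt: Long = System.currentTimeMillis()
)

// AgentActionWorker is a placeholder name for your own Worker subclass.
val request = OneTimeWorkRequestBuilder<AgentActionWorker>()
    .setConstraints(
        Constraints.Builder()
            .setRequiredNetworkType(NetworkType.CONNECTED)
            .build()
    )
    // action must be the persisted row: with autoGenerate, use the id
    // returned by the DAO insert, not the default 0.
    .setInputData(workDataOf("action_id" to action.id))
    .build()

WorkManager.getInstance(context).enqueue(request)
```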


The flow: Gemini Nano produces a structured `AgentAction`, Room persists it with status `PENDING`, and WorkManager enqueues a `OneTimeWorkRequest` with a network constraint. When connectivity returns, a worker executes the action and updates its status to `COMPLETED` or `FAILED`. You get immediate user feedback ("Got it, I'll create that event when you're back online") while guaranteeing eventual execution. Room provides durability across process death, and WorkManager reschedules the work with exponential backoff (its default policy) whenever the worker returns `Result.retry()`.
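The worker side of that flow can be sketched as a plain Kotlin function that a `Worker.doWork()` implementation would delegate to. Everything here is an assumption for illustration: `QueuedAction`, `ActionStore`, `ActionExecutor`, and `TransientException` are hypothetical stand-ins for your Room entity, DAO, API client, and network error type. The three `Outcome` values are meant to map onto WorkManager's `Result.success()`, `Result.retry()`, and `Result.failure()`, and `attempt` plays the role of `runAttemptCount`.

```kotlin
enum class ActionStatus { PENDING, COMPLETED, FAILED }
enum class Outcome { SUCCESS, RETRY, FAILURE }

data class QueuedAction(
    val id: Long,
    val toolName: String,
    val paramsJson: String,
    val status: ActionStatus = ActionStatus.PENDING
)

// Hypothetical seams: in the app these would be backed by a Room DAO and an API client.
interface ActionStore {
    fun find(id: Long): QueuedAction?
    fun updateStatus(id: Long, status: ActionStatus)
}

fun interface ActionExecutor {
    fun execute(action: QueuedAction) // throws on failure
}

// Marks failures worth retrying (timeouts, dropped connections).
class TransientException(message: String) : Exception(message)

// attempt is zero-based: 0 on the first run, 1 on the first retry, and so on.
fun processAction(
    id: Long,
    store: ActionStore,
    executor: ActionExecutor,
    attempt: Int,
    maxAttempts: Int = 5
): Outcome {
    val action = store.find(id) ?: return Outcome.FAILURE
    // A re-delivered job for an already handled row is a no-op, not an error.
    if (action.status != ActionStatus.PENDING) return Outcome.SUCCESS
    return try {
        executor.execute(action)
        store.updateStatus(id, ActionStatus.COMPLETED)
        Outcome.SUCCESS
    } catch (e: TransientException) {
        if (attempt + 1 < maxAttempts) {
            Outcome.RETRY // leave the row PENDING so the next attempt picks it up
        } else {
            store.updateStatus(id, ActionStatus.FAILED)
            Outcome.FAILURE
        }
    } catch (e: Exception) {
        store.updateStatus(id, ActionStatus.FAILED)
        Outcome.FAILURE
    }
}
```

Keeping this logic free of Android types means you can unit test the retry and status transitions on the JVM, and the actual worker shrinks to reading `action_id` from its input data and translating the `Outcome`.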

| Dimension | Gemini Nano (on-device) | Gemini Flash (cloud) |
|---|---|---|
| Context window | ~32K tokens | 1M+ tokens |
| Latency (first token) | 80–200ms | 300–800ms (network dependent) |
| Function call reliability | Degrades with schema complexity | Stable across complex schemas |
| Structured JSON consistency | Requires validation + retry | Generally reliable |
| Availability | Always-on, no network needed | Requires connectivity |
| Cost per call | Zero marginal | Per-token API pricing |

## Gotchas

- **Token budget blowout.** Porting your cloud tool schemas directly to Nano will silently consume your context window. You will get incoherent responses with zero error messages. Keep schemas under 1,200 tokens total.
- **Markdown-wrapped JSON.** The model frequently wraps valid JSON in explanatory text or markdown code fences. A solid JSON extractor is table stakes — without one, you will reject roughly half of perfectly good responses.
- **Invented parameters.** Nano will hallucinate parameter names that do not exist in your schema. Always validate against your registered tool definitions before executing anything.
- **Skipping the offline queue.** Adopt the WorkManager + Room pattern early, even if your initial use case is online-only. This architecture lets you go offline with zero refactoring. The persistence layer also doubles as an audit log of every agent action — useful for debugging and showing users what the agent did on their behalf.
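To make the invented-parameters gotcha concrete, here is a minimal sketch of a schema check. It assumes the extracted JSON has already been decoded into a tool name plus an argument map by your JSON library of choice. `ToolSpec`, `ParamSpec`, and `ToolCall` are hypothetical types, not part of any Gemini SDK.

```kotlin
data class ParamSpec(val name: String, val required: Boolean = true)
data class ToolSpec(val name: String, val params: List<ParamSpec>)
data class ToolCall(val tool: String, val args: Map<String, Any?>)

// Returns a list of problems; an empty list means the call matches a registered schema.
fun validateCall(call: ToolCall, registry: Map<String, ToolSpec>): List<String> {
    // Calls to tools that were never registered are rejected outright.
    val spec = registry[call.tool] ?: return listOf("unknown tool: ${call.tool}")
    val known = spec.params.map { it.name }.toSet()
    // Anything the model supplied that the schema does not declare.
    val invented = call.args.keys - known
    // Required parameters the model left out.
    val missing = spec.params.filter { it.required && it.name !in call.args }.map { it.name }
    return invented.map { "invented parameter: $it" } + missing.map { "missing parameter: $it" }
}
```

Returning the problems as a list rather than a boolean gives you something to log, which helps when you are tuning schema names and descriptions to reduce hallucinated parameters.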

## Wrapping Up

The latency advantage of on-device inference (80–200ms vs 300–800ms) matters for interactive mobile UX. But the reliability gap is the architectural challenge you actually need to design around. Shrink your schemas, validate in layers, and queue actions with Room + WorkManager. That combination turns Gemini Nano from a demo into a production-grade offline agent pipeline.
