Mohammed Ali Chherawalla

How to Use AI Tool Calling on Your Phone Without Paying for a Single API

Tool calling is what separates a chatbot from an assistant. A chatbot gives you text. An assistant takes action - searches the web, runs a calculation, looks up a date, checks your files. When an AI model can decide on its own which tools to use and chain them together, it stops being a toy and starts being useful.

The problem is that tool calling has always been a cloud feature. OpenAI function calling, Anthropic tool use, Google Gemini tools - all of them require an API key, a cloud connection, and a per-token bill. Even if you run a local model, most local AI apps do not support tool calling at all.

Off Grid supports tool calling on your phone, with both on-device and remote models. No API keys. No cloud. No cost per call.

[Image: Remote Server Config]

What tool calling actually does

When a model supports function calling, it can do more than generate text. It can recognize when it needs external information, call the right tool to get it, read the result, and incorporate it into its response. Automatically.
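That recognize-call-read-incorporate cycle can be sketched as a small loop. This is a simplified illustration of the pattern, not Off Grid's actual internals; `run_turn`, `toy_model`, and the calculator tool here are toy stand-ins:

```python
# Minimal sketch of one tool-calling turn: the model either answers
# directly or emits a tool call; the app runs the tool and feeds the
# result back so the model can fold it into its final answer.
def run_turn(model, tools, user_message):
    messages = [{"role": "user", "content": user_message}]
    while True:
        reply = model(messages)                  # model decides: answer or tool?
        if reply.get("tool_call") is None:
            return reply["content"]              # plain answer, turn is done
        call = reply["tool_call"]
        result = tools[call["name"]](**call["arguments"])  # run the tool locally
        messages.append({"role": "tool", "name": call["name"],
                         "content": str(result)})          # feed result back

# Toy stand-ins: a "model" that asks for the calculator once, then answers.
def toy_model(messages):
    if messages[-1]["role"] == "tool":
        return {"content": f"The answer is {messages[-1]['content']}."}
    return {"tool_call": {"name": "calculator",
                          "arguments": {"expression": "17.3 / 100 * 84500"}}}

tools = {"calculator": lambda expression: round(eval(expression), 2)}
print(run_turn(toy_model, tools, "What is 17.3% of $84,500?"))
# → The answer is 14618.5.
```

The important design point is in the loop shape: the caller never names a tool, the model's reply does.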

Off Grid ships with built-in tools:

Web search. The model can search the web when it needs current information. Ask "What is the weather in Tokyo right now?" and the model recognizes it does not have real-time data, triggers a web search, reads the result, and gives you an answer with actual current information.

Calculator. For math that language models are bad at. "What is 17.3% of $84,500?" The model calls the calculator instead of guessing. The answer is exact.

Date and time. "What day of the week is December 25th this year?" The model calls the date tool and gives you the right answer instead of hallucinating one.

Device info. Battery level, network status, storage. The model can check these when relevant.
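Several of these tools are thin wrappers over platform APIs. As one hypothetical example (not Off Grid's implementation), a date tool can be a few lines over the system clock, so the weekday comes from a calendar computation rather than the model's memory:

```python
import datetime

# Hypothetical date tool: a thin wrapper over the system clock. The
# weekday is computed from the calendar, never guessed by the model.
def date_tool(month: int, day: int) -> str:
    year = datetime.date.today().year
    return datetime.date(year, month, day).strftime("%A")  # weekday name
```

The model only has to emit a call like `date_tool(12, 25)`; which weekday that lands on never touches its weights.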

The key is that the model decides which tools to use. You do not have to say "search for this." You just ask your question naturally, and the model figures out that it needs a tool, calls it, and uses the result.
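For the model to make that decision, each tool is advertised to it as a machine-readable declaration. A sketch in the JSON-schema style most function-calling models consume; the names and fields are illustrative, not Off Grid's exact definitions:

```python
# Illustrative tool declarations in the JSON-schema style used by most
# function-calling models. The model reads the names and descriptions
# and decides on its own when a call is warranted.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web for current information.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate an arithmetic expression exactly.",
            "parameters": {
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"],
            },
        },
    },
]
```

The description strings do real work here: they are the only signal the model has for choosing between tools.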

How it works in Off Grid

[Image: Off Grid auto-discovering models across iOS, Android, Ollama, and LM Studio on the same network. Tool calling works with both on-device and remote models.]

With on-device models

Smaller models that support the function calling format can use tools directly on your phone. Inference runs on your phone's hardware, the tool calls execute locally, and the results feed back to the model. No network is needed for any of it (except web search, obviously).

This is tool calling on airplane mode. The model runs a calculation, checks the date, or looks up device info without any server involved.

With remote models

If you have Ollama or LM Studio running on your Mac or PC, Off Grid auto-discovers your server and connects to it. Remote models like Qwen 3.5 9B, Llama 3.1, or Mistral that support function calling can use the same tools.
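Under the hood, a tool-enabled request to a local Ollama server looks roughly like this. This is a sketch assuming Ollama's /api/chat endpoint and its "tools" field; the model name and tool schema are illustrative:

```python
import json

# Sketch of a tool-enabled chat request to a local Ollama server
# (assumes Ollama's /api/chat endpoint and its "tools" field; the
# model name and tool schema here are illustrative).
payload = {
    "model": "llama3.1",
    "messages": [{"role": "user",
                  "content": "What is 17.3% of $84,500?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate an arithmetic expression.",
            "parameters": {
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"],
            },
        },
    }],
    "stream": False,
}

body = json.dumps(payload)
# POST body to http://localhost:11434/api/chat; a tool-capable model
# replies with message.tool_calls instead of plain text content.
```

The same shape works against LM Studio's OpenAI-compatible endpoint, just with a different base URL.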

The difference is power. A 9B model is significantly better at deciding when to use tools, which tools to use, and how to chain multiple tool calls together. Qwen 3.5 9B scores 66.1 on the BFCL-V4 function calling benchmark, which puts it ahead of many models three times its size.

You can switch between on-device and remote models mid-conversation. Quick question that might need a calculation? The on-device model handles it. Complex multi-step research question? Switch to the 9B on your desktop. Same chat, same tools, different compute.

Why this matters

Tool calling is the bridge between "AI that answers questions" and "AI that does things." Every major AI assistant has it - Siri, Google Assistant, ChatGPT, Claude. But they all require a cloud connection and come with a subscription fee or data tradeoff.

Running tool calling locally means:

No API costs. Cloud function calling charges per token for both the request and the tool results. That adds up fast when the model is making multiple tool calls per conversation. Local tool calling costs nothing.

No data exposure. When you ask a cloud AI to search for something, the AI provider sees your query. When Off Grid's local model triggers a web search, the search happens from your phone. The model provider never sees your question.

No dependency. Cloud APIs go down. Rate limits get hit. Pricing changes. Your local setup works as long as your hardware works.

Chaining tool calls

The most powerful use of tool calling is chaining - where the model makes multiple tool calls in sequence, using the result of one to inform the next.

Example: "How much would I save per year if I switched from ChatGPT Plus to running AI locally?"

A model with tool calling might:

  1. Call web search to find the current ChatGPT Plus pricing
  2. Call calculator to compute the annual cost
  3. Call web search to find electricity costs for running a Mac
  4. Call calculator to compute the difference
  5. Generate a response that synthesizes all of this

Off Grid supports this loop with automatic runaway prevention - the model cannot chain infinitely. It gets a reasonable number of tool calls per turn, uses them, and then generates its final response.
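A loop with runaway prevention can be sketched like this. It is a simplified illustration of the pattern, not Off Grid's actual code; MAX_TOOL_CALLS is an assumed cap and the model and tools are toy stand-ins:

```python
# Bounded tool-calling loop: the model may chain several calls per
# turn, but a hard cap stops it from looping forever. MAX_TOOL_CALLS
# is an assumed number, not Off Grid's actual limit.
MAX_TOOL_CALLS = 5

def answer(model, tools, question):
    messages = [{"role": "user", "content": question}]
    for _ in range(MAX_TOOL_CALLS):
        reply = model(messages)
        call = reply.get("tool_call")
        if call is None:
            return reply["content"]            # done chaining, final answer
        result = tools[call["name"]](**call["arguments"])
        messages.append({"role": "tool", "name": call["name"],
                         "content": str(result)})
    # Cap reached: ask the model for a final, tool-free response.
    messages.append({"role": "system",
                     "content": "Tool budget exhausted; answer now."})
    return model(messages)["content"]

# Toy stand-ins: a "model" that searches, then calculates, then answers.
def toy_model(messages):
    done = [m for m in messages if m["role"] == "tool"]
    if len(done) == 0:
        return {"tool_call": {"name": "web_search",
                              "arguments": {"query": "ChatGPT Plus price"}}}
    if len(done) == 1:
        return {"tool_call": {"name": "calculator",
                              "arguments": {"expression": "20 * 12"}}}
    return {"content": f"About ${done[-1]['content']} per year."}

tools = {"web_search": lambda query: "$20/month",
         "calculator": lambda expression: eval(expression)}
print(answer(toy_model, tools, "How much is ChatGPT Plus per year?"))
# → About $240 per year.
```

The cap is the whole safety story: each iteration either returns a final answer or consumes budget, so the turn always terminates.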

You already have everything you need

If you have a phone, you have tool calling today with on-device models. If you have a Mac or PC, add Ollama or LM Studio and you get dramatically more capable tool calling with larger models.

Qwen 3.5 9B on a machine with 16GB RAM gives you function calling that rivals cloud APIs. You paid for the hardware. The model is free. The tools are built in. Off Grid is free and open source.

Where Off Grid is heading

Tool calling is one piece of the personal AI operating system we are building. The vision is every device you own and every model you have access to working together as a private, intelligent system. Projects with RAG, vision, voice, and network discovery are all live today. Custom tool creation, more tool types, and automatic model routing are next.

Join the Off Grid Slack from our GitHub.

Try it


Off Grid is built by the team at Wednesday Solutions, a product engineering company with a 4.8/5.0 rating on Clutch across 23 reviews.
