
Denis Babkevich

I built an AI agent with 57 tools that actually does stuff on your iPhone

Every AI app on the App Store right now is basically the same thing: a text box that talks to GPT. You type, it replies, you type again. That's a chatbot, not an agent.

I've been building Spectrion for the past year and the whole point was to make something that actually does things. Not just talks about doing things. You say "find me a good restaurant nearby, check the weather, and set a reminder for 7pm" and it figures out the steps, calls the right tools, handles the results, and tells you when it's done. No hand-holding.

Spectrion chat interface

The agent loop

The core idea is dead simple. Instead of one request -> one response, you have a loop:

messages = [userMessage]
loop {
    response = llm.generate(messages)
    if response.hasToolCalls {
        results = execute(response.toolCalls)
        messages.append(response)
        messages.append(results)
        continue
    } else {
        display(response.text)
        break
    }
}

The model calls tools, sees the results, decides what to do next. Maybe it takes 3 iterations, maybe 15. The loop runs until the model decides it's done (safety limit at 200, but I've never seen it go past ~30).
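
The pseudocode above can be fleshed out into a runnable sketch. This is not Spectrion's actual implementation -- the types, function signatures, and message roles here are assumptions made for illustration:

```typescript
// Hypothetical types -- illustrative only, not Spectrion's real API.
type ToolCall = { name: string; args: Record<string, unknown> };
type LlmResponse = { text: string; toolCalls: ToolCall[] };
type Message = { role: "user" | "assistant" | "tool"; content: string };

async function runAgentLoop(
  userMessage: string,
  generate: (msgs: Message[]) => Promise<LlmResponse>,
  execute: (calls: ToolCall[]) => Promise<string[]>,
  maxIterations = 200, // the safety limit mentioned above
): Promise<string> {
  const messages: Message[] = [{ role: "user", content: userMessage }];
  for (let i = 0; i < maxIterations; i++) {
    const response = await generate(messages);
    if (response.toolCalls.length > 0) {
      // Feed tool results back so the model can decide the next step.
      const results = await execute(response.toolCalls);
      messages.push({ role: "assistant", content: response.text });
      for (const r of results) messages.push({ role: "tool", content: r });
      continue;
    }
    return response.text; // no tool calls -- the model decided it's done
  }
  throw new Error("agent loop hit the iteration safety limit");
}
```

The key property is that termination is the model's decision, not the runtime's -- the runtime only enforces the upper bound.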

Here's what a real interaction looks like internally:

User: "Find the best Italian place near me and remind me at 7pm"

Iteration 1: calls device_info(type: "location") -> gets coordinates
Iteration 2: calls web_search("italian restaurant near 37.78,-122.41") -> gets results
Iteration 3: calls web_fetch(url of top result) -> checks reviews
Iteration 4: calls reminders(create, "Dinner at Trattoria Roma", 7pm) -> done
Iteration 5: responds with summary

Four tool calls, zero user input after the first message. That's the difference between a chatbot and an agent.

Compound requests and the todo system

Simple 5-step tasks are easy. The hard part is compound requests -- "research three competitors, compare pricing, build a tracker, and write a summary." That's where most agent implementations fall apart. The model does step 1, maybe step 2, then returns a half-baked answer.

I solved this with a todo system. When the agent gets a compound request, it creates a todo list with all subtasks, works through them one by one, and checks items off as it goes. If the model stops responding while items are still pending, the runtime nudges it: "you've got unfinished items, keep going." Up to 20 auto-continues before giving up.

There's a hierarchy of nudges that keeps the agent on track:

  • Todo-aware -- fires when the model stops while pending items exist; injects "Uncompleted items remain, continue"
  • Todo nag -- fires after 3 iterations without a todo update; reminds the model to update the list
  • Action-promise -- fires when the model says "let me try" without calling tools; deletes the empty promise and re-nudges
  • Auto-continue -- fires on an empty response after tool execution; injects "[continue]"

The result: you can throw a 7-step request at it and walk away. It grinds through all of them.
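
The nudge selection can be sketched as a small decision function. The state fields and priority ordering here are my assumptions -- the article only describes when each nudge fires, not how they're prioritized:

```typescript
// Hypothetical agent state; field names are assumptions, not Spectrion's.
type AgentState = {
  pendingTodos: number;
  iterationsSinceTodoUpdate: number;
  lastResponseText: string;
  lastResponseHadToolCalls: boolean;
};

function pickNudge(state: AgentState): string | null {
  // Auto-continue: empty response after tool execution.
  if (state.lastResponseText === "" && !state.lastResponseHadToolCalls) {
    return "[continue]";
  }
  // Action-promise: the model promised action but called no tools.
  if (!state.lastResponseHadToolCalls && /let me try/i.test(state.lastResponseText)) {
    return "You promised an action but called no tools -- call them now.";
  }
  // Todo-aware: the model stopped while items are still pending.
  if (!state.lastResponseHadToolCalls && state.pendingTodos > 0) {
    return "Uncompleted items remain, continue.";
  }
  // Todo nag: several iterations without touching the list.
  if (state.iterationsSinceTodoUpdate >= 3) {
    return "Remember to update your todo list.";
  }
  return null; // no nudge needed -- the agent is on track
}
```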

Watchdogs and self-healing

An agent that hangs is worse than one that fails fast. So there's a watchdog system:

  • Iteration watchdog -- 600-second timeout per iteration. If the model is stuck, it aborts and retries (up to 3 attempts with backoff).
  • Repetition detector -- catches when the model outputs 600+ characters of repeated text. Truncates and retries.
  • Context overflow recovery -- when the conversation gets too long and the API returns 400/413, the runtime runs a 4-step recovery: truncate oversized tool results, compact old messages, emergency trim, retry. All automatic.
  • Consecutive error tracking -- if 5 tool calls in a row return errors, the loop aborts instead of burning tokens in circles.

Every retry uses exponential backoff. The model never sees that any of this happened -- it gets a clean context and continues.
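
A minimal sketch of the retry-with-exponential-backoff pattern described above -- the delay values and attempt cap are assumptions for illustration:

```typescript
// Retry an async operation with exponential backoff: 1s, 2s, 4s, ...
// (base delay and attempt count are illustrative, not Spectrion's values).
async function withRetry<T>(
  op: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      // Back off before the next attempt, but not after the last one.
      if (attempt < attempts - 1) {
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}
```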

Parallel everything

When the model calls 3 tools in one turn, they run simultaneously via TaskGroup. Not sequentially -- in parallel. A web search, a calendar check, and a location lookup all fire at the same time. This cuts latency by 2-3x on multi-tool turns.
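
The Swift runtime uses a TaskGroup for this; the same fan-out looks like this in TypeScript terms (a sketch, not the actual implementation):

```typescript
type ToolCall = { name: string; args: Record<string, unknown> };

// All tool calls start at once; results come back in call order,
// and total latency is the slowest call, not the sum of all of them.
async function executeInParallel(
  calls: ToolCall[],
  run: (call: ToolCall) => Promise<string>,
): Promise<string[]> {
  return Promise.all(calls.map((call) => run(call)));
}
```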

You can also run multiple conversations in parallel. The runtime uses per-conversation state isolation, so conversation A's agent loop doesn't interfere with conversation B. Three main execution lanes, three sub-agent lanes, separate lanes for cron and nested calls. Each conversation is pinned at entry -- no cross-chat message leaks.

Execution Lanes:
  main:     [conv A] [conv B] [conv C]     <- 3 parallel conversations
  subagent: [task 1] [task 2] [task 3]     <- 3 parallel sub-agents
  cron:     [scheduled task]               <- background jobs
  nested:   [nested call] [nested call]    <- agent-in-agent

57 tools across 8 categories

An agent without tools is just a chatbot with extra steps. Here's what Spectrion can actually do on your phone:

Web & Search -- search the web, fetch and parse pages, extract content

Communication -- send iMessages/SMS, make phone calls, read/write contacts, send emails

Calendar & Reminders -- full CRUD on events and reminders, recurring tasks, cron-style scheduling

Files & Documents -- browse filesystem, cloud files, zip/unzip, parse XLSX/DOCX/CSV natively, local notes with tags

Media & Vision -- take photos, record audio, AI vision (describe/OCR images and video), AI image editing (background removal, object removal, style transfer), AI image generation (text-to-image), scan/generate barcodes and QR codes

System & Device -- device info, screen brightness, location, app launcher, Siri Shortcuts, Apple Health data (steps/heart rate/sleep/workouts), music playback, maps (directions/nearby/geocoding), clipboard, share sheet, passwords, text transforms (base64/hashing/regex/JSON), translation, calculator, timers

AI & Meta -- create tools at runtime, manage skills, long-term memory, delegate to sub-agents, render dynamic UI (Canvas)


Not all 57 tools are loaded into the prompt at once -- that would waste tokens. There's a ToolCatalog that keeps core tools always active and activates others on demand. The catalog matches keywords from your message in 11 languages to figure out which tools you need. Say "photo" in English, "foto" in German, or "фото" in Russian -- the camera tool activates automatically.
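
A sketch of what that keyword-based activation might look like -- the catalog contents and matching strategy here are assumptions (the real ToolCatalog covers 11 languages; three are shown):

```typescript
// Hypothetical keyword catalog: tool name -> trigger words (en / de / ru).
const catalog = new Map<string, string[]>([
  ["camera", ["photo", "foto", "фото"]],
  ["web_search", ["search", "suche", "поиск"]],
]);

// Return the on-demand tools whose keywords appear in the user's message.
function toolsToActivate(message: string): string[] {
  const lower = message.toLowerCase();
  const active: string[] = [];
  for (const [tool, keywords] of catalog) {
    if (keywords.some((k) => lower.includes(k))) active.push(tool);
  }
  return active;
}
```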

Canvas: the agent builds UI

The agent can render interactive UIs from JSON -- not just text responses. It's called A2UI (Agent-Driven UI).

The agent outputs a declarative JSON structure, and the runtime renders it as native SwiftUI:

  • Layout: VStack, HStack, ZStack, ScrollView, Grid
  • Input: Buttons, TextFields, Toggles, Sliders, Pickers, Steppers
  • Data: Lists, Tables, Charts
  • Media: Maps, WebViews
  • Composite: Cards, Alerts, Sheets, Forms, Progress indicators

There's a state management system and event bus so the UI is actually interactive -- form submissions, button clicks, slider changes all feed back to the agent. Ask it to "build me a unit converter" and it renders a working app-within-the-app.
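
To make this concrete, here's what a declarative payload along those lines could look like for the unit-converter example. The actual A2UI schema isn't shown in this post, so every key name below is a guess at the shape, not the real format:

```json
{
  "type": "VStack",
  "children": [
    { "type": "TextField", "id": "meters", "placeholder": "Meters" },
    { "type": "Button", "label": "Convert", "onTap": { "event": "convert" } },
    { "type": "Text", "bind": "result" }
  ]
}
```

The runtime would map each node to its SwiftUI counterpart, and the "convert" event would travel back to the agent through the event bus.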

Sub-agents

Some tasks need more than one brain. The main agent can spin up sub-agents -- each gets its own tool executor, its own message history, its own loop (up to 50 iterations).

Four built-in personas:

  • Researcher -- deep web research across multiple sources
  • Coder -- writes and debugs code
  • Writer -- content, creative writing, editing
  • Tool Builder -- creates new JS tools from scratch

You can dispatch up to 5 tasks in parallel. Sub-agents can message each other across sessions -- one agent's output feeds into another's next run via SessionMessenger.

Runtime tool creation

The agent can build its own tools. Not in theory -- it literally writes JavaScript, tests it, fixes bugs, and registers a working tool. All at runtime.

The JS sandbox has three API tiers:

  • Core: HTTP requests, key-value storage, crypto, tool chaining (call_tool())
  • Extended: CommonJS modules, filesystem, HTML parsing, polling
  • Advanced: SQLite database, device queries, image processing

Each tool gets isolated storage (its own KV store + SQLite). Everything is versioned -- every edit saves the old version with full rollback. There are 20+ pre-built templates to start from.

// A tool the agent might create:
const response = http({
    url: `https://api.github.com/repos/${owner}/${repo}`,
    method: 'GET',
    headers: { 'Accept': 'application/vnd.github.v3+json' }
});
const data = JSON.parse(response.body);
return `${data.full_name}: ${data.stargazers_count} stars`;

Workflows: automation chains

Beyond single tool calls, you can build multi-step workflows that chain tools together:

  • Trigger nodes: manual, scheduled (cron), or event-driven
  • Action nodes: any tool call with templated arguments (${variable})
  • Condition nodes: branching logic based on previous results
  • LLM nodes: inject AI reasoning at any point
  • HTTP / JavaScript nodes: call external APIs or run custom logic
  • Loop and parallel branches: repeat or fan out

The agent can build workflows for you or you can design them manually.
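
The ${variable} templating in action-node arguments presumably works along these lines -- a sketch with assumed semantics (unknown variables left untouched), not the actual workflow engine:

```typescript
// Substitute ${name} placeholders in a node argument using results
// from earlier nodes. Unknown variables are left as-is (an assumption).
function resolveTemplate(
  template: string,
  context: Record<string, string>,
): string {
  return template.replace(/\$\{(\w+)\}/g, (match, name) =>
    name in context ? context[name] : match,
  );
}
```

For example, a condition node's output stored as `repo` could feed the next action node's argument `"Star count for ${repo}"`.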

Device Mesh: your devices become one agent

This is the big one. Spectrion runs on iPhone, iPad, and Mac -- and with Device Mesh, all your devices become a single unified agent.

Here's how it works: you pair two devices (QR code scan or pairing code). They perform an ECDH key exchange (Curve25519), derive a shared AES-256-GCM key, and establish an encrypted WebSocket tunnel through the server. From that point on, everything syncs in real time:

  • Conversations and messages -- start a chat on iPhone, continue on Mac
  • Settings and persona -- change the agent's personality on one device, it updates everywhere
  • Custom tools -- create a JS tool on Mac, it appears on iPhone
  • Memory -- what the agent learned on one device is available on all
  • Scheduled tasks -- cron jobs sync across devices
  • Knowledge base -- documents indexed on one device are searchable from another

But the killer feature is cross-device tool execution. When your iPhone is paired with your Mac, the agent on Mac can call remote_camera -- and it fires the actual camera on your iPhone. The agent on iPhone can call remote_file_manager -- and it reads files from your Mac's filesystem.

Every remote tool appears in the catalog with a remote_ prefix. The LLM doesn't know or care that the tool runs on a different device -- it just calls it. Under the hood, it's an RPC call over the encrypted WebSocket:

Mac agent calls remote_camera(action: "capture")
  -> MeshManager sends RPC request via encrypted WebSocket
  -> iPhone receives, checks tool policy (allowList)
  -> iPhone executes camera tool locally
  -> Result (including image data) sent back encrypted
  -> Mac agent receives result, continues loop

The sync engine uses a Hybrid Logical Clock (HLC) for conflict-free merging -- no "last write wins" disasters. Deltas are batched (200ms debounce) and sent as they happen. Offline changes queue up and sync when the device reconnects.
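
For readers who haven't met HLCs: each timestamp pairs wall-clock time with a logical counter, so events stay totally ordered even when device clocks disagree. A minimal sketch (field names are my own, not Spectrion's):

```typescript
// Hybrid logical clock timestamp: wall time plus a logical counter.
type Hlc = { wallMs: number; logical: number };

// Advance the local clock for a local event or outgoing message.
function hlcTick(local: Hlc, nowMs: number): Hlc {
  if (nowMs > local.wallMs) return { wallMs: nowMs, logical: 0 };
  // Wall clock hasn't moved past our last timestamp -- bump the counter.
  return { wallMs: local.wallMs, logical: local.logical + 1 };
}

// Merge a remote timestamp on receive; the result is greater than both.
function hlcReceive(local: Hlc, remote: Hlc, nowMs: number): Hlc {
  const wallMs = Math.max(local.wallMs, remote.wallMs, nowMs);
  let logical = 0;
  if (wallMs === local.wallMs && wallMs === remote.wallMs) {
    logical = Math.max(local.logical, remote.logical) + 1;
  } else if (wallMs === local.wallMs) {
    logical = local.logical + 1;
  } else if (wallMs === remote.wallMs) {
    logical = remote.logical + 1;
  }
  return { wallMs, logical };
}
```

Because every merged timestamp strictly exceeds both inputs, concurrent edits from two devices always get a deterministic order -- which is what makes the merge conflict-free.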

All of this is end-to-end encrypted. The server relays WebSocket messages but can't read them -- it never has the symmetric key.
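
The handshake described above (X25519 key agreement, then AES-256-GCM) can be reproduced with Node's built-in crypto for illustration. The app itself uses Apple's frameworks, and the HKDF info string below is an invented placeholder -- only the cryptographic shape matches the article:

```typescript
import {
  generateKeyPairSync,
  diffieHellman,
  hkdfSync,
  createCipheriv,
  createDecipheriv,
  randomBytes,
} from "node:crypto";

// Each device generates an ephemeral Curve25519 (X25519) key pair at pairing.
const phone = generateKeyPairSync("x25519");
const mac = generateKeyPairSync("x25519");

// Each side combines its private key with the peer's public key --
// both arrive at the same 32-byte shared secret without it ever being sent.
const secretA = diffieHellman({ privateKey: phone.privateKey, publicKey: mac.publicKey });
const secretB = diffieHellman({ privateKey: mac.privateKey, publicKey: phone.publicKey });

// Derive the symmetric AES-256 key from the shared secret
// ("mesh-v1" is a made-up info label, not Spectrion's).
const key = Buffer.from(hkdfSync("sha256", secretA, Buffer.alloc(0), "mesh-v1", 32));

// Encrypt one sync message with AES-256-GCM (fresh 12-byte nonce per message).
const iv = randomBytes(12);
const cipher = createCipheriv("aes-256-gcm", key, iv);
const ciphertext = Buffer.concat([cipher.update("hello from iPhone", "utf8"), cipher.final()]);
const tag = cipher.getAuthTag();

// The peer decrypts and authenticates with the same derived key.
const decipher = createDecipheriv("aes-256-gcm", key, iv);
decipher.setAuthTag(tag);
const plaintext = Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
```

The relay server only ever sees `iv`, `ciphertext`, and `tag` -- never the shared secret or the derived key.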

Remote CLI: deploy your agent to Linux servers

With an Ultra subscription, you can deploy Spectrion CLI instances as Docker containers on your own Linux servers. The remote_cli tool handles everything:

  • Deploy: spin up a new container on any server via SSH
  • Manage: list, status, logs, restart, stop, remove
  • Execute: run shell commands on the remote instance
  • Update: pull latest container image

This means your agent can run 24/7 on a VPS, executing long-running tasks, monitoring services, or doing heavy compute -- all controlled from your phone.

Evolution: the agent improves itself

The Evolution Engine is an autonomous self-improvement system. Every 24 hours (configurable), it:

  1. Collects metrics -- tool success rates, tools per conversation, common task patterns, user satisfaction signals
  2. Analyzes via LLM -- sends metrics to the model with a prompt asking for improvements
  3. Safety gate -- filters out any dangerous or nonsensical changes
  4. Applies improvements -- updates system prompt, persona config, temperature
  5. Auto-creates tools -- if it notices you frequently do something that could be automated, it builds a tool for it

Every evolution cycle saves a versioned snapshot. Don't like what it did? Roll back to any previous version instantly. The entire evolution history syncs across devices via Mesh.

The agent literally gets better at helping you the more you use it. Not through fine-tuning -- through runtime self-modification of its own prompt and configuration.

The proxy: bring your own model or use ours

Out of the box, Spectrion works through our own proxy infrastructure -- no API keys needed, just open the app and go. The proxy handles account management, rate limit avoidance, model routing, and token refresh transparently.

If you want full control, plug in your own providers:

  • Anthropic / OpenAI / any OpenAI-compatible API key
  • Local models via Ollama (fully offline, nothing leaves your phone)
  • Apple Foundation Models (on-device, instant for simple tasks)

The system auto-falls back between providers if one fails. Vision tasks route to vision-capable models. Sub-agents get cheaper models so you're not burning premium tokens on web lookups.

The backend maintains a pool of provider accounts with utilization-aware load balancing. If one account gets rate-limited, the system switches to another mid-conversation. A tier system (HIGH/MEDIUM/LOW) routes requests to the right model class. Token refresh runs every 5 minutes in the background. Two production servers with replica sync. The result: zero user-facing rate limit errors.
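
The fallback behavior reduces to a simple pattern: try providers in preference order and move on when one fails. A sketch with invented types (the real routing also weighs tiers and account utilization, which this ignores):

```typescript
// Hypothetical provider interface -- names are illustrative.
type Provider = { name: string; complete: (prompt: string) => Promise<string> };

async function completeWithFallback(
  providers: Provider[],
  prompt: string,
): Promise<string> {
  let lastError: unknown = new Error("no providers configured");
  for (const provider of providers) {
    try {
      return await provider.complete(prompt); // first healthy provider wins
    } catch (err) {
      lastError = err; // rate-limited or down -- fall through to the next
    }
  }
  throw lastError;
}
```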

Heartbeat: your agent works while you sleep

Schedule background tasks and the agent runs them autonomously:

  • "Check my email summary at 8am daily"
  • "Monitor this GitHub repo for new issues every hour"
  • "Remind me about pending tasks when I open the app"

Smart check-in logic -- only pings the model when there's actual work to do, only during active hours (default 8am-11pm). On app launch, it checks for unfinished tasks and picks up where it left off.
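
The check-in gate described above boils down to two conditions -- pending work and active hours. A sketch with the default 8am-11pm window from the article (the function name and signature are assumptions):

```typescript
// Only ping the model when there is actual work, and only during
// active hours (defaults taken from the article: 8am-11pm).
function shouldCheckIn(
  pendingTasks: number,
  hour: number, // local hour, 0-23
  activeStart = 8,
  activeEnd = 23,
): boolean {
  return pendingTasks > 0 && hour >= activeStart && hour < activeEnd;
}
```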

Voice mode

An on-device wake-word engine listens for your trigger phrase, captures up to 15 seconds of speech, runs it through the full agent loop, and reads the response back via TTS. Fully hands-free -- wake-word detection, speech recognition, and TTS all run on-device.


Memory, knowledge base, and everything else

Long-term memory -- the agent decides what's worth remembering through a dedicated tool. Memories persist across conversations, auto-compact when they grow too large, and PII gets filtered out.

Knowledge base (RAG) -- upload PDFs, DOCX, images. The system chunks them, computes vector embeddings, and indexes everything. Hybrid search: semantic + keyword matching.

MCP support -- connect any Model Context Protocol server and its tools appear in the catalog instantly. Streamable HTTP transport with SSE, automatic retries, protocol versions 2024-11-05 and 2025-03-26.

Plugins -- a hot-reload plugin system. Discover, load, enable/disable, unload. Plugin tools show up in the same catalog as everything else.

Channels -- connect a Telegram bot and the agent responds there too. Bidirectional -- full tool loop runs on incoming messages. Infrastructure supports SMS/WhatsApp gateways.

Skills -- reusable instruction sets that inject into the system prompt when triggered. Built-in: web_researcher, scheduler. Create your own or download from the community store.

iOS integration -- Siri App Intents, home screen widgets, deep links (spectrion://new-chat), Spotlight search.

Extensions and connections

The stack

For the curious:

  • App: SwiftUI, @Observable (no Combine), async/await, SwiftData, Keychain. Zero third-party dependencies -- no SPM, no CocoaPods. Everything is first-party Apple frameworks.
  • Mesh: WebSocket transport, Curve25519 ECDH key agreement, AES-256-GCM encryption, Hybrid Logical Clock sync, RPC for remote tool execution.
  • Concurrency: Tool executor is an actor. Sendable everywhere. TaskGroup for parallel tools. Execution lanes for resource control. Per-conversation state isolation.
  • Backend: Node.js + Express + SQLite (WAL) + Redis. Account pooling, tier routing, replica sync between servers.
  • Reliability: Watchdog timers, retry with backoff, context overflow recovery, repetition detection, todo-aware nudges. The agent self-heals.
  • Privacy: Keys in Keychain only. Speech/wake-word on-device. E2E encryption for mesh. No conversation logging on server. No telemetry.
  • Localization: 11 languages. Tool auto-activation works in all of them.

What's next

  • Android and desktop clients (architecture separates platform code from agent logic)
  • Community marketplace for sharing custom tools and skills (infrastructure built, store is live)
  • More MCP integrations as the ecosystem grows
  • Deeper OS integration

Try it

App Store | spectrion.app

If you've been looking for an AI app that goes past the text box -- that actually takes action, chains tasks, builds its own tools, connects your devices into one brain, and evolves to serve you better -- give it a shot. Happy to answer architecture questions in the comments.
