<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Denis Babkevich</title>
    <description>The latest articles on DEV Community by Denis Babkevich (@denisbabkevich).</description>
    <link>https://dev.to/denisbabkevich</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3808634%2Fc5e920dc-5b48-46ba-b7ec-e86c17e2e06e.png</url>
      <title>DEV Community: Denis Babkevich</title>
      <link>https://dev.to/denisbabkevich</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/denisbabkevich"/>
    <language>en</language>
    <item>
      <title>I Designed the AI Agent as a Runtime from Day One, Not as a Chat with Functions</title>
      <dc:creator>Denis Babkevich</dc:creator>
      <pubDate>Wed, 06 May 2026 10:16:18 +0000</pubDate>
      <link>https://dev.to/denisbabkevich/i-designed-the-ai-agent-as-a-runtime-from-day-one-not-as-a-chat-with-functions-5g4n</link>
      <guid>https://dev.to/denisbabkevich/i-designed-the-ai-agent-as-a-runtime-from-day-one-not-as-a-chat-with-functions-5g4n</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz5mn6mrwabohxc06xzk5.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz5mn6mrwabohxc06xzk5.jpeg" alt=" " width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Three months ago I sat down, sketched the architecture of Spectrion in my head, and started writing code.&lt;/p&gt;

&lt;p&gt;From the outside, the first version could be described very simply:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;an AI agent for iPhone that does not only answer, but can act
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But internally I did not want to build "a chat with tools".&lt;/p&gt;

&lt;p&gt;I wanted an environment where an agent could keep working beyond a single message: create a reminder, continue tomorrow, watch a page, create a workflow, raise an alert, verify its own output, hand part of the work to a Mac or CLI runner, keep task state, remember unfinished items, and refuse to execute an action if policy says no.&lt;/p&gt;

&lt;p&gt;So the starting point was not:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;build a chat
then add functions
then somehow bolt automation on top
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It was closer to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;the agent is the runtime
the chat is only one interface to it
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The iPhone was the first user-facing shell. But the idea was broader from the first day: tools, memory, the task board, background jobs, workflows, approvals, policies, the watchdog, self-created tools, the device mesh, and managed business mode should not be separate islands. They should live inside one agent runtime.&lt;/p&gt;

&lt;p&gt;In this article I will walk through why I started from an execution loop rather than a chat loop, what subsystems this required, and why function calling is only a small part of a real agent system.&lt;/p&gt;



&lt;h2&gt;
  
  
  What I Wanted
&lt;/h2&gt;

&lt;p&gt;I did not want an assistant that says:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;here is how you can create a reminder
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I wanted an agent that creates the reminder.&lt;/p&gt;

&lt;p&gt;Not:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;here is how you can search and compare options
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;search, compare, save the result, and return the conclusion
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I can help you make a plan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;keep the task, check that steps were not forgotten,
continue if the work stopped too early,
and remind me when the next step is needed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important difference is not that the agent can call a tool.&lt;/p&gt;

&lt;p&gt;The important difference is that a task can live longer than one message.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Watch this page.
If a new version appears, explain the changes,
create a task to update the project,
and remind me tomorrow.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a normal AI chat, this looks like a single request.&lt;/p&gt;

&lt;p&gt;At execution level, it is a process:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Understand the task
2. Check the page now
3. Store a baseline
4. Create a monitoring loop
5. Wake up on schedule
6. Compare changes
7. Run reasoning when an alert appears
8. Create a task
9. Send a notification
10. Keep the follow-up for tomorrow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is why I did not treat Spectrion as a mobile chat with functions. I needed a runtime for agentic work.&lt;/p&gt;

&lt;p&gt;The shortest version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;not a chat with functions,
but a runtime for tasks that last longer than one message
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why Function Calling Is Not Enough
&lt;/h2&gt;

&lt;p&gt;The basic architecture of an AI chat is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;user message
  -&amp;gt; LLM
  -&amp;gt; assistant message
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;user message
  -&amp;gt; LLM
  -&amp;gt; tool call
  -&amp;gt; tool result
  -&amp;gt; LLM
  -&amp;gt; assistant message
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For simple requests this is enough.&lt;/p&gt;

&lt;p&gt;Examples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What is the weather tomorrow?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Create a reminder at 10:00.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But once the agent has to actually work, problems appear that cannot be reliably solved with a prompt.&lt;/p&gt;

&lt;p&gt;A model can return a tool call as text. A provider can hang. A tool result can be too large. Context can overflow. The user can write a follow-up while the run is still executing. A scheduled task can arrive in the middle of another run. A proactive monitor can raise an alert. A subagent can finish a background task. A workflow can require approval. The model can say "done" while the todo list is still open.&lt;/p&gt;

&lt;p&gt;I designed the system as if these cases were not exceptions, but the normal working environment of an agent.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Why a prompt is not enough&lt;/th&gt;
&lt;th&gt;What the runtime needs&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tool hangs&lt;/td&gt;
&lt;td&gt;The model no longer controls the process&lt;/td&gt;
&lt;td&gt;Timeout, cancellation, retry&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool result is too large&lt;/td&gt;
&lt;td&gt;Context overflows&lt;/td&gt;
&lt;td&gt;Truncation, artifacts, summaries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User writes a follow-up during a run&lt;/td&gt;
&lt;td&gt;Chat loop does not model competing events&lt;/td&gt;
&lt;td&gt;Queues and execution lanes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scheduled task arrives during other work&lt;/td&gt;
&lt;td&gt;This is not just another message&lt;/td&gt;
&lt;td&gt;Unified event queue&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model says "done" but task is not done&lt;/td&gt;
&lt;td&gt;Model optimizes the answer, not state&lt;/td&gt;
&lt;td&gt;Todo/task board + watchdog&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool is forbidden by policy&lt;/td&gt;
&lt;td&gt;Hiding the schema is not enough&lt;/td&gt;
&lt;td&gt;Executor-level policy gate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Workflow may be dangerous&lt;/td&gt;
&lt;td&gt;LLM may miss edge cases&lt;/td&gt;
&lt;td&gt;Preflight review / approvals&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory accumulates noise&lt;/td&gt;
&lt;td&gt;"Remember everything" is unsafe&lt;/td&gt;
&lt;td&gt;Scope, TTL, confidence, rollback&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Function calling is a way to ask the model to choose an action.&lt;/p&gt;

&lt;p&gt;It does not answer these questions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;when should a task run again?
what happens if a tool hangs?
who checks whether the agent stopped too early?
where is task state stored?
how do you avoid showing the model a thousand tools at once?
how do you block a tool at execution level, not just in the prompt?
how does a proactive alert enter the reasoning loop?
how do iPhone, Mac, and CLI coordinate work?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The LLM does not answer those questions.&lt;/p&gt;

&lt;p&gt;The runtime does.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent Runtime as an Operating Loop
&lt;/h2&gt;

&lt;p&gt;In Spectrion, a normal chat turn is only one input.&lt;/p&gt;

&lt;p&gt;Different events can enter the same runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;manual user message
scheduled task
proactive alert
workflow node
subagent result
nested call
channel message
heartbeat check-in
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each event type has its own settings, but the pipeline is similar:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;event input
  -&amp;gt; classify event
  -&amp;gt; prepare context
  -&amp;gt; choose agent / model / tools
  -&amp;gt; inject memory, skills, project context
  -&amp;gt; stream LLM
  -&amp;gt; parse tool calls
  -&amp;gt; execute tools with policy, approval, timeout, progress
  -&amp;gt; add tool results
  -&amp;gt; loop again if needed
  -&amp;gt; post-run tail:
       watchdog
       proactive queue
       task board
       memory proposals
       workflow routing
       channel routing
       cleanup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The main difference from a normal chat is that an assistant message is not the only output.&lt;/p&gt;

&lt;p&gt;The output can be:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;created task
updated memory
scheduled workflow
tool artifact
notification
alert
subagent run
pending approval
task board update
audit event
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So the agent is not "a model that answers".&lt;/p&gt;

&lt;p&gt;It is a system that carries work.&lt;/p&gt;

&lt;p&gt;In code, this loop is bounded and controlled.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;AgentRuntime&lt;/code&gt; keeps &lt;code&gt;ConversationRunState&lt;/code&gt; per conversation: streaming text, status, token budget, todo state, abort flag, and rolling summary. &lt;code&gt;sendMessage&lt;/code&gt; runs preflight, directives, hooks, user-message persistence, active task envelope, media/link context, and only then enters the outer todo loop.&lt;/p&gt;

&lt;p&gt;The tool loop is limited by &lt;code&gt;clampedMaxToolIterations&lt;/code&gt;. If the model keeps calling tools for too long, the runtime stops the loop at a safety cap. Inside each iteration there are watchdog timeouts: a longer one for a normal turn and a shorter mode when queued user input or tool calls already exist.&lt;/p&gt;

&lt;p&gt;Tool calls are not simply executed one by one. The runtime groups serial-only tools by name, runs other calls through a task group in parallel, and then restores result order. That lets independent calls run faster without breaking tools that require sequential execution.&lt;/p&gt;
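
&lt;p&gt;A minimal Python sketch of that scheduling idea, not Spectrion's actual code. The &lt;code&gt;SERIAL_ONLY&lt;/code&gt; set, the tool names, and &lt;code&gt;run_tool&lt;/code&gt; are hypothetical stand-ins; the point is only that same-name serial tools run one after another, everything else runs concurrently, and results come back in call order.&lt;/p&gt;

```python
import asyncio
from collections import defaultdict

# Hypothetical set of tools that must never overlap with themselves.
SERIAL_ONLY = {"write_file"}

async def run_tool(name, arg, log):
    # Stand-in for a real tool: records start/end so overlap is visible.
    log.append(("start", name, arg))
    await asyncio.sleep(0.01)
    log.append(("end", name, arg))
    return f"{name}({arg})"

async def run_one(index, name, arg, results, log):
    results[index] = await run_tool(name, arg, log)

async def run_serial_group(group, results, log):
    # Calls to the same serial-only tool run strictly one after another.
    for index, name, arg in group:
        results[index] = await run_tool(name, arg, log)

async def execute_batch(calls, log):
    results = [None] * len(calls)
    serial_groups = defaultdict(list)
    jobs = []
    for index, (name, arg) in enumerate(calls):
        if name in SERIAL_ONLY:
            serial_groups[name].append((index, name, arg))
        else:
            jobs.append(run_one(index, name, arg, results, log))
    jobs.extend(run_serial_group(g, results, log) for g in serial_groups.values())
    # Independent calls and serial groups all run concurrently with each other.
    await asyncio.gather(*jobs)
    # Results were written by original index, so call order is restored.
    return results

calls = [("write_file", "a"), ("web_search", "x"), ("write_file", "b"), ("read_file", "y")]
log = []
results = asyncio.run(execute_batch(calls, log))
```

&lt;p&gt;Writing each result into a slot keyed by the original index is what makes the order restoration trivial, whatever order the calls actually finish in.&lt;/p&gt;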

&lt;h2&gt;
  
  
  What Lives Inside the Runtime
&lt;/h2&gt;

&lt;p&gt;If you only look at the top-level pipeline, it is easy to underestimate how much infrastructure sits around it.&lt;/p&gt;

&lt;p&gt;The runtime contains several layers that are not there for a nice demo, but for keeping long tasks from falling apart.&lt;/p&gt;

&lt;h3&gt;
  
  
  ContextManager and Compaction
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;ContextManager&lt;/code&gt; builds provider-visible context.&lt;/p&gt;

&lt;p&gt;It can include:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;system prompt
active skills
trusted project context
Memory V2 retrieval block
session working memory
Device Mesh context
recent messages
tool results
active task envelope
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The context is not just "cut from the beginning".&lt;/p&gt;

&lt;p&gt;There is &lt;code&gt;MessageCompactor&lt;/code&gt;, &lt;code&gt;TranscriptSummarizer&lt;/code&gt;, rolling summary, proactive compaction threshold, rehydration budget, and protected tail. Old tool results do not always need to be carried in full: &lt;code&gt;MicroCompactor&lt;/code&gt; can replace old read-heavy tool results with short summaries while preserving recent results and important anchors.&lt;/p&gt;

&lt;p&gt;There is also token-estimator calibration: the runtime compares its estimate with actual provider context usage and adjusts scale by provider/model.&lt;/p&gt;
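
&lt;p&gt;One way such calibration could work, as a rough Python sketch. The four-characters-per-token heuristic, the moving-average smoothing, and the class name are assumptions made for illustration, not details taken from Spectrion.&lt;/p&gt;

```python
from collections import defaultdict

class TokenEstimatorCalibration:
    """Keeps a per-(provider, model) scale factor for a rough token estimate."""

    def __init__(self, smoothing=0.2):
        self.smoothing = smoothing              # weight of the newest observation
        self.scale = defaultdict(lambda: 1.0)   # every pair starts uncalibrated

    @staticmethod
    def raw_estimate(text):
        # Assumed crude heuristic: about four characters per token.
        return len(text) / 4

    def estimate(self, provider, model, text):
        return self.raw_estimate(text) * self.scale[(provider, model)]

    def observe(self, provider, model, text, actual_tokens):
        raw = self.raw_estimate(text)
        if raw == 0:
            return
        key = (provider, model)
        observed = actual_tokens / raw
        # Exponential moving average, so one odd response does not swing the scale.
        self.scale[key] = (1 - self.smoothing) * self.scale[key] + self.smoothing * observed

cal = TokenEstimatorCalibration(smoothing=0.5)
text = "x" * 400    # raw estimate: 100 tokens
cal.observe("example-provider", "example-model", text, actual_tokens=150)
calibrated = cal.estimate("example-provider", "example-model", text)
```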

&lt;p&gt;The goal is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;preserve the meaning of old work
  -&amp;gt; avoid context-window overflow
  -&amp;gt; keep the active objective visible
  -&amp;gt; avoid asking the user to repeat the task
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Active Task Envelope
&lt;/h3&gt;

&lt;p&gt;Long tasks need more than a todo list.&lt;/p&gt;

&lt;p&gt;There is an &lt;code&gt;ActiveTaskEnvelope&lt;/code&gt;: a provider-visible anchor for the active task.&lt;/p&gt;

&lt;p&gt;It stores:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;original visible user request
latest visible user update
current objective
constraints
exact final markers
pending todos
artifact promises
tool progress
lifecycle
awaiting user question
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lifecycle can be:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;active
background
subagent
scheduled
proactive
waiting_for_user
completed
cancelled
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If context is compressed, if a queued follow-up arrives, if a task goes into background mode or a subagent, the runtime can recover the active objective from the envelope instead of guessing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Execution Lanes
&lt;/h3&gt;

&lt;p&gt;Another boring but important thing: lanes.&lt;/p&gt;

&lt;p&gt;The code has &lt;code&gt;ExecutionLaneManager&lt;/code&gt; with lane types:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;main
cron
subagent
nested
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each lane has its own concurrency limit. For example, scheduled work should not consume every execution slot, and subagents can work in parallel but not without limit.&lt;/p&gt;

&lt;p&gt;Lane tokens have a generation. If a lane is cleared or cancelled, old tokens become stale and cannot accidentally release capacity for a newer generation of tasks.&lt;/p&gt;
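
&lt;p&gt;The generation check can be sketched in a few lines of Python. This is an illustration under simplifying assumptions (single-threaded, a token is just the generation number); the &lt;code&gt;Lane&lt;/code&gt; class and its methods are hypothetical.&lt;/p&gt;

```python
class Lane:
    """One execution lane with a concurrency limit and a generation counter."""

    def __init__(self, name, limit):
        self.name = name
        self.limit = limit
        self.active = 0
        self.generation = 0

    def acquire(self):
        # Returns a token (the current generation) or None if the lane is full.
        if self.active >= self.limit:
            return None
        self.active += 1
        return self.generation

    def release(self, token):
        # Tokens from an older generation are stale and must not free capacity.
        if token != self.generation or self.active == 0:
            return False
        self.active -= 1
        return True

    def clear(self):
        # Cancelling the lane starts a new generation with empty capacity.
        self.generation += 1
        self.active = 0

cron = Lane("cron", limit=1)
old_token = cron.acquire()
cron.clear()                                 # old work is cancelled
new_token = cron.acquire()                   # a newer task takes the only slot
stale_released = cron.release(old_token)     # the late release is ignored
```

&lt;p&gt;Without the generation check, the late release from the cancelled task would free the slot that the newer task is still using.&lt;/p&gt;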

&lt;p&gt;This is not a flashy product feature, but without details like this, background execution turns into races very quickly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Queued Follow-Ups
&lt;/h3&gt;

&lt;p&gt;A user can write a new message while the agent is still working.&lt;/p&gt;

&lt;p&gt;That should not corrupt the current run.&lt;/p&gt;

&lt;p&gt;Spectrion has queued follow-ups: the runtime stores each follow-up message with its snapshot and delivery order, and it can cancel, reorder, or edit queued messages. After the current run finishes, the next input is delivered cleanly.&lt;/p&gt;

&lt;p&gt;So a follow-up during execution is not "the user interrupted the stream". It is a separate queued event.&lt;/p&gt;

&lt;h3&gt;
  
  
  Approvals
&lt;/h3&gt;

&lt;p&gt;Approvals are not just a UI button.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ApprovalManager&lt;/code&gt; stores pending requests, keeps the last 100 approval records, waits up to five minutes, and supports persisted auto-approve rules:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;always
same arguments
until date
count N times
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Together with Business policy this gives two levels:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;policy says whether the action is allowed at all
approval says whether it may happen now
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
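
&lt;p&gt;A small Python sketch of the two levels, assuming a simplified rule model. &lt;code&gt;AutoApproveRule&lt;/code&gt;, &lt;code&gt;decide&lt;/code&gt;, and the tool names are hypothetical; only the four rule kinds come from the list above.&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class AutoApproveRule:
    tool: str
    kind: str                          # "always", "same_arguments", "until_date", "count"
    arguments: Optional[dict] = None   # used by "same_arguments"
    until: Optional[datetime] = None   # used by "until_date"
    remaining: int = 0                 # used by "count"

    def matches(self, tool, arguments, now):
        if tool != self.tool:
            return False
        if self.kind == "always":
            return True
        if self.kind == "same_arguments":
            return arguments == self.arguments
        if self.kind == "until_date":
            return self.until is not None and self.until >= now
        if self.kind == "count" and self.remaining > 0:
            self.remaining -= 1        # consume one pre-approved use
            return True
        return False

def decide(tool, arguments, policy_allows, rules, now):
    # Level 1: policy decides whether the action is allowed at all.
    if not policy_allows(tool):
        return "blocked"
    # Level 2: approval decides whether it may happen right now.
    if any(rule.matches(tool, arguments, now) for rule in rules):
        return "auto_approved"
    return "ask_user"

rules = [AutoApproveRule(tool="send_email", kind="count", remaining=1)]
allow_all_but_shell = lambda tool: tool != "shell"
now = datetime(2026, 5, 6)
```

&lt;p&gt;Note the order: an auto-approve rule can never override policy, because policy is checked first.&lt;/p&gt;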



&lt;h3&gt;
  
  
  Skills, Project Context, Hooks, and Directives
&lt;/h3&gt;

&lt;p&gt;The runtime can inject skills without stuffing one huge prompt into the model. Enabled skills are listed in a lightweight form, while full instructions are injected only for activated skills and within budget.&lt;/p&gt;

&lt;p&gt;Project context is not blindly read from the folder. There is trust state, skipped sources, parse errors, project skills, project agents, and name conflicts. Untrusted project context should not become runtime instruction.&lt;/p&gt;

&lt;p&gt;There are hooks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;agentBootstrap
agentBeforeRun
agentAfterRun
sessionCreate
sessionDelete
toolBeforeExecute
toolAfterExecute
memoryFlush
messageReceived
messageSent
preCompact
postCompact
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This should be described carefully: plugin hook APIs and registries exist, but not every hook is deeply wired through every LLM/tool loop path yet. It is an extension surface, not a claim that the whole runtime is already middleware-driven end to end.&lt;/p&gt;

&lt;p&gt;Directives give fast runtime control from chat:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/model
/think
/elevated
/verbose
/reset
/status
/compact
/skill
/agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Together, this turns Spectrion from a single LLM call into a controllable execution environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Provider Layer, Capability Routing, and Diagnostics
&lt;/h3&gt;

&lt;p&gt;Another layer that is almost invisible from the outside: the runtime should not be tied to one model or one API format.&lt;/p&gt;

&lt;p&gt;There is a provider layer for the Spectrion Pro proxy, direct Anthropic/OpenAI providers, Ollama/local models, a custom OpenAI-compatible endpoint, a custom Anthropic-format endpoint, and an Apple Foundation/on-device provider.&lt;/p&gt;

&lt;p&gt;This is not only about picking "the smartest model".&lt;/p&gt;

&lt;p&gt;Different providers handle these things differently:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tool schemas
streaming
vision
max context
reasoning/thinking settings
prompt caching
multimodal content
fallback
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So &lt;code&gt;ToolDefinition&lt;/code&gt; can format schemas for OpenAI, Anthropic, and Ollama. OpenAI strict schema is enabled only when it is actually safe: when all properties are required. Otherwise, making a schema strict can make the tool harder for the model to call or break the call entirely.&lt;/p&gt;
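
&lt;p&gt;The strict-mode rule could look roughly like this in Python. The function name is hypothetical and the sketch only inspects top-level properties; the output follows the OpenAI Chat Completions tool shape as I understand it, where strict mode also expects &lt;code&gt;additionalProperties&lt;/code&gt; to be false.&lt;/p&gt;

```python
def to_openai_tool(name, description, parameters):
    """Formats a JSON-Schema tool definition as an OpenAI-style tools entry."""
    properties = parameters.get("properties", {})
    required = set(parameters.get("required", []))
    # Strict mode is only turned on when every property is required.
    strict = set(properties) == required
    schema = dict(parameters)          # shallow copy: the input is left untouched
    if strict:
        schema["additionalProperties"] = False
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": schema,
            "strict": strict,
        },
    }

reminder = to_openai_tool(
    "create_reminder",
    "Create a reminder at a given time.",
    {
        "type": "object",
        "properties": {"title": {"type": "string"}, "time": {"type": "string"}},
        "required": ["title", "time"],
    },
)
search = to_openai_tool(
    "search_notes",
    "Search notes, optionally limited to a folder.",
    {
        "type": "object",
        "properties": {"query": {"type": "string"}, "folder": {"type": "string"}},
        "required": ["query"],         # "folder" is optional, so strict stays off
    },
)
```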

&lt;p&gt;&lt;code&gt;ProviderRequestBuilder&lt;/code&gt; does boring but important work: it cleans system/history messages, prevents the active task envelope from being duplicated, preserves it even in emergency-compaction mode, and builds a provider-visible request where the model can see the correct task state.&lt;/p&gt;

&lt;p&gt;There is also &lt;code&gt;ProviderVisibleContextDiagnostics&lt;/code&gt;. It breaks visible context into categories:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;system
skills
project context
memory
tools
MCP
history
images
rehydration
active envelope
free space
compact threshold
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This helps avoid guessing why the model did not see an important instruction. You can inspect what actually went to the provider.&lt;/p&gt;

&lt;p&gt;There are also exact request diagnostics: before OpenAI/Anthropic/Proxy HTTP calls, the provider saves a fingerprint, a redacted request-body preview, token estimates, tool/image counts, and flags such as context-management. This is not a permanent full raw-request log, but it is much better for debugging provider-visible behavior than looking only at the chat transcript.&lt;/p&gt;

&lt;p&gt;The server layer is not just a thin proxy for chat completions either. It has endpoints for completions, media analysis, embeddings, rerank, media encoding, image edit, STT, TTS, voice cloning, video generation, watchdog, and steward. On top of that sit a capability resolver, tier/account/provider fallback, retries, SSE sanitization, usage logging, and business/subscription accounting.&lt;/p&gt;

&lt;p&gt;The idea is that the model should be a replaceable part of the system. The agentic loop should live above a specific provider API.&lt;/p&gt;

&lt;h2&gt;
  
  
  ToolCatalog and ToolExecutor
&lt;/h2&gt;

&lt;p&gt;Tools are the obvious part of an agent system. But early on I separated two different problems.&lt;/p&gt;

&lt;p&gt;First:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;which tools should the model see?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Second:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;how do we safely execute the selected tool?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first is handled by &lt;code&gt;ToolCatalog&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The second is handled by &lt;code&gt;ToolExecutor&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why You Cannot Put All Tools Into the Prompt
&lt;/h3&gt;

&lt;p&gt;When there are only a few tools, you can pass all schemas to the model.&lt;/p&gt;

&lt;p&gt;But if there are tens, hundreds, or thousands of tools, this starts hurting quality.&lt;/p&gt;

&lt;p&gt;Context grows. The model chooses worse. Similar tools compete. Tool descriptions take space that should belong to the task, memory, and working state.&lt;/p&gt;

&lt;p&gt;So tools are split into layers.&lt;/p&gt;

&lt;p&gt;There is a built-in core: native/system tools for web, files, device, calendar, reminders, notes, notifications, memory, knowledge base, workflows, subagents, Shortcuts, Health, maps, weather, PDF, Office, ZIP, cloud files, and other surfaces.&lt;/p&gt;

&lt;p&gt;On macOS there are desktop capabilities on top: shell, filesystem, Git, AppleScript, screen capture, browser automation, Docker sandbox, process control, patching, and other tools that are either impossible on a phone or belong in a different execution environment.&lt;/p&gt;

&lt;p&gt;There is also a Shop / external capability layer: today it has more than 1495 tools on top of the built-in core. But this does not mean 1495 schemas are inserted into the prompt at once.&lt;/p&gt;

&lt;p&gt;The important part is activation.&lt;/p&gt;

&lt;p&gt;The principle is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;the model should not see every available tool,
it should see the relevant tools for the current step
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Activation considers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;keywords
categories
explicit activation
user language
project context
policy
negations
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example, if the user says:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;do not use the web, check only local files
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;then web tools should not activate just because the query looks like research.&lt;/p&gt;
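
&lt;p&gt;A deliberately naive Python sketch of keyword activation with negations. The categories, keywords, and negation markers are invented for the example, and real activation would also weigh language, project context, and policy as listed above.&lt;/p&gt;

```python
# Hypothetical catalog: category name and the keywords that trigger it.
CATEGORY_KEYWORDS = {
    "web": ["web", "search", "research", "online"],
    "files": ["file", "files", "folder"],
    "calendar": ["calendar", "meeting"],
}
NEGATION_MARKERS = ["do not use", "don't use", "without", "no"]

def is_negated(keyword, text):
    # True when a negation marker sits directly before the keyword.
    return any(
        f"{marker} the {keyword}" in text or f"{marker} {keyword}" in text
        for marker in NEGATION_MARKERS
    )

def active_categories(message):
    text = message.lower()
    words = text.replace(",", " ").replace(".", " ").split()
    wanted, blocked = set(), set()
    for category, keywords in CATEGORY_KEYWORDS.items():
        for keyword in keywords:
            if keyword not in words:
                continue
            if is_negated(keyword, text):
                blocked.add(category)
            else:
                wanted.add(category)
    # An explicit negation always wins over a keyword match.
    return wanted - blocked

local_only = active_categories("Do not use the web, check only local files")
research = active_categories("Research the latest release online")
```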

&lt;p&gt;That is a small detail, but it matters for trust.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Executor Matters More Than the Catalog
&lt;/h3&gt;

&lt;p&gt;The catalog decides what to show the model.&lt;/p&gt;

&lt;p&gt;Real guarantees live in the executor.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ToolExecutor&lt;/code&gt; checks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;whether the tool is allowed by policy
whether approval is required
whether arguments are too large
whether result is too large
whether the task has been cancelled
whether timeout expired
whether the tool can run in this environment
whether an audit event should be recorded
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key principle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;policy does not only hide the tool from the LLM,
policy blocks execution in ToolExecutor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a tool is forbidden, removing its schema from the prompt is not enough. The model may try to call it from older context, a nested call, a workflow, or a malformed tool call.&lt;/p&gt;

&lt;p&gt;The boundary must live at execution level.&lt;/p&gt;

&lt;p&gt;Otherwise it is not a security boundary. It is decoration.&lt;/p&gt;
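
&lt;p&gt;To make the distinction concrete, here is a minimal Python sketch. The class shape, tool names, and argument limit are assumptions; the point is that the denied tool is hidden from the model and also refused when something calls it anyway.&lt;/p&gt;

```python
class PolicyDenied(Exception):
    pass

class ToolExecutor:
    def __init__(self, tools, denied, max_argument_chars=2000):
        self.tools = tools                    # tool name mapped to a callable
        self.denied = set(denied)             # names blocked by policy
        self.max_argument_chars = max_argument_chars
        self.audit = []                       # append-only audit trail

    def visible_schemas(self):
        # What the catalog would show the model: denied tools are hidden.
        return sorted(name for name in self.tools if name not in self.denied)

    def execute(self, name, arguments):
        # The gate runs on every call, regardless of what the model was shown.
        if name in self.denied or name not in self.tools:
            self.audit.append(("denied", name))
            raise PolicyDenied(name)
        if len(repr(arguments)) > self.max_argument_chars:
            self.audit.append(("rejected_arguments", name))
            raise ValueError("arguments too large")
        result = self.tools[name](**arguments)
        self.audit.append(("executed", name))
        return result

executor = ToolExecutor(
    tools={
        "read_note": lambda title: f"note:{title}",
        "shell": lambda command: "ran",
    },
    denied={"shell"},
)
```

&lt;p&gt;A call to &lt;code&gt;executor.execute("shell", ...)&lt;/code&gt; raises &lt;code&gt;PolicyDenied&lt;/code&gt; and leaves an audit record, even if the call came from stale context or a workflow node rather than from the current prompt.&lt;/p&gt;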

&lt;h3&gt;
  
  
  Artifacts and Native UI
&lt;/h3&gt;

&lt;p&gt;A tool result in Spectrion is not only a string.&lt;/p&gt;

&lt;p&gt;A tool can return:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;text
summary
image
file
artifact reference
structured payload
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is an artifact contract. It gives stable artifact IDs, attachment references, edit references, storage scope, and validators that prevent the model from accidentally answering with a raw absolute path like &lt;code&gt;/Users/.../file.png&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This sounds small until the agent starts working with images, PDFs, Office files, downloaded archives, generated media, and long-running tool results.&lt;/p&gt;

&lt;p&gt;If an artifact is already attached to the message, the model gets guidance not to attach it again. If a file or image needs editing, the model should use an artifact reference rather than inventing a path.&lt;/p&gt;

&lt;p&gt;There is also another layer: &lt;code&gt;render_ui&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The agent can return not only text, but a JSON spec for native UI. The A2UI parser supports layout/content/input/data/media/composite components:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vstack, hstack, zstack, scroll, grid
text, image, icon, divider, spacer
button, textfield, toggle, slider, picker, stepper
list, table, chart
map, webview
card, alert, sheet, form, progress
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are limits on depth and node count, so the model cannot generate an infinite UI tree.&lt;/p&gt;
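
&lt;p&gt;A sketch of such a guard in Python. The limit values and the &lt;code&gt;children&lt;/code&gt; key are assumptions for illustration; the actual A2UI spec format is not shown in this article.&lt;/p&gt;

```python
MAX_DEPTH = 8       # assumed limits, not Spectrion's actual values
MAX_NODES = 200

class UISpecError(ValueError):
    pass

def validate_ui(spec, max_depth=MAX_DEPTH, max_nodes=MAX_NODES):
    """Walks a tree of dicts with optional "children" lists and enforces both limits."""
    count = 0
    stack = [(spec, 1)]     # iterative walk, so deep input cannot exhaust recursion
    while stack:
        node, depth = stack.pop()
        count += 1
        if depth > max_depth:
            raise UISpecError("UI tree is too deep")
        if count > max_nodes:
            raise UISpecError("UI tree has too many nodes")
        for child in node.get("children", []):
            stack.append((child, depth + 1))
    return count

card = {"type": "card", "children": [{"type": "text"}, {"type": "button"}]}
node_count = validate_ui(card)
```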

&lt;p&gt;The practical value: the agent can return a small working interface inside the answer. For example, an approval form, comparison table, task dashboard, workflow card, or research result panel.&lt;/p&gt;

&lt;p&gt;This is another reason Spectrion is not limited to an assistant message. Output can be UI state, artifact, or action.&lt;/p&gt;

&lt;h3&gt;
  
  
  Project Workspace, Mutation Journal, and Rollback
&lt;/h3&gt;

&lt;p&gt;Code/workspace tasks have their own project layer.&lt;/p&gt;

&lt;p&gt;Workspace does not mean "the agent can read and write anywhere". The project model has capabilities:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;browse files
edit files
view git changes
search files
load project context
load project skills
load project agents
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is a path policy that should keep the agent inside the selected workspace root. The file tree service handles hidden files, binary files, symlinks, unreadable entries, max depth, max entries, max editable bytes, and &lt;code&gt;.gitignore&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Project context is loaded separately from ordinary memory. It has trust state, skipped sources, parse errors, project skills, project agents, and conflict detection. If project context is not trusted, it should not become runtime instruction.&lt;/p&gt;

&lt;p&gt;For mutations there is &lt;code&gt;WorkspaceMutationJournal&lt;/code&gt; and &lt;code&gt;WorkspaceChangeTracker&lt;/code&gt;. They record file changes with before/after hashes, snapshots, and rollback eligibility. If a change can be rolled back, the runtime knows where the snapshot is. If not, that is explicit too.&lt;/p&gt;
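
&lt;p&gt;The idea can be sketched in Python with an in-memory dict standing in for the filesystem. The class and field names are hypothetical, and the snapshot size limit is an assumption; only before/after hashes, snapshots, and explicit rollback eligibility come from the description above.&lt;/p&gt;

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

@dataclass
class Mutation:
    path: str
    before_hash: Optional[str]      # None when the file did not exist before
    after_hash: str
    snapshot: Optional[bytes]       # previous contents, if they were kept

    @property
    def can_rollback(self) -> bool:
        return self.snapshot is not None

@dataclass
class MutationJournal:
    max_snapshot_bytes: int = 1_000_000
    entries: list = field(default_factory=list)

    def write(self, files: dict, path: str, new: bytes) -> Mutation:
        old = files.get(path)
        # Keep a snapshot only for existing files that are small enough.
        keep = old is not None and self.max_snapshot_bytes >= len(old)
        before = digest(old) if old is not None else None
        entry = Mutation(path, before, digest(new), old if keep else None)
        files[path] = new
        self.entries.append(entry)
        return entry

    def rollback(self, files: dict, entry: Mutation) -> bool:
        current = files.get(entry.path)
        # Refuse if there is no snapshot or the file changed since the edit.
        if entry.snapshot is None or current is None or digest(current) != entry.after_hash:
            return False
        files[entry.path] = entry.snapshot
        return True

files = {"README.md": b"old text"}
journal = MutationJournal()
edit = journal.write(files, "README.md", b"new text")
```

&lt;p&gt;Checking the current hash against &lt;code&gt;after_hash&lt;/code&gt; before restoring avoids silently overwriting a change that someone else made after the agent's edit.&lt;/p&gt;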

&lt;p&gt;This matters for macOS/CLI scenarios. When the agent edits a project, "I changed files" is not enough. You need a trace: what changed, how it is evidenced, and whether it can be rolled back.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tasks the Agent Should Not Drop
&lt;/h2&gt;

&lt;p&gt;One of the main problems with AI agents: the model can end with a nice answer even though the task is not done.&lt;/p&gt;

&lt;p&gt;The user says:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Prepare the feature launch.
Check the copy, gather bugs,
prepare the changelog, write the post,
make the release checklist,
and do not stop until every item is closed.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A normal assistant often turns this into a good list.&lt;/p&gt;

&lt;p&gt;But the user did not ask for a list. They asked the agent to carry the work.&lt;/p&gt;

&lt;p&gt;That requires task state.&lt;/p&gt;

&lt;p&gt;Simplified:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TodoManager
  -&amp;gt; pending
  -&amp;gt; in_progress
  -&amp;gt; completed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And a more project-level layer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TaskStore
  -&amp;gt; status
  -&amp;gt; priority
  -&amp;gt; dependencies
  -&amp;gt; blockers
  -&amp;gt; assigned agent
  -&amp;gt; claim session
  -&amp;gt; claim expiration
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After the agent loop, the runtime checks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;are there pending items?
is there in_progress work without result?
were there tool failures?
are there blockers?
did the agent say "done" too early?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the task is not closed, the runtime can continue instead of stopping.&lt;/p&gt;
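
&lt;p&gt;A rough sketch of that post-loop check, assuming a todo item carries a status, an optional result, and a blocked flag. &lt;code&gt;TodoItem&lt;/code&gt; and &lt;code&gt;open_issues&lt;/code&gt; are hypothetical names used only for illustration:&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TodoItem:
    title: str
    status: str                    # "pending", "in_progress", or "completed"
    result: Optional[str] = None
    blocked: bool = False


def open_issues(items: list, tool_failures: int) -> list:
    """Collect reasons the run is not finished; an empty list means it may stop."""
    issues = []
    for item in items:
        if item.blocked:
            issues.append(f"blocked: {item.title}")
        elif item.status == "pending":
            issues.append(f"pending: {item.title}")
        elif item.status == "in_progress" and item.result is None:
            issues.append(f"no result yet: {item.title}")
    if tool_failures > 0:
        issues.append(f"{tool_failures} tool failure(s)")
    return issues


board = [
    TodoItem("check the copy", "completed", result="copy reviewed"),
    TodoItem("prepare the changelog", "in_progress"),
    TodoItem("write the post", "pending"),
]
print(open_issues(board, tool_failures=0))
```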

&lt;p&gt;My principle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;the agent should not stop only because
the model produced a nice final message
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This changes UX.&lt;/p&gt;

&lt;p&gt;The user should feel "the task is being carried", not "I received text".&lt;/p&gt;

&lt;h2&gt;
  
  
  Watchdog and Steward
&lt;/h2&gt;

&lt;p&gt;The task board stores state, but there also needs to be a checking layer.&lt;/p&gt;

&lt;p&gt;Spectrion has &lt;code&gt;ChatWatchdog&lt;/code&gt; and &lt;code&gt;AgentSteward&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;When the agent stops, the watchdog asks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;is it actually okay to finish this run?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User:
  Check 5 sources and compare options.

Agent:
  Checked 3 sources and wrote "done".

Watchdog:
  User requirement was 5 sources.
  Evidence exists only for 3.
  Completion rejected.

Runtime:
  Continue with the remaining sources.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
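
&lt;p&gt;The example above can be reduced to a tiny check: compare the stated requirement with the evidence that actually exists. This is only a sketch with hypothetical names (&lt;code&gt;judge_completion&lt;/code&gt;, &lt;code&gt;CompletionVerdict&lt;/code&gt;); the real watchdog is a model-backed review, not a counter:&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class CompletionVerdict:
    accepted: bool
    reason: str


def judge_completion(required_sources: int, evidenced_sources: set) -> CompletionVerdict:
    """Accept "done" only when the evidence covers the stated requirement."""
    found = len(evidenced_sources)
    if found >= required_sources:
        return CompletionVerdict(True, "requirement covered by evidence")
    missing = required_sources - found
    return CompletionVerdict(
        False, f"evidence for {found} of {required_sources} sources; {missing} remaining"
    )


verdict = judge_completion(5, {"source-a", "source-b", "source-c"})
print(verdict.accepted, verdict.reason)
```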



&lt;p&gt;Steward is a more general verification layer.&lt;/p&gt;

&lt;p&gt;It can run in several modes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;completionJudge
toolResultVerify
taskBoardGroom
subagentSupervise
workflowPreflight
workflowPostflight
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It checks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;whether the task is actually complete
whether a tool result looks incomplete
whether a workflow is safe before launch
whether approval is required
whether a subagent result is valid
whether the task board should be updated
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not a magical quality guarantee.&lt;/p&gt;

&lt;p&gt;But architecturally it is better than hoping the main agent always realizes what it forgot.&lt;/p&gt;

&lt;p&gt;The core idea:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;the executor should not be the only judge of its own work
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;How deeply can the steward challenge the main model?&lt;/p&gt;

&lt;p&gt;Not infinitely and not without rules.&lt;/p&gt;

&lt;p&gt;It can return verdicts such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;no_action
continue_now
create_tasks
blocked
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For &lt;code&gt;completionJudge&lt;/code&gt;, that means: if the main agent says "done", but evidence does not match the original task, the steward can reject completion and ask the runtime to continue.&lt;/p&gt;

&lt;p&gt;For workflow/subagent work, it can suggest follow-up tasks, flag a problem, block a dangerous result, or send work back for revision.&lt;/p&gt;

&lt;p&gt;But there are limits around this.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ChatWatchdog&lt;/code&gt; makes at most two nudges per user request. In steward mode it waits two seconds after the turn stops, and idle checks run after five minutes. The watchdog context is limited to recent messages, the steward request uses a low budget with &lt;code&gt;maxTokens: 10000&lt;/code&gt;, and the network timeout is 30 seconds.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;AgentSteward&lt;/code&gt; also has client-side budget gates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;do not run in Low Power Mode
do not run on bad network
do not run when runtime is busy
do not run when queued user input exists
do not repeat an already applied idempotency key
cool down after rejected actions
limit calls per hour / day
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
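
&lt;p&gt;As an illustration of how such gates could compose, here is a sketch that returns the first reason to skip a steward run. The class name, the hourly limit, and the cooldown value are assumptions made for the example, not Spectrion's real numbers:&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class StewardGate:
    max_calls_per_hour: int = 6          # hypothetical limit
    cooldown_seconds: float = 300.0      # hypothetical cooldown after a rejected action
    applied_keys: set = field(default_factory=set)
    call_times: list = field(default_factory=list)
    last_rejection: float = float("-inf")

    def skip_reason(self, now, key, low_power=False, bad_network=False,
                    runtime_busy=False, queued_user_input=False):
        """Return why the steward must not run, or None if it may run."""
        if low_power:
            return "low power mode"
        if bad_network:
            return "bad network"
        if runtime_busy:
            return "runtime busy"
        if queued_user_input:
            return "queued user input"
        if key in self.applied_keys:
            return "idempotency key already applied"
        if self.cooldown_seconds > now - self.last_rejection:
            return "cooling down"
        recent = [t for t in self.call_times if 3600 > now - t]
        if len(recent) >= self.max_calls_per_hour:
            return "hourly budget exhausted"
        return None

    def record_call(self, now, key):
        self.call_times.append(now)
        self.applied_keys.add(key)


gate = StewardGate(max_calls_per_hour=2)
print(gate.skip_reason(now=0.0, key="turn-1"))
gate.record_call(0.0, "turn-1")
print(gate.skip_reason(now=10.0, key="turn-1"))
```

&lt;p&gt;Returning a reason instead of a bare boolean keeps the skip decision explainable in logs, which fits the "policy-gated" framing.&lt;/p&gt;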



&lt;p&gt;If the steward is unavailable, steward mode does not automatically fall back to an older, less governed judge. Skipping the check is safer than running a different uncontrolled loop.&lt;/p&gt;

&lt;p&gt;So steward is not "a second model arguing forever with the first".&lt;/p&gt;

&lt;p&gt;It is a policy-gated review layer with idempotency, budgets, cooldowns, and strict continuation limits.&lt;/p&gt;

&lt;h2&gt;
  
  
  Proactive Tools: The Agent Can Raise Its Hand
&lt;/h2&gt;

&lt;p&gt;Normal tools are called when the model decides to call them.&lt;/p&gt;

&lt;p&gt;I wanted the agent to be able to observe.&lt;/p&gt;

&lt;p&gt;That is how proactive tools appeared.&lt;/p&gt;

&lt;p&gt;A proactive tool is a scripted tool running in a background polling loop. It can wake up on a schedule and check a page, API, file, metric, or some other source.&lt;/p&gt;

&lt;p&gt;If nothing happened, it returns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;null
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If something happened, it returns an alert:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"price_drop"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"oldPrice"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;349&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"newPrice"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;289&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The alert enters &lt;code&gt;ProactiveExecutionQueue&lt;/code&gt;, and the runtime handles it as a new input.&lt;/p&gt;

&lt;p&gt;The path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;JS monitor wakes up
  -&amp;gt; checks source
  -&amp;gt; returns alert
  -&amp;gt; ProactiveExecutionQueue
  -&amp;gt; AgentRuntime handles proactive alert
  -&amp;gt; agent reasons about it
  -&amp;gt; user gets result / task / notification
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
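
&lt;p&gt;The first steps of this path can be sketched as follows. The real monitors are scripted tools; this Python version only illustrates the contract "return nothing or return a structured alert". The function names, the plain &lt;code&gt;deque&lt;/code&gt; standing in for &lt;code&gt;ProactiveExecutionQueue&lt;/code&gt;, and the example URL are assumptions:&lt;/p&gt;

```python
import json
from collections import deque
from typing import Optional


def check_price(old_price: int, new_price: int, url: str) -> Optional[dict]:
    """One polling tick: None when nothing happened, otherwise a structured alert."""
    if new_price >= old_price:
        return None
    return {"type": "price_drop", "oldPrice": old_price, "newPrice": new_price, "url": url}


proactive_queue = deque()   # stand-in for ProactiveExecutionQueue


def poll_once(old_price: int, new_price: int, url: str) -> None:
    alert = check_price(old_price, new_price, url)
    if alert is not None:
        # The runtime would later handle this entry as a new input.
        proactive_queue.append(json.dumps(alert))


poll_once(349, 349, "https://example.com/item")   # no change, nothing queued
poll_once(349, 289, "https://example.com/item")   # price dropped, one alert queued
print(len(proactive_queue))
```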



&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Watch the competitor changelog.
If a new release or pricing change appears,
write a short competitive note and create a task.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A normal chat does not wake itself up.&lt;/p&gt;

&lt;p&gt;An agent runtime can.&lt;/p&gt;

&lt;p&gt;In code, proactive scripted tools have their own polling manager: max concurrent checks, persisted running flags, auto-reconnect, structured JSON alerts, alert routing, and auto-stop after repeated errors.&lt;/p&gt;
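
&lt;p&gt;A sketch of the "auto-stop after repeated errors" part, assuming the manager counts consecutive failures and resets the counter on success. &lt;code&gt;PollingState&lt;/code&gt; and the threshold of three are hypothetical:&lt;/p&gt;

```python
class PollingState:
    def __init__(self, max_consecutive_errors: int = 3):   # hypothetical threshold
        self.max_consecutive_errors = max_consecutive_errors
        self.consecutive_errors = 0
        self.running = True

    def tick(self, check):
        """Run one check; stop the monitor after too many consecutive failures."""
        if not self.running:
            return None
        try:
            result = check()
        except Exception:
            self.consecutive_errors += 1
            if self.consecutive_errors >= self.max_consecutive_errors:
                self.running = False
            return None
        self.consecutive_errors = 0   # any success resets the counter
        return result


def failing_check():
    raise RuntimeError("source unreachable")


state = PollingState(max_consecutive_errors=3)
for _ in range(5):
    state.tick(failing_check)
print(state.running, state.consecutive_errors)
```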

&lt;h2&gt;
  
  
  Heartbeat: The Agent Exists Outside the Chat
&lt;/h2&gt;

&lt;p&gt;Proactive tools are observers.&lt;/p&gt;

&lt;p&gt;There also needs to be a clock.&lt;/p&gt;

&lt;p&gt;That is &lt;code&gt;HeartbeatManager&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;It periodically performs service work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;scheduled tasks
scheduled workflows
maintenance
morning briefing
periodic check-ins
Evolution cycle
Mesh leadership
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Periodic check-in looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;runtime:
  check whether anything needs attention

agent:
  HEARTBEAT_OK
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If nothing is happening, the user sees nothing.&lt;/p&gt;

&lt;p&gt;If there are blockers, alerts, follow-ups, or claimable tasks, the runtime can continue work or send a notification.&lt;/p&gt;
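
&lt;p&gt;In sketch form, the check-in handling might look like this, assuming the agent replies with the literal &lt;code&gt;HEARTBEAT_OK&lt;/code&gt; when nothing needs attention. The function name is hypothetical:&lt;/p&gt;

```python
from typing import Optional

HEARTBEAT_OK = "HEARTBEAT_OK"


def handle_check_in(agent_reply: str) -> Optional[str]:
    """Return a user-visible notification, or None when nothing needs attention."""
    reply = agent_reply.strip()
    if reply == HEARTBEAT_OK:
        return None      # silent: the user sees nothing
    return reply         # anything else is surfaced or acted on


print(handle_check_in("HEARTBEAT_OK"))
print(handle_check_in("Task 'write the post' is blocked on missing copy."))
```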

&lt;p&gt;This matters because an agent should not exist only at the moment the user writes a message.&lt;/p&gt;

&lt;p&gt;Some tasks need to live for hours, days, or weeks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Channels and Native Surfaces
&lt;/h2&gt;

&lt;p&gt;The in-app chat is only one entry point.&lt;/p&gt;

&lt;p&gt;There is a &lt;code&gt;ChannelManager&lt;/code&gt; and channel implementations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Telegram bot
Telegram user channel
Slack
Discord
WhatsApp
Email
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;ChannelType&lt;/code&gt; also includes SMS and custom channels, but those are reserved types. In the current &lt;code&gt;ChannelManager.connect&lt;/code&gt;, ready implementations exist for Telegram, Telegram User, Slack, Discord, WhatsApp, and Email; SMS/custom should be treated as unsupported until an adapter is added.&lt;/p&gt;

&lt;p&gt;A channel has config, credentials, enabled/autoConnect flags, and status. Configs are stored locally and synced through Mesh. Incoming messages arrive as &lt;code&gt;ChannelMessage&lt;/code&gt;, and responses go back through &lt;code&gt;ChannelManager.sendResponse&lt;/code&gt; with retry.&lt;/p&gt;

&lt;p&gt;Heartbeat checks channel connections separately, and auto-connect is performed only by the Mesh execution leader. This matters so two devices do not both reply in the same Telegram/Slack thread.&lt;/p&gt;
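
&lt;p&gt;The leader rule can be illustrated with a small filter. This is a sketch under the assumption that every device knows its own id and the current leader id; &lt;code&gt;ChannelConfig&lt;/code&gt; and &lt;code&gt;channels_to_connect&lt;/code&gt; are illustrative names:&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class ChannelConfig:
    name: str
    enabled: bool
    auto_connect: bool


def channels_to_connect(configs: list, device_id: str, leader_id: str) -> list:
    """Only the execution leader auto-connects, so one device replies per thread."""
    if device_id != leader_id:
        return []
    return [c.name for c in configs if c.enabled and c.auto_connect]


configs = [
    ChannelConfig("telegram", enabled=True, auto_connect=True),
    ChannelConfig("slack", enabled=True, auto_connect=False),
    ChannelConfig("email", enabled=False, auto_connect=True),
]
print(channels_to_connect(configs, device_id="mac", leader_id="mac"))
print(channels_to_connect(configs, device_id="phone", leader_id="mac"))
```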

&lt;p&gt;From the outside:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Telegram / Slack / WhatsApp / Email
  -&amp;gt; ChannelManager
  -&amp;gt; AgentRuntime
  -&amp;gt; same tools / memory / policy / approvals
  -&amp;gt; response back to channel
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are also native Apple ecosystem surfaces:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;widgets
Live Activities
Share Extension
App Intents / App Shortcuts
Watch companion
macOS menu bar
macOS Services
voice overlay
wake word daemon
camera stream
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are important boundaries here too. WidgetKit, App Intents, WatchConnectivity, Share Extension, and Spotlight are wired as native surfaces. Live Activity UI exists, but starting a Live Activity is intentionally disabled in the manager right now, so the honest phrasing is "the surface is prepared", not "it always runs in production". Wake word is VAD + Apple Speech recognition for a phrase, not a separate embedded hotword model.&lt;/p&gt;

&lt;p&gt;These are not separate little assistants. They should lead the user into the same runtime.&lt;/p&gt;

&lt;p&gt;Examples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;selected text in any macOS app
  -&amp;gt; Services -&amp;gt; Ask Spectrion
  -&amp;gt; runtime receives the task
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;said wake phrase
  -&amp;gt; voice command captured
  -&amp;gt; runtime handles the task
  -&amp;gt; TTS responds
  -&amp;gt; wake-word listens again
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sent a file through Share Extension
  -&amp;gt; attachment enters the app
  -&amp;gt; agent can analyze / save / create task
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For me this is an important part of the idea: if the agent runtime is real, it can have many inputs and outputs, but state and rules must remain shared.&lt;/p&gt;

&lt;h2&gt;
  
  
  Server Layer
&lt;/h2&gt;

&lt;p&gt;Part of Spectrion lives outside the app.&lt;/p&gt;

&lt;p&gt;The server has routes for:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;auth
subscription
chat proxy
webhooks
admin
OAuth
web
OpenAI plan proxy
CLI deploy
plugins
channels
Telegram user
community store
mesh
business
compat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;/v1/chat&lt;/code&gt; is not just "forward request to LLM". It has subscription/business usage checks, multi-provider routing, retries, SSE streaming, active stream limits, provider fallback paths, account pool, capability resolver, and separate endpoints for capabilities:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;chat completions
media analysis
embeddings
rerank
media embeddings
image edit / generation
speech-to-text
text-to-speech
voice clone / delete
watchdog / steward
video generation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;/v1/channels&lt;/code&gt; stores channel registrations, encrypts credentials with AES-256-GCM, supports server-side polling for Telegram/email, and exposes pending messages to devices.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/v1/mesh&lt;/code&gt; handles pairing, device registration, device list, polling fallback, and ack for relay messages.&lt;/p&gt;

&lt;p&gt;There is also a community store for skills/tools/MCP, plugin routes, CLI deployment, Telegram userbot, desktop releases, app version/config, and a large Business API.&lt;/p&gt;

&lt;p&gt;This matters because Spectrion is not only a local app. The native runtime brings execution closer to the user and device, while the server layer handles provider routing, subscriptions, sync, channels, Mesh relay, store, business control plane, and capability endpoints.&lt;/p&gt;

&lt;h2&gt;
  
  
  Workflows Inside the Agent Runtime
&lt;/h2&gt;

&lt;p&gt;One tool call is enough for simple actions.&lt;/p&gt;

&lt;p&gt;But real tasks often become graphs.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Every Monday morning, check three sources.
Collect a summary.
If there are important changes, create a task.
If risk is high, notify me.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is a workflow.&lt;/p&gt;

&lt;p&gt;The user-facing surface for this in Spectrion is the &lt;code&gt;manage_workflows&lt;/code&gt; tool.&lt;/p&gt;

&lt;p&gt;Node types include:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;trigger
action
condition
delay
transform
llm
http
script
loop
parallel
notify
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important part: workflow should not be a separate automation island.&lt;/p&gt;

&lt;p&gt;If a workflow calls a tool, it should go through the same &lt;code&gt;ToolExecutor&lt;/code&gt; as the normal agent.&lt;/p&gt;

&lt;p&gt;That means the same:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;approvals
policies
timeouts
audit
result limits
Business gates
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Simplified:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WorkflowEngine
  -&amp;gt; node execution
  -&amp;gt; ToolExecutor
  -&amp;gt; policy / approval / timeout / audit
  -&amp;gt; result
  -&amp;gt; next node
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
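
&lt;p&gt;A simplified sketch of that shape: the workflow runner has no private way to call tools, so approval and audit apply automatically. The class layout, the tool names, and the example URL are assumptions, and only approval and audit are modeled here:&lt;/p&gt;

```python
class PolicyError(Exception):
    pass


class ToolExecutor:
    """Single entry point for tool calls, shared by chat and workflows in this sketch."""

    def __init__(self, tools: dict, needs_approval: set):
        self.tools = tools
        self.needs_approval = needs_approval
        self.audit_log = []

    def execute(self, name: str, args: dict, approved: bool = False):
        if name in self.needs_approval and not approved:
            self.audit_log.append((name, "rejected"))
            raise PolicyError(f"{name} requires approval")
        result = self.tools[name](**args)
        self.audit_log.append((name, "ok"))
        return result


def run_workflow(executor: ToolExecutor, nodes: list) -> list:
    """Run action nodes in order; every call goes through the shared executor."""
    return [executor.execute(node["tool"], node["args"]) for node in nodes]


executor = ToolExecutor(
    tools={
        "fetch_status": lambda url: f"status of {url}",
        "delete_file": lambda path: path,
    },
    needs_approval={"delete_file"},
)
print(run_workflow(executor, [{"tool": "fetch_status", "args": {"url": "https://example.com"}}]))
```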



&lt;p&gt;This lets workflows, proactive monitors, and the chat agent live inside one runtime.&lt;/p&gt;

&lt;p&gt;There is a &lt;code&gt;parallel&lt;/code&gt; node inside workflow graphs, but the &lt;code&gt;WorkflowEngine&lt;/code&gt; itself is guarded as a single-run engine. So I would not describe it as "unlimited parallel workflow executions". The honest statement: a graph can have parallel branches within a run, while system-level concurrency is handled through heartbeat, proactive queue, and lanes.&lt;/p&gt;

&lt;p&gt;The user can write an ordinary sentence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Watch this API.
If the status changes, check details,
write a short explanation,
and create a task if a reaction is needed.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent can:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;search existing tools
  -&amp;gt; no matching tool
  -&amp;gt; create scripted monitor
  -&amp;gt; test it
  -&amp;gt; register it
  -&amp;gt; build workflow
  -&amp;gt; schedule it
  -&amp;gt; notify only when something changes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not "AI wrote instructions for configuring automation".&lt;/p&gt;

&lt;p&gt;This is the agent building automation from the conversation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Self-Extension: The Agent Can Create Tools
&lt;/h2&gt;

&lt;p&gt;I wanted the agent not to be limited to the tools that shipped with the app.&lt;/p&gt;

&lt;p&gt;If the required tool does not exist, the agent can create it.&lt;/p&gt;

&lt;p&gt;There is a tool for that: &lt;code&gt;create_tool&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;But the boundary matters.&lt;/p&gt;

&lt;p&gt;"The agent writes tools for itself" sounds good, but without restrictions it is dangerous.&lt;/p&gt;

&lt;p&gt;So self-extension is not just "save JS into a file".&lt;/p&gt;

&lt;p&gt;The process looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;agent writes tool
  -&amp;gt; checks similar tools in catalog / Shop
  -&amp;gt; reads sandbox API reference
  -&amp;gt; validates syntax
  -&amp;gt; runs sandbox/security audit
  -&amp;gt; checks built-in name collisions
  -&amp;gt; persists ScriptedToolDefinition
  -&amp;gt; registers in ToolCatalog
  -&amp;gt; activates tool
  -&amp;gt; runs auto-test with provided args
  -&amp;gt; keeps version history / rollback point
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restrictions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sandbox
versioning
secret fields
rollback
test runs
policy gates
approval requirements
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On iOS, dynamic tools run through a JS sandbox. There is a limited API for HTTP, persistent KV, crypto, sandboxed FS, DB, HTML/image helpers, and other things needed for integrations.&lt;/p&gt;

&lt;p&gt;On macOS the surface is broader: scripted tools can be JavaScript, Python, Shell, Ruby, Node.js, Go, Rust, Swift, and Perl. If a runtime is not available locally, Docker fallback is possible.&lt;/p&gt;

&lt;p&gt;Proactive mode is also important: a tool can get an interval and instructions, wake up in the background, and return an alert only when something important happens.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;create_tool&lt;/code&gt; is not only create. In code it has actions for edit, list, delete, test, templates, api_reference, history, and rollback, plus start/stop/status for proactive tools. So self-extension is a lifecycle, not one-shot file generation.&lt;/p&gt;

&lt;p&gt;The broader the surface, the more important policy becomes.&lt;/p&gt;

&lt;p&gt;Main principle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;the agent can expand capabilities,
but it must not grant itself new permissions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a tool needs a new secret, the user must provide it explicitly.&lt;/p&gt;

&lt;p&gt;If a tool performs a mutating action, approval may be required.&lt;/p&gt;

&lt;p&gt;If a tool is forbidden by policy, the executor must reject it even if the model created it.&lt;/p&gt;
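
&lt;p&gt;These three rules can be sketched as one gate that a model-created tool cannot bypass. The names &lt;code&gt;ScriptedTool&lt;/code&gt;, &lt;code&gt;gate_tool_call&lt;/code&gt;, and &lt;code&gt;STATUS_API_KEY&lt;/code&gt; are hypothetical and used only for illustration:&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class ScriptedTool:
    name: str
    required_secrets: list = field(default_factory=list)
    mutating: bool = False


def gate_tool_call(tool: ScriptedTool, forbidden: set,
                   user_secrets: dict, approved: bool) -> str:
    """Decide whether a tool may run; the tool cannot grant itself anything."""
    if tool.name in forbidden:
        return "rejected: forbidden by policy"
    missing = [s for s in tool.required_secrets if s not in user_secrets]
    if missing:
        return "waiting: user must provide " + ", ".join(missing)
    if tool.mutating and not approved:
        return "waiting: approval required"
    return "allowed"


monitor = ScriptedTool("status_monitor", required_secrets=["STATUS_API_KEY"])
print(gate_tool_call(monitor, forbidden=set(), user_secrets={}, approved=False))
```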

&lt;h3&gt;
  
  
  Personal Tools and Community Store
&lt;/h3&gt;

&lt;p&gt;Self-extension is not limited to local experiments, but it does not publish anything automatically.&lt;/p&gt;

&lt;p&gt;When the agent creates a tool through &lt;code&gt;create_tool&lt;/code&gt;, it is a local/project/user capability. It can be used immediately in the current runtime, workflow, or proactive loop, but publishing it externally should be a separate deliberate action.&lt;/p&gt;

&lt;p&gt;There is a Community Store / Shop layer.&lt;/p&gt;

&lt;p&gt;The server has routes for:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;community skills
community tools
community MCP servers
search
install
reviews
downloads
my published items
pending / approved / rejected moderation states
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;agent creates personal/project tool
  -&amp;gt; user tests it locally
  -&amp;gt; tool can be reused in workflows
  -&amp;gt; tool can be published to community store separately
  -&amp;gt; other users install approved tools from Shop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This boundary matters. The agent can quickly build a missing tool for itself, but community distribution should not happen without a review/publish flow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Memory V2: Scoped Memory, Retrieval, and Rollback
&lt;/h2&gt;

&lt;p&gt;It is easy to imagine agent memory as one large &lt;code&gt;memory.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For a strong agent, that is a poor model.&lt;/p&gt;

&lt;p&gt;In Spectrion, memory now consists of several layers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;legacy markdown memory
structured Memory V2 records
MemoryProposal queue
snapshots / rollback
semantic memory
SQLite vector store
conversation recall FTS
session working memory
project context bridge
memory policy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not one folder of notes. It is a runtime subsystem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Structured Memory
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;MemoryRecord&lt;/code&gt; has scope:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;user
agent
conversation
project
global
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;preference
fact
instruction
decision
projectRule
workflow
correction
summary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Plus metadata:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;source
confidence
sensitivity
expiresAt
status
provenance
linkedVectorChunkIds
createdAt
updatedAt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Source is also typed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;manual
automaticExtraction
conversationSummary
document
skill
projectRule
migration
tool
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sensitivity:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;publicFact
personal
privateNote
secret
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Status:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;active
archived
rejected
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why does this matter?&lt;/p&gt;

&lt;p&gt;Because "remember this" can mean different things.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Remember that I dislike long answers.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is a user preference.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Remember that this project cannot change the API without approval.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is a project rule.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Remember until the end of the release that we use feature flag X.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is a temporary rule with TTL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Remember this only for the current conversation.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is conversation scope.&lt;/p&gt;

&lt;p&gt;A flat memory cannot reliably distinguish these cases.&lt;/p&gt;
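
&lt;p&gt;Here is a sketch of how scope, type, and expiry separate those cases. The enum values follow the lists above (the type enum is a subset); the field names, the dates, and the choice of project scope for the temporary rule are assumptions for the example:&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Scope(Enum):
    USER = "user"
    AGENT = "agent"
    CONVERSATION = "conversation"
    PROJECT = "project"
    GLOBAL = "global"


class RecordType(Enum):               # subset of the types listed above
    PREFERENCE = "preference"
    INSTRUCTION = "instruction"
    PROJECT_RULE = "projectRule"


@dataclass
class MemoryRecord:
    text: str
    scope: Scope
    type: RecordType
    expires_at: Optional[datetime] = None

    def is_live(self, now: datetime) -> bool:
        return self.expires_at is None or self.expires_at > now


now = datetime(2026, 5, 6)
records = [
    MemoryRecord("dislikes long answers", Scope.USER, RecordType.PREFERENCE),
    MemoryRecord("API changes need approval", Scope.PROJECT, RecordType.PROJECT_RULE),
    # Assumption: the release ends in 14 days and the rule is project-scoped.
    MemoryRecord("use feature flag X", Scope.PROJECT, RecordType.INSTRUCTION,
                 expires_at=now + timedelta(days=14)),
]
still_live = [r.text for r in records if r.is_live(now + timedelta(days=30))]
print(still_live)
```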

&lt;h3&gt;
  
  
  Memory Proposals
&lt;/h3&gt;

&lt;p&gt;Not every memory change should immediately become fact.&lt;/p&gt;

&lt;p&gt;There is &lt;code&gt;MemoryProposal&lt;/code&gt; with operations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;create
update
archive
delete
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And statuses:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pending
approved
rejected
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This matters for automatic extraction and business scenarios: the agent can propose a record, but the runtime or user decides whether to apply it.&lt;/p&gt;

&lt;p&gt;It is important not to overclaim here. The MemoryProposal pipeline exists, but this does not mean every model response automatically becomes a neat V2 proposal. A visible part of the current proposal flow is tied to legacy &lt;code&gt;MEMORY.md&lt;/code&gt; migration and sensitive candidates. Manual &lt;code&gt;memory.save&lt;/code&gt; does a dual-write: the legacy markdown, a Memory V2 record, and a semantic chunk for search.&lt;/p&gt;
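
&lt;p&gt;The propose-then-decide idea can be sketched like this. It is a deliberate simplification: records are a plain dict, archive and delete are treated the same way, and all names besides the operation and status values are assumptions:&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class MemoryProposal:
    operation: str            # "create", "update", "archive", or "delete"
    key: str
    text: str = ""
    status: str = "pending"   # "pending", "approved", or "rejected"


@dataclass
class ProposalQueue:
    records: dict = field(default_factory=dict)
    proposals: list = field(default_factory=list)

    def submit(self, proposal: MemoryProposal) -> MemoryProposal:
        self.proposals.append(proposal)   # nothing is written to records yet
        return proposal

    def approve(self, proposal: MemoryProposal) -> None:
        if proposal.status != "pending":
            raise ValueError("proposal already decided")
        if proposal.operation in ("create", "update"):
            self.records[proposal.key] = proposal.text
        elif proposal.operation in ("archive", "delete"):
            # Simplification: both remove the record from the active set.
            self.records.pop(proposal.key, None)
        else:
            raise ValueError(f"unknown operation: {proposal.operation}")
        proposal.status = "approved"

    def reject(self, proposal: MemoryProposal) -> None:
        if proposal.status != "pending":
            raise ValueError("proposal already decided")
        proposal.status = "rejected"


queue = ProposalQueue()
proposal = queue.submit(MemoryProposal("create", "tone", "prefers short answers"))
print(queue.records)      # the proposal is pending, so records are unchanged
queue.approve(proposal)
print(queue.records)
```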

&lt;h3&gt;
  
  
  Store, Snapshots, and Rollback
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;MemoryV2Store&lt;/code&gt; stores records, proposals, and snapshots. It can:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;upsert record
deduplicate equivalent records
archive / delete
scoped reset
reset all
search records
export markdown
submit proposal
approve / reject proposal
sync project context
create snapshot
rollback snapshot
rollback with linked vector restore
produce vector repair report
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Snapshots store records/proposals and linked vector chunks when those chunks are included in the snapshot. So rollback covers structured memory state and can restore linked semantic chunks through the vector restore path, but it is not a promise to restore every embedding everywhere.&lt;/p&gt;
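
&lt;p&gt;A sketch of that limit: rollback restores records fully, but only the vector chunks that were included in the snapshot. &lt;code&gt;SnapshotStore&lt;/code&gt; is a hypothetical stand-in, and plain strings stand in for embeddings:&lt;/p&gt;

```python
import copy


class SnapshotStore:
    def __init__(self):
        self.records = {}          # record id -> text
        self.vector_chunks = {}    # chunk id -> text (stand-in for embeddings)
        self.snapshots = {}

    def create_snapshot(self, name, include_chunk_ids=()):
        self.snapshots[name] = {
            "records": copy.deepcopy(self.records),
            # Only chunks explicitly included here can be restored later.
            "chunks": {cid: self.vector_chunks[cid] for cid in include_chunk_ids},
        }

    def rollback(self, name):
        snapshot = self.snapshots[name]
        self.records = copy.deepcopy(snapshot["records"])
        self.vector_chunks.update(snapshot["chunks"])


store = SnapshotStore()
store.records["r1"] = "prefers short answers"
store.vector_chunks = {"c1": "short answers", "c2": "unlinked chunk"}
store.create_snapshot("before-edit", include_chunk_ids=["c1"])

store.records["r1"] = "prefers long answers"
del store.vector_chunks["c1"]
del store.vector_chunks["c2"]

store.rollback("before-edit")
print(store.records, store.vector_chunks)
```

&lt;p&gt;After the rollback, the record and the linked chunk are back, while the chunk that was never part of the snapshot stays lost.&lt;/p&gt;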

&lt;p&gt;Before saving, a redactor runs. It is best-effort regex redaction for:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;api keys
tokens
passwords
secrets
bearer tokens
/Users/... paths
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is important: the agent should not turn long-term memory into a random secret dump. But this is not a DLP system or a mathematical privacy guarantee. It is a runtime protection layer.&lt;/p&gt;
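
&lt;p&gt;For illustration, a best-effort redactor of this kind might look as follows. The patterns and placeholder strings are assumptions written for this sketch, not Spectrion's actual rules, and they will miss many real secrets:&lt;/p&gt;

```python
import re

REDACTION_PATTERNS = [
    # key=value style secrets such as "api_key: abc123" or "password=hunter2"
    (re.compile(r"(?i)\b(api[_-]?key|token|password|secret)(\s*[:=]\s*)\S+"),
     r"\1\2[REDACTED]"),
    # HTTP bearer tokens
    (re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/=-]+"), "Bearer [REDACTED]"),
    # macOS home directory paths
    (re.compile(r"/Users/\S+"), "[REDACTED_PATH]"),
]


def redact(text: str) -> str:
    """Apply every pattern in order; unmatched text passes through unchanged."""
    for pattern, replacement in REDACTION_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


print(redact("api_key=abc123 rest"))
print(redact("Authorization: Bearer abc.def"))
print(redact("see /Users/denis/notes.md now"))
```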

&lt;h3&gt;
  
  
  Memory Tool
&lt;/h3&gt;

&lt;p&gt;The user-facing surface is the &lt;code&gt;memory&lt;/code&gt; tool.&lt;/p&gt;

&lt;p&gt;It supports actions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;save
read
recall
search
list
delete_entry
clear
index
stats
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On &lt;code&gt;save&lt;/code&gt;, it performs dual-write: legacy markdown memory is saved for compatibility, while a &lt;code&gt;MemoryRecord&lt;/code&gt; is created in Memory V2 and a semantic chunk is added for search.&lt;/p&gt;

&lt;p&gt;On &lt;code&gt;recall&lt;/code&gt;, semantic search is used. If semantic memory is unavailable, there is keyword fallback.&lt;/p&gt;

&lt;p&gt;On &lt;code&gt;index&lt;/code&gt;, conversation history can be reindexed.&lt;/p&gt;

&lt;p&gt;On &lt;code&gt;stats&lt;/code&gt;, you can see how many chunks exist in memory, conversations, documents, and skills.&lt;/p&gt;
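
&lt;p&gt;The &lt;code&gt;recall&lt;/code&gt; fallback can be sketched as a simple try-then-degrade function. The semantic search is passed in as a callable, and signalling an unavailable index with &lt;code&gt;RuntimeError&lt;/code&gt; is an assumption of this example:&lt;/p&gt;

```python
def keyword_search(query: str, entries: list) -> list:
    """Return entries that share at least one lowercase word with the query."""
    terms = set(query.lower().split())
    return [e for e in entries if terms.intersection(e.lower().split())]


def recall(query: str, entries: list, semantic_search=None) -> list:
    """Prefer semantic search; fall back to keyword matching when it is unavailable."""
    if semantic_search is not None:
        try:
            return semantic_search(query, entries)
        except RuntimeError:
            pass    # semantic index unavailable, use the fallback below
    return keyword_search(query, entries)


def broken_semantic_search(query, entries):
    raise RuntimeError("vector store unavailable")


notes = ["User prefers short answers", "Project uses feature flag X"]
print(recall("short answers", notes, semantic_search=broken_semantic_search))
```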

&lt;h3&gt;
  
  
  Memory Policy and Conversation Modes
&lt;/h3&gt;

&lt;p&gt;Memory should not always behave the same way.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;MemoryPolicy&lt;/code&gt; makes a decision for each operation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;readPersistentMemory
recallMemory
recallConversationMemory
writeManualMemory
writeAutomaticMemory
indexPersistentMemory
indexConversation
summarizeConversation
flushSemanticMemory
sameConversationRecall
toolEvidenceRecall
deleteMemory
clearMemory
readStats
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Conversation memory modes change behavior:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;full
auto
hybrid
standard
toolsOnly
off
isolated
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example, in &lt;code&gt;standard&lt;/code&gt;/&lt;code&gt;auto&lt;/code&gt;, cross-conversation recall can be blocked; in &lt;code&gt;toolsOnly&lt;/code&gt;, only same-conversation and tool-evidence recall can be allowed; in &lt;code&gt;off&lt;/code&gt;, almost every memory operation is blocked except stats.&lt;/p&gt;

&lt;p&gt;So memory is not global magic. It is a policy-controlled part of the runtime.&lt;/p&gt;
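
&lt;p&gt;One way to picture a mode-dependent policy is a lookup table from mode to allowed operations. The sketch below is Python and only illustrative: the operation names come from the list above, but the table contents, the choice of &lt;code&gt;recallConversationMemory&lt;/code&gt; as the cross-conversation operation, and the &lt;code&gt;is_allowed&lt;/code&gt; helper are my assumptions, not the actual rules.&lt;/p&gt;

```python
# Operation names are taken from the article's list; the mode table is a guess
# at one plausible configuration, not the real MemoryPolicy rules.
ALL_OPERATIONS = {
    "readPersistentMemory", "recallMemory", "recallConversationMemory",
    "writeManualMemory", "writeAutomaticMemory", "sameConversationRecall",
    "toolEvidenceRecall", "deleteMemory", "clearMemory", "readStats",
}

ALLOWED_BY_MODE = {
    "full": ALL_OPERATIONS,
    # Assumption: this is the operation that crosses conversation boundaries.
    "standard": ALL_OPERATIONS - {"recallConversationMemory"},
    "toolsOnly": {"sameConversationRecall", "toolEvidenceRecall", "readStats"},
    "off": {"readStats"},
}


def is_allowed(mode, operation):
    # Unknown modes map to an empty set, so the check fails closed.
    return operation in ALLOWED_BY_MODE.get(mode, set())
```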

&lt;h3&gt;
  
  
  Retrieval Planner
&lt;/h3&gt;

&lt;p&gt;When the runtime prepares a prompt, it does not just insert all memories.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;MemoryRetrievalPlanner&lt;/code&gt; builds a mixed plan from:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;structured records
semantic vector results
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The plan filters by:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;status
expiry
scope
cross-conversation rules
policy decisions
linked semantic chunks already covered by records
token budget
max items
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Scoring for structured records considers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;query overlap
scope priority
confidence
recency
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For semantic candidates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vector score
keyword score
combined score
source priority
recency
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;MemoryRuntimeContextBuilder&lt;/code&gt; then renders selected records into a dedicated Memory V2 block and can include a debug trace: what was included, what was excluded, and why.&lt;/p&gt;

&lt;p&gt;This is the difference from simple memory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;good memory is not "insert more",
but select the right things for this task and token budget
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
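
&lt;p&gt;The filter, score, and budget steps of the planner can be sketched roughly like this in Python. The &lt;code&gt;Candidate&lt;/code&gt; fields, the score weights, and the trace strings are hypothetical; the article only names the signals, not how they are combined.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    status: str           # e.g. "active" or "archived"
    expired: bool
    query_overlap: float  # 0..1
    confidence: float     # 0..1
    recency: float        # 0..1, newer is higher
    tokens: int


def score(c):
    # Hypothetical weights; only the signal names come from the article.
    return 0.5 * c.query_overlap + 0.3 * c.confidence + 0.2 * c.recency


def plan(candidates, token_budget, max_items):
    included, trace, eligible = [], [], []
    # Step 1: hard filters (status and expiry).
    for c in candidates:
        if c.status == "active" and not c.expired:
            eligible.append(c)
        else:
            trace.append((c.text, "excluded: status/expiry"))
    # Step 2: take the best-scoring items until a limit is hit.
    used = 0
    for c in sorted(eligible, key=score, reverse=True):
        if len(included) == max_items:
            trace.append((c.text, "excluded: max items"))
        elif used + c.tokens > token_budget:
            trace.append((c.text, "excluded: token budget"))
        else:
            included.append(c)
            used += c.tokens
            trace.append((c.text, "included"))
    return included, trace
```

&lt;p&gt;The returned trace plays the role of the debug trace mentioned above: every candidate ends up with a reason for being included or excluded.&lt;/p&gt;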



&lt;h3&gt;
  
  
  Semantic Memory and Vector Store
&lt;/h3&gt;

&lt;p&gt;This is a separate runtime layer.&lt;/p&gt;

&lt;p&gt;Semantic memory works with chunk sources:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;memory
conversation
document
skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The current runtime indexes persistent memory, conversation transcripts, and documents from the knowledge base. The &lt;code&gt;skill&lt;/code&gt; source is supported at the model/statistics level so the layer can be used for skill retrieval.&lt;/p&gt;

&lt;p&gt;The code has extraction patterns for multiple languages, including Russian, English, German, Spanish, French, Arabic, Hindi, Japanese, Korean, Portuguese, and Chinese.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;VectorStore&lt;/code&gt; stores chunks in SQLite:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;chunks table
chunk_embeddings table
FTS5 index
LSH buckets
ANN graph edges
provider/model/namespace metadata
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So semantic memory is not just an array of embeddings in process memory. It is a local index with metadata, search, stats, migration, and repair paths.&lt;/p&gt;
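
&lt;p&gt;To make the "local index, not an in-memory array" point concrete, here is a small Python sketch using the standard &lt;code&gt;sqlite3&lt;/code&gt; module. The column layout and the functions &lt;code&gt;add_chunk&lt;/code&gt;, &lt;code&gt;search&lt;/code&gt;, and &lt;code&gt;stats&lt;/code&gt; are assumptions; the sketch also leaves out FTS5, LSH buckets, and the ANN graph, and uses a linear cosine scan instead.&lt;/p&gt;

```python
import json
import math
import sqlite3

# Hypothetical schema: only the two table names come from the article.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE chunks (id INTEGER PRIMARY KEY, source TEXT, namespace TEXT, text TEXT);
CREATE TABLE chunk_embeddings (chunk_id INTEGER REFERENCES chunks(id),
                               provider TEXT, model TEXT, vector TEXT);
""")


def add_chunk(source, text, vector, provider="toy", model="toy-3d"):
    cur = db.execute(
        "INSERT INTO chunks (source, namespace, text) VALUES (?, ?, ?)",
        (source, "default", text))
    db.execute("INSERT INTO chunk_embeddings VALUES (?, ?, ?, ?)",
               (cur.lastrowid, provider, model, json.dumps(vector)))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vector, source=None):
    # Linear scan over stored vectors; a real index would use ANN structures.
    rows = db.execute(
        "SELECT c.text, c.source, e.vector FROM chunks c "
        "JOIN chunk_embeddings e ON e.chunk_id = c.id").fetchall()
    hits = [(cosine(query_vector, json.loads(vec)), text)
            for text, src, vec in rows if source is None or src == source]
    return [text for _, text in sorted(hits, reverse=True)]


def stats():
    # Chunk counts per source, similar in spirit to the stats operation.
    return dict(db.execute("SELECT source, COUNT(*) FROM chunks GROUP BY source"))
```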

&lt;h3&gt;
  
  
  Knowledge Base
&lt;/h3&gt;

&lt;p&gt;Next to memory there is &lt;code&gt;KnowledgeBase&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;It is not exactly user memory; it is a RAG layer for documents.&lt;/p&gt;

&lt;p&gt;It can import:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pdf
rtf / rtfd
txt
md / markdown
swift
js
json
csv
xml
html
py
rb
log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Documents are chunked, indexed in &lt;code&gt;VectorStore&lt;/code&gt;, searched with vector search and keyword fallback, and synced through Mesh as metadata/text deltas.&lt;/p&gt;

&lt;p&gt;For the agent, memory and knowledge base are different things. Memory is rules, preferences, decisions, and state. Knowledge base is documents that can be searched and cited during work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conversation Recall and Session Working Memory
&lt;/h3&gt;

&lt;p&gt;Long conversations get two more layers.&lt;/p&gt;

&lt;p&gt;The first is &lt;code&gt;conversation_recall&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;It is a tool for searching older messages in the current conversation. Under the hood it has a SQLite/FTS index, BM25 ranking, an include-tool-details option, and background indexing. It is useful when session working memory says: "the older transcript contains exact source id; fetch the original text if needed."&lt;/p&gt;

&lt;p&gt;The second is session working memory.&lt;/p&gt;

&lt;p&gt;It is built from the saved transcript and contains:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;current state
task specification
structured memory facts
files and functions
workflow
recent files and tool artifacts
evidence snippets
errors and corrections
documentation references
learnings
key results
worklog
span summaries
source anchors
compacted context summary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It has budgets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;max stored tokens
max section tokens
max snippet characters
max structured memory facts
max evidence snippets
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If LLM extraction is unavailable, there is a deterministic fallback.&lt;/p&gt;

&lt;p&gt;So even if a long transcript no longer fits in context, the runtime keeps working memory and source anchors instead of asking the user to remind it what happened.&lt;/p&gt;

&lt;h3&gt;
  
  
  Project Context Bridge
&lt;/h3&gt;

&lt;p&gt;Project context can also become Memory V2 records, but only if the manifest is trusted.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;MemoryProjectContextBridge&lt;/code&gt; converts project rules/instructions into scoped records, and &lt;code&gt;MemoryV2Store.syncProjectContext&lt;/code&gt; can archive stale records when a rule disappears from the project.&lt;/p&gt;

&lt;p&gt;This matters for code/workspace mode: project rules should not mix with the user's personal preferences.&lt;/p&gt;

&lt;h3&gt;
  
  
  Short Formula
&lt;/h3&gt;

&lt;p&gt;Memory V2 is not just "long-term memory".&lt;/p&gt;

&lt;p&gt;More precisely:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;structured memory
  + semantic retrieval
  + session working memory
  + conversation recall
  + project scoped rules
  + policy
  + rollback
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Good memory is not remembering everything.&lt;/p&gt;

&lt;p&gt;Good memory is remembering the right thing, in the right context, with the right boundaries.&lt;/p&gt;

&lt;h2&gt;
  
  
  Subagents and Sessions
&lt;/h2&gt;

&lt;p&gt;Some tasks are naturally parallel.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;research competitors
check technical documentation
collect integrations
prepare a draft response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can do this sequentially, but it is better to delegate.&lt;/p&gt;

&lt;p&gt;There are two modes.&lt;/p&gt;

&lt;p&gt;Blocking delegation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;delegate_task
  -&amp;gt; parent agent sends task
  -&amp;gt; child agent works
  -&amp;gt; parent waits for result
  -&amp;gt; parent continues
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Background session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sessions_spawn
  -&amp;gt; child session starts
  -&amp;gt; parent does not wait
  -&amp;gt; result can be checked later
  -&amp;gt; mailbox / status / history / kill
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
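
&lt;p&gt;The difference between the two modes maps naturally onto awaiting a coroutine versus starting a task and checking it later. This Python &lt;code&gt;asyncio&lt;/code&gt; sketch is only an analogy: &lt;code&gt;child_agent&lt;/code&gt;, the &lt;code&gt;Sessions&lt;/code&gt; class, and its methods are invented names, not the real &lt;code&gt;delegate_task&lt;/code&gt; or &lt;code&gt;sessions_spawn&lt;/code&gt; tools.&lt;/p&gt;

```python
import asyncio


async def child_agent(task):
    await asyncio.sleep(0)  # stands in for the child's own agent loop
    return "result of " + task


async def delegate_task(task):
    # Blocking delegation: the parent awaits the child's result.
    return await child_agent(task)


class Sessions:
    """Hypothetical background-session registry."""

    def __init__(self):
        self._tasks = {}

    def spawn(self, session_id, task):
        # Background session: start the child and return immediately.
        self._tasks[session_id] = asyncio.create_task(child_agent(task))

    def status(self, session_id):
        return "done" if self._tasks[session_id].done() else "running"

    async def result(self, session_id):
        return await self._tasks[session_id]

    def kill(self, session_id):
        self._tasks[session_id].cancel()


async def demo():
    blocking = await delegate_task("check docs")
    sessions = Sessions()
    sessions.spawn("s1", "research competitors")
    early_status = sessions.status("s1")  # the parent did not wait
    background = await sessions.result("s1")
    return blocking, early_status, background, sessions.status("s1")
```

&lt;p&gt;Running &lt;code&gt;asyncio.run(demo())&lt;/code&gt; shows the background session still reported as running right after the spawn, and as done once the parent decides to collect the result.&lt;/p&gt;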



&lt;p&gt;A subagent should not receive the parent's full permission set.&lt;/p&gt;

&lt;p&gt;For example, it does not need dangerous tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;memory writes
scheduling
self config
session management
tool/plugin management
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It works in a more limited context.&lt;/p&gt;
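
&lt;p&gt;In its simplest form, the reduced permission set is a filter over the parent's tools. The Python sketch below assumes each tool carries a category label; the category strings and tool names are made up to mirror the list above.&lt;/p&gt;

```python
# Category names mirror the article's list; tool names are invented.
DENIED_FOR_SUBAGENTS = {
    "memory_write", "scheduling", "self_config",
    "session_management", "tool_management",
}


def tools_for_subagent(parent_tools):
    """Return the parent's tools minus categories a child should not inherit."""
    return {name: category for name, category in parent_tools.items()
            if category not in DENIED_FOR_SUBAGENTS}
```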

&lt;p&gt;Important detail: current native subagents are managed child sessions inside the app, backed by hidden conversations and async loops. They are not separate OS processes or containers. Isolation is mainly runtime/tool/context/workspace policy, not process isolation.&lt;/p&gt;

&lt;p&gt;On macOS this is especially useful for coding/workspace tasks: a subagent can work in a separate workspace or git worktree, return a diff/artifacts, and the parent decides what to accept.&lt;/p&gt;

&lt;p&gt;But for me the key is not "another coding mode". The important part is that subagents are part of the same runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tasks
memory
tools
watchdog
policies
mesh
audit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Device Mesh: One Agent Across Devices
&lt;/h2&gt;

&lt;p&gt;I started with iPhone as the first interface, but I did not want to lock the agent into one device.&lt;/p&gt;

&lt;p&gt;Different devices have different strengths.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;iPhone
  -&amp;gt; close to the user
  -&amp;gt; notifications
  -&amp;gt; quick decisions
  -&amp;gt; personal context

Mac
  -&amp;gt; desktop tools
  -&amp;gt; filesystem
  -&amp;gt; browser automation
  -&amp;gt; coding tasks
  -&amp;gt; shell / git / docker

CLI / Linux runner
  -&amp;gt; can live 24/7
  -&amp;gt; good for monitoring
  -&amp;gt; background execution
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This led to Device Mesh.&lt;/p&gt;

&lt;p&gt;The idea:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;several devices become one agentic loop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In code this is not just a device list. There is pairing, cryptography, sync deltas, remote tools, handoff, and leader election.&lt;/p&gt;

&lt;p&gt;Pairing and transport are built around encrypted Mesh: X25519/Curve25519 key agreement, HKDF-SHA256, AES-256-GCM, Keychain keys, WebSocket relay, short-lived WS token, reconnect/ping, and HTTP polling fallback. The server relay should not understand the payload: it forwards opaque nonce/payload/tag and can buffer offline messages.&lt;/p&gt;

&lt;p&gt;Scenario:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CLI on a server watches an API at night.
If an error appears, it creates an incident task.
iPhone receives a push.
If desktop action is needed, the task goes to Mac.
The user answers from the phone.
Runtime continues where the next step is best executed.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;iPhone is not necessarily the main executor.&lt;/p&gt;

&lt;p&gt;It can be:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;notification target
approval device
personal context device
decision interface
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Mac can be a desktop executor.&lt;/p&gt;

&lt;p&gt;CLI can be a 24/7 runner.&lt;/p&gt;

&lt;p&gt;The runtime decides where a concrete step should run.&lt;/p&gt;

&lt;h3&gt;
  
  
  Leader Election, Handoff, and Conflicts
&lt;/h3&gt;

&lt;p&gt;Mesh has two tasks that are easy to confuse.&lt;/p&gt;

&lt;p&gt;The first: choose who does background work.&lt;/p&gt;

&lt;p&gt;The second: synchronize changes between devices.&lt;/p&gt;

&lt;p&gt;Leader election is based on device priority. Each device type has an execution priority and a notification priority. The execution leader is chosen among online peers by highest execution priority, with a stable tie-break by device id.&lt;/p&gt;

&lt;p&gt;Roughly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CLI / Linux runner -&amp;gt; high execution priority
Mac                -&amp;gt; desktop execution
iPhone             -&amp;gt; high notification priority
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So proactive tools, scheduled workflows, and auto-connected channels should not start on every device at once. The execution leader starts them.&lt;/p&gt;

&lt;p&gt;Notification routing is calculated separately: if several iPhones have the highest notification priority, a notification can be sent to all devices on that level.&lt;/p&gt;
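
&lt;p&gt;Both selections are small enough to sketch in Python. The numeric priorities and the &lt;code&gt;Peer&lt;/code&gt; type are hypothetical, and I assume the tie-break picks the lowest device id; the article only says the tie-break is stable and based on device id.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    device_id: str
    kind: str
    online: bool


# Hypothetical priority tables; the article only gives the rough ordering.
EXECUTION_PRIORITY = {"cli": 3, "mac": 2, "iphone": 1}
NOTIFICATION_PRIORITY = {"iphone": 3, "mac": 2, "cli": 1}


def execution_leader(peers):
    online = [p for p in peers if p.online]
    if not online:
        return None
    top = max(EXECUTION_PRIORITY[p.kind] for p in online)
    # Assumed tie-break: lowest device id among the top-priority peers.
    return min((p for p in online if EXECUTION_PRIORITY[p.kind] == top),
               key=lambda p: p.device_id)


def notification_targets(peers):
    online = [p for p in peers if p.online]
    if not online:
        return []
    top = max(NOTIFICATION_PRIORITY[p.kind] for p in online)
    # Every device on the highest notification level gets the notification.
    return [p for p in online if NOTIFICATION_PRIORITY[p.kind] == top]
```

&lt;p&gt;Because every peer evaluates the same deterministic function over the same online set, they agree on the leader without extra coordination messages, at least as long as they see the same peers.&lt;/p&gt;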

&lt;p&gt;Handoff is a separate path.&lt;/p&gt;

&lt;p&gt;If a device goes into the background or offline in the middle of work, the runtime can send &lt;code&gt;MeshTaskHandoff&lt;/code&gt; to the best online peer. The handoff contains:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;handoffId
conversationId
originalMessage
completedIterations
sourceDeviceId
sourceDeviceName
timestamp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The receiver adds a system note to the conversation and continues the agent loop on its own device.&lt;/p&gt;

&lt;p&gt;Sync conflicts are not solved by one global CRDT.&lt;/p&gt;

&lt;p&gt;In &lt;code&gt;MeshSyncEngine&lt;/code&gt;, each delta has an HLC timestamp. When remote deltas arrive, runtime merges HLC, applies changes by entity type, and updates &lt;code&gt;lastSyncHLC&lt;/code&gt;. While remote deltas are being applied, &lt;code&gt;isApplyingRemote&lt;/code&gt; is enabled so local hooks do not create an echo loop.&lt;/p&gt;
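
&lt;p&gt;For readers who have not met hybrid logical clocks, this is the textbook update rule in Python. It is a generic HLC sketch, not the &lt;code&gt;MeshSyncEngine&lt;/code&gt; implementation; the class and method names are mine, and physical time is passed in explicitly to keep the example deterministic.&lt;/p&gt;

```python
class HLC:
    """Hybrid logical clock: a (wall-clock, logical counter) pair."""

    def __init__(self):
        self.wall, self.counter = 0, 0

    def now(self, physical):
        # Local event or send: move to physical time, or bump the counter.
        if physical > self.wall:
            self.wall, self.counter = physical, 0
        else:
            self.counter += 1
        return (self.wall, self.counter)

    def merge(self, remote, physical):
        # Receive: end up strictly after both the local and remote stamps.
        r_wall, r_counter = remote
        new_wall = max(self.wall, r_wall, physical)
        if new_wall == self.wall and new_wall == r_wall:
            self.counter = max(self.counter, r_counter) + 1
        elif new_wall == self.wall:
            self.counter += 1
        elif new_wall == r_wall:
            self.counter = r_counter + 1
        else:
            self.counter = 0
        self.wall = new_wall
        return (self.wall, self.counter)
```

&lt;p&gt;Timestamps compare as plain tuples, so deltas from devices with drifting clocks still sort into an order that respects causality.&lt;/p&gt;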

&lt;p&gt;The merge logic then depends on the data type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;conversations/messages -&amp;gt; idempotent create by id, update fields when delta arrives
scheduled tasks        -&amp;gt; upsert by task id
knowledge base docs    -&amp;gt; create/delete by document id
memory                 -&amp;gt; append remote sections not already present locally
evolution              -&amp;gt; take newer/higher version history or newer reset
channel/workflow/tool configs -&amp;gt; apply config delta by id
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So this is eventual sync with HLC ordering and entity-specific merge, not "magical merge of any two edits".&lt;/p&gt;
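
&lt;p&gt;A compact Python sketch of two of these merge rules plus the echo-loop guard. &lt;code&gt;SyncEngine&lt;/code&gt;, the delta dictionary shape, and the &lt;code&gt;outbox&lt;/code&gt; list are assumptions for illustration; only the upsert-by-id and append-if-missing behaviors and the idea behind &lt;code&gt;isApplyingRemote&lt;/code&gt; come from the text.&lt;/p&gt;

```python
class SyncEngine:
    """Hypothetical miniature of an entity-specific merge."""

    def __init__(self):
        self.tasks = {}             # scheduled tasks keyed by task id
        self.memory_sections = []   # ordered memory sections
        self.is_applying_remote = False
        self.outbox = []            # deltas produced by local changes

    def on_local_change(self, delta):
        # Local hook: suppressed while remote deltas are being applied,
        # otherwise every received delta would be sent back out again.
        if not self.is_applying_remote:
            self.outbox.append(delta)

    def apply_remote(self, deltas):
        self.is_applying_remote = True
        try:
            # Apply in HLC order so later writes win for upserts.
            for delta in sorted(deltas, key=lambda d: d["hlc"]):
                if delta["entity"] == "scheduled_task":
                    self.tasks[delta["id"]] = delta["payload"]  # upsert by id
                elif delta["entity"] == "memory":
                    if delta["payload"] not in self.memory_sections:
                        self.memory_sections.append(delta["payload"])
                self.on_local_change(delta)  # hook fires but is suppressed
        finally:
            self.is_applying_remote = False
```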

&lt;p&gt;If the user edits a task on iPhone while Mac has already started working on it, the change arrives as a new message/update in conversation sync. The active run on Mac does not have to instantly rewrite an already started tool call, but at the next runtime boundary it can see the new visible task through conversation state, queued follow-up, and active task envelope.&lt;/p&gt;

&lt;p&gt;Task board items also have claim/session/expiration so multiple workers do not keep taking the same item forever. But I do not consider this a replacement for a full distributed transaction layer.&lt;/p&gt;
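
&lt;p&gt;The claim/session/expiration idea is essentially a lease. A minimal Python sketch, with a hypothetical &lt;code&gt;TaskBoard&lt;/code&gt; class and an arbitrary 60-second TTL:&lt;/p&gt;

```python
class TaskBoard:
    """Hypothetical lease table: item id mapped to (session id, expiry)."""

    def __init__(self):
        self.claims = {}

    def claim(self, item_id, session_id, now, ttl=60):
        holder = self.claims.get(item_id)
        if holder and holder[0] != session_id and holder[1] > now:
            return False  # another session holds a live claim
        # Free, expired, or our own claim: take it or renew it.
        self.claims[item_id] = (session_id, now + ttl)
        return True
```

&lt;p&gt;An expired claim can be taken over by another worker, which is what prevents a crashed session from holding an item forever. As noted above, this is not a distributed transaction: two devices with diverging state could still both believe they hold the claim.&lt;/p&gt;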

&lt;p&gt;The principle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;background execution -&amp;gt; leader election
mid-run continuity   -&amp;gt; handoff
state convergence    -&amp;gt; HLC + entity-specific merge
dangerous actions    -&amp;gt; policy / approval / fail-closed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is more honest than promising perfect conflict resolution. A real agent system is safer with clear boundaries than hidden magic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Business Policies: Governance Must Live in the Runtime
&lt;/h2&gt;

&lt;p&gt;The more an agent can do, the more governance matters.&lt;/p&gt;

&lt;p&gt;For a personal agent, simple approvals may be enough.&lt;/p&gt;

&lt;p&gt;For a company, that is not enough.&lt;/p&gt;

&lt;p&gt;Companies need:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;roles
departments
managed prompts
tool policies
approval policies
audit
locked UI
provider restrictions
revocation
signed manifests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The main architecture idea is the same as with tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;policy must be applied not only in the prompt,
but in the executor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If an employee is not allowed to send emails without approval, it is not enough to tell the model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;do not send emails without approval
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;ToolExecutor&lt;/code&gt; must physically block the send action until approval exists.&lt;/p&gt;

&lt;p&gt;In personal mode, approval is simple: &lt;code&gt;ApprovalManager&lt;/code&gt; creates a pending request, waits up to five minutes, and then marks it expired. It keeps history of the last 100 requests and supports auto-approve rules:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;always
same arguments
until date
count N times
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can approve/deny one request, approve/deny all pending requests for an agent, clear pending requests, and inspect approval stats.&lt;/p&gt;
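
&lt;p&gt;Here is how the personal-mode flow could look in Python. The five-minute timeout, the 100-entry history, and the four rule kinds come from the description above; the class shape, method names, and the decision to treat a late approval as expired are my assumptions.&lt;/p&gt;

```python
from collections import deque

TIMEOUT_SECONDS = 5 * 60


class ApprovalManager:
    """Hypothetical sketch; not the real ApprovalManager API."""

    def __init__(self):
        self.rules = []                   # auto-approve rules
        self.pending = {}                 # request id mapped to request
        self.history = deque(maxlen=100)  # keeps only the last 100 outcomes
        self._next_id = 1

    def add_rule(self, tool, kind, value=None):
        # kind is one of: "always", "same_arguments", "until", "count"
        self.rules.append({"tool": tool, "kind": kind, "value": value})

    def _auto_approved(self, tool, args, now):
        for rule in self.rules:
            if rule["tool"] != tool:
                continue
            if rule["kind"] == "always":
                return True
            if rule["kind"] == "same_arguments" and rule["value"] == args:
                return True
            if rule["kind"] == "until" and rule["value"] >= now:
                return True
            if rule["kind"] == "count" and rule["value"] > 0:
                rule["value"] -= 1  # one use of the N allowed is consumed
                return True
        return False

    def request(self, tool, args, now):
        if self._auto_approved(tool, args, now):
            self.history.append((tool, "auto_approved"))
            return ("approved", None)
        request_id = self._next_id
        self._next_id += 1
        self.pending[request_id] = {"tool": tool, "args": args, "created": now}
        return ("pending", request_id)

    def resolve(self, request_id, approve, now):
        req = self.pending.pop(request_id)
        if now - req["created"] > TIMEOUT_SECONDS:
            outcome = "expired"  # a late answer never turns into an approval
        else:
            outcome = "approved" if approve else "denied"
        self.history.append((req["tool"], outcome))
        return outcome
```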

&lt;p&gt;In Business mode, manifest policy sits on top.&lt;/p&gt;

&lt;p&gt;Managed mode uses a manifest approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Organization
  -&amp;gt; Workspace
  -&amp;gt; Department
  -&amp;gt; Member
  -&amp;gt; ManagedEnvironmentManifest
  -&amp;gt; Runtime policy gate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The manifest contains:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;allowed tools
approval-required tools
model policy
memory policy
UI locks
audit disclosure
managed prompt rules
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;toolPolicy&lt;/code&gt; contains:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;allowedTools
deniedTools
defaultDecision
approvalRequiredTools
reasonCodes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a tool is in &lt;code&gt;approvalRequiredTools&lt;/code&gt;, the runtime returns &lt;code&gt;requiresApproval&lt;/code&gt;. If a tool is not allowed and the default decision is deny, the executor must not run it at all.&lt;/p&gt;
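
&lt;p&gt;As a rough Python illustration, evaluating such a policy can be a short chain of lookups. The field names match the &lt;code&gt;toolPolicy&lt;/code&gt; list above; the precedence (explicit deny first, then approval, then allow) and the &lt;code&gt;evaluate&lt;/code&gt; function itself are assumptions.&lt;/p&gt;

```python
def evaluate(tool, policy):
    """Return "deny", "requiresApproval", or "allow" for a tool name."""
    # Assumed precedence: an explicit deny beats everything else.
    if tool in policy.get("deniedTools", []):
        return "deny"
    if tool in policy.get("approvalRequiredTools", []):
        return "requiresApproval"
    if tool in policy.get("allowedTools", []):
        return "allow"
    # Unlisted tools follow defaultDecision; a missing value means deny.
    return policy.get("defaultDecision", "deny")
```

&lt;p&gt;The important property is the last line: a tool nobody thought about is denied rather than silently allowed.&lt;/p&gt;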

&lt;p&gt;Business layer has department packages for Sales, Support, HR, Finance, Operations, Marketing, Executive, Client Workspace, and vertical packages such as beauty/nails studio, fitness/gym, hotel/small property, cafe/restaurant, retail/marketplace, accounting office, and blank managed department.&lt;/p&gt;

&lt;p&gt;But runtime entitlement does not come directly from a nice template. It is materialized into a published department profile and signed manifest.&lt;/p&gt;

&lt;p&gt;The server model behind this is not decorative. &lt;code&gt;/v1/business&lt;/code&gt; has organizations, members, invites, subscriptions, workspaces, departments, department profile versions, manifests, providers, integration setups, proactive review queue, audit events, guardian rules/reports, app clients, and automation items.&lt;/p&gt;

&lt;p&gt;A department profile is versioned: draft, publish, rollback. Publishing or rolling back invalidates old manifests. The manifest endpoint checks org/member/seat/subscription status, workspace/department membership, and minimum app version, then signs the manifest with Ed25519. The signed manifest includes &lt;code&gt;policyHash&lt;/code&gt;, a TTL, &lt;code&gt;toolPolicy.defaultDecision = deny&lt;/code&gt;, approval-required tools, and integration setup refs without secrets.&lt;/p&gt;

&lt;p&gt;Fail-closed matters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if manifest is stale,
signature is invalid,
user is revoked,
or policy failed to load,
mutating actions are blocked
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
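
&lt;p&gt;Expressed as a Python function, the fail-closed rule above might look like this. The &lt;code&gt;may_run&lt;/code&gt; name, the manifest keys, and the precomputed &lt;code&gt;signature_ok&lt;/code&gt; flag (standing in for real Ed25519 verification) are hypothetical; letting read-only actions through is also my reading of "mutating actions are blocked".&lt;/p&gt;

```python
def may_run(action_is_mutating, manifest, now, signature_ok, user_revoked):
    """Fail-closed gate: any doubt about the manifest blocks mutating actions."""
    if not action_is_mutating:
        return True  # assumption: read-only actions are not gated here
    if manifest is None:  # policy failed to load
        return False
    if not signature_ok or user_revoked:
        return False
    if now > manifest["issued_at"] + manifest["ttl"]:  # stale manifest
        return False
    return True
```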



&lt;p&gt;This is especially important for agents.&lt;/p&gt;

&lt;p&gt;The stronger the agent, the less you can rely on prompt-level goodwill.&lt;/p&gt;

&lt;p&gt;Business provider surface is also guarded: Spectrion Managed provider is enabled, while company-managed provider is blocked without a pilot flag. So I cannot honestly claim that any company can already plug in an arbitrary LLM provider and immediately use it in production. The code protects this with egress guard, redacted audit, and manifest invalidation when provider policy changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Business Store Factory
&lt;/h3&gt;

&lt;p&gt;Another part of the business layer is the Store with ready-made automation capabilities.&lt;/p&gt;

&lt;p&gt;The "1495+ tools" number in Spectrion is not made up. In the current seed it is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;7 curated base tools
62 verticals * 24 tool blueprints = 1488 generated tools

total: 1495 approved Store tools
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are also official skills and MCP profiles for business automation.&lt;/p&gt;

&lt;p&gt;Base tools cover typical business flows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;omnichannel intake router
booking grid connector
1C accounting bridge
inventory reorder planner
marketplace order triage
file storage ingest
proactive follow-up watch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The mass catalog is generated by verticals and blueprints. Verticals include beauty/nails, fitness/gym, hotel, cafe/restaurant, retail/marketplace, accounting, legal, healthcare, real estate, education, logistics, auto service, and others. Connector families include Telegram Bot API, WhatsApp Cloud API, Instagram Messaging API, VK, SMTP/OAuth, TravelLine, Bnovo, OPERA Cloud, YCLIENTS, Google Sheet/CSV, 1C Fresh/OData/file exchange, QuickBooks, iiko, r_keeper, Bitrix24, amoCRM, Ozon, Wildberries, Shopify, Google Drive, SharePoint, and SFTP.&lt;/p&gt;

&lt;p&gt;Important: this does not mean every tool immediately performs dangerous write actions in an external system without setup.&lt;/p&gt;

&lt;p&gt;Business Store is review-first and dry-run-first. Many templates return:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;setup_required
dry_run
fetch_sample
provider_review_required
external_write_pending_approval
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So Store gives a company a fast start for integrations and workflows, but runtime still has to consider secrets, setup, approvals, audit, and fail-closed behavior.&lt;/p&gt;

&lt;p&gt;This is essential for business scenarios: a large capability catalog is useful only if it does not bypass governance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Approval Pipeline and Review Queue
&lt;/h3&gt;

&lt;p&gt;Two things should be separated honestly.&lt;/p&gt;

&lt;p&gt;The code already has:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;approval-required tools in manifest
client pending approvals with timeout
auto-approve rules
proactive review queue
guardian rules
business notification channels
audit events
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Guardian rules can have response mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;monitor_only
approval_required
deny
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And notification mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;none
digest
immediate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The proactive review queue is used when an automation or proactive run has prepared an external action but should not execute it immediately. An admin can approve, decline, or archive the item, and the external action is not executed automatically as a side effect of review. In the dry-run/review path, it is explicitly recorded that the external action and mutation were not executed.&lt;/p&gt;

&lt;p&gt;So business runtime can:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;deny an action
require approval
place a proactive item into review queue
write an audit event
send digest/immediate notification
block mutating action on stale/invalid manifest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But it is not a universal BPMN engine for approval chains.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if approver A does not answer in N minutes,
escalate to approver B,
then C,
then auto-decline
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;should not be described as a generic ready-made mechanism. The current runtime has a safer base behavior: the request expires, the mutating action is not executed, and the event remains in the history/audit/review surface.&lt;/p&gt;

&lt;p&gt;For me this is an intentional boundary. An agent runtime needs fail-closed and audit first; complex approval graphs can come after.&lt;/p&gt;

&lt;h2&gt;
  
  
  Evolution: The Agent Can Improve, but Not Grant Itself Permissions
&lt;/h2&gt;

&lt;p&gt;Self-improvement sounds attractive, but it is a dangerous area.&lt;/p&gt;

&lt;p&gt;You cannot simply let the model rewrite its prompt, change providers, expand permissions, or enable destructive workflows.&lt;/p&gt;

&lt;p&gt;So Evolution is built through signals, proposals, policy gate, and rollback.&lt;/p&gt;

&lt;p&gt;Important detail: this is a native runtime mechanism, not a business server that rewrites the app by itself. In code it is typed signals, signal store, proposer/critic, deterministic policy gate, rule store snapshots, and rollback.&lt;/p&gt;

&lt;p&gt;The flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;runtime events
tool failures
watchdog misses
user feedback
subagent results
workflow issues
  -&amp;gt; EvolutionSignalProducer
  -&amp;gt; EvolutionSignalStore
  -&amp;gt; proposals
  -&amp;gt; critic / policy gate
  -&amp;gt; snapshot
  -&amp;gt; limited rule mutation
  -&amp;gt; rollback if needed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What can be improved:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tool descriptions
planning rules
validation rules
workflow hints
project rules
runtime guidance
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What cannot be done automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;grant new permissions
change API keys
bypass approvals
enable forbidden tools
expand destructive actions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The principle is the same:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;the agent can learn,
but it must not grant itself new rights
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
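
&lt;p&gt;The gate-then-snapshot-then-mutate order can be shown in a few lines of Python. &lt;code&gt;RuleStore&lt;/code&gt; and the proposal dictionary are hypothetical; the allow-listed areas are the ones named in the "what can be improved" list.&lt;/p&gt;

```python
import copy

# Areas the article lists as improvable; anything else is rejected.
MUTABLE_AREAS = {
    "tool_descriptions", "planning_rules", "validation_rules",
    "workflow_hints", "project_rules", "runtime_guidance",
}


class RuleStore:
    """Hypothetical rule store with snapshots for rollback."""

    def __init__(self):
        self.rules = {area: [] for area in MUTABLE_AREAS}
        self.snapshots = []

    def apply(self, proposal):
        # Deterministic gate: only allow-listed areas can be mutated.
        if proposal["area"] not in MUTABLE_AREAS:
            return False
        self.snapshots.append(copy.deepcopy(self.rules))  # snapshot first
        self.rules[proposal["area"]].append(proposal["rule"])
        return True

    def rollback(self):
        # Restore the state from before the most recent accepted proposal.
        if self.snapshots:
            self.rules = self.snapshots.pop()
```

&lt;p&gt;A proposal targeting permissions never reaches the mutation step, no matter how persuasive the model's reasoning was, because the gate is plain code rather than a prompt.&lt;/p&gt;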



&lt;h2&gt;
  
  
  One Full Trace
&lt;/h2&gt;

&lt;p&gt;Suppose the user writes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Watch the competitor changelog.
If a new release or pricing change appears,
write a short note, create a task, and notify me.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a normal chat, the answer would be:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Sure, here is how you can configure it...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In an agent runtime this becomes a trace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. User message enters runtime
2. Runtime classifies it as long-running monitoring task
3. Context builder injects relevant memory/project rules
4. ToolCatalog activates web/proactive/workflow/task tools
5. Agent checks whether a suitable monitor already exists
6. If not, agent creates scripted monitor
7. Tool is validated, sandboxed, tested, and registered
8. Workflow is built:
     trigger -&amp;gt; fetch -&amp;gt; diff -&amp;gt; condition -&amp;gt; llm summary -&amp;gt; task -&amp;gt; notify
9. Approval is requested if needed
10. Workflow is scheduled
11. Baseline is stored
12. Heartbeat wakes workflow later
13. Monitor detects change
14. Alert enters ProactiveExecutionQueue
15. Runtime handles alert as event input
16. Agent summarizes change
17. Task board gets a new task
18. iPhone receives notification
19. Watchdog checks whether required steps are done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
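
&lt;p&gt;The core of steps 8 to 18 is a small loop: fetch, compare with the baseline, and act only on change. The Python sketch below is a simplified illustration, with a hypothetical &lt;code&gt;ChangelogMonitor&lt;/code&gt; class and injected callables in place of the real fetch, LLM summary, task board, and notification tools.&lt;/p&gt;

```python
import hashlib


class ChangelogMonitor:
    """Hypothetical monitor: trigger, fetch, diff, condition, act."""

    def __init__(self, fetch, summarize, create_task, notify):
        # All four callables are injected, so the sketch runs without a
        # network connection or an LLM.
        self.fetch, self.summarize = fetch, summarize
        self.create_task, self.notify = create_task, notify
        self.baseline = None

    def tick(self):
        """One heartbeat: fetch, diff against the baseline, act on change."""
        content = self.fetch()
        digest = hashlib.sha256(content.encode()).hexdigest()
        if self.baseline is None:
            self.baseline = digest  # the first run only stores the baseline
            return "baseline_stored"
        if digest == self.baseline:
            return "no_change"
        self.baseline = digest
        note = self.summarize(content)
        self.create_task(note)
        self.notify(note)
        return "alert"
```

&lt;p&gt;A real monitor would diff content rather than hash the whole page, so that cosmetic changes do not trigger alerts; the hash keeps the example short.&lt;/p&gt;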



&lt;p&gt;That is the difference.&lt;/p&gt;

&lt;p&gt;The agent did not just answer. It created a mechanism that continues working after the message ends.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Turned Out to Be Hardest
&lt;/h2&gt;

&lt;p&gt;The hardest part was not calling an LLM and not adding tools.&lt;/p&gt;

&lt;p&gt;The hard part was keeping state and recovering from an imperfect world.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The Model Can Be Confidently Wrong
&lt;/h3&gt;

&lt;p&gt;It can say:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Done.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even though a tool failed.&lt;/p&gt;

&lt;p&gt;Or:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I checked all sources.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even though evidence exists only for some of them.&lt;/p&gt;

&lt;p&gt;That is why watchdog, steward, and task board exist.&lt;/p&gt;
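
&lt;p&gt;The underlying check is simple: compare what the task required with what the tools actually reported, and ignore what the model claims. A minimal Python sketch, where &lt;code&gt;verify_completion&lt;/code&gt; and the result format are assumptions:&lt;/p&gt;

```python
def verify_completion(required_steps, tool_results):
    """Compare required steps with tool evidence, ignoring model claims."""
    succeeded = {r["step"] for r in tool_results if r["ok"]}
    missing = [step for step in required_steps if step not in succeeded]
    return {"done": not missing, "missing": missing}
```

&lt;p&gt;If the model answers "Done." while &lt;code&gt;missing&lt;/code&gt; is not empty, the runtime has a concrete reason to continue or to report the gap instead of trusting the message.&lt;/p&gt;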

&lt;h3&gt;
  
  
  2. Background Execution Is Not Magic
&lt;/h3&gt;

&lt;p&gt;Especially on mobile.&lt;/p&gt;

&lt;p&gt;You cannot just say:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;let the agent always run in the background
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You have to deal with platform limits, schedules, notifications, device mesh, and moving execution to a more suitable device.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. A Thousand Tools Are Worse Than Twenty Right Ones
&lt;/h3&gt;

&lt;p&gt;A large capability catalog is useful only when the model sees a relevant subset.&lt;/p&gt;

&lt;p&gt;Otherwise quality drops.&lt;/p&gt;

&lt;p&gt;That is why activation, suppression, categories, and policy matter more than the raw tool count.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Memory Without Boundaries Turns Into Noise
&lt;/h3&gt;

&lt;p&gt;If the agent "remembers everything", it starts dragging stale decisions into new contexts.&lt;/p&gt;

&lt;p&gt;You need scope, TTL, confidence, provenance, and rollback.&lt;/p&gt;
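
&lt;p&gt;A rough sketch of what those boundaries can look like on a single record. The field names and the &lt;code&gt;recall&lt;/code&gt; function are assumptions for illustration and do not describe Memory V2 itself. Rollback is left out to keep the example short.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class MemoryRecord:
    text: str
    scope: str          # e.g. "project:alpha" or "global"
    created_at: float   # seconds since epoch
    ttl_seconds: float
    confidence: float   # 0.0 to 1.0
    provenance: str     # where the fact came from


def recall(records, scope, now, min_confidence=0.6):
    """Keep records that match the scope, are not expired, and are trusted enough."""
    kept = []
    for r in records:
        in_scope = r.scope in (scope, "global")
        fresh = r.created_at + r.ttl_seconds > now
        if in_scope and fresh and r.confidence >= min_confidence:
            kept.append(r)
    return kept


records = [
    MemoryRecord("Use tabs", "project:alpha", 0, 100, 0.9, "user message"),
    MemoryRecord("Deploy on Fridays", "project:beta", 0, 100, 0.9, "user message"),
    MemoryRecord("Old API key location", "global", 0, 10, 0.9, "tool output"),
]
print([r.text for r in recall(records, "project:alpha", now=50)])  # ['Use tabs']
```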

&lt;h3&gt;
  
  
  5. Governance Cannot Be Bolted on Later
&lt;/h3&gt;

&lt;p&gt;If an agent can act, policies must live in the executor from the beginning.&lt;/p&gt;

&lt;p&gt;Prompt-level restrictions are not a reliable boundary.&lt;/p&gt;
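
&lt;p&gt;To make the difference concrete, here is a small sketch of a policy check that sits inside the executor. The policy table, the exception names, and &lt;code&gt;execute&lt;/code&gt; are hypothetical. What matters is that the check runs on every call, whatever the prompt said.&lt;/p&gt;

```python
class ApprovalRequired(Exception):
    """Raised when a call must wait for a human decision."""


class PolicyDenied(Exception):
    """Raised when policy forbids the call outright."""


# Hypothetical policy table: tool name -> "allow" | "ask" | "deny".
POLICY = {"web_search": "allow", "send_email": "ask", "delete_files": "deny"}


def execute(tool_name, handler, args, approved=False):
    """Run a tool only after the policy check; unknown tools are denied."""
    decision = POLICY.get(tool_name, "deny")
    if decision == "deny":
        raise PolicyDenied(tool_name)
    if decision == "ask" and not approved:
        raise ApprovalRequired(tool_name)
    return handler(**args)


print(execute("web_search", lambda query: f"results for {query}", {"query": "agents"}))
# results for agents
```

&lt;p&gt;A model that ignores its instructions still cannot reach &lt;code&gt;handler&lt;/code&gt; without passing this gate.&lt;/p&gt;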

&lt;h3&gt;
  
  
  6. Self-Extension Requires Discipline
&lt;/h3&gt;

&lt;p&gt;When an agent can create tools, every new tool must pass through a sandbox, tests, versioning, secrets handling, and rollback.&lt;/p&gt;

&lt;p&gt;Otherwise it is not extension. It is uncontrolled code generation.&lt;/p&gt;
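
&lt;p&gt;Two of those gates, tests and versioning with rollback, fit in a few lines. This is a sketch under assumptions: &lt;code&gt;ToolRegistry&lt;/code&gt; is a made-up class, and sandboxing and secrets are deliberately left out.&lt;/p&gt;

```python
class ToolRegistry:
    """Keeps every accepted version so a bad release can be rolled back."""

    def __init__(self):
        self.versions = {}  # tool name -> list of callables, oldest first

    def register(self, name, candidate, tests):
        # Each test is (args, expected); a failing candidate is never registered.
        for args, expected in tests:
            if candidate(*args) != expected:
                return False
        self.versions.setdefault(name, []).append(candidate)
        return True

    def rollback(self, name):
        # Drop the newest version but always keep at least one.
        if len(self.versions.get(name, [])) > 1:
            self.versions[name].pop()

    def call(self, name, *args):
        return self.versions[name][-1](*args)


registry = ToolRegistry()
ok = registry.register("c_to_f", lambda c: c * 9 / 5 + 32, [((100,), 212), ((0,), 32)])
print(ok, registry.call("c_to_f", 100))  # True 212.0
```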

&lt;h2&gt;
  
  
  Why This Is Not Just n8n, Claude Code, or a Channel Gateway
&lt;/h2&gt;

&lt;p&gt;I find it useful to separate agent systems by center of gravity.&lt;/p&gt;

&lt;p&gt;There are coding agents. Their strongest loop is repository, files, shell, git, tests, patch, code intelligence.&lt;/p&gt;

&lt;p&gt;There are automation platforms. Their strongest loop is workflow graph, triggers, integrations, deterministic automation.&lt;/p&gt;

&lt;p&gt;There are channel gateways. Their strongest loop is Telegram, Slack, WhatsApp, email, webchat, and routing messages between channels.&lt;/p&gt;

&lt;p&gt;Spectrion is built from a different center:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;native agent runtime
  + mobile / desktop / CLI
  + tools
  + workflows
  + proactive monitors
  + memory
  + task state
  + watchdog / steward
  + device mesh
  + policies / approvals / audit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Coding can be part of the system.&lt;/p&gt;

&lt;p&gt;Workflow can be part of the system.&lt;/p&gt;

&lt;p&gt;Channels can be part of the system.&lt;/p&gt;

&lt;p&gt;But the center of gravity is the agent operating loop.&lt;/p&gt;

&lt;p&gt;Not "where does the user send a message", but "how does the task live, execute, get checked, and continue".&lt;/p&gt;

&lt;p&gt;In that sense, Spectrion is not trying to be only an n8n replacement, only a coding harness, or only a message gateway.&lt;/p&gt;

&lt;p&gt;n8n/Make/Zapier are good as external automation graphs. In Spectrion, workflow lives inside the agent runtime: it can call the same tools, pass through the same policies, request approvals, return proactive alerts into the reasoning loop, and use a tool the agent just created.&lt;/p&gt;

&lt;p&gt;Claude Code and similar coding agents are strong in the software engineering loop. Spectrion can also do code/workspace tasks on Mac, but it is not limited to an IDE or terminal: the same runtime should run on iPhone, Mac, and CLI, and cover schedules, workflows, memory, device mesh, and Business policies.&lt;/p&gt;

&lt;p&gt;A channel gateway is useful as a message entry point. But if the message does not enter a shared execution layer with task state, watchdog, tools, memory, policies, and background loop, it remains message routing rather than a full environment for agentic work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Current Architecture in Short
&lt;/h2&gt;

&lt;p&gt;If you put everything together, it looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Inputs:
  chat message
  scheduled task
  proactive alert
  workflow event
  subagent result
  channel message
  voice command
  App Intent / widget / share extension
  heartbeat

        |
        v

AgentRuntime:
  context preparation
  active task envelope
  session working memory
  memory retrieval
  project context injection
  skill activation
  tool activation
  model selection
  provider request building
  execution lane acquire
  LLM loop
  tool parsing
  tool execution
  approval wait
  result handling
  artifact handling
  continuation logic
  compaction / rehydration

        |
        v

Execution layer:
  ContextManager
  MessageCompactor
  MicroCompactor
  ActiveTaskEnvelopeStore
  ProviderManager
  ProviderRequestBuilder
  ProviderVisibleContextDiagnostics
  ToolCatalog
  ToolExecutor
  AgentArtifactContract
  A2UIRenderer
  WorkflowEngine
  ProactiveExecutionQueue
  HeartbeatManager
  ChannelManager
  ApprovalManager
  ProjectContextLoader
  ProjectFileTreeService
  ProjectSearchService
  ProjectWorkspaceAuditLog
  WorkspaceMutationJournal
  TodoManager
  TaskStore
  AgentSteward
  ChatWatchdog
  Memory V2
  SemanticMemory / VectorStore
  KnowledgeBase
  ConversationRecall FTS
  SkillCatalog
  HookManager
  ExecutionLaneManager
  Device Mesh
  Business Policy Gate

        |
        v

Outputs:
  assistant message
  tool artifact
  rendered native UI
  notification
  task update
  memory proposal
  workflow schedule
  approval request
  workspace mutation snapshot
  subagent run
  channel response
  voice/TTS response
  Live Activity surface update
  audit event
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For me, this is the essence of Spectrion.&lt;/p&gt;

&lt;p&gt;Not a set of separate automation features.&lt;/p&gt;

&lt;p&gt;One runtime through which different forms of agentic work pass.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where This Architecture Is Useful
&lt;/h2&gt;

&lt;p&gt;Its strength is not one-shot questions like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;answer X
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But for tasks with a loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;observe -&amp;gt; think -&amp;gt; act -&amp;gt; verify -&amp;gt; continue
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
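
&lt;p&gt;The loop can be sketched as a small driver function. This is an illustration only: &lt;code&gt;run_task&lt;/code&gt; and its four callbacks are hypothetical names, and the step budget is an assumption added so the example always terminates.&lt;/p&gt;

```python
def run_task(observe, think, act, verify, max_steps=10):
    """Repeat the loop until verification passes or the step budget is spent."""
    history = []
    for step in range(max_steps):
        state = observe()
        action = think(state, history)
        result = act(action)
        history.append((action, result))
        if verify(state, result):
            return {"done": True, "steps": step + 1, "history": history}
    return {"done": False, "steps": max_steps, "history": history}


# Toy task: keep incrementing a counter until it reaches 3.
counter = {"value": 0}


def observe():
    return counter["value"]


def think(state, history):
    return "increment"


def act(action):
    counter["value"] += 1
    return counter["value"]


def verify(state, result):
    return result >= 3


outcome = run_task(observe, think, act, verify)
print(outcome["done"], outcome["steps"])  # True 3
```

&lt;p&gt;A plain chat turn corresponds to one pass through the body with no &lt;code&gt;verify&lt;/code&gt; step, which is exactly what the tasks below need more than.&lt;/p&gt;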



&lt;p&gt;Examples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Every morning at 8:30, look at my calendar,
reminders, and open tasks.
Make a short briefing:
what matters today,
where time conflicts exist,
what I promised but did not schedule.
If something requires a decision, create a task and notify me.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Help me prepare a feature launch.
Check copy, gather bugs,
prepare the changelog, write the post,
make the release checklist,
and do not stop until every item is closed.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Watch this page.
If a new version appears,
briefly explain the changes
and create a task to update the project.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Let CLI monitor the server,
Mac do heavy operations,
and iPhone only notify me and receive decisions.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are tasks where a normal chat loop quickly becomes too weak an abstraction.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I did not start with "build a chat and then attach tools".&lt;/p&gt;

&lt;p&gt;From day one I wanted an agent that can live beyond one message.&lt;/p&gt;

&lt;p&gt;Externally it looked like an AI agent for iPhone.&lt;/p&gt;

&lt;p&gt;Internally the architectural bet was different:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;agent runtime first,
chat interface second
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model can change. Tools can expand through Shop or be created by the agent. Workflows can be built from conversation. Tasks can continue in the background. Memory needs boundaries. Watchdog should verify that the agent did not stop too early. Policies must apply not only in the prompt, but in the executor. iPhone, Mac, and CLI should be parts of one agentic loop.&lt;/p&gt;

&lt;p&gt;For me the main conclusion is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;an AI agent is not an LLM with functions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An AI agent is a runtime that accepts events, carries state, executes tools, verifies results, and continues work after a single message has ended.&lt;/p&gt;

&lt;p&gt;That is what I am building in Spectrion.&lt;/p&gt;

&lt;p&gt;Where to look:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Website: &lt;a href="https://spectrion.app" rel="noopener noreferrer"&gt;https://spectrion.app&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;App Store: &lt;a href="https://apps.apple.com/app/spectrion-agent-ai/id6759151825" rel="noopener noreferrer"&gt;https://apps.apple.com/app/spectrion-agent-ai/id6759151825&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>showdev</category>
    </item>
    <item>
      <title>I Built an AI Agent with 57+ Tools That Actually Does Stuff on Your iPhone</title>
      <dc:creator>Denis Babkevich</dc:creator>
      <pubDate>Thu, 05 Mar 2026 21:46:19 +0000</pubDate>
      <link>https://dev.to/denisbabkevich/i-built-an-ai-agent-with-57-tools-that-actually-does-stuff-on-your-iphone-3345</link>
      <guid>https://dev.to/denisbabkevich/i-built-an-ai-agent-with-57-tools-that-actually-does-stuff-on-your-iphone-3345</guid>
      <description>&lt;p&gt;I got tired of AI chatbots that can only talk. So I built &lt;strong&gt;Spectrion&lt;/strong&gt; — an autonomous AI agent for iPhone that actually executes tasks: sends messages, manages calendar, searches the web, creates tools, and chains it all together without you lifting a finger.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://apps.apple.com/app/spectrion-agent-ai/id6759151825" rel="noopener noreferrer"&gt;App Store&lt;/a&gt;&lt;/strong&gt; | &lt;strong&gt;&lt;a href="https://spectrion.app" rel="noopener noreferrer"&gt;spectrion.app&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Not Just Chat. A Real Agent.
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvuk1e3wmap8hkbnc1drw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvuk1e3wmap8hkbnc1drw.png" alt="Onboarding — Capabilities" width="800" height="1738"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Most "AI apps" are glorified ChatGPT wrappers. Spectrion runs an &lt;strong&gt;agent loop&lt;/strong&gt; — the LLM calls tools, gets results, and keeps going until the task is done:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "Find a pizzeria rated 4.5+ nearby and add it to my calendar"

Agent:
  → web_search("best pizzerias near me rated 4.5+")
  → Found 3 places
  → calendar("add dinner at Luigi's, 7 PM tonight")
  → Done!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One message. Multiple tools. Zero hand-holding.&lt;/p&gt;




&lt;h2&gt;
  
  
  57+ Built-in Tools
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdxr84km9kpq0xdz7mkbp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdxr84km9kpq0xdz7mkbp.png" alt="47 Built-in Tools" width="800" height="1738"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Eight categories covering everything your phone can do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Search &amp;amp; Web&lt;/strong&gt; — web_search, web_fetch, URL tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Communication&lt;/strong&gt; — iMessage, SMS, calls, email, contacts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Organization&lt;/strong&gt; — Calendar, reminders, scheduled tasks, cron&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Files&lt;/strong&gt; — Filesystem, cloud, XLSX/DOCX/CSV/PDF parsing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Media&lt;/strong&gt; — Camera, vision (OCR), image generation &amp;amp; editing, audio&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System&lt;/strong&gt; — Device info, brightness, location, maps, health, shortcuts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI &amp;amp; Meta&lt;/strong&gt; — Runtime tool creation, skills, memory, sub-agents, UI rendering&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extensions&lt;/strong&gt; — MCP servers, plugins, community tools&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Real Examples
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Multi-tool task execution
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff3l441vo3m7t9dzwang2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff3l441vo3m7t9dzwang2.png" alt="Meeting Reminder" width="800" height="1738"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Remind me about the team standup tomorrow at 10am and add it to my calendar"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The agent calls &lt;code&gt;reminders&lt;/code&gt; and &lt;code&gt;calendar&lt;/code&gt; simultaneously, sets up both, and confirms — all in one turn.&lt;/p&gt;
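
&lt;p&gt;The same idea in a few lines of Python with &lt;code&gt;asyncio.gather&lt;/code&gt;. This is only an analogy: the two coroutines are stand-ins with made-up return values, not the real tool implementations.&lt;/p&gt;

```python
import asyncio


async def reminders(text):
    await asyncio.sleep(0)  # stand-in for a real API call
    return f"reminder set: {text}"


async def calendar(text):
    await asyncio.sleep(0)  # stand-in for a real API call
    return f"event added: {text}"


async def run_turn(text):
    # Independent tool calls issued in the same turn can run concurrently.
    return await asyncio.gather(reminders(text), calendar(text))


print(asyncio.run(run_turn("team standup, tomorrow 10am")))
```

&lt;p&gt;&lt;code&gt;gather&lt;/code&gt; returns the results in the order the calls were passed in, so the confirmation message can be assembled deterministically.&lt;/p&gt;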

&lt;h3&gt;
  
  
  Web search &amp;amp; summarization
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxoru121w2q1dktme76il.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxoru121w2q1dktme76il.png" alt="Web Search" width="800" height="1738"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Search the web for the latest AI agent news"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Calls &lt;code&gt;web_search&lt;/code&gt;, fetches results, and returns a structured summary.&lt;/p&gt;

&lt;h3&gt;
  
  
  Runtime tool creation
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8lkvp3nqlvnbxxkgi704.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8lkvp3nqlvnbxxkgi704.png" alt="Custom Tool" width="800" height="1738"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Create a tool that converts temperatures between Celsius and Fahrenheit"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The agent writes JavaScript, tests it, and registers it as a new tool — usable immediately. Tools get versioning, rollback, and isolated storage.&lt;/p&gt;




&lt;h2&gt;
  
  
  Workflows — Visual Automation
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff590idvui8mheczo4x50.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff590idvui8mheczo4x50.png" alt="Daily Report Workflow" width="800" height="1738"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Chain tools into multi-step workflows with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Triggers&lt;/strong&gt; — manual, scheduled (cron), event-driven&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Actions&lt;/strong&gt; — HTTP requests, LLM calls, notifications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logic&lt;/strong&gt; — Conditional branching, loops, parallel execution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Build them visually or let the agent generate them from a description.&lt;/p&gt;




&lt;h2&gt;
  
  
  Extensions: Store, Skills, Plugins, MCP
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwu816bdvhhkau5jb5euy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwu816bdvhhkau5jb5euy.png" alt="Store" width="800" height="1738"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpj226e007658c1ttull8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpj226e007658c1ttull8.png" alt="Extensions" width="800" height="1738"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Store&lt;/strong&gt; — Browse and install community tools &amp;amp; skills&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skills&lt;/strong&gt; — Reusable instruction sets (web_researcher, scheduler, etc.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plugins&lt;/strong&gt; — Hot-reload packages (Smart Summarizer, Code Sandbox, etc.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP&lt;/strong&gt; — Model Context Protocol servers for external integrations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom Tools&lt;/strong&gt; — JS sandbox with HTTP, KV storage, SQLite, device APIs&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Autonomous Agent Features
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxkawr08jpu9oz5nguz2p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxkawr08jpu9oz5nguz2p.png" alt="Settings — Autonomous Agent" width="800" height="1738"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Heartbeat
&lt;/h3&gt;

&lt;p&gt;The agent wakes up periodically (configurable interval) to check pending tasks, process messages, and run maintenance — even when you're not looking.&lt;/p&gt;

&lt;h3&gt;
  
  
  Morning Briefing
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3yljcx2e3lphuyr1mcjp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3yljcx2e3lphuyr1mcjp.png" alt="Morning Briefing" width="800" height="1738"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Daily briefing with weather, calendar events, news headlines — customizable topics and time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Task Check-In
&lt;/h3&gt;

&lt;p&gt;Auto-resumes unfinished tasks. Configurable active hours (default 8am–11pm).&lt;/p&gt;
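
&lt;p&gt;The active-hours gate is simple, but windows that cross midnight are easy to get wrong. A sketch, with &lt;code&gt;within_active_hours&lt;/code&gt; as a hypothetical helper and the 8am to 11pm default taken from the text above:&lt;/p&gt;

```python
from datetime import time


def within_active_hours(now, start=time(8, 0), end=time(23, 0)):
    """True when `now` is inside [start, end); handles windows that cross midnight."""
    if end >= start:
        return now >= start and end > now
    # Window such as 22:00 to 06:00 wraps around midnight.
    return now >= start or end > now


print(within_active_hours(time(9, 30)))   # True
print(within_active_hours(time(23, 30)))  # False
```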

&lt;h3&gt;
  
  
  Chat Watchdog
&lt;/h3&gt;

&lt;p&gt;Auto-nudges the agent if it stops mid-task.&lt;/p&gt;




&lt;h2&gt;
  
  
  Evolution Engine — Self-Improvement
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjtmwdbycyui45e50tvc5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjtmwdbycyui45e50tvc5.png" alt="Evolution Engine" width="800" height="1738"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every 24 hours, the agent:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Analyzes tool usage patterns and success rates&lt;/li&gt;
&lt;li&gt;Refines its system prompt and persona parameters&lt;/li&gt;
&lt;li&gt;Auto-creates tools for repetitive tasks&lt;/li&gt;
&lt;li&gt;Versions every change, with instant rollback&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Semantic Memory
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxyzirr3ruq6i1s2io6u2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxyzirr3ruq6i1s2io6u2.png" alt="Memory Dashboard" width="800" height="1738"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Long-term memory with vector search. The agent decides what to remember — conversations get indexed automatically. Semantic search across all stored knowledge.&lt;/p&gt;




&lt;h2&gt;
  
  
  Device Mesh — Multi-Device Agent
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9fmiieeyk63kr8yqpnmh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9fmiieeyk63kr8yqpnmh.png" alt="Device Mesh" width="800" height="1738"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Connect iPhone + Mac into a single agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sync conversations, tools, settings, memory&lt;/li&gt;
&lt;li&gt;Execute tools cross-device (Mac agent can trigger iPhone camera)&lt;/li&gt;
&lt;li&gt;End-to-end encrypted (Curve25519 ECDH + AES-256-GCM)&lt;/li&gt;
&lt;li&gt;Offline queuing with conflict resolution&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Channels — Telegram, Discord, Slack, WhatsApp, Email
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8da4r906wvzl2ll4nepy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8da4r906wvzl2ll4nepy.png" alt="Channels" width="800" height="1738"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Connect external messaging platforms. The agent can receive messages and respond through Telegram bots, Discord, or Slack, with full tool access.&lt;/p&gt;




&lt;h2&gt;
  
  
  Deep Personalization
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqkd5nsvfnffl03inrmka.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqkd5nsvfnffl03inrmka.png" alt="Personalization" width="800" height="1738"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Configure the agent's personality:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Role&lt;/strong&gt; — Assistant, Coder, Researcher, Writer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Style&lt;/strong&gt; — Friendly, Professional, Casual&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Personality Level&lt;/strong&gt; — Pure LLM, Human (natural), Realistic (full character)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User context&lt;/strong&gt; — Name, address style&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Choose Your AI
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frjzclah09bkl2cz77db5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frjzclah09bkl2cz77db5.png" alt="Provider Selection" width="800" height="1738"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Works with multiple providers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Spectrion Pro&lt;/strong&gt; — All-in-one, 3-day free trial, no API key needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apple On-Device&lt;/strong&gt; — Private, free, works offline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic (Claude)&lt;/strong&gt; — Direct API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI (GPT-4o)&lt;/strong&gt; — Direct API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ollama&lt;/strong&gt; — Local models, fully offline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom&lt;/strong&gt; — Any OpenAI-compatible endpoint&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Auto-fallback between providers. Utilization-aware load balancing.&lt;/p&gt;
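
&lt;p&gt;Fallback in its simplest form is an ordered list of providers tried one after another. The sketch below assumes a hypothetical &lt;code&gt;complete_with_fallback&lt;/code&gt; function and fake provider callables. It does not show the utilization-aware balancing.&lt;/p&gt;

```python
def complete_with_fallback(providers, prompt):
    """Try providers in order; return (name, answer) from the first that succeeds."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def flaky(prompt):
    raise TimeoutError("timed out")


def local(prompt):
    return prompt.upper()


print(complete_with_fallback([("cloud", flaky), ("ollama", local)], "hi"))  # ('ollama', 'HI')
```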




&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frontend:&lt;/strong&gt; SwiftUI, &lt;code&gt;@Observable&lt;/code&gt;, async/await, SwiftData, with zero third-party dependencies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concurrency:&lt;/strong&gt; Actor-based tool executor, TaskGroup parallel execution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend:&lt;/strong&gt; Node.js, SQLite (WAL), Redis, account pooling, tier routing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security:&lt;/strong&gt; Keychain storage, E2E encrypted mesh, no server-side conversation logging&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Localization:&lt;/strong&gt; 11 languages with automatic tool activation by keywords&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://apps.apple.com/app/spectrion-agent-ai/id6759151825" rel="noopener noreferrer"&gt;App Store&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://spectrion.app" rel="noopener noreferrer"&gt;Website&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://us.spectrion.app/screenshots/" rel="noopener noreferrer"&gt;More Screenshots&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Built as a solo dev. If you have questions about the architecture, agent loop, tool system, or anything else — ask away in the comments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>ios</category>
      <category>swift</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
