Here's something that's bothered us for the past year: the moment you ask an AI agent to do something, it disappears. You prompt, you wait, you get a result. If you realise halfway through that you forgot to mention something, you cancel and start over. If you want to know what it's doing, you can't. If something more urgent comes up, you can't pause it and come back later.
This is a fundamental limitation of how agent frameworks are built. One LLM, one loop, one tool call at a time. The model picks a tool, calls it, reads the result, picks the next tool. There's no interface for the outside world to interact with the loop while it's running.
We needed something different. We're building AI assistants that you onboard like new hires — share your screen, walk them through your tools, hop on a call. They need to be doing things while you're talking to them. They need to handle "actually, also check train options" without starting over.
So we built steerable tool loops. Today we're open-sourcing the engine under MIT: github.com/unifyai/unity
Every operation returns a handle
This is the core idea. When you ask the assistant to do something, you don't get a promise that eventually resolves. You get a live handle:
```python
handle = await actor.act("Research flights to Tokyo and draft an itinerary")

# Twenty seconds later, while it's still working:
await handle.interject("Also check train options from Tokyo to Osaka")

# Something urgent comes up:
await handle.pause()
# ... deal with it ...
await handle.resume()

# Or just ask what's happening:
status = await handle.ask("Have you found anything under $800?")
```
`ask`, `interject`, `pause`, `resume`, `stop`. That's the interface. Every operation in the system returns one of these — from a simple contact lookup to a multi-hour task execution.
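The interface is small enough to sketch in full. Here's an illustrative toy version in Python. The names `SteerableHandle` and `ToyHandle` are ours, invented for this post; the real protocol is the `SteerableToolHandle` in the repo.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class SteerableHandle(Protocol):
    """The five operations every handle exposes."""
    async def ask(self, question: str) -> str: ...
    async def interject(self, instruction: str) -> None: ...
    async def pause(self) -> None: ...
    async def resume(self) -> None: ...
    async def stop(self) -> None: ...


class ToyHandle:
    """In-memory stand-in: just enough state to show the interface."""
    def __init__(self) -> None:
        self.paused = False
        self.stopped = False
        self.notes: list[str] = []

    async def ask(self, question: str) -> str:
        return f"status: {'paused' if self.paused else 'running'}"

    async def interject(self, instruction: str) -> None:
        self.notes.append(instruction)

    async def pause(self) -> None:
        self.paused = True

    async def resume(self) -> None:
        self.paused = False

    async def stop(self) -> None:
        self.stopped = True


handle: SteerableHandle = ToyHandle()
asyncio.run(handle.interject("also check train options"))
asyncio.run(handle.pause())
print(asyncio.run(handle.ask("what's happening?")))  # status: paused
```

The point is that the contract is uniform: a contact lookup and a multi-hour task expose exactly the same five methods.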
Handles nest
This is where it gets interesting. The assistant isn't one loop. It's a hierarchy of them.
The Actor receives your request and writes a Python program that calls typed primitives — `await primitives.contacts.ask(...)`, `await primitives.knowledge.update(...)`. Each of those calls starts its own LLM tool loop inside the relevant manager, which returns its own handle.
```
handle.pause()
      │
      ▼
Actor (pauses)
 ├── ContactManager.ask (pauses)
 │    └── inner search operation (pauses)
 └── KnowledgeManager.update (pauses)
      └── inner write operation (pauses)
```
All layers pause. Resume propagates the same way. So does stop.
You can steer a complex multi-step operation at any depth without knowing or caring about the internal structure. Pause the whole thing, or ask a specific sub-operation what it's doing.
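A toy model of that propagation, assuming each handle simply keeps references to its children. `NestedHandle` and `all_paused` are invented for illustration; the real implementation has more machinery, but the fan-out idea is the same.

```python
import asyncio


class NestedHandle:
    """Toy handle whose pause/resume fan out to child handles."""
    def __init__(self, name: str, children=()):
        self.name = name
        self.children = list(children)
        self.paused = False

    async def pause(self) -> None:
        self.paused = True
        await asyncio.gather(*(c.pause() for c in self.children))

    async def resume(self) -> None:
        self.paused = False
        await asyncio.gather(*(c.resume() for c in self.children))


def all_paused(h: NestedHandle) -> bool:
    return h.paused and all(all_paused(c) for c in h.children)


# Rebuild the hierarchy from the diagram above.
search = NestedHandle("inner search")
contacts = NestedHandle("ContactManager.ask", [search])
write = NestedHandle("inner write")
knowledge = NestedHandle("KnowledgeManager.update", [write])
actor = NestedHandle("Actor", [contacts, knowledge])

asyncio.run(actor.pause())
print(all_paused(actor))  # True: one call pauses every layer
```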
What this actually enables
Talk to your assistant while it works. The system has a dual-brain architecture: a slow deliberation brain that sees the full picture and makes decisions, plus a fast real-time voice agent (on LiveKit) that handles the conversation at sub-second latency. They communicate over IPC. When the slow brain finishes a background task, it tells the fast brain to weave the results into whatever you're currently discussing. You never have to wait in silence.
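A rough sketch of the split, with an `asyncio.Queue` standing in for the real IPC channel. In the actual system the fast brain is a separate LiveKit voice agent, not an in-process coroutine; this only illustrates the hand-off.

```python
import asyncio


async def slow_brain(ipc: asyncio.Queue) -> None:
    # Stand-in for deliberation: finish a background task, then notify.
    await asyncio.sleep(0.05)
    await ipc.put("flight search done: 3 options under $800")


async def fast_brain(ipc: asyncio.Queue, transcript: list[str]) -> None:
    # Stand-in for the real-time voice agent: keep the conversation
    # going, then weave in results as they arrive.
    transcript.append("...as I was saying about your Tuesday meeting...")
    update = await ipc.get()
    transcript.append(f"by the way, {update}")


async def main() -> list[str]:
    ipc: asyncio.Queue = asyncio.Queue()
    transcript: list[str] = []
    await asyncio.gather(slow_brain(ipc), fast_brain(ipc, transcript))
    return transcript


print(asyncio.run(main()))
```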
Redirect mid-task. "Actually, don't send that email — call them instead." The interject mechanism injects new instructions into the running loop between LLM turns. If an LLM call is already in flight, it's cancelled and restarted with the interjection included. No restart, no lost context.
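Here's a minimal sketch of that mechanism: each turn races the in-flight call against the interjection queue, and an interjection cancels and restarts the call with the new instruction folded in. `fake_llm_turn` and the loop shape are our simplification, not the real engine.

```python
import asyncio


async def fake_llm_turn(context: list[str]) -> str:
    # Stand-in for a slow LLM call.
    await asyncio.sleep(0.2)
    return f"acted on: {context[-1]}"


async def steerable_loop(context, interjections: asyncio.Queue, turns: int = 2):
    results = []
    for _ in range(turns):
        task = asyncio.ensure_future(fake_llm_turn(context))
        getter = asyncio.ensure_future(interjections.get())
        done, _ = await asyncio.wait(
            {task, getter}, return_when=asyncio.FIRST_COMPLETED
        )
        if getter in done:
            # Interjection arrived mid-call: cancel, fold it into the
            # context, and restart this turn. No lost context.
            task.cancel()
            context.append(getter.result())
            task = asyncio.ensure_future(fake_llm_turn(context))
        else:
            getter.cancel()
        results.append(await task)
    return results


async def demo():
    q: asyncio.Queue = asyncio.Queue()
    ctx = ["research flights to Tokyo"]
    loop_task = asyncio.ensure_future(steerable_loop(ctx, q))
    await asyncio.sleep(0.05)  # first LLM call is in flight
    await q.put("also check train options")
    return await loop_task


print(asyncio.run(demo()))
```

Between-turns injection is the cheap path; the cancel-and-restart branch is what makes interjections land even when a call is already in flight.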
Run multiple things at once. The conversation manager tracks concurrent in-flight actions, each with its own steerable handle. You can say "how's the flight search going?" and it routes to the right handle's ask() method, while the other operations keep running.
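A toy version of that routing, assuming the manager keys handles by action name. `ActionHandle` and this `ConversationManager` API are hypothetical, sketched only to show the dispatch.

```python
import asyncio


class ActionHandle:
    # Toy stand-in for a steerable handle; only ask() matters here.
    def __init__(self, status: str):
        self.status = status

    async def ask(self, question: str) -> str:
        return self.status


class ConversationManager:
    def __init__(self):
        self.in_flight: dict[str, ActionHandle] = {}

    def track(self, name: str, handle: ActionHandle) -> None:
        self.in_flight[name] = handle

    async def ask(self, name: str, question: str) -> str:
        # Route the question to the matching action's handle;
        # the other in-flight actions keep running untouched.
        return await self.in_flight[name].ask(question)


mgr = ConversationManager()
mgr.track("flight search", ActionHandle("found 3 fares under $800"))
mgr.track("itinerary draft", ActionHandle("day 2 of 5 drafted"))
print(asyncio.run(mgr.ask("flight search", "how's it going?")))
```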
Memory that doesn't reset. Every ~50 messages, a background process extracts contacts, relationships, domain knowledge, and task commitments into structured, queryable tables. After a month, the assistant has a working model of your world — not a chat log, but typed records it can filter, join, and search.
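A minimal sketch of what typed, queryable records buy you, using an in-memory SQLite table. The schema and the extracted rows are our invention, not the real one; the point is that records, unlike a chat log, can be filtered and joined.

```python
import sqlite3

# Pretend an extraction pass just turned the last ~50 messages
# into typed records (hypothetical schema).
extracted = [
    ("contact", "Dana Kim", "travel agent, prefers email"),
    ("commitment", "send itinerary", "due Friday"),
]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memory (kind TEXT, name TEXT, detail TEXT)")
db.executemany("INSERT INTO memory VALUES (?, ?, ?)", extracted)

# A chat log can only be re-read; typed records can be queried.
rows = db.execute("SELECT name FROM memory WHERE kind = 'contact'").fetchall()
print(rows)  # [('Dana Kim',)]
```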
The code
The system has been in development for ~10 months. We're a YC company (Unify) and this powers our commercial product. The brain is the open-source part.
If you want to see how it works, start here:

- `unity/common/async_tool_loop.py` — the `SteerableToolHandle` protocol and the `AsyncToolLoopHandle` implementation
- `unity/common/_async_tool/loop.py` — the loop engine: interjections, pausing, parallel tool execution, context compression
- `ARCHITECTURE.md` — the full technical walkthrough
We'd genuinely appreciate feedback — what we got right, what seems over-engineered, what you'd do differently. This is a complex system and outside perspective is valuable.