Most AI agent frameworks are chat wrappers with a loop bolted on.
They look capable in demos. They can feel impressive in short-lived workflows. But once you add retries, parallel workers, approval gates, failures, long-running tasks, and operator oversight, the whole thing starts to collapse into improvisation.
The reason is structural: many of these systems treat the conversation transcript as the coordination layer.
That is not a runtime. It is a liability.
Tandem is built around a different premise. The engine should own orchestration, state, approvals, artifacts, scheduling, replay, checkpoints, and memory access. Once workflows become parallel and durable, that shift stops being a design preference and starts becoming a requirement.
Chat is a good interface, but a weak coordination layer
There is nothing wrong with chat as a surface. It is intuitive, flexible, and useful for directing work.
The problem begins when chat becomes the authoritative system of record for execution.
When a transcript is the source of truth, concurrency becomes guesswork. There is no reliable way to let multiple agents work in parallel because no agent knows, in a structured way, what the others have claimed. Failure handling turns into re-prompting and hoping. Debugging means re-reading threads. Replay is impossible. Operator visibility becomes a log scrape.
These are not unusual edge cases. They are normal conditions for any workflow that runs longer than a few moments or involves more than one worker.
That is the dividing line between a clever assistant and a serious execution platform.
What Tandem actually is
Tandem is an engine-owned workflow runtime for coordinated autonomous work.
That means the engine, not the UI, owns truth about execution. However you access the system (desktop, terminal, web, or API), you are talking to the same engine running the same execution model. There is no surface that holds state the others cannot see.
The engine owns:
- orchestration
- task state
- approvals
- artifacts
- scheduling
- replay
- checkpoints
- memory access
- policy enforcement
This matters because once workflows are parallel and long-running, you need infrastructure that can survive failure, coordinate work deterministically, and expose the same state consistently across every surface.
One runtime, multiple surfaces
A lot of agent systems end up splitting behavior across interfaces. One surface implements one execution model, another implements a slightly different one, and the logic gradually fragments.
Tandem is being built around the opposite idea: one runtime, multiple clients.
The same engine powers:
- a desktop app for daily workflows and supervised approvals
- a TUI for terminal-native operation
- a web control panel for operations, automations, packs, and live oversight
- a headless HTTP + SSE runtime for API clients and server deployments
That means you are not rewriting the operating model for each surface. You are interacting with the same execution substrate from different environments.
The engine also exposes:
- HTTP + SSE APIs for sessions, runs, cancellation, and event streaming
- a TypeScript SDK: @frumu/tandem-client
- a Python SDK: tandem-client
- headless runtime support for server deployments, internal apps, and channel integrations
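The event-streaming side of that API surface is SSE. As a concrete sketch of what a client has to do with such a stream, here is a minimal SSE frame parser; the event names in the test are illustrative, not Tandem's actual event vocabulary.

```typescript
// Minimal SSE frame parser: splits buffered stream text into { event, data }
// records. Frames are separated by a blank line per the SSE wire format.
type SseEvent = { event: string; data: string };

function parseSseFrames(buffer: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const frame of buffer.split("\n\n")) {
    let event = "message"; // SSE default event type
    const data: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data.push(line.slice(5).trim());
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}
```

In a real client this would sit behind a `fetch` response body reader or `EventSource`, feeding each parsed record into the UI or SDK state.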
Blackboard first, not transcript first
At the center of Tandem is the idea that agents should coordinate through durable shared state, not just messages.
That shared state lives in a blackboard.
A blackboard is the engine’s shared execution map. It holds the structured state of the job: what exists, what changed, what is blocked, what is runnable, what failed, what artifacts were produced, and what decisions were made.
It is not a conversation history. It is runtime state.
(Diagram: blackboard execution map)
This is what allows the system to answer operational questions directly:
- Which tasks are blocked?
- Which tasks are runnable?
- Which tasks are already claimed?
- Which tasks require approval?
- Which tasks failed and should be retried?
Without a blackboard, those answers usually have to be inferred from logs or reconstructed from chat. That does not scale.
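To make the idea concrete, here is a sketch of those operational questions as pure projections over shared state. The task shape and field names are illustrative assumptions, not Tandem's actual schema; the point is that "runnable" and "blocked" are computed from structured state, never inferred from a transcript.

```typescript
// Hypothetical task record on the blackboard; field names are illustrative.
type Task = {
  id: string;
  deps: string[];                     // prerequisite task ids
  status: "pending" | "done" | "failed";
  claimedBy?: string;                 // set while a worker holds the claim
  needsApproval?: boolean;
};

// Runnable: pending, unclaimed, not gated, and all prerequisites done.
function runnable(tasks: Task[]): Task[] {
  const done = new Set(tasks.filter(t => t.status === "done").map(t => t.id));
  return tasks.filter(
    t => t.status === "pending" && !t.claimedBy && !t.needsApproval &&
         t.deps.every(d => done.has(d))
  );
}

// Blocked: pending with at least one unmet prerequisite.
function blocked(tasks: Task[]): Task[] {
  const done = new Set(tasks.filter(t => t.status === "done").map(t => t.id));
  return tasks.filter(t => t.status === "pending" && t.deps.some(d => !done.has(d)));
}
```

The remaining questions (claimed, awaiting approval, retryable) are the same kind of filter over the same state.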
Workboards are execution, not just UI
On top of the blackboard, Tandem uses a workboard.
If the blackboard is the shared state layer, the workboard is the execution layer agents coordinate against. It is not just a Kanban view. It is the engine-owned task model that tracks state, ownership, dependencies, decisions, retries, artifacts, and reliability signals.
Agents do not ask each other in chat who wants the next task.
The board already knows.
At any moment, the board needs to track:
- which tasks are blocked and waiting on prerequisites
- which tasks are runnable and eligible for claiming
- which tasks are already claimed, with lease metadata
- which tasks require a specific role or gate
- which tasks failed and are eligible for retry
That is the baseline for safe concurrency. Without it, parallel agents trampling the same job is not a matter of if. It is a matter of when.
How claims and task transitions work
When a task becomes runnable, an agent claims it. That claim writes ownership and a lease into shared state. No other worker can take that same task unless the lease expires, the claim is released, or policy transitions it.
This is optimistic concurrency applied to workflow execution.
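A minimal sketch of what that looks like, assuming a revision-checked write (the field names and lease mechanics here are illustrative, not Tandem's internal API): a claim only commits if the task's revision is unchanged since the worker read it, and any live lease blocks competing claims.

```typescript
// Optimistic claim sketch: compare-and-swap on a per-task revision.
type Claimable = {
  id: string;
  revision: number;          // bumped on every state transition
  owner?: string;
  leaseExpiresAt?: number;   // epoch ms; claim is void once this passes
};

function tryClaim(task: Claimable, worker: string, seenRevision: number,
                  now: number, leaseMs: number): boolean {
  const leaseLive =
    task.owner !== undefined && (task.leaseExpiresAt ?? 0) > now;
  // Lost the race, or someone else holds a live lease: claim fails cleanly.
  if (task.revision !== seenRevision || leaseLive) return false;
  task.owner = worker;
  task.leaseExpiresAt = now + leaseMs;
  task.revision += 1;
  return true;
}
```

Two workers racing on the same task read the same revision; exactly one claim commits, the other observes a revision bump and backs off. An expired lease makes the task claimable again without any chat-level negotiation.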
Role-aware routing goes further. Tasks can express an intent such as memory, builder, review, or test-gate, and the runtime routes them to the appropriate agent class. Not every worker is interchangeable, and the board should encode that rather than leaving it buried inside a prompt.
When a task fails, it moves into a failed or rework state, increments a retry counter, and becomes available for clean pickup later. The board records what happened, which then feeds into auditability, replay, and memory.
(Diagram: claim and transition lifecycle)
A fifty-task board in practice
Imagine a coding mission with fifty tasks.
Some are runnable immediately. Some are blocked by prerequisites. Some require approval. Some are intended for specific roles like memory retrieval, triage, implementation, review, or testing.
Ten agents can work on that board at once, but only tasks that are actually runnable should be claimable.
As tasks complete:
- blocked dependents are re-evaluated
- newly unblocked work becomes runnable
- free agents claim new tasks
- retries are tracked cleanly
- the board keeps a revisioned record of what happened
That is parallel execution with structure.
A simplified coding flow might look like this:
| Step | Task | Waits on |
|---|---|---|
| 1 | Inspect issue | — |
| 2 | Retrieve memory | 1 |
| 3 | Inspect repo structure | 1 |
| 4 | Reproduce bug | 1, 2, 3 |
| 5 | Identify duplicate issues | 1 |
| 6 | Review prior fixes | 1 |
| 7 | Draft triage summary | 1 |
| 8 | Write failing test | 4 |
| 9 | Propose patch | 4, 5, 6 |
| 10 | Validate patch | 8, 9 |
| 11 | Prepare final review artifact | 10 |
At runtime, triage can claim task 1 first. Once that completes, memory retrieval, repo inspection, duplicate checking, prior-fix review, and triage summary can branch from it. Reproduction cannot start until the required upstream tasks are done. Patch work cannot even become runnable until reproduction and prior analysis are complete.
No one has to ask in chat who owns task 9. The board already knows.
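The dependency table above can be encoded directly as data the board evaluates. This sketch (structure assumed, not Tandem's actual representation) computes the runnable frontier at any point: which steps are eligible given what has completed.

```typescript
// The "Waits on" column from the table, as a dependency map.
const waitsOn: Record<number, number[]> = {
  1: [], 2: [1], 3: [1], 4: [1, 2, 3], 5: [1], 6: [1],
  7: [1], 8: [4], 9: [4, 5, 6], 10: [8, 9], 11: [10],
};

// Frontier: tasks not yet done whose prerequisites are all done.
function frontier(done: Set<number>): number[] {
  return Object.keys(waitsOn).map(Number)
    .filter(t => !done.has(t) && waitsOn[t].every(d => done.has(d)));
}
```

With nothing done, only task 1 is on the frontier; once it completes, tasks 2, 3, 5, 6, and 7 all become runnable at once, which is exactly where parallel workers pay off.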
Engine truth, not transcript inference
For production coordination, the engine must own truth.
That means:
- structured event history instead of log scraping
- materialized run state instead of state inferred from a thread
- checkpoints and replay so you can rewind execution rather than restart blindly
- decision lineage so you know why something was routed or blocked
- deterministic task-state projection so every client sees the same reality
Without those primitives, debugging multi-agent workflows becomes archaeology.
With them, the system becomes operable.
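One way to read "deterministic task-state projection" is as a fold over a structured event log: replaying the same events always yields the same materialized state, so every client sees the same reality, and a checkpoint is just a snapshot of the fold at some event index. The event names and shapes below are illustrative assumptions, not Tandem's actual event schema.

```typescript
// Sketch: run state as a deterministic fold over structured events.
type Event =
  | { type: "task.created"; id: string }
  | { type: "task.claimed"; id: string; by: string }
  | { type: "task.completed"; id: string }
  | { type: "task.failed"; id: string };

type TaskView = { status: string; owner?: string };

function project(events: Event[]): Map<string, TaskView> {
  const state = new Map<string, TaskView>();
  for (const e of events) {
    switch (e.type) {
      case "task.created":   state.set(e.id, { status: "pending" }); break;
      case "task.claimed":   state.set(e.id, { status: "claimed", owner: e.by }); break;
      case "task.completed": state.set(e.id, { status: "done" }); break;
      case "task.failed":    state.set(e.id, { status: "failed" }); break;
    }
  }
  return state;
}
```

Replay then falls out for free: rewind by re-folding from a checkpoint instead of restarting the run blindly.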
Concurrency that actually holds together
A lot of conversational delegation systems suffer from concurrency blindness. They can delegate in theory, but once multiple workers are active, the coordination model becomes fuzzy.
Tandem is built around the opposite approach:
- explicit task claims
- revisioned shared state
- blackboard patch streams
- deterministic task transitions
- isolated execution paths
- replayable run history
These are not just implementation details. They are what make concurrent autonomous work tractable.
Browser automation belongs inside the runtime
Serious workflows often need the web.
That is why Tandem includes browser automation as part of the same engine-owned model as local files, APIs, and operator actions. It is not bolted on, and it does not live in a separate lane. It participates in the same coordination model, artifact flow, and runtime state as everything else.
Tandem also includes a custom web fetch tool that converts raw HTML into Markdown before handing it to the LLM. That makes web content easier for the model to work with and strips a large amount of unnecessary markup and noise. In practice, that can cut token usage dramatically, in some cases by as much as 80%.
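To see why the savings are so large, consider a toy converter. This is a naive illustration of the idea, not Tandem's actual fetch tool: most of an HTML page's bytes are structural markup the model never needed.

```typescript
// Toy HTML-to-Markdown-ish reducer. Real converters handle far more, but
// even this naive version shows how much markup falls away.
function htmlToMarkdownish(html: string): string {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, "")       // drop scripts entirely
    .replace(/<style[\s\S]*?<\/style>/gi, "")         // drop styles entirely
    .replace(/<h1[^>]*>([\s\S]*?)<\/h1>/gi, "# $1\n") // headings -> markdown
    .replace(/<li[^>]*>([\s\S]*?)<\/li>/gi, "- $1\n") // list items -> bullets
    .replace(/<[^>]+>/g, "")                          // strip remaining tags
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}
```

On real pages, where class attributes, nav markup, and inline scripts dominate, the reduction is far steeper than in this tiny example.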
This works wherever the engine runs. On a headless Linux server with Chromium installed, such as a VPS, CI runner, or Docker container, browser tasks can execute without a display environment and without a GUI. The same workflows you build locally can run the same way remotely.
Tandem Coder is where this becomes especially obvious
One of the clearest places this model matters is coding workflows.
The next major slice is Tandem Coder: a memory-aware coding agent that runs inside the engine rather than as a frontend-owned feature layer.
It is being built on top of:
- context runs
- blackboard state
- artifacts
- approvals
- GitHub MCP
- engine memory
The near-term roadmap includes:
- coder run contracts and artifact taxonomy
- deterministic memory retrieval for issue triage
- memory-aware issue triage workflows
- failure-fingerprint memory candidates and duplicate detection
- a developer-mode run viewer with kanban projection
The goal is for coder workflows to learn from prior failures, fixes, and reviews by reusing engine memory, not by re-reading chat history and pretending that is durable context.
Where Tandem fits relative to assistant-first systems
Assistant-first systems are usually optimized for speed of setup, chat interaction, and personal productivity.
That is a valid design center, and it solves real problems.
Tandem is aimed at a different layer. It is not trying to be a better personal assistant. It is being built as orchestration infrastructure for cases where you need:
- durable shared task state
- parallel execution
- replay and checkpoints
- engine-owned truth
- structured artifacts
- approvals and policy gates
- memory-aware workflows
- multiple clients on the same runtime
- a headless platform that other tools can build on
That is a different problem, and it leads to different architecture choices.
Why this category needs better foundations
Too many AI agent systems still rely on improvisation once real complexity shows up.
They look smart on the happy path, then become fragile when you add concurrency, failures, approvals, long-lived workflows, or operator oversight.
If autonomous systems are going to do serious work, they need stronger primitives underneath them:
- blackboards and workboards
- task claiming
- optimistic concurrency
- checkpoints and replay
- engine-owned state
- reusable memory
- structured artifacts
- deterministic workflow control
- policy-gated mutation paths
- operator-grade visibility across clients
- stable platform APIs
That is what Tandem is being built around.
Not as a better chatbot.
As infrastructure for coordinated autonomous work.
Getting started
Desktop app:
https://tandem.frumu.ai/
Web control panel:
npm i -g @frumu/tandem-panel
tandem-control-panel --init
tandem-control-panel
Open http://127.0.0.1:39732.
Engine and TUI (WIP):
npm install -g @frumu/tandem @frumu/tandem-tui
The TUI is still work in progress. Start with:
tandem-tui
If it does not attach or bootstrap cleanly in your environment, run the engine manually and retry:
tandem-engine serve --hostname 127.0.0.1 --port 39731
tandem-tui
If engine API token auth is enabled, set the same token in your environment before launching the TUI.
For setup and troubleshooting help, use the Tandem docs: https://tandem.docs.frumu.ai/.