Most AI agent frameworks are chat wrappers with a loop bolted on.
They look capable in demos. They can feel impressive in short-lived workflows. But once you add retries, parallel workers, approval gates, failures, long-running tasks, and operator oversight, the whole thing starts to collapse into improvisation.
The reason is structural: many of these systems treat the conversation transcript as the coordination layer.
That is not a runtime. It is a liability.
Tandem is built around a different premise. The engine should own orchestration, state, approvals, artifacts, scheduling, replay, checkpoints, and memory access. Once workflows become parallel and durable, that shift stops being a design preference and starts becoming a requirement.
Chat is a good interface, but a weak coordination layer
There is nothing wrong with chat as a surface. It is intuitive, flexible, and useful for directing work.
The problem begins when chat becomes the authoritative system of record for execution.
When a transcript is the source of truth, concurrency becomes guesswork. There is no reliable way to let multiple agents work in parallel because no agent knows, in a structured way, what the others have claimed. Failure handling turns into re-prompting and hoping. Debugging means re-reading threads. Replay is impossible. Operator visibility becomes a log scrape.
These are not unusual edge cases. They are normal conditions for any workflow that runs longer than a few moments or involves more than one worker.
That is the dividing line between a clever assistant and a serious execution platform.
What Tandem actually is
Tandem is an engine-owned workflow runtime for coordinated autonomous work.
That means the engine, not the UI, owns truth about execution. However you access the system (desktop, terminal, web, or API), you are talking to the same engine running the same execution model. There is no surface that holds state the others cannot see.
The engine owns:
- orchestration
- task state
- approvals
- artifacts
- scheduling
- replay
- checkpoints
- memory access
- policy enforcement
This matters because once workflows are parallel and long-running, you need infrastructure that can survive failure, coordinate work deterministically, and expose the same state consistently across every surface.
One runtime, multiple surfaces
A lot of agent systems end up splitting behavior across interfaces. One surface implements one execution model, another implements a slightly different one, and the logic gradually fragments.
Tandem is being built around the opposite idea: one runtime, multiple clients.
The same engine powers:
- a desktop app for daily workflows and supervised approvals
- a TUI for terminal-native operation
- a web control panel for operations, automations, packs, and live oversight
- a headless HTTP + SSE runtime for API clients and server deployments
That means you are not rewriting the operating model for each surface. You are interacting with the same execution substrate from different environments.
The engine also exposes:
- HTTP + SSE APIs for sessions, runs, cancellation, and event streaming
- a TypeScript SDK: @frumu/tandem-client
- a Python SDK: tandem-client
- headless runtime support for server deployments, internal apps, and channel integrations
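The event-streaming side of that API surface is SSE. As a concrete sketch of what a client has to do with such a stream, here is a minimal SSE frame parser; the event names in the test are illustrative, not Tandem's actual event vocabulary.

```typescript
// Minimal SSE frame parser: splits buffered stream text into { event, data }
// records. Frames are separated by a blank line per the SSE wire format.
type SseEvent = { event: string; data: string };

function parseSseFrames(buffer: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const frame of buffer.split("\n\n")) {
    let event = "message"; // SSE default event type
    const data: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data.push(line.slice(5).trim());
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}
```

In a real client this would sit behind a `fetch` response body reader or `EventSource`, feeding each parsed record into the UI or SDK state.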
Blackboard first, not transcript first
At the center of Tandem is the idea that agents should coordinate through durable shared state, not just messages.
That shared state lives in a blackboard.
A blackboard is the engine’s shared execution map. It holds the structured state of the job: what exists, what changed, what is blocked, what is runnable, what failed, what artifacts were produced, and what decisions were made.
It is not a conversation history. It is runtime state.
(Diagram: blackboard execution map)
This is what allows the system to answer operational questions directly:
- Which tasks are blocked?
- Which tasks are runnable?
- Which tasks are already claimed?
- Which tasks require approval?
- Which tasks failed and should be retried?
Without a blackboard, those answers usually have to be inferred from logs or reconstructed from chat. That does not scale.
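To make the idea concrete, here is a sketch of those operational questions as pure projections over shared state. The task shape and field names are illustrative assumptions, not Tandem's actual schema; the point is that "runnable" and "blocked" are computed from structured state, never inferred from a transcript.

```typescript
// Hypothetical task record on the blackboard; field names are illustrative.
type Task = {
  id: string;
  deps: string[];                     // prerequisite task ids
  status: "pending" | "done" | "failed";
  claimedBy?: string;                 // set while a worker holds the claim
  needsApproval?: boolean;
};

// Runnable: pending, unclaimed, not gated, and all prerequisites done.
function runnable(tasks: Task[]): Task[] {
  const done = new Set(tasks.filter(t => t.status === "done").map(t => t.id));
  return tasks.filter(
    t => t.status === "pending" && !t.claimedBy && !t.needsApproval &&
         t.deps.every(d => done.has(d))
  );
}

// Blocked: pending with at least one unmet prerequisite.
function blocked(tasks: Task[]): Task[] {
  const done = new Set(tasks.filter(t => t.status === "done").map(t => t.id));
  return tasks.filter(t => t.status === "pending" && t.deps.some(d => !done.has(d)));
}
```

The remaining questions (claimed, awaiting approval, retryable) are the same kind of filter over the same state.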
Workboards are execution, not just UI
On top of the blackboard, Tandem uses a workboard.
If the blackboard is the shared state layer, the workboard is the execution layer agents coordinate against. It is not just a Kanban view. It is the engine-owned task model that tracks state, ownership, dependencies, decisions, retries, artifacts, and reliability signals.
Agents do not ask each other in chat who wants the next task.
The board already knows.
At any moment, the board needs to track:
- which tasks are blocked and waiting on prerequisites
- which tasks are runnable and eligible for claiming
- which tasks are already claimed, with lease metadata
- which tasks require a specific role or gate
- which tasks failed and are eligible for retry
That is the baseline for safe concurrency. Without it, parallel agents trampling the same job is not a matter of if. It is a matter of when.
How claims and task transitions work
When a task becomes runnable, an agent claims it. That claim writes ownership and a lease into shared state. No other worker can take that same task unless the lease expires, the claim is released, or policy transitions it.
This is optimistic concurrency applied to workflow execution.
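A minimal sketch of what that looks like, assuming a revision-checked write (the field names and lease mechanics here are illustrative, not Tandem's internal API): a claim only commits if the task's revision is unchanged since the worker read it, and any live lease blocks competing claims.

```typescript
// Optimistic claim sketch: compare-and-swap on a per-task revision.
type Claimable = {
  id: string;
  revision: number;          // bumped on every state transition
  owner?: string;
  leaseExpiresAt?: number;   // epoch ms; claim is void once this passes
};

function tryClaim(task: Claimable, worker: string, seenRevision: number,
                  now: number, leaseMs: number): boolean {
  const leaseLive =
    task.owner !== undefined && (task.leaseExpiresAt ?? 0) > now;
  // Lost the race, or someone else holds a live lease: claim fails cleanly.
  if (task.revision !== seenRevision || leaseLive) return false;
  task.owner = worker;
  task.leaseExpiresAt = now + leaseMs;
  task.revision += 1;
  return true;
}
```

Two workers racing on the same task read the same revision; exactly one claim commits, the other observes a revision bump and backs off. An expired lease makes the task claimable again without any chat-level negotiation.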
Role-aware routing goes further. Tasks can express an intent such as memory, builder, review, or test-gate, and the runtime routes them to the appropriate agent class. Not every worker is interchangeable, and the board should encode that rather than leaving it buried inside a prompt.
When a task fails, it moves into a failed or rework state, increments a retry counter, and becomes available for clean pickup later. The board records what happened, which then feeds into auditability, replay, and memory.
(Diagram: claim and transition lifecycle)
A fifty-task board in practice
Imagine a coding mission with fifty tasks.
Some are runnable immediately. Some are blocked by prerequisites. Some require approval. Some are intended for specific roles like memory retrieval, triage, implementation, review, or testing.
Ten agents can work on that board at once, but only tasks that are actually runnable should be claimable.
As tasks complete:
- blocked dependents are re-evaluated
- newly unblocked work becomes runnable
- free agents claim new tasks
- retries are tracked cleanly
- the board keeps a revisioned record of what happened
That is parallel execution with structure.
A simplified coding flow might look like this:
| Step | Task | Waits on |
|---|---|---|
| 1 | Inspect issue | — |
| 2 | Retrieve memory | 1 |
| 3 | Inspect repo structure | 1 |
| 4 | Reproduce bug | 1, 2, 3 |
| 5 | Identify duplicate issues | 1 |
| 6 | Review prior fixes | 1 |
| 7 | Draft triage summary | 1 |
| 8 | Write failing test | 4 |
| 9 | Propose patch | 4, 5, 6 |
| 10 | Validate patch | 8, 9 |
| 11 | Prepare final review artifact | 10 |
At runtime, triage can claim task 1 first. Once that completes, memory retrieval, repo inspection, duplicate checking, prior-fix review, and triage summary can branch from it. Reproduction cannot start until the required upstream tasks are done. Patch work cannot even become runnable until reproduction and prior analysis are complete.
No one has to ask in chat who owns task 9. The board already knows.
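The dependency table above can be encoded directly as data the board evaluates. This sketch (structure assumed, not Tandem's actual representation) computes the runnable frontier at any point: which steps are eligible given what has completed.

```typescript
// The "Waits on" column from the table, as a dependency map.
const waitsOn: Record<number, number[]> = {
  1: [], 2: [1], 3: [1], 4: [1, 2, 3], 5: [1], 6: [1],
  7: [1], 8: [4], 9: [4, 5, 6], 10: [8, 9], 11: [10],
};

// Frontier: tasks not yet done whose prerequisites are all done.
function frontier(done: Set<number>): number[] {
  return Object.keys(waitsOn).map(Number)
    .filter(t => !done.has(t) && waitsOn[t].every(d => done.has(d)));
}
```

With nothing done, only task 1 is on the frontier; once it completes, tasks 2, 3, 5, 6, and 7 all become runnable at once, which is exactly where parallel workers pay off.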
Engine truth, not transcript inference
For production coordination, the engine must own truth.
That means:
- structured event history instead of log scraping
- materialized run state instead of state inferred from a thread
- checkpoints and replay so you can rewind execution rather than restart blindly
- decision lineage so you know why something was routed or blocked
- deterministic task-state projection so every client sees the same reality
Without those primitives, debugging multi-agent workflows becomes archaeology.
With them, the system becomes operable.
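One way to read "deterministic task-state projection" is as a fold over a structured event log: replaying the same events always yields the same materialized state, so every client sees the same reality, and a checkpoint is just a snapshot of the fold at some event index. The event names and shapes below are illustrative assumptions, not Tandem's actual event schema.

```typescript
// Sketch: run state as a deterministic fold over structured events.
type Event =
  | { type: "task.created"; id: string }
  | { type: "task.claimed"; id: string; by: string }
  | { type: "task.completed"; id: string }
  | { type: "task.failed"; id: string };

type TaskView = { status: string; owner?: string };

function project(events: Event[]): Map<string, TaskView> {
  const state = new Map<string, TaskView>();
  for (const e of events) {
    switch (e.type) {
      case "task.created":   state.set(e.id, { status: "pending" }); break;
      case "task.claimed":   state.set(e.id, { status: "claimed", owner: e.by }); break;
      case "task.completed": state.set(e.id, { status: "done" }); break;
      case "task.failed":    state.set(e.id, { status: "failed" }); break;
    }
  }
  return state;
}
```

Replay then falls out for free: rewind by re-folding from a checkpoint instead of restarting the run blindly.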
Concurrency that actually holds together
A lot of conversational delegation systems suffer from concurrency blindness. They can delegate in theory, but once multiple workers are active, the coordination model becomes fuzzy.
Tandem is built around the opposite approach:
- explicit task claims
- revisioned shared state
- blackboard patch streams
- deterministic task transitions
- isolated execution paths
- replayable run history
These are not just implementation details. They are what make concurrent autonomous work tractable.
Browser automation belongs inside the runtime
Serious workflows often need the web.
That is why Tandem includes browser automation as part of the same engine-owned model as local files, APIs, and operator actions. It is not bolted on, and it does not live in a separate lane. It participates in the same coordination model, artifact flow, and runtime state as everything else.
Tandem also includes a custom web fetch tool that converts raw HTML into Markdown before handing it to the LLM. That makes web content easier for the model to work with and strips a large amount of unnecessary markup and noise. In practice, that can cut token usage dramatically, in some cases by as much as 80%.
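To see why the savings are so large, consider a toy converter. This is a naive illustration of the idea, not Tandem's actual fetch tool: most of an HTML page's bytes are structural markup the model never needed.

```typescript
// Toy HTML-to-Markdown-ish reducer. Real converters handle far more, but
// even this naive version shows how much markup falls away.
function htmlToMarkdownish(html: string): string {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, "")       // drop scripts entirely
    .replace(/<style[\s\S]*?<\/style>/gi, "")         // drop styles entirely
    .replace(/<h1[^>]*>([\s\S]*?)<\/h1>/gi, "# $1\n") // headings -> markdown
    .replace(/<li[^>]*>([\s\S]*?)<\/li>/gi, "- $1\n") // list items -> bullets
    .replace(/<[^>]+>/g, "")                          // strip remaining tags
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}
```

On real pages, where class attributes, nav markup, and inline scripts dominate, the reduction is far steeper than in this tiny example.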
This works wherever the engine runs. On a headless Linux server with Chromium installed, such as a VPS, CI runner, or Docker container, browser tasks can execute without a display environment and without a GUI. The same workflows you build locally can run the same way remotely.
Tandem Coder is where this becomes especially obvious
One of the clearest places this model matters is coding workflows.
The next major slice is Tandem Coder: a memory-aware coding agent that runs inside the engine rather than as a frontend-owned feature layer.
It is being built on top of:
- context runs
- blackboard state
- artifacts
- approvals
- GitHub MCP
- engine memory
The near-term roadmap includes:
- coder run contracts and artifact taxonomy
- deterministic memory retrieval for issue triage
- memory-aware issue triage workflows
- failure-fingerprint memory candidates and duplicate detection
- a developer-mode run viewer with kanban projection
The goal is for coder workflows to learn from prior failures, fixes, and reviews by reusing engine memory, not by re-reading chat history and pretending that is durable context.
Where Tandem fits relative to assistant-first systems
Assistant-first systems are usually optimized for speed of setup, chat interaction, and personal productivity.
That is a valid design center, and it solves real problems.
Tandem is aimed at a different layer. It is not trying to be a better personal assistant. It is being built as orchestration infrastructure for cases where you need:
- durable shared task state
- parallel execution
- replay and checkpoints
- engine-owned truth
- structured artifacts
- approvals and policy gates
- memory-aware workflows
- multiple clients on the same runtime
- a headless platform that other tools can build on
That is a different problem, and it leads to different architecture choices.
Why this category needs better foundations
Too many AI agent systems still rely on improvisation once real complexity shows up.
They look smart on the happy path, then become fragile when you add concurrency, failures, approvals, long-lived workflows, or operator oversight.
If autonomous systems are going to do serious work, they need stronger primitives underneath them:
- blackboards and workboards
- task claiming
- optimistic concurrency
- checkpoints and replay
- engine-owned state
- reusable memory
- structured artifacts
- deterministic workflow control
- policy-gated mutation paths
- operator-grade visibility across clients
- stable platform APIs
That is what Tandem is being built around.
Not as a better chatbot.
As infrastructure for coordinated autonomous work.
Getting started
Desktop app:
https://tandem.frumu.ai/
Web control panel:
npm i -g @frumu/tandem-panel
tandem-control-panel --init
tandem-control-panel
Open http://127.0.0.1:39732.
Engine and TUI (WIP):
npm install -g @frumu/tandem @frumu/tandem-tui
The TUI is still work in progress. Start with:
tandem-tui
If it does not attach or bootstrap cleanly in your environment, run the engine manually and retry:
tandem-engine serve --hostname 127.0.0.1 --port 39731
tandem-tui
If engine API token auth is enabled, set the same token in your environment before launching the TUI.
For setup and troubleshooting help, use the Tandem docs: https://tandem.docs.frumu.ai/.