Batty

Posted on Mar 24

I Let AI Agents Manage Themselves with a Markdown File

#ai #opensource #devtools #productivity

Everyone's building sophisticated agent coordination protocols — shared memory, peer messaging, event buses, consensus algorithms. I tried all of them.

Then I replaced them with a Markdown file. A checklist. And it works better than everything I tried before.

Here's why.

The Coordination Problem

When you run multiple AI coding agents in parallel, they need to answer three questions:

What should I work on? (task discovery)
Is anyone else already doing this? (conflict avoidance)
What's done, what's in progress, what's left? (status visibility)

Most multi-agent frameworks solve this with peer messaging. Agents send each other status updates, negotiate task assignments, and maintain shared context. It works, but the cost is real.

With peer-to-peer coordination among N agents, you have up to N×(N-1) directed communication channels. Five agents means 20 potential message paths. Each message costs tokens. Each coordination round adds latency. CooperBench found that agents achieve ~50% lower success rates when collaborating versus working solo; the overhead of coordination often outweighs the benefit.
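To make the growth concrete, here is that arithmetic as a tiny Rust sketch. It compares directed peer channels with the single board each agent touches in the file-based approach described next. The function names are illustrative.

```rust
/// Directed peer-to-peer channels among `n` agents: each agent may
/// message each of the other `n - 1` agents.
fn peer_channels(n: u64) -> u64 {
    n * n.saturating_sub(1)
}

/// With a shared board, each agent only reads and writes the board,
/// so there is one channel per agent.
fn board_channels(n: u64) -> u64 {
    n
}
```

For 10 agents, `peer_channels(10)` is 90. At 50 agents it is 2,450, while `board_channels` grows only linearly.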

There's a simpler approach.

The Markdown Kanban Board

A task file with YAML frontmatter:

---
id: 27
title: Add JWT auth endpoint
status: backlog
priority: high
claimed_by:
depends_on: [12]
tags: [api, auth]
---

Create POST /api/auth/login endpoint accepting {email, password}.
Validate email format, hash password with bcrypt.
Return 201 with JWT token on success, 401 on failure.
Write tests for: success case, invalid email, wrong password, duplicate login.

That's the entire coordination protocol for one task. An agent reads the file, understands the work, and knows whether it's available (empty claimed_by), blocked (a depends_on task isn't done yet), or already taken (claimed_by is set).
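Here is a minimal sketch of that check in Rust, using only the standard library. It assumes the frontmatter is a flat set of `key: value` lines between `---` delimiters and that `depends_on` is a one-line list like `[12]`. A real implementation would use a YAML parser. The `Availability` enum, its extra `Closed` case, and the function names are hypothetical.

```rust
use std::collections::HashMap;

/// How a task looks to an agent deciding what to pick up.
#[derive(Debug, PartialEq)]
enum Availability {
    Available,
    Blocked, // at least one depends_on task is not done yet
    Taken,   // claimed_by is set
    Closed,  // status is not backlog or todo (e.g. review, done)
}

/// Collect flat `key: value` pairs between the opening and closing `---`.
/// Nested YAML is deliberately not supported in this sketch.
fn frontmatter(text: &str) -> HashMap<String, String> {
    let mut fields = HashMap::new();
    let mut lines = text.lines();
    if lines.next().map(str::trim) != Some("---") {
        return fields; // no frontmatter at all
    }
    for line in lines {
        if line.trim() == "---" {
            break;
        }
        // split_once keeps values like timestamps intact after the first ':'
        if let Some((key, value)) = line.split_once(':') {
            fields.insert(key.trim().to_string(), value.trim().to_string());
        }
    }
    fields
}

/// Parse a one-line list such as `[12, 14]` into task IDs.
fn parse_ids(value: &str) -> Vec<u32> {
    value
        .trim_matches(|c: char| c == '[' || c == ']')
        .split(',')
        .filter_map(|s| s.trim().parse().ok())
        .collect()
}

/// `done_ids` are the IDs of tasks whose status is `done`.
fn availability(text: &str, done_ids: &[u32]) -> Availability {
    let fields = frontmatter(text);
    let claimed = fields.get("claimed_by").map_or(false, |v| !v.is_empty());
    if claimed {
        return Availability::Taken;
    }
    let open = matches!(
        fields.get("status").map(String::as_str),
        Some("backlog") | Some("todo")
    );
    if !open {
        return Availability::Closed;
    }
    let deps = fields.get("depends_on").map(|v| parse_ids(v)).unwrap_or_default();
    if deps.iter().all(|id| done_ids.contains(id)) {
        Availability::Available
    } else {
        Availability::Blocked
    }
}
```

With the task 27 file above, `availability(text, &[12])` reports `Available`, and the same file with an empty `done_ids` reports `Blocked`.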

Tasks live in a directory:

board/
  tasks/
    012-set-up-database-migrations.md      # status: done
    027-add-jwt-auth-endpoint.md           # status: backlog
    028-implement-user-registration.md     # status: in-progress, claimed_by: eng-1-2
    029-add-rate-limiting.md               # status: backlog

No database. No message queue. No service to keep running. Just files in a directory, tracked by git.

Why Agents Understand This Natively

Here's the thing about LLMs: they already understand Markdown. Perfectly. You don't need a custom serialization format, a protocol buffer definition, or an API client.

When an agent reads a task file, it parses the YAML frontmatter, understands the priority system, reads the Markdown description, and knows exactly what to do. Zero prompt engineering for the format itself — all your prompting effort goes into describing the work, not the protocol.

Compare this to giving an agent access to a coordination API:

# The API approach — agent needs to understand the SDK
from coordinator import TaskQueue
queue = TaskQueue(host="localhost:8080")
task = queue.claim(agent_id="eng-1", priority="high")
queue.update_status(task.id, "in_progress")
# ... agent works ...
queue.update_status(task.id, "done")

versus:

# The file approach: the agent edits a file it already understands
- status: backlog
+ status: in-progress
- claimed_by:
+ claimed_by: eng-1-2

The file approach has zero integration overhead. The agent reads text. It writes text. That's what it's already best at.

How Task Claiming Works

The one hard problem with file-based coordination: what happens when two agents try to claim the same task simultaneously?

File locking. The same primitive your operating system uses for everything else.

# Agent claims a task (atomic operation)
kanban-md pick --claim eng-1-2 --move in-progress

Under the hood:

Acquire a file lock on the board directory
Scan for unclaimed tasks matching the agent's criteria
Pick the highest-priority unclaimed task with satisfied dependencies
Set claimed_by: eng-1-2 and status: in-progress in one atomic write
Release the lock

If another agent tries to claim at the same time, it gets a lock error, backs off, and retries. The OS guarantees that only one process holds the lock at a time, and the retry loop takes care of the queueing. No distributed consensus. No Raft protocol. Just a lock file.
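The claim sequence can be sketched in Rust with only the standard library. This is not kanban-md's implementation. As a portable stand-in for an OS lock, it takes the board lock by creating a `.lock` file with `create_new`, which fails if the file already exists. It makes the claim a single atomic write by writing a temporary file and renaming it over the task. The lock file name and all function names are assumptions, and choosing which task to claim is left out.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};

/// Holds the board lock; the lock file is deleted when this value is dropped.
struct BoardLock {
    path: PathBuf,
}

impl BoardLock {
    /// `create_new` fails with `AlreadyExists` if another process holds the lock.
    fn acquire(board: &Path) -> io::Result<BoardLock> {
        let path = board.join(".lock");
        OpenOptions::new().write(true).create_new(true).open(&path)?;
        Ok(BoardLock { path })
    }
}

impl Drop for BoardLock {
    fn drop(&mut self) {
        let _ = fs::remove_file(&self.path);
    }
}

/// Write to a temp file, then rename it over the target so readers never
/// see a half-written task (rename is atomic within one POSIX filesystem).
fn atomic_write(path: &Path, contents: &str) -> io::Result<()> {
    let tmp = path.with_extension("md.tmp");
    fs::write(&tmp, contents)?;
    fs::rename(&tmp, path)
}

/// Claim `task` for `agent` under the board lock. Returns Ok(false) if the
/// frontmatter's `claimed_by:` is already set.
fn claim(board: &Path, task: &Path, agent: &str) -> io::Result<bool> {
    let _lock = BoardLock::acquire(board)?; // released when `_lock` drops
    let text = fs::read_to_string(task)?;
    let mut fences = 0;
    let mut claimed = false;
    let mut out = String::new();
    for line in text.lines() {
        if line.trim() == "---" {
            fences += 1;
        }
        // Only edit lines between the first and second `---`.
        let in_frontmatter = fences == 1 && line.trim() != "---";
        let edited = if in_frontmatter && line.trim_end() == "claimed_by:" {
            claimed = true;
            format!("claimed_by: {agent}")
        } else if in_frontmatter && line.starts_with("status:") {
            "status: in-progress".to_string()
        } else {
            line.to_string()
        };
        out.push_str(&edited);
        out.push('\n');
    }
    if !claimed {
        return Ok(false); // someone else got there first; leave the file alone
    }
    atomic_write(task, &out)?;
    Ok(true)
}
```

A caller would treat an `AlreadyExists` error from `claim` as transient and retry. One caveat of a plain lock file is that it goes stale if the holder is killed before `Drop` runs. OS advisory locks such as `flock(2)` on Unix are released automatically when the holding process exits, which is a good reason to prefer them in production.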

The retry logic is simple:

fn is_transient_error(stderr: &str) -> bool {
    let lower = stderr.to_ascii_lowercase();
    lower.contains("lock")
        || lower.contains("resource temporarily unavailable")
        || lower.contains("try again")
}

// If transient, retry with backoff. If permanent, fail.
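The backoff loop around that check could look like the sketch below. It repeats `is_transient_error` unchanged so the block stands alone. The operation is passed as a closure that returns the command's stderr on failure. The `retry_with_backoff` name, the attempt cap, and the doubling delay are illustrative choices, not Batty's.

```rust
use std::thread;
use std::time::Duration;

fn is_transient_error(stderr: &str) -> bool {
    let lower = stderr.to_ascii_lowercase();
    lower.contains("lock")
        || lower.contains("resource temporarily unavailable")
        || lower.contains("try again")
}

/// Run `op` until it succeeds, fails permanently, or attempts run out.
/// The delay doubles after each transient failure (exponential backoff).
fn retry_with_backoff<T>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: impl FnMut() -> Result<T, String>,
) -> Result<T, String> {
    let mut delay = base_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(stderr) if is_transient_error(&stderr) && attempt < max_attempts => {
                thread::sleep(delay);
                delay *= 2;
                attempt += 1;
            }
            // Permanent error, or a transient one on the last attempt.
            Err(stderr) => return Err(stderr),
        }
    }
}
```

Adding random jitter to each delay would spread out agents that collided at the same moment. It is omitted here to keep the sketch dependency-free.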

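Fifty agents picking tasks at once becomes fifty short turns on one file lock. The lock does serialize claims, but each turn is just a scan and one write, so agents wait only briefly at typical team sizes. No global state server to run, and no coordinator to keep alive.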

The Task Lifecycle

A task moves through these states:

backlog → todo → in-progress → review → done
                      ↓
                   blocked
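The diagram can be encoded as an explicit transition check. This is a sketch of one reading of it. The move from `in-progress` back to `todo` is the manual un-claim described later in this post. The arrows out of `blocked` and from `review` back to `in-progress` are my assumptions, since the diagram doesn't show them. kanban-md itself may permit any move.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Backlog,
    Todo,
    InProgress,
    Blocked,
    Review,
    Done,
}

impl Status {
    /// Parse the `status:` value used in the frontmatter.
    fn parse(s: &str) -> Option<Status> {
        match s.trim() {
            "backlog" => Some(Status::Backlog),
            "todo" => Some(Status::Todo),
            "in-progress" => Some(Status::InProgress),
            "blocked" => Some(Status::Blocked),
            "review" => Some(Status::Review),
            "done" => Some(Status::Done),
            _ => None,
        }
    }

    /// Allowed moves: the diagram's arrows plus the commented extras.
    fn can_move_to(self, next: Status) -> bool {
        use Status::*;
        matches!(
            (self, next),
            (Backlog, Todo)
                | (Todo, InProgress)
                | (InProgress, Review)
                | (Review, Done)
                | (InProgress, Blocked)
                | (InProgress, Todo)   // a human un-claims the task
                | (Blocked, InProgress) // assumed: unblocked work resumes
                | (Review, InProgress)  // assumed: review requests changes
        )
    }
}
```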

Each transition is a small edit to the YAML frontmatter: a new status value plus, at some steps, a few related fields:

# Task gets created
status: backlog
claimed_by:

# Manager prioritizes it
status: todo
claimed_by:

# Engineer claims it
status: in-progress
claimed_by: eng-1-2
claimed_at: 2026-03-22T14:30:00Z

# Tests pass, ready for review
status: review
tests_run: true
tests_passed: true
branch: eng-1-2/task-27
commit: abc1234

# Approved and merged
status: done

Every state change is a file edit. Every file edit is a git commit. You get a complete audit trail for free:

git log --oneline board/tasks/027-add-jwt-auth-endpoint.md
# a1b2c3d Set status: done, merged to main
# d4e5f6a Set status: review, tests passed
# 7b8c9d0 Set status: in-progress, claimed by eng-1-2
# e1f2a3b Created task: Add JWT auth endpoint

No event sourcing framework. No audit log database. Just git log.

What Makes This Better Than Message Passing

I ran both approaches. Here's the concrete comparison:

Token cost

Message passing: Each agent broadcasts status updates to every other agent. With 5 agents and 10 tasks, that's potentially hundreds of messages per work session. Each message consumes tokens — both to send and to process on the receiving end.

Kanban board: Each agent scans the board directory to find available tasks, then writes one file to claim. The cost of a claim grows with the number of tasks on the board, not with the number of other agents: O(1) in the agent count, per agent per task cycle.

Failure recovery

Message passing: If an agent crashes mid-conversation, the coordination state is in the dead agent's context window. Other agents don't know what happened. You need heartbeats, timeouts, and state reconstruction.

Kanban board: If an agent crashes, its claimed tasks still show status: in-progress, claimed_by: eng-1-2 in the file. Batty's daemon (covered below) detects the dead agent, unclaims the task, and another agent picks it up. The board is the state; no reconstruction needed.
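One way a daemon could release orphaned claims is sketched below, under two assumptions: it already knows which agent IDs are alive (for example, from the processes it launched), and it returns an orphaned task to `todo` with `claimed_by` cleared. That is the same edit a human would make by hand. The `Task` struct and `release_orphaned` are hypothetical names, not Batty's API.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
struct Task {
    id: u32,
    status: String,
    claimed_by: Option<String>,
}

/// Un-claim every in-progress task whose owner is not in `live_agents`.
/// Returns the IDs of the released tasks.
fn release_orphaned(tasks: &mut [Task], live_agents: &HashSet<&str>) -> Vec<u32> {
    let mut released = Vec::new();
    for task in tasks.iter_mut() {
        let orphaned = task.status == "in-progress"
            && task
                .claimed_by
                .as_deref()
                .map_or(false, |agent| !live_agents.contains(agent));
        if orphaned {
            task.status = "todo".to_string();
            task.claimed_by = None;
            released.push(task.id);
        }
    }
    released
}
```

Only `in-progress` tasks are touched here. Whether a dead agent's `review` or `blocked` tasks should also be released is a policy choice left open.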

Human intervention

Message passing: To reprioritize a task or redirect an agent, you need to inject a message into the coordination channel. You need to understand the protocol.

Kanban board: Open the file in your editor. Change priority: medium to priority: critical. Move status: in-progress back to status: todo and clear claimed_by. Save. The next agent poll picks it up. You can vim your way into any coordination decision.

Debugging

Message passing: "Why did Agent 3 work on the wrong task?" requires tracing message history, examining agent context, reconstructing the decision chain.

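Kanban board: git log -p board/tasks/ shows exactly what changed, when, and by whom, and git diff HEAD~5 board/tasks/ shows the net change over the last five commits. The state transitions are in the file history. If each agent commits under its own author name, git blame tells you which agent claimed what.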

Dependency tracking

# Task 28 depends on Task 12
id: 28
title: Add user registration endpoint
status: backlog
depends_on: [12]

The dispatch logic is straightforward:

// Only pick tasks whose dependencies are all done.
// A dependency ID that isn't on the board counts as satisfied.
.filter(|task| {
    task.depends_on.iter().all(|dep_id| {
        task_status_by_id
            .get(dep_id)
            .is_none_or(|status| status == "done")
    })
})

No dependency graph service. No topological sort at runtime. Just a list of IDs and a filter.
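Combining that filter with the pick rule from the claim steps (highest-priority unclaimed task with satisfied dependencies) gives a small pure function. The ranking `critical > high > medium > low`, the tie-break on the lower ID, and the `Task` fields are assumptions for illustration. Like the filter above, it treats a dependency that isn't on the board as satisfied.

```rust
use std::cmp::Reverse;
use std::collections::HashMap;

#[derive(Debug, Clone)]
struct Task {
    id: u32,
    status: String,
    priority: String,
    claimed_by: Option<String>,
    depends_on: Vec<u32>,
}

/// Higher number = more urgent. Unknown priorities sort last.
fn priority_rank(priority: &str) -> u8 {
    match priority {
        "critical" => 3,
        "high" => 2,
        "medium" => 1,
        _ => 0, // "low" or unrecognized
    }
}

/// Next task for an agent: open, unclaimed, dependencies done,
/// highest priority first, lowest ID breaking ties.
fn pick_next(tasks: &[Task]) -> Option<&Task> {
    let status_by_id: HashMap<u32, &str> =
        tasks.iter().map(|t| (t.id, t.status.as_str())).collect();

    tasks
        .iter()
        .filter(|t| matches!(t.status.as_str(), "backlog" | "todo"))
        .filter(|t| t.claimed_by.is_none())
        .filter(|t| {
            t.depends_on.iter().all(|dep| {
                // Missing from the board => treated as satisfied.
                status_by_id.get(dep).map_or(true, |s| *s == "done")
            })
        })
        .max_by_key(|t| (priority_rank(&t.priority), Reverse(t.id)))
}
```

A topological sort would only be needed to plan the whole order up front. Re-running this filter on every pick is enough to respect dependencies one claim at a time, though a dependency cycle would simply leave its tasks blocked forever.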

When This Breaks Down

The Markdown kanban approach has real limits. Being honest about them:

Real-time dependencies. If Agent A is building an API and Agent B needs to call it while A is still building it, the board can't help. B needs to know A's endpoint signatures before A is done. This requires actual communication — not file-based coordination.

The fix: decompose the work so that the interface definition is a separate, first task. A defines the interface, commits it, marks it done. B's task depends on A's interface task. The board handles the sequencing; the agents don't need to talk.

Very high throughput. With 50+ agents claiming tasks simultaneously, file lock contention becomes noticeable. Not unworkable — the OS handles queueing — but you'll see occasional retry delays. For most teams (3-10 agents), this is invisible.

Complex negotiation. "Agent A, can you refactor this differently so my module can use it?" requires a conversation. A kanban board doesn't do conversations. If your tasks genuinely need real-time negotiation, you need a messaging layer on top.

The honest take: About 80% of parallel coding tasks are independent enough for pure kanban coordination. The remaining 20% need something more. The mistake is building the 20% solution for 100% of your tasks.

Building It Yourself

The minimal version is embarrassingly simple:

# Create a task (anyone: human or agent)
cat > board/tasks/030-add-logout-endpoint.md << 'EOF'
---
id: 30
title: Add logout endpoint
status: todo
priority: medium
claimed_by:
---

Create POST /api/auth/logout that invalidates the JWT token.
Return 200 on success. Write tests for valid and expired tokens.
EOF

# Find unclaimed work
grep -l '^claimed_by: *$' board/tasks/*.md | head -1

# Claim a task (agent's script)
# In production, use file locking; this is the simplified version.
# `sed -i ''` is BSD/macOS syntax; with GNU sed, drop the ''.
# The anchored pattern only matches an empty claimed_by, so a taken task is left alone.
sed -i '' 's/^claimed_by: *$/claimed_by: eng-1/' board/tasks/030-*.md
sed -i '' 's/^status: todo$/status: in-progress/' board/tasks/030-*.md

# Mark done
sed -i '' 's/^status: in-progress$/status: done/' board/tasks/030-*.md

For production use, you want proper file locking. The kanban-md crate handles this:

cargo install kanban-md --locked

# Initialize a board
kanban-md init

# Create a task
kanban-md create "Add logout endpoint" \
    --body "POST /api/auth/logout, invalidate JWT, return 200" \
    --priority medium

# Claim next available task (atomic: lock → find → claim → unlock)
kanban-md pick --claim eng-1-2 --move in-progress

# Move to done
kanban-md move 30 done

And if you want the full orchestration — agents reading the board, claiming tasks, working in worktrees, test gating, merge sequencing — Batty wraps kanban-md with a daemon that runs the entire loop:

cargo install batty-cli
batty init --template simple
batty start --attach

The daemon polls the board, dispatches to available engineers, runs tests on completion, and merges on success. You watch it in tmux. The board is still just files — you can cat, grep, git diff the entire team's state at any time.

The Deeper Point

The reason a Markdown file works better than a sophisticated coordination protocol isn't that Markdown is magic. It's that most parallel coding tasks don't need coordination at all.

They need:

A list of what to do (the board)
A way to avoid duplicate work (file locking on claims)
A way to verify the work (test suite)
A way to merge the results (git)

All of these are solved problems. File locking is an OS primitive. Git merge is a git primitive. Test suites are a project primitive. The board is just a directory of Markdown files.

The complexity in most multi-agent frameworks isn't solving the coordination problem — it's solving a coordination problem that doesn't exist for the majority of tasks. Simple beats clever when the simple solution actually fits the problem.

What's your coordination approach? If you're running parallel agents, I'm curious whether you've hit the point where file-based coordination breaks down, or whether you're still fighting with message-passing overhead. The boundary between the two is genuinely interesting.

Batty is open source, built in Rust, and published on crates.io. GitHub: github.com/battysh/batty

Top comments (1)

Apex Stack • Mar 24

The N×(N-1) framing is the key insight that most multi-agent tutorials skip entirely. They show you the happy path where agents hand off cleanly, but never model what happens when coordination itself becomes the bottleneck.

I run a fleet of scheduled agents for a side project — a niche scout that finds product ideas on Wednesdays, a builder that packages them into files on Thursday, an article publisher that promotes them Tuesday/Friday, and a weekly review that reads across all of them. The coordination layer is exactly what you describe: a flat markdown pipeline tracker. Each agent reads the file, checks stage status, does its job, and writes back. No inter-agent messaging, no shared runtime, no consensus round.

The one failure mode I've hit with the markdown approach: agents writing optimistically (assuming their write succeeded) without re-reading to confirm state before moving on. If two agents run close together and both read the same "SCOUTED" status, both can try to move it to "BUILDING" simultaneously. Worth building a simple lock check (re-read the file before committing a state transition) even if it feels paranoid. The file is truth — treat reads as cheap and always verify before writing.