When an AI agent does something wrong — and it will — you need to answer two questions fast: what happened, and why?
If your agent state lives in a database, the answer requires a SQL client, the right query, and knowledge of the schema. If it lives in an API, you need auth tokens, endpoint documentation, and a way to correlate events across services.
If it lives in files, the answer is ls and cat.
The Debugging Tax
Every layer of abstraction between you and the agent's state is a debugging tax. Each layer adds latency to your investigation:
| Architecture | To see what happened | Time to first insight |
|---|---|---|
| Database (SQLite/Postgres) | Open client, write query, parse results | 2-5 minutes |
| API-based state | Authenticate, find endpoint, decode response | 3-10 minutes |
| File-based state | ls .batty/inboxes/eng-1-1/new/ | 5 seconds |
At 2am when an agent has been looping for an hour, those minutes matter. File-based state gives you instant visibility with tools you already know.
What File-Based Looks Like
Batty stores every piece of agent state as a file:
```text
.batty/
  team_config/
    team.yaml          # Who does what, who talks to whom
    prompts/           # Per-role instruction files
  kanban/
    board/tasks/       # Each task is a Markdown file
  inboxes/
    eng-1-1/
      new/             # Undelivered messages
      cur/             # Delivered messages
      tmp/             # Atomic write staging
    architect/
      new/
      cur/
  worktrees/
    eng-1-1/           # Full git worktree per engineer
  logs/
    events.jsonl       # Every event, one JSON object per line
```
No database. No hidden state. Every piece of the system is a file you can read with standard Unix tools.
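Because the whole layout is just directories, a fresh `.batty/` can be created with `mkdir -p`, which is handy for tests and scratch runs. This is a minimal sketch: `init_layout` is a hypothetical helper rather than a Batty command, and it assumes `kanban/` sits directly under `.batty/`. Every inbox gets all three Maildir subdirectories, even though the tree lists only `new/` and `cur/` for `architect`.

```bash
# init_layout ROOT MEMBER...
# Hypothetical helper that builds the .batty/ skeleton described above.
init_layout() {
  root=$1
  shift
  mkdir -p "$root/team_config/prompts" "$root/kanban/board/tasks" \
    "$root/worktrees" "$root/logs" || return 1
  for member in "$@"; do
    # Maildir needs tmp/ (staging), new/ (undelivered) and cur/ (delivered).
    mkdir -p "$root/inboxes/$member/tmp" "$root/inboxes/$member/new" \
      "$root/inboxes/$member/cur" || return 1
  done
  # Create the event log if it is missing; >> never truncates an existing log.
  : >> "$root/logs/events.jsonl"
}
```

For example, `init_layout .batty eng-1-1 architect` creates the skeleton; the worktrees themselves would still come from `git worktree add`.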
Four Formats, Four Purposes
YAML Config — Who Does What
```yaml
roles:
  - name: engineer
    agent: claude
    instances: 3
    talks_to: [manager]
    use_worktrees: true
```
YAML is human-readable configuration. You edit it in your editor, validate it at startup, and git diff it to see what changed. Configuration doesn't change during a session — it's the rules, not the state.
Markdown Kanban — What's Happening
Each task is a Markdown file with YAML frontmatter:
```markdown
---
id: 27
status: in-progress
assigned_to: eng-1-1
---
# Add JWT authentication
Implement JWT middleware for protected routes.
```
Want to see all in-progress tasks? grep -l "status: in-progress" board/tasks/*.md. Want to see what changed? git diff board/. Want to edit a task while the daemon is running? Open the file in vim.
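The `grep -l` one-liner is quick, but it also matches a task whose body happens to contain the text `status: in-progress`. A slightly stricter sketch reads the status from the frontmatter only. The names `task_status` and `tasks_with_status` are hypothetical, and the parsing assumes the simple `key: value` frontmatter shown above rather than full YAML.

```bash
# task_status FILE: print the value of `status:` from the YAML frontmatter only.
task_status() {
  awk '
    NR == 1 && $0 == "---" { in_fm = 1; next }   # opening delimiter
    in_fm && $0 == "---"   { exit }              # closing delimiter: stop reading
    in_fm && /^status:/    { sub(/^status:[ \t]*/, ""); print; exit }
  ' "$1"
}

# tasks_with_status STATUS DIR: list task files whose frontmatter status matches.
tasks_with_status() {
  for f in "$2"/*.md; do
    [ -e "$f" ] || continue    # the glob matched nothing
    if [ "$(task_status "$f")" = "$1" ]; then
      printf '%s\n' "$f"
    fi
  done
}
```

Running `tasks_with_status in-progress board/tasks` lists the same files as the `grep` version, minus false positives from task bodies.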
Maildir Inboxes — Who Said What
Messages between agents use the Maildir protocol — the same format email servers have used since 1995:
```text
inboxes/eng-1-1/new/ → Messages waiting to be delivered
inboxes/eng-1-1/cur/ → Messages already delivered
inboxes/eng-1-1/tmp/ → Messages being written (atomic staging)
```
Each message is a JSON file: sender, recipient, body, timestamp. Delivery is atomic — write to tmp/, rename to new/. No partial writes, no corruption, no WAL.
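The write-then-rename step fits in a few lines of shell. The `deliver` helper and its file naming (a Unix timestamp plus a random suffix from `mktemp`) are assumptions for illustration, not Batty's actual implementation. The guarantee rests on `mv` within one filesystem being a single rename(2) call, which is why `tmp/` lives inside the same mailbox as `new/`.

```bash
# deliver INBOX < MESSAGE
# Hypothetical helper: stage the message in tmp/, then rename it into new/.
deliver() {
  inbox=$1    # e.g. .batty/inboxes/eng-1-1
  # 1. Create a uniquely named file in tmp/; readers never scan tmp/.
  staged=$(mktemp "$inbox/tmp/$(date +%s).XXXXXX") || return 1
  # 2. Write the whole message. A crash here leaves only a stray tmp/ file.
  if ! cat > "$staged"; then
    rm -f "$staged"
    return 1
  fi
  # 3. Publish it. mv within one filesystem is a single rename(2), so a
  #    reader of new/ sees either no file or the complete message.
  dest="$inbox/new/$(basename "$staged").msg"
  mv "$staged" "$dest" || return 1
  printf '%s\n' "$dest"
}
```

Usage: `printf '%s\n' '{"from":"architect","body":"Start task 27"}' | deliver .batty/inboxes/eng-1-1` prints the path of the newly published message.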
Debugging a delivery failure:
```bash
# What messages are stuck?
ls inboxes/eng-1-1/new/
# What does the stuck message say?
cat inboxes/eng-1-1/new/1711108200.msg
# Who sent it?
jq .from inboxes/eng-1-1/new/1711108200.msg
```
Compare this to debugging a message queue: connect to the broker, navigate the admin UI, find the right queue, decode the message format. With Maildir, it's cat.
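The same check scales to every inbox at once: a message that has sat in `new/` for more than a few minutes is the first suspect. This sketch relies on `find -mmin`, which GNU and BSD `find` support but POSIX does not require; `stuck_messages` and its five-minute default are hypothetical.

```bash
# stuck_messages INBOXES_DIR [MINUTES]
# List undelivered messages (anything under a new/ directory) whose
# modification time is more than MINUTES (default 5) in the past.
stuck_messages() {
  find "$1" -path '*/new/*' -type f -mmin +"${2:-5}"
}
```

For example, `stuck_messages .batty/inboxes 10` prints one path per message that has waited longer than ten minutes, ready to pipe into `cat` or `jq`.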
JSONL Logs — What Happened When
Every significant event is appended to events.jsonl:
```json
{"ts":1711108200,"event":"task_assigned","engineer":"eng-1-1","task_id":27}
{"ts":1711108890,"event":"test_executed","task_id":27,"passed":false}
{"ts":1711108950,"event":"message_delivered","from":"batty","to":"eng-1-1"}
{"ts":1711109400,"event":"test_executed","task_id":27,"passed":true}
{"ts":1711109405,"event":"merge","source":"eng-1-1/task-27","target":"main"}
```
One JSON object per line. Append-only. grep-able. jq-able.
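```bash
# Which tasks failed tests?
jq 'select(.event == "test_executed" and .passed == false)' events.jsonl
# Average seconds from assignment to completion (assumes task_completed events carry task_id)
jq -s '(map(select(.event == "task_assigned")) | INDEX(.task_id)) as $a
  | [.[] | select(.event == "task_completed") | .ts - $a[.task_id | tostring].ts]
  | if length > 0 then add / length else null end' events.jsonl
# Which engineer fails tests most often? test_executed has no engineer field, so join on task_id
jq -r -s '(map(select(.event == "task_assigned")) | INDEX(.task_id)) as $a
  | .[] | select(.event == "test_executed" and .passed == false)
  | $a[.task_id | tostring].engineer' events.jsonl | sort | uniq -c | sort -rn
```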
No Grafana dashboard. No log aggregation service. Just jq.
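On the write side, appending one line is the only operation the log needs. A minimal sketch of an emitter, assuming callers pass values that are already valid JSON (numbers, `true`/`false`, or strings with their own quotes); `emit_event` is a hypothetical name, and real code should build the line with a JSON encoder rather than string concatenation.

```bash
# emit_event LOGFILE EVENT [KEY=JSON_VALUE]...
# Hypothetical helper. Values are inserted verbatim, so strings must carry
# their own quotes, e.g. engineer='"eng-1-1"'. No escaping is performed.
emit_event() {
  log=$1
  event=$2
  shift 2
  line="{\"ts\":$(date +%s),\"event\":\"$event\""
  for kv in "$@"; do
    line="$line,\"${kv%%=*}\":${kv#*=}"   # key before the first '=', value after
  done
  # >> opens the file in append mode: earlier lines are never rewritten.
  printf '%s}\n' "$line" >> "$log"
}
```

For example, `emit_event .batty/logs/events.jsonl test_executed task_id=27 passed=false` appends a line with the same shape as the `test_executed` entries in the sample log.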
Why Not a Database?
SQLite would work. It's fast, embedded, and well-understood. But it adds three problems:
- Opaque state. You can't `cat` a SQLite database. You need a client and a query. When something breaks, the first step is figuring out how to inspect the state — not inspecting it.
- Merge complexity. Git can't meaningfully diff a binary database file. With file-based state, `git diff` shows you exactly what changed between two points in time.
- Recovery complexity. If the daemon crashes mid-write, a database might need WAL recovery. With Maildir, the atomic rename protocol means messages are either fully written or not written at all. No recovery logic needed.
The tradeoff: files don't scale to millions of records. But an agent supervisor manages 5-20 agents with 50-100 tasks. At that scale, files are faster to inspect and equally fast to read.
The Compound Effect
When everything is a file:
- Backups are `cp -r .batty/ /backup/`
- Version control is `git add .batty/ && git commit`
- Debugging is `ls` and `cat`
- Monitoring is `watch ls inboxes/*/new/`
- Migration is copying a directory
- Testing is seeding files and checking results
No client libraries. No connection strings. No schema migrations. No ORM. The filesystem is the API.
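To make the testing point concrete: a test needs no fixtures beyond files. The sketch below seeds two tasks in a scratch directory and asserts on a query. `seed_task`, `count_tasks`, and the `todo` status are hypothetical; the file format follows the task example earlier in this post.

```bash
# seed_task DIR ID STATUS ASSIGNEE: write one task file with YAML frontmatter.
seed_task() {
  mkdir -p "$1"
  printf '%s\n' '---' "id: $2" "status: $3" "assigned_to: $4" '---' \
    "# Task $2" > "$1/$2.md"
}

# count_tasks DIR STATUS: how many task files have exactly this status line.
count_tasks() {
  grep -l "^status: $2\$" "$1"/*.md 2>/dev/null | wc -l | tr -d ' '
}

# A test is: seed a scratch board, run the query, compare.
board=$(mktemp -d)
seed_task "$board" 27 in-progress eng-1-1
seed_task "$board" 28 todo eng-1-2
if [ "$(count_tasks "$board" in-progress)" = 1 ]; then
  echo "PASS"   # prints PASS: only task 27 is in progress
else
  echo "FAIL"
fi
rm -rf "$board"
```

No database to start and no schema to migrate: cleanup is deleting a directory.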
When Files Don't Work
File-based architecture has real limitations:
- Concurrent writes from multiple machines — files assume a single host. For distributed agents, you need a coordination layer.
- Complex queries — "show me all tasks assigned to eng-1-1 that failed tests in the last hour" is easier in SQL than with `grep` and `jq`.
- High-volume events — JSONL works for hundreds of events per session. For millions, you need a proper time-series database.
For a single-host agent supervisor managing 5-20 agents? Files are the right abstraction. They're not clever. They're debuggable.