Batty

Posted on Apr 5

How File-Based Architecture Makes AI Agents Debuggable

#ai #architecture #devtools #tutorial

When an AI agent does something wrong — and it will — you need to answer two questions fast: what happened, and why?

If your agent state lives in a database, the answer requires a SQL client, the right query, and knowledge of the schema. If it lives in an API, you need auth tokens, endpoint documentation, and a way to correlate events across services.

If it lives in files, the answer is ls and cat.

The Debugging Tax

Every layer of abstraction between you and the agent's state is a debugging tax. Each layer adds latency to your investigation:

| Architecture | To see what happened | Time to first insight |
| --- | --- | --- |
| Database (SQLite/Postgres) | Open client, write query, parse results | 2-5 minutes |
| API-based state | Authenticate, find endpoint, decode response | 3-10 minutes |
| File-based state | `ls .batty/inboxes/eng-1-1/new/` | 5 seconds |

At 2am when an agent has been looping for an hour, those minutes matter. File-based state gives you instant visibility with tools you already know.

What File-Based Looks Like

Batty stores every piece of agent state as a file:

.batty/
  team_config/
    team.yaml              # Who does what, who talks to whom
    prompts/               # Per-role instruction files
  kanban/
    board/tasks/           # Each task is a Markdown file
  inboxes/
    eng-1-1/
      new/                 # Undelivered messages
      cur/                 # Delivered messages
      tmp/                 # Atomic write staging
    architect/
      new/
      cur/
  worktrees/
    eng-1-1/               # Full git worktree per engineer
  logs/
    events.jsonl           # Every event, one JSON object per line

No database. No hidden state. Every piece of the system is a file you can read with standard Unix tools.

Four Formats, Four Purposes

YAML Config — Who Does What

roles:
  - name: engineer
    agent: claude
    instances: 3
    talks_to: [manager]
    use_worktrees: true

YAML is human-readable configuration. You edit it in your editor, validate it at startup, and git diff it to see what changed. Configuration doesn't change during a session — it's the rules, not the state.

Markdown Kanban — What's Happening

Each task is a Markdown file with YAML frontmatter:

---
id: 27
status: in-progress
assigned_to: eng-1-1
---
# Add JWT authentication
Implement JWT middleware for protected routes.

Want to see all in-progress tasks? From .batty/kanban/, run grep -l "status: in-progress" board/tasks/*.md. Want to see what changed? git diff board/. Want to edit a task while the daemon is running? Open the file in vim.
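The same lookup can be done programmatically. The following is a minimal Python sketch, not Batty's own code: it assumes the frontmatter is a flat block of `key: value` lines like the example above, and the function names `read_frontmatter` and `tasks_with_status` are hypothetical.

```python
from pathlib import Path


def read_frontmatter(path: Path) -> dict[str, str]:
    """Parse the flat `key: value` frontmatter block at the top of a task file."""
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no frontmatter at all
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter ends the frontmatter
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def tasks_with_status(tasks_dir: Path, status: str) -> list[Path]:
    """Return task files whose frontmatter `status` matches, sorted by name."""
    return sorted(
        p for p in tasks_dir.glob("*.md")
        if read_frontmatter(p).get("status") == status
    )
```

Calling `tasks_with_status(Path(".batty/kanban/board/tasks"), "in-progress")` would return the same files as the grep above, with the advantage that a match in the task body (rather than the frontmatter) is ignored.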

Maildir Inboxes — Who Said What

Messages between agents use the Maildir format, the same on-disk layout mail servers have used since the mid-1990s:

inboxes/eng-1-1/new/   → Messages waiting to be delivered
inboxes/eng-1-1/cur/   → Messages already delivered
inboxes/eng-1-1/tmp/   → Messages being written (atomic staging)

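Each message is a JSON file: sender, recipient, body, timestamp. Delivery is atomic: write the file to tmp/, then rename it into new/. Because a rename within one filesystem is atomic, a reader sees either the complete message or nothing. No partial writes, no WAL.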
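The write-then-rename step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Batty's implementation: the function names `deliver` and `mark_delivered` and the file naming scheme (timestamp plus random suffix) are hypothetical.

```python
import json
import os
import time
import uuid
from pathlib import Path


def deliver(inbox: Path, message: dict) -> Path:
    """Write a message into tmp/, then rename it into new/."""
    tmp_dir, new_dir = inbox / "tmp", inbox / "new"
    for directory in (tmp_dir, new_dir, inbox / "cur"):
        directory.mkdir(parents=True, exist_ok=True)

    # Hypothetical naming scheme: timestamp plus a random suffix to avoid collisions.
    name = f"{int(time.time())}.{uuid.uuid4().hex}.msg"
    tmp_path = tmp_dir / name

    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(message, f)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before publishing the file

    final_path = new_dir / name
    os.rename(tmp_path, final_path)  # atomic when tmp/ and new/ share a filesystem
    return final_path


def mark_delivered(inbox: Path, name: str) -> Path:
    """Move a message from new/ to cur/ once it has been handed to the agent."""
    dest = inbox / "cur" / name
    os.rename(inbox / "new" / name, dest)
    return dest
```

A reader that only lists new/ never observes a half-written file, because the file does not appear there until the rename completes.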

Debugging a delivery failure:

# What messages are stuck?
ls inboxes/eng-1-1/new/

# What does the stuck message say?
cat inboxes/eng-1-1/new/1711108200.msg

# Who sent it?
jq .from inboxes/eng-1-1/new/1711108200.msg

Compare this to debugging a message queue: connect to the broker, navigate the admin UI, find the right queue, decode the message format. With Maildir, it's cat.

JSONL Logs — What Happened When

Every significant event is appended to events.jsonl:

{"ts":1711108200,"event":"task_assigned","engineer":"eng-1-1","task_id":27}
{"ts":1711108890,"event":"test_executed","task_id":27,"passed":false}
{"ts":1711108950,"event":"message_delivered","from":"batty","to":"eng-1-1"}
{"ts":1711109400,"event":"test_executed","task_id":27,"passed":true}
{"ts":1711109405,"event":"merge","source":"eng-1-1/task-27","target":"main"}

One JSON object per line. Append-only. grep-able. jq-able.

# Which tasks failed tests?
jq 'select(.event == "test_executed" and .passed == false)' events.jsonl

# Raw events for measuring time from assignment to completion
# (subtract the two ts values per task_id to get durations)
jq 'select(.event == "task_assigned" or .event == "task_completed")' events.jsonl

# Which tasks fail tests most often?
# (test_executed events carry task_id; map it to an engineer via task_assigned)
jq -r 'select(.event == "test_executed" and .passed == false) | .task_id' events.jsonl | sort | uniq -c | sort -rn

No Grafana dashboard. No log aggregation service. Just jq.
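When a question needs arithmetic across events, such as the average time from assignment to completion, a short script over the same file is enough. The sketch below assumes a `task_completed` event that carries `ts` and `task_id` like the other events; that event is named in the query above but its exact fields are an assumption, and the function name is hypothetical.

```python
import json
from statistics import mean


def average_completion_seconds(lines):
    """Average seconds between task_assigned and task_completed, per task_id."""
    assigned_at = {}  # task_id -> timestamp of the most recent assignment
    durations = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        event = json.loads(line)
        kind = event.get("event")
        task_id = event.get("task_id")
        if kind == "task_assigned":
            assigned_at[task_id] = event["ts"]
        elif kind == "task_completed" and task_id in assigned_at:
            durations.append(event["ts"] - assigned_at.pop(task_id))
    return mean(durations) if durations else None


if __name__ == "__main__":
    with open("events.jsonl", encoding="utf-8") as f:
        print(average_completion_seconds(f))
```

Because the log is append-only and ordered by time, a single pass with a small dictionary is sufficient; tasks that were assigned but never completed are simply left out of the average.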

Why Not a Database?

SQLite would work. It's fast, embedded, and well-understood. But it adds three problems:

Opaque state. You can't cat a SQLite database. You need a client and a query. When something breaks, the first step is figuring out how to inspect the state — not inspecting it.
Merge complexity. Git can't meaningfully diff a binary database file. With file-based state, git diff shows you exactly what changed between two points in time.
Recovery complexity. If the daemon crashes mid-write, a database relies on journal or WAL recovery the next time it is opened. With Maildir, the write-then-rename step means a message is either fully visible in new/ or not visible at all. The worst case is a stale file left in tmp/, which readers never look at.

The tradeoff: files don't scale to millions of records. But an agent supervisor manages 5-20 agents with 50-100 tasks. At that scale, files are faster to inspect and equally fast to read.

The Compound Effect

When everything is a file:

Backups are cp -r .batty/ /backup/
Version control is git add .batty/ && git commit
Debugging is ls and cat
Monitoring is watch ls inboxes/*/new/
Migration is copying a directory
Testing is seeding files and checking results
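The testing point is worth making concrete. A test seeds a temporary directory with the files it needs, runs the code, and inspects the result, with no fixtures or mock servers. The sketch below is hypothetical throughout: `pending_messages` stands in for whatever logic is under test, and `seed_inbox` is a helper invented for this example.

```python
import json
import tempfile
from pathlib import Path


def pending_messages(root: Path) -> dict[str, int]:
    """Count undelivered messages per inbox (files in inboxes/<name>/new/)."""
    counts: dict[str, int] = {}
    for new_dir in sorted((root / "inboxes").glob("*/new")):
        counts[new_dir.parent.name] = sum(1 for p in new_dir.iterdir() if p.is_file())
    return counts


def seed_inbox(root: Path, agent: str, messages: list[dict]) -> None:
    """Test helper: create an inbox and drop JSON messages into its new/ directory."""
    new_dir = root / "inboxes" / agent / "new"
    new_dir.mkdir(parents=True, exist_ok=True)
    for i, message in enumerate(messages):
        (new_dir / f"{i}.msg").write_text(json.dumps(message), encoding="utf-8")


def test_pending_messages() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / ".batty"
        seed_inbox(root, "eng-1-1", [{"from": "manager", "body": "status?"}])
        seed_inbox(root, "architect", [])
        assert pending_messages(root) == {"architect": 0, "eng-1-1": 1}


if __name__ == "__main__":
    test_pending_messages()
    print("ok")
```

The whole test state is a directory tree, so a failing test can be debugged the same way as production: by listing and reading the files it left behind.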

No client libraries. No connection strings. No schema migrations. No ORM. The filesystem is the API.

When Files Don't Work

File-based architecture has real limitations:

Concurrent writes from multiple machines — files assume a single host. For distributed agents, you need a coordination layer.
Complex queries — "show me all tasks assigned to eng-1-1 that failed tests in the last hour" is easier in SQL than with grep and jq.
High-volume events — JSONL works for hundreds of events per session. For millions, you need a proper time-series database.
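To make the complex-query limitation concrete: in the sample log shown earlier, `test_executed` events carry a `task_id` but no engineer, so "tasks assigned to eng-1-1 that failed tests in the last hour" needs a join against `task_assigned`. A minimal Python sketch of that join follows, assuming the field names from the sample events; the function name is hypothetical.

```python
import json


def failed_tasks_for(lines, engineer, now, window_seconds=3600):
    """Task ids assigned to `engineer` with a failed test in the last window."""
    owner = {}      # task_id -> engineer, taken from task_assigned events
    failed = set()  # task ids with a recent failure owned by `engineer`
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("event") == "task_assigned":
            owner[event["task_id"]] = event["engineer"]
        elif (
            event.get("event") == "test_executed"
            and event.get("passed") is False
            and 0 <= now - event["ts"] <= window_seconds
            and owner.get(event["task_id"]) == engineer
        ):
            failed.add(event["task_id"])
    return sorted(failed)
```

This relies on the log being in time order, so an assignment is always seen before the tests that follow it. In SQL the same question is one join and a WHERE clause, which is the tradeoff this section describes.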

For a single-host agent supervisor managing 5-20 agents? Files are the right abstraction. They're not clever. They're debuggable.

Try it: cargo install batty-cli — GitHub | Demo
