Helge Sverre

Posted on Mar 2 • Originally published at helgesver.re on Feb 24

Building Token: A Rust Text Editor with AI Agents

#agents #ai #rust #softwaredevelopment

Token is a text editor written in Rust. Multi-cursor editing, tree-sitter syntax highlighting across 20 languages, split views, CSV spreadsheet mode, configurable keybindings, docked panels with markdown preview: over 40,000 lines of code across 521 commits. Most of it was written through 170+ conversations with Amp Code agents over three months.

This isn't about the editor. It's about the framework that made sustained AI collaboration work on a project too complex for any single context window.

Why Text Editors

Text editors look simple — display text, handle keystrokes — but hide real engineering problems. Cursor choreography with selections. Grapheme cluster boundaries where é might be one or two code points. Keyboard modifier edge cases across platforms. Viewport scrolling that needs to feel instantaneous. HiDPI display switching. Five different text input contexts (main editor, command palette, go-to-line, find/replace, CSV cells) that all need cursor navigation, selection, and clipboard support.
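The é case can be reproduced with the standard library alone. This is a small sketch, not code from Token; the constant and function names are made up for illustration. It shows the same visible character stored as one or as two Unicode scalar values, which is why cursor movement cannot simply step one `char` at a time.

```rust
/// Precomposed form: U+00E9 LATIN SMALL LETTER E WITH ACUTE.
pub const E_ACUTE_PRECOMPOSED: &str = "\u{00E9}";

/// Decomposed form: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
pub const E_ACUTE_DECOMPOSED: &str = "e\u{0301}";

/// Count Unicode scalar values (what Rust's `char` represents) in a string.
pub fn scalar_count(s: &str) -> usize {
    s.chars().count()
}

/// Count UTF-8 bytes, which is what `str::len` reports.
pub fn byte_count(s: &str) -> usize {
    s.len()
}
```

Both strings render as a single é, yet they differ in scalar count, byte length, and equality. Segmenting by grapheme cluster needs Unicode segmentation rules that the standard library does not provide.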

They're a good stress test for AI agent workflows because the complexity is interaction complexity, not algorithmic complexity. There's no single hard problem — there are hundreds of easy problems that all interact. Getting multi-cursor selection to work correctly while scrolling in a split view with tree-sitter highlighting active requires consistency across many subsystems. That consistency breaks when dozens of AI sessions each make changes without shared context.

The question: can you build something this interconnected primarily through AI agents, if you provide enough structure?

After three months and 170+ threads, the answer is yes — but the structure matters more than the prompting.

Three Work Modes

Not a taxonomy I invented upfront. It emerged from noticing which sessions went well and which spiraled.

Mode	Purpose	Inputs	Example
Build	New behavior that didn't exist	Feature spec, reference docs	"Implement split view (Phase 3)"
Improve	Better architecture without changing behavior	Organization docs, roadmap	"Extract modules from main.rs"
Sweep	Fix a cluster of related bugs	Bug tracker, gap doc	"Multi-cursor selection bugs"

Build sessions have the highest information density. You hand the agent a specification — data structures, invariants, keyboard shortcuts, message types — and ask it to make it exist. The spec does most of the communicating.

Improve sessions are the trickiest. You're asking an agent to restructure code without breaking it, which requires understanding both the current architecture and the target. Tests are your safety net. If you don't have good coverage before an Improve session, stop and write tests first.

Sweep sessions leverage AI's strongest capability: apply this pattern everywhere. You give the agent a bug, explain the fix, and ask it to find every other place the same bug exists. Agents are tireless at this. Humans miss the 14th instance.

The critical rule: don't mix modes in a single session. A Build session that turns into "also fix these bugs I noticed" produces messy patches that are hard to review. Note the bug, start a new thread.

Documentation as Interface

The real insight from building Token: documentation isn't for humans reading later. It's the API between you and your agents. Every session starts with the agent reading context documents. If those documents are vague, the output is vague. If they're precise, the output is precise.

Three types of documents drive the work:

Reference Documentation

A source of truth for cross-cutting concerns. EDITOR_UI_REFERENCE.md defines the "physics" of the editor: viewport math, coordinate systems, cursor behavior, scrolling semantics, how pixel positions map to text positions.

This document exists because without it, every agent session independently invents its own coordinate system. One session puts the origin at the top-left of the window. Another puts it at the top-left of the editor area, after the sidebar. A third accounts for the tab bar height, a fourth doesn't. You end up with code that works in each session's test case but breaks when features interact.

Before implementation, the Oracle reviewed this document and found 15+ issues: off-by-one errors in viewport calculations, division-by-zero edge cases in scrollbar thumb computations, preferredColumn documented as a column index but implemented as a pixel X value. Each would have been 1-3 hours of debugging later. The review cost minutes.
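To make the division-by-zero class of issue concrete, here is a hedged sketch of a scrollbar thumb computation. It is not Token's implementation: the function name, parameters, and the proportional-thumb formula are assumptions chosen for illustration. The point is that the guard has to exist before either division runs.

```rust
/// Returns `(thumb_offset_px, thumb_height_px)` for a vertical scrollbar,
/// or `None` when there is nothing to scroll.
/// Hypothetical signature; Token's real API is not shown in this post.
pub fn scrollbar_thumb(
    total_lines: usize,
    visible_lines: usize,
    top_line: usize,
    track_px: f32,
) -> Option<(f32, f32)> {
    // An empty document, or one that fits in the viewport, has no thumb.
    // This also guarantees both divisors below are non-zero.
    if total_lines == 0 || visible_lines >= total_lines {
        return None;
    }

    // Thumb height is proportional to the visible fraction of the document.
    let thumb_height = track_px * visible_lines as f32 / total_lines as f32;

    // Last valid top line; strictly positive because of the guard above.
    let max_top = total_lines - visible_lines;

    // Clamp so an out-of-range scroll position cannot push the thumb
    // past the end of the track.
    let top = top_line.min(max_top);
    let offset = (track_px - thumb_height) * top as f32 / max_top as f32;

    Some((offset, thumb_height))
}
```

Returning `Option` instead of a zero-sized thumb forces each call site to decide what "no scrollbar" means, rather than drawing with NaN or infinite coordinates.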

Feature Specifications

Written before implementation. SELECTION_MULTICURSOR.md defined data structures, invariants, keyboard shortcuts, message enums, and a phased implementation plan, all before any code was written.

The key is specificity. Not "add multi-cursor support" but:

// MUST maintain: cursors.len() == selections.len()
// MUST maintain: cursors[i].to_position() == selections[i].head

These invariants became the spec. Every agent session that touched cursor code could check its work against them. When a sweep found that Cmd+Shift+K (delete line) wasn't deduplicating cursors after the deletion, the invariant told the agent what "correct" looked like.
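The two invariants can also be written as a check that runs in tests or debug builds. The sketch below is an assumption about shape, not Token's code: `Position`, `Selection`, `EditorState`, and `check_invariants` are hypothetical names, and only `cursors`, `selections`, `to_position()`, and `head` come from the spec lines above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Position {
    pub line: usize,
    pub column: usize,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Cursor {
    pub line: usize,
    pub column: usize,
}

impl Cursor {
    pub fn to_position(&self) -> Position {
        Position { line: self.line, column: self.column }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Selection {
    pub anchor: Position,
    pub head: Position,
}

pub struct EditorState {
    pub cursors: Vec<Cursor>,
    pub selections: Vec<Selection>,
}

impl EditorState {
    /// Checks both invariants and reports the first violation found.
    pub fn check_invariants(&self) -> Result<(), String> {
        // MUST maintain: cursors.len() == selections.len()
        if self.cursors.len() != self.selections.len() {
            return Err(format!(
                "{} cursors but {} selections",
                self.cursors.len(),
                self.selections.len()
            ));
        }
        // MUST maintain: cursors[i].to_position() == selections[i].head
        for (i, (cursor, selection)) in
            self.cursors.iter().zip(&self.selections).enumerate()
        {
            if cursor.to_position() != selection.head {
                return Err(format!("cursor {} is not at its selection head", i));
            }
        }
        Ok(())
    }
}
```

Calling a check like this at the end of every editing test gives an agent a pass/fail signal that does not depend on anyone re-explaining the rule.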

Gap Documents

For features at 60-90% completion, the dangerous zone where a feature mostly works and the remaining bugs are scattered and hard to articulate. MULTI_CURSOR_SELECTION_GAPS.md listed what was implemented vs. missing, design decisions needed, and success criteria for each gap.

This turns "multi-cursor is mostly working" into a concrete checklist that an agent can pick up cold and work through item by item. Without gap docs, you spend the first half of every session re-explaining what's already done and what's broken.

Agent Configuration

AGENTS.md tells agents how to work in your codebase: build commands, architecture, conventions. Specifying make test instead of letting agents invent cargo test --all-features --no-fail-fast eliminates entire categories of friction. Specifying the Elm Architecture pattern (Message → Update → Command → Render) means agents add features using the existing architecture instead of inventing their own.
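For readers unfamiliar with the pattern, the Message → Update → Command → Render loop can be sketched in a few lines. This is a generic illustration under assumed names (`Msg`, `Cmd`, `Model`, `update`, `render`); Token's actual message and command types are not shown in this post.

```rust
/// Something that happened: a keystroke, a menu action, and so on.
#[derive(Debug, Clone, PartialEq)]
pub enum Msg {
    InsertChar(char),
    Save,
}

/// A side effect requested by `update` and carried out by the runtime.
#[derive(Debug, Clone, PartialEq)]
pub enum Cmd {
    Redraw,
    WriteFile(String),
}

/// All application state lives in one place.
#[derive(Debug, Default)]
pub struct Model {
    pub text: String,
    pub dirty: bool,
}

/// The only function allowed to mutate the model.
pub fn update(model: &mut Model, msg: Msg) -> Option<Cmd> {
    match msg {
        Msg::InsertChar(c) => {
            model.text.push(c);
            model.dirty = true;
            Some(Cmd::Redraw)
        }
        Msg::Save => {
            model.dirty = false;
            Some(Cmd::WriteFile(model.text.clone()))
        }
    }
}

/// Rendering reads the model and never changes it.
/// A trailing `*` marks unsaved changes.
pub fn render(model: &Model) -> String {
    let marker = if model.dirty { "*" } else { "" };
    format!("{}{}", model.text, marker)
}
```

With this shape written down, "add a feature" has a fixed meaning for an agent: add a message variant, handle it in `update`, and return a command if a side effect is needed.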

Token's AGENTS.md grew from a few build commands to a comprehensive architecture reference — module descriptions, the message/command pattern, file organization, release procedures. It's the cheapest investment with the highest return. Every session starts by reading it.

Case Study: Multi-Cursor

Adding multi-cursor to a single-cursor editor touches nearly every file. Every movement handler, every editing operation, every selection check. The wrong approach is doing it all at once. The right approach is to lie to the codebase.

Migration helpers:

impl AppModel {
    pub fn cursor(&self) -> &Cursor { &self.editor.cursors[0] }
}

This accessor lets all existing code keep working unchanged while the underlying data structure switches from a single cursor to a Vec<Cursor>. Old code calls .cursor() and gets cursors[0]. New code uses explicit indexing. Call sites migrate incrementally across sessions.

Phased implementation:

Phase 0: Per-cursor primitives (move_cursor_left_at(idx))
Phase 1: All-cursor wrappers (move_all_cursors_left())
Phase 2-4: Update handlers, add tests
Phase 5: Bug sweep

The issue was straightforward: all cursor movement handlers used .cursor_mut() which only returned cursors[0]. The fix was adding per-index primitives, then wrapping them in all-cursor helpers that call deduplicate_cursors() after each movement.
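A minimal sketch of the Phase 0 and Phase 1 layering, assuming a simplified `Cursor` with only a line and column. The function names follow the phase list above, but the bodies are illustrative: line wrapping and selection updates are omitted, and sorting before deduplication is an assumption about how `deduplicate_cursors()` might work, not a description of Token's version.

```rust
// Field order matters: the derived ordering compares `line`, then `column`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Cursor {
    pub line: usize,
    pub column: usize,
}

pub struct EditorState {
    pub cursors: Vec<Cursor>,
}

impl EditorState {
    /// Phase 0: per-cursor primitive. Moves one cursor left and stops at
    /// column 0 (wrapping to the previous line is left out of this sketch).
    pub fn move_cursor_left_at(&mut self, idx: usize) {
        let cursor = &mut self.cursors[idx];
        cursor.column = cursor.column.saturating_sub(1);
    }

    /// Phase 1: all-cursor wrapper. Applies the primitive to every cursor,
    /// then merges any cursors that ended up on the same position.
    pub fn move_all_cursors_left(&mut self) {
        for idx in 0..self.cursors.len() {
            self.move_cursor_left_at(idx);
        }
        self.deduplicate_cursors();
    }

    /// Sorts cursors in document order and drops exact duplicates.
    pub fn deduplicate_cursors(&mut self) {
        self.cursors.sort();
        self.cursors.dedup();
    }
}
```

Two cursors at columns 0 and 1 on the same line collapse into one after a left move, which is exactly the case a single-cursor handler never had to consider.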

Threads: T-d4c75d42, T-6c1b5841, T-e751be48

Case Study: Split View

Split view was implemented across 7 phases in a single thread (T-29b1dd08):

Phase	Description
1	Core data structures: ID types, EditorArea, Tab, EditorGroup, LayoutNode
2	Layout system: `compute_layout()`, `group_at_point()`, splitter hit testing
3	Update AppModel: Replace Document/EditorState with EditorArea, add accessors
4	Messages: LayoutMsg enum, split/close/focus operations, 17 tests
5	Rendering: Multi-group rendering, tab bars, splitters, focus indicators
6	Document sync: Shared document architecture (edits affect all views)
7	Keyboard shortcuts: Cmd+\, Cmd+W, Cmd+1/2/3/4, Ctrl+Tab

Key architectural decision: documents are shared (HashMap<DocumentId, Document>), editors are view-specific (HashMap<EditorId, EditorState>). Multiple editors can view the same document with independent cursors and viewports. This decision was in the spec before any code was written — and it held up through every subsequent feature.
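The shared-document split can be sketched as follows. Only the two map types come from the description above; the fields inside `Document` and `EditorState`, the byte-offset cursor, and both methods are assumptions made to keep the example short.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct DocumentId(pub u64);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct EditorId(pub u64);

/// Shared: the text itself, owned once.
pub struct Document {
    pub text: String,
}

/// View-specific: which document is shown, plus this view's own
/// cursor (a byte offset here, for simplicity) and scroll position.
pub struct EditorState {
    pub document_id: DocumentId,
    pub cursor_offset: usize,
    pub scroll_line: usize,
}

pub struct EditorArea {
    pub documents: HashMap<DocumentId, Document>,
    pub editors: HashMap<EditorId, EditorState>,
}

impl EditorArea {
    /// Inserts text at one editor's cursor. Because the document is shared,
    /// every other editor viewing it sees the change. Returns `None` for
    /// unknown ids or an offset that is not a valid char boundary.
    pub fn insert_at_cursor(&mut self, editor: EditorId, text: &str) -> Option<()> {
        let state = self.editors.get_mut(&editor)?;
        let doc = self.documents.get_mut(&state.document_id)?;
        if !doc.text.is_char_boundary(state.cursor_offset) {
            return None;
        }
        doc.text.insert_str(state.cursor_offset, text);
        state.cursor_offset += text.len();
        Some(())
    }

    /// The text an editor currently displays.
    pub fn text_for(&self, editor: EditorId) -> Option<&str> {
        let state = self.editors.get(&editor)?;
        self.documents.get(&state.document_id).map(|d| d.text.as_str())
    }
}
```

This sketch leaves other editors' cursors untouched after an edit. A real implementation would also need to shift any cursor that sits after the insertion point in the same document.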

A research phase (T-35b11d40) had compared how VSCode, Helix, Zed, and Neovim handle splits and keymaps. Twenty minutes of research that prevented architectural dead ends.

Case Study: Module Extraction

By December 6th, main.rs had grown to 3,100 lines. A series of Improve sessions (T-ce688bab through T-072af2cb) extracted it into modules:

update_layout and helpers → update/layout.rs
update_document and undo/redo → update/document.rs
update_editor → update/editor.rs
Renderer → view.rs
PerfStats → perf.rs
handle_key → input.rs
App and ApplicationHandler → app.rs

After: main.rs was 20 lines. All tests passing. This is Improve mode at its best — agents are excellent at mechanical extraction when you define the target module structure. No judgment calls, just move code and fix visibility modifiers.

Case Study: The Cmd+Z Sweep

Thread T-519a8c9d: Cmd+Z was inserting 'z' instead of undoing on macOS.

Root cause: the key handler only checked control_key(), not super_key() (macOS Command key).

// Before (broken on macOS)
if modifiers.control_key() && key == "z" { ... }

// After (cross-platform)
if (modifiers.control_key() || modifiers.super_key()) && key == "z" { ... }

A one-line fix. But the single bug triggered a Sweep: find every other keyboard shortcut that makes the same assumption. The agent checked all modifier handlers and found several more instances. This is the pattern — a bug isn't just a bug, it's evidence of a systematic issue. Sweep mode turns one fix into a class of fixes.
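One way to make a sweep like this stick is to route every shortcut through a single "primary modifier" check, so the assumption lives in one place. The sketch below uses a stand-in `Modifiers` struct that mirrors the `control_key()` and `super_key()` calls from the snippet above. The struct, the `primary()` helper, the `Action` enum, and the key bindings are all hypothetical and not taken from Token.

```rust
/// Stand-in for the windowing library's modifier state.
#[derive(Debug, Clone, Copy, Default)]
pub struct Modifiers {
    pub ctrl: bool,
    pub cmd: bool,
}

impl Modifiers {
    pub fn control_key(&self) -> bool {
        self.ctrl
    }

    pub fn super_key(&self) -> bool {
        self.cmd
    }

    /// The one place that decides what counts as the shortcut modifier:
    /// Ctrl on most platforms, Command (super) on macOS.
    pub fn primary(&self) -> bool {
        self.control_key() || self.super_key()
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Undo,
    SelectAll,
    CopySelection,
}

/// Maps a key press to an action. No handler checks `control_key()`
/// directly, so the macOS bug cannot reappear one shortcut at a time.
pub fn shortcut(modifiers: Modifiers, key: &str) -> Option<Action> {
    if !modifiers.primary() {
        return None;
    }
    match key {
        "z" => Some(Action::Undo),
        "a" => Some(Action::SelectAll),
        "c" => Some(Action::CopySelection),
        _ => None,
    }
}
```

Like the fix above, this accepts either modifier on every platform. Restricting each platform to its own modifier would be a separate design decision.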

Development Timeline

Token's development spans three months across 15+ phases:

Phase	Dates	Focus
Foundation	Dec 3-5	Setup, reference docs, Elm Architecture
Feature Dev	Dec 5-6	Split view, undo/redo, multi-cursor
Refactor	Dec 6	Extract modules from main.rs (3100→20 lines)
Keymapping	Dec 15	Configurable YAML keybindings, 74 defaults
Syntax	Dec 15	Tree-sitter integration, 20 languages
CSV Editor	Dec 16	Spreadsheet view with cell editing
Workspace	Dec 17	Sidebar file tree, focus system
Unified Editing	Dec 19	EditableState system for all text inputs
Perf & Find	Dec 19-20	Event loop fix (7→60 FPS), find/replace
File Dialogs	Jan 6-7	Native open/save, config hot-reload
Panels & Preview	Jan 7-9	Docked panels, markdown/HTML preview
Themes	Feb 18	Dracula, Catppuccin, Nord, Tokyo Night, Gruvbox
Bracket Matching	Feb 18	Auto-surround, bracket highlighting
Syntax Perf	Feb 19	Highlight pipeline rewrite, deadline timers
Recent Files	Feb 19	Cmd+E modal, persistent MRU list, fuzzy filtering
Code Outline	Feb 19	Tree-sitter symbol extraction, dock panel

Each phase was 1-3 days. The longest gaps — Dec 20 to Jan 6, Jan 9 to Feb 17 — were periods where I worked on other projects (Sema, SQL Splitter). The codebase waited. When I came back, the documentation was the bridge — a new agent session reads AGENTS.md, the reference docs, and picks up exactly where the last one left off.

What I'd Do Again

Write invariants before code. The cursors.len() == selections.len() invariant was the most valuable line in the entire project. It gave every agent session a correctness criterion. When something broke, the invariant told you what broke and what "fixed" looked like.

Review reference docs before implementation. Having the Oracle review EDITOR_UI_REFERENCE.md caught 15+ issues that would each have cost hours of debugging. The document itself cost an afternoon. The review cost minutes.

Explicit modes. Declaring Build/Improve/Sweep at the start of each session prevented scope creep more reliably than any other technique. When an agent notices a bug during a Build session and you say "note it, don't fix it," the session stays focused.

Gap documents. Turning "this feature is mostly done" into a checklist is the highest-leverage documentation you can write. An agent can pick up a gap doc cold and produce useful work immediately.

What I'd Change

Write AGENTS.md on day one. Token's early sessions had friction because agents had to discover build commands and architecture patterns. Writing the configuration file upfront would have saved cumulative hours.

Test before Improve. Some Improve sessions ran without comprehensive test coverage. The module extraction worked because it was mechanical, but it was lucky. I'd insist on test coverage before any structural refactoring now.

Smaller threads. Some Build sessions tried to do too much in a single context window. The split view implementation worked as 7 phases in one thread, but several other features would have been cleaner as separate threads per phase. Context quality degrades as threads get long.

The Framework

The methodology generalizes beyond editors. The principles:

Declare a mode. Build, Improve, or Sweep. Don't mix.
Write the docs first. Reference documentation for cross-cutting concerns, feature specs for new behavior, gap docs for unfinished work.
State invariants explicitly. Give agents a correctness criterion they can check against.
Use migration helpers for incremental change. Don't rewrite everything at once. Create accessors that let old code work while new code uses the new structure.
Configure your agents. AGENTS.md with build commands, architecture patterns, and conventions.
Research before architecture. A twenty-minute thread comparing how other projects solved the same problem prevents dead ends.
Sweep systematically. One bug means more bugs like it. Fix the class, not the instance.

Token is the evidence for this framework, not the point. The same approach drove Sema and every project since. The projects get more ambitious; the framework stays the same.

Token is MIT licensed at github.com/HelgeSverre/token. All 170+ conversation threads are public at ampcode.com/@helgesverre, with the full thread list and summaries in docs/BUILDING_WITH_AI.md.
