You tell your AI agent to implement a feature. It writes 150 lines across four files. You skim the diff, it looks reasonable, you commit. Two days later you're debugging an edge case the agent never tested, staring at a conditional that makes no sense, wondering why you didn't catch it.
The problem isn't the agent. The problem is that there's no review step. When a teammate opens a PR, you read the diff, leave inline comments, request changes, and approve when it's ready. When an agent writes code, you get a wall of terminal output and a vague sense that it probably worked. The review workflow that keeps human code honest doesn't exist for agent code.
Someone built it.
What Is crit?
crit is a PR-style review tool for LLM agent output, built by tomasz-tomczyk. It's a single Go binary that launches a localhost web UI, detects changed files in your git repo (or takes explicit file paths), and renders them with syntax-highlighted diffs. You click on lines to leave inline comments, just like a GitHub PR review. Then you tell your agent to address the feedback, and crit reloads the files with your comments carried forward into the next round.
That multi-round loop is the core idea. Leave comments. Agent fixes. Review again. Repeat until you're satisfied. The comments persist across rounds, so you can track whether your feedback was actually addressed.
46 stars. Created four weeks ago. Already has 32 Playwright E2E tests, SSE-powered real-time updates, dark mode, keyboard navigation, and a sharing feature. The velocity is unusual for a solo project this young.
The Snapshot
| Project | crit |
| --- | --- |
| Stars | ~46 at time of writing |
| Maintainer | Solo developer, daily commits |
| Code health | Clean Go backend, 2,500 lines of tests, 32+ E2E specs |
| Docs | One of the best CLAUDE.md files I've read (300+ lines covering architecture, API, testing, release) |
| Contributor UX | Same-day review, detailed design feedback, constructive tone |
| Worth using | Yes, if you use AI agents to write code |
Under the Hood
crit's architecture is deliberately simple. The backend is six Go files: main.go for CLI and server setup, server.go for HTTP handlers, session.go for the core state machine, git.go for git operations, diff.go for LCS-based line diffing between rounds, and status.go for terminal formatting. The frontend is vanilla JavaScript and CSS. No React, no build step, no bundler. The assets get embedded into the binary via Go's embed.FS, so distribution is a single file.
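The between-round comparison in diff.go is LCS-based. The core technique looks roughly like this minimal sketch, my own illustration rather than crit's actual code:

```go
package main

import "fmt"

// lcsDiff tags each line with "  " (unchanged), "- " (removed from a),
// or "+ " (added in b), using a longest-common-subsequence table.
func lcsDiff(a, b []string) []string {
	n, m := len(a), len(b)
	// dp[i][j] = length of the LCS of a[i:] and b[j:].
	dp := make([][]int, n+1)
	for i := range dp {
		dp[i] = make([]int, m+1)
	}
	for i := n - 1; i >= 0; i-- {
		for j := m - 1; j >= 0; j-- {
			switch {
			case a[i] == b[j]:
				dp[i][j] = dp[i+1][j+1] + 1
			case dp[i+1][j] >= dp[i][j+1]:
				dp[i][j] = dp[i+1][j]
			default:
				dp[i][j] = dp[i][j+1]
			}
		}
	}
	// Walk the table to emit a unified-style line sequence.
	var out []string
	i, j := 0, 0
	for i < n && j < m {
		switch {
		case a[i] == b[j]:
			out = append(out, "  "+a[i])
			i++
			j++
		case dp[i+1][j] >= dp[i][j+1]:
			out = append(out, "- "+a[i])
			i++
		default:
			out = append(out, "+ "+b[j])
			j++
		}
	}
	for ; i < n; i++ {
		out = append(out, "- "+a[i])
	}
	for ; j < m; j++ {
		out = append(out, "+ "+b[j])
	}
	return out
}

func main() {
	oldLines := []string{"func f() {", "  return 1", "}"}
	newLines := []string{"func f() {", "  return 2", "}"}
	for _, line := range lcsDiff(oldLines, newLines) {
		fmt.Println(line)
	}
}
```

Walking the table greedily prefers deletions before additions, which gives the familiar unified-diff shape of a removed line followed by its replacement.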
session.go is where the interesting logic lives, and at 1,470 lines it's the largest file by a wide margin. It manages the review session state, watches files for changes (polling git status --porcelain every second in git mode, or checking mtimes in file mode), broadcasts updates over server-sent events, and handles the multi-round workflow. When you call crit go PORT from your agent's terminal, it signals the running session to advance to the next round, reloading all files while preserving comment state.
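That polling loop is simple to sketch. Here's a simplified version of the idea (function names are mine, not crit's, and the real session.go also deduplicates and broadcasts over SSE):

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// parsePorcelain extracts paths from `git status --porcelain` output.
// Each line is two status characters, a space, then the path.
func parsePorcelain(out string) []string {
	var files []string
	for _, line := range strings.Split(strings.TrimRight(out, "\n"), "\n") {
		if len(line) > 3 {
			files = append(files, line[3:])
		}
	}
	return files
}

// watch polls git once a second and sends the changed paths on ch.
func watch(repo string, ch chan<- []string) {
	for {
		out, err := exec.Command("git", "-C", repo, "status", "--porcelain").Output()
		if err != nil {
			fmt.Println("git error:", err)
			return
		}
		ch <- parsePorcelain(string(out))
		time.Sleep(time.Second)
	}
}

func main() {
	// Parse a sample of porcelain output: a modified file, an untracked file.
	sample := " M server.go\n?? notes.txt\n"
	fmt.Println(parsePorcelain(sample))
	// A real session would run: go watch(".", ch) and read updates from ch.
}
```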
The frontend is a 3,900-line app.js and a 2,000-line style.css. That's a lot of vanilla JS in one file, and it'll eventually need splitting. But the code is well-organized internally: state management, rendering, comment handling, SSE listeners, and keyboard shortcuts are all in clearly separated sections. The comment forms use a gutter interaction model (mousedown, drag to select lines, mouseup to open the form) that feels natural once you discover it.
What surprised me most was the test coverage. 2,500 lines of Go tests plus 32 Playwright E2E specs for a project that's been public for four weeks. The E2E suite covers both git and file modes, comment CRUD, multi-round workflows, theme persistence, keyboard navigation, and the sharing feature. That kind of test investment this early usually means the developer is building something they actually use daily, not just demoing.
The CLAUDE.md deserves its own mention. At 300+ lines, it covers the full architecture, every REST endpoint, the SSE event protocol, testing conventions, the release process, and coding guidelines. It's the most comprehensive project instruction file I've seen on a repo this size. If crit is a tool for reviewing AI-generated code, its own development docs suggest the maintainer is eating his own cooking.
The rough edges are what you'd expect from a young project. The vanilla JS frontend will hit a complexity wall eventually. There's no CLI-only mode for terminal purists. Comment data lives in .crit.json in the working directory, which means it doesn't travel with the code unless you commit it. None of these are deal-breakers at this stage.
The Contribution
crit's ROADMAP listed "Comment templates" as a near-term feature, so I built it. The idea: clickable pill buttons that insert common review phrases ("This will fail when...", "Missing error handling for...") into the comment textarea. A small UX improvement that saves keystrokes during reviews.
My first implementation had five default templates, localStorage for persistence, and an always-visible template bar. It worked. The maintainer responded same-day with seven design changes.
His reasoning was specific and well-considered. No default templates, because he wasn't confident enough in universal defaults yet. Cookies instead of localStorage, because crit launches on a random port each session and localStorage is scoped per origin, meaning templates would vanish between runs. A "Save as template" button in the actions row instead of an always-visible bar, so the feature earns its screen space. The template bar should only appear once you've saved at least one. Hover delete on chips. Truncation for long text. And E2E tests for both git and file modes.
All seven points were fair. Some I wouldn't have caught on my own (the localStorage/random-port issue is subtle and specific to crit's architecture). I reworked the entire implementation: cookie-backed storage, a save dialog that pre-fills with your current comment text, chips that appear only when you have templates, hover-to-delete with an × button, ellipsis truncation with title tooltips, and 14 new Playwright E2E tests covering empty state, save flow, insert, delete, persistence across page reloads, and both operating modes.
PR #28 was merged the same day I pushed the rework. The maintainer said he'd merge it and follow up with a small tweak to make the delete button always visible instead of hover-only. That's the kind of interaction that makes contributing satisfying: the feedback improved the feature, and the follow-through was fast.
The Verdict
crit is for developers who use AI agents to write code and want the same review discipline they'd apply to human PRs. If you're using Claude Code, Cursor, Copilot Workspace, or any agent that modifies files, crit gives you a structured place to review those changes before they become your problem.
The project is very early. Four weeks old, solo-maintained, under 50 stars. But the foundations are solid: clean architecture, real tests, a maintainer who responds same-day with thoughtful feedback. The multi-round workflow is the differentiator. Other diff viewers can show you what changed. crit lets you have a conversation about it.
What would push it further? A VS Code extension for inline review without leaving the editor. A headless CLI mode for reviewing diffs in the terminal. Better discoverability of the gutter interaction (I clicked lines for a minute before realizing you need to click-drag). But the core loop already works, and the pace of development suggests these will come.
Go Look At This
If you use AI agents to write code, try crit. Run it against your next agent-generated changeset. Leave comments, tell the agent to fix them, review the next round. See if it changes how much you trust the output.
Star the repo. The maintainer is responsive and the contribution experience was one of the best in this series. Here's the PR.
This is Review Bomb #6, a series where I find under-the-radar projects on GitHub, read the code, contribute something, and write it up. If you know a project that deserves more eyeballs, drop it in the comments.