DEV Community

Erik Israni

Posted on • Originally published at opensourceagents.hashnode.dev

Why the Best Open Source Teams Treat Writing, Testing, and Reviewing as Three Separate Jobs

You adopted AI to move faster, and it worked. Output went up. Features that used to take days landed in hours. Contributors shipped more. And then something broke: AI accelerated the part of development that creates work for reviewers, but didn't do anything about review itself.

The result is what I call the AI Throughput Gap.

Traditional workflow:
Write -----> Test -----> Review

AI-assisted workflow:
AI Writes Faster ---> PR Volume Explodes ---> Review Becomes the Bottleneck
The symptoms show up fast:
  • PRs pile up faster than any one person can clear them
  • Edge cases slip through because reviewers are moving too fast
  • Contributors wait days for feedback and go quiet
  • You merge something that looked fine and spend the next week fielding issues

This isn't a productivity problem. It's a tooling problem. Somewhere along the way, the idea took hold that if AI can help you write the code, it can handle the rest too. Review included.

It can't. Not the same way. Understanding why is the difference between shipping confidently and shipping slop.


What "Agentic" AI Is Actually Optimized For

When you're building something (describing a feature, watching an AI scaffold a component, iterating on logic), the model is operating in creation mode. Its job is to generate, move forward, and produce the next thing based on what you want.

That's powerful. But it's directional. The model is optimized for momentum.

Reviewing code is the opposite. Instead of moving forward, you're scanning laterally across an entire change, looking for:

  • What's missing
  • What's fragile
  • What works in isolation but breaks under load
  • What conflicts with something three files away

The questions are fundamentally different:

Creation mode asks: Does this do what it's supposed to do?

Review mode asks: What does this do that we didn't intend? What did we forget? What's going to bite us six months from now?

These are different cognitive jobs. That difference is why context matters so much, and why collapsing them into one tool is so costly.


The Hidden Cost of One-Session Review

When an AI helps you write a feature, it absorbs your intent. It knows:

  • What you were trying to build
  • The decisions you made along the way
  • The tradeoffs you rationalized in real time

That context is valuable while building. It becomes a liability when reviewing.

A good code reviewer doesn't know what you meant to do. They only see what you actually did. The distance between intent and implementation is exactly where bugs live, where security gaps hide, where forgotten edge cases sit quietly waiting.

Using the same AI session that wrote your code to review it is like proofreading your own writing immediately after finishing it. Your brain fills in what should be there. You miss what isn't.

Human engineering teams figured this out long ago. That's why code review exists as a discipline separate from implementation. AI teams are relearning the same lesson. And the solution is the same: separate the jobs.


The Three Jobs Framework

Your development workflow is three genuinely distinct jobs, each with its own goal and failure mode.

Writing

  • Goal: Produce working code that implements intent
  • Failure mode: Building the wrong thing, or in a way that's hard to maintain

Testing

  • Goal: Confirm the code does what it claims
  • Failure mode: False confidence from tests that pass but miss the cases that matter

Reviewing

  • Goal: Find what writing and testing missed
  • Failure mode: A clean merge on code that causes problems nobody saw coming

What only review catches:

  • Security vulnerabilities
  • Performance edge cases
  • Style inconsistencies that become technical debt
  • Architectural drift that compounds over time

Collapsing these into one tool or one session doesn't make the workflow efficient. It makes each job worse. The teams shipping reliably with AI aren't using it to do everything at once. They're using it to do each job better, separately.

AI doesn't replace good process. It amplifies it when you separate the jobs.


Why This Matters More in Open Source

For a closed product, a bad merge is a bad day. You fix it, ship a patch, move on.

In open source, the blast radius is everyone who depends on you.

OSS maintainers are already stretched thin:

  • Often one or two people managing contributions from dozens
  • Reviewing PRs from contributors you've never met
  • Holding context across a codebase that keeps growing
  • Carrying downstream consequences with every merge, for everyone who built on the project

A security vulnerability in a popular library doesn't just affect your users. It affects their users. A performance regression in a widely adopted package ripples outward in ways that are hard to track and harder to undo.

And yet OSS maintainers are typically the least resourced to handle this well. No dedicated QA. No security review team. Just you, your contributors, and whatever time you can carve out.

AI increased code throughput for everyone. Maintainers still have the same number of hours. That gap has to close somewhere, and right now it's closing on your review queue.


What Purpose-Built AI Review Actually Looks Like

A coding assistant and a code reviewer are doing completely different jobs. Kilo's Code Reviewer doesn't pick up where your last prompt left off. It reads a completed change fresh, the way a senior engineer would.

Here's what that looks like in practice:

  • Reads diffs, not prompt history. It sees only what changed, with no prior context about what you intended.
  • Compares against codebase patterns. It flags when a change drifts from established patterns in the rest of the project.
  • Runs security checks automatically. Each PR gets checked for common vulnerabilities before it touches your main branch.
  • Runs on every PR, automatically. External contributor submissions get a thorough first pass before you ever have to look at them.
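The diff-only principle above can be sketched in a few lines of Python. This is a hypothetical illustration of the idea, not Kilo's actual implementation: the function names and prompt wording are invented, and the only claim it makes is structural, that the reviewer's context starts from the finished change rather than from the author's chat history.

```python
import subprocess

def build_review_prompt(diff_text: str) -> str:
    """Start a fresh context containing only the completed change.

    Deliberately takes no chat history and no statement of intent:
    the reviewer sees what the code does, not what it was meant to do.
    (Illustrative sketch; prompt wording is hypothetical.)
    """
    return (
        "You are reviewing a completed change with no prior context.\n"
        "Look for missing cases, fragile logic, security issues, and\n"
        "drift from the patterns in the surrounding codebase.\n\n"
        "--- DIFF ---\n" + diff_text
    )

def diff_against_base(base: str = "main") -> str:
    """Collect only the merge-base diff, mirroring what a PR shows.

    `base...HEAD` (three dots) diffs against the merge base, so the
    reviewer sees exactly the contributor's changes, nothing else.
    """
    result = subprocess.run(
        ["git", "diff", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

The point of the sketch is what's absent: there is no parameter for "what I was trying to build," so the review session physically cannot inherit the author's intent.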

For maintainers, that translates to:

  • Issues caught before merge, not after
  • Contributors getting faster, more substantive feedback, which keeps them engaged
  • Consistent review quality even when you're offline or heads-down on something else
  • A review queue that doesn't require you to be everywhere at once

The Right Tool for Each Job

The goal isn't more AI in your workflow for its own sake. It's the right tool for each job, applied where it actually helps.

You've already seen what's possible when a tool is matched to the task. Review is just the next job in line, and it deserves the same intentionality.

The teams that look back on this period and feel good about how they built won't be the ones who used AI the most. They'll be the ones who used it most clearly.


Free Tooling for Open Source Maintainers

The most critical software in the ecosystem shouldn't lose to commercial projects because it can't afford the tooling. That's why we built the Kilo OSS Sponsorship Program.

We're already supporting over 280 open source projects with access to Kilo's full platform, including credits for the Code Reviewer.

Three sponsorship tiers based on project size and maturity:

| Tier    | Value     | Who It's For                                   |
| ------- | --------- | ---------------------------------------------- |
| Seed    | $9K/year  | Early-stage or smaller OSS projects            |
| Growth  | $24K/year | Established projects with active contributors  |
| Premier | $48K/year | High-impact projects with broad adoption       |

No strings attached. Takes about 2 minutes to apply.

Apply for the Kilo OSS Sponsorship Program →

If you're maintaining something people depend on, this is for you.
