
Brian Mello


What 10 Versions of an AI Code Review CLI Taught Me About Developer UX

You don't learn how developers think by reading docs. You learn by shipping something, watching it fail, and shipping it again.

I've been building 2ndOpinion, an AI code review tool where multiple models — Claude, Codex, Gemini — cross-check each other's reviews. Over the past few months and ten CLI versions, I've rewritten the developer experience more times than I'd like to admit. Here's what actually stuck.

Version 1: The "Just Ship It" Phase

The first version of the CLI did exactly one thing: send your code to three AI models and print their reviews. It worked. Technically.

npx 2ndopinion-cli --file src/auth.ts --models claude,codex,gemini --format json --output review.json

Four flags to get a single review. Every run required you to specify models, format, and output. Nobody wants to think that hard before getting feedback on their code.

Lesson: If your CLI needs a manual, you've already lost.

The "Smart Default" Breakthrough

The single biggest improvement wasn't a feature — it was removing decisions. Version 0.5.0 introduced one command that just works:

2ndopinion review src/auth.ts

That's it. The tool auto-detects your language, picks the best models for that language based on real accuracy data, and prints a formatted review. No flags required.

Downloads jumped immediately. Not because the tool got more powerful — it got simpler.

Behind the scenes, --llm auto routes your code to whichever models perform best for your specific language. TypeScript reviews go to different models than Python reviews, because we track which models actually catch bugs in each language. But the developer doesn't need to know any of that.
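To make the routing idea concrete, here is a minimal sketch of how a per-language model picker could work. Everything in it is an assumption for illustration: the extension map, the accuracy numbers, and the function names are hypothetical placeholders, not 2ndOpinion's real data or code.

```typescript
// Hypothetical sketch of language-aware model routing.
// The accuracy figures are made-up placeholders, not real measurements.

type Model = "claude" | "codex" | "gemini";
type Language = "typescript" | "python" | "unknown";

// Assumed mapping from file extension to language.
const EXTENSION_TO_LANGUAGE: Record<string, Language> = {
  ts: "typescript",
  tsx: "typescript",
  py: "python",
};

// Placeholder per-language accuracy scores (illustrative only).
const ACCURACY: Record<Language, Record<Model, number>> = {
  typescript: { claude: 0.9, codex: 0.8, gemini: 0.7 },
  python: { claude: 0.7, codex: 0.9, gemini: 0.8 },
  unknown: { claude: 0.8, codex: 0.8, gemini: 0.8 },
};

// Guess the language from the file extension; anything unrecognized is "unknown".
function detectLanguage(filePath: string): Language {
  const dot = filePath.lastIndexOf(".");
  if (dot === -1) return "unknown";
  const ext = filePath.slice(dot + 1).toLowerCase();
  return EXTENSION_TO_LANGUAGE[ext] ?? "unknown";
}

// Return the `count` best-scoring models for the file's language.
function pickModels(filePath: string, count: number = 1): Model[] {
  const scores = ACCURACY[detectLanguage(filePath)];
  const models = Object.keys(scores) as Model[];
  return models.sort((a, b) => scores[b] - scores[a]).slice(0, count);
}

console.log(pickModels("src/auth.ts"));         // top model for TypeScript
console.log(pickModels("scripts/train.py", 2)); // top two models for Python
```

The point of the sketch is that the lookup lives inside the tool, so the developer only ever types a file path.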

The Feedback That Changed Everything

A developer tried 2ndOpinion and told me: "I got my review. Now what?"

That question haunted me. Getting a list of issues is step one. But developers don't want a report — they want their code to be better. So I built fix:

2ndopinion fix src/auth.ts

One command. It reviews your code, identifies the issues, generates fixes, and applies them. You can review the diff before accepting. The entire loop from "something's wrong" to "it's fixed" happens in your terminal.
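As a rough illustration of that loop, the sketch below previews proposed changes and applies them only after the developer accepts. The Fix shape, the preview format, and the function names are all hypothetical; the real fix command may represent and apply changes quite differently.

```typescript
// Hypothetical sketch of a review -> fix -> confirm loop.
// The Fix shape and function names are assumptions for illustration.

interface Fix {
  line: number;        // 1-based line number to replace
  replacement: string; // proposed new content for that line
  reason: string;      // why the reviewer suggests the change
}

// Build a human-readable preview of the proposed changes.
function previewFixes(source: string, fixes: Fix[]): string {
  const lines = source.split("\n");
  return fixes
    .map((fix) => {
      const before = lines[fix.line - 1] ?? "";
      return `@@ line ${fix.line}: ${fix.reason}\n- ${before}\n+ ${fix.replacement}`;
    })
    .join("\n");
}

// Apply the fixes only when the developer has accepted the preview.
function applyFixes(source: string, fixes: Fix[], accepted: boolean): string {
  if (!accepted) return source;
  const lines = source.split("\n");
  for (const fix of fixes) {
    // Ignore fixes that point outside the file.
    if (fix.line >= 1 && fix.line <= lines.length) {
      lines[fix.line - 1] = fix.replacement;
    }
  }
  return lines.join("\n");
}

const sampleSource = "const token = req.query.token;\nlogin(token);";
const sampleFixes: Fix[] = [
  { line: 2, replacement: "if (token) login(token);", reason: "token may be undefined" },
];

console.log(previewFixes(sampleSource, sampleFixes));
console.log(applyFixes(sampleSource, sampleFixes, true));
```

Keeping the preview and the apply step separate is what lets the developer stay in control while the tool still closes the loop.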

Then came watch:

2ndopinion watch src/

Continuous monitoring. Save a file, get a review. Like having a pair programmer who never takes a break and never gets passive-aggressive about your variable names.

Lesson: The best developer tool is the one that closes the loop. Don't hand developers a problem — hand them a solution.

The Multi-Model Insight Nobody Asked For

Here's something I didn't expect: individual AI models are unreliable in predictable ways. Claude is excellent at architectural reasoning but sometimes misses edge cases in error handling. Codex catches implementation bugs that Claude misses. Gemini often spots performance issues the others overlook.

No single model is "the best." But three models reviewing the same code? They catch what each other misses. That's the core thesis of 2ndOpinion — consensus-based review.

When all three models agree something is a problem, the confidence is high. When they disagree, that's where the interesting conversations happen. We built a confidence-weighted system that surfaces high-agreement issues first and flags disagreements for human review.

The consensus command makes this explicit:

2ndopinion review --consensus src/auth.ts

Three models review in parallel. You get a unified report with confidence scores. Three credits, one command, and a review that's more thorough than any single model could produce.
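Here is a minimal sketch of how agreement could be turned into a ranked report. It assumes findings from different models can be matched by line and a short rule name, and it uses the fraction of agreeing models as the confidence score. Both are simplifications for illustration, and all type and function names are hypothetical rather than 2ndOpinion's actual scoring.

```typescript
// Hypothetical sketch of consensus ranking across reviewers.
// Matching by (line, rule) and scoring by agreement are assumptions.

interface Finding {
  model: string; // which reviewer reported it
  line: number;  // line the issue refers to
  rule: string;  // short issue identifier
}

interface ConsensusItem {
  line: number;
  rule: string;
  models: string[];          // reviewers that reported this issue
  confidence: number;        // fraction of reviewers that agree (0..1)
  needsHumanReview: boolean; // true when not every reviewer agrees
}

function buildConsensus(findings: Finding[], totalModels: number): ConsensusItem[] {
  const grouped = new Map<string, ConsensusItem>();
  for (const f of findings) {
    const key = `${f.line}:${f.rule}`;
    let item = grouped.get(key);
    if (!item) {
      item = { line: f.line, rule: f.rule, models: [], confidence: 0, needsHumanReview: false };
      grouped.set(key, item);
    }
    // Count each model at most once per issue.
    if (item.models.indexOf(f.model) === -1) item.models.push(f.model);
  }
  const items = Array.from(grouped.values());
  for (const item of items) {
    item.confidence = item.models.length / totalModels;
    item.needsHumanReview = item.models.length < totalModels;
  }
  // Highest agreement first; ties are ordered by line number.
  return items.sort((a, b) => b.confidence - a.confidence || a.line - b.line);
}

const sampleFindings: Finding[] = [
  { model: "claude", line: 12, rule: "missing-null-check" },
  { model: "codex", line: 12, rule: "missing-null-check" },
  { model: "gemini", line: 12, rule: "missing-null-check" },
  { model: "gemini", line: 30, rule: "n-plus-one-query" },
  { model: "codex", line: 8, rule: "unused-variable" },
];

const consensusReport = buildConsensus(sampleFindings, 3);
console.log(consensusReport);
```

In this toy input the null check reported by all three models sorts to the top, while the two single-model findings are flagged for human review.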

What I Got Wrong About Developer UX

I over-indexed on power users. Early versions had flags for everything: model selection, temperature, output format, verbosity levels, custom prompts. Power users loved it. Everyone else bounced.

The fix was layered complexity. The default command (2ndopinion review) requires zero configuration. Power users can add flags to customize. But the first experience is frictionless.
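One way to picture layered complexity is a set of defaults that flags can override one at a time. The sketch below is only an illustration: the option names are loosely based on the flags mentioned in this post, and the "pretty" format name and the ReviewOptions shape are hypothetical.

```typescript
// Hypothetical sketch of layered configuration: defaults first, flags on top.
// Option names and values are assumptions for illustration.

interface ReviewOptions {
  llm: string;                         // "auto" lets the tool choose models
  format: "pretty" | "json" | "plain"; // "pretty" is an assumed default name
  verbose: boolean;
}

const DEFAULT_OPTIONS: ReviewOptions = {
  llm: "auto",
  format: "pretty",
  verbose: false,
};

// Any flag the user left unset falls back to its default.
function resolveOptions(flags: Partial<ReviewOptions> = {}): ReviewOptions {
  return {
    llm: flags.llm ?? DEFAULT_OPTIONS.llm,
    format: flags.format ?? DEFAULT_OPTIONS.format,
    verbose: flags.verbose ?? DEFAULT_OPTIONS.verbose,
  };
}

console.log(resolveOptions());                   // zero-config path
console.log(resolveOptions({ format: "json" })); // one override, rest default
```

The design choice is that a new user never sees the options object at all, while a power user changes exactly the fields they care about.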

I underestimated CI/CD. Developers don't just run tools locally — they run them in pipelines. Version 0.10.0 added --ci, --json, and --plain flags specifically for non-interactive environments. It sounds obvious in retrospect, but I spent months building interactive terminal UI before realizing half my users needed the opposite.

# In your GitHub Actions workflow
2ndopinion review --pr "$PR_NUMBER" --ci --json
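A small sketch of how a CLI could decide between interactive and pipeline-friendly output. The flag names mirror --ci, --json, and --plain from above, but the precedence rules and the fallback for non-terminal output are assumptions, not a description of how 2ndOpinion actually resolves them.

```typescript
// Hypothetical sketch of choosing an output mode for non-interactive runs.
// Precedence rules and the TTY fallback are assumptions for illustration.

type OutputMode = "interactive" | "plain" | "json";

interface OutputFlags {
  ci?: boolean;
  json?: boolean;
  plain?: boolean;
}

function chooseOutputMode(flags: OutputFlags, isTTY: boolean): OutputMode {
  if (flags.json) return "json";               // machine-readable output wins
  if (flags.plain || flags.ci) return "plain"; // no colors, spinners, or prompts
  return isTTY ? "interactive" : "plain";      // piped output falls back to plain
}

console.log(chooseOutputMode({ ci: true, json: true }, false));
console.log(chooseOutputMode({}, true));
```

In a Node.js CLI the second argument would typically come from process.stdout.isTTY, which is unset when output is piped or captured by a CI runner.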

I ignored the "try before you buy" instinct. Developers don't sign up for things. They install them, try them, and decide in under 60 seconds. The free playground on get2ndopinion.dev — no signup required — exists because I watched too many developers hit a registration wall and leave.

What's Next: The Skills Marketplace

The most surprising thing I've learned is that every team has domain-specific review needs. A fintech team cares about different patterns than a game studio. A team migrating from Python 2 to 3 needs a completely different lens.

So we're building a skills marketplace where developers can create custom audit skills — specialized review logic for specific domains — and sell them. Creators earn 70% of revenue. It turns tribal knowledge into something shareable and monetizable.

Think of it as npm for code review intelligence. Someone who's spent five years dealing with Django security footguns can package that knowledge into a skill that catches those issues for every Django developer.

The Takeaway

Ten versions in, the biggest lesson is this: developer tools win on defaults, not features. Every flag you add is a decision you're asking the developer to make. Every decision is friction. Every bit of friction is a reason to close the terminal and move on.

If you're building developer tools, here's my checklist: Does the zero-config experience work? Does the tool close the loop (find problem → fix problem)? Can it run in CI without modification? Can someone try it in under 60 seconds?

If you want to try multi-model AI code review, the CLI is one install away:

npm i -g 2ndopinion-cli
2ndopinion review your-file.ts

Or try the playground at get2ndopinion.dev — no signup, no credit card, just paste code and see what three AI models think.

I'd love to hear what you've learned building developer tools. What UX lessons took you the longest to figure out? Drop a comment below.
