DEV Community

zecheng

Posted on • Originally published at lizecheng.net

AI Wrote the Code. Now AI Reviews It Too — And the Numbers Are Wild

Something clicked this week in how AI fits into software engineering — and I don't mean another "AI will replace developers" take. I mean something more specific and more useful: the toolchain is finally closing its own loop.

Here's the signal that made me sit up.

Claude Code Review: AI Solving the Problem AI Created

Anthropic shipped Code Review for Claude Code on March 9 — a multi-agent system that spins up a team of agents on every pull request; they scan in parallel, cross-verify findings to kill false positives, then surface a single high-signal comment with inline line-level annotations.

The internal numbers:

  • Before: 16% of Anthropic's own PRs received substantive review comments
  • After: 54%
  • On PRs with 1,000+ lines changed: 84% trigger findings, averaging 7.5 issues per PR
  • On PRs under 50 lines: 31% flagged, average 0.5 findings

Pricing is $15–25 per PR. It's currently a research preview for Team and Enterprise plans, averaging ~20 minutes per run.

But here's the part most coverage missed. Anthropic explained why they built this: engineers' code output at Anthropic grew 200% in one year. AI-assisted generation created a volume problem that review capacity couldn't absorb. The bottleneck shifted from writing code to verifying code.

This is AI solving a problem AI created.

The HN community flagged that the cross-verification step — agents confirming findings with each other before surfacing them — is the actual innovation. Most AI review tools flood PRs with noise. Engineers learn to ignore noise. Signal gets buried. The filtering step is what makes the output actionable.

For builders, this changes the calculus on what "AI-assisted development" actually means. It's not just writing code faster. It's now: write, generate, review, constrain, ship. A full loop.

Mog: A Programming Language Designed for LLMs, Not Humans

On the same day, developer Ted shipped a Show HN for Mog — a statically typed, compiled language built from scratch for LLMs to write, not humans to read.

The design decisions are unusually coherent once you accept the premise:

  • Entire spec fits in ~3,200 tokens — small enough to live in a model's context window
  • No operator precedence — every expression requires explicit parentheses like (a + b) * c, eliminating LLM ambiguity
  • Capability-based permissions — host app explicitly controls which functions the Mog program can access; no surprise syscalls
  • Runtime plugin loading — agents can compile and inject new modules mid-session without restarting; relevant for long-running background agents
  • Compiled to native code via Rust — low latency, strong safety guarantees
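The capability-based permissions point is worth making concrete. The pattern — the host hands the generated program an explicit allow-list of callable functions, and anything not granted simply doesn't exist — looks roughly like this sketch in Python. The class and function names are mine, not Mog's:

```python
class Host:
    """Host app that exposes capabilities to a generated program
    only via explicit grants — no ambient authority, no surprise syscalls."""
    def __init__(self):
        self._functions = {
            "read_file": lambda path: f"<contents of {path}>",
            "send_email": lambda to, body: f"sent to {to}",
        }

    def grant(self, names):
        # Return only the capabilities the host chose to expose.
        return {n: self._functions[n] for n in names}

def agent_program(caps):
    # The generated program can only call what it was handed.
    text = caps["read_file"]("notes.txt")
    if "send_email" not in caps:
        return f"read ok; email capability not granted ({text})"
    return caps["send_email"]("a@b.c", text)

host = Host()
caps = host.grant(["read_file"])  # deliberately withhold send_email
print(agent_program(caps))
```

In a language built around this, the grant is enforced by the runtime rather than by convention — the generated code cannot even name a function it wasn't given.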

The conceptual shift here is real. Programming languages have always been designed around human cognitive constraints: readability, maintainability, debuggability. Mog flips the assumption. The primary "developer" is an LLM, and the design optimizes for that.

The HN thread surfaced the obvious tension: language adoption requires ecosystem. A language written by something that doesn't need Stack Overflow answers sidesteps that friction in a weird way. The chicken-and-egg problem looks different when one side of the equation is a model, not a developer.

Mog is one person's project right now. But the design question it raises is legitimate: if agents are going to write significant volumes of code, a language designed for agent constraints — bounded spec, no ambiguity, capability isolation — could be meaningfully safer than generating Python or JavaScript with their full footgun suites.

Terence Tao Is Using Claude Code for Lean 4 Proofs

This one is worth pausing on.

Fields Medal winner Terence Tao — by most reasonable measures the greatest living mathematician — published a video demonstrating how he uses Claude Code to convert informal mathematical arguments into Lean 4 formal proofs.

For context: Lean 4 is a proof assistant where every logical step must be machine-verifiable. No handwaving. No "it's obvious that." If it compiles, it's correct. Error tolerance is essentially zero.
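To make "if it compiles, it's correct" concrete, here is a toy Lean 4 theorem — not one of Tao's, just an illustration of the register he's working in:

```lean
-- Every step is machine-checked: if Lean accepts this file, the proof is valid.
theorem sum_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

A wrong proof doesn't produce a subtly wrong result — it simply fails to compile. That zero-tolerance property is what makes Lean both valuable and tedious to write by hand.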

Tao uses Claude Code as what he calls the "translation layer" — taking his intuitive mathematical reasoning and converting it into the rigid syntax Lean demands. The mathematical insight (the hard part) stays human. The formalization (the tedious part) goes to Claude Code.

The "AI helps math genius" headline isn't the interesting part. The workflow model is. Tao is using AI the same way sophisticated engineers use it: as a force multiplier for mechanical steps, freeing cognitive bandwidth for the parts that require genuine reasoning.

For builders watching this: the convergence of AI code generation with symbolic reasoning systems like Lean or Coq could matter for safety-critical software. Formally verified code, AI-assisted. That's not today's product. But the trajectory is pointing there.

OpenClaw's 68K Stars Are Building an Ecosystem in Real Time

OpenClaw — a self-hosted personal AI agent framework — crossed 68,000 GitHub stars. The star count matters less than what's forming around it.

In about 48 hours this week, three independent teams shipped products directly to Hacker News targeting OpenClaw specifically:

Clawcard — gives AI agents real governed identities: email inbox, SMS number, virtual Mastercards with spend limits, encrypted credential vault with full audit trail. You hand your agent a card with guardrails and let it operate.

HELmR — every agent action passes through an authorization airlock enforcing mission budgets, capability tokens, and deterministic execution control. Nothing executes without clearing the checkpoint.

Time Machine — "Git for agent execution." When an agent fails at step 9 of a 10-step workflow, you don't re-run from step 1. Time Machine forks from step 8, lets you swap a model or edit a prompt, replays only downstream steps, diffs the two runs side by side. Explicit target: teams burning $100+ daily on complete re-runs after partial failures.
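The checkpoint-and-replay model behind "Git for agent execution" is easy to sketch: snapshot state before each step, and on failure, fork from the last snapshot instead of step 1. This is a minimal illustration of the idea under my own assumptions, not Time Machine's actual API:

```python
import copy

def run_workflow(steps, state=None, checkpoints=None, start=0):
    """Run steps sequentially, snapshotting state before each one.
    On failure, return the failing index plus all checkpoints so a
    caller can fork from just before the failure."""
    state = state if state is not None else {}
    checkpoints = checkpoints if checkpoints is not None else {}
    for i in range(start, len(steps)):
        checkpoints[i] = copy.deepcopy(state)  # snapshot before the step runs
        try:
            steps[i](state)
        except Exception:
            return ("failed_at", i, checkpoints)
    return ("ok", state, checkpoints)

# A 3-step workflow where the last step fails on its first attempt.
calls = {"n": 0}
def step_a(s): s["a"] = 1
def step_b(s): s["b"] = 2
def flaky(s):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("model hallucinated")
    s["c"] = 3

steps = [step_a, step_b, flaky]
status, failed_at, cps = run_workflow(steps)

# Fork from the checkpoint taken just before the failing step — perhaps
# after swapping the model or editing the prompt — and replay only the
# downstream steps. Steps 0 and 1 never re-run.
status2, final, _ = run_workflow(steps, state=cps[failed_at],
                                 checkpoints=cps, start=failed_at)
print(status2, final)  # → ok {'a': 1, 'b': 2, 'c': 3}
```

The economics follow directly: if steps 1–8 each cost real model calls, replaying only step 9 onward is what turns "$100+ daily on complete re-runs" into cents.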

This is what platform formation looks like before anyone announces it. Not the framework — the services ecosystem being built on top. Think Stripe in 2011. The businesses that win this cycle won't build the agent framework. They'll provide governance infrastructure, specialized skills, and enterprise deployment for organizations that adopt OpenClaw but can't self-serve the operational complexity.

One risk China's V2EX community is already flagging: skill-level supply chain risk. Installing an unvetted skill from a stranger is essentially running arbitrary code. Their current advice: run skills through an AI audit before installing, and use Docker sandbox mode by default. Worth taking seriously before your agent has a credit card.

What This Means for Builders

  • The bottleneck just moved again. AI-assisted generation solved the writing problem and created a review problem. Code Review is the first serious automated answer to that. Expect similar tools targeting testing, deployment validation, and spec verification within the next 6–12 months.

  • Capability isolation is the coming default. Mog, Clawcard, HELmR — different architectures, same underlying pressure. When agents have filesystem access, internet access, and credit cards, the industry will standardize on explicit capability grants rather than implicit trust. Build your tools with that assumption now.

  • AI citation is the new SEO signal. This week's data shows AI-referred traffic converts at 14.2% vs. 2.8% for traditional organic. When a brand appears in an AI Overview, its traditional organic CTR rises 35%. Structure your content with explicit answers and clear headers — not just for Google, but for the AI systems that are increasingly the first stop before Google.

  • Platform moments create services markets. OpenClaw at 68K stars is early. The businesses that build identity management, governance tooling, and enterprise deployment around it have a head start on a market that's forming right now. The same logic applies to any framework crossing that threshold.


Full report (including infrastructure finance analysis, Google's March core update breakdown, and the A-share capital flow data): Zecheng Intel Daily — March 10, 2026
