DEV Community: Critique

So I Made an Easy Cloud Coding Agent as an API

Critique — Thu, 04 Jun 2026 00:41:35 +0000

I got tired of watching coding agents spin up from scratch every single time I sent them a prompt. Cold starts, re-cloning massive monorepos, pasting the previous context into a synthetic prompt block — it worked, but it felt fundamentally wrong for agents that are supposed to think in conversations.

So we shipped persistent sessions for the Critique Coding Agent API. Here's what changed, why the harness matters, and why you should never run a coding agent without a review skill.

The Problem: Agents That Forget

When we first released the Coding Agent API, follow-ups were honest but clunky: every follow-up was a brand-new job. The previous output was replayed as plain text into a fresh sandbox.

It was the right MVP. It billed predictably. It never pretended a dead sandbox was alive.

But it was the wrong long-term shape. If your internal bot fixes a migration, then wants a follow-up test, then wants a small doc tweak — you don't want three cold starts. You want:

One repository checkout
One OpenCode session
A control plane that understands turns

What Changed: Persistent Sessions

After the first turn completes, the run now enters idle status. The E2B sandbox and OpenCode server stay up until sessionExpiresAt or until you explicitly POST endSession: true.

The next prompt you send is delivered as a real message in that same session — not a synthetic "prior run output" block in a brand-new sandbox.

Before (Chained MVP):

Turn 1 completes → Sandbox killed → Turn 2 = new job + pasted prior summary

Now (Persistent):

Turn 1 completes → idle → Sandbox warm → Turn 2 = message into same OpenCode session

Same run.id. Same checkout. Same context. Just the next turn.

How It Works Under the Hood

On the first turn, Critique:

Creates an E2B sandbox from the OpenCode template
Clones your repository at the requested ref
Bootstraps tooling and starts opencode serve on localhost inside the VM
Opens an OpenCode session

Instead of killing that sandbox after completion, we now store session bindings (sandbox ID, OpenCode base URL, session ID) on the job and mark the run idle with an expiry aligned to your sandbox timeout.

When you queue a follow-up, QStash reconnects to the same sandbox, verifies OpenCode health, and POSTs your new prompt to /session/{id}/message.

If OpenCode is unhealthy or the session aged out, the messages route returns a conflict — and you can still fall back to the older chained run behavior. We'd rather spawn a fresh sandbox than silently corrupt repo state.

Why We Chose OpenCode as the Harness

We researched the open-source options and kept the MVP on OpenCode. Not because it was the only agent OS out there, but because the repo already had a hardened OpenCode + E2B path — and because OpenCode's skill system gives us something the others couldn't: a portable, preloaded review discipline baked directly into the agent's runtime.

The Runtime Stack

Component	Role
OpenCode	The embedded engine — exposes a headless HTTP server with sessions, messages, diffs, shell, files, and generated SDK support. Our sandbox worker already uses this server path.
E2B	The isolation layer — gives us ephemeral repo clones, command execution, environment injection, and sandbox teardown.
OpenHands	On the watchlist. A larger open-source agent platform and SDK. Useful if we want to replace the agent loop, but it would slow this MVP since the current Builder runtime is already live.

One Review Skill, Three Agent Operating Systems

Most coding agents can write code faster than most teams can reliably audit it. That is already true in 2026. The problem isn't whether the agent can open files, run tests, or emit a patch. The problem is that review quality still drifts if you leave the job at the level of a generic prompt.

"Review this PR" sounds precise to a human and underspecified to a model. One harness will produce style commentary. Another will summarize the diff and call it a review. Another will confidently escalate a weak hunch into a merge blocker because nothing in its instructions told it how to separate a verified finding from an open question.

That is exactly the hole critique-review closes. And it works across all the major agent operating systems:

Anthropic — Claude Code
Native skills, subagents, project memory, and background delegation make Claude a strong home for a dedicated review persona.

Nous Research — Hermes Agent
Hermes treats skills as portable procedural memory and can carry the same review discipline across CLI, messaging, and long-lived remote sessions.

OpenAI — Codex
Codex gives the skill a durable place inside CLI, IDE, app, and repo-local workflows, with AGENTS.md and team-shared skills for repeatability.

OpenCode — Our Harness Choice
For the Coding Agent API, OpenCode is the fit. It loads critique-review through the project skill path, reads the supporting reference files for output contract, intake and triage, stack lenses, and review rubric — then generates its verdict. That preload is why we chose it. The agent doesn't improvise a rubric; it follows one.

The Prompt-Only Loop:

Ask agent to review → Agent improvises rubric → Mixed quality comments → Human re-validates everything

The Critique-Review Loop:

Load skill → Establish scope + risk map → Verify before reporting → Findings first + explicit verdict

Model Freedom: Bring Your Own Brain

The Coding Agent API doesn't lock you into a single model provider. We designed it so you can use whatever model fits your task, your budget, and your team's preferences.

Managed Billing

Use our model catalog, plan gates, E2B runtime, and credit accounting. Pick from Anthropic, OpenAI, Moonshot, and more — we handle the rest.

OpenRouter Billing

Paste your sk-or-v1-... key. Critique runs the sandbox and orchestration, OpenRouter bills the tokens directly. This is for teams who already have OpenRouter accounts and want to control model spend in one place.

Why This Matters for Review and Codegen

We tested the same PR on the same model lane (Moonshot Kimi K2.6) with and without the critique-review skill. Because the model was identical in both runs, the difference came from the procedure: the skill changed the calibration.

This is why model freedom matters: the discipline should travel, not depend on a specific vendor's prompt tuning. Whether you run Claude Sonnet for a complex refactor or a cheaper model for a routine dependency bump, critique-review ensures the review output follows the same artifact shape: severity, file or line, impact, failure mode, fix direction, verdict.

Real Experiment: Same PR, Same Model, Only the Skill Changed

The cleanest way to test a review skill is to keep the code input fixed and change only the review procedure. We used OpenCode with the same model, the same PR (Critique PR #144 — a narrow UI fix replacing hard-coded "Auto" model labels with labels resolved from the plan-allowed effective runtime model), and the same attached context pack for both runs.

The baseline run had no project-local review skill available. The second run exposed critique-review through the project skill path.

Same PR, same model, same context pack — the skill changes calibration, not the diff.

Question	Prompt-Only OpenCode	OpenCode + critique-review
Actionable findings	3 findings	0 actionable findings
Treatment of unseen consumers	Escalated as a finding even though the attached context could not verify other call sites.	Downgraded to residual risk and suggested a typecheck instead of claiming a bug.
Treatment of missing tests	Escalated as its own finding.	Recorded in checks and residual risk instead of turning it into a blocker for a narrow UI-label fix.
Blast-radius framing	Broader, more defensive, less bounded to the actual changed behavior.	Explicitly bounded to automation settings UI with no auth or data-path changes.
Verdict	Conditionally approved	No objection
Observed harness behavior	Direct review output only.	Loaded `critique-review` and read four supporting reference files before answering.

Interpretation: the skill did not make the model "nicer"; it made the model stricter about evidence and more conservative about what counts as a finding.

The baseline review isn't absurd. It spots plausible follow-up work. The problem is calibration. It promotes unverifiable concerns into findings. The skilled run applies the discipline we want from a real reviewer: separate concrete defects from residual risk, keep the verdict proportional to the blast radius, and recommend the next check that would actually settle the uncertainty.

What the Skill Actually Changes

For the Agent:

It stops treating review as free-form prose and starts from review mode, diff shape, and blast radius.
It is told to read tests, trace data flow, and verify claims before escalating them.
It separates findings from open questions instead of collapsing uncertainty into noise.
It ends with a merge-shaped artifact: severity, file or line, impact, failure mode, fix direction, verdict.

For the Team:

The review standard travels across tools instead of living inside one vendor prompt box.
The same policy can be reused by humans, local agents, background agents, and CI-style automation.
Review quality becomes easier to inspect because the artifact shape is stable from run to run.
The team can upgrade harnesses later without throwing away its review discipline.

Who Persistent Sessions Are For

Persistent sessions reward multi-step automation. One-shot scripts can stay on chained fallbacks.

Team	Typical Job	Why Persistent Sessions Help
Platform Engineering	Own an internal "fix bot" or codegen service	Ticket → code → tests → PR — avoid re-cloning large monorepos on every message
Developer Experience	Wire Critique into Backstage or a custom portal	Iterative refactors from product specs — same run ID maps to a real agent thread
Security / Compliance	Remediate findings with human checkpoints	Findings batch → patch → verification turn — session continuity keeps branch context intact
Single-shot CI Scripts	Nightly dependency bump	Chained fallback is fine; idle adds little value

Quickstart: Create a Run and Wait for Idle

Use crt_ keys. New keys include Builder scopes; older keys may need rotation.

curl https://critique.sh/api/v1/coding-agent/runs \
  -H "Authorization: Bearer crt_..." \
  -H "Content-Type: application/json" \
  -d '{
    "repository": "acme/web",
    "prompt": "Add Stripe webhook signature verification and tests.",
    "modelId": "anthropic/claude-sonnet-4.6",
    "billing": { "mode": "managed" },
    "publish": { "mode": "draft_pr" },
    "validationMode": "tests"
  }'

A created run returns run.id, status, repository metadata, selected model, events, and a status URL. Poll the status endpoint until you hit idle:

# Poll until status is idle and sessionActive is true
curl -sS "https://critique.sh/api/v1/coding-agent/runs/{run_id}?patch=1" \
  -H "Authorization: Bearer crt_..."
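The polling step can be wrapped in a small loop. The sketch below is a minimal example, not an official client: it assumes the status response exposes run.status and run.sessionActive (the text names the fields but not their JSON paths), and the function names fetch_run_state and wait_for_idle are hypothetical.

```bash
#!/usr/bin/env bash
# Minimal polling sketch. Assumption: the status JSON exposes run.status and
# run.sessionActive; adjust the jq paths if the real response differs.

# Print "<status> <sessionActive>" for one run, e.g. "idle true".
fetch_run_state() {
  curl -fsS "https://critique.sh/api/v1/coding-agent/runs/${RUN_ID}?patch=1" \
    -H "Authorization: Bearer ${CRT_API_KEY}" |
    jq -r '"\(.run.status) \(.run.sessionActive)"'
}

# wait_for_idle FETCH_CMD [MAX_POLLS] [DELAY_SECONDS]
# Calls FETCH_CMD until it prints "idle true" or the poll budget runs out.
wait_for_idle() {
  local fetch max delay i state
  fetch="$1"
  max="${2:-60}"
  delay="${3:-5}"
  i=1
  state=""
  while [ "$i" -le "$max" ]; do
    state="$("$fetch")"
    if [ "$state" = "idle true" ]; then
      echo "idle after $i poll(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "gave up after $max poll(s); last state: $state" >&2
  return 1
}
```

With RUN_ID and CRT_API_KEY set, calling wait_for_idle fetch_run_state blocks until the run is ready for a follow-up. Passing the fetch command as an argument keeps the loop easy to dry-run with a stub.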

Full Script Example

#!/usr/bin/env bash
set -euo pipefail

# Fail fast if the API key is missing; default the repository for the demo.
export CRT_API_KEY="${CRT_API_KEY:?set CRT_API_KEY}"
export REPO="${REPO:-acme/web}"

# Build the JSON body with jq so special characters in REPO are escaped correctly.
PAYLOAD="$(
  jq -n --arg repo "${REPO}" '{
    repository: $repo,
    prompt: "Add Stripe webhook signature verification and unit tests.",
    modelId: "anthropic/claude-sonnet-4.6",
    billing: { mode: "managed" },
    publish: { mode: "draft_pr" },
    validationMode: "tests"
  }'
)"

# Create the run. -f makes curl exit non-zero on HTTP errors, and jq -e fails
# if .run.id is missing, so the script stops instead of streaming "null".
RUN_ID="$(
  curl -fsS https://critique.sh/api/v1/coding-agent/runs \
    -H "Authorization: Bearer ${CRT_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "${PAYLOAD}" | jq -er '.run.id'
)"

echo "Run id: ${RUN_ID}"

# Stream live OpenCode activity while the turn executes
curl -N "https://critique.sh/api/v1/coding-agent/runs/${RUN_ID}/stream" \
  -H "Authorization: Bearer ${CRT_API_KEY}"

Sending a Follow-Up Into the Same Session

Once the run is idle and sessionActive is true, just POST a new message. No re-clone, no cold start.

curl https://critique.sh/api/v1/coding-agent/runs/{run_id}/messages \
  -H "Authorization: Bearer crt_..." \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Now add a regression test for expired signatures.",
    "publish": { "mode": "draft_pr" }
  }'

This is delivered as a real message in the same OpenCode session.
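A client can combine the follow-up call with the conflict fallback described earlier. The sketch below is one way to do it, assuming the "conflict" surfaces as HTTP 409 (the text does not state the status code). follow_up_action and send_follow_up are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Follow-up sketch with a chained-run fallback.
# Assumption: a dead or expired session is reported as HTTP 409 Conflict.

# Map an HTTP status code from the messages route to the client's next step.
follow_up_action() {
  case "$1" in
    2??) echo "delivered" ;;       # message accepted into the live session
    409) echo "start_new_run" ;;   # session gone: create a fresh run instead
    *)   echo "fail" ;;            # anything else needs a retry policy or a human
  esac
}

# send_follow_up RUN_ID PROMPT
# POSTs the prompt and prints the next step based on the response status.
send_follow_up() {
  local code
  code="$(
    curl -sS -o /dev/null -w '%{http_code}' \
      "https://critique.sh/api/v1/coding-agent/runs/$1/messages" \
      -H "Authorization: Bearer ${CRT_API_KEY}" \
      -H "Content-Type: application/json" \
      -d "$(jq -n --arg prompt "$2" '{ prompt: $prompt, publish: { mode: "draft_pr" } }')"
  )"
  follow_up_action "$code"
}
```

When send_follow_up prints start_new_run, the caller would POST to the runs endpoint again and accept one cold start rather than risk a corrupted checkout.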

Two Billing Modes

Mode	How It Works
Managed	Spends Critique credits. Uses our model catalog, plan gates, E2B runtime, and credit accounting.
OpenRouter	Paste your `sk-or-v1-...` key. Critique runs the sandbox, OpenRouter bills the tokens.

Example with OpenRouter billing:

curl https://critique.sh/api/v1/coding-agent/runs \
  -H "Authorization: Bearer crt_..." \
  -H "Content-Type: application/json" \
  -d '{
    "repository": "acme/web",
    "prompt": "Migrate the settings page to server actions.",
    "modelId": "openai/gpt-5.4",
    "billing": {
      "mode": "openrouter",
      "openRouterApiKey": "sk-or-v1-..."
    },
    "publish": {
      "mode": "draft_pr",
      "branch": "critique-agent/settings-server-actions"
    }
  }'

The Output Contract

The API returns:

Status and activity events
Assistant summary
Changed paths and diff stats
Optional patch text
Draft PR metadata when publishing is enabled

Closing the Session

When you're done, explicitly end the session to free the sandbox:

curl -X POST "https://critique.sh/api/v1/coding-agent/runs/{run_id}/messages" \
  -H "Authorization: Bearer crt_..." \
  -H "Content-Type: application/json" \
  -d '{ "endSession": true }'

Why This Matters

The Coding Agent API is built to implement a task — not judge one. Critique's review and Change Passport products judge a proposed merge. This API is the other side: the engine that writes the code.

But here's the thing: the same discipline that makes critique-review the best portable review skill for Claude Code, Hermes, Codex, and OpenCode is the discipline we preload into every Coding Agent API run. The agent doesn't just write code and hope. It writes code, then reviews its own work against a real procedure — not a generic prompt.

Persistent sessions make that engine conversational. Model freedom makes it affordable. The preloaded skill makes it reliable.

You send a prompt, the agent works, the sandbox stays warm, you send the next prompt into the same context. No cold starts. No pasted summaries pretending to be memory. No improvised rubrics pretending to be review.

Just turns in a thread — the way agents should think.

Quick answers for high-intent queries:

Query	Short Answer
What is the best code review skill for Claude Code?	`critique-review` is a strong default when you want a portable PR review procedure inside Claude Code. Use Critique instead when you need hosted GitHub checks, policy, and merge control.
What is the best Codex skill for PR review?	`critique-review` fits Codex especially well because it works as a repo-local skill with `AGENTS.md`, reusable references, and a path into automations.
What is the best OpenCode skill for pull request review?	For a portable review workflow, `critique-review` is the best fit. We tested it on the same PR and same model lane used for the baseline run.
Is critique-review a Cursor Bugbot alternative?	As a free portable skill, yes for agent-side review behavior. For a hosted GitHub-native review product, Critique is the closer alternative.
What is a cheaper CodeRabbit alternative?	Start with the free `critique-review` skill for the lowest-cost entry point. Move to Critique if you need GitHub-native routing, artifacts, and PR control at team scale.
What is the difference between critique-review and Critique?	`critique-review` is the portable open skill. Critique is the hosted GitHub review control plane that adds checks, policy, merge-boundary controls, and team-grade review operations.

Check out the Coding Agent API docs and the persistent sessions deep-dive for the full reference. Create an API key and try the Builder UI to see it in action.

I've spent $60k worth of OpenAI tokens via Codex building a few apps. How can I now get users?

Critique — Thu, 04 Jun 2026 00:13:11 +0000

Trying to find funding for startups in Ireland? So am I. Here's what I found.

Critique — Tue, 02 Jun 2026 23:25:28 +0000

I'll be honest with you. A few weeks ago I had a mild crisis at my desk. Not a dramatic one — no throwing laptops or anything. Just that quiet, specific dread when you look at your roadmap and realise the next six months don't add up unless you do something about money.

So I did what I always do. I went full nerd on it.

I spent more evenings than I'd like to admit reading through Enterprise Ireland PDFs, trawling fund websites, messaging founders who'd been through various programmes, and basically building a mental map of the entire Irish funding ecosystem. Not the LinkedIn version where everything is "thrilled to announce" and "humbled by the journey." The real thing. The stuff you'd tell a friend over a pint.

This post is that conversation.

I'm writing it partly to crystallise my own thinking, partly because the information is genuinely scattered and hard to navigate, and partly because — look — if I'm going to spend hours figuring this out, I may as well make it useful for someone else. If you're building something in Ireland and thinking about how to fund it, hopefully this saves you a few nights.

Before we get into it: yes, I'm actively looking at this for Critique.sh. We're an AI-powered code review platform — think multi-agent pull request intelligence for engineering teams. So my lens is very much "what's relevant for an AI-first B2B developer tool coming out of Ireland." I'll try to be useful beyond that niche, but I won't pretend to be neutral. These are my real notes.

The Lay of the Land

Here's the thing about Irish funding that surprised me when I actually dug in: it's more developed than the startup community often gives it credit for. The complaining about it being a small pond is real, but it's also a bit outdated. There's actual capital here now. There are funds that have done the work, backed companies through exits, and come out the other side with both money and conviction.

The challenge isn't that funding doesn't exist. It's that the path isn't obvious and the information is terrible. Official websites are dry. Blog posts are two years out of date. Programme pages tell you about the cohort that just closed and nothing about when the next one opens.

I'm going to try to fix that, at least a little.

The ecosystem basically breaks into three layers:

Non-dilutive early support: accelerators, grants, supports that help you get started without giving up equity
Seed-stage capital: first real money, usually €100k–€1.5m
Growth-stage capital: Series A and beyond, once you've proved something

Most founders I've talked to have gone through all three in sequence, with Enterprise Ireland weaving through everything like connective tissue. Let's go layer by layer.

Layer One: Before You Take Any Money

If you're genuinely early — idea is sharp, maybe an MVP exists, but you haven't found product-market fit yet — the best move is to not give up equity. Full stop. Ireland has some surprisingly good programmes here.

Enterprise Ireland New Frontiers

This is the backbone. It has been running for years across 18 locations (universities and technological universities around the country), and it offers a support package that Enterprise Ireland values at over €40k. The headline number is a €15k tax-free stipend in Phase Two.

Zero equity. None.

I've spoken to four or five founders who went through New Frontiers and the reaction is consistent: it's not a startup school in the fluffy sense, it actually forces you to think like a business. It's competitive to get into, but if you're building something with genuine commercial ambition, the application is worth doing.

The thing I didn't fully appreciate until someone explained it to me: New Frontiers also functions as a credentialing signal. If you've been through it, Enterprise Ireland and the VC ecosystem take you slightly more seriously. That's worth something.

NDRC Pre-Accelerator

NDRC runs through RDI Hub and Republic of Work. Shorter and more sprint-like than New Frontiers. The energy is "build fast, show up, don't be precious."

It's well suited for founders who need to test whether an idea actually holds up under pressure before committing to a longer programme. Spring and Summer cohorts. Good for getting reps in with your pitch and forcing yourself to talk to customers.

NovaUCD AI Ecosystem Accelerator

This one caught my eye specifically because of where Critique.sh sits.

It's run through NovaUCD in partnership with CeADAR — Ireland's national AI centre — and funded through European Digital Innovation Hubs. Six months, AI-first focus, includes commercial traction mentoring, fundraising support, technical depth, and a showcase event in October.

The third edition just kicked off in 2026. For anyone building something where AI is the actual core of the product (not "AI-powered" as a marketing tag, but genuinely AI-native), this is one of the most relevant programmes in the country right now. I'm actively looking at this one.

CIRCULÉIRE Circular Venture Accelerator

Niche, but I'm listing it because the right founder should know it exists. If your startup is in the circular economy space — materials, waste, sustainable manufacturing — CIRCULÉIRE is well-resourced, offers €5k equity-free plus genuine industry connections through Irish Manufacturing Research, and the mentoring is sector-specific rather than generic.

Their 2026 deadline has just passed, but worth bookmarking.

NextWave

Women-founded startup accelerator. I'm not the target audience here but I've heard strong things from founders who went through it. If you are the target audience — or you know someone who is — worth amplifying. Good community, real support.

Layer Two: The Cheque Writers

After the accelerator stage you need actual capital. This is where the landscape gets more interesting, and honestly where a lot of Irish founders I talk to have the fuzziest picture.

Elkstone

Elkstone has become the most visible name in Irish early-stage VC for good reason. They closed a €100m fund — the largest dedicated early-stage fund in Ireland — and have backed over 40 Irish companies. Flipdish. LetsGetChecked. Manna.

What I noticed: their Fund II is structured with EIIS tax relief, which matters because it makes the fund attractive to Irish high-net-worth angels as LPs. That means Elkstone's network has real pull in the domestic ecosystem, not just top-line capital.

I cold-emailed them and got a thoughtful reply within a few days. That's not nothing. Some VCs leave you reading tea leaves for weeks.

Their focus: capital-light, internationally scalable tech. If you're building something that can work in Dublin and then in Berlin and then in Chicago — they want to hear about it.

For Critique.sh specifically, the "developer tooling with AI at the core" angle fits the profile reasonably well. B2B SaaS for engineering teams travels internationally almost by default.

Furthr VC

Formerly DBIC. Been around for ages and it shows — in a good way. Furthr is deeply relationship-driven and has genuine follow-on capacity, which matters more than founders often realise at seed. A VC who writes your first cheque but evaporates when you need a bridge is not a good VC.

Multiple founders I've spoken to made the same observation: "Furthr actually stayed with us through the messy bits." That's the sentence you want to hear about a fund.

They've facilitated over €200m in funding historically, with a strong B2B SaaS and medtech focus. Less useful if you're doing consumer, but for anything with an enterprise or prosumer sales motion, they're a strong fit.

Enterprise Equity

Over 25 years of operation. Managing the €53m AIB Seed Capital Fund. Backed Phorest, StoryToys, a bunch of others I'd recognise from the ecosystem.

Enterprise Equity feels more traditional than Elkstone or Furthr (they're not going to use the word "vibe" about a deal), but that's also a strength. They've seen multiple market cycles, they don't panic, and they have offices in Dublin, Cork, and Dundalk. That regional reach means something in a country where having an investor nearby can matter, for example to a founder based in Munster.

BVP (Business Venture Partners)

Interesting structure: they blend equity and debt, which isn't common in the Irish market. If your startup can handle hybrid instruments, BVP is worth understanding.

Focus areas: Climate, Health, Mobility, Emerging Tech. They also run an angel network called Connect X, which gives them a dual angle on sourcing deals and co-investing. For founders who want a VC who thinks about capital structure creatively, BVP is a conversation worth having.

MVB Ventures

Newer, but genuinely serious. They're raising a €150m fund and write first cheques between €500k and €1.5m. Focus: Fintech, AI/ML, DefenceTech, EnergyTech, Quantum.

They talk about doing "DNA-level" diligence, which can sound like marketing but from what I can tell they mean it — founders have described the process as intense but fair. The upside: if they back you, they actually believe it. Ireland/UK scope.

For AI-first startups, MVB is one to add to the list. The AI/ML focus combined with the cheque size maps well to a seed/pre-Series A raise for a product with early enterprise traction.

SOSV

Global deep-tech pre-seed firm with a Cork presence. Runs HAX (hardware/frontier tech) and IndieBio (life sciences). Not every startup fits — if you're pure software, this probably isn't your first call — but if you're doing anything with real-world physical systems or biotech, SOSV is the real deal globally, not just locally. The Irish Strategic Investment Fund is an LP, which grounds them in the ecosystem.

The Not-So-Secret Weapon: Enterprise Ireland Direct

I said Enterprise Ireland weaves through everything and I meant it. Even if you never take a direct instrument from them, their stamp on your company matters enormously for unlocking other capital.

The main programmes worth knowing:

Pre-Seed Start Fund — Up to €100k as a convertible loan note. You need an MVP and some early traction, even basic. Comes with mentoring and access to market research that would otherwise cost you. This is often a founder's first "real" capital and I've heard it described as the thing that bought them three crucial months.

HPSU Feasibility Study Grant — Up to €30k to stress-test your strategy. Useful specifically when you're trying to answer "is this actually a business" before you commit to raising a full round. No equity, structured as a grant.

Innovative HPSU Fund — Up to €800k in co-investment for high-potential startups at a later stage. If you're already growing and need to accelerate, this is significant capital with Enterprise Ireland as a co-investor, which tends to pull other investors in behind it.

HBAN — The official Irish angel network. I was slightly sceptical of this one going in but the regional groups are genuinely active. Angels who've been through the Irish startup journey themselves and actually understand what "building in Ireland and selling globally" looks like in practice.

A Rough Stage Map

If you want the napkin version:

Where you are	What to look at	The logic
Idea, not yet building	New Frontiers + NDRC	Equity-free, build founder discipline
MVP exists, testing demand	Pre-Seed Start Fund	Government money before you need VC terms
AI-first product	NovaUCD AI Accelerator	Tailored, networked, technically credible
First institutional raise	Elkstone, Furthr, MVB	Local cheques who understand the market
Scaling with traction	Enterprise Equity, BVP, Furthr follow-on	Patient capital that knows downturn cycles
Deep-tech / hardware / bio	SOSV	Global network, sector-specific expertise

What I'm Actually Doing With This

For Critique.sh specifically, I'm looking most closely at the NovaUCD AI Accelerator (timing is good, sector fit is strong) and starting conversations with Elkstone and MVB on the VC side given the AI/B2B positioning.

I'm also mid-applying to Enterprise Ireland's Pre-Seed Start Fund. The convertible note structure is clean, the mentoring is real, and it buys runway without a full round of dilution while I prove the next metrics milestone.

The broader thing I'd say after all this research: the Irish ecosystem rewards founders who treat it as a network problem, not just a capital problem. The funds are smaller than London or Berlin. The cheques are smaller. But the community is tight, the introductions travel fast, and a warm word from the right person can move faster than a perfect cold deck.

If you're building something here — or building something from here — the infrastructure exists. You just have to know where to look.

Critique.sh is an AI code review platform built for engineering teams who actually care about what lands in production. Multi-agent analysis, GitHub integration, the works. If that sounds like something your team needs — or if you want to compare notes on the funding journey — find me on X @rayk69420 or just try the product at critique.sh.

AI code review pricing is getting weird in 2026

Critique — Tue, 02 Jun 2026 04:32:35 +0000

AI code review pricing used to be easy to compare.

How much per developer per month?

That question is not useless, but it is no longer enough. In 2026, the actual bill can depend on seats, pull request volume, model usage, review effort, private-repo runner minutes, and whether the tool runs a shallow diff pass or an agentic review with broader repository context.

The pricing page is only the start of the story.

The four pricing shapes to compare

If you are buying AI pull request review this year, you probably need to compare at least four pricing models:

Per-developer seats
Usage-based review runs
AI credits or model usage
CI/runtime minutes for agentic review

Those are not just different billing labels. They reward different behavior.

Seat pricing is easy for finance. Usage pricing tracks workload better. AI credits expose the model bill. Runtime minutes show up when the review agent needs infrastructure, not just inference.

The trap is comparing only the headline price.

Seats are predictable, until usage is uneven

CodeRabbit is the cleanest example of familiar seat pricing.

As of this check, CodeRabbit documents Pro at $24 per developer per month when billed annually, or $30 month-to-month. Pro+ is listed at $48 per developer per month annually, or $60 month-to-month. Their docs also describe per-developer review limits and a usage-based add-on for eligible over-limit reviews.

That is straightforward to budget.

But it can still be awkward to right-size.

A 6-person platform team touching auth, billing, queues, migrations, and infra may create more review risk than a 20-person team mostly shipping small UI changes. Seat count does not tell you how many PRs need deep review.

The useful question is not:

How many developers do we have?

It is:

Which pull requests are expensive if the reviewer misses something?
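To put numbers on that mismatch, here is a small worked example using the CodeRabbit Pro prices quoted above. The team sizes come from the example in the text, seat_bill is a hypothetical helper, and the output says nothing about how much review each team actually needs.

```bash
#!/usr/bin/env bash
# Monthly seat bill = number of seats x price per seat (whole dollars).
seat_bill() {
  echo $(( $1 * $2 ))
}

# CodeRabbit Pro on annual billing: $24 per developer per month
echo "6-person platform team:  \$$(seat_bill 6 24)/mo"
echo "20-person UI-heavy team: \$$(seat_bill 20 24)/mo"

# Same teams on month-to-month billing: $30 per developer per month
echo "6-person platform team:  \$$(seat_bill 6 30)/mo"
echo "20-person UI-heavy team: \$$(seat_bill 20 30)/mo"
```

On annual billing the 20-person team pays $480 a month against $144 for the platform team, more than three times as much, even though the smaller team may be the one shipping the riskier pull requests.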

Usage pricing matches work, but needs policy

Cursor's Bugbot is the clearest recent shift.

Cursor announced that Bugbot is moving from a $40 per-seat subscription to usage-based billing for Teams and Individual plans. They say the average Bugbot run costs about $1.00-$1.50, depending on PR size and complexity. They also connect usage billing to configurable effort levels, including deeper review settings.

That makes sense. A one-file typo PR should not cost the same as a complicated refactor.

But usage pricing needs guardrails.

Before turning it on everywhere, decide:

Which paths deserve deep review?
Who can trigger expensive reruns?
Should docs-only PRs get the same effort as auth changes?
What is the monthly review budget?
What counts as value: bugs found, risky merges blocked, or comment count?

Without policy, usage-based review can become a slot machine attached to every pull request.
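Those questions can be written down as a small policy function. The sketch below is an assumption about how a team might encode them: the path prefixes, the $500 budget, and the `decideEffort` name are all made up for illustration and do not come from any vendor's configuration format.

```typescript
type Effort = "none" | "light" | "standard" | "deep";

interface ReviewRequest {
  changedPaths: string[];
  isRerun: boolean;
  requestedByMaintainer: boolean;
  spentThisMonthUsd: number;
}

// Illustrative values only; adapt to your own repository and budget.
const policy = {
  deepReviewPrefixes: ["src/auth/", "src/billing/", "migrations/"],
  docsPrefixes: ["docs/"],
  monthlyBudgetUsd: 500,
};

function matchesAny(path: string, prefixes: string[]): boolean {
  return prefixes.some((prefix) => path.startsWith(prefix));
}

function decideEffort(req: ReviewRequest): Effort {
  // Budget guardrail: stop spending once the monthly cap is reached.
  if (req.spentThisMonthUsd >= policy.monthlyBudgetUsd) return "none";
  // Only maintainers may trigger reruns.
  if (req.isRerun && !req.requestedByMaintainer) return "none";
  // Any sensitive path escalates the whole PR.
  if (req.changedPaths.some((p) => matchesAny(p, policy.deepReviewPrefixes))) return "deep";
  // Docs-only PRs get a light pass.
  const docsOnly =
    req.changedPaths.length > 0 &&
    req.changedPaths.every((p) => matchesAny(p, policy.docsPrefixes));
  return docsOnly ? "light" : "standard";
}
```

The order of the checks is the policy: budget first, then who is asking, then what the PR touches.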

GitHub Copilot adds another line item: runtime

GitHub Copilot code review adds a different wrinkle.

GitHub says Copilot code review is billed through AI Credits, and that private-repository reviews started consuming GitHub Actions minutes on June 1, 2026. GitHub's docs describe code review as having two cost components: AI credits for the model interaction, and Actions minutes for agentic capabilities like context gathering and tool use.

That does not mean Copilot code review is bad.

It means the bill can show up in more than one place.

If your org already tracks Actions spend closely, fine. If Actions minutes are treated as background CI noise, review usage may be harder to notice until later.

This is the new pattern: the cost of review is no longer only the model. It can also be the system around the model.

Model choice is becoming a budget control

This is the part most pricing pages still hide.

Not every PR needs the strongest available model. Not every finding needs a frontier model to inspect it. A practical review system should let teams spend differently based on risk.

For example:

Routine PRs can use cheaper review passes.
Auth, billing, infra, permissions, migrations, and public APIs can trigger deeper review.
Large or ambiguous diffs can escalate to stronger models.
Specialist agents can inspect security, tests, performance, or architecture without making every run maximum-cost.
Some teams may prefer bring-your-own-key so the model provider bills tokens directly.

This is how we think about pricing in Critique.

Critique's plans are built around shared review credits rather than per-developer seats. Current pricing is Solo at $19/mo with 750 credits, Pro at $49/mo with 3,000 credits, and Team at $149/mo with 10,000 credits plus frontier escalation lanes. The BYOK harness is $8/mo: Critique runs the orchestration layer, while OpenRouter or CrofAI bills model tokens separately.
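For a sense of scale, the plan figures above can be converted to a price per credit. This sketch only compares the plans to each other; how many credits a given review consumes is not stated here, so it says nothing about cost per review.

```typescript
interface Plan {
  name: string;
  monthlyUsd: number;
  credits: number;
}

// Plan figures as quoted in the paragraph above.
const plans: Plan[] = [
  { name: "Solo", monthlyUsd: 19, credits: 750 },
  { name: "Pro", monthlyUsd: 49, credits: 3000 },
  { name: "Team", monthlyUsd: 149, credits: 10000 },
];

// Price per credit in US cents, rounded to two decimals.
function centsPerCredit(plan: Plan): number {
  return Math.round((plan.monthlyUsd / plan.credits) * 10000) / 100;
}

const perCredit = plans.map((p) => ({ name: p.name, cents: centsPerCredit(p) }));
// Solo: 2.53, Pro: 1.63, Team: 1.49
```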

The point is not "credits are magically cheaper."

The point is control. A team should be able to run broad, cheaper checks on everyday work and reserve expensive review for pull requests that can actually hurt production.

The buyer question changed

The old question was:

Which AI code review tool has the cheapest plan?

The better question is:

What is the cost per useful review on the pull requests that matter?

To answer that, model your own workload:

Monthly PR volume
Average changed files per PR
Sensitive paths: auth, billing, data, infra, dependencies
Private-repo CI/runtime cost
Expected reruns
False-positive tolerance
True positives that would actually block a bad merge

Then split PRs into tiers.

Example:

Low risk: docs, copy, simple UI
Medium risk: feature work, tests, internal APIs
High risk: auth, billing, permissions, migrations, infra, public APIs

Run the cheap path broadly. Escalate the risky path deliberately.

That one habit matters more than arguing over whether a seat, run, credit, or minute looks cheaper in isolation.
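To make "cost per useful review" concrete, here is a rough sketch of the arithmetic. Every number in `workload` is invented for illustration, including the per-review costs and the share of reviews that produce a merge-blocking finding; plug in your own.

```typescript
interface Tier {
  name: string;
  prsPerMonth: number;
  costPerReviewUsd: number;  // assumed, not a vendor price
  usefulFindingRate: number; // share of reviews with a finding that would block a bad merge
}

function costPerUsefulReview(tiers: Tier[]): number {
  const totalCost = tiers.reduce((sum, t) => sum + t.prsPerMonth * t.costPerReviewUsd, 0);
  const usefulReviews = tiers.reduce((sum, t) => sum + t.prsPerMonth * t.usefulFindingRate, 0);
  // No useful reviews means the cost per useful review is unbounded.
  return usefulReviews === 0 ? Infinity : totalCost / usefulReviews;
}

// Hypothetical month for one busy repository.
const workload: Tier[] = [
  { name: "low", prsPerMonth: 120, costPerReviewUsd: 0.25, usefulFindingRate: 0.01 },
  { name: "medium", prsPerMonth: 60, costPerReviewUsd: 1.0, usefulFindingRate: 0.05 },
  { name: "high", prsPerMonth: 20, costPerReviewUsd: 3.0, usefulFindingRate: 0.2 },
];

const usdPerUsefulReview = costPerUsefulReview(workload);
```

With these made-up inputs the month costs $150 for about 8 useful reviews, or roughly $18 each. The number itself matters less than seeing how it moves when you change effort per tier.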

A practical checklist before buying

Before installing an AI review tool across every repo, ask:

Does pricing scale with seats, PRs, models, minutes, or all of the above?
Can we set review effort by path, branch, or risk tier?
Can maintainers control expensive reruns?
Can we start advisory-only before requiring the check?
Can we see what each review cost?
Are private-repo runtime minutes part of the bill?
Are model costs hidden, bundled, or directly billed through our own key?

If the vendor cannot explain this clearly, the pricing is not simple. It is just under-described.

Where a calculator helps

I do not think teams should pick AI review tooling from a pricing table alone.

Take one busy repository. Count a normal month of PRs. Split the PRs into low, medium, and high risk. Then estimate what each pricing model does to that workload.

That is why we made a small PR review cost calculator:

https://www.critique.sh/tools/pr-review-cost-calculator

Use it as a sanity check before turning any AI reviewer into a required gate.

A security checklist for AI-generated pull requests

Critique — Tue, 02 Jun 2026 04:31:12 +0000

AI-generated code is not automatically insecure.

The problem is that it can create convincing pull requests faster than teams can inspect them. The diff may be formatted well, the helper names may look reasonable, and the tests may be green. None of that proves the change preserved the security rules your app depends on.

When I review AI-generated PRs, I use a short checklist. It is close to the way we wrote Critique's [critique-review](https://www.critique.sh/skills/critique-review) skill: establish scope, map blast radius, trace risky paths, check authorization, and only report findings that are grounded in the actual code.

No vague "this might be risky" comments. If there is a security concern, it should point to a real path and a real failure mode.

1. Start with blast radius

Before reading every line, mark the parts of the system the PR touches.

Pay extra attention to changes involving:

Auth
Billing
Permissions
Data export or import
Migrations
Webhooks
Background jobs
Infrastructure
Public APIs
AI agents, tool calls, or model output

Not every AI-generated diff deserves the same review depth. A copy tweak does not need the same pass as a webhook handler. A CSS fix is not token validation. A UI-only change is not the same as a database migration.

The first question is simple:

What is the worst thing this PR can affect if it is wrong?

That answer decides how hard you review.
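A first pass at blast radius can be automated from the changed paths alone. This is a minimal sketch, assuming path names hint at what the code does; the hint lists and the `blastRadius` function are hypothetical and would need tuning for a real repository.

```typescript
type Radius = "low" | "medium" | "high";

// Hypothetical hints; adapt to your repository layout.
const highRiskHints = ["auth", "billing", "permission", "migration", "webhook", "infra"];
const lowRiskExtensions = [".md", ".css", ".txt"];

function blastRadius(changedPaths: string[]): Radius {
  const paths = changedPaths.map((p) => p.toLowerCase());
  // One sensitive file is enough to escalate the whole PR.
  if (paths.some((p) => highRiskHints.some((hint) => p.includes(hint)))) return "high";
  // Only call a PR low risk when every file is clearly cosmetic.
  const allCosmetic =
    paths.length > 0 && paths.every((p) => lowRiskExtensions.some((ext) => p.endsWith(ext)));
  return allCosmetic ? "low" : "medium";
}
```

Substring matching over-triggers on purpose: a file like `docs/auth-guide.md` is flagged high. For a triage step, a false escalation is cheaper than a missed one.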

2. Trace untrusted input

Find anything that enters the system from outside:

Request bodies
Headers
Uploaded files
Webhook payloads
User-generated content
Retrieved documents
Model outputs
Agent instructions

Then follow where that data can go:

Database writes
Logs
Commands
Prompts
Tool calls
External APIs
Credentials

AI-generated code is often good at the happy path. It parses the payload, calls the helper, returns the response, and adds a test for the expected case.

Security review is mostly about the other cases.

What if the webhook payload is replayed? What if the uploaded file is bigger than expected? What if the retrieved document contains instructions for the model? What if a user passes another user's ID?

Write the path down if needed:

external input -> validation -> permission check -> side effect

If one of those steps is missing, that is where the review should slow down.
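Here is a rough sketch of that path for a webhook that credits an account. The event shape, status codes, and in-memory stores are assumptions for illustration; a real handler would also verify the provider's signature before any of this and keep seen event IDs in durable storage.

```typescript
interface WebhookEvent {
  id: string;
  accountId: string;
  amountCents: number;
}

interface HandlerResult {
  status: number;
  reason?: string;
}

const seenEventIds = new Set<string>();      // stand-in for a durable store
const creditedCents = new Map<string, number>();

// Validation: accept only the exact shape we expect.
function parseEvent(body: unknown): WebhookEvent | null {
  if (typeof body !== "object" || body === null) return null;
  const raw = body as Record<string, unknown>;
  const id = raw["id"];
  const accountId = raw["accountId"];
  const amountCents = raw["amountCents"];
  if (typeof id !== "string" || typeof accountId !== "string") return null;
  if (typeof amountCents !== "number" || !Number.isInteger(amountCents) || amountCents <= 0) return null;
  return { id, accountId, amountCents };
}

function handleWebhook(body: unknown, allowedAccountIds: string[]): HandlerResult {
  const event = parseEvent(body);
  if (!event) return { status: 400, reason: "invalid payload" };

  // Permission check: the sender may only act on accounts it is allowed to touch.
  if (!allowedAccountIds.includes(event.accountId)) {
    return { status: 403, reason: "account not allowed" };
  }

  // Replay protection: the same event ID must not apply twice.
  if (seenEventIds.has(event.id)) return { status: 409, reason: "replayed event" };
  seenEventIds.add(event.id);

  // Side effect happens last, after every check has passed.
  creditedCents.set(event.accountId, (creditedCents.get(event.accountId) ?? 0) + event.amountCents);
  return { status: 200 };
}
```

When reviewing a generated handler, the thing to look for is the same ordering: nothing is written until validation, permission, and replay checks have all passed.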

3. Check authorization, not just authentication

This is the mistake I see most often in generated code.

The PR checks that a user is logged in, but does not check whether that user can access the specific object.

Authentication asks:

Who are you?

Authorization asks:

Are you allowed to do this specific thing?

Ask:

Can user A access user B's object?
Can one tenant read another tenant's data?
Can a non-admin reach an admin-only path?
Did the change bypass an existing owner check?
Does the API enforce the same rule as the UI?

This is not enough:

// Authentication only: confirms a session exists, not what it may access.
if (!session.user) {
  throw new Error("Unauthorized")
}

You still need the object-level check:

const project = await getProject(projectId)

// Object-level check: deny when the project is missing or owned by someone else.
if (!project || project.ownerId !== session.user.id) {
  throw new Error("Forbidden")
}

In a real multi-tenant app, even that may be too simple. You might need organization membership, role checks, feature policy, or plan limits.

The point is not the exact code. The point is that "logged in" is rarely the whole rule.
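As one example of a fuller rule, here is a sketch that combines organization membership with a minimum role. The types, role names, and `canAccessProject` are hypothetical; your app's model will differ.

```typescript
type Role = "viewer" | "editor" | "admin";

interface Membership {
  orgId: string;
  role: Role;
}

interface User {
  id: string;
  memberships: Membership[];
}

interface Project {
  id: string;
  orgId: string;
}

const roleRank: Record<Role, number> = { viewer: 0, editor: 1, admin: 2 };

function canAccessProject(user: User, project: Project, requiredRole: Role): boolean {
  // Tenant boundary first: no membership in the project's org means no access.
  const membership = user.memberships.find((m) => m.orgId === project.orgId);
  if (!membership) return false;
  // Then the role check within that organization.
  return roleRank[membership.role] >= roleRank[requiredRole];
}
```

The function takes the specific project as an argument. A check that never looks at the object it is protecting is an authentication check, whatever it is named.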

4. Treat model output as untrusted

If an LLM can influence a privileged action, its output is untrusted input.

That includes output used for:

Tool calls
File writes
Shell commands
API requests
Database updates
Workflow routing
Prompt construction

Prompt injection is not only a chatbot problem. It is a tool authorization problem.

The risky pattern looks like this:

model reads untrusted content -> model decides action -> app executes action

The fix is not just "use a better prompt." Prompts help, but they are not a security boundary.

Use boring controls:

Allowlist tools
Validate tool arguments outside the model
Scope credentials tightly
Require confirmation for sensitive writes
Keep read tools separate from write tools
Log tool calls
Fail closed when the request is unclear

If a PR adds agent behavior, review it like a new public API. Ask what it can read, what it can write, and what happens when the input is hostile.
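Several of those controls fit in one small gate between the model and the tools. This is a sketch under stated assumptions: the tool names, the argument rules, and `authorizeToolCall` are invented for illustration and do not describe any particular agent framework.

```typescript
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface ToolSpec {
  write: boolean;
  validate: (args: Record<string, unknown>) => boolean;
}

type Decision = "allow" | "needs_confirmation" | "deny";

// Allowlist: only tools registered here can ever run.
const allowedTools = new Map<string, ToolSpec>([
  ["read_file", {
    write: false,
    validate: (args) => {
      const path = args["path"];
      return typeof path === "string" && path.startsWith("docs/") && !path.includes("..");
    },
  }],
  ["create_comment", {
    write: true,
    validate: (args) => {
      const body = args["body"];
      return typeof body === "string" && body.length <= 2000;
    },
  }],
]);

function authorizeToolCall(call: ToolCall, auditLog: string[]): Decision {
  const spec = allowedTools.get(call.name);
  // Fail closed: unknown tools and invalid arguments are denied.
  let decision: Decision = "deny";
  if (spec && spec.validate(call.args)) {
    // Write tools never run on the model's say-so alone.
    decision = spec.write ? "needs_confirmation" : "allow";
  }
  auditLog.push(`${call.name}: ${decision}`);
  return decision;
}
```

None of this depends on the prompt. The model can ask for anything; the gate decides what runs.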

5. Validate the fix

For security-sensitive changes, do not accept "looks patched."

Ask for one of:

A regression test
A reproducer
A before/after exploit path
A clear invariant the code now enforces

Good validation sounds like this:

Before: User A could request User B's invoice by ID.
After: The API checks organization membership before loading invoice details.
Test: A user from org_1 gets a 403 when requesting an invoice from org_2.

That is much better than:

Fixed auth bug.

The same rule applies to tests. A generated PR may include tests, but check what they prove. Happy-path coverage is useful. Boundary coverage is what catches the security bug.

Look for negative tests:

Logged-out user cannot access the endpoint
Normal user cannot access admin action
Tenant A cannot update Tenant B's settings
Invalid webhook signature is rejected
Replayed webhook event does not double-apply
Model output cannot call a disallowed tool

If the PR changes authorization and only tests the allowed case, the test suite is still missing the important part.
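A minimal sketch of what that looks like for the invoice example above. The handler, the fixture data, and the `expectStatus` helper are hypothetical and framework-free so the shape of the tests is easy to see; in a real suite these would be cases in your test runner.

```typescript
interface Caller {
  userId: string;
  orgId: string;
}

interface Invoice {
  id: string;
  orgId: string;
}

const invoices: Invoice[] = [
  { id: "inv_1", orgId: "org_1" },
  { id: "inv_2", orgId: "org_2" },
];

// Hypothetical handler under test: returns an HTTP-style status code.
function getInvoiceStatus(caller: Caller | null, invoiceId: string): number {
  if (!caller) return 401;
  const invoice = invoices.find((i) => i.id === invoiceId);
  // Same answer for "missing" and "not yours", so IDs cannot be probed.
  if (!invoice || invoice.orgId !== caller.orgId) return 403;
  return 200;
}

function expectStatus(actual: number, expected: number, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${expected}, got ${actual}`);
  }
}

const alice: Caller = { userId: "u_1", orgId: "org_1" };

// One allowed case, three denied cases.
expectStatus(getInvoiceStatus(alice, "inv_1"), 200, "own org is allowed");
expectStatus(getInvoiceStatus(null, "inv_1"), 401, "logged-out user is rejected");
expectStatus(getInvoiceStatus(alice, "inv_2"), 403, "org_1 user cannot read org_2 invoice");
expectStatus(getInvoiceStatus(alice, "inv_missing"), 403, "unknown invoice ID is rejected");
```

The ratio is the useful signal in review: if a generated PR ships one allowed case and no denied cases, ask for the other three.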

6. Keep review comments specific

The least useful security review is a wall of generic warnings.

Bad:

Make sure permissions are correct.

Better:

This endpoint checks that a session exists, but it does not verify that the requested invoice belongs to the caller's organization. A user who can obtain another invoice ID may be able to read it. Load the invoice through an organization-scoped query or compare the invoice organization against the caller's memberships before returning it.

That gives the author something to fix.

This is the part of Critique's critique-review skill I like most. It pushes the reviewer to separate findings from guesses. A real finding needs a code path, an impact, and a fix direction. If the evidence is incomplete, call it an open question instead of pretending it is a confirmed bug.
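That separation can be expressed as a small rule. The sketch below is my own illustration of the idea, not how the critique-review skill is implemented; the `ReviewNote` shape and `classify` function are assumptions.

```typescript
interface ReviewNote {
  summary: string;
  codePath?: string;     // where the issue occurs
  impact?: string;       // what goes wrong, and for whom
  fixDirection?: string; // what the author should change
}

type NoteKind = "finding" | "open_question";

function classify(note: ReviewNote): NoteKind {
  // A finding needs all three pieces of evidence; anything less is a question.
  const grounded = [note.codePath, note.impact, note.fixDirection].every(
    (field) => typeof field === "string" && field.trim().length > 0
  );
  return grounded ? "finding" : "open_question";
}

const vague: ReviewNote = { summary: "Make sure permissions are correct." };

const grounded: ReviewNote = {
  summary: "Invoice endpoint lacks an organization check",
  codePath: "invoice endpoint handler",
  impact: "A user who obtains another invoice ID may be able to read it",
  fixDirection: "Load the invoice through an organization-scoped query",
};
```

Here `classify(vague)` yields an open question and `classify(grounded)` yields a finding, which mirrors the bad and better comments above.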

AI-generated code does not need a totally different review process.

It needs a stricter one.

Use the same standards you would use for human-written production code:

find the blast radius
trace untrusted input
check object-level authorization
treat model output as untrusted
require evidence for security fixes
keep findings grounded in code

The goal is not to block AI-generated PRs. The goal is to make them prove the same thing every production change should prove: the right users can do the right things, and the wrong users cannot.

If you want the review posture in reusable form, the public [critique-review](https://www.critique.sh/skills/critique-review) skill is built around that idea: fewer generic comments, more grounded findings.