<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Critique</title>
    <description>The latest articles on DEV Community by Critique (@critiquedotsh).</description>
    <link>https://dev.to/critiquedotsh</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3963709%2Ff1bad91b-1351-4843-803a-ffdeaad824ba.png</url>
      <title>DEV Community: Critique</title>
      <link>https://dev.to/critiquedotsh</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/critiquedotsh"/>
    <language>en</language>
    <item>
      <title>So I Made an Easy Cloud Coding Agent as an API</title>
      <dc:creator>Critique</dc:creator>
      <pubDate>Thu, 04 Jun 2026 00:41:35 +0000</pubDate>
      <link>https://dev.to/critiquedotsh/so-we-made-a-easy-cloud-coding-agent-as-a-api-4m4f</link>
      <guid>https://dev.to/critiquedotsh/so-we-made-a-easy-cloud-coding-agent-as-a-api-4m4f</guid>
      <description>&lt;p&gt;I got tired of watching coding agents spin up from scratch every single time I sent them a prompt. Cold starts, re-cloning massive monorepos, pasting the previous context into a synthetic prompt block — it worked, but it felt fundamentally wrong for agents that are supposed to &lt;em&gt;think&lt;/em&gt; in conversations.&lt;/p&gt;

&lt;p&gt;So we shipped persistent sessions for the &lt;a href="https://www.critique.sh/blog/coding-agent-api" rel="noopener noreferrer"&gt;Critique Coding Agent API&lt;/a&gt;. Here's what changed, why the harness matters, and why you should never run a coding agent without a review skill.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Agents That Forget
&lt;/h2&gt;

&lt;p&gt;When we first released the Coding Agent API, follow-ups were honest but clunky: every follow-up was a brand-new job. The previous output was replayed as plain text into a fresh sandbox.&lt;/p&gt;

&lt;p&gt;It was the right MVP. It billed predictably. It never pretended a dead sandbox was alive.&lt;/p&gt;

&lt;p&gt;But it was the wrong long-term shape. If your internal bot fixes a migration, then wants a follow-up test, then wants a small doc tweak — you don't want three cold starts. You want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;One&lt;/strong&gt; repository checkout&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One&lt;/strong&gt; OpenCode session&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A control plane that understands turns&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What Changed: Persistent Sessions
&lt;/h2&gt;

&lt;p&gt;After the first turn completes, the run now enters &lt;code&gt;idle&lt;/code&gt; status. The E2B sandbox and OpenCode server &lt;strong&gt;stay up&lt;/strong&gt; until &lt;code&gt;sessionExpiresAt&lt;/code&gt; or until you explicitly POST &lt;code&gt;endSession: true&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The next prompt you send is delivered as a &lt;strong&gt;real message&lt;/strong&gt; in that same session — not a synthetic "prior run output" block in a brand-new sandbox.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before (Chained MVP):&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Turn 1 completes → Sandbox killed → Turn 2 = new job + pasted prior summary&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Now (Persistent):&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Turn 1 completes → &lt;code&gt;idle&lt;/code&gt; → Sandbox warm → Turn 2 = message into same OpenCode session&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Same &lt;code&gt;run.id&lt;/code&gt;. Same checkout. Same context. Just the next turn.&lt;/p&gt;




&lt;h2&gt;
  
  
  How It Works Under the Hood
&lt;/h2&gt;

&lt;p&gt;On the first turn, Critique:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Creates an E2B sandbox from the OpenCode template&lt;/li&gt;
&lt;li&gt;Clones your repository at the requested ref&lt;/li&gt;
&lt;li&gt;Bootstraps tooling and starts &lt;code&gt;opencode serve&lt;/code&gt; on localhost inside the VM&lt;/li&gt;
&lt;li&gt;Opens an OpenCode session&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Instead of killing that sandbox after completion, we now store &lt;strong&gt;session bindings&lt;/strong&gt; (sandbox ID, OpenCode base URL, session ID) on the job and mark the run idle with an expiry aligned to your sandbox timeout.&lt;/p&gt;

&lt;p&gt;When you queue a follow-up, &lt;a href="https://upstash.com/docs/qstash" rel="noopener noreferrer"&gt;QStash&lt;/a&gt; reconnects to the same sandbox, verifies OpenCode health, and POSTs your new prompt to &lt;code&gt;/session/{id}/message&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If OpenCode is unhealthy or the session aged out, the messages route returns a conflict — and you can still fall back to the older chained run behavior. We'd rather spawn a fresh sandbox than silently corrupt repo state.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why We Chose OpenCode as the Harness
&lt;/h2&gt;

&lt;p&gt;We researched the open-source options and kept the MVP on OpenCode. Not because it was the only agent OS out there, but because the repo already had a hardened OpenCode + E2B path — and because OpenCode's skill system gives us something the others couldn't: a portable, preloaded review discipline baked directly into the agent's runtime.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Runtime Stack
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenCode&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The embedded engine — exposes a headless HTTP server with sessions, messages, diffs, shell, files, and generated SDK support. Our sandbox worker already uses this server path.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;E2B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The isolation layer — gives us ephemeral repo clones, command execution, environment injection, and sandbox teardown.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenHands&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;On the watchlist. A larger open-source agent platform and SDK. Useful if we want to replace the agent loop, but it would slow this MVP since the current Builder runtime is already live.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  One Review Skill, Three Agent Operating Systems
&lt;/h3&gt;

&lt;p&gt;Most coding agents can write code faster than most teams can reliably audit it. That is already true in 2026. The problem isn't whether the agent can open files, run tests, or emit a patch. The problem is that &lt;strong&gt;review quality still drifts if you leave the job at the level of a generic prompt&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;"Review this PR" sounds precise to a human and underspecified to a model. One harness will produce style commentary. Another will summarize the diff and call it a review. Another will confidently escalate a weak hunch into a merge blocker because nothing in its instructions told it how to separate a verified finding from an open question.&lt;/p&gt;

&lt;p&gt;That is exactly the hole &lt;code&gt;critique-review&lt;/code&gt; closes. And it works across all the major agent operating systems:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Anthropic — Claude Code&lt;/strong&gt;&lt;br&gt;
Native skills, subagents, project memory, and background delegation make Claude a strong home for a dedicated review persona.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nous Research — Hermes Agent&lt;/strong&gt;&lt;br&gt;
Hermes treats skills as portable procedural memory and can carry the same review discipline across CLI, messaging, and long-lived remote sessions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenAI — Codex&lt;/strong&gt;&lt;br&gt;
Codex gives the skill a durable place inside CLI, IDE, app, and repo-local workflows, with &lt;code&gt;AGENTS.md&lt;/code&gt; and team-shared skills for repeatability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenCode — Our Harness Choice&lt;/strong&gt;&lt;br&gt;
For the Coding Agent API, OpenCode is the fit. It loads &lt;code&gt;critique-review&lt;/code&gt; through the project skill path, reads the supporting reference files for output contract, intake and triage, stack lenses, and review rubric — then generates its verdict. That preload is why we chose it. The agent doesn't improvise a rubric; it follows one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Prompt-Only Loop:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Ask agent to review → Agent improvises rubric → Mixed quality comments → Human re-validates everything&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;The Critique-Review Loop:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Load skill → Establish scope + risk map → Verify before reporting → Findings first + explicit verdict&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  Model Freedom: Bring Your Own Brain
&lt;/h2&gt;

&lt;p&gt;The Coding Agent API doesn't lock you into a single model provider. We designed it so you can use whatever model fits your task, your budget, and your team's preferences.&lt;/p&gt;
&lt;h3&gt;
  
  
  Managed Billing
&lt;/h3&gt;

&lt;p&gt;Use our model catalog, plan gates, E2B runtime, and credit accounting. Pick from Anthropic, OpenAI, Moonshot, and more — we handle the rest.&lt;/p&gt;
&lt;h3&gt;
  
  
  OpenRouter Billing
&lt;/h3&gt;

&lt;p&gt;Paste your &lt;code&gt;sk-or-v1-...&lt;/code&gt; key. Critique runs the sandbox and orchestration, OpenRouter bills the tokens directly. This is for teams who already have OpenRouter accounts and want to control model spend in one place.&lt;/p&gt;
&lt;h3&gt;
  
  
  Why This Matters for Review and Codegen
&lt;/h3&gt;

&lt;p&gt;When we tested the same PR on the same model lane (Moonshot Kimi K2.6) with and without the &lt;code&gt;critique-review&lt;/code&gt; skill, the difference wasn't the model — it was the procedure. The model was identical. The skill changed the calibration.&lt;/p&gt;

&lt;p&gt;This is why model freedom matters: &lt;strong&gt;the discipline should travel, not depend on a specific vendor's prompt tuning&lt;/strong&gt;. Whether you run Claude Sonnet for a complex refactor or a cheaper model for a routine dependency bump, &lt;code&gt;critique-review&lt;/code&gt; ensures the review output follows the same artifact shape: severity, file or line, impact, failure mode, fix direction, verdict.&lt;/p&gt;


&lt;h2&gt;
  
  
  Real Experiment: Same PR, Same Model, Only the Skill Changed
&lt;/h2&gt;

&lt;p&gt;The cleanest way to test a review skill is to keep the code input fixed and change only the review procedure. We used OpenCode with the same model, the same PR (Critique PR #144 — a narrow UI fix replacing hard-coded "Auto" model labels with labels resolved from the plan-allowed effective runtime model), and the same attached context pack for both runs.&lt;/p&gt;

&lt;p&gt;The baseline run had no project-local review skill available. The second run exposed &lt;code&gt;critique-review&lt;/code&gt; through the project skill path.&lt;/p&gt;

&lt;p&gt;Same PR, same model, same context pack — the skill changes calibration, not the diff.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;Prompt-Only OpenCode&lt;/th&gt;
&lt;th&gt;OpenCode + critique-review&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Actionable findings&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3 findings&lt;/td&gt;
&lt;td&gt;0 actionable findings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Treatment of unseen consumers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Escalated as a finding even though the attached context could not verify other call sites.&lt;/td&gt;
&lt;td&gt;Downgraded to residual risk and suggested a typecheck instead of claiming a bug.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Treatment of missing tests&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Escalated as its own finding.&lt;/td&gt;
&lt;td&gt;Recorded in checks and residual risk instead of turning it into a blocker for a narrow UI-label fix.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Blast-radius framing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Broader, more defensive, less bounded to the actual changed behavior.&lt;/td&gt;
&lt;td&gt;Explicitly bounded to automation settings UI with no auth or data-path changes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Verdict&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Conditionally approved&lt;/td&gt;
&lt;td&gt;No objection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Observed harness behavior&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Direct review output only.&lt;/td&gt;
&lt;td&gt;Loaded &lt;code&gt;critique-review&lt;/code&gt; and read four supporting reference files before answering.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Interpretation:&lt;/strong&gt; the skill did not make the model "nicer"; it made the model &lt;strong&gt;stricter about evidence and more conservative about what counts as a finding&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The baseline review isn't absurd. It spots plausible follow-up work. The problem is calibration. It promotes unverifiable concerns into findings. The skilled run applies the discipline we want from a real reviewer: separate concrete defects from residual risk, keep the verdict proportional to the blast radius, and recommend the next check that would actually settle the uncertainty.&lt;/p&gt;


&lt;h2&gt;
  
  
  What the Skill Actually Changes
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;For the Agent:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It stops treating review as free-form prose and starts from review mode, diff shape, and blast radius.&lt;/li&gt;
&lt;li&gt;It is told to read tests, trace data flow, and verify claims before escalating them.&lt;/li&gt;
&lt;li&gt;It separates findings from open questions instead of collapsing uncertainty into noise.&lt;/li&gt;
&lt;li&gt;It ends with a merge-shaped artifact: severity, file or line, impact, failure mode, fix direction, verdict.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For the Team:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The review standard travels across tools instead of living inside one vendor prompt box.&lt;/li&gt;
&lt;li&gt;The same policy can be reused by humans, local agents, background agents, and CI-style automation.&lt;/li&gt;
&lt;li&gt;Review quality becomes easier to inspect because the artifact shape is stable from run to run.&lt;/li&gt;
&lt;li&gt;The team can upgrade harnesses later without throwing away its review discipline.&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Who Persistent Sessions Are For
&lt;/h2&gt;

&lt;p&gt;Persistent sessions reward multi-step automation. One-shot scripts can stay on chained fallbacks.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Team&lt;/th&gt;
&lt;th&gt;Typical Job&lt;/th&gt;
&lt;th&gt;Why Persistent Sessions Help&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Platform Engineering&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Own an internal "fix bot" or codegen service&lt;/td&gt;
&lt;td&gt;Ticket → code → tests → PR — avoid re-cloning large monorepos on every message&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Developer Experience&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Wire Critique into Backstage or a custom portal&lt;/td&gt;
&lt;td&gt;Iterative refactors from product specs — same run ID maps to a real agent thread&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Security / Compliance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Remediate findings with human checkpoints&lt;/td&gt;
&lt;td&gt;Findings batch → patch → verification turn — session continuity keeps branch context intact&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Single-shot CI Scripts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Nightly dependency bump&lt;/td&gt;
&lt;td&gt;Chained fallback is fine; idle adds little value&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;h2&gt;
  
  
  Quickstart: Create a Run and Wait for Idle
&lt;/h2&gt;

&lt;p&gt;Use &lt;code&gt;crt_&lt;/code&gt; keys. New keys include Builder scopes; older keys may need rotation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://critique.sh/api/v1/coding-agent/runs &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer crt_..."&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "repository": "acme/web",
    "prompt": "Add Stripe webhook signature verification and tests.",
    "modelId": "anthropic/claude-sonnet-4.6",
    "billing": { "mode": "managed" },
    "publish": { "mode": "draft_pr" },
    "validationMode": "tests"
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A created run returns &lt;code&gt;run.id&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;, repository metadata, selected model, events, and a status URL. Poll the status endpoint until you hit idle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Poll until status is idle and sessionActive is true&lt;/span&gt;
curl &lt;span class="nt"&gt;-sS&lt;/span&gt; &lt;span class="s2"&gt;"https://critique.sh/api/v1/coding-agent/runs/{run_id}?patch=1"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer crt_..."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Full Script Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-euo&lt;/span&gt; pipefail

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CRT_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CRT_API_KEY&lt;/span&gt;:?set&lt;span class="p"&gt; CRT_API_KEY&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;REPO&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;REPO&lt;/span&gt;&lt;span class="k"&gt;:-&lt;/span&gt;&lt;span class="nv"&gt;acme&lt;/span&gt;&lt;span class="p"&gt;/web&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="nv"&gt;RUN_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;
  curl &lt;span class="nt"&gt;-sS&lt;/span&gt; https://critique.sh/api/v1/coding-agent/runs &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CRT_API_KEY&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"{
      &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;repository&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;REPO&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,
      &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;prompt&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Add Stripe webhook signature verification and unit tests.&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,
      &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;modelId&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;anthropic/claude-sonnet-4.6&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,
      &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;billing&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: { &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;mode&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;managed&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; },
      &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;publish&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: { &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;mode&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;draft_pr&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; },
      &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;validationMode&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;tests&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;
    }"&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.run.id'&lt;/span&gt;
&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Run id: &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RUN_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Stream live OpenCode activity while the turn executes&lt;/span&gt;
curl &lt;span class="nt"&gt;-N&lt;/span&gt; &lt;span class="s2"&gt;"https://critique.sh/api/v1/coding-agent/runs/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RUN_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/stream"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CRT_API_KEY&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Sending a Follow-Up Into the Same Session
&lt;/h2&gt;

&lt;p&gt;Once the run is &lt;code&gt;idle&lt;/code&gt; and &lt;code&gt;sessionActive&lt;/code&gt; is &lt;code&gt;true&lt;/code&gt;, just POST a new message. No re-clone, no cold start.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://critique.sh/api/v1/coding-agent/runs/&lt;span class="o"&gt;{&lt;/span&gt;run_id&lt;span class="o"&gt;}&lt;/span&gt;/messages &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer crt_..."&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "prompt": "Now add a regression test for expired signatures.",
    "publish": { "mode": "draft_pr" }
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is delivered as a real message in the same OpenCode session.&lt;/p&gt;




&lt;h2&gt;
  
  
  Two Billing Modes
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;How It Works&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Managed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Spends Critique credits. Uses our model catalog, plan gates, E2B runtime, and credit accounting.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenRouter&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Paste your &lt;code&gt;sk-or-v1-...&lt;/code&gt; key. Critique runs the sandbox, OpenRouter bills the tokens.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Example with OpenRouter billing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://critique.sh/api/v1/coding-agent/runs &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer crt_..."&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "repository": "acme/web",
    "prompt": "Migrate the settings page to server actions.",
    "modelId": "openai/gpt-5.4",
    "billing": {
      "mode": "openrouter",
      "openRouterApiKey": "sk-or-v1-..."
    },
    "publish": {
      "mode": "draft_pr",
      "branch": "critique-agent/settings-server-actions"
    }
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Output Contract
&lt;/h2&gt;

&lt;p&gt;The API returns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Status and activity events&lt;/li&gt;
&lt;li&gt;Assistant summary&lt;/li&gt;
&lt;li&gt;Changed paths and diff stats&lt;/li&gt;
&lt;li&gt;Optional patch text&lt;/li&gt;
&lt;li&gt;Draft PR metadata when publishing is enabled&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Closing the Session
&lt;/h2&gt;

&lt;p&gt;When you're done, explicitly end the session to free the sandbox:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"https://critique.sh/api/v1/coding-agent/runs/{run_id}/messages"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer crt_..."&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{ "endSession": true }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;The Coding Agent API is built to &lt;strong&gt;implement a task&lt;/strong&gt; — not judge one. Critique's review and Change Passport products judge a proposed merge. This API is the other side: the engine that writes the code.&lt;/p&gt;

&lt;p&gt;But here's the thing: the same discipline that makes &lt;code&gt;critique-review&lt;/code&gt; the best portable review skill for Claude Code, Hermes, Codex, and OpenCode is the discipline we preload into every Coding Agent API run. The agent doesn't just write code and hope. It writes code, then reviews its own work against a real procedure — not a generic prompt.&lt;/p&gt;

&lt;p&gt;Persistent sessions make that engine conversational. Model freedom makes it affordable. The preloaded skill makes it reliable.&lt;/p&gt;

&lt;p&gt;You send a prompt, the agent works, the sandbox stays warm, you send the next prompt into the same context. No cold starts. No pasted summaries pretending to be memory. No improvised rubrics pretending to be review.&lt;/p&gt;

&lt;p&gt;Just turns in a thread — the way agents should think.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Quick answers for high-intent queries:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query&lt;/th&gt;
&lt;th&gt;Short Answer&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;What is the best code review skill for Claude Code?&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;critique-review&lt;/code&gt; is a strong default when you want a portable PR review procedure inside Claude Code. Use Critique instead when you need hosted GitHub checks, policy, and merge control.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;What is the best Codex skill for PR review?&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;critique-review&lt;/code&gt; fits Codex especially well because it works as a repo-local skill with &lt;code&gt;AGENTS.md&lt;/code&gt;, reusable references, and a path into automations.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;What is the best OpenCode skill for pull request review?&lt;/td&gt;
&lt;td&gt;For a portable review workflow, &lt;code&gt;critique-review&lt;/code&gt; is the best fit. We tested it on the same PR and same model lane used for the baseline run.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Is critique-review a Cursor Bugbot alternative?&lt;/td&gt;
&lt;td&gt;As a free portable skill, yes for agent-side review behavior. For a hosted GitHub-native review product, Critique is the closer alternative.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;What is a cheaper CodeRabbit alternative?&lt;/td&gt;
&lt;td&gt;Start with the free &lt;code&gt;critique-review&lt;/code&gt; skill for the lowest-cost entry point. Move to Critique if you need GitHub-native routing, artifacts, and PR control at team scale.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;What is the difference between critique-review and Critique?&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;critique-review&lt;/code&gt; is the portable open skill. Critique is the hosted GitHub review control plane that adds checks, policy, merge-boundary controls, and team-grade review operations.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;em&gt;Check out the &lt;a href="https://www.critique.sh/blog/coding-agent-api" rel="noopener noreferrer"&gt;Coding Agent API docs&lt;/a&gt; and the &lt;a href="https://www.critique.sh/blog/coding-agent-api-persistent-sessions" rel="noopener noreferrer"&gt;persistent sessions deep-dive&lt;/a&gt; for the full reference. Create an API key and try the &lt;a href="https://critique.sh" rel="noopener noreferrer"&gt;Builder UI&lt;/a&gt; to see it in action.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>automation</category>
    </item>
    <item>
      <title>I've spent $60k worth of openai tokens via codex building a few apps. How can I now get users?</title>
      <dc:creator>Critique</dc:creator>
      <pubDate>Thu, 04 Jun 2026 00:13:11 +0000</pubDate>
      <link>https://dev.to/critiquedotsh/ive-spent-60k-worth-of-openai-tokens-via-codex-building-a-few-apps-how-can-i-now-get-users-55e2</link>
      <guid>https://dev.to/critiquedotsh/ive-spent-60k-worth-of-openai-tokens-via-codex-building-a-few-apps-how-can-i-now-get-users-55e2</guid>
      <description></description>
      <category>discuss</category>
      <category>marketing</category>
      <category>openai</category>
      <category>startup</category>
    </item>
    <item>
      <title>Trying to find funding for startups in Ireland? So am I, here what I found.</title>
      <dc:creator>Critique</dc:creator>
      <pubDate>Tue, 02 Jun 2026 23:25:28 +0000</pubDate>
      <link>https://dev.to/critiquedotsh/trying-to-find-funding-for-startups-in-ireland-so-am-i-here-what-i-found-56d1</link>
      <guid>https://dev.to/critiquedotsh/trying-to-find-funding-for-startups-in-ireland-so-am-i-here-what-i-found-56d1</guid>
      <description>&lt;p&gt;I'll be honest with you. A few weeks ago I had a mild crisis at my desk. Not a dramatic one — no throwing laptops or anything. Just that quiet, specific dread when you look at your roadmap and realise the next six months don't add up unless you do something about money.&lt;/p&gt;

&lt;p&gt;So I did what I always do. I went full nerd on it.&lt;/p&gt;

&lt;p&gt;I spent more evenings than I'd like to admit reading through Enterprise Ireland PDFs, trawling fund websites, messaging founders who'd been through various programmes, and basically building a mental map of the entire Irish funding ecosystem. Not the LinkedIn version where everything is "thrilled to announce" and "humbled by the journey." The real thing. The stuff you'd tell a friend over a pint.&lt;/p&gt;

&lt;p&gt;This post is that conversation.&lt;/p&gt;

&lt;p&gt;I'm writing it partly to crystallise my own thinking, partly because the information is genuinely scattered and hard to navigate, and partly because — look — if I'm going to spend hours figuring this out, I may as well make it useful for someone else. If you're building something in Ireland and thinking about how to fund it, hopefully this saves you a few nights.&lt;/p&gt;

&lt;p&gt;Before we get into it: yes, I'm actively looking at this for Critique.sh. We're an AI-powered code review platform — think multi-agent pull request intelligence for engineering teams. So my lens is very much "what's relevant for an AI-first B2B developer tool coming out of Ireland." I'll try to be useful beyond that niche, but I won't pretend to be neutral. These are my real notes.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Lay of the Land
&lt;/h2&gt;

&lt;p&gt;Here's the thing about Irish funding that surprised me when I actually dug in: it's more developed than the startup community often gives it credit for. The complaining about it being a small pond is real, but it's also a bit outdated. There's actual capital here now. There are funds that have done the work, backed companies through exits, and come out the other side with both money and conviction.&lt;/p&gt;

&lt;p&gt;The challenge isn't that funding doesn't exist. It's that the path isn't obvious and the information is terrible. Official websites are dry. Blog posts are two years out of date. Programme pages tell you about the cohort that just closed and nothing about when the next one opens.&lt;/p&gt;

&lt;p&gt;I'm going to try to fix that, at least a little.&lt;/p&gt;

&lt;p&gt;The ecosystem basically breaks into three layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Non-dilutive early support&lt;/strong&gt; — accelerators, grants, supports that help you get started without giving up equity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Seed-stage capital&lt;/strong&gt; — first real money, usually €100k–€1.5m&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Growth-stage capital&lt;/strong&gt; — Series A and beyond, once you've proved something
Most founders I've talked to have gone through all three in sequence, with Enterprise Ireland weaving through everything like connective tissue. Let's go layer by layer.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Layer One: Before You Take Any Money
&lt;/h2&gt;

&lt;p&gt;If you're genuinely early — idea is sharp, maybe an MVP exists, but you haven't found product-market fit yet — the best move is to not give up equity. Full stop. Ireland has some surprisingly good programmes here.&lt;/p&gt;

&lt;h3&gt;
  
  
  Enterprise Ireland New Frontiers
&lt;/h3&gt;

&lt;p&gt;This is the backbone. It's been running for years, it's run across 18 locations (universities, technological universities around the country), and it offers a support package that Enterprise Ireland values at over €40k. The headline number is a €15k tax-free stipend in Phase Two.&lt;/p&gt;

&lt;p&gt;Zero equity. None.&lt;/p&gt;

&lt;p&gt;I've spoken to four or five founders who went through New Frontiers and the reaction is consistent: it's not a startup school in the fluffy sense, it actually forces you to think like a business. It's competitive to get into, but if you're building something with genuine commercial ambition, the application is worth doing.&lt;/p&gt;

&lt;p&gt;The thing I didn't fully appreciate until someone explained it to me: New Frontiers also functions as a credentialing signal. If you've been through it, Enterprise Ireland and the VC ecosystem take you slightly more seriously. That's worth something.&lt;/p&gt;

&lt;h3&gt;
  
  
  NDRC Pre-Accelerator
&lt;/h3&gt;

&lt;p&gt;NDRC runs through RDI Hub and Republic of Work. Shorter and more sprint-like than New Frontiers. The energy is "build fast, show up, don't be precious."&lt;/p&gt;

&lt;p&gt;It's well suited for founders who need to test whether an idea actually holds up under pressure before committing to a longer programme. Spring and Summer cohorts. Good for getting reps in with your pitch and forcing yourself to talk to customers.&lt;/p&gt;

&lt;h3&gt;
  
  
  NovaUCD AI Ecosystem Accelerator
&lt;/h3&gt;

&lt;p&gt;This one caught my eye specifically because of where Critique.sh sits.&lt;/p&gt;

&lt;p&gt;It's run through NovaUCD in partnership with CeADAR — Ireland's national AI centre — and funded through European Digital Innovation Hubs. Six months, AI-first focus, includes commercial traction mentoring, fundraising support, technical depth, and a showcase event in October.&lt;/p&gt;

&lt;p&gt;The third edition just kicked off in 2026. For anyone building something where AI is the actual core of the product (not "AI-powered" as a marketing tag, but genuinely AI-native), this is one of the most relevant programmes in the country right now. I'm actively looking at this one.&lt;/p&gt;

&lt;h3&gt;
  
  
  CIRCULÉIRE Circular Venture Accelerator
&lt;/h3&gt;

&lt;p&gt;Niche, but I'm listing it because the right founder should know it exists. If your startup is in the circular economy space — materials, waste, sustainable manufacturing — CIRCULÉIRE is well-resourced, offers €5k equity-free plus genuine industry connections through Irish Manufacturing Research, and the mentoring is sector-specific rather than generic.&lt;/p&gt;

&lt;p&gt;Their 2026 deadline has just passed, but worth bookmarking.&lt;/p&gt;

&lt;h3&gt;
  
  
  NextWave
&lt;/h3&gt;

&lt;p&gt;Women-founded startup accelerator. I'm not the target audience here but I've heard strong things from founders who went through it. If you are the target audience — or you know someone who is — worth amplifying. Good community, real support.&lt;/p&gt;




&lt;h2&gt;
  
  
  Layer Two: The Cheque Writers
&lt;/h2&gt;

&lt;p&gt;After the accelerator stage you need actual capital. This is where the landscape gets more interesting, and honestly where a lot of Irish founders I talk to have the fuzziest picture.&lt;/p&gt;

&lt;h3&gt;
  
  
  Elkstone
&lt;/h3&gt;

&lt;p&gt;Elkstone has become the most visible name in Irish early-stage VC for good reason. They closed a €100m fund — the largest dedicated early-stage fund in Ireland — and have backed over 40 Irish companies. Flipdish. LetsGetChecked. Manna.&lt;/p&gt;

&lt;p&gt;What I noticed: their Fund II is structured with EIIS tax relief, which matters because it makes the fund attractive to Irish high-net-worth angels as LPs. That means Elkstone's network has real pull in the domestic ecosystem, not just top-line capital.&lt;/p&gt;

&lt;p&gt;I cold-emailed them and got a thoughtful reply within a few days. That's not nothing. Some VCs leave you reading tea leaves for weeks.&lt;/p&gt;

&lt;p&gt;Their focus: capital-light, internationally scalable tech. If you're building something that can work in Dublin and then in Berlin and then in Chicago — they want to hear about it.&lt;/p&gt;

&lt;p&gt;For Critique.sh specifically, the "developer tooling with AI at the core" angle fits the profile reasonably well. B2B SaaS for engineering teams travels internationally almost by default.&lt;/p&gt;

&lt;h3&gt;
  
  
  Furthr VC
&lt;/h3&gt;

&lt;p&gt;Formerly DBIC. Been around for ages and it shows — in a good way. Furthr is deeply relationship-driven and has genuine follow-on capacity, which matters more than founders often realise at seed. A VC who writes your first cheque but evaporates when you need a bridge is not a good VC.&lt;/p&gt;

&lt;p&gt;Multiple founders I've spoken to made the same observation: "Furthr actually stayed with us through the messy bits." That's the sentence you want to hear about a fund.&lt;/p&gt;

&lt;p&gt;They've facilitated over €200m in funding historically, with a strong B2B SaaS and medtech focus. Less useful if you're doing consumer, but for anything with an enterprise or prosumer sales motion, they're a strong fit.&lt;/p&gt;

&lt;h3&gt;
  
  
  Enterprise Equity
&lt;/h3&gt;

&lt;p&gt;Over 25 years of operation. Managing the €53m AIB Seed Capital Fund. Backed Phorest, StoryToys, a bunch of others I'd recognise from the ecosystem.&lt;/p&gt;

&lt;p&gt;Enterprise Equity feels more traditional than Elkstone or Furthr — they're not going to use the word "vibe" about a deal — but that's also a strength. They've seen multiple market cycles, they don't panic, and they have offices in Dublin, Cork, and Dundalk which actually means something in a country where being within reach of Munster can matter for a founder based there.&lt;/p&gt;

&lt;h3&gt;
  
  
  BVP (Business Venture Partners)
&lt;/h3&gt;

&lt;p&gt;Interesting structure: they blend equity and debt, which isn't common in the Irish market. If your startup can handle hybrid instruments, BVP is worth understanding.&lt;/p&gt;

&lt;p&gt;Focus areas: Climate, Health, Mobility, Emerging Tech. They also run an angel network called Connect X, which gives them a dual angle on sourcing deals and co-investing. For founders who want a VC who thinks about capital structure creatively, BVP is a conversation worth having.&lt;/p&gt;

&lt;h3&gt;
  
  
  MVB Ventures
&lt;/h3&gt;

&lt;p&gt;Newer, but genuinely serious. They're raising a €150m fund and write first cheques between €500k and €1.5m. Focus: Fintech, AI/ML, DefenceTech, EnergyTech, Quantum.&lt;/p&gt;

&lt;p&gt;They talk about doing "DNA-level" diligence, which can sound like marketing but from what I can tell they mean it — founders have described the process as intense but fair. The upside: if they back you, they actually believe it. Ireland/UK scope.&lt;/p&gt;

&lt;p&gt;For AI-first startups, MVB is one to add to the list. The AI/ML focus combined with the cheque size maps well to a seed/pre-Series A raise for a product with early enterprise traction.&lt;/p&gt;

&lt;h3&gt;
  
  
  SOSV
&lt;/h3&gt;

&lt;p&gt;Global deep-tech pre-seed firm with a Cork presence. Runs HAX (hardware/frontier tech) and IndieBio (life sciences). Not every startup fits — if you're pure software, this probably isn't your first call — but if you're doing anything with real-world physical systems or biotech, SOSV is the real deal globally, not just locally. The Irish Strategic Investment Fund is an LP, which grounds them in the ecosystem.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Not-So-Secret Weapon: Enterprise Ireland Direct
&lt;/h2&gt;

&lt;p&gt;I said Enterprise Ireland weaves through everything and I meant it. Even if you never take a direct instrument from them, their stamp on your company matters enormously for unlocking other capital.&lt;/p&gt;

&lt;p&gt;The main programmes worth knowing:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pre-Seed Start Fund&lt;/strong&gt; — Up to €100k as a convertible loan note. You need an MVP and some early traction, even basic. Comes with mentoring and access to market research that would otherwise cost you. This is often a founder's first "real" capital and I've heard it described as the thing that bought them three crucial months.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HPSU Feasibility Study Grant&lt;/strong&gt; — Up to €30k to stress-test your strategy. Useful specifically when you're trying to answer "is this actually a business" before you commit to raising a full round. No equity, structured as a grant.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Innovative HPSU Fund&lt;/strong&gt; — Up to €800k in co-investment for high-potential startups at a later stage. If you're already growing and need to accelerate, this is significant capital with Enterprise Ireland as a co-investor, which tends to pull other investors in behind it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HBAN&lt;/strong&gt; — The official Irish angel network. I was slightly sceptical of this one going in but the regional groups are genuinely active. Angels who've been through the Irish startup journey themselves and actually understand what "building in Ireland and selling globally" looks like in practice.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Rough Stage Map
&lt;/h2&gt;

&lt;p&gt;If you want the napkin version:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Where you are&lt;/th&gt;
&lt;th&gt;What to look at&lt;/th&gt;
&lt;th&gt;The logic&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Idea, not yet building&lt;/td&gt;
&lt;td&gt;New Frontiers + NDRC&lt;/td&gt;
&lt;td&gt;Equity-free, build founder discipline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MVP exists, testing demand&lt;/td&gt;
&lt;td&gt;Pre-Seed Start Fund&lt;/td&gt;
&lt;td&gt;Government money before you need VC terms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI-first product&lt;/td&gt;
&lt;td&gt;NovaUCD AI Accelerator&lt;/td&gt;
&lt;td&gt;Tailored, networked, technically credible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;First institutional raise&lt;/td&gt;
&lt;td&gt;Elkstone, Furthr, MVB&lt;/td&gt;
&lt;td&gt;Local cheques who understand the market&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scaling with traction&lt;/td&gt;
&lt;td&gt;Enterprise Equity, BVP, Furthr follow-on&lt;/td&gt;
&lt;td&gt;Patient capital that knows downturn cycles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deep-tech / hardware / bio&lt;/td&gt;
&lt;td&gt;SOSV&lt;/td&gt;
&lt;td&gt;Global network, sector-specific expertise&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  What I'm Actually Doing With This
&lt;/h2&gt;

&lt;p&gt;For Critique.sh specifically, I'm looking most closely at the NovaUCD AI Accelerator (timing is good, sector fit is strong) and starting conversations with Elkstone and MVB on the VC side given the AI/B2B positioning.&lt;/p&gt;

&lt;p&gt;I'm also mid-applying to Enterprise Ireland's Pre-Seed Start Fund. The convertible note structure is clean, the mentoring is real, and it buys runway without a full round of dilution while I prove the next metrics milestone.&lt;/p&gt;

&lt;p&gt;The broader thing I'd say after all this research: the Irish ecosystem rewards founders who treat it as a network problem, not just a capital problem. The funds are smaller than London or Berlin. The cheques are smaller. But the community is tight, the introductions travel fast, and a warm word from the right person can move faster than a perfect cold deck.&lt;/p&gt;

&lt;p&gt;If you're building something here — or building something &lt;em&gt;from&lt;/em&gt; here — the infrastructure exists. You just have to know where to look.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Critique.sh is an AI code review platform built for engineering teams who actually care about what lands in production. Multi-agent analysis, GitHub integration, the works. If that sounds like something your team needs — or if you want to compare notes on the funding journey — find me on X &lt;a href="https://x.com/@rayk69420" rel="noopener noreferrer"&gt;@rayk69420&lt;/a&gt; or just try the product at &lt;a href="https://critique.sh" rel="noopener noreferrer"&gt;critique.sh&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>startup</category>
      <category>ai</category>
      <category>ireland</category>
      <category>buildinpublic</category>
    </item>
    <item>
      <title>AI code review pricing is getting weird in 2026</title>
      <dc:creator>Critique</dc:creator>
      <pubDate>Tue, 02 Jun 2026 04:32:35 +0000</pubDate>
      <link>https://dev.to/critiquedotsh/ai-code-review-pricing-is-getting-weird-in-2026-5a1f</link>
      <guid>https://dev.to/critiquedotsh/ai-code-review-pricing-is-getting-weird-in-2026-5a1f</guid>
      <description>&lt;p&gt;AI code review pricing used to be easy to compare.&lt;/p&gt;

&lt;p&gt;How much per developer per month?&lt;/p&gt;

&lt;p&gt;That question is not useless, but it is no longer enough. In 2026, the actual bill can depend on seats, pull request volume, model usage, review effort, private-repo runner minutes, and whether the tool runs a shallow diff pass or an agentic review with broader repository context.&lt;/p&gt;

&lt;p&gt;The pricing page is only the start of the story.&lt;/p&gt;

&lt;p&gt;The four pricing shapes to compare&lt;br&gt;
If you are buying AI pull request review this year, you probably need to compare at least four models:&lt;/p&gt;

&lt;p&gt;Per-developer seats&lt;br&gt;
Usage-based review runs&lt;br&gt;
AI credits or model usage&lt;br&gt;
CI/runtime minutes for agentic review&lt;br&gt;
Those are not just different billing labels. They reward different behavior.&lt;/p&gt;

&lt;p&gt;Seat pricing is easy for finance. Usage pricing tracks workload better. AI credits expose the model bill. Runtime minutes show up when the review agent needs infrastructure, not just inference.&lt;/p&gt;

&lt;p&gt;The trap is comparing only the headline price.&lt;/p&gt;

&lt;p&gt;Seats are predictable, until usage is uneven&lt;br&gt;
CodeRabbit is the cleanest example of familiar seat pricing.&lt;/p&gt;

&lt;p&gt;As of this check, CodeRabbit documents Pro at $24 per developer per month when billed annually, or $30 month-to-month. Pro+ is listed at $48 per developer per month annually, or $60 month-to-month. Their docs also describe per-developer review limits and a usage-based add-on for eligible over-limit reviews.&lt;/p&gt;

&lt;p&gt;That is straightforward to budget.&lt;/p&gt;

&lt;p&gt;But it can still be awkward to right-size.&lt;/p&gt;

&lt;p&gt;A 6-person platform team touching auth, billing, queues, migrations, and infra may create more review risk than a 20-person team mostly shipping small UI changes. Seat count does not tell you how many PRs need deep review.&lt;/p&gt;

&lt;p&gt;The useful question is not:&lt;/p&gt;

&lt;p&gt;How many developers do we have?&lt;/p&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;p&gt;Which pull requests are expensive if the reviewer misses something?&lt;/p&gt;

&lt;p&gt;Usage pricing matches work, but needs policy&lt;br&gt;
Cursor's Bugbot is the clearest recent shift.&lt;/p&gt;

&lt;p&gt;Cursor announced that Bugbot is moving from a $40 per-seat subscription to usage-based billing for Teams and Individual plans. They say the average Bugbot run costs about $1.00-$1.50, depending on PR size and complexity. They also connect usage billing to configurable effort levels, including deeper review settings.&lt;/p&gt;

&lt;p&gt;That makes sense. A one-file typo PR should not cost the same as a complicated refactor.&lt;/p&gt;

&lt;p&gt;But usage pricing needs guardrails.&lt;/p&gt;

&lt;p&gt;Before turning it on everywhere, decide:&lt;/p&gt;

&lt;p&gt;Which paths deserve deep review?&lt;br&gt;
Who can trigger expensive reruns?&lt;br&gt;
Should docs-only PRs get the same effort as auth changes?&lt;br&gt;
What is the monthly review budget?&lt;br&gt;
What counts as value: bugs found, risky merges blocked, or comment count?&lt;br&gt;
Without policy, usage-based review can become a slot machine attached to every pull request.&lt;/p&gt;

&lt;p&gt;GitHub Copilot adds another line item: runtime&lt;br&gt;
GitHub Copilot code review adds a different wrinkle.&lt;/p&gt;

&lt;p&gt;GitHub says Copilot code review is billed through AI Credits, and that private-repository reviews started consuming GitHub Actions minutes on June 1, 2026. GitHub's docs describe code review as having two cost components: AI credits for the model interaction, and Actions minutes for agentic capabilities like context gathering and tool use.&lt;/p&gt;

&lt;p&gt;That does not mean Copilot code review is bad.&lt;/p&gt;

&lt;p&gt;It means the bill can show up in more than one place.&lt;/p&gt;

&lt;p&gt;If your org already tracks Actions spend closely, fine. If Actions minutes are treated as background CI noise, review usage may be harder to notice until later.&lt;/p&gt;

&lt;p&gt;This is the new pattern: the cost of review is no longer only the model. It can also be the system around the model.&lt;/p&gt;

&lt;p&gt;Model choice is becoming a budget control&lt;br&gt;
This is the part most pricing pages still hide.&lt;/p&gt;

&lt;p&gt;Not every PR needs the strongest available model. Not every finding needs a frontier model to inspect it. A practical review system should let teams spend differently based on risk.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;Routine PRs can use cheaper review passes.&lt;br&gt;
Auth, billing, infra, permissions, migrations, and public APIs can trigger deeper review.&lt;br&gt;
Large or ambiguous diffs can escalate to stronger models.&lt;br&gt;
Specialist agents can inspect security, tests, performance, or architecture without making every run maximum-cost.&lt;br&gt;
Some teams may prefer bring-your-own-key so the model provider bills tokens directly.&lt;br&gt;
This is how we think about pricing in Critique.&lt;/p&gt;

&lt;p&gt;Critique's plans are built around shared review credits rather than per-developer seats. The current local pricing model is Solo at $19/mo with 750 credits, Pro at $49/mo with 3,000 credits, and Team at $149/mo with 10,000 credits plus frontier escalation lanes. The BYOK harness is $8/mo: Critique runs the orchestration layer, while OpenRouter or CrofAI bills model tokens separately.&lt;/p&gt;

&lt;p&gt;The point is not "credits are magically cheaper."&lt;/p&gt;

&lt;p&gt;The point is control. A team should be able to run broad, cheaper checks on everyday work and reserve expensive review for pull requests that can actually hurt production.&lt;/p&gt;

&lt;p&gt;The buyer question changed&lt;br&gt;
The old question was:&lt;/p&gt;

&lt;p&gt;Which AI code review tool has the cheapest plan?&lt;/p&gt;

&lt;p&gt;The better question is:&lt;/p&gt;

&lt;p&gt;What is the cost per useful review on the pull requests that matter?&lt;/p&gt;

&lt;p&gt;To answer that, model your own workload:&lt;/p&gt;

&lt;p&gt;Monthly PR volume&lt;br&gt;
Average changed files per PR&lt;br&gt;
Sensitive paths: auth, billing, data, infra, dependencies&lt;br&gt;
Private-repo CI/runtime cost&lt;br&gt;
Expected reruns&lt;br&gt;
False-positive tolerance&lt;br&gt;
True positives that would actually block a bad merge&lt;br&gt;
Then split PRs into tiers.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;Low risk: docs, copy, simple UI&lt;br&gt;
Medium risk: feature work, tests, internal APIs&lt;br&gt;
High risk: auth, billing, permissions, migrations, infra, public APIs&lt;br&gt;
Run the cheap path broadly. Escalate the risky path deliberately.&lt;/p&gt;

&lt;p&gt;That one habit matters more than arguing over whether a seat, run, credit, or minute looks cheaper in isolation.&lt;/p&gt;

&lt;p&gt;A practical checklist before buying&lt;br&gt;
Before installing an AI review tool across every repo, ask:&lt;/p&gt;

&lt;p&gt;Does pricing scale with seats, PRs, models, minutes, or all of the above?&lt;br&gt;
Can we set review effort by path, branch, or risk tier?&lt;br&gt;
Can maintainers control expensive reruns?&lt;br&gt;
Can we start advisory-only before requiring the check?&lt;br&gt;
Can we see what each review cost?&lt;br&gt;
Are private-repo runtime minutes part of the bill?&lt;br&gt;
Are model costs hidden, bundled, or directly billed through our own key?&lt;br&gt;
If the vendor cannot explain this clearly, the pricing is not simple. It is just under-described.&lt;/p&gt;

&lt;p&gt;Where a calculator helps&lt;br&gt;
I do not think teams should pick AI review tooling from a pricing table alone.&lt;/p&gt;

&lt;p&gt;Take one busy repository. Count a normal month of PRs. Split the PRs into low, medium, and high risk. Then estimate what each pricing model does to that workload.&lt;/p&gt;

&lt;p&gt;That is why we made a small PR review cost calculator:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.critique.sh/tools/pr-review-cost-calculator" rel="noopener noreferrer"&gt;https://www.critique.sh/tools/pr-review-cost-calculator&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Use it as a sanity check before turning any AI reviewer into a required gate.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>github</category>
      <category>codereview</category>
      <category>programming</category>
    </item>
    <item>
      <title>A security checklist for AI-generated pull requests</title>
      <dc:creator>Critique</dc:creator>
      <pubDate>Tue, 02 Jun 2026 04:31:12 +0000</pubDate>
      <link>https://dev.to/critiquedotsh/a-security-checklist-for-ai-generated-pull-requests-2n3j</link>
      <guid>https://dev.to/critiquedotsh/a-security-checklist-for-ai-generated-pull-requests-2n3j</guid>
      <description>&lt;p&gt;AI-generated code is not automatically insecure.&lt;/p&gt;

&lt;p&gt;The problem is that it can create convincing pull requests faster than teams can inspect them. The diff may be formatted well, the helper names may look reasonable, and the tests may be green. None of that proves the change preserved the security rules your app depends on.&lt;/p&gt;

&lt;p&gt;When I review AI-generated PRs, I use a short checklist. It is close to the way we wrote Critique's &lt;code&gt;[critique-review]&lt;/code&gt;(&lt;a href="https://www.critique.sh/skills/critique-review" rel="noopener noreferrer"&gt;https://www.critique.sh/skills/critique-review&lt;/a&gt;) skill: establish scope, map blast radius, trace risky paths, check authorization, and only report findings that are grounded in the actual code.&lt;/p&gt;

&lt;p&gt;No vague "this might be risky" comments. If there is a security concern, it should point to a real path and a real failure mode.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Start with blast radius
&lt;/h2&gt;

&lt;p&gt;Before reading every line, mark the parts of the system the PR touches.&lt;/p&gt;

&lt;p&gt;Pay extra attention to changes involving:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Auth&lt;/li&gt;
&lt;li&gt;Billing&lt;/li&gt;
&lt;li&gt;Permissions&lt;/li&gt;
&lt;li&gt;Data export or import&lt;/li&gt;
&lt;li&gt;Migrations&lt;/li&gt;
&lt;li&gt;Webhooks&lt;/li&gt;
&lt;li&gt;Background jobs&lt;/li&gt;
&lt;li&gt;Infrastructure&lt;/li&gt;
&lt;li&gt;Public APIs&lt;/li&gt;
&lt;li&gt;AI agents, tool calls, or model output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not every AI-generated diff deserves the same review depth. A copy tweak does not need the same pass as a webhook handler. A CSS fix is not token validation. A UI-only change is not the same as a database migration.&lt;/p&gt;

&lt;p&gt;The first question is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What is the worst thing this PR can affect if it is wrong?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That answer decides how hard you review.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Trace untrusted input
&lt;/h2&gt;

&lt;p&gt;Find anything that enters the system from outside:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Request bodies&lt;/li&gt;
&lt;li&gt;Headers&lt;/li&gt;
&lt;li&gt;Uploaded files&lt;/li&gt;
&lt;li&gt;Webhook payloads&lt;/li&gt;
&lt;li&gt;User-generated content&lt;/li&gt;
&lt;li&gt;Retrieved documents&lt;/li&gt;
&lt;li&gt;Model outputs&lt;/li&gt;
&lt;li&gt;Agent instructions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then follow where that data can go:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Database writes&lt;/li&gt;
&lt;li&gt;Logs&lt;/li&gt;
&lt;li&gt;Commands&lt;/li&gt;
&lt;li&gt;Prompts&lt;/li&gt;
&lt;li&gt;Tool calls&lt;/li&gt;
&lt;li&gt;External APIs&lt;/li&gt;
&lt;li&gt;Credentials&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI-generated code is often good at the happy path. It parses the payload, calls the helper, returns the response, and adds a test for the expected case.&lt;/p&gt;

&lt;p&gt;Security review is mostly about the other cases.&lt;/p&gt;

&lt;p&gt;What if the webhook payload is replayed? What if the uploaded file is bigger than expected? What if the retrieved document contains instructions for the model? What if a user passes another user's ID?&lt;/p&gt;

&lt;p&gt;Write the path down if needed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;external input -&amp;gt; validation -&amp;gt; permission check -&amp;gt; side effect
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If one of those steps is missing, that is where the review should slow down.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Check authorization, not just authentication
&lt;/h2&gt;

&lt;p&gt;This is the mistake I see most often in generated code.&lt;/p&gt;

&lt;p&gt;The PR checks that a user is logged in, but does not check whether that user can access the specific object.&lt;/p&gt;

&lt;p&gt;Authentication asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Who are you?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Authorization asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Are you allowed to do this specific thing?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can user A access user B's object?&lt;/li&gt;
&lt;li&gt;Can one tenant read another tenant's data?&lt;/li&gt;
&lt;li&gt;Can a non-admin reach an admin-only path?&lt;/li&gt;
&lt;li&gt;Did the change bypass an existing owner check?&lt;/li&gt;
&lt;li&gt;Does the API enforce the same rule as the UI?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not enough:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Unauthorized&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You still need the object-level check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;project&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getProject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;projectId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;project&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ownerId&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Forbidden&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a real multi-tenant app, even that may be too simple. You might need organization membership, role checks, feature policy, or plan limits.&lt;/p&gt;

&lt;p&gt;The point is not the exact code. The point is that "logged in" is rarely the whole rule.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Treat model output as untrusted
&lt;/h2&gt;

&lt;p&gt;If an LLM can influence a privileged action, its output is untrusted input.&lt;/p&gt;

&lt;p&gt;That includes output used for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tool calls&lt;/li&gt;
&lt;li&gt;File writes&lt;/li&gt;
&lt;li&gt;Shell commands&lt;/li&gt;
&lt;li&gt;API requests&lt;/li&gt;
&lt;li&gt;Database updates&lt;/li&gt;
&lt;li&gt;Workflow routing&lt;/li&gt;
&lt;li&gt;Prompt construction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Prompt injection is not only a chatbot problem. It is a tool authorization problem.&lt;/p&gt;

&lt;p&gt;The risky pattern looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;model reads untrusted content -&amp;gt; model decides action -&amp;gt; app executes action
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fix is not just "use a better prompt." Prompts help, but they are not a security boundary.&lt;/p&gt;

&lt;p&gt;Use boring controls:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Allowlist tools&lt;/li&gt;
&lt;li&gt;Validate tool arguments outside the model&lt;/li&gt;
&lt;li&gt;Scope credentials tightly&lt;/li&gt;
&lt;li&gt;Require confirmation for sensitive writes&lt;/li&gt;
&lt;li&gt;Keep read tools separate from write tools&lt;/li&gt;
&lt;li&gt;Log tool calls&lt;/li&gt;
&lt;li&gt;Fail closed when the request is unclear&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If a PR adds agent behavior, review it like a new public API. Ask what it can read, what it can write, and what happens when the input is hostile.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Validate the fix
&lt;/h2&gt;

&lt;p&gt;For security-sensitive changes, do not accept "looks patched."&lt;/p&gt;

&lt;p&gt;Ask for one of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A regression test&lt;/li&gt;
&lt;li&gt;A reproducer&lt;/li&gt;
&lt;li&gt;A before/after exploit path&lt;/li&gt;
&lt;li&gt;A clear invariant the code now enforces&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Good validation sounds like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Before: User A could request User B's invoice by ID.
After: The API checks organization membership before loading invoice details.
Test: A user from org_1 gets a 403 when requesting an invoice from org_2.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is much better than:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Fixed auth bug.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same rule applies to tests. A generated PR may include tests, but check what they prove. Happy-path coverage is useful. Boundary coverage is what catches the security bug.&lt;/p&gt;

&lt;p&gt;Look for negative tests:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Logged-out user cannot access the endpoint&lt;/li&gt;
&lt;li&gt;Normal user cannot access admin action&lt;/li&gt;
&lt;li&gt;Tenant A cannot update Tenant B's settings&lt;/li&gt;
&lt;li&gt;Invalid webhook signature is rejected&lt;/li&gt;
&lt;li&gt;Replayed webhook event does not double-apply&lt;/li&gt;
&lt;li&gt;Model output cannot call a disallowed tool&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the PR changes authorization and only tests the allowed case, the test suite is still missing the important part.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Keep review comments specific
&lt;/h2&gt;

&lt;p&gt;The least useful security review is a wall of generic warnings.&lt;/p&gt;

&lt;p&gt;Bad:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Make sure permissions are correct.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Better:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;This endpoint checks that a session exists, but it does not verify that the requested invoice belongs to the caller's organization. A user who can obtain another invoice ID may be able to read it. Load the invoice through an organization-scoped query or compare the invoice organization against the caller's memberships before returning it.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That gives the author something to fix.&lt;/p&gt;

&lt;p&gt;This is the part of Critique's &lt;code&gt;critique-review&lt;/code&gt; skill I like most. It pushes the reviewer to separate findings from guesses. A real finding needs a code path, an impact, and a fix direction. If the evidence is incomplete, call it an open question instead of pretending it is a confirmed bug.&lt;/p&gt;

&lt;p&gt;AI-generated code does not need a totally different review process.&lt;/p&gt;

&lt;p&gt;It needs a stricter one.&lt;/p&gt;

&lt;p&gt;Use the same standards you would use for human-written production code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;find the blast radius&lt;/li&gt;
&lt;li&gt;trace untrusted input&lt;/li&gt;
&lt;li&gt;check object-level authorization&lt;/li&gt;
&lt;li&gt;treat model output as untrusted&lt;/li&gt;
&lt;li&gt;require evidence for security fixes&lt;/li&gt;
&lt;li&gt;keep findings grounded in code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is not to block AI-generated PRs. The goal is to make them prove the same thing every production change should prove: the right users can do the right things, and the wrong users cannot.&lt;/p&gt;

&lt;p&gt;If you want the review posture in reusable form, the public &lt;code&gt;[critique-review](https://www.critique.sh/skills/critique-review)&lt;/code&gt; skill is built around that idea: fewer generic comments, more grounded findings.&lt;/p&gt;

</description>
      <category>security</category>
      <category>ai</category>
      <category>github</category>
      <category>devplusplus</category>
    </item>
  </channel>
</rss>
