DEV Community

Mehmet TURAÇ

Dynamic Workflows in Opus 4.8: Build a Self-Verifying PR Reviewer

You stopped being the loop

Most people use Opus 4.8 the way they used every model before it: open a chat, type a request, watch the cursor, correct it, repeat. That's a conversation. A dynamic workflow is something else entirely.

The shift is this: you stop being the loop. Instead, an orchestrator — plain code you control — spawns subagents you design, fanning out work in parallel, running steps in sequence, judging and merging results, and reporting back when the whole thing is done. Opus 4.8 can drive hundreds of parallel subagents inside a single workflow, with effort control per node so cheap steps stay cheap and hard steps think harder.

In this tutorial you'll learn the core patterns by building one concrete thing: a pull-request reviewer that fans out across correctness, security, and performance, then adversarially verifies every finding before it reaches you.

// You design the shape. The orchestrator runs it.
const found    = await parallel(DIMENSIONS.map(d => () => agent(d.prompt, { schema: FINDINGS })))
const deduped  = dedupeByFileLine(found.flatMap(r => r.findings))
const verified = await parallel(deduped.map(f => () => agent(refutePrompt(f), { schema: VERDICT })))
const real     = deduped.filter((f, i) => verified[i].refuted === false) // verified[i] judges deduped[i]

By the end you'll know when to reach for parallel() versus pipeline(), how structured output schemas keep subagents composable, and where to set effort per node.

The mental model: it's a graph, not a prompt

Stop thinking "I send a prompt, I get a completion." Start thinking: an orchestrator runs a workflow graph, and each node is an agent call. The orchestrator is plain code. It decides what runs, in what order, and what to do with each result. Subagents are the leaf workers — each gets a focused prompt, a structured-output schema, and its own effort setting. The unit of work is no longer the prompt; it's the graph.

Two primitives compose every graph, and the difference between them is entirely about barriers — when the orchestrator blocks and waits.

parallel() is a barrier

parallel() fans work out to many subagents at once and resolves only when all of them return. Nothing downstream runs until the slowest node finishes. Use it for independent work that must be fully collected before the next decision — one subagent per review dimension, N-way verification, hundreds of concurrent checks.

// FAN-OUT: dimensions are independent → run them together
const found = await parallel(
  DIMENSIONS.map(d => () => agent(d.prompt, { schema: FINDINGS, effort: "medium" }))
)
// barrier: every dimension has returned before we continue
const deduped = dedupeByFileLine(found.flatMap(r => r.findings)) // plain code, no agent

Note the () => thunks. parallel() invokes them itself — it schedules the work; it doesn't receive already-started promises.
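Thunks are what let the orchestrator decide when each node starts. The article doesn't show how `parallel()` is built, so the following is a minimal local sketch, assuming it behaves like `Promise.all` over invoked thunks: results come back in input order, and the call rejects if any node rejects. The `limit` option is a hypothetical addition that shows why deferred work matters, since an already-started promise couldn't be queued behind a concurrency cap.

```javascript
// Hypothetical stand-in for the article's parallel() primitive, not a real API.
// Invokes thunks (functions returning promises) with at most `limit` in flight
// and resolves only after every one has finished: a barrier.
async function parallel(thunks, { limit = Infinity } = {}) {
  const results = new Array(thunks.length);
  let next = 0; // index of the next thunk that has not been started

  // Each worker keeps pulling unstarted thunks until none are left.
  async function worker() {
    while (next < thunks.length) {
      const i = next++;
      results[i] = await thunks[i](); // the thunk is invoked here, not earlier
    }
  }

  // If any thunk rejects, the whole call rejects (in-flight work is not cancelled).
  const workerCount = Math.min(limit, thunks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // one result per thunk, in input order
}
```

Passing `{ limit: 2 }` keeps at most two subagents in flight while still returning one result per thunk, in order.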

pipeline() enforces order

pipeline() chains stages where stage N+1 depends on stage N's output. Each stage blocks until its input exists, so the stages run strictly in sequence and the latencies add up. Reach for it when there's a true data dependency — you can't synthesize a review before findings exist, and you can't verify findings before they're deduplicated.

const verified = await pipeline(
  () => parallel(DIMENSIONS.map(d => () => agent(d.prompt, { schema: FINDINGS }))),
  (found)   => dedupeByFileLine(found.flatMap(r => r.findings)),
  (deduped) => parallel(deduped.map(f => () => agent(refutePrompt(f), { schema: VERDICT }))),
)

Notice dedupeByFileLine is not an agent — deterministic work stays in code. You only spend a subagent where judgment is required.

The whole grammar: parallel for independence, pipeline for dependency. Real workflows alternate between the two, fanning out for breadth and chaining where order matters.
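Under the same caveat, `pipeline()` can be read as an async left fold: each stage is awaited, and its result becomes the next stage's only argument. This is a hypothetical stand-in, not the real primitive. The single-argument assumption has a practical consequence: a stage cannot see outputs from two stages back unless an earlier stage forwards them explicitly.

```javascript
// Hypothetical stand-in for the article's pipeline() primitive, not a real API.
// Stages run strictly in order; each receives only the previous stage's output.
async function pipeline(...stages) {
  let value; // the first stage receives undefined
  for (const stage of stages) {
    value = await stage(value); // wait for this stage before starting the next
  }
  return value; // the last stage's output
}
```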

Structured outputs: typed, not parsed

Every agent() call above passes a schema. The model returns data shaped to that contract — FINDINGS, VERDICT, REVIEW — so you index fields instead of regexing prose. This is what lets the dedup and filter steps be plain code rather than yet another LLM call:

const real = deduped.filter((f, i) => verified[i].refuted === false) // pair each verdict with its finding

Schemas are the seams that keep subagents composable. A node's output is machine-readable, so the next node — agent or code — consumes it without a parsing layer in between.
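The article names `FINDINGS`, `VERDICT`, and `REVIEW` but never shows their definitions. Here is a plausible sketch, assuming JSON Schema is the contract format, with shapes taken from the prose: `{ file, line, severity, claim }` findings wrapped in a `findings` field, and `{ refuted, reason }` verdicts. The `FINDING` name, the severity values, and the `isVerdict` guard are illustrative.

```javascript
// Hypothetical schema definitions; shapes follow the article's prose,
// while the JSON Schema format and severity values are assumptions.
const FINDING = {
  type: "object",
  properties: {
    file: { type: "string" },
    line: { type: "integer", minimum: 1 },
    severity: { type: "string", enum: ["low", "medium", "high"] },
    claim: { type: "string" },
  },
  required: ["file", "line", "severity", "claim"],
};

// One reviewer's output: an object whose findings field is an array.
const FINDINGS = {
  type: "object",
  properties: { findings: { type: "array", items: FINDING } },
  required: ["findings"],
};

const VERDICT = {
  type: "object",
  properties: { refuted: { type: "boolean" }, reason: { type: "string" } },
  required: ["refuted", "reason"],
};

// Optional plain-code check at the seam before the next node consumes a verdict.
function isVerdict(v) {
  return v !== null && typeof v === "object" &&
    typeof v.refuted === "boolean" && typeof v.reason === "string";
}
```

A guard like `isVerdict` is optional; it just makes a malformed result fail at the seam where it appears instead of two nodes later.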

The worked example: a self-verifying PR reviewer

Most "AI code review" is one model, one prompt, one pass. It finds plausible bugs and reports them with equal confidence — including the ones that aren't real. Dynamic workflows let you do better: fan out across review dimensions in parallel, then make the model attack its own findings before reporting them. Here's the full pipeline.

Step 1: Fan out across dimensions

Run one subagent per review dimension. They don't depend on each other, so they execute concurrently behind a barrier.

const DIMENSIONS = [
  { name: "correctness", prompt: correctnessPrompt(diff) },
  { name: "security",    prompt: securityPrompt(diff) },
  { name: "performance", prompt: perfPrompt(diff) },
];

const found = await parallel(
  DIMENSIONS.map(d => () => agent(d.prompt, { schema: FINDINGS }))
);

Each agent() call is an isolated subagent with its own context window, so the security reviewer never sees the performance reviewer's noise. { schema: FINDINGS } forces a structured output: an object whose findings field holds an array of { file, line, severity, claim }, not prose you have to regex later.

Step 2: Dedup (plain code, not an agent)

Three reviewers will flag the same line. Merging is deterministic set logic — don't spend a model on it.

const deduped = dedupeByFileLine(found.flatMap(r => r.findings));

flatMap flattens the per-dimension arrays into one list; dedupeByFileLine collapses entries sharing a (file, line) key. Use code wherever the answer is mechanical. Agents are for judgment, not joins.
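`dedupeByFileLine` is used but never defined. One plausible version, assuming findings have the `{ file, line, severity, claim }` shape from step 1, keys each finding on `file:line` in a `Map` and keeps the most severe entry per key. The severity ranking and the tie-breaking rule (first finding wins) are illustrative choices, not from the original.

```javascript
// Hypothetical helper: the article uses dedupeByFileLine without defining it.
// Assumed severity ranking; unknown severities rank lowest.
const SEVERITY_RANK = { low: 1, medium: 2, high: 3 };

function dedupeByFileLine(findings) {
  const byKey = new Map(); // Map keeps keys in first-insertion order
  for (const f of findings) {
    const key = `${f.file}:${f.line}`;
    const kept = byKey.get(key);
    // Keep the first finding for a key unless a later one is strictly more severe.
    if (!kept || (SEVERITY_RANK[f.severity] ?? 0) > (SEVERITY_RANK[kept.severity] ?? 0)) {
      byKey.set(key, f);
    }
  }
  return [...byKey.values()];
}
```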

Step 3: Adversarially verify

This is the step that kills false positives. For each surviving finding, spawn a skeptic subagent whose only job is to refute it.

const verified = await parallel(
  deduped.map(f => () => agent(refutePrompt(f), { schema: VERDICT }))
);
const real = deduped.filter((f, i) => verified[i].refuted === false); // verified[i] is the verdict for deduped[i]

refutePrompt(f) instructs the subagent: "Here is a claimed bug. Prove it's wrong — find the guard, the caller, the type that makes it safe." VERDICT is { refuted: boolean, reason: string }. A finding that survives a dedicated attacker is worth reporting; one that doesn't, isn't.
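`refutePrompt` is also left undefined. Below is a minimal sketch that embeds the finding and the quoted instruction in a template string, assuming the skeptic can read the surrounding code on its own. Because a `VERDICT` carries only `{ refuted, reason }` and doesn't say which finding it judged, the hypothetical `keepUnrefuted` helper pairs verdicts with findings by index, which relies on `parallel()` returning results in input order.

```javascript
// Hypothetical prompt builder; the article describes refutePrompt but doesn't show it.
function refutePrompt(f) {
  return [
    "Here is a claimed bug. Prove it's wrong: find the guard, the caller,",
    "or the type that makes this code safe.",
    `File: ${f.file}, line ${f.line} (severity: ${f.severity})`,
    `Claim: ${f.claim}`,
    "Set refuted to true only if you found concrete evidence the claim is false,",
    "and state that evidence in reason.",
  ].join("\n");
}

// A verdict doesn't name its finding, so pair them by position.
// Assumes verdicts[i] judges findings[i] (parallel() preserves input order).
function keepUnrefuted(findings, verdicts) {
  return findings.filter((_, i) => verdicts[i].refuted === false);
}
```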

For higher-stakes findings, fan out N skeptics per finding and keep only what a majority can't refute — verification scales independently of review:

async function survivesQuorum(f, n = 3) {
  const verdicts = await parallel(
    Array.from({ length: n }, () => () => agent(refutePrompt(f), { schema: VERDICT }))
  );
  const refutals = verdicts.filter(v => v.refuted).length;
  return refutals <= Math.floor(n / 2); // a majority could not refute it
}

This is a judge pattern: refutation is adjudication, kept separate from the generation in step 1. Asking a model to merely re-summarize its own findings launders the weak ones into the report. Refutation is a sharper filter than agreement.
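To exercise the quorum logic without a model, the sketch below stubs `agent()` with a deterministic fake skeptic and uses `Promise.all` in place of `parallel()`. Both stubs and the `filterByQuorum` helper are hypothetical; only `survivesQuorum` comes from the article. Since `survivesQuorum` returns one boolean per finding, filtering a whole list takes a second fan-out plus an index-based pairing.

```javascript
// Hypothetical stubs so the quorum logic runs offline; none of these are real APIs.
const parallel = thunks => Promise.all(thunks.map(t => t()));
const VERDICT = { name: "VERDICT" }; // placeholder for a real schema object
const refutePrompt = f => `Refute this claim: ${f.claim}`;

// Fake skeptic: for each prompt, the first refuters.get(prompt) calls refute it.
const refuters = new Map();
const seen = new Map();
async function agent(prompt, { schema }) {
  const k = seen.get(prompt) ?? 0;
  seen.set(prompt, k + 1);
  const refuted = k < (refuters.get(prompt) ?? 0);
  return { refuted, reason: refuted ? "guard found" : "no guard found" };
}

// survivesQuorum as written in the article.
async function survivesQuorum(f, n = 3) {
  const verdicts = await parallel(
    Array.from({ length: n }, () => () => agent(refutePrompt(f), { schema: VERDICT }))
  );
  const refutals = verdicts.filter(v => v.refuted).length;
  return refutals <= Math.floor(n / 2); // a majority could not refute it
}

// Fan out one quorum per finding, then keep findings whose quorum held.
async function filterByQuorum(findings, n = 3) {
  const survives = await parallel(findings.map(f => () => survivesQuorum(f, n)));
  return findings.filter((_, i) => survives[i]);
}
```

With three skeptics, one refutation still lets a finding through, while two refutations drop it.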

Step 4: Synthesize

One agent turns confirmed findings into the review a human reads.

const review = await agent(synthesisPrompt(real), { schema: REVIEW });

Wiring it together

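const review = await pipeline(
  ()        => parallel(DIMENSIONS.map(d => () => agent(d.prompt, { schema: FINDINGS }))),
  (found)   => dedupeByFileLine(found.flatMap(r => r.findings)),
  // each stage receives only the previous output, so forward deduped with the verdicts
  async (deduped) => ({
    deduped,
    verified: await parallel(deduped.map(f => () => agent(refutePrompt(f), { schema: VERDICT }))),
  }),
  ({ deduped, verified }) => {
    const real = deduped.filter((f, i) => verified[i].refuted === false); // drop refuted findings
    return agent(synthesisPrompt(real), { schema: REVIEW });
  },
);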

pipeline() is sequential — each stage's output feeds the next. parallel() is the barrier inside stages 1 and 3.
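To check the data flow through all four stages offline, here is a self-contained sketch with hypothetical stubs: `agent()` returns canned structured outputs chosen by schema, and `parallel()` and `pipeline()` are minimal local versions, the latter passing each stage only the previous stage's output. Under that assumption, stage 3 has to forward `deduped` alongside the verdicts so the synthesis stage can pair them. All helper names and canned data are illustrative.

```javascript
// Hypothetical offline stubs; the real agent(), parallel() and pipeline() may differ.
const parallel = thunks => Promise.all(thunks.map(t => t()));
async function pipeline(...stages) {
  let value;
  for (const stage of stages) value = await stage(value);
  return value;
}
const [FINDINGS, VERDICT, REVIEW] = ["FINDINGS", "VERDICT", "REVIEW"]; // schema placeholders

const DIMENSIONS = ["correctness", "security", "performance"]
  .map(name => ({ name, prompt: `review:${name}` }));

// Canned reviewer output: two reviewers flag the same line.
const CANNED = {
  "review:correctness": [{ file: "auth.js", line: 42, severity: "high", claim: "null token accepted" }],
  "review:security":    [{ file: "auth.js", line: 42, severity: "high", claim: "null token accepted" }],
  "review:performance": [{ file: "db.js", line: 7, severity: "low", claim: "N+1 query" }],
};

async function agent(prompt, { schema, effort = "medium" }) {
  if (schema === FINDINGS) return { findings: CANNED[prompt] };
  if (schema === VERDICT) {
    const refuted = prompt.includes("db.js"); // pretend the caller batches queries
    return { refuted, reason: refuted ? "caller batches queries" : "no guard found" };
  }
  return { summary: prompt, effort }; // REVIEW: echo the synthesis input
}

// Simplified helpers (this dedupe keeps the last finding per file:line key).
const dedupeByFileLine = fs => [...new Map(fs.map(f => [`${f.file}:${f.line}`, f])).values()];
const refutePrompt = f => `refute ${f.file}:${f.line} ${f.claim}`;
const synthesisPrompt = fs => fs.map(f => `${f.file}:${f.line} ${f.claim}`).join("\n");

function reviewDiff() {
  return pipeline(
    () => parallel(DIMENSIONS.map(d => () => agent(d.prompt, { schema: FINDINGS }))),
    (found) => dedupeByFileLine(found.flatMap(r => r.findings)),
    async (deduped) => ({
      deduped,
      verified: await parallel(deduped.map(f => () => agent(refutePrompt(f), { schema: VERDICT }))),
    }),
    ({ deduped, verified }) => {
      const real = deduped.filter((_, i) => verified[i].refuted === false);
      return agent(synthesisPrompt(real), { schema: REVIEW, effort: "high" });
    },
  );
}
```

Tracing it: three reviewers report two distinct lines, dedup leaves two findings, the fake skeptic refutes the `db.js` one, and only `auth.js:42` reaches synthesis.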

Effort control per node

Not every node deserves the same compute. Set effort per call: skeptics run cheap because refutation is a narrow question; synthesis runs at high effort because it's the artifact a human trusts.

agent(refutePrompt(f),       { schema: VERDICT, effort: "low"  });
agent(synthesisPrompt(real), { schema: REVIEW,  effort: "high" });

You spend reasoning where the output must be trusted and conserve it where the question is narrow, and a human still approves the final review before anything posts.
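One way to keep effort decisions in one place instead of scattered across call sites is a per-role table and a thin wrapper. This is a hypothetical pattern: the role names and `agentFor` are illustrative, the effort strings mirror the ones used above, and `agent` is passed in so the sketch runs without the real primitive.

```javascript
// Hypothetical per-node effort policy; role names and helpers are illustrative.
const EFFORT_BY_ROLE = Object.freeze({
  review: "medium",   // one subagent per review dimension
  refute: "low",      // narrow question about a single claim
  synthesize: "high", // the artifact a human reads and trusts
});

// Wraps an agent() implementation so every call site must declare its role.
function agentFor(agent) {
  return (role, prompt, schema) => {
    const effort = EFFORT_BY_ROLE[role];
    if (!effort) throw new Error(`unknown role: ${role}`);
    return agent(prompt, { schema, effort });
  };
}
```

An unknown role throws instead of falling back to a default, so a new node type has to be assigned an effort level deliberately.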

Pitfalls and best practices

Match the primitive to the dependency

parallel() returns when the slowest node finishes; pipeline() runs stages in sequence and accumulates their latency. Mismatching them is the most common cost mistake. Your review dimensions are independent, so fan them out — don't chain them.

// Good: 3 dimensions run concurrently, wall-time ≈ slowest dimension
const found = await parallel(DIMENSIONS.map(d => () => agent(d.prompt, { schema: FINDINGS })))

// Bad: ~3x the latency for no reason, and since each stage ignores its input,
// only the last dimension's findings survive
const foundSerial = await pipeline(
  () => agent(DIMENSIONS[0].prompt, { schema: FINDINGS }),
  () => agent(DIMENSIONS[1].prompt, { schema: FINDINGS }),
  () => agent(DIMENSIONS[2].prompt, { schema: FINDINGS }),
)

Reserve pipeline() for true data dependencies — verify needs dedup's output, so that edge stays sequential.
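The latency gap is easy to see with a fake agent that only sleeps. In this sketch, `fakeAgent` is a hypothetical stub, `Promise.all` stands in for `parallel()`, and a serial loop stands in for a three-stage `pipeline()`. With 50 ms per call, the fan-out finishes in roughly 50 ms and the chain in roughly 150 ms; exact timings vary with the event loop.

```javascript
// Hypothetical stub: a fake agent that "thinks" for a fixed time.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));
const fakeAgent = async (prompt, ms = 50) => { await sleep(ms); return { findings: [prompt] }; };
const PROMPTS = ["correctness", "security", "performance"];

// Concurrent fan-out: wall time is about the slowest call.
function fanOut() {
  return Promise.all(PROMPTS.map(p => fakeAgent(p)));
}

// Serial chain: each call waits for the previous one, so latencies add up.
async function chained() {
  const results = [];
  for (const p of PROMPTS) results.push(await fakeAgent(p));
  return results;
}

// Measures wall-clock time of an async function in milliseconds.
async function timed(fn) {
  const start = Date.now();
  const result = await fn();
  return { result, ms: Date.now() - start };
}
```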

Dedup before you verify

Verification is the expensive phase: it can spawn N skeptics per finding. If correctness and security both flag auth.js:42, verifying twice burns budget for nothing. Collapse duplicates first with plain code — no agent required.
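The saving is plain multiplication: with `n` skeptics per finding, verification spends `findings.length * n` agent calls. A small worked example with made-up findings, where correctness and security both flag `auth.js:42` and performance flags `db.js:7` (the `from` field is illustrative):

```javascript
// Verification cost with n skeptics per finding; the findings are made up.
const skepticCalls = (findings, n) => findings.length * n;

const raw = [
  { file: "auth.js", line: 42, from: "correctness" },
  { file: "auth.js", line: 42, from: "security" },
  { file: "db.js", line: 7, from: "performance" },
];

// Same file:line keying as the dedup step, written inline for this sketch.
const unique = [...new Map(raw.map(f => [`${f.file}:${f.line}`, f])).values()];

console.log(skepticCalls(raw, 3));    // 9 skeptic calls without dedup
console.log(skepticCalls(unique, 3)); // 6 skeptic calls after dedup
```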

Keep a human at the merge

The synthesize step is your human-in-the-loop checkpoint. Confirmed findings are a recommendation, not an auto-commit — a person approves before anything lands.
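A checkpoint can be as small as an approval callback that must resolve to `true` before anything is posted. In this hypothetical sketch, `approve` and `post` are injected (a CLI prompt, a review UI, or your code host's comment API, whatever your setup uses); nothing here is a real library call.

```javascript
// Hypothetical approval gate: the review is posted only if approve() resolves to exactly true.
async function publishWithApproval(review, { approve, post }) {
  const approved = await approve(review); // e.g. a CLI prompt or a UI button
  if (approved !== true) {
    return { posted: false, review }; // stays a recommendation, nothing lands
  }
  await post(review);
  return { posted: true, review };
}
```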

Amplify signal, not noise

Fan-out multiplies whatever your base node produces, so the base node's reliability matters. Anthropic reports Opus 4.8 makes roughly 4x fewer silent code bugs than its predecessor; the more trustworthy each leaf reviewer is, the safer it is to run many of them in parallel.

When to reach for a workflow

A single agent is the right default. Reach for a dynamic workflow only when the task has structure you can name: independent dimensions that fan out in parallel, a verification step that must be adversarial rather than self-graded, or a synthesis pass that depends on confirmed inputs.

The PR-review example earns its workflow because each stage has a different shape: fan out, collapse in code, fan out again to refute, then synthesize. parallel() is the barrier; pipeline() enforces order; schemas keep the seams machine-readable; effort goes high on synthesis and low on the narrow refutation passes.

Open question: which of your "trust me" agent steps is actually an unverified claim waiting for a skeptic?
