I tried Claude Code's new dynamic workflows with Opus 4.8 on a real research task.

The short version: the agent count was interesting, but the failure handling was more useful.
What I ran
I used the built-in workflow:
/deep-research <research question>
The question was broad enough to need multiple angles: understand an AI project related to medical imaging, compare possible use cases, look at competitors, and identify product weak spots.
After launching the workflow, I inspected it with:
/workflows
That view shows the phase structure, agent counts, token totals, and runtime.
The phase breakdown
My run split into five phases:
| Phase | Agents | Purpose |
|---|---|---|
| Scope | 1 | Define the research frame |
| Search | 6 | Search from several angles |
| Fetch | 28 | Retrieve and read sources |
| Verify | 75 | Check key claims |
| Synthesize | 1 | Produce the final report |
Total: 111 agents.
Important caveat: this does not mean 111 agents ran at the same time. The point is that the workflow had a script-like structure that coordinated many subagents across phases.
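The totals are easy to sanity-check. This minimal sketch only restates the table; the variable names are mine, and the Verify figure is assumed to come from the 25 claims checked by three agents each, as described in the next section.

```python
# Agent counts per phase, copied from the table above.
phase_agents = {
    "Scope": 1,
    "Search": 6,
    "Fetch": 28,
    "Verify": 75,
    "Synthesize": 1,
}

total_agents = sum(phase_agents.values())

# Verify fan-out: 25 extracted claims, each checked by 3 independent agents.
claims = 25
checkers_per_claim = 3
verify_agents = claims * checkers_per_claim

print(total_agents)   # 111
print(verify_agents)  # 75
```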
The useful failure
The Verify phase extracted 25 claims and tried to check each claim with three independent agents.
Then the run partially broke.
In my case, 17 of the 25 claims ended up with a killed status. The issue looked like a StructuredOutput failure.
That sounds bad, but the workflow's interpretation was the useful part.
It separated:
- confirmed;
- contradicted;
- not verified.
Only 2 of the 17 killed claims were actually contradicted. The other 15 were incomplete checks, not false claims.
That distinction matters.
A killed status should not automatically mean "wrong." It means the check did not finish. If your agent turns that into a confident conclusion, the report becomes dangerous.
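One way to keep that distinction explicit is to aggregate per-agent results into three outcomes instead of two. The sketch below is my own illustration, not the workflow's actual logic: the status strings, the `aggregate` function, and the rule it applies are all assumptions.

```python
from enum import Enum


class Verdict(Enum):
    CONFIRMED = "confirmed"
    CONTRADICTED = "contradicted"
    NOT_VERIFIED = "not verified"


def aggregate(checks: list[str]) -> Verdict:
    """Combine per-agent results for one claim.

    `checks` holds one status string per verifier agent. The status names
    and this rule are assumptions for illustration only.
    """
    finished = [c for c in checks if c in ("confirmed", "contradicted")]
    if "contradicted" in finished:
        # At least one completed check found evidence against the claim.
        return Verdict.CONTRADICTED
    if finished and len(finished) == len(checks):
        # Every check completed and none disagreed.
        return Verdict.CONFIRMED
    # Some or all checks were killed or otherwise incomplete.
    return Verdict.NOT_VERIFIED


print(aggregate(["confirmed", "confirmed", "confirmed"]).value)  # confirmed
print(aggregate(["killed", "killed", "killed"]).value)           # not verified
print(aggregate(["contradicted", "killed", "confirmed"]).value)  # contradicted
```

The design choice that matters is the last branch: an unfinished check falls through to "not verified" rather than to either confident outcome.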
Why workflows feel different from long chats
In a normal long chat, intermediate state gets compressed into the final answer. That is convenient, but it can hide weak spots.
With a workflow, the plan moves into a script. The runtime can hold phase state, branch logic, repeated checks, and failed agents without forcing everything into the chat context.
That makes it better for tasks where process quality matters:
- codebase audits;
- large migrations;
- cross-checked research;
- repeated claim verification;
- multi-angle planning.
It is not a good fit for every task. For small edits or quick debugging, it is overkill.
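To make the "plan moves into a script" idea concrete, here is a toy sketch of a runtime that records per-agent outcomes for a phase instead of aborting on the first failure. It is not Claude Code's API: `PhaseState`, `run_phase`, and the stand-in agents are hypothetical, and real subagents would not run sequentially like this.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PhaseState:
    name: str
    results: list[str] = field(default_factory=list)  # one status per agent

    def count(self, status: str) -> int:
        return self.results.count(status)


def run_phase(name: str, agents: list[Callable[[], str]]) -> PhaseState:
    """Run each agent callable and record its status instead of aborting."""
    state = PhaseState(name)
    for agent in agents:
        try:
            state.results.append(agent())
        except Exception:
            # A crashed agent is recorded, not silently dropped.
            state.results.append("killed")
    return state


# Hypothetical stand-ins for real subagents.
def ok() -> str:
    return "done"


def crash() -> str:
    raise RuntimeError("structured output could not be parsed")


state = run_phase("Verify", [ok, crash, ok])
print(state.results)          # ['done', 'killed', 'done']
print(state.count("killed"))  # 1
```

Because the failed agent stays in the phase state, a later step can report it as incomplete rather than folding it into the final answer.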
My checklist
Before running a dynamic workflow, I would do this:
- Narrow the question.
- Run a small version first.
- Open /workflows, not just the final report.
- Inspect failed, killed, contradicted, and not-verified states separately.
- Do not treat killed as a refutation.
- Save the workflow only after it proves useful.
- Watch token usage.
Takeaway
Dynamic workflows are not just "more subagents."
They are a way to turn a long task into an inspectable process.
My run partially failed, but that failure was visible. For automated research and coding-agent work, visible failure is much better than a clean answer that hides uncertainty.
If you are testing Opus 4.8 for Claude Code, coding agents, or long-running workflows, I keep the model notes and API routing details here: Claude Opus 4.8 API for Coding Agents.