Mikhail Andreev

Posted on Jun 1

Writing Workflows in Claude Code: What They Unlock and Where They Can Bite You

#ai #claude #javascript #workflow

Anthropic has introduced the Workflow API for Claude Code, so I spent some time digging into how it works in practice.

I looked at how agents are launched, how they pass results between steps, when parallel makes sense, when pipeline is a better fit, and how workflows relate to skills and sub-agents.

In short, workflows let you describe Claude Code's work as a scenario, not as one long prompt.

Not like this:

First do this, then do not forget that, then check one more thing.

But as an actual executable process: phases, agents, input data, response schemas, and a final synthesis step.

Before getting to the nice examples, though, it is worth starting with the limitations. They matter more than they seem at first.

Start with the sandbox

A workflow script does not get full access to your project by itself.

It runs inside an isolated node:vm context. In other words, this is not a normal Node.js application where you can import modules, read files, call APIs, and execute shell commands.

The workflow script has no direct access to:

fs
process
require
module
__dirname
Buffer
fetch
the network
the file system
shell commands
Node APIs

import() also does not behave like a normal ESM import.

At first, this may look strange. If the workflow cannot read files or call tools directly, how does it help with code?

The answer is simple: the workflow is not the thing doing the hands-on work.

The agents are.

An agent can read files, search through the project, use LSP, run bash, call MCP tools, or perform web search, assuming those tools are available in the current Claude Code session.

So the mental model looks like this:

workflow orchestrates the process
agents do the work
tools give agents access to code, search, and external context

That is the key idea.

A workflow is not a replacement for agents or skills. It is an orchestration layer above them.

The JavaScript inside the sandbox is still real JavaScript

Do not confuse "sandboxed" with "barely a language."

Inside the workflow, you still have the full language: arrays, objects, promises, classes, closures, Map, Set, recursion, eval, new Function, and other recent V8 features.

For example, this works:

const makeScore = new Function("x", "return x.priority * 2")
const score = makeScore({ priority: 3 }) // 6

This works too:

const rule = "item.severity === 'high'"
const filter = new Function("item", "return " + rule)

const result = [
  { severity: "low" },
  { severity: "high" }
].filter(filter)

That can be useful if you want to build a small router, filter, state machine, or dynamic result-processing logic inside the workflow.
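As a rough illustration, a small router built this way might look like the sketch below. The routing table, the queue names, and the hasTests field are hypothetical; the only technique taken from the text is compiling a rule string with new Function.

```javascript
// Hypothetical routing table: each rule is a string predicate over `item`.
const ROUTES = [
  { when: "item.severity === 'high'", queue: "verify-now" },
  { when: "item.severity === 'medium' && item.hasTests === false", queue: "check-tests" },
];

// Compile each rule once with new Function. The rule text should come from
// the workflow itself, never from untrusted input.
const compiled = ROUTES.map(route => ({
  queue: route.queue,
  matches: new Function("item", "return " + route.when),
}));

// Return the queue of the first matching rule, or a default.
function routeFinding(item) {
  const hit = compiled.find(route => route.matches(item));
  return hit ? hit.queue : "report-only";
}

const queues = [
  { severity: "high", hasTests: true },
  { severity: "medium", hasTests: false },
  { severity: "low", hasTests: false },
].map(routeFinding);

console.log(queues);
```

Rules are checked in order, so the first match wins and anything unmatched falls through to the default queue.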

But all of that still happens inside the sandbox.

eval and new Function do not give you access to the host environment. You cannot use them to reach files, the network, or Node APIs.

This will not work:

const fs = require("fs") // require is not defined

And this is not something you should expect to work either:

const response = await fetch("https://example.com") // fetch is not defined

If you need to read a file, find a symbol in the project, call an external service, or run a command, the workflow should delegate that task to an agent via agent().
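The global visibility described above can be reproduced in plain Node.js with the node:vm module. This is only a sketch of how a fresh vm context behaves, assuming an empty context object; it is not Claude Code's actual sandbox configuration, and the probe helper is a made-up name.

```javascript
const vm = require("node:vm");

// A fresh context contains only the ECMAScript built-ins (Array, Map, Promise, ...).
// Node-specific globals such as require, process, and fetch are not copied in.
const context = vm.createContext({});

// Evaluate an expression inside the isolated context and return the result.
function probe(expression) {
  return vm.runInContext(expression, context);
}

const visible = {
  require: probe("typeof require"),
  process: probe("typeof process"),
  fetch: probe("typeof fetch"),
  // new Function compiles code in the same context, so it sees the same globals.
  requireViaFunction: probe("new Function('return typeof require')()"),
  // Plain language features still work.
  mapWorks: probe("new Map([['a', 1]]).get('a')"),
};

console.log(visible);
```

Every typeof check reports "undefined", while the Map lookup returns a normal value: the language is present, the host APIs are not.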

What we had before workflows

Before workflows, complex processes in Claude Code were usually assembled with skills and sub-agents.

For example, you could write a skill for code review:

First, understand the task.
Then find related files.
Then check the business logic.
Then check the tests.
Do not make a claim without a file and line reference.
If you are unsure, mark the finding as a hypothesis.

This approach works. It is still useful.

A skill gives an agent discipline: how to search, how to verify, which tools to use, and how to format the result. This is especially important in team workflows. Without these rules, agents quickly start producing polished but weak answers.

The problem is not that skills are bad.

The problem is that a skill is still a text instruction. It tells the model how it should behave, but the order of operations does not become a program.

The model has to keep the sequence in context: first find files, then review them, then gather evidence, then produce the report. For small tasks, that is fine. For longer tasks, familiar problems show up:

a step gets skipped;
the review stays too general;
several agents return nearly the same answer;
questionable findings are not rechecked;
the final report is written before all inputs are ready.

A workflow addresses exactly this layer.

It does not tell the agent, "please be careful."

It tells the system:

Run this first, then this, run these checks in parallel, and only then synthesize the result.

The difference may look small on paper, but it is significant in practice.

skills define rules
agents execute tasks
workflow controls the order

What a workflow is

A Claude Code workflow is a JavaScript file that orchestrates sub-agents.

It has a few core building blocks:

agent() starts an agent;
phase() marks a stage of work;
parallel() runs several tasks and waits for all of them;
pipeline() pushes items through several stages;
schema defines the expected response shape;
args passes input data into the workflow.

Simplified, a workflow feels like a small backend process.

The difference is that instead of calling normal functions, you launch agents.

In a skill, you might write:

First find candidate files.
Then verify each finding.
Then produce a report.

In a workflow, that becomes executable code:

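phase("Explore")

const candidates = await agent(
  "Find candidate files related to this task.",
  // A schema makes candidates.files a real array instead of free text.
  {
    schema: {
      type: "object",
      required: ["files"],
      properties: { files: { type: "array", items: { type: "string" } } }
    }
  }
)

phase("Verify")

const verified = await parallel(
  candidates.files.map(file => () =>
    agent("Verify findings in this file: " + file)
  )
)

phase("Synthesize")

// Pass the verified results into the prompt, as the later examples do.
return await agent(
  "Create a final report from verified findings.\n\n" +
  JSON.stringify(verified)
)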

In a skill, the order is described in words.

In a workflow, the order is executed.

When a workflow is actually worth it

Not every request should become a workflow.

If you need to explain one function, fix one test, or quickly find where something is defined, regular Claude Code is usually enough.

A workflow becomes useful when you have a repeatable sequence.

For example, before opening a pull request, you may always ask Claude Code to:

understand what changed;
find related parts of the codebase;
review business logic;
review tests;
check security risks;
remove weak or unsupported findings;
produce a short report.

In chat, this quickly turns into manual orchestration. You keep reminding the model what to do next. The model remembers some things, misses others, and then you ask it to recheck or shorten the result.

A workflow lets you write that order once.

How workflows relate to skills

There is an important nuance here.

The workflow script itself does not automatically inherit rules from skills. Skills are loaded into the context of a specific agent, usually through agentType.

If you call a plain agent() without an agentType, it may not receive the project-specific rules you expected.

For example, suppose you have an agent that knows how your team searches the codebase: use LSP first, then grep, make claims only with evidence, and always return path and line.

In that case, it is better to explicitly select that agent type:

await agent(
  "Find where this symbol is used. Use LSP before grep. Return path and line.",
  {
    agentType: "keep-backend:codebase-explorer",
    phase: "Explore",
    schema: RESULT_SCHEMA
  }
)

Otherwise, you may get a generic agent that understands the task but does not follow the exact workflow discipline you rely on.

A good rule of thumb:

If the discipline of a step matters, do not rely on magic. Use the right agentType, or write the rule directly in the prompt.

workflow controls the order
skill controls the agent's behavior
agent performs the concrete step

Start with the process, not the code

The most common mistake is to start writing JavaScript immediately.

It is usually better to start with plain text.

For example:

Task: check whether a feature is ready for a pull request.

1. Understand which areas were affected.
2. Separately review logic, tests, and security.
3. Collect the results.
4. Remove duplicates and weak claims.
5. Return risks and next steps.

After that, the code is almost mechanical.

You already have the phases:

Scope → Review → Synthesize

Now you only need to decide where one agent is enough, where several agents should run in parallel, and where you need a final synthesis step.

A minimal workflow

A workflow file starts with meta.

This describes the workflow to Claude Code:

export const meta = {
  name: "feature-review",
  description: "Review a feature before opening a pull request.",
  whenToUse: "Use after implementing a feature.",
  phases: [
    { title: "Scope" },
    { title: "Review" },
    { title: "Synthesize" }
  ]
}

meta should be a plain object literal: no variables, functions, spread operators, or computed values. The runtime reads it statically.

After that, you can write normal JavaScript:

const A = typeof args === "string" ? JSON.parse(args) : (args || {})
const task = A.task || ""

if (!task) {
  return {
    status: "error",
    message: "Pass task in args."
  }
}

There is an annoying but important detail here: args may arrive as a JSON string even if you passed an object. Check its type and run JSON.parse when it is a string.

If you skip this, you can spend too much time wondering why A.task is empty.
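The one-liner above throws if the string is not valid JSON. A slightly more defensive variant could look like the sketch below; parseArgs and its { ok, value, message } return shape are hypothetical names chosen for illustration, not part of the workflow API.

```javascript
// Normalize workflow input: accept an object, a JSON string, or nothing.
function parseArgs(raw) {
  if (raw == null || raw === "") return { ok: true, value: {} };

  let value = raw;
  if (typeof raw === "string") {
    try {
      value = JSON.parse(raw);
    } catch (err) {
      return { ok: false, message: "args is not valid JSON: " + err.message };
    }
  }

  // JSON.parse also accepts primitives and arrays; only plain objects are useful here.
  if (value === null || typeof value !== "object" || Array.isArray(value)) {
    return { ok: false, message: "args must be a JSON object." };
  }
  return { ok: true, value };
}

console.log(parseArgs('{"task":"review login flow"}'));
console.log(parseArgs("not json"));
```

Inside a workflow, a failed parse could be turned into the same kind of error result shown above, for example a status of "error" with the message attached.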

The first agent

An agent performs one concrete step.

phase("Scope")

const scope = await agent(
  "Read the task and identify review areas.\n\nTask:\n" + task,
  {
    label: "scope",
    phase: "Scope"
  }
)

What happens here:

phase("Scope") opens the stage;
agent() starts a sub-agent;
the result is saved into scope.

So far, it looks like a normal prompt. The difference is that the result can now be used by the rest of the code.

Why life gets painful without schemas

Without a schema, an agent returns text.

Text is fine for a human, but it is inconvenient for the next step in a workflow. It is harder to filter, merge, verify, and transform.

That is why most workflows should define response schemas early.

For example, suppose you want a list of review areas:

const SCOPE_SCHEMA = {
  type: "object",
  required: ["areas"],
  properties: {
    areas: {
      type: "array",
      items: { type: "string" }
    }
  }
}

Now use that schema in the agent call:

const scope = await agent(
  "Identify review areas for this task.\n\nTask:\n" + task,
  {
    label: "scope",
    phase: "Scope",
    schema: SCOPE_SCHEMA
  }
)

Now scope.areas is data, not a block of text you have to guess your way through with regexes.
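How strictly the runtime enforces a schema is not covered here, so a workflow may still want a cheap shape check before using a result. The sketch below is a hand-written checker for only the schema subset used in this article (type, required, properties, items, enum); matches is a hypothetical helper, not a workflow built-in and not a full JSON Schema validator.

```javascript
const SCOPE_SCHEMA = {
  type: "object",
  required: ["areas"],
  properties: {
    areas: { type: "array", items: { type: "string" } },
  },
};

// Minimal checker for: type (object, array, string), required, properties, items, enum.
function matches(schema, value) {
  if (schema.enum && !schema.enum.includes(value)) return false;
  if (schema.type === "string") return typeof value === "string";
  if (schema.type === "array") {
    return Array.isArray(value) &&
      value.every(item => !schema.items || matches(schema.items, item));
  }
  if (schema.type === "object") {
    if (value === null || typeof value !== "object" || Array.isArray(value)) return false;
    const required = schema.required || [];
    if (!required.every(key => key in value)) return false;
    // Only properties that are present get checked against their sub-schema.
    return Object.entries(schema.properties || {}).every(
      ([key, sub]) => !(key in value) || matches(sub, value[key])
    );
  }
  return true; // No type constraint, for example an enum-only schema.
}

console.log(matches(SCOPE_SCHEMA, { areas: ["logic", "tests"] }));
console.log(matches(SCOPE_SCHEMA, { areas: "logic, tests" }));
```

The first call passes and the second fails, which is exactly the difference between data and a block of text that merely looks like data.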

Parallel checks

Suppose the first agent returns three areas:

logic
tests
security

You can review them in parallel:

phase("Review")

const reviews = await parallel(
  scope.areas.map(area => () =>
    agent(
      "Review this task from the angle: " + area + "\n\nTask:\n" + task,
      {
        label: "review:" + area,
        phase: "Review",
        schema: REVIEW_SCHEMA
      }
    )
  )
)

const validReviews = reviews.filter(Boolean) // drop empty (falsy) results before synthesis

One important detail: pass functions to parallel(), not already-started promises.

Correct:

parallel([
  () => agent("Check tests"),
  () => agent("Check security")
])

Incorrect:

parallel([
  agent("Check tests"),
  agent("Check security")
])

parallel() is a barrier. It starts several tasks and waits until all of them are done.

That is useful when the next step needs to see the full set of results.
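Why functions rather than promises? A function lets the runner decide when each task starts. The sketch below shows this in plain Node.js; fakeAgent and fakeParallel are hypothetical stand-ins built on Promise.all and say nothing about how Claude Code implements parallel() internally.

```javascript
const started = [];

// Hypothetical stand-in for agent(): records when it starts, resolves a bit later.
function fakeAgent(prompt) {
  started.push(prompt);
  return new Promise(resolve => setTimeout(() => resolve("done: " + prompt), 5));
}

// Hypothetical stand-in for parallel(): it starts every task itself,
// then waits for all of them (the barrier).
async function fakeParallel(tasks) {
  started.push("parallel begins");
  return Promise.all(tasks.map(task => task()));
}

async function demo() {
  // Thunks: nothing runs until fakeParallel calls them.
  const tasks = [
    () => fakeAgent("Check tests"),
    () => fakeAgent("Check security"),
  ];
  const results = await fakeParallel(tasks);
  return { started: [...started], results };
}

demo().then(outcome => console.log(outcome));
```

With thunks, "parallel begins" is recorded before either agent starts. Writing fakeAgent("Check tests") directly in the array would start the agent before the runner is even called, and this particular stand-in would then fail because a promise is not callable.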

When you need pipeline

pipeline() solves a different problem.

Imagine you have a list of files. Each file must be analyzed first and verified after that. But you do not need to wait until all files are analyzed before starting verification for the first one.

That is what pipeline() is for:

const results = await pipeline(
  files,

  file => agent(
    "Analyze this file for risky changes: " + file,
    {
      label: "analyze:" + file,
      phase: "Analyze",
      schema: REVIEW_SCHEMA
    }
  ),

  (analysis, file) => agent(
    "Verify these findings for " + file + ". Remove weak claims.\n\n" +
    JSON.stringify(analysis),
    {
      label: "verify:" + file,
      phase: "Verify",
      schema: REVIEW_SCHEMA
    }
  )
)

In short:

parallel = start several tasks and wait for all of them
pipeline = move each item through several stages

Use parallel when you need one consolidated report after all checks are complete.

Use pipeline when each file, module, ticket, or item goes through the same chain of steps.
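The difference in waiting behavior is easy to see with timers. In this sketch fakePipeline is a hypothetical stand-in that mirrors the call shape used above (items first, then stages that receive the previous result and the item); the file names and durations are made up.

```javascript
const events = [];
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Hypothetical stand-in for pipeline(): every item walks through the stages
// on its own, without waiting for the other items.
function fakePipeline(items, ...stages) {
  return Promise.all(items.map(item =>
    stages.reduce(
      (previous, stage) => previous.then(result => stage(result, item)),
      Promise.resolve(item)
    )
  ));
}

// Simulated analysis durations in milliseconds (made-up numbers).
const ANALYZE_MS = { "a.js": 10, "b.js": 80 };

async function analyze(file) {
  await sleep(ANALYZE_MS[file]);
  events.push("analyzed " + file);
  return { file, findings: 1 };
}

async function verify(analysis, file) {
  events.push("verifying " + file);
  await sleep(10);
  return { ...analysis, verified: true };
}

async function demo() {
  const results = await fakePipeline(["a.js", "b.js"], analyze, verify);
  return { events: [...events], results };
}

demo().then(outcome => console.log(outcome.events));
```

Verification of a.js begins while b.js is still being analyzed. With a barrier between the two stages, a.js would sit idle until the slower file finished.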

The workflow does not read files, and that is fine

This is worth repeating because it is easy to forget.

The workflow script should not be the hands.

It does not read files. It does not call the network. It does not run shell commands.

This is the wrong mental model:

// now the workflow will read files and find changes by itself

Think like this instead:

await agent(
  "Read the changed files and find risky logic. Cite file paths and lines."
)

If the agent has tools available, it can read files, search the project, use LSP, or run commands.

The workflow does not do the work directly.

It assigns work to agents.

Final synthesis

Do not return raw outputs from every agent directly to the user.

They will almost always contain repetition, inconsistent style, and unnecessary detail. A separate synthesis step usually produces a much better result:

phase("Synthesize")

const report = await agent(
  "Merge these reviews into one concise report. " +
  "Remove duplicates. Keep only actionable findings.\n\n" +
  JSON.stringify(validReviews),
  {
    label: "final-report",
    phase: "Synthesize",
    schema: REVIEW_SCHEMA
  }
)

return {
  status: "ok",
  task,
  report
}

A good final report answers four questions:

what was found
why it matters
where the evidence is
what to do next
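The synthesis agent has less to untangle if the workflow merges findings in plain JavaScript first. The sketch below assumes results shaped like REVIEW_SCHEMA from this article; mergeFindings and the rule of matching duplicates by normalized claim text are illustrative choices, not a prescribed method.

```javascript
const SEVERITY_RANK = { high: 0, medium: 1, low: 2 };

// Flatten findings from all reviews, drop duplicate claims,
// and sort the rest so high-severity items come first.
function mergeFindings(reviews) {
  const byClaim = new Map();
  for (const review of reviews) {
    for (const finding of review.findings || []) {
      const key = finding.claim.trim().toLowerCase();
      const existing = byClaim.get(key);
      // Keep the more severe copy when the same claim appears twice.
      if (!existing || SEVERITY_RANK[finding.severity] < SEVERITY_RANK[existing.severity]) {
        byClaim.set(key, finding);
      }
    }
  }
  return [...byClaim.values()].sort(
    (a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity]
  );
}

// Made-up sample data in the REVIEW_SCHEMA shape.
const sampleReviews = [
  { summary: "logic", findings: [
    { severity: "low", claim: "Missing null check", evidence: "a.js:10" },
    { severity: "medium", claim: "No test for retry", evidence: "b.test.js" },
  ] },
  { summary: "security", findings: [
    { severity: "high", claim: "missing null check ", evidence: "a.js:10" },
  ] },
];

const merged = mergeFindings(sampleReviews);
console.log(merged);
```

The merged list can then be passed to the synthesis prompt through JSON.stringify in place of the raw reviews.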

A complete starter template

Here is a template you can start from. The prompts should be adjusted for your own project and workflow.

export const meta = {
  name: "custom-review",
  description: "Run a structured review and return actionable findings.",
  whenToUse: "Use when a task needs several checks.",
  phases: [
    { title: "Scope" },
    { title: "Review" },
    { title: "Synthesize" }
  ]
}

const A = typeof args === "string" ? JSON.parse(args) : (args || {})
const task = A.task || ""

if (!task) {
  return {
    status: "error",
    message: "Missing task."
  }
}

const SCOPE_SCHEMA = {
  type: "object",
  required: ["areas"],
  properties: {
    areas: {
      type: "array",
      items: { type: "string" }
    }
  }
}

const REVIEW_SCHEMA = {
  type: "object",
  required: ["summary", "findings"],
  properties: {
    summary: { type: "string" },
    findings: {
      type: "array",
      items: {
        type: "object",
        required: ["severity", "claim", "evidence"],
        properties: {
          severity: { enum: ["high", "medium", "low"] },
          claim: { type: "string" },
          evidence: { type: "string" }
        }
      }
    }
  }
}

phase("Scope")

const scope = await agent(
  "Read the task and identify 3 to 5 useful review areas.\n\nTask:\n" + task,
  {
    label: "scope",
    phase: "Scope",
    schema: SCOPE_SCHEMA
  }
)

if (!scope || !scope.areas || scope.areas.length === 0) {
  return {
    status: "blocked",
    message: "Could not identify review areas."
  }
}

phase("Review")

const reviews = await parallel(
  scope.areas.map(area => () =>
    agent(
      "Review the task from this angle: " + area + "\n\n" +
      "Return concrete findings only. Include evidence.\n\nTask:\n" + task,
      {
        label: "review:" + area,
        phase: "Review",
        schema: REVIEW_SCHEMA
      }
    )
  )
)

const validReviews = reviews.filter(Boolean)

phase("Synthesize")

const finalReport = await agent(
  "Merge the review results into one concise report. " +
  "Deduplicate findings. Keep only actionable issues.\n\n" +
  JSON.stringify(validReviews),
  {
    label: "final-report",
    phase: "Synthesize",
    schema: REVIEW_SCHEMA
  }
)

return {
  status: "ok",
  task,
  report: finalReport
}

How to add agentType

If your project already has specialized agents with skills, use them explicitly.

For example:

const codeFindings = await agent(
  "Explore the codebase for files related to this task. " +
  "Use LSP for symbols. Return exact paths and evidence.\n\nTask:\n" + task,
  {
    agentType: "keep-backend:codebase-explorer",
    label: "code-explorer",
    phase: "Review",
    schema: REVIEW_SCHEMA
  }
)

This starts not just "some agent," but an agent with the rules you actually need.

That matters a lot in team workflows where you may have requirements such as:

use LSP before grep;
do not make claims without evidence;
verify external data through MCP;
return path and line;
separate facts from hypotheses.

Where to save the workflow

Usually, workflows live inside the project:

.claude/workflows/custom-review.js

After that, you can run the workflow by name.

During development, it is often more convenient to run it by file path. That way, Claude Code reads the latest version from disk each time.

If you run a workflow by name and then change the file, reload the registration:

/reload-plugins

Common mistakes

Turning the workflow into one giant prompt

At that point, it becomes the same chat prompt, just stored in a file.

Break the work into phases instead:

Scope → Review → Verify → Synthesize

Forgetting about skills

A workflow does not replace skills.

If an agent must follow specific rules, use the right agentType or write those rules directly into the prompt.

Not defining schemas

Without a schema, the agent returns text.

Text is harder to verify, merge, filter, and pass to the next step.

Waiting for user input inside the workflow

A workflow should not ask the user questions halfway through execution.

If information is missing, return needs_input:

return {
  status: "needs_input",
  questions: [
    "Which branch should I compare against?",
    "Which service is in scope?"
  ]
}

Then the main flow can ask those questions and rerun the workflow with the answers.
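One way to keep this check tidy is a small lookup from required input to question. The field names baseBranch and service, and the helper checkInputs, are hypothetical; only the needs_input result shape comes from the example above.

```javascript
// Hypothetical required inputs and the question to ask when one is missing.
const REQUIRED_INPUTS = {
  baseBranch: "Which branch should I compare against?",
  service: "Which service is in scope?",
};

// Returns a needs_input result when something is missing, otherwise null.
function checkInputs(input) {
  const questions = Object.entries(REQUIRED_INPUTS)
    .filter(([key]) => !input[key])
    .map(([, question]) => question);
  return questions.length > 0 ? { status: "needs_input", questions } : null;
}

console.log(checkInputs({ baseBranch: "main" }));
console.log(checkInputs({ baseBranch: "main", service: "billing" }));
```

At the top of a workflow this could be used as an early exit: compute the result once and return it if it is not null, before any agent is started.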

Using parallel when you need pipeline

If each item has to pass through several stages, use pipeline().

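Otherwise, you create unnecessary waiting points: with one parallel() per stage, every item waits until the slowest item finishes the previous stage.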

Thinking eval bypasses the sandbox

eval and new Function are available, but they still run inside the same isolated context.

They are useful for dynamic logic inside the workflow, such as building a filter function or a small router.

They are not a way to access fs, process, require, the network, or shell commands.

If you need files, commands, or network access, delegate the work to an agent and its tools.

Forgetting about args

Parse args defensively:

const A = typeof args === "string" ? JSON.parse(args) : (args || {})

Otherwise, the workflow may receive a string while your code expects an object.

The main idea

A Claude Code workflow makes the process executable.

A skill tells an agent how to work.

A workflow tells the system in which order to work.

For a useful first workflow, three phases are enough:

Scope → Review → Synthesize

First, the workflow understands the task.

Then it runs the checks.

Then it assembles the final report.

That alone turns a normal prompt into a repeatable working process.

After that, you can add agentType, schema, parallel, pipeline, and stricter rules for agents.

Do not add everything at once.

Start with order, then add discipline, then scale.

What it feels like

If you have worked with LangGraph, the idea will feel familiar.

A Claude Code workflow is similar to building a flow graph: there are steps, executors, transitions, waiting points, and final response assembly.

The difference is the abstraction level.

In LangGraph, you usually build the graph yourself: nodes, edges, state, routing. In Claude Code, the workflow looks simpler: a JavaScript scenario that starts agents with agent(), merges results with parallel(), and moves items through stages with pipeline().

But the thinking is almost the same:

split the task → start executors → pass state → assemble the result

That is why it is useful to think of workflows as Claude Code's built-in version of an agentic flow.

Not just "ask the model."

More like building a small system out of steps, agents, and rules.

DEV Community