Gábor Mészáros
CLAUDE.md Best Practices: Mermaid for Workflows

A picture says a thousand words. I wanted to see my system.

Not the code. I wanted to see the workflows. What happens when a rule gets validated. What happens when a session starts. What happens when compaction triggers. Systems are workflows, and I couldn't see mine.

I had them written down, of course. Prose paragraphs in CLAUDE.md/SKILL.md or RULES describing each process step by step. But past four or five steps with branching, the prose became unreadable. I'd write it, come back a week later, and need to re-parse the whole thing to understand what I'd written. Mental overload, every time.

My coding agent had the same problem. Research calls it "lost in the middle" - LLMs perform best with information at the beginning and end of their context, and significantly worse with information buried in the middle. My prose workflows were exactly that: critical branching logic buried in paragraphs, sandwiched between other instructions. Claude would miss steps. Skip branches. Drift from the intended process.

And the workflows themselves drifted too. I'd remove a pipeline phase and update one paragraph but miss another. Prose makes that invisible - three sentences can reference a removed step and nothing looks broken.

So I rewrote my workflows as Mermaid diagrams. And three things happened at once:

  1. I could see the system. Rendered Mermaid gives you a visual map of what's happening - for free.
  2. Claude followed them more reliably. Structured syntax sticks out in a context window full of prose.
  3. They stopped rotting. You can't leave a dangling arrow in a flowchart the way you can leave a stale sentence in a paragraph.

Turns out there's research backing all three.

The research

FlowBench (Xiao et al., EMNLP 2024) tested how LLM agents perform when given the same workflow knowledge in different formats - natural language, pseudo-code, and flowcharts. Across 51 scenarios on GPT-4o, GPT-4-Turbo, and GPT-3.5-Turbo:

  • Flowcharts achieved the best trade-off for agent performance
  • Combining formats (text + code + flowcharts) outperformed any single format

Format matters. It measurably affects how well the agent follows your instructions.

What to convert

Not everything benefits equally from a diagram. The rule:

If it has branches, it needs a diagram. If it has judgment, it also needs prose. Most real workflows need both.

Deterministic pipelines - CI/CD, deployment, validation, review workflows - are pure flowchart territory. Every step has a defined outcome, every branch has a condition.

But most workflows aren't purely deterministic. They have branching and judgment: "if the tests fail with a type error, fix inline; if it's a logic error, rethink the approach." The diagram captures the branch. The prose below it captures the judgment. Neither format alone carries both.
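That branch-plus-judgment split can itself be sketched in Mermaid. A minimal example (node names here are illustrative, not taken from any real workflow file):

```mermaid
flowchart TD
    TESTS[Run tests] -->|pass| DONE[Continue]
    TESTS -->|fail| KIND{What kind of error?}
    KIND -->|type error| INLINE[Fix inline]
    KIND -->|logic error| RETHINK[Rethink the approach]
    INLINE --> TESTS
    RETHINK --> TESTS
```

The diagram carries the control flow; deciding what actually counts as a "logic error" versus a "type error" is the judgment call, and that belongs in prose directly below the diagram.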

Before and after

Here's what my rule validation workflow looked like before - prose only, describing the same process:

```markdown
## Rule Validation

Run validation on all rules. For each rule, first validate the
schema (fields, types, format). If that passes, check the contract
(.md and .yml matching). If the contract is valid, resolve template
variables and run OpenGrep validation on pattern syntax. If OpenGrep
returns exit 2 or 7, report the error. If it returns 0 or 1,
the rule passes. After all rules are checked, output a summary.
```

And here's what the Mermaid version looks like:

```mermaid
flowchart TD
    START([/validate-rules options]) --> COLLECT[Collect rules from paths]
    COLLECT --> LOOP[For each rule]
    LOOP --> SCHEMA[1. Schema validation<br/>Fields, types, format]
    SCHEMA -->|fail| REPORT
    SCHEMA -->|pass| CONTRACT[2. Contract validation<br/>.md and .yml matching]
    CONTRACT -->|fail| REPORT
    CONTRACT -->|pass| RESOLVE[Resolve template variables]
    RESOLVE --> OPENGREP[3. OpenGrep validation<br/>Pattern syntax]
    OPENGREP -->|exit 2 or 7| REPORT
    OPENGREP -->|exit 0 or 1| REPORT[Report results]
    REPORT --> NEXT{More rules?}
    NEXT -->|yes| LOOP
    NEXT -->|no| SUMMARY[Summary output]
```

And the result:

*Rendered Mermaid workflow from Reporails rule validation*

Same information. But the flowchart makes every branch explicit and every failure path visible. Claude can't accidentally skip a validation step or misinterpret which exit codes mean failure.

But the diagram alone is still only half the answer.

The combo: diagram + prose

FlowBench's strongest finding wasn't "use flowcharts" - it was "combine formats." Each format carries what it's best at.

Here's what one of my actual workflows looks like after conversion - rule-validation.md from Reporails:

````markdown
## Rule Validation Workflow

```mermaid
flowchart TD
    START([/validate-rules options]) --> COLLECT[Collect rules from paths]
    COLLECT --> LOOP[For each rule]
    LOOP --> SCHEMA[1. Schema validation<br/>Fields, types, format]
    SCHEMA -->|fail| REPORT
    SCHEMA -->|pass| CONTRACT[2. Contract validation<br/>.md and .yml matching]
    CONTRACT -->|fail| REPORT
    CONTRACT -->|pass| RESOLVE[Resolve template variables]
    RESOLVE --> OPENGREP[3. OpenGrep validation<br/>Pattern syntax]
    OPENGREP -->|exit 2 or 7| REPORT
    OPENGREP -->|exit 0 or 1| REPORT[Report results]
    REPORT --> NEXT{More rules?}
    NEXT -->|yes| LOOP
    NEXT -->|no| SUMMARY[Summary output]
```

## Why Three Layers in This Order

1. **Schema validation** catches structural errors (missing fields, wrong
   types) with zero external dependencies. Cheapest check - filters out
   rules that would cause confusing downstream failures.

2. **Contract validation** confirms that rule.md and rule.yml agree.
   Catches the class of bugs where one file was updated but the other
   wasn't. Requires both files to be schema-valid first.

3. **OpenGrep validation** runs actual patterns against the syntax
   checker. Most expensive step - requires template resolution, file I/O,
   agent config loading. Only runs on rules that are already structurally
   sound.
````

The diagram shows the three-step pipeline with its branches. The prose explains why that ordering - cheapest first, most expensive last, each layer depending on the previous one being clean. Neither format alone carries both the flow and the reasoning.

When to adopt this

If your CLAUDE.md has any of these, you have a flowchart waiting to happen:

  • "First do X. If X passes, do Y. If Y fails, do Z."
  • "Run A, then B, then C. If any step fails, stop."
  • "Check for X. If found, do Y. Otherwise, do Z."

Sequential steps with conditions = flowchart. Convert those, leave everything else as prose.
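As a sketch, the first bullet above maps directly onto a three-node flowchart (X, Y, and Z are placeholders):

```mermaid
flowchart TD
    X[Do X] -->|passes| Y[Do Y]
    Y -->|fails| Z[Do Z]
```

If your prose translates this mechanically, it's flowchart material; if you find yourself adding caveats that don't fit on an arrow label, that part stays prose.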

Try it

  1. Find a workflow in your CLAUDE.md that reads like a recipe with conditions
  2. Rewrite the control flow as Mermaid
  3. Keep the rationale and judgment calls as prose below the diagram
  4. Delete the original prose-only version

One converted workflow. See if Claude follows it more reliably - and enjoy being able to see your system for the first time.

Don't describe the path. Draw the map.


*The FlowBench paper is at arxiv.org/abs/2406.14884. The "lost in the middle" paper is at arxiv.org/abs/2307.03172.*

*I'm building instruction file governance at Reporails - this finding led to a new rule category (Context Quality) that I'll cover in the next post.*

Previous in series: The backbone.yml Pattern

Top comments (3)

Ned C

i ran into the same thing writing cursor rules. past a few hundred lines of prose instructions, the model just stops following them reliably. i ended up breaking rules into smaller scoped files instead of one giant doc, and compliance went up noticeably. haven't tried mermaid specifically but the "can't leave a dangling arrow" point tracks because with prose you can absolutely have two paragraphs that contradict each other and nothing flags it.

Gábor Mészáros

Breaking into smaller scoped files is the L3 -> L4 move in the capability model (path-scoped rules that load based on what the agent is working on). Mermaid is orthogonal. It helps within each file when the logic branches. Both are fighting the same enemy: context dilution.

Kai Alder

Great idea combining CLAUDE.md with Mermaid for workflow docs. For previewing diagrams before pasting them in, webtoolz.dev/mermaid renders live and lets you share SVG links directly — saves a lot of back and forth.