I maintain an open-source IntelliJ plugin that bundles Mermaid.js. Every time Mermaid releases a new version, I need to update around ten files across a Kotlin codebase — lexer grammars, completion data, test fixtures, documentation. I know the process by heart, but knowing it doesn't make it less tedious. It's the same careful, detail-heavy work each time, and the kind where a single missed file means a debugging session you didn't plan for, or new features you silently skip without realizing.
## The Plugin
Mermaid Visualizer is a JetBrains IDE plugin that adds full Mermaid.js support: rendering inside Markdown preview, a standalone editor with live preview for `.mmd` and `.mermaid` files, syntax highlighting, code completion, inspections, navigation, and structure view. It crossed 2,000 downloads in about six weeks on the JetBrains Marketplace, which was a nice surprise for a solo project.
The integration runs deep. The plugin bundles mermaid.min.js (~2.9 MB) and renders diagrams via JCEF (Chromium embedded in the IDE). But it also has a JFlex lexer that knows every diagram type and keyword, a Grammar-Kit parser with PSI tree, contextual code completion that suggests the right arrows and keywords depending on which diagram you're writing, and inspections that catch mistakes. All of that means when Mermaid.js adds a new diagram type or changes syntax, the upgrade isn't just swapping a JavaScript file — it's updating the lexer, the completion catalogs, the test data, the documentation, and then verifying everything still works.
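To give a flavor of what "contextual completion" means here, a minimal pure-Kotlin sketch follows. The keyword sets below are illustrative Mermaid arrows, not the plugin's actual catalogs, and this deliberately ignores IntelliJ's real completion API:

```kotlin
// Toy model of contextual completion: the suggestion set depends on
// which diagram type the caret is inside. Catalogs are illustrative.
val arrowsByDiagram = mapOf(
    "flowchart" to listOf("-->", "---", "-.->", "==>"),
    "sequenceDiagram" to listOf("->>", "-->>", "->", "-->"),
)

fun suggestions(diagramType: String, prefix: String): List<String> =
    arrowsByDiagram[diagramType].orEmpty().filter { it.startsWith(prefix) }
```

So `suggestions("sequenceDiagram", "->")` only offers sequence-diagram arrows, while the same prefix inside a flowchart yields flowchart arrows. The real plugin does this through the PSI tree rather than a string lookup.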
GitHub repo if you want to look at the code.
## The Problem
The first time I upgraded Mermaid.js, I did everything by hand. Read the changelog, figured out what was new, opened each file, made the edits, ran the tests, fixed what broke, ran them again. It took about three hours, and I still missed some new features from the release. I only noticed later when I went back to the changelog and realized I'd skipped a couple of new keywords.
The second time, I used Claude to help. I walked it through the process step by step — "now open this file, add these entries here, now regenerate the lexer, now update the tests." It was faster and more thorough, but I was essentially being the orchestrator. I had to remember the sequence, remember which files to touch, remember that longer arrow patterns need to come before shorter ones in the JFlex grammar or longest-match breaks. Every piece of domain knowledge was in my head, and I had to give it to Claude sequentially.
The core tension: the process is identical every time, but it demands attention to detail across many files with non-obvious ordering constraints. Miss one file and you get broken tests. Forget to regenerate the lexer after editing the grammar and you get a build that compiles fine but silently ignores your changes. It's the kind of work that's just tedious enough to be error-prone.
## Why a Skill, Not a Prompt
Claude Code has a concept called skills — markdown files that live in your project and encode a reusable procedure. You invoke them with a slash command, and Claude follows the instructions. They're different from prompts in ways that matter for this kind of problem.
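Concretely, a skill is a `SKILL.md` file under the project's `.claude/skills/` directory with a short YAML frontmatter. The layout below reflects my understanding of current Claude Code conventions and may drift between versions, and the `upgrade-mermaid` name and phase text are illustrative:

```markdown
---
name: upgrade-mermaid
description: Upgrade the bundled Mermaid.js and every dependent layer
---

## Phase 0 — Classify changes
Read the release changelog and classify each change by impact...
```

The file lives at `.claude/skills/upgrade-mermaid/SKILL.md` and is invoked from the Claude Code session like any other skill.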
A prompt is one-shot. You write it, use it, and if you need it again next month, you rewrite it — or dig through your conversation history hoping you saved it. A skill lives in the project repository, versioned alongside the code. When the project structure changes, you update the skill. It evolves with the codebase.
More importantly, a skill encodes domain knowledge that would otherwise exist only in your head. The exact file paths. The line ranges where diagram types are defined in the lexer. The fact that the JFlex KEYWORDS HashSet lives around lines 17-66 and new entries need a comment naming the diagram type. The ordering constraint on arrow patterns. None of that is obvious from reading the code cold — you'd have to explore, grep, build a mental model. The skill front-loads all of that.
Skills also support conditional logic. A Mermaid.js release might add three new diagram types, or it might be just bug fixes. The skill adapts: if there are no new types, keywords, or arrows, it skips the lexer, completion data, and parser phases entirely and jumps straight to updating the bundled file and version references. A static prompt can't do that — or rather, it can suggest it, but it won't enforce it structurally.
The critical insight is simple: skills are for recurring problems. If you do something once, a prompt is fine. If you'll do it again in four weeks, and again four weeks after that, encode it. A skill is a reproducible process you can invoke instantly. That's the core difference.
## Anatomy of the Skill
The skill has 11 phases, from changelog analysis through to a manual testing checklist. That might sound like overkill for "update a JavaScript file," but remember — this isn't just a file swap. It's a coordinated change across a lexer grammar, completion catalogs, test data, documentation, and a build pipeline.
The structure is: analyze what changed, download the new version, update each affected layer of the codebase in dependency order, run the build, report results.
Here's what makes it work.
### The Impact Classification
Phase 0 starts with reading the changelog and classifying every change:
| Category | Impact | Examples |
|---|---|---|
| New diagram types | HIGH — touches ~10 files | venn-beta, ishikawa-beta |
| New keywords for existing types | MEDIUM — touches ~4 files | set, union for venn |
| New arrow patterns | MEDIUM — touches ~4 files | half-arrowheads |
| Breaking changes | HIGH — varies | renamed types, removed syntax, API changes |
| Internal changes only | LOW — version refs only | perf improvements, bug fixes |
This is the decision engine. A bug-fix-only release skips most phases — download the new file, update version strings, build, done. A release with new diagram types triggers the full pipeline. The skill doesn't waste time on phases that don't apply, and it doesn't skip phases that do.
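The table reads naturally as a small decision function. A hypothetical Kotlin sketch of the same logic follows; the phase names and the short-path list are my paraphrase, not the skill's literal wording:

```kotlin
// Toy sketch of the Phase 0 decision engine: a release with only
// low-impact changes takes the short path; anything structural
// triggers the full pipeline.
enum class Impact { HIGH, MEDIUM, LOW }

fun phasesFor(impacts: List<Impact>): List<String> =
    if (impacts.all { it == Impact.LOW })
        listOf("download", "version refs", "build & verify")
    else
        listOf(
            "download", "lexer", "completion data", "parser",
            "tests", "docs", "build & verify",
        )
```

The point is structural: the branch is decided once, up front, from the changelog, instead of being re-litigated at every phase.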
### Precision in File Locations
Phase 2 handles the JFlex lexer, and this is where precision matters:
```markdown
## Phase 2 — JFlex Lexer (IF new types/keywords/arrows)

File: src/main/grammars/Mermaid.flex

### 2a. Diagram types (YYINITIAL state, ~lines 100-130)
Add new diagram type strings to the alternation block. Keep grouped logically.

### 2b. Keywords (KEYWORDS HashSet, ~lines 17-66)
Add new keywords grouped under a comment naming the diagram type.
Only reserved words, NOT arbitrary identifiers.

### 2c. Arrow patterns (NORMAL state, ~lines 160-218)
Longer patterns MUST come BEFORE shorter ones (JFlex longest-match).
Variable-length arrows go before fixed-length.
```
Without this, Claude would have to search the codebase, figure out the lexer structure, and guess where to insert new entries. It might get it right. It might put a new arrow pattern after shorter ones and break longest-match. The skill eliminates that uncertainty.
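The ordering pitfall is easy to demonstrate with any first-match alternation. This is a toy Kotlin regex, not JFlex itself, but it mirrors the failure mode: when the shorter pattern is tried first, the full arrow token never gets a chance to match.

```kotlin
fun main() {
    val input = "A --> B"
    // Longer pattern listed first: the full arrow token is matched.
    println(Regex("-->|--").find(input)?.value)  // -->
    // Shorter pattern listed first: the match stops early at "--".
    println(Regex("--|-->").find(input)?.value)  // --
}
```

In the lexer the consequence is worse than a wrong println: the diagram tokenizes into the wrong token stream and everything downstream of the parser degrades quietly.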
The line ranges use "~" because they shift as the file grows. They're landmarks, not GPS coordinates — close enough to orient, flexible enough to survive edits.
### The Verification Gate
Phase 10 is short but non-negotiable:
```markdown
## Phase 10 — Build & Verify

Run ./gradlew build, then ./gradlew verifyPlugin.
Both must succeed. If tests fail: read, fix, re-run.
```
This is what turns the skill from a checklist into a reliable process. It doesn't declare success until the build passes. If a test fails because a new diagram type wasn't added to a test fixture, Claude reads the failure, fixes it, and re-runs. The loop continues until green.
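The gate is essentially a retry loop around the build. A sketch, with the obvious caveat that in reality "attempt a fix" is Claude reading Gradle output and editing files, not a lambda:

```kotlin
// Toy sketch of the Phase 10 gate: keep fixing until both checks
// pass, and never report success on a red build.
fun verificationGate(
    runTask: (String) -> Boolean,  // e.g. shells out to ./gradlew <task>
    attemptFix: () -> Unit,        // read the failure, edit, try again
    maxAttempts: Int = 5,
): Boolean {
    repeat(maxAttempts) {
        if (runTask("build") && runTask("verifyPlugin")) return true
        attemptFix()
    }
    return false
}
```

The bounded attempt count is a design choice worth copying: a skill that can loop forever on a failure it cannot fix is as bad as one that declares victory early.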
## Why It Works
Looking across the whole skill, a few design choices carry the weight:
- Explicit file paths and line ranges remove ambiguity. The agent knows where to look without exploring.
- Conditional phases with clear triggers handle any type of release without wasted effort.
- Ordered dependencies prevent cascading failures — the lexer is regenerated before completion data is updated, the parser before navigation, everything before tests.
- Verification gates ensure the process produces a working build, not just a set of edits.
- Delegation to existing tooling where it makes sense — `./gradlew updateMermaid` handles the download, SHA-256 verification, and atomic file replacement. The skill doesn't reinvent that.
## What I Changed Along the Way
The first version of this skill was too vague. It said things like "update the lexer with new diagram types" without specifying where in the lexer or how. Claude made reasonable guesses, but reasonable guesses are wrong often enough to be a problem.
The second version had better structure — numbered phases, file paths — but didn't handle variability well. What if the release has no new diagram types? What about breaking changes that rename existing syntax? It assumed every upgrade was the same shape, which they aren't.
The third version added the conditional branching and the impact classification table. That was the version that started working reliably in one activation — invoke the skill, provide the changelog, let it run.
I also had to refine the Gradle `updateMermaid` task itself to integrate cleanly. The skill assumes that task handles download and integrity verification, so the task needed to actually be reliable — atomic writes, SHA-256 checks, clear error messages on failure.
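For reference, the shape of such an update step — download to a temp file, verify the hash, then replace atomically — can look roughly like this. This is a hypothetical sketch, not the plugin's actual task code, and the function name is mine:

```kotlin
import java.io.File
import java.net.URI
import java.nio.file.Files
import java.nio.file.StandardCopyOption
import java.security.MessageDigest

// Sketch: never overwrite the bundled file until the new bytes are
// fully downloaded and their SHA-256 matches the expected digest.
fun updateBundledFile(url: String, expectedSha256: String, target: File) {
    val tmp = File.createTempFile("mermaid", ".js")
    URI(url).toURL().openStream().use { input ->
        tmp.outputStream().use { input.copyTo(it) }
    }
    val digest = MessageDigest.getInstance("SHA-256")
        .digest(tmp.readBytes())
        .joinToString("") { "%02x".format(it) }
    check(digest == expectedSha256) { "SHA-256 mismatch: got $digest" }
    // ATOMIC_MOVE generally requires source and target on the same
    // filesystem; a real task would handle that case explicitly.
    Files.move(tmp.toPath(), target.toPath(),
        StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE)
}
```

The failure mode this prevents is the worst one: a half-written or corrupted `mermaid.min.js` that still ships because nothing checked it.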
The lesson is familiar to anyone who writes code: you iterate. Expect two or three versions before a skill works the way you want. That's normal, not a failure.
## Results
Before the skill, a Mermaid.js upgrade took about 3 hours of focused work and I'd still miss things. With the skill, it takes 45 minutes to an hour, including running the full test suite and verifying the build.
More importantly: zero missed features since adopting the skill. The impact classification in Phase 0 forces a thorough changelog review, and the per-phase file lists ensure nothing gets skipped. This matters because staying close to official Mermaid.js releases is the whole point of the plugin — one of the common criticisms of existing Mermaid plugins was lagging months behind upstream. The skill is what makes that goal sustainable as a solo dev, and I'll keep iterating on it to push that time down further.
## Takeaways for Writing Your Own Skills
If you're thinking about encoding one of your own processes as a skill, here's what I've learned:
- **Target recurring processes.** If you do something once, write a prompt. If you know you'll do it again, write a skill.
- **Be explicit about locations.** File paths, line ranges, section names. The less the agent has to search, the less it can get wrong.
- **Use conditional branching.** Real processes have variability. "IF new types exist, do X; otherwise skip to Y" handles that without separate skills for each scenario.
- **Include verification steps.** A skill that makes edits but doesn't check if they work is just a fancy TODO list. Build, test, lint — whatever your project needs.
- **Iterate.** Your first version will be too vague or too rigid. That's fine. Refine it after each use, the same way you'd refine code after a code review.
- **Keep it in the repo.** Version the skill alongside the code it operates on. When the code structure changes, the skill should change too.
## One Last Thing
If you work in JetBrains IDEs and want Mermaid diagram support, give the plugin a try. But the broader point here isn't about Mermaid or IntelliJ — it's about recognizing when a recurring process is worth encoding.
If you have a procedure that's always the same steps, always touches the same files, and always requires the same domain knowledge you keep in your head, that's a skill waiting to be written. The upfront cost is maybe an hour. The payoff compounds every time you invoke it.
If you found this helpful, you can buy me a coffee ☕.