If you’re building custom slash commands for Claude Code, there’s a good chance you’ve fallen into a trap that silently inflates your token costs and makes your command files harder to maintain. The culprit? Inline code blocks embedded directly in your .md command files.
We discovered this pattern in our own project governance system at EchoForgeX, and the numbers were eye-opening. In this post, we’ll break down the problem, show you the real cost, and walk through how to fix it — plus how to guide Claude Code away from creating this pattern in the first place.
The Pattern: Inline Python in Command Files
Claude Code slash commands live in .claude/commands/ as Markdown files. They contain instructions that Claude follows when you invoke them. When these commands need to interact with external tools — reading YAML plans, querying a catalog, updating task statuses — Claude tends to generate something like this:
```bash
cd /path/to/project && python3 -c "
import sys; sys.path.insert(0, 'tools/project-governance')
from planner import load_plan, get_phases, can_advance_phase
from pathlib import Path
plan = load_plan('{PROJECT_ID}', Path('tools/project-governance/plans'))
if not plan:
    print('Plan not found'); sys.exit(1)
phases = get_phases(plan.project_type)
phase_idx = phases.index(plan.current_phase)
print(f'Project: {plan.project_name} ({plan.project_id})')
print(f'Phase: {plan.current_phase} ({phase_idx + 1}/{len(phases)})')
# ... 40 more lines of formatting logic
"
```
This looks reasonable at first glance. The Python modules exist, the functions are real, and the output is useful. But when you step back and look at the full picture, the costs add up fast.
The Real Cost: We Measured It
We audited three command files in our project governance system: hire.md, manage.md, and plan-check.md. Here’s what we found:
| File | Total Lines | Inline Python Lines | % Python |
|---|---|---|---|
| hire.md | 230 | 79 | 34% |
| manage.md | 311 | 167 | 54% |
| plan-check.md | 98 | 1 | 1% |
| Total | 639 | 247 | ~39% |
Nearly 40% of our command file content was inline Python. That translates to roughly 8,700 characters, or more than 2,000 tokens, loaded into Claude’s context window every time one of these commands is invoked. In manage.md, the status dashboard block alone was 78 lines and 2,599 characters of inline Python.
The worst part? Every block repeated the same boilerplate:
```python
import sys; sys.path.insert(0, 'tools/project-governance')
from pathlib import Path
plans_dir = Path('tools/project-governance/plans')
```
That’s three lines of identical setup code repeated 16 times across our command files.
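An audit like the one in the table can be approximated with a short script. The sketch below is not the exact method used for the numbers above. It assumes each inline block opens on a line containing `python3 -c "` and closes on a line holding only `"`, as in the earlier example. The names `AuditResult` and `audit_command_file` and the four-characters-per-token ratio are illustrative assumptions.

```python
from dataclasses import dataclass

MARKER = 'python3 -c "'


@dataclass
class AuditResult:
    total_lines: int
    inline_lines: int
    inline_chars: int

    @property
    def percent_inline(self) -> float:
        return 100 * self.inline_lines / self.total_lines if self.total_lines else 0.0

    @property
    def approx_tokens(self) -> int:
        # Assumption: ~4 characters per token, a rough heuristic only.
        return self.inline_chars // 4


def audit_command_file(text: str) -> AuditResult:
    """Count lines and characters inside `python3 -c "..."` blocks."""
    lines = text.splitlines()
    inline_lines = inline_chars = 0
    inside = False
    for line in lines:
        if not inside and MARKER in line:
            rest = line.split(MARKER, 1)[1]
            if '"' in rest:
                # One-liner: the block opens and closes on the same line.
                inline_lines += 1
                inline_chars += len(line) + 1
            else:
                inside = True  # multi-line block; the opening line is not counted
            continue
        if inside:
            if line.strip() == '"':
                inside = False  # a lone closing quote ends the block
                continue
            inline_lines += 1
            inline_chars += len(line) + 1  # +1 for the newline
    return AuditResult(len(lines), inline_lines, inline_chars)


sample = '''Show the plan status.
cd /path/to/project && python3 -c "
import sys
print('hello')
"
Then summarize the output.'''

result = audit_command_file(sample)
print(result.inline_lines, result.total_lines, round(result.percent_inline))
# prints: 2 6 33
```

Running this over each file in .claude/commands/ gives per-file counts comparable to the table, though the exact figures depend on how block boundaries are defined.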
Why This Happens
Claude Code generates inline code blocks for a practical reason: the command .md files need to be self-contained instructions. When Claude builds a command that needs to call external Python, the most direct approach is to embed the call inline. It works. It’s correct. And it’s how most developers would write a quick one-off script.
The problem is that command files aren’t one-off scripts. They’re prompt templates loaded into context on every invocation. Every character counts because every character becomes tokens, and tokens cost money and consume context window space that could be used for actual reasoning.
There are three specific costs:
- Token bloat: ~2,000 extra tokens per invocation, across every conversation that uses these commands. Over hundreds of invocations, this adds up to real dollars.
- Maintainability debt: When the output format needs to change, you’re editing inline Python embedded inside Markdown inside bash code fences. One misplaced quote breaks everything. And the same logic is duplicated across multiple files.
- Reliability risk: Claude has to parse 78-line inline Python blocks and correctly substitute {PLACEHOLDER} values. Longer blocks mean more surface area for templating errors.
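To put the token-bloat cost in concrete terms, here is a back-of-the-envelope sketch. It takes the 8,700-character figure from the audit at face value. The four-characters-per-token ratio and the invocation count are assumptions for illustration only; real tokenization and usage will differ.

```python
CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not a real tokenizer


def estimate_tokens(char_count: int) -> int:
    """Approximate token count from a character count."""
    return round(char_count / CHARS_PER_TOKEN)


def cumulative_tokens(char_count: int, invocations: int) -> int:
    """Tokens spent re-loading the same text on every invocation."""
    return estimate_tokens(char_count) * invocations


inline_chars = 8_700  # measured total reported in the audit
invocations = 500     # hypothetical usage figure

print(estimate_tokens(inline_chars))                 # prints: 2175
print(cumulative_tokens(inline_chars, invocations))  # prints: 1087500
```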
The Fix: Extract a CLI Layer
The solution is straightforward. The Python modules already have clean function APIs — planner.py, catalog.py, etc. The inline blocks are just glue code. Extract that glue into a proper CLI entry point.
Before: 78 Lines in the Command File
```bash
cd /path/to/project && python3 -c "
import sys; sys.path.insert(0, 'tools/project-governance')
from planner import load_plan, get_phases, can_advance_phase
from pathlib import Path
plan = load_plan('{PROJECT_ID}', Path('tools/project-governance/plans'))
# ... 70 more lines of formatting and display logic
"
```
After: 1 Line in the Command File
```bash
python3 tools/project-governance/cli.py status {PROJECT_ID}
```
The CLI script (tools/project-governance/cli.py) handles imports, path setup, formatting, and output — once, in a tested Python file, not scattered across Markdown.
A typical CLI structure using Python’s built-in argparse:
```python
#!/usr/bin/env python3
"""Project governance CLI: single entry point for all governance operations."""
import argparse
from pathlib import Path

from planner import load_plan, list_plans, advance_phase
from catalog import search_profiles, rehire, create_profile

# Resolve data directories relative to this file so the CLI works from any cwd
PLANS_DIR = Path(__file__).parent / "plans"
CATALOG_DIR = Path(__file__).parent / "catalog"


def cmd_status(args):
    plan = load_plan(args.project_id, PLANS_DIR)
    # All formatting logic lives here, tested and maintained in one place
    ...


def cmd_advance(args):
    plan = load_plan(args.project_id, PLANS_DIR)
    ok, msg = advance_phase(plan, PLANS_DIR)
    print(f"{'SUCCESS' if ok else 'BLOCKED'}: {msg}")


parser = argparse.ArgumentParser(prog="governance")
# required=True makes argparse report a usage error when no subcommand is given
sub = parser.add_subparsers(dest="command", required=True)

status_p = sub.add_parser("status")
status_p.add_argument("project_id")
status_p.set_defaults(func=cmd_status)
# ... additional subcommands

if __name__ == "__main__":
    args = parser.parse_args()
    args.func(args)
```
Every subcommand maps to one function. The command .md files shrink to thin orchestration scripts with one-liner shell calls.
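Because each subcommand is an ordinary function, the CLI can be exercised without a shell. The following is a minimal self-contained sketch of that idea, not the project's actual code: `fake_load_plan` and the fields it returns are hypothetical stand-ins for the real planner module, and `build_parser` and `main` are illustrative names.

```python
import argparse
import contextlib
import io


def fake_load_plan(project_id):
    # Hypothetical stand-in for planner.load_plan; returns a plain dict.
    return {"project_id": project_id, "current_phase": "design"}


def cmd_status(args):
    plan = fake_load_plan(args.project_id)
    print(f"Project: {plan['project_id']}")
    print(f"Phase: {plan['current_phase']}")
    return 0


def build_parser():
    parser = argparse.ArgumentParser(prog="governance")
    sub = parser.add_subparsers(dest="command", required=True)
    status_p = sub.add_parser("status", help="Show project dashboard")
    status_p.add_argument("project_id")
    status_p.set_defaults(func=cmd_status)
    return parser


def main(argv=None):
    # argv=None makes argparse fall back to sys.argv[1:] in real use.
    args = build_parser().parse_args(argv)
    return args.func(args)


# Call main() with an explicit argument list and capture what it prints.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exit_code = main(["status", "demo-001"])

output = buffer.getvalue()
print(output, end="")
```

Accepting an explicit `argv` list is the design choice that makes this testable: the same `main` serves the command line and a unit test, which is not possible for code embedded in a Markdown file.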
The Impact
| Metric | Before | After |
|---|---|---|
| Inline Python in command files | 247 lines / 8,700 chars | ~16 one-liners / ~1,200 chars |
| Tokens per invocation | ~2,000+ extra | ~300 extra |
| Places to update formatting logic | 16 inline blocks across 3 files | 1 CLI file |
| Runtime performance | No change | No change |
Runtime performance stays the same: python3 -c and python3 cli.py both start the same interpreter, so startup cost is essentially identical. The wins are entirely in token efficiency and maintainability.
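The size of the reduction implied by the table can be checked with a few lines of arithmetic. The helper name `percent_reduction` is illustrative; the inputs are the approximate before and after figures from the table.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, as a percentage."""
    return 100 * (before - after) / before


print(round(percent_reduction(8_700, 1_200)))  # characters, prints: 86
print(round(percent_reduction(2_000, 300)))    # tokens, prints: 85
```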
Guiding Claude Code Away From This Pattern
The inline code pattern is Claude’s default behavior when it doesn’t know a CLI exists. You can prevent it with a few targeted interventions:
1. Add Rules to Your CLAUDE.md
Your project’s CLAUDE.md file is the most authoritative way to shape Claude Code’s behavior. Add explicit guidance:
```markdown
## Command File Conventions
When creating or editing `.claude/commands/*.md` files:
- NEVER embed inline Python (`python3 -c "..."`) in command files
- Always call existing CLI tools or scripts instead
- If no CLI exists for the operation, create one in `tools/` first
- Command files should contain orchestration logic and one-liner shell calls, not application code
- Each bash block in a command file should be a single line where possible
```
2. Document Your CLI Tools
Claude Code reads your project structure. If your CLI tools have clear help text and are documented in CLAUDE.md, Claude will use them instead of reinventing the wheel inline:
```markdown
## Available CLI Tools
| Command | Purpose |
|---------|---------|
| `python3 tools/project-governance/cli.py status <id>` | Show project dashboard |
| `python3 tools/project-governance/cli.py advance <id>` | Advance project phase |
| `python3 tools/project-governance/cli.py catalog search --role <r>` | Search agent catalog |
```
3. Use Feedback Memories
If you’re using Claude Code’s memory system, save a feedback memory the first time you correct this behavior. Claude will apply the guidance in future conversations without being told again:
```markdown
## Feedback Memory Example
"Never embed inline Python in .claude/commands/*.md files.
Why: Inline code bloats token usage by ~2K tokens per invocation and
creates maintenance burden with duplicated logic across files.
How to apply: Always use CLI scripts in tools/ instead."
```
4. Review Generated Commands Before Committing
When Claude Code creates or modifies a command file, scan for python3 -c blocks before accepting the change. If you see one, ask Claude to extract it into a script. Once corrected, it typically won’t revert to the inline pattern in the same conversation.
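The manual scan can also be automated. The sketch below flags lines that invoke `python -c` or `python3 -c` in a directory of command files. The function names and the regular expression are illustrative assumptions, and the pattern will miss other ways of embedding code, such as heredocs.

```python
import re
from pathlib import Path

# Matches "python -c" or "python3 -c" as a standalone flag.
INLINE_PYTHON = re.compile(r"python3?\s+-c\b")


def find_inline_python(text: str) -> list[int]:
    """Return 1-based line numbers that contain a `python -c` invocation."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if INLINE_PYTHON.search(line)
    ]


def check_commands(commands_dir: Path) -> list[tuple[Path, int]]:
    """Collect (file, line number) pairs for every hit in *.md files."""
    hits = []
    for path in sorted(commands_dir.glob("*.md")):
        for number in find_inline_python(path.read_text(encoding="utf-8")):
            hits.append((path, number))
    return hits


def main(commands_dir: Path = Path(".claude/commands")) -> int:
    """Print each hit and return 1 if any were found, else 0."""
    found = check_commands(commands_dir)
    for path, number in found:
        print(f"{path}:{number}: inline python -c block")
    return 1 if found else 0


sample = 'Run:\npython3 -c "print(1)"\npython3 tools/cli.py status X\n'
print(find_inline_python(sample))  # prints: [2]
```

Wiring `main()` into a pre-commit hook or CI step, for example with `raise SystemExit(main())`, would turn the review into an automatic check.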
Beyond Command Files: The Broader Lesson
This isn’t just about Claude Code command files. The same principle applies anywhere AI-generated Markdown contains embedded code:
- GitHub Actions workflows with long inline scripts: extract to shell scripts in .github/scripts/
- Documentation with embedded setup scripts: link to maintained scripts instead
- Prompt templates with inline code examples: reference tested scripts by path
The pattern is always the same: if code is embedded in a document that gets loaded repeatedly, the cost compounds. Extract it once, reference it everywhere.
Key Takeaways
- Inline code in Claude Code command files can consume 30-50% of the file’s token budget with boilerplate
- Extracting to a CLI layer cuts ~85% of that overhead with zero runtime cost
- Guide Claude Code’s behavior through CLAUDE.md rules, CLI documentation, and feedback memories
- The same principle applies anywhere AI-generated content embeds code that’s loaded repeatedly
At EchoForgeX, we build AI-powered tools and help businesses integrate AI into their workflows. Get in touch to learn how we can help your team work smarter with AI.