I Was Repeating Myself
By the time the harness was working and I'd moved to on-the-loop development, my sessions with Claude had a rhythm. Pick up a Jira ticket. Read the requirements. Decide which part of the codebase it touches. Write failing tests. Get them approved. Implement. Run lint and tests. Commit. Open a PR. Watch CI. Review the diff. Maybe refactor.
Every time, I typed the same instructions. "Here's a Jira ticket. Pull the requirements with jira issue view. Write tests first. Follow the Action pattern. Run make lint and make test before committing."
It worked. But I was the ceremony. I was the one remembering the steps, enforcing the order, making sure the harness feedback loop happened. If I forgot to say "write tests first," Claude might skip straight to implementation. If I forgot to say "run lint," it might commit without checking.
The workflow was good. But it lived in my head, not in the system.
Skills: Slash Commands for Claude Code
Claude Code supports custom skills: reusable prompts you invoke with a slash command. They live in .claude/skills/ as Markdown files with a bit of frontmatter. When you type /implement-jira-card PROJ-123, Claude reads the skill definition and executes the workflow described in it.
The skill file is just a SKILL.md:
```
.claude/skills/
  implement-jira-card/
    SKILL.md
  implement-change/
    SKILL.md
```
Each skill defines the argument it expects, describes the workflow in phases, and lists the rules the agent must follow. It's the same kind of guidance as a CLAUDE.md harness file (plain Markdown, checked into the repo, version-controlled), except that instead of scoping guidance to a directory, it scopes it to a workflow.
The Two Skills
I built two skills that cover the two ways work enters the system.
/implement-jira-card [PROJ-123] — for work that starts with a Jira ticket. The skill pulls the issue details from Jira, walks through requirements, planning, TDD, and delivery. It knows how to use jira for issue details and gh for GitHub operations.
/implement-change [description] — for everything else. Bug fixes that don't have a ticket. Follow-up tasks from code review. Small improvements. Ad-hoc work. Same workflow, but the requirements come from the user's description instead of Jira.
Both skills follow the same eight-phase workflow. The difference is where the requirements come from — Jira or the user's own words.
What the Skill Needs from Jira
The /implement-jira-card skill pulls issue details with jira issue view, but not every field matters equally. The agent needs specific information to draft a requirements plan, and our Jira structure is set up to provide it.
Epics are projects. An epic groups all the tasks for a single initiative. When the agent reads a task, the parent epic gives it the broader goal: why this work exists, what it's part of, and what other tasks sit alongside it. Without that context, the agent treats every task as isolated. With it, the agent understands where the task fits and can make better scoping decisions.
Tasks are implementation units. Each task maps to one piece of work and typically produces one PR. We don't use stories. A task is specific enough that the agent can read it and know what to build, what to test, and when it's done.
Three fields on each task do the heavy lifting:
Description — the problem statement and context. This tells the agent what needs to change and why. A good description includes enough domain context that the agent doesn't invent assumptions. "Users should not be able to approve their own orders" is better than "fix approval logic." The description feeds directly into Phase 1's requirements document.
Acceptance criteria — the conditions that define done. These translate almost directly into test cases during Phase 4. "Given a user who created an order, when they attempt to approve it, then the request should be rejected with a 403" becomes a failing test before any implementation exists. The more precise the acceptance criteria, the better the test coverage. Vague criteria produce vague tests.
Screenshots and attachments — the visual reference. For UI work, screenshots show what the result should look like. The agent uses these during implementation to match layout, placement, and content. During Phase 5, if the skill runs visual verification through Puppeteer, the screenshots serve as the expected state.
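Put together, a card that gives the agent enough to work with might look like this. The example is hypothetical: the key, wording, and attachment name are invented for illustration, loosely based on the session later in this post:

```markdown
# PROJ-456: Add order approval workflow to admin dashboard

## Description
Admins need to approve pending orders from the dashboard. Users must not
be able to approve orders they created themselves; each approval records
who approved and when.

## Acceptance Criteria
- Given a pending order, when an admin approves it, then the status
  changes to "approved" and the approver is recorded.
- Given a user who created an order, when they attempt to approve it,
  then the request is rejected with a 403.
- Given an unauthenticated request, then the response is a 401.

## Attachments
- dashboard-approval-mock.png (expected layout of the approval panel)
```

Each section maps to a phase: the description feeds the requirements document, the criteria become failing tests, the mock becomes the visual reference.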
The skill pulls all three, drafts a requirements plan from them, and asks for feedback before moving on. If the card is thin (missing acceptance criteria, vague description), that becomes obvious at the first checkpoint. I either flesh out the card or fill in the gaps in conversation. Either way, the agent doesn't start writing code until the requirements are clear.
This is why card quality matters more with an agent than without one. A developer can fill in gaps from tribal knowledge and hallway conversations. An agent works with what the card gives it. Good cards produce good requirements plans. Bad cards produce a longer Phase 1 conversation, or worse, confident code that solves the wrong problem.
The Eight Phases
Here's the full workflow, phase by phase, as I actually experience it.
Phase 0: Scope the Target
The first thing the skill does is ask a question: does this change apply to the legacy application (Blade CRUD), the React SPA, or both?
This matters because the project is in a transitional state. The legacy app is typical CRUD with Blade views. The SPA is event-driven and personalized. They have different controllers, different test patterns, different harness files. Getting this wrong means the agent writes code in the wrong layer.
I answer "SPA" or "legacy" or "both," and the skill knows which harness files to consult, which layers to touch, and how to test.
Phase 1: Requirements
For /implement-jira-card, the skill runs jira issue view PROJ-123 to pull the issue details. For /implement-change, it asks me to describe the problem if the argument wasn't clear enough.
Then it creates a requirements document: the problem, acceptance criteria, and scope. And it asks for feedback.
This is the first checkpoint. I read the requirements. If something's off, I correct it. If my feedback is about code quality or patterns ("we don't do it that way, use the notification service instead of calling the webhook directly"), the skill does something specific: it updates the relevant CLAUDE.md harness file first, reloads it, and then revises the requirements.
That's the harness feedback loop, baked into the workflow. My correction doesn't just fix this instance. It fixes all future instances.
Phase 2: Implementation Plan
The skill creates an implementation plan: files to change, new files to create, and the testing strategy.
Another feedback checkpoint. Same rules: if my feedback is about patterns, the harness gets updated before the plan gets revised.
This phase catches architectural missteps early. If Claude plans to put business logic in a controller instead of an Action, I catch it here, not after 200 lines of implementation.
Phase 3: Branch Setup
For Jira cards, the branch is prefixed with the issue key: PROJ-123-short-description. For ad-hoc changes, it's descriptive: fix-order-approval or followup-ticket-validation.
The branch is always created from the latest origin/main. No stale branches.
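As a sketch, the branch setup amounts to a couple of git commands. The demo below runs them in a throwaway local repo so it's self-contained; the branch name and commit are illustrative, and the real skill would fetch and branch from origin/main rather than a local main:

```shell
# Phase 3 sketch in a throwaway repo. In the real workflow the skill runs
# `git fetch origin` first and branches from origin/main, not local main.
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q -b main
git -c user.name=dev -c user.email=dev@example.com \
    commit -q --allow-empty -m "chore: init"
git checkout -q -b PROJ-123-short-description main  # fresh branch off main
git branch --show-current
```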
Phase 4: TDD Implementation
This is the core of the workflow, and it's where TDD stops being a philosophy and becomes a protocol.
The skill writes failing tests first. PHP tests, JavaScript tests, or both — whatever the change requires. Then it presents me with a list of test descriptions and asks for feedback.
This is the second critical checkpoint. I'm reviewing what the code should do before any implementation exists. The test descriptions are the spec. If they're wrong, the implementation will be wrong no matter how clean it is.
Once I approve the tests, the skill implements the smallest changes to make them pass. It presents a description of what changed. Another checkpoint.
If at any point my feedback references patterns or code quality, the harness gets updated first, then the tests or implementation get regenerated. The ratchet turns.
Phase 5: Change Approval and Commit
The skill shows a diff of all changes and asks for approval. If I'm happy, it runs the full pre-commit checks:
```
make lint     # Pint, Psalm, Prettier, ESLint, TypeScript
make test     # 2,700+ PHP tests
make test-js  # Vitest suite
```
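For context, here's a hypothetical sketch of what targets like these might wrap. The tool invocations and paths are assumptions based on the checks named above; an actual Makefile will differ:

```make
lint:      # Pint, Psalm, Prettier, ESLint, TypeScript
	vendor/bin/pint --test
	vendor/bin/psalm
	npx prettier --check resources/
	npx eslint resources/
	npx tsc --noEmit

test:      # PHP test suite
	php artisan test

test-js:   # Vitest suite
	npx vitest run
```

The point of the wrapper is that the agent only has to remember three commands, not nine tools.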
If the change has UI, it runs make migrate-e2e and make seed-e2e, then uses the Puppeteer MCP server to visually verify the page. It asks whether to run the full E2E suite.
Everything passes? Commit, push, and create a PR (or update the existing one).
Everything fails? Fix it first. No exceptions.
Phase 6: CI and Code Review
The skill watches the GitHub Actions workflow in the background and fixes any CI failures. Then it runs a code review of its own changes, looking for logic errors, security issues, and pattern violations.
It presents all findings at once and asks which ones to address. For each approved finding, same rules: if it's a pattern issue, update the harness first. For denied findings, skip them.
This is the agent reviewing its own work. It catches things the automated checks miss: subtle logic issues, naming inconsistencies, missing edge cases. I decide which findings are worth addressing and which are noise.
Phase 7: Refactoring
After the feature works and the code review is addressed, the skill identifies refactoring opportunities and security concerns. Same presentation format: all suggestions at once, I pick which to address.
This is where the rule of three applies. If the skill flags code that's only duplicated twice, I skip it. On the third occurrence, we refactor.
Refactoring commits are separate from feature commits. The skill follows conventional commits: feat: for the feature, refactor: for the cleanup. Separate concerns, separate commits.
Phase 8: Done
CI is green. PR is up. Code review is addressed. Refactoring is committed.
The skill confirms everything passed and asks if there's follow-up work. If there is, we start the next skill invocation. If not, we're done.
What This Looks Like in Practice
A real session:
```
me:     /implement-jira-card PROJ-456
claude: [reads Jira issue — "Add order approval workflow to admin dashboard"]
claude: Does this apply to the legacy app, the SPA, or both?
me:     SPA
claude: [creates requirements document]
claude: Here are the requirements. Feedback?
me:     Looks good.
claude: [creates implementation plan — new API endpoint, React component, tests]
claude: Here's the plan. Feedback?
me:     Use the existing DashboardDataService instead of a new service.
claude: [updates app/Services/CLAUDE.md to note DashboardDataService as the
         preferred service for dashboard data, revises plan]
claude: Updated plan. Feedback?
me:     Good.
claude: [creates branch PROJ-456-order-approval-dashboard]
claude: [writes failing tests]
claude: Here are the test descriptions:
         - admin can view pending orders on dashboard
         - order approval updates status and records approver
         - unauthenticated user gets 401
         - user without dashboard permission gets 403
         - user cannot approve their own order
        Feedback?
me:     Add a test for orders that have already been approved.
claude: [adds test, presents updated list]
me:     Good.
claude: [implements — API endpoint, React component, service method]
claude: Here's what changed. Feedback?
me:     Looks right.
claude: [shows diff]
me:     Ship it.
claude: [runs make lint, make test, make test-js — all pass]
claude: [commits, pushes, creates PR]
claude: [watches CI — green]
claude: [runs code review — 2 findings]
claude: Finding 1: The order type enum could use a TypeScript const assertion.
        Finding 2: Missing null check on allocation end date.
        Which to address?
me:     Both.
claude: [fixes both, commits, pushes]
claude: All green. Anything else?
me:     Nope.
```
That's the whole thing, from Jira ticket to a green, reviewed PR. I typed maybe 30 words of direction. The skill handled the ceremony.
Why This Works
Consistency
Without the skill, the workflow depends on me remembering the steps. Some sessions I'd forget to ask for E2E verification. Some sessions I'd skip the code review phase because I was in a hurry. The skill doesn't forget. It runs the same eight phases every time.
Harness Feedback Is Built In
The most important line in both skill files:
If feedback references code quality or patterns, update the relevant CLAUDE.md harness file FIRST, reload it, then apply.
This rule appears at every feedback checkpoint. It means the harness feedback loop isn't something I have to remember to invoke; it's structural. Every piece of feedback I give either confirms the harness is working or improves it. The skill enforces this.
TDD Is Non-Negotiable
The skill writes tests before implementation. Not as a suggestion. As a phase that happens before the implementation phase exists. There's no way to skip it without aborting the skill entirely.
This is important because TDD is easy to skip when you're in a hurry. "I'll write the tests after" is the most common lie in software engineering. The skill makes the lie structurally impossible.
The Agent Reviews Its Own Work
Phases 6 and 7, code review and refactoring, are the agent auditing itself. It's not perfect. It misses things. But it catches enough that my review time drops significantly. And I get to choose which findings to act on, so it never runs away with unnecessary changes.
Separation of Concerns in Commits
The skill produces separate commits for features, fixes, and refactoring. This isn't cosmetic. When something breaks in production and you're scanning git log, the difference between feat: add order approval to dashboard and refactor: extract order calculation helper tells you instantly which commit to investigate.
The Skill File Anatomy
Both skills are plain Markdown with YAML frontmatter:
```markdown
---
name: implement-jira-card
description: Analyze and implement a Jira issue using TDD — from requirements through to a PR
argument-hint: "[Jira issue key, e.g. PROJ-123]"
---

Implement the Jira issue: $ARGUMENTS

## Workflow

### Phase 0: Scope the Target
...
```
The $ARGUMENTS placeholder gets replaced with whatever you pass after the slash command. The description shows up in Claude Code's skill list. The argument hint tells you what to pass.
The workflow section is the actual prompt the agent follows. It's specific, phased, and full of checkpoints. The key rules section at the bottom handles edge cases and priorities.
That's it. No plugin system. No SDK. No custom tooling. A Markdown file with a workflow written in plain English.
/implement-jira-card vs /implement-change
The two skills are nearly identical. The differences:
|  | /implement-jira-card | /implement-change |
|---|---|---|
| Input | Jira issue key | Text description |
| Requirements source | jira issue view | User conversation |
| Branch naming | PROJ-123-description | fix-description or followup-description |
| Tools | jira + gh | gh only |
Everything else — the phases, the checkpoints, the harness feedback loop, the TDD workflow, the CI watching, the code review — is the same.
I considered making one skill with a flag, but two separate skills is clearer. When I type /implement-jira-card, I know I'm starting from a ticket. When I type /implement-change, I know I'm describing the work myself. The intent is obvious from the command.
The Feedback Checkpoints
Count the checkpoints in a single skill run:
- Scope the target (legacy, SPA, or both)
- Requirements review
- Implementation plan review
- Test descriptions review
- Implementation review
- Diff review
- Code review findings — which to address
- Refactoring suggestions — which to address
Eight checkpoints. Eight moments where I'm on-the-loop: reviewing output, giving direction, and making judgment calls. Between those checkpoints, the agent operates autonomously. It writes code, runs tests, fixes failures, manages git, and creates PRs without asking.
This is the on-the-loop workflow from post 7, made concrete and repeatable. I'm not directing input. I'm reviewing output at predetermined checkpoints.
What the Skills Don't Do
The skills don't replace judgment. They automate ceremony.
I still decide what to build. I still decide which layer it belongs in. I still review test descriptions to make sure they capture the right behavior. I still read the diff. I still choose which code review findings matter.
The skills handle the sequence: read the ticket, write tests first, implement, run checks, commit, push, create PR, watch CI, review, refactor. That sequence is the same every time. It doesn't need my attention. The judgment calls at each checkpoint do.
Building Your Own
If you want to build skills for your project:
- Start by noticing repetition. What instructions do you give the agent every session? That's your first skill.
- Define the phases. What's the sequence? Where are the checkpoints?
- Build in the harness feedback loop. Every checkpoint should have the rule: if feedback is about patterns, update the harness first, then re-apply.
- Make TDD structural. Tests before implementation. Not as guidance, but as a phase that must complete before the next phase starts.
- Include self-review. Have the agent audit its own work before you see it.
- Keep it simple. A Markdown file. Plain English. No tooling.
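The steps above can be sketched as a minimal skeleton. This is a hedged example, not a drop-in file: the phase breakdown is abbreviated and the wording is mine, but the shape (frontmatter, $ARGUMENTS, phased workflow, key rules) matches the anatomy shown earlier:

```markdown
---
name: implement-change
description: Implement an ad-hoc change using TDD, from requirements to PR
argument-hint: "[short description of the change]"
---

Implement the following change: $ARGUMENTS

## Workflow

### Phase 1: Requirements
Draft a requirements document: problem, acceptance criteria, scope.
Present it and wait for feedback before continuing.

### Phase 2: Tests First
Write failing tests. Present the test descriptions and wait for approval.
Do not write implementation code until the tests are approved.

### Phase 3: Implement and Verify
Make the smallest change that passes the tests. Run lint and the full
test suite. Commit, push, and open a PR only when everything is green.

### Phase 4: Self-Review
Review your own diff. Present all findings at once and ask which to fix.

## Key Rules
- If feedback references code quality or patterns, update the relevant
  CLAUDE.md harness file FIRST, reload it, then apply.
- Never skip a checkpoint. Never commit with failing checks.
```

Start this small, then split and expand phases as you notice where the agent needs tighter guardrails.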
The skill doesn't need to be clever. It needs to be consistent. The value isn't in any single phase; it's in the guarantee that all eight phases happen, in order, every time.
The Takeaway
Custom skills are the on-the-loop workflow made executable:
- Codify your workflow, not just your patterns. The harness tells the agent how to write code. Skills tell it when to write code, when to test, when to ask for feedback.
- Every feedback checkpoint is a harness improvement opportunity. The skill enforces this. Corrections become rules before they become code.
- TDD as a phase, not a preference. The skill makes it structural. Tests come first because that's what Phase 4 says.
- Separate the ceremony from the judgment. Automate the sequence. Keep the checkpoints.
- Two skills cover most work. Jira ticket or ad-hoc description. Everything else is the same workflow.
The skills turned a workflow I was repeating from memory into a workflow the system enforces. Same eight phases. Same checkpoints. Same harness feedback loop. Every time, without fail.
That's not a minor convenience. That's the difference between a process that depends on discipline and a process that depends on structure. Structure scales. Discipline doesn't.