DEV Community: hideyoshi

I Built a Free CLI That Generates Production CLAUDE.md Configs in 30 Seconds

hideyoshi — Tue, 24 Mar 2026 01:32:38 +0000

Most CLAUDE.md files I see in the wild look like this:

# Rules
- Use TypeScript
- Write tests
- Be helpful

That's not a configuration. That's a wish list.

A real CLAUDE.md needs:

Autonomy boundaries — what can the agent do without asking?
Safety rules — what must it never do?
Skill modules — how does it handle specific workflows?
Multi-agent patterns — how do parallel agents avoid conflicts?

Writing all of this from scratch takes hours. So I built a CLI that does it in 30 seconds.

`npx hideyoshi init`

No install required. Just run it:

$ npx hideyoshi init

◆ Which AI coding tool?
│ ● Claude Code
│ ○ Cursor
│ ○ Windsurf
│ ○ Aider
│ ○ Multiple tools

◆ Primary framework?
│ ● Next.js

◆ Primary language?
│ ● TypeScript

◆ How many people work on this codebase?
│ ● 2-5 people

◆ How much autonomy should the agent have?
│ ● Moderate — commit freely, ask before deploy

◆ Will you run multiple AI agents in parallel?
│ ● Yes

◆ Which skill modules do you want?
│ ◻ Code Review
│ ◻ Bug Triage (verify-before-act)
│ ◻ Deploy Guard
│ ◻ Content Writer

It generates a complete set of files:

├── CLAUDE.md              # Main agent config
├── AGENTS.md              # Multi-agent safety rules
└── .agents/skills/
    ├── code-review/SKILL.md
    ├── bug-triage/SKILL.md
    └── deploy-guard/SKILL.md
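The sketch below shows how wizard answers like the ones above could decide which files get written. It is not the CLI's actual source. The `WizardAnswers` shape and the `planFiles` function are hypothetical, and the sketch only models the Claude Code case.

```typescript
// Hypothetical sketch: NOT the hideyoshi CLI's source, just one way the
// wizard answers could map to generated files for a Claude Code project.
type SkillId = "code-review" | "bug-triage" | "deploy-guard" | "content-writer";

interface WizardAnswers {
  parallelAgents: boolean; // "Will you run multiple AI agents in parallel?"
  skills: SkillId[]; // "Which skill modules do you want?"
}

function planFiles(answers: WizardAnswers): string[] {
  const files: string[] = ["CLAUDE.md"]; // main agent config is always written
  if (answers.parallelAgents) {
    files.push("AGENTS.md"); // multi-agent safety rules only when agents run in parallel
  }
  for (const skill of answers.skills) {
    files.push(`.agents/skills/${skill}/SKILL.md`);
  }
  return files;
}

const plan = planFiles({
  parallelAgents: true,
  skills: ["code-review", "bug-triage", "deploy-guard"],
});
console.log(plan.join("\n"));
```

In this model, answering "No" to parallel agents and selecting no skills would leave only CLAUDE.md.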

What the generated CLAUDE.md looks like

Here's a snippet from what it produces for a moderate-autonomy Next.js team:

# Project Agent Configuration

You are the development agent for this project.

## Autonomy Boundaries

### Do without asking
- Run tests and linting
- Format code
- Create commits within scope
- Install dependencies (retry once on failure)

### Ask before doing
- Deploy to production
- Modify CI/CD pipelines
- Change environment variables
- Delete files outside current task scope

## Coding Standards
- TypeScript (ESM) is the standard
- Next.js App Router conventions
- Verify builds pass before pushing

It's tailored to your tool, framework, team size, and autonomy level. Not a generic template.

Multi-agent safety (the part most people skip)

If you said "yes" to parallel agents, the CLI generates an AGENTS.md with rules like:

## Multi-Agent Safety

- Never use `git stash` (can destroy other agents' work)
- Never switch branches unless explicitly instructed
- Only stage YOUR changes — never `git add -A`
- Run `git pull --rebase` before pushing
- Ignore unfamiliar files (they belong to another agent)

These rules exist because I learned each one from a real incident. Force pushes, corrupted indexes, deleted configs — all from agents stepping on each other.
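A config file states the rules, but you can also add a mechanical backstop that rejects the riskiest git commands before they run. Below is a minimal sketch, assuming you have some way to inspect a command before execution, for example a hook, if your tool offers one. The `checkGitCommand` helper is hypothetical and is not part of the CLI.

```typescript
// Hypothetical guard, not part of any CLI: returns why a git invocation breaks
// one of the multi-agent rules above, or null if it is allowed.
// `args` are the arguments after "git", e.g. ["add", "-A"].
function checkGitCommand(args: string[]): string | null {
  const sub = args[0];
  const rest = args.slice(1);

  // The stash is one stack shared by every agent working in the checkout.
  if (sub === "stash") {
    return "never use git stash: it can destroy other agents' work";
  }
  if (sub === "switch") {
    return "never switch branches unless explicitly instructed";
  }
  // `git checkout -- <path>` only restores files; any other checkout is
  // conservatively treated as a possible branch switch.
  if (sub === "checkout" && rest[0] !== "--") {
    return "never switch branches unless explicitly instructed";
  }
  // Blanket staging picks up files other agents are still editing.
  if (sub === "add" && (rest.indexOf("-A") !== -1 || rest.indexOf("--all") !== -1 || rest.indexOf(".") !== -1)) {
    return "only stage your own files, never everything";
  }
  return null;
}

console.log(checkGitCommand(["stash", "pop"]));
console.log(checkGitCommand(["add", "src/app/page.tsx"]));
```

The checkout rule is deliberately conservative. Without repository context, the sketch cannot tell `git checkout main` from `git checkout some-file`, so it blocks both.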

Skill modules

Skills are modular behavior files that teach your agent domain-specific workflows. The CLI generates them based on your selection:

Code Review skill — teaches the agent to check for security issues, style violations, and test coverage before approving.

Bug Triage skill — enforces "verify before act": reproduce the bug, identify the root cause (file + line), then fix. No guessing.

Deploy Guard skill — pre-deploy checklist: tests pass, no console errors, build succeeds, env vars present.

Each skill has a clear scope, specific do/don't rules, and an escalation path.
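The Deploy Guard checklist comes down to a few boolean gates plus an environment-variable lookup. Here is a minimal sketch, assuming the test, build, and console-error results are gathered elsewhere and passed in. The `DeployChecks` shape, the `deployBlockers` function, and the environment variable names in the example are all hypothetical.

```typescript
// Hypothetical inputs for the pre-deploy checklist described above.
interface DeployChecks {
  testsPassed: boolean;
  buildSucceeded: boolean;
  consoleErrors: number;
  requiredEnv: string[]; // names the app needs at runtime
  env: Record<string, string | undefined>;
}

// Returns every blocking problem; an empty list means the checklist passes.
function deployBlockers(c: DeployChecks): string[] {
  const blockers: string[] = [];
  if (!c.testsPassed) blockers.push("tests are failing");
  if (!c.buildSucceeded) blockers.push("build failed");
  if (c.consoleErrors > 0) blockers.push(`${c.consoleErrors} console error(s)`);
  for (const name of c.requiredEnv) {
    const value = c.env[name];
    if (value === undefined || value === "") blockers.push(`missing env var ${name}`);
  }
  return blockers;
}

const blockers = deployBlockers({
  testsPassed: true,
  buildSucceeded: true,
  consoleErrors: 0,
  requiredEnv: ["DATABASE_URL", "STRIPE_SECRET_KEY"],
  env: { DATABASE_URL: "postgres://localhost/app" },
});
console.log(blockers);
```

A non-empty list is a natural escalation trigger. The skill would stop and report the blockers instead of deploying.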

The free CLI vs. the full Playbook

The CLI generates starter configs — enough to get going. The full Playbook includes:

| Feature | Free CLI | Playbook |
|---|---|---|
| CLAUDE.md templates | 1 | 5+ (tool-specific) |
| Skill modules | Up to 4 | 10+ |
| Multi-agent patterns | Basic rules | Full conflict resolution |
| Safety checklists | None | Security + governance |
| Real-world examples | None | 3 complete configs |
| Trust spectrum framework | None | Full chapter |

The CLI is genuinely useful on its own. The Playbook is for teams that want the complete system.

Launch week: 50% off with code LAUNCH50 (expires March 28)

hideyoshi.app/playbook

Try it: npx hideyoshi init

Questions? Find me at @hideyoshi_th.

What 20+ Production CLAUDE.md Files Actually Look Like

hideyoshi — Sun, 22 Mar 2026 01:25:46 +0000

Most CLAUDE.md advice online is vague. "Be specific." "Use TypeScript." "Follow best practices."

None of that changes how your agent actually behaves. Here's what does: a structured system of files that work together.

I'll walk you through the exact file structure we use in production.

The File Structure

playbook/
├── 01-foundations/
│   ├── constrained-autonomy-model.md
│   └── trust-levels.md
├── 02-configuration-templates/
│   ├── CLAUDE.md.template
│   ├── AGENTS.md.template
│   ├── cursorrules.template
│   ├── aider-conventions.template
│   └── windsurf-rules.template
├── 03-skill-system/
│   ├── SKILL-TEMPLATE.md
│   ├── skill-design-checklist.md
│   └── examples/
│       ├── code-review.md
│       ├── deploy-guard.md
│       ├── bug-triage.md
│       └── content-writer.md
├── 04-multi-agent-patterns/
│   ├── parallel-agents.md
│   ├── worktree-strategy.md
│   └── conflict-resolution.md
├── 05-safety-and-governance/
│   ├── security-checklist.md
│   ├── cost-control.md
│   └── audit-trail.md
└── 06-real-world-examples/
    ├── autonomous-business-agent.md
    ├── ci-cd-agent.md
    ├── customer-support-agent.md
    └── filled-claude-md.md

24 files. All Markdown. Every one of them exists because we hit a real problem without it.

1. The Constrained Autonomy Model

This is the foundation. Everything else builds on it.

## Constrained Autonomy

### Actions You May Take Without Approval
- Run tests, format code, fix lint errors
- Create commits within the current branch
- Install dependencies from package.json
- Read any file in the repository

### Actions That Require Approval
- Push to main or release branches
- Delete files or directories
- Modify CI/CD pipeline configuration
- Changes affecting billing or payments
- Bulk operations affecting 5+ resources

The key insight: you need two explicit lists. What the agent can do freely, and what needs a human check. Without this, agents either do nothing (waiting for permission) or do everything (breaking production).

The "5+ resources" threshold came from an incident where an agent tried to close 50 GitHub issues in one operation. Now the rule is: if it touches 5 or more things, show the count and ask first.
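Together, the two lists and the bulk threshold amount to a small decision function. This is a hypothetical sketch. The action names are illustrative stand-ins for the bullets above, and treating unlisted actions as "ask" is an assumption, not something the lists specify. In practice the agent applies these rules by reading the Markdown, not by calling code.

```typescript
type Decision = "proceed" | "ask";

// Illustrative action names standing in for the bullets in the two lists above.
const FREE_ACTIONS: string[] = ["run-tests", "format", "fix-lint", "commit-current-branch", "install-deps", "read-file"];
const GATED_ACTIONS: string[] = ["push-main", "delete-path", "edit-ci", "billing-change"];

// "5+ resources" means five or more.
const BULK_THRESHOLD = 5;

function decide(action: string, resourceCount: number): Decision {
  // Bulk operations need a human even when the action itself is normally free.
  if (resourceCount >= BULK_THRESHOLD) return "ask";
  if (GATED_ACTIONS.indexOf(action) !== -1) return "ask";
  if (FREE_ACTIONS.indexOf(action) !== -1) return "proceed";
  // Assumption: anything on neither list falls on the safe side and asks.
  return "ask";
}

// "Show the count and ask first": the message surfaced before a bulk operation.
function confirmationPrompt(action: string, resourceCount: number): string {
  return `About to ${action} on ${resourceCount} items. Proceed?`;
}

if (decide("close-issue", 50) === "ask") {
  console.log(confirmationPrompt("close-issue", 50));
}
```

Because the threshold check comes first, a normally free action such as fixing lint still stops for confirmation when it spans a dozen files.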

2. Skill Modules

Instead of one massive CLAUDE.md file, we split agent behaviors into separate skill files:

---
name: code-review
description: "Automated code review for pull requests"
---

# Code Review Skill

## When This Skill Applies
- A pull request is open and awaiting review
- You are asked to evaluate changes before merge

## When This Skill Does NOT Apply
- During initial feature development
- For documentation-only changes

## Rules

### Do
- Review the diff systematically, file by file
- Check that new code has corresponding tests
- Flag security concerns immediately
- Provide specific feedback with file paths and line numbers

### Do Not
- Block a PR for style issues the linter doesn't catch
- Suggest refactoring code not changed in this PR
- Approve a PR with failing tests
- Leave vague feedback without explaining how to fix

The agent loads 2-3 relevant skills per task. A debugging task loads the bug-triage skill. A PR loads code-review. A deployment loads deploy-guard.

This is dramatically more predictable than dumping everything into one giant configuration file.
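How the agent actually decides which skills apply is up to the model and the tool. To illustrate the idea of loading only what the task needs, here is a hypothetical routing table that mirrors the examples in the paragraph above. The task categories and the cap of three, taken from "2-3 relevant skills per task", are assumptions.

```typescript
type TaskKind = "debugging" | "pull-request" | "deployment";

// Hypothetical routing table built from the examples above.
const SKILLS_BY_TASK: Record<TaskKind, string[]> = {
  debugging: ["bug-triage"],
  "pull-request": ["code-review"],
  deployment: ["deploy-guard"],
};

const MAX_SKILLS = 3;

// Collects skills for every kind of work the task involves, without
// duplicates, and never more than MAX_SKILLS.
function skillsFor(kinds: TaskKind[]): string[] {
  const selected: string[] = [];
  for (const kind of kinds) {
    for (const skill of SKILLS_BY_TASK[kind]) {
      if (selected.indexOf(skill) === -1 && selected.length < MAX_SKILLS) {
        selected.push(skill);
      }
    }
  }
  return selected;
}

console.log(skillsFor(["debugging", "pull-request"]));
```

Everything not returned stays out of context, which is the predictability argument made above.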

3. Multi-Agent Safety

If you run multiple Claude Code instances on the same repo (and you should — it's incredibly productive), you need explicit rules:

## Multi-Agent Safety

- Never `git stash` — destroys other agents' work in progress
- Never switch branches without explicit permission
- Only stage your own files when committing
- Run `git pull --rebase` before every push
- If you see unfamiliar files, ignore them — another agent is working

Every single rule exists because of a real incident. The git stash one was particularly painful — one agent stashed another's work, then the stash got dropped during cleanup.

4. The Trust Spectrum

Full Manual ────────────────────── Full Autopilot
  Ask    Read   Format  Test  Commit  Deploy  Business

Most teams start on the far left (ask for everything) and never move. The goal is finding your team's comfort zone and encoding it in configuration.

For a new project, we start at "Commit" — the agent can read, format, test, and commit freely. Deploys and business decisions need approval. As trust builds, the boundary shifts right.
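Because the spectrum is ordered, a single boundary value is enough to say what runs without approval. Here is a minimal sketch of that model. The `autonomousActions` and `isAutonomous` names are hypothetical, and "Ask" is treated as the "nothing is autonomous" end of the scale.

```typescript
// The spectrum above, ordered from fully manual (left) to fully autonomous (right).
const SPECTRUM = ["Ask", "Read", "Format", "Test", "Commit", "Deploy", "Business"] as const;
type Level = (typeof SPECTRUM)[number];

// Everything right of "Ask", up to and including the boundary, runs without approval.
function autonomousActions(boundary: Level): Level[] {
  return SPECTRUM.slice(1, SPECTRUM.indexOf(boundary) + 1);
}

function isAutonomous(action: Level, boundary: Level): boolean {
  return autonomousActions(boundary).indexOf(action) !== -1;
}

console.log(autonomousActions("Commit").join(", "));
```

In this model, shifting the boundary right as trust builds is a one-word change, for example from "Commit" to "Deploy".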

5. Real-World Example: A Filled-In CLAUDE.md

The playbook includes a complete, working CLAUDE.md for a fictional SaaS called "Acme SaaS":

# Acme SaaS — AI Agent Configuration

You are Atlas, the lead engineer for Acme SaaS.
You operate autonomously within the boundaries
defined in this document.

## Core Responsibilities
1. Deliver working features that pass code review
2. Maintain test coverage above 80%
3. Keep the codebase clean — no lint errors, no type errors

## Guiding Principles
- Act on your own judgment within the guardrails
- Never guess. Read the code, check the data, then decide
- 1 PR = 1 topic. Do not bundle unrelated changes
- Ship working increments over perfect solutions

This isn't a template with [INSERT HERE] placeholders. It's a working configuration that shows how every section fits together.

Want all 24 files?

The full playbook includes everything above plus 5 configuration templates (Claude Code, Cursor, Windsurf, Aider, and AGENTS.md for multi-agent setups), 4 example skills, safety checklists, cost controls, and 3 complete real-world agent configurations.

All Markdown. Drop into your project. Start using immediately.

50% off this week with code LAUNCH50: hideyoshi.app/playbook

Or grab the free CLAUDE.md starter template to get started.

I built an autonomous AI agent playbook — here are the patterns that actually work

hideyoshi — Sun, 22 Mar 2026 01:24:29 +0000

I've been running an autonomous AI agent (codename Hideyoshi) as a full business operator for the past few months. Not a chatbot. Not a copilot. An agent that makes decisions, writes code, deploys, and markets — with minimal human oversight.

Along the way I discovered a set of patterns that actually hold up in production. These are battle-tested.

1. Constrained Autonomy > Full Autonomy

Giving an agent unlimited freedom sounds cool until it burns your production database. The pattern that works:

Approve-free zone: formatting, linting, test runs, commits within scope, research
Human gate: releases, billing changes, security-impacting changes, bulk operations (5+ items)

Define the boundary explicitly in a config file. The agent reads it on every session. No ambiguity.

2. Verify Before You Act

The single most important rule. Agents love to hallucinate solutions.

Never guess. Read the code. Check the data.
Bug fixes require: symptom evidence, root cause (file + line), a fix that addresses the cause, and a regression test.

This alone eliminated 80% of the "fixes" that broke other things.

3. Skill Modules for Domain Knowledge

Split specialized behaviors into skill files. Each skill has:

A clear scope (when it applies, when it doesn't)
Concrete do/don't rules
An escalation path

The agent loads relevant skills on demand. Like giving a new employee a procedures manual for each department.

4. Multi-Agent Safety Rules

When multiple agents work on the same repo, things break fast without rules:

Never touch git stash (you'll destroy another agent's work)
Never switch branches unless explicitly told
Only stage your own changes
Always run git pull --rebase before pushing

Every single one came from a real incident.

5. Design as a First-Class Constraint

Without explicit design constraints, AI agents produce AI-looking output. The antidote:

Ban gratuitous gradients, emoji in UI, generic icon sets
Limit palette: 2 colors max (base + accent)
Mandatory screenshot review after every UI change

The Meta-Pattern

All five patterns share one thing: they're encoded in the repo, not in prompts. The agent reads project files, not chat history. This makes behavior reproducible across sessions and across agents.

More details and real examples: hideyoshi.app/playbook

What patterns have you found? What breaks when you give agents more autonomy?

5 CLAUDE.md Patterns That Actually Work in Production

hideyoshi — Sat, 21 Mar 2026 08:12:52 +0000

Everyone's talking about AI coding agents. Most people are still writing CLAUDE.md files that look like this:

Use TypeScript. Follow best practices. Be helpful.

That's a style guide, not a system prompt. Here are 5 patterns I've tested in production that actually change how the agent behaves.

1. Constrained Autonomy

The biggest unlock wasn't giving the agent more freedom. It was defining exactly where the fence is.

## Constrained Autonomy

### Do without asking:
- Code formatting, lint fixes
- Running tests
- Commits and pushes (within scope)
- Installing dependencies (one auto-retry on failure)
- Research, analysis, reports
- Drafting marketing content

### Ask first:
- Releases, version changes
- Anything that costs money
- Security-impacting changes
- Bulk operations (5+ PRs/Issues — show count, then confirm)
- Direct production impact
- Major strategy pivots

Why this works: the agent stops asking permission for trivial stuff, but you still have a kill switch on anything expensive or irreversible. The 5+ items threshold is oddly specific — it came from an incident where an agent tried to close 47 issues at once.

2. Skill System (Modular Behavior)

One massive CLAUDE.md doesn't scale. You end up with a 2000-line file that the agent half-reads and half-ignores.

The fix: skills as separate Markdown files that load on demand.

## Skill Extension

Specialized behaviors live in `.agents/skills/*/SKILL.md`.
Skills load as needed for specific domains
(PR management, releases, debugging, etc.).

Skill structure:
- Frontmatter with `name` and `description`
- Clear scope (when this skill applies / doesn't apply)
- Specific do/don't rules
- Escalation paths

Each skill is a self-contained behavior module. A debugging skill forces root cause analysis before any fix attempt. A PR skill enforces single-topic commits. A release skill runs a full pre-flight checklist.

The agent loads 2-3 relevant skills per task instead of processing your entire configuration every time. Context stays clean, behavior stays predictable.
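To make the frontmatter requirement concrete, here is a hypothetical check that reads `name` and `description` from a SKILL.md string and rejects files missing either field. It handles only simple `key: value` lines with optional double quotes. A real project would more likely use a proper YAML parser.

```typescript
interface SkillMeta {
  name: string;
  description: string;
}

// Extracts name/description from a leading `---` frontmatter block.
// Deliberately minimal: one `key: value` per line, optional double quotes.
function parseSkillFrontmatter(markdown: string): SkillMeta | null {
  const lines = markdown.split(/\r?\n/);
  if (lines[0] !== "---") return null;
  const end = lines.indexOf("---", 1);
  if (end === -1) return null;

  const fields: Record<string, string> = {};
  for (const line of lines.slice(1, end)) {
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const key = line.slice(0, colon).trim();
    let value = line.slice(colon + 1).trim();
    if (value.length >= 2 && value[0] === '"' && value[value.length - 1] === '"') {
      value = value.slice(1, -1); // strip surrounding quotes
    }
    fields[key] = value;
  }

  const name = fields["name"];
  const description = fields["description"];
  if (!name || !description) return null; // both fields are required
  return { name, description };
}

const skill = parseSkillFrontmatter(
  ["---", "name: code-review", 'description: "Automated code review for pull requests"', "---", "", "# Code Review Skill"].join("\n"),
);
console.log(skill);
```

Returning `null` for a malformed file lets a loader skip it and report the problem instead of guessing what the skill is for.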

3. Multi-Agent Safety

Running two Claude Code instances on the same repo is incredibly productive. Running three without safety rules is how you lose a day of work.

## Multi-Agent Safety

When multiple agents work in parallel:

- Never create, apply, or drop `git stash` (can destroy another agent's work)
- Never switch branches unless explicitly told to
- Never create/modify git worktrees unless explicitly told to
- Only stage your own files when committing
- `git pull --rebase` before pushing — never discard others' work
- Unknown files in the repo? Ignore them and focus on your task

Every single one of these rules exists because of a real incident. The git stash one was particularly painful — Agent A stashed Agent B's uncommitted work, then Agent B couldn't find its changes.

The "ignore unknown files" rule prevents a cascade where Agent A sees Agent B's files, decides they look wrong, and "helpfully" reverts them.
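"Only stage your own files" and "ignore unknown files" both depend on knowing which changes this agent made. Here is a hypothetical sketch, assuming the agent keeps a list of paths it edited during the session and gets the list of changed paths from git (for example by parsing `git status --porcelain`, which is not shown). It stages the overlap and leaves everything else alone.

```typescript
// Splits changed paths into ones this agent may stage and ones it must leave alone.
function partitionChanges(
  changed: string[],
  editedByMe: string[],
): { stage: string[]; ignore: string[] } {
  const stage: string[] = [];
  const ignore: string[] = [];
  for (const path of changed) {
    if (editedByMe.indexOf(path) !== -1) {
      stage.push(path);
    } else {
      // Unfamiliar change: likely another agent's work in progress.
      // Don't stage it, revert it, or "fix" it.
      ignore.push(path);
    }
  }
  return { stage, ignore };
}

const result = partitionChanges(
  ["src/api/users.ts", "src/ui/Header.tsx", "package.json"],
  ["src/api/users.ts"],
);
// Each path in result.stage would then be staged explicitly, one `git add <path>` at a time.
console.log(result);
```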

4. Design Guardrails (Kill the AI Aesthetic)

This one surprised me. Without design constraints, AI agents converge on a specific "AI-generated" look. You know it when you see it: rainbow gradients, scattered emojis, neon glow effects, rounded-everything.

## Design Principles

Design is a first-class priority. Working isn't enough — it must look good.

### Eliminate AI aesthetic:
- No lazy gradients (rainbow, multi-color). Subtle single-color only
- No emojis in UI (or text content, in principle)
- Minimal icons. Avoid generic sets (rockets, lightbulbs, gears)
- Avoid "AI-looking" defaults: neon, heavy drop shadows,
  over-rounded cards, meaningless animations
- References: Linear, Stripe, Vercel, Notion —
  restrained colors, hierarchy through typography
- Max 2 colors (base + 1 accent)

The reference list is critical. Without it, "make it look professional" is too vague. With "look at how Linear does it," the agent has a concrete visual target.

The emoji ban alone improved output quality more than I expected. AI agents love emojis. Every heading gets a rocket, every feature gets a sparkle. Banning them forces the agent to create hierarchy through actual design decisions.
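Some of these rules are mechanical enough to check before the visual review. Here is a hypothetical lint that counts distinct six-digit hex colors in a stylesheet string and flags gradients that mix more than one color. Reading "max 2 colors" as "at most two distinct hex values" is a simplifying assumption. It ignores named colors, `rgb()`, and 3- or 8-digit hex values, and the gradient pattern does not handle nested parentheses.

```typescript
const HEX = /#[0-9a-fA-F]{6}\b/g;

// Distinct six-digit hex colors in a piece of CSS, lowercased.
function distinctHex(text: string): string[] {
  const found: string[] = text.match(HEX) || [];
  const out: string[] = [];
  for (const c of found) {
    const lower = c.toLowerCase();
    if (out.indexOf(lower) === -1) out.push(lower);
  }
  return out;
}

function designViolations(css: string, maxColors = 2): string[] {
  const problems: string[] = [];
  const colors = distinctHex(css);
  if (colors.length > maxColors) {
    problems.push(`${colors.length} distinct colors (max ${maxColors}): ${colors.join(", ")}`);
  }
  // Flag any gradient whose stops use more than one hex color.
  const gradient = /(?:linear|radial)-gradient\(([^)]*)\)/g;
  let m: RegExpExecArray | null;
  while ((m = gradient.exec(css)) !== null) {
    if (distinctHex(m[1] ?? "").length > 1) problems.push(`multi-color gradient: ${m[0]}`);
  }
  return problems;
}

const loud = ".hero { background: linear-gradient(90deg, #ff0080, #7928ca, #2af598); } .btn { background: #111111; color: #ffffff; }";
const calm = ".card { background: linear-gradient(180deg, #0a0a0a, #0a0a0a); color: #fafafa; }";
console.log(designViolations(loud));
console.log(designViolations(calm));
```

A check like this catches the obvious cases cheaply. The screenshot review remains the real judgment call.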

5. Verify-Then-Act (Evidence Before Fixes)

The default AI behavior: you report a bug, it immediately proposes a fix. The fix is often wrong because it's guessing at the root cause.

## Verify Before Acting

- Don't guess. Read the code, check the data, then decide.
- Bug fixes require evidence:
  1. Proof of symptom
  2. Root cause identification (file/line)
  3. Fix that addresses the root cause
  4. Regression test
- Read npm dependency source and local code before
  concluding where a bug lives

This pattern forces a diagnostic workflow. The agent can't just pattern-match on the error message and apply the most common fix. It has to trace the actual execution path, find the actual broken line, and prove its fix addresses that specific cause.

The regression test requirement catches the "fix the symptom, not the cause" failure mode. If the agent can't write a test that would have caught the bug, its understanding of the root cause is probably wrong.
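One way to hold an agent, or a reviewer, to this workflow is to treat the four pieces of evidence as required fields on a bug-fix record. The `BugFixReport` shape, the `missingEvidence` function, and the example paths below are hypothetical. The point is that a fix missing any field gets rejected.

```typescript
interface BugFixReport {
  symptom?: string; // 1. proof of symptom: failing output, log, or repro steps
  rootCause?: { file: string; line: number }; // 2. where the bug actually lives
  fixAddresses?: string; // 3. how the change addresses that root cause
  regressionTest?: string; // 4. a test that fails before the fix and passes after
}

// Lists the evidence still missing; an empty list means the fix may proceed.
function missingEvidence(r: BugFixReport): string[] {
  const missing: string[] = [];
  if (!r.symptom) missing.push("proof of symptom");
  if (!r.rootCause || !r.rootCause.file || r.rootCause.line < 1) missing.push("root cause (file/line)");
  if (!r.fixAddresses) missing.push("fix tied to the root cause");
  if (!r.regressionTest) missing.push("regression test");
  return missing;
}

// A pattern-matched guess versus a traced fix (hypothetical examples).
const guess: BugFixReport = { fixAddresses: "wrapped the call in try/catch" };
const traced: BugFixReport = {
  symptom: "checkout returns 500 when the cart is empty",
  rootCause: { file: "src/lib/cart.ts", line: 42 },
  fixAddresses: "guard against an empty items array before reducing",
  regressionTest: "cart.test.ts: 'empty cart totals to zero'",
};
console.log(missingEvidence(guess));
console.log(missingEvidence(traced));
```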

The Compound Effect

None of these patterns is revolutionary on its own. But together, they create something that behaves less like a chatbot and more like a disciplined engineer:

Constrained autonomy means it doesn't waste your time with trivial approvals
Skills keep behavior focused and predictable
Multi-agent safety lets you scale without chaos
Design guardrails produce output you're not embarrassed to ship
Verify-then-act catches bugs properly instead of cargo-culting fixes

The difference between a useful AI agent and an impressive demo is almost entirely in the configuration layer.

I've packaged these patterns (and about 15 more) into a complete playbook with ready-to-use templates. Free starter template at hideyoshi.app. Full playbook with 20+ files: 50% off this week with code LAUNCH50 at hideyoshi.app/playbook.

My CLAUDE.md That Runs a Business (With Actual Config Snippets)

hideyoshi — Sat, 21 Mar 2026 04:31:48 +0000

Most CLAUDE.md files I've seen look like this:

Use TypeScript. Prefer functional components. Run prettier before committing.

Mine looks like this:

You are responsible for this business's revenue, strategy, product
development, marketing, and operations. You make decisions autonomously.
You do not wait for instructions.

Same file. Very different outcomes. Let me show you what happens when you treat CLAUDE.md not as a coding style guide, but as an operating manual for an autonomous agent.

The Setup

I run a project called Hideyoshi. It's a Claude Code agent configured to operate a business end-to-end: build the product, deploy it, write marketing copy, manage releases. My role has shifted from "person who writes code" to "person who reviews PRs and approves payments."

The entire system runs on Markdown configuration files. No custom tooling, no wrapper scripts, no API integrations. Just CLAUDE.md, a handful of skill files, and Claude Code doing its thing.

Here's how the configuration actually works.

Section 1: Identity and Responsibilities

The first thing in my CLAUDE.md isn't a coding convention. It's a job description.

## Your Responsibilities

1. Revenue: you own the P&L
2. Strategy: market analysis, competitive research, planning
3. Product: design, build, ship, iterate
4. Marketing: acquisition, awareness, conversion
5. Operations: everything else the business needs

This sounds abstract, but it changes the agent's behavior in concrete ways. When I say "we need more traffic," the agent doesn't ask me what to do. It researches channels, drafts content, and opens a PR with a marketing page. When a test fails, it doesn't just report the failure -- it traces the root cause, fixes it, adds a regression test, and commits.

The job description creates a default bias toward action. Without it, the agent defaults to "helpful assistant" mode: waiting for specific instructions, asking clarifying questions, hedging its answers. With it, the agent defaults to "I own this, let me figure it out."

Section 2: The Trust Boundary

This is the part that took the most iteration. You need two explicit lists:

## Do Without Asking

- Format, lint, fix code style
- Run tests
- Commit and push (within scope)
- Install dependencies
- Research and analysis
- Draft marketing content

## Requires Human Approval

- Releases and version changes
- Any operation that spends money
- Security-impacting changes
- Bulk operations (5+ items affected)
- Direct production impact
- Major strategy pivots

The boundary between these two lists is where the entire system succeeds or fails. Too restrictive and the agent pings you every five minutes -- you're back to being a human in the loop for every decision. Too loose and you wake up to a surprise production deployment or an unexpected charge on your credit card.

The sweet spot I landed on: the agent can complete an entire development cycle (code, test, commit, push) without interruption. Deployment and anything involving money are the human checkpoints.

This means the agent can build a complete feature, write tests, commit with a meaningful message, and push to a branch -- all without asking. But it can't merge to main, deploy, or sign up for a new SaaS tool.

Section 3: Design Guardrails (The Hard Lesson)

I learned this one the painful way. Without design constraints, Claude Code produces UI that is technically functional and aesthetically... recognizable. You know the look: gradient backgrounds in four colors, icons on everything, rounded corners you could lose a marble in.

My config now includes this:

## Design Principles

- Maximum 2 colors: base + one accent
- No emojis in the UI
- No multi-color gradients
- Icons: minimal, only when they add meaning
- Reference aesthetic: Linear, Stripe, Vercel
- Typography creates hierarchy, not color
- Verify every UI change with a screenshot

That last line matters. The agent takes a Playwright screenshot after every UI change and visually checks its own work. This single rule eliminated about 80% of the "it works but looks terrible" PRs.

The reference list (Linear, Stripe, Vercel) is surprisingly effective. Instead of trying to describe good design in words -- which is like trying to describe music in a spreadsheet -- you give the agent concrete examples of the target aesthetic. It gets the point.

Section 4: Multi-Agent Safety

If you run multiple Claude Code instances on the same repo (and you should -- it's wildly productive), you need coordination rules. Every rule below exists because of a real incident:

## Multi-Agent Rules

- Never create, apply, or drop a git stash
- Never switch branches unless explicitly told to
- Only stage your own files when committing
- Always git pull --rebase before pushing
- Ignore unfamiliar files -- another agent is working on them

The git stash rule is the one that bit me hardest. Agent A stashes its work in progress. Agent B then runs git stash, which pushes its own entry on top, so Agent A's stash is now buried. Agent A pops what it thinks is its stash and gets Agent B's half-finished work instead. Chaos.

The fix is simple: don't stash. Use branches. But the agent won't know that unless you tell it.
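The mix-up follows directly from the stash being a single last-in, first-out stack shared by every agent working in the same checkout. This tiny model of that stack reproduces it. It is not git itself, only its ordering, and the class name is illustrative.

```typescript
// A minimal model of `git stash` ordering: push adds to the top, pop takes from the top.
class StashStack {
  private entries: string[] = [];

  push(work: string): void {
    this.entries.push(work);
  }

  pop(): string | undefined {
    return this.entries.pop();
  }
}

const stash = new StashStack(); // one stack for the whole checkout
stash.push("Agent A: half-done auth refactor");
stash.push("Agent B: half-done pricing page");

// Agent A expects its own work back...
const whatAgentAGetsBack = stash.pop();
console.log(whatAgentAGetsBack); // ...but receives Agent B's entry
```

Branches avoid the problem because each agent's work gets its own named ref instead of a position in a shared stack.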

Section 5: Skills as Composable Behaviors

One monolithic CLAUDE.md doesn't scale. My project splits specialized behaviors into skill files that live in .agents/skills/:

.agents/skills/
  copywriting/SKILL.md
  seo-audit/SKILL.md
  launch-strategy/SKILL.md
  pricing-strategy/SKILL.md
  content-strategy/SKILL.md
  ...

Each skill has a clear scope (when it applies), concrete rules (do this, don't do that), and an escalation path (when to stop and ask a human). The agent loads the relevant skill based on the task at hand.

This is what makes the system extensible. Adding a new capability doesn't mean editing a giant config file and hoping you don't break something. You add a new skill file. The agent discovers it and uses it when the context matches.

What This Actually Looks Like Day-to-Day

Yesterday, I told the agent we were launching a product. It:

Built the landing page (Next.js, Tailwind, responsive)
Set up structured data for SEO
Created a free lead magnet (a starter CLAUDE.md template)
Drafted marketing posts for four different channels
Added analytics tracking
Committed each piece as a separate, scoped PR

I reviewed the PRs, approved them, and the product was live. Total hands-on time for me: maybe 30 minutes of review across the day.

Is this replacing senior engineering judgment? No. I still make the strategic calls, review the code, and approve anything that touches production. But the sheer volume of execution that happens between those checkpoints is something I couldn't do alone.

Try the Config Pattern Yourself

If you want to experiment with this approach, I put together a free CLAUDE.md starter template you can drop into any project. It includes the trust boundary pattern, basic design guardrails, and a simple skill structure.

Grab it at hideyoshi.app -- no signup required, just a direct download.

And if you want the full production setup -- complete templates, 30+ skill files, multi-agent safety configs, and real-world agent configurations -- that's in The Autonomous AI Agent Playbook. $19, one-time, all Markdown.

Questions about any of these patterns? I'm in the comments.

How I Configure AI Coding Agents for Autonomous Operation (With Real Examples)

hideyoshi — Fri, 20 Mar 2026 13:59:33 +0000

I've been running an experiment for the past few months: giving an AI coding agent enough configuration and trust to operate autonomously on real business tasks.

Not "generate a React component." More like "you are responsible for this product — build it, ship it, market it."

The agent is called Hideyoshi. It runs on Claude Code. Here's what I've learned about the configuration layer that makes autonomous operation possible.

The Configuration Layer Is Everything

When most developers use AI coding tools, the interaction is transactional: you ask for code, it writes code, you review and edit.

Autonomous operation requires a different approach. The agent needs:

Clear responsibilities — What is it accountable for?
Trust boundaries — What can it do alone? What needs approval?
Quality standards — How should the output look and behave?
Safety rails — How do you prevent it from causing damage?

All of this lives in configuration files that the agent reads on startup.

Pattern 1: Constrained Autonomy

The most important pattern. You define two explicit lists:

Can do without asking:

Run tests, lint, format code
Commit and push within scope
Install dependencies
Draft content
Run investigations and analysis

Needs human approval:

Production deployments
Purchases or billing changes
Security-impacting changes
Bulk operations (5+ items affected)
Major strategy changes

This sounds simple, but the boundaries require real thought. Too restrictive and the agent asks permission for everything (defeating the purpose). Too loose and you're debugging production incidents at 3am.

The sweet spot: the agent should be able to complete a full development cycle (code → test → commit → push) without interruption. Deployment and release are the human checkpoints.

Pattern 2: Modular Skill System

One massive configuration file doesn't scale. Instead, break agent behavior into composable "skills" — each a separate Markdown file that activates in specific contexts.

Example skills:

Debugging Skill

# When: Bug report or test failure

1. Reproduce the issue (show evidence)
2. Identify root cause (file and line number)
3. Explain WHY it happens
4. Fix the cause, not the symptom
5. Add regression test
6. Never guess — read the code first

PR Review Skill

# When: Creating or reviewing pull requests

1. One PR = one topic (no bundled changes)
2. Commit messages are action-oriented
3. Only stage your own files
4. Run tests before push
5. Include before/after evidence for UI changes

Release Skill

# When: Version bump or release

1. All tests pass
2. CHANGELOG updated
3. Version bumped in package.json
4. Build succeeds
5. Human approval obtained
6. Tag and push

The key insight: skills are composable. The agent loads whichever skills are relevant to its current task, keeping context focused and behavior predictable.
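In the Release Skill above, order matters: tagging and pushing only happens once every earlier step holds, including human approval. Here is a hypothetical sketch that walks the checklist in order and reports the first unmet step. The `ReleaseState` shape and `nextBlockingStep` are illustrative names, and how each flag gets set is outside the sketch.

```typescript
interface ReleaseState {
  testsPass: boolean;
  changelogUpdated: boolean;
  versionBumped: boolean;
  buildSucceeds: boolean;
  humanApproved: boolean;
}

// Steps 1-5 of the Release Skill, in order; step 6 (tag and push) runs only if all hold.
const RELEASE_STEPS: Array<[keyof ReleaseState, string]> = [
  ["testsPass", "All tests pass"],
  ["changelogUpdated", "CHANGELOG updated"],
  ["versionBumped", "Version bumped in package.json"],
  ["buildSucceeds", "Build succeeds"],
  ["humanApproved", "Human approval obtained"],
];

// Returns the first unmet step, or null when it is safe to tag and push.
function nextBlockingStep(state: ReleaseState): string | null {
  for (const [key, label] of RELEASE_STEPS) {
    if (!state[key]) return label;
  }
  return null;
}

const awaitingApproval: ReleaseState = {
  testsPass: true,
  changelogUpdated: true,
  versionBumped: true,
  buildSucceeds: true,
  humanApproved: false,
};
console.log(nextBlockingStep(awaitingApproval));
```

Everything up to approval can run unattended. The agent then stops at the human checkpoint, which is exactly where the trust boundary puts releases.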

Pattern 3: Design Guardrails

This was the hardest lesson. Without explicit design constraints, AI-generated UI converges on a recognizable aesthetic:

Rainbow or multi-color gradients
Emojis scattered throughout the interface
Excessive drop shadows and animations
Generic icon sets (rockets, lightbulbs, gears)
Overly rounded corners on everything

None of this is inherently bad, but it's instantly recognizable as "AI-generated." If you want production-quality output, you need constraints:

# Design Principles

- Maximum 2 colors (base + 1 accent)
- No emojis in UI
- No multi-color gradients
- Icons: minimal, purposeful
- Reference: Linear, Stripe, Vercel, Notion
- Typography creates hierarchy, not color
- Verify every UI change with a screenshot

After adding these constraints, the quality of generated interfaces improved dramatically. The agent stopped making "creative" choices and started making disciplined ones.

Pattern 4: Multi-Agent Safety

Running multiple AI agents on the same repository is powerful — one can work on frontend while another handles backend. But it introduces real coordination problems.

Rules I learned the hard way:

Never git stash. Agent A stashes its work, Agent B stashes on top of it, and when Agent A pops, it gets Agent B's changes instead of its own. Use branches instead.
Never switch branches unless explicitly told to. Agents should stay on their assigned branch.
Only stage your own files when committing. git add . in a multi-agent setup is dangerous because it stages every change in the working tree, including files other agents are still editing.
Always git pull --rebase before pushing. Never overwrite another agent's commits.
Ignore unfamiliar files. If an agent sees files it didn't create, it should leave them alone.

Every one of these rules exists because of a real incident where agents destroyed each other's work.

The Result

With these patterns in place, Hideyoshi operates as something closer to a junior team member than a code generator:

It builds complete features end-to-end
It follows consistent quality standards
It coordinates with other agents safely
It escalates decisions that require human judgment
It documents its work through meaningful commits

Is it perfect? No. Does it replace senior engineering judgment? No. But it handles a remarkable amount of work that would otherwise require constant human direction.

Try It Yourself

I've packaged all of these patterns into The Autonomous AI Agent Playbook — a set of Markdown configuration files you can copy into your project and customize:

Complete CLAUDE.md and .cursorrules templates
5+ ready-made skill files
Multi-agent safety configuration
Security and governance checklists
3 real-world agent configurations (business agent, CI/CD agent, support agent)

Everything is in Markdown. Works with Claude Code, Cursor, Windsurf, or any tool that reads configuration files.

$19, one-time purchase. No subscription.

Get the Playbook

If you have questions about any of these patterns, drop them in the comments. Happy to share more specific examples.