Sharp Dev Eye
Your AI coding agent is winging it. Here's how to stop that.

I spent months watching AI coding agents make the same mistakes across every project I threw at them:

  • Unstructured wall-of-text prompts
  • Context windows stuffed until they overflow
  • 15+ tools exposed with vague one-line descriptions
  • Zero error handling — happy path only
  • Multi-agent orchestration for tasks a single agent handles fine
  • "It seems to work" as the entire evaluation strategy

I call this workflow slop. And every AI coding tool ships with it by default.

So I built Maestro — 21 skills and 20 commands that inject workflow discipline into any AI coding agent. One install. Works with Cursor, Claude Code, Gemini CLI, Copilot, Codex, and 5 more.


What Does "Workflow Slop" Actually Look Like?

Run /diagnose on any project. You'll get a scored audit across 5 dimensions:

```
╔══════════════════════════════════════╗
║          MAESTRO DIAGNOSTIC         ║
╠══════════════════════════════════════╣
║ Prompt Quality       ████░  4/5     ║
║ Context Efficiency   ███░░  3/5     ║
║ Tool Health          ██░░░  2/5     ║
║ Architecture         ████░  4/5     ║
║ Safety & Reliability ██░░░  2/5     ║
╠══════════════════════════════════════╣
║ Overall Score:       15/25          ║
╚══════════════════════════════════════╝
```

Every finding maps to a specific remediation command:

| Score | Meaning | Auto-prescribed action |
|-------|---------|------------------------|
| 5 | Excellent | No action needed |
| 4 | Minor gaps | /refine for polish |
| 3 | Functional but risky | /fortify or /streamline |
| 2 | Significant issues | /fortify + /guard immediately |
| 1 | Broken | /onboard-agent to rebuild |

No generic advice. No "consider adding tests." The agent tells you exactly which command to run next.


The 20 Commands

Every command is a structured skill file with explicit instructions, checklists, anti-patterns, and a recommended next step so the agent never leaves you hanging.
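For a sense of the format, a skill file looks roughly like this (the contents below are illustrative, not copied from the repo):

```markdown
---
name: fortify
description: Add error handling, retries, and fallbacks to an existing workflow
version: 1.0.0
---

## Instructions
1. Locate unguarded I/O, network calls, and transaction boundaries.
2. Wrap each with error handling plus a retry or fallback strategy.

## Anti-patterns
- NEVER swallow errors silently.
- NEVER retry non-idempotent operations blindly.

## Next step
Run /evaluate to verify the fixes hold up under realistic scenarios.
```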

Analysis — find the problems:

  • /diagnose — Full workflow health audit with scored dimensions
  • /evaluate — Test workflow quality against realistic scenarios

Fix & Improve — targeted repairs:

  • /fortify — Add error handling, retries, fallbacks
  • /streamline — Remove over-engineering and complexity
  • /calibrate — Align naming, formatting, conventions
  • /refine — Final quality pass before shipping

Enhancement — add new capabilities:

  • /amplify, /compose, /enrich, /accelerate, /chain, /guard, /iterate, /temper, /turbocharge

Utility — setup and adaptation:

  • /teach-maestro, /onboard-agent, /specialize, /adapt-workflow, /extract-pattern

Install in 30 Seconds

Option A: Skill Files (any provider)

```bash
npx skills add sharpdeveye/maestro
```

Works with Cursor, Claude Code, Gemini CLI, Codex CLI, VS Code Copilot / Antigravity, Kiro, Trae, OpenCode, and Pi.

Option B: MCP Server (any MCP client)

```json
{
  "mcpServers": {
    "maestro": {
      "command": "npx",
      "args": ["-y", "maestro-workflow-mcp"]
    }
  }
}
```

Drop that in your MCP config. Done. 20 prompts, 4 tools, 8 knowledge resources — instantly available.


Why This Isn't Just Another Prompt Collection

Most "AI skill" repos are prompt dumps. Maestro is an ecosystem:

| Feature | Prompt dumps | Maestro |
|---------|--------------|---------|
| Structure | Random .md files | YAML frontmatter + versioned skills |
| Flow | Dead ends | Every command recommends the next step |
| Anti-patterns | None | Explicit "NEVER do X" in every skill |
| Context | Hope the AI figures it out | .maestro.md project context protocol |
| Delivery | Copy-paste files | npx install + MCP server + 10 providers |
| Evaluation | None | /diagnose scores 5 dimensions 1-5 |

The ecosystem forms a loop:

```
/teach-maestro → /diagnose → /fortify → /evaluate → /refine
       ↑                                                  |
       └──────────────── continuous improvement ──────────┘
```

Real Example: What /diagnose Found in My Project

I ran /diagnose on my production app. It found:

  1. Wallet service handling real money with zero test coverage. Idempotency keys were implemented, but no tests verified they actually prevent double-credits. Score: Safety 2/5.

  2. Two services using DB transactions without try/catch. If a deadlock occurs, the exception bubbles unhandled and the user gets a raw 500 error.

  3. Frontend deploying to Cloudflare Pages without tsc --noEmit. Type errors were reaching production undetected.

Each finding came with a specific command: /fortify WalletService, /guard financial-flows, /fortify frontend-build.

That's the difference between "you should probably add tests" and "Run /guard on your wallet service because your financial operations have zero test coverage and idempotency keys are unverified."
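The second finding is exactly the kind of gap /fortify patches. Here's a minimal sketch of a deadlock-aware transaction wrapper; the names (withDeadlockRetry, the "deadlock" error check) are mine for illustration, not a Maestro API:

```typescript
// Sketch of the guard /fortify would add around a DB transaction:
// catch deadlock errors and retry with backoff instead of letting
// a raw 500 escape to the user.
type TxFn<T> = () => Promise<T>;

async function withDeadlockRetry<T>(tx: TxFn<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await tx();
    } catch (err) {
      lastError = err;
      // Placeholder detection: a real driver exposes a deadlock error code.
      const isDeadlock = err instanceof Error && err.message.includes("deadlock");
      if (!isDeadlock || attempt === maxAttempts) break;
      // Simple linear backoff before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, 50 * attempt));
    }
  }
  throw lastError;
}
```

Non-deadlock errors still propagate immediately, so you're not masking real bugs behind retries.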


The Architecture

```
source/skills/           ← 21 skill definitions (source of truth)
├── agent-workflow/      ← Core skill + 7 reference docs
│   └── reference/       ← Prompt engineering, context mgmt, etc.
├── diagnose/            ← Analysis commands
├── fortify/             ← Fix commands
├── amplify/             ← Enhancement commands
└── teach-maestro/       ← Utility commands

scripts/
├── build.js             ← Copies to 10 provider directories
├── bundle-skills.js     ← Bundles into MCP server
└── validate.js          ← Validates frontmatter + references

mcp-server/              ← npm package: maestro-workflow-mcp
├── tools.ts             ← 4 MCP tools with template resolution
├── prompts.ts           ← 20 MCP prompts
└── resources.ts         ← 8 read-only knowledge resources
```

One source. 10 providers. One MCP server. Everything validated, bundled, and versioned.
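The validation step is the piece most prompt collections skip. A minimal sketch of what frontmatter validation can look like; the required field names here are my assumption for illustration, and the real schema lives in scripts/validate.js:

```typescript
// Minimal frontmatter validator sketch: parse the YAML-style block at
// the top of a skill file and report missing required fields.
const REQUIRED = ["name", "description", "version"];

function parseFrontmatter(md: string): Record<string, string> | null {
  const match = md.match(/^---\n([\s\S]*?)\n---/);
  if (!match) return null;
  const fields: Record<string, string> = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return fields;
}

function validateSkill(md: string): string[] {
  const fm = parseFrontmatter(md);
  if (!fm) return ["missing frontmatter block"];
  return REQUIRED.filter((key) => !(key in fm)).map((key) => `missing field: ${key}`);
}
```

Run against every file in source/skills/ at build time, a check like this fails fast instead of shipping a half-described skill to 10 providers.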


What's Next

  • More references — domain-specific guides for testing, deployment, observability
  • Scoring trends — track /diagnose scores over time
  • Community skills — contribute your own commands via PR

Try It

```bash
# Install skills
npx skills add sharpdeveye/maestro

# Or add the MCP server
# → add to your mcp config: npx -y maestro-workflow-mcp

# Then run your first diagnostic
/diagnose
```

When it finds workflow slop (and it will), you'll know exactly which command to run next.

GitHub: github.com/sharpdeveye/maestro
npm: maestro-workflow-mcp
License: MIT


If this saved you from one more "it seems to work" deployment, consider dropping a ⭐ on the repo. It helps more than you think.
