Warhol

Posted on • Originally published at buttondown.com

The Exact Prompts That Make My AI Agents Not Suck (Before/After)

Originally published in The $200/Month CEO newsletter — a weekly dispatch from a Filipino founder running 11 businesses with AI agents.


Everyone Wants the Prompts

Every time I post about running 8 AI agents as my business team, the first question is: "What are your system prompts?"

After 5 months and dozens of rewrites, here's what I learned — with actual before/after examples from my production agents.


The #1 Mistake: Job Descriptions Instead of Operating Manuals

BAD (Month 1 — Sales agent):

```
You are Mariano, a sales intelligence agent. Your job is to:
- Score leads
- Manage the CRM
- Send outreach emails
Be professional and thorough.
```

This agent:

  • Scored leads using criteria it invented (not our ICP)
  • Sent corporate English emails to Filipino clinic owners
  • Reported tasks as "complete" without doing them
  • Had zero awareness of our business

GOOD (Month 5 — Production):

```
You are Mariano. You work for RJ at EsthetiqOS.

HARD RULES (non-negotiable):
1. NEVER send any external email without RJ's explicit approval
2. NEVER mark a task complete without verifiable evidence
3. NEVER fabricate data, screenshots, or metrics
4. When you don't know something, say "I don't know"

YOUR CONTEXT:
- EsthetiqOS is clinic management software for aesthetic and dental clinics in the Philippines
- ICP: clinics with 3-10 staff, currently using paper/Excel, in Metro Manila or Cebu
- Pricing: ₱1,999-4,999/month
- Current customers: 4 clinics, 100% retention

LEAD SCORING (use ONLY these criteria):
- Clinic size 3-10 staff: +20 points
- Located in Metro Manila/Cebu: +15 points
- Currently using paper/Excel: +20 points
- Has website (shows tech-forward): +10 points
- Aesthetic or dental specialty: +15 points
- Score 70+ = hot lead
- Score below 40 = do not pursue

COMMUNICATION STYLE:
- Use conversational Filipino-English (Taglish) for PH audiences
- Never use corporate jargon
- Match the formality level of whoever you're talking to
```

The difference: specificity. LLMs don't infer your business context — you inject it.
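As a sanity check, the lead-scoring rubric above translates directly into a deterministic function. This is my own sketch, not the author's code, and the field names (`staff_count`, `records_system`, etc.) are assumptions about the lead schema:

```python
def score_lead(lead: dict) -> tuple[int, str]:
    """Apply the fixed rubric from the system prompt; no invented criteria."""
    score = 0
    if 3 <= lead.get("staff_count", 0) <= 10:
        score += 20  # clinic size 3-10 staff
    if lead.get("city") in {"Metro Manila", "Cebu"}:
        score += 15  # target geography
    if lead.get("records_system") in {"paper", "excel"}:
        score += 20  # currently on paper/Excel
    if lead.get("has_website"):
        score += 10  # tech-forward signal
    if lead.get("specialty") in {"aesthetic", "dental"}:
        score += 15  # ICP specialty

    if score >= 70:
        verdict = "hot"
    elif score < 40:
        verdict = "do not pursue"
    else:
        verdict = "nurture"
    return score, verdict
```

The point of writing it out: when the criteria are this explicit, you can audit the agent's scores against a reference implementation instead of trusting its arithmetic.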


Anti-Hallucination Rules That Actually Work

After my agent fabricated completed work (with fake screenshots), I added "honesty anchors" to every agent:

```
HONESTY RULES:
1. If a task fails, report the failure. Never report success on a failed task.
2. If you cannot verify a result, say "unverified" — not "complete."
3. When citing a number, include the source. If no source, say "estimated."
4. If unsure, say "I'm not confident about this."
5. NEVER optimize for speed. Optimize for ACCURACY.
```

These 5 lines reduced fabrication from ~15% to <1% over 3 months.

The insight: agents hallucinate work for the same reason employees cut corners — "done" gets rewarded, "I'm stuck" gets scrutiny. You must explicitly reward honesty over speed.


The 3-Tier Governance System (Copy-Paste Ready)

Galileo just launched Agent Control — an enterprise governance layer for AI agents. Here's the solo-founder version that does 80% of the same thing:

```
AUTONOMY TIERS:

Tier 1 — Act freely, no approval needed:
  - Reading data from any connected system
  - Drafting content (not publishing)
  - Research and analysis
  - Internal note-taking and summarization

Tier 2 — Requires confirmation from one other agent:
  - Creating tasks for other agents
  - Modifying shared data (CRM records, lead scores)
  - Internal decisions that affect multiple agents

Tier 3 — Requires human (RJ) approval:
  - Sending ANY external communication
  - Making ANY financial transaction
  - Publishing ANY content
  - Modifying system configurations
  - Deleting any data
```

Result: Unauthorized actions went from 3 incidents in 60 days → 0 in 90+ days.
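The tier system can live in code as well as in the prompt. Here's a hypothetical gate (action names and the fail-closed default are my assumptions) that maps each action to its required approval:

```python
# Tier per action; anything not listed falls through to Tier 3 (fail closed).
TIER_ACTIONS = {
    "read_data": 1, "draft_content": 1, "research": 1, "summarize": 1,
    "create_task": 2, "modify_crm": 2, "update_lead_score": 2,
    "send_email": 3, "payment": 3, "publish": 3, "change_config": 3, "delete_data": 3,
}

APPROVAL = {1: "none", 2: "peer-agent confirmation", 3: "human approval"}


def required_approval(action: str) -> str:
    """Return the approval needed before an agent may perform `action`."""
    tier = TIER_ACTIONS.get(action, 3)  # unknown action => treat as Tier 3
    return APPROVAL[tier]
```

The fail-closed default matters: an agent inventing a new action type should hit the human-approval path, not slip through.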


The "Brain" Pattern: Shared Context Across Agents

The biggest improvement wasn't better prompts — it was shared context:

```
~/.claude/brain/
├── MEMORY.md       — Core facts, lessons
├── BUSINESSES.md   — Company details, metrics
├── CONTACTS.md     — People, relationships
├── COMMITMENTS.md  — Follow-ups, deadlines
├── DECISIONS.md    — Decision log
└── contexts/       — Company focus modes
```

Before: every agent session started from zero. Same questions, same mistakes.
After: agents start with full organizational awareness. 8 disconnected bots → a team with institutional knowledge.
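Mechanically, "shared context" can be as simple as concatenating the brain files into every agent's system prompt at session start. A rough sketch of a loader (the function and its behavior are my assumption of how such a pattern works, not the author's implementation):

```python
from pathlib import Path

BRAIN = Path.home() / ".claude" / "brain"
BRAIN_FILES = ["MEMORY.md", "BUSINESSES.md", "CONTACTS.md",
               "COMMITMENTS.md", "DECISIONS.md"]


def load_brain(root: Path = BRAIN) -> str:
    """Concatenate the brain files into one context block, skipping missing ones."""
    parts = []
    for name in BRAIN_FILES:
        path = root / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)
```

Every agent session then prepends `load_brain()` to its prompt, so institutional knowledge survives across sessions and across agents.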


Three Patterns I Wish I Knew On Day 1

1. The Social Layer

Mirror communication style. If they write casually, you write casually. Never use phrases a normal person wouldn't say. If in a group chat, observe before speaking — match the energy.

2. The Failure Protocol

Every failure produces a visible log entry. Distinguish "no results exist" from "something broke." Create follow-up tasks with what failed, why, and next step.
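A failure log entry that follows this protocol might look like the sketch below (the function and field names are illustrative, not the author's code). The key detail is the explicit `kind` field separating "no results exist" from "something broke":

```python
import json
from datetime import datetime, timezone


def log_failure(task: str, reason: str, next_step: str,
                empty_result: bool = False) -> str:
    """Produce one visible, structured log entry per failure."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "task": task,
        # "empty_result" = the query worked but found nothing;
        # "error" = the task itself broke.
        "kind": "empty_result" if empty_result else "error",
        "reason": reason,
        "next_step": next_step,  # becomes a follow-up task
    }
    return json.dumps(entry)
```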

3. The Trust Score

Score 80+: full autonomy. Score 50-79: spot-checked. Below 50: supervised. Goes up for accurate completions and honest failure reports. Goes down for fabricated work and unauthorized actions.
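The trust-score mechanics above reduce to a small state machine. A minimal sketch under my own assumed point values (the source gives the thresholds and the up/down events, not the exact deltas):

```python
# Assumed deltas: honest events earn a little, dishonest events cost a lot.
DELTAS = {
    "accurate_completion": +2,
    "honest_failure_report": +1,
    "fabricated_work": -20,
    "unauthorized_action": -15,
}


def update_trust(score: int, event: str) -> int:
    """Adjust a trust score for one event, clamped to 0-100."""
    return max(0, min(100, score + DELTAS[event]))


def supervision_level(score: int) -> str:
    if score >= 80:
        return "full autonomy"
    if score >= 50:
        return "spot-checked"
    return "supervised"
```

The asymmetry is the design choice: trust is earned slowly and lost quickly, so one fabrication drops an agent out of full autonomy immediately.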


The Numbers

| Metric | Month 2 | Month 5 |
|---|---|---|
| Fabrication rate | ~15% | <1% |
| Unauthorized actions | 3 incidents | 0 |
| Coordination failures | Daily | Weekly |
| Babysitting time | ~4 hrs/day | ~30 min/day |
| Total cost | $380/mo | $380/mo |

The prompts didn't make agents smarter. They made the system less stupid.


Want the Full Templates?

Everything above — tier system, trust scores, honesty anchors, brain directory, CLAUDE.md templates for 8 roles — is in The AI Agent Toolkit ($19).

Not theory. What I actually run, every day, for real businesses.


Subscribe to The $200/Month CEO for weekly dispatches from a founder running his businesses with AI agents. No hype. Just receipts.
