DEV Community

imta71770-dot
imta71770-dot

Posted on

I Built a 5-Agent AI Company with Claude Code. It Sent 241 Emails and Got Zero Replies.

I built a team of 5 AI agents using Claude Code. I gave them a real business to run — an accessibility compliance SaaS. Then I went to sleep.

Over the next week, the agents:

  • Scanned 126 web agency websites across 12 countries
  • Sent 108 personalized cold emails with A/B testing in 3 languages
  • Sent 133 follow-up emails
  • Published 12 articles on dev.to and 5 on Medium
  • Generated daily CEO reports with strategic recommendations
  • Detected that the business was failing and recommended pivoting

Total: 241 email touches. Zero replies.

Here's everything I learned.

The Architecture

I structured the agents like a small company:

CEO Agent — reads all reports, makes strategic decisions
├── Sales Agent — finds targets, sends emails, handles replies
├── Marketing Agent — writes blog posts, cross-posts, Reddit
├── Analytics Agent — tracks KPIs, researches counter-evidence
└── Product Agent — maintains code, runs tests, deploys
Enter fullscreen mode Exit fullscreen mode

Each agent has:

  • A spec file (markdown) defining its role, responsibilities, and decision rules
  • A runner script (bash) that executes it via claude -p with --dangerously-skip-permissions
  • State files (JSON) that persist data between runs
  • A cron schedule for autonomous execution

The CEO Agent Pattern

The most interesting agent is the CEO. Its job is to:

  1. OBSERVE — Read all state files and metrics
  2. ANALYZE — What's working? What's not?
  3. DECIDE — What should each agent do next?
  4. EXECUTE — Run the other agents via shell commands
  5. VERIFY — Check results for errors
  6. REPORT — Generate a daily brief

Here's a snippet from the CEO agent's daily report on Day 8:

## A11y Fix Daily Brief — Day 8

### EXECUTIVE SUMMARY
241 email touches. Zero replies. Pivot Day 3.

### Scorecard
| Metric          | Value    | Target  | Status    |
|-----------------|----------|---------|-----------|
| Emails sent     | 108      | 100     | MET       |
| Reply rate      | 0.0%     | >3%     | TERMINAL  |
| Follow-ups sent | 133      | —       | COMPLETE  |
| GitHub stars    | 0        | growing | CRITICAL  |
Enter fullscreen mode Exit fullscreen mode

The CEO agent correctly identified that the approach was failing at Day 3 (50 emails, 0 replies) and recommended pivoting. By Day 8, it had:

  • Paused cold email outreach
  • Issued directives to the Marketing agent to change content strategy
  • Directed the Analytics agent to research alternative revenue channels
  • Started exploring Fiverr and Upwork as pivot options

Inter-Agent Communication

Agents communicate through a directive system — essentially a file-based message queue:

# CEO issues a directive to the Sales agent
bash agents/directive.sh sales high "Pause outreach to DE agencies, reply rate is zero" "2026-03-30"
Enter fullscreen mode Exit fullscreen mode

When the Sales agent runs next (via cron), it reads pending directives before executing its regular tasks. This allows the CEO to dynamically reprioritize work across the team.

What Worked

1. The CEO pattern is genuinely powerful. Having one agent with visibility across all others catches problems that individual agents miss. It found URL inconsistencies, detected failing metrics, and proactively issued directives.

2. Counter-evidence research. The Analytics agent was tasked with actively searching for reasons the business might fail. It found that the accessibility overlay market had a trust crisis, that the ADA deadline might be pulled by the Trump administration, and that free tools were "too good" to compete with. This was more valuable than any positive signal.

3. A/B testing was effortless. The Sales agent randomly assigned email variants (3 templates × 3 languages) and tracked performance by variant. Zero human effort after setup.

4. State management with JSON files. Simple, debuggable, version-controllable. No database needed.

What Failed

1. The business hypothesis was wrong. 241 emails, 0 replies. The agents executed perfectly — the market didn't want what we were selling.

2. Cold email from a new domain is hard. 87.1% delivery rate (should be >95%), 19 bounces. New domains have no reputation.

3. No observability on the product. We couldn't tell if anyone used the free audit tool because we didn't set up analytics before launch. Flying blind.

4. pSEO pages weren't indexed. We created 108 programmatic SEO pages. Google indexed zero of them. Content without indexing is invisible.

The Key Insight

The agents worked. The business hypothesis didn't.

This is the most important distinction. Don't blame the tool when the strategy is wrong. The agents did exactly what they were told:

  • Sales sent emails on time, in the right language, with personalization
  • Marketing published articles and cross-posted
  • Analytics found counter-evidence that we should have found earlier
  • CEO detected the failure and recommended a pivot

The system worked. The input was wrong.

What I'd Do Differently

  1. Validate before building — Talk to 5 potential customers before writing a line of code
  2. Set up analytics on Day 1 — Can't improve what you can't measure
  3. Start with warm outreach — Cold email from a new domain is a losing game
  4. Use the CEO agent from the start — The counter-evidence research should happen in Week 0, not Week 2
  5. Smaller first — One agent doing one thing well, then scale

Try It Yourself

I packaged the entire system — all agent specs, runner scripts, templates, and the full case study with real data — into a playbook:

The AI Agent Playbook: Build Autonomous Business Operations with Claude Code — $39

It includes copy-paste code for every agent role, the directive system, cron schedules, and 12,000 words of practical, honest, failure-including content.

No "10x your productivity" promises. Just real code from a real project, including what didn't work.


What's your experience with AI agents for business automation? Have you tried multi-agent setups? I'd love to hear what worked (or didn't) for others.

Top comments (0)