I have been running a personal AI agent autonomously for about six months. Here is what the data looks like in February 2026.
Not theory. Numbers from real operations.
## What the agent does
Wiz is my autonomous assistant. It runs overnight shifts, manages my task board, scrapes job boards, handles Discord, deploys code, runs a newsletter pipeline, and tracks revenue from digital products.
It has access to:
- Production servers via SSH
- Git repositories
- Email (Apple Mail via AppleScript)
- Discord bot API
- Stripe and custom store API
- Substack API
- Multiple browser automation profiles
## The costs
- Claude Max plan: $200/month flat rate (usage capped by a weekly quota rather than metered per token)
- DigitalOcean droplet: $6/month (4 vCPU, 8GB RAM)
- Domain + services: ~$30/month
Total: ~$236/month in infrastructure.
## What it generates
- Store revenue: $292 all-time across 14 sales (products: $19-49)
- Newsletter: 928 subscribers, 26 paid, $2,941 ARR
- Time saved: ~15-20h/week on distribution, monitoring, reporting
## Usage patterns (real data)
After optimizing model routing:
- Haiku handles 95% of tasks (execution work)
- Sonnet handles 4% (content and user interaction)
- Opus handles 1% (architecture and complex planning)
Weekly Claude quota usage dropped from 75% average to ~40%.
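The 95/4/1 split can be implemented as a simple tier router that defaults to the cheapest model. This is an illustrative sketch, not the actual implementation; the model identifiers and task categories are assumptions:

```python
# Hypothetical tiered model routing. Categories and model names are
# illustrative assumptions based on the usage split described above.
MODEL_TIERS = {
    "execution": "claude-haiku",   # ~95%: scraping, deploys, monitoring, reports
    "content":   "claude-sonnet",  # ~4%: writing and user interaction
    "planning":  "claude-opus",    # ~1%: architecture and complex planning
}

def route_model(task_category: str) -> str:
    """Pick the cheapest tier that can handle the task; default to execution."""
    return MODEL_TIERS.get(task_category, MODEL_TIERS["execution"])
```

Defaulting unknown categories to the cheapest tier is what keeps quota usage down: escalation to a larger model has to be an explicit choice, never the fallback.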
## What breaks
Across six months, the most common failure types were:
- Browser automation (30% of failures) — sites change, selectors break
- Rate limits (25%) — hitting API limits across platforms
- State corruption (20%) — progress.json gets malformed when two sessions write simultaneously
- Auth expiry (15%) — tokens expire, sessions fail silently
- Model refusals (10%) — edge cases where Claude declines mid-task
Each category required a different mitigation. State corruption was the hardest: I had to implement file locking and JSON validation at write time.
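A minimal sketch of that write-time mitigation, assuming a POSIX system (`fcntl` is not available on Windows). The function name and lock-file convention are hypothetical, not lifted from the actual agent:

```python
import fcntl
import json
import os
import tempfile

def safe_write_json(path: str, data: dict) -> None:
    """Validate, lock, and atomically replace a shared JSON state file."""
    payload = json.dumps(data, indent=2)  # raises early if data isn't serializable
    lock_path = path + ".lock"
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # block a second session from writing
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        try:
            with os.fdopen(fd, "w") as f:
                f.write(payload)
            os.replace(tmp, path)  # atomic rename: readers never see partial JSON
        finally:
            if os.path.exists(tmp):
                os.remove(tmp)
        fcntl.flock(lock, fcntl.LOCK_UN)
```

The atomic rename matters as much as the lock: even if a writer dies mid-write, `progress.json` only ever contains a complete, previously validated document.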
## The honest take
AI agents are real but early. The operational overhead is significant. You are writing a lot of glue code. The models are capable enough but not reliable enough to fully trust.
The ROI is there if your tasks are repetitive and high-volume. Pure reasoning tasks still need human supervision.
Originally published on Digital Thoughts — a newsletter about building with AI in the real world.
## Top comments (2)
Solid data. The model routing breakdown (95/4/1) mirrors what I've seen from the other direction — I run 80% on a local 8B model and only hit cloud APIs for synthesis. Similar instinct, different economics. The state corruption failure category is real — file-lock logic saved me too. Good to see someone else sharing production numbers instead of theory.
6 months of real production data is gold — most agent posts are based on demos. One pattern I've seen consistently in agents that actually hold up: the ones with explicitly structured system prompts (role, constraints, what-not-to-do, output format as separate blocks) degrade more gracefully and are easier to debug when something goes wrong. The "wall-of-text system prompt" agents fail in mysterious ways.
flompt.dev / github.com/Nyrok/flompt