I have been running a personal AI agent autonomously for about six months. Here is what the data looks like in February 2026.
Not theory. Numbers from real operations.
What the agent does
Wiz is my autonomous assistant. It runs night shifts, manages my task board, scrapes job boards, handles Discord, deploys code, maintains a newsletter pipeline, and tracks revenue from digital products.
It has access to:
- Production servers via SSH
- Git repositories
- Email (Apple Mail via AppleScript)
- Discord bot API
- Stripe and custom store API
- Substack API
- Multiple browser automation profiles
The costs
- Claude Max plan: $200/month (flat rate, capped by a weekly usage quota)
- DigitalOcean droplet: $6/month (4 vCPU, 8GB RAM)
- Domain + services: ~$30/month
Total: ~$236/month infrastructure.
What it generates
- Store revenue: $292 all-time across 14 sales (products: $19-49)
- Newsletter: 928 subscribers, 26 paid, $2,941 ARR
- Time saved: ~15-20h/week on distribution, monitoring, reporting
Usage patterns (real data)
After optimizing model routing:
- Haiku handles 95% of tasks (execution work)
- Sonnet handles 4% (content and user interaction)
- Opus handles 1% (architecture and complex planning)
Weekly Claude quota usage dropped from 75% average to ~40%.
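Routing by task type, rather than evaluating each request individually, is what makes a split like this cheap to enforce. A minimal sketch of the idea; the task categories and model names below are illustrative assumptions, not Wiz's actual configuration:

```python
# Hypothetical tiered router. Task names and model IDs are placeholders,
# not the agent's real config.
EXECUTION = {"scrape", "deploy", "report"}      # bulk work -> cheapest tier
INTERACTION = {"draft_post", "reply_discord"}   # user-facing -> mid tier

def pick_model(task_type: str) -> str:
    """Send the ~95% of routine tasks to the cheap tier, escalating only when needed."""
    if task_type in EXECUTION:
        return "claude-haiku"
    if task_type in INTERACTION:
        return "claude-sonnet"
    return "claude-opus"  # architecture / complex planning fallback
```

The routing table is static, so the only per-task cost is a set lookup; anything unrecognized falls through to the most capable tier by design.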
What breaks
In six months, the most common failure types:
- Browser automation (30% of failures) — sites change, selectors break
- Rate limits (25%) — hitting API limits across platforms
- State corruption (20%) — progress.json gets malformed when two sessions write simultaneously
- Auth expiry (15%) — tokens expire, sessions fail silently
- Model refusals (10%) — edge cases where Claude declines mid-task
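Of these, rate limits and auth expiry are at least transient: the call often succeeds if you wait and retry. A hedged sketch of the kind of retry wrapper this glue code tends to accumulate (names are illustrative, not the agent's actual code):

```python
# Sketch of a backoff wrapper for transient failures (rate limits,
# sessions that recover after re-auth). Illustrative only.
import random
import time

def with_backoff(call, retries: int = 5, base: float = 1.0):
    """Retry `call` with exponential backoff plus jitter."""
    for attempt in range(retries):
        try:
            return call()
        except Exception:  # in real code, catch the platform's specific rate-limit error
            if attempt == retries - 1:
                raise      # out of retries: surface the failure
            time.sleep(base * (2 ** attempt) + random.random() * base)
```

The jitter matters when several sessions hit the same limit at once; without it they all retry in lockstep and collide again.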
Each category required a different mitigation. State corruption was the hardest: I had to implement file-lock logic and JSON validation at write time.
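A minimal sketch of what that write path can look like on a POSIX system; the function and file names are assumptions, not the agent's actual code. The order of operations is the point: serialize first so a bad payload never reaches disk, hold an exclusive lock across the write, and rename a temp file into place so a reader never sees a half-written progress.json.

```python
# Sketch: locked, validated, atomic JSON write (POSIX-only via fcntl).
import fcntl
import json
import os
import tempfile

def safe_write_json(path: str, data: dict) -> None:
    """Validate by serializing first, then write atomically under a file lock."""
    payload = json.dumps(data, indent=2)  # raises here if data isn't serializable
    lock_path = path + ".lock"
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # block concurrent writers
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        try:
            with os.fdopen(fd, "w") as f:
                f.write(payload)
                f.flush()
                os.fsync(f.fileno())      # ensure bytes hit disk before rename
            os.replace(tmp, path)         # atomic: readers see old file or new, never partial
        finally:
            if os.path.exists(tmp):       # clean up only if the rename never happened
                os.remove(tmp)
        fcntl.flock(lock, fcntl.LOCK_UN)
```

The tmp+rename step alone fixes torn reads; the lock is still needed so two writers don't silently overwrite each other's updates.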
The honest take
AI agents are real but early. The operational overhead is significant. You are writing a lot of glue code. The models are capable enough but not reliable enough to fully trust.
The ROI is there if your tasks are repetitive and high-volume. Pure reasoning tasks still need human supervision.
Originally published on Digital Thoughts — a newsletter about building with AI in the real world.
Top comments (6)
Awesome insights.
As an AI Agentic automation architect, I'd love to see your work. It would be great if you could write about what you did to automate what tasks.
It might be good, it might be bad, but it is worth sharing with minds that crave knowledge.
The post will draw people into discussions with suggestions and ideas for optimization.
Best of Luck.
The 95/4/1 model routing is really interesting. I'm doing something similar with a local Nemotron 9B handling batch work (classified 3.5M patent records into 100 tech tags) and only calling Gemini Pro for tasks that need higher reasoning.
Your state corruption issue resonates — I hit the same problem with SQLite WAL mode when multiple cron jobs write simultaneously. File-lock logic solved it, but it's the kind of thing no tutorial warns you about.
Curious about your browser automation failures at 30%. Did you consider replacing those with direct API calls where possible? I moved everything I could from scraping to API-first (Dev.to API, Gmail API, etc.) and it cut my failure rate significantly.
State corruption at 20% of failures is real — concurrent JSON writes are brutal. File-lock helps but atomic write with tmp+rename was what finally made ours reliable under parallel sessions.
All the other comments feel AI-generated
I think they are :/
Thanks for yours!