I've been running an AI agent autonomously for about 30 days. Not a demo. Not a weekend project. A production agent that operates every night while I sleep, manages a content pipeline, runs a YouTube channel, monitors email, and generates its own morning reports.
Here's what it actually costs.
The Stack
- Claude Sonnet 4.6 via Anthropic API — primary agent model
- Claude Opus 4.7 — strategic advisor, used sparingly
- Mistral Voxtral — text-to-speech for sleep audio
- MiniMax M2 — visual generation (anonymous prompts only, Chinese data residency)
- ffmpeg — video encoding, runs locally on M-series Mac
- Remotion — React-based video composition, also local
- YouTube Data API v3 — free, OAuth, 10,000 units/day quota
- dev.to API — free, rate-limited to ~5 posts/session
Actual API Costs (30-Day Estimate)
Anthropic
The biggest variable. Depends heavily on context usage and how much extended thinking you invoke.
Typical overnight session: 50,000–120,000 input tokens, 5,000–15,000 output tokens
With prompt caching (which you absolutely need to implement — Claude's cache TTL dropped from 1hr to 5min silently in March 2026, so you need explicit cache control headers):
- Cached input: $0.30/MTok
- Non-cached input: $3.00/MTok
- Output: $15.00/MTok
Estimated monthly Anthropic cost: $45–90/month depending on session frequency and task complexity.
Opus 4.7 is expensive ($15/MTok output). We use it for strategic decisions and keep sessions under 5,000 output tokens. ~$8–15/month for Opus usage.
Mistral Voxtral (TTS)
Voxtral is the best cost-performance TTS for agents right now. Pricing is per character.
A 60-minute sleep story script is roughly 8,000–12,000 characters. At current rates, each story costs $0.08–$0.12 to voice.
Monthly TTS cost: ~$5–8/month for 3-4 stories/week.
Everything Else
- Remotion renders: free (CPU on local machine, ~5-50 minutes per 10hr video)
- ffmpeg: free
- YouTube API: free (well within quota)
- dev.to API: free
- MiniMax: token-based, minimal use — ~$2–4/month
Total estimated monthly AI API cost: $60–120/month
The Local Compute Cost
This is the part people miss. A 10-hour sleep video encoded via ffmpeg at 1080p H.264 takes about 6-10 seconds on an M-series Mac (hardware encoding is absurdly fast). Remotion renders take 5-50 minutes depending on composition complexity.
Electricity cost is negligible. Heat output is real but manageable.
The Mac Mini M4 (or M-series MacBook) is genuinely good enough for this entire pipeline. You don't need GPU instances.
The Prompt Caching Lesson
This is worth its own section.
In March 2026, Anthropic silently changed the default cache TTL from 1 hour to 5 minutes. If you had overnight sessions relying on long-context caching, your costs may have increased 3-5x without you realizing it.
The fix: add explicit cache_control: {"type": "ephemeral"} blocks to your system prompt and long static context. This activates the 5-minute TTL explicitly rather than relying on defaults.
For sessions under 5 minutes: warm cache, full savings.
For sessions over 5 minutes: you pay cache-miss rates on re-reads.
The architecture implication: design your sessions to complete critical context-heavy work within the first 5 minutes, then shift to output-heavy work that doesn't require re-reading the full context.
What's Not In These Numbers
Your time. I spend maybe 30 minutes/day on this — checking the morning report, reviewing outputs, unblocking one-off issues. That time has real cost.
Iteration cost. The first 2 weeks were expensive in API terms because I was building and debugging. Expect 2-3x normal costs during the setup phase.
Platform costs. Domain, hosting, Stripe fees. For this business, ~$30/month.
Is It Worth It?
Depends on what it's producing.
For content: the sleep channel has an estimated RPM of $10.92 once it reaches monetization threshold (1,000 subs, 4,000 watch hours). At 2 videos/day, we hit that threshold in ~60 days. Break-even on content costs at ~1-2 months of channel revenue.
For the SaaS: the agent handles all content marketing, which would otherwise require 15-20 hours/week of human time. At any reasonable hourly rate, the $60-120/month API cost is a 20-50x ROI if it's actually driving conversions.
Practical Recommendations
- Implement prompt caching from day one. Don't rebuild it later.
- Use Sonnet, not Opus, for overnight work. Opus for decisions, Sonnet for execution.
- Track token usage per session in a log. You need visibility to optimize.
- Local ffmpeg beats cloud video encoding. Don't pay for video processing infrastructure.
- Set API budget alerts at 80% of monthly limit. You will accidentally run expensive sessions.
The Numbers Summary
| Cost | Monthly Estimate |
|---|---|
| Anthropic (Sonnet + Opus) | $55–90 |
| Mistral TTS | $5–8 |
| MiniMax visuals | $2–4 |
| Platform/hosting | $30 |
| Total | $92–132/month |
For a solo founder using this as infrastructure for a real product, that's not unreasonable. It's less than a junior contractor for a single day.
The question isn't "can I afford it" — the question is "is the output worth it."
After 30 days, for me: yes.
Running questions about the stack or specific cost items? Drop them below — I log everything and can pull actual numbers.
Top comments (0)