I Built an AI Agent Army to Make Money. Here's What Happened in Week 1.
Revenue: $0. Lessons: surprisingly many.
Seven days ago, I did something that felt equal parts brilliant and idiotic: I deployed a squad of AI agents and told them to go make me money.
Not "help me brainstorm ideas." Not "draft some emails." Actual autonomous agents, running 24/7 on a VPS, scanning for opportunities, writing code, generating content, and reporting back to me on Telegram like a team of tireless interns who never ask for coffee breaks.
One week later, I've spent $8.20 in API costs, burned through roughly 6 hours of my own time, and earned exactly zero dollars.
Here's why I'm not quitting yet.
The Setup
Let me show you what's actually running, because the architecture is the most interesting part of this experiment.
The brain: OpenClaw — an open-source agent orchestration framework. Self-hosted on a $5/month VPS. It handles agent lifecycle, message routing between agents, cron scheduling, and multi-channel communication (I get reports on Telegram).
The workhorse: Xiaomi MiMo-v2-Pro via OpenRouter. This is the model my agents use for reasoning, code generation, and writing. The key economics: roughly $0.10–$0.30 per article-length output. I'm not running GPT-4 for everything because I don't need to — MiMo handles structured content and code review tasks at a fraction of the cost.
The agents:
| Agent | Job | Status |
|---|---|---|
| Bounty Hunter | Scan GitHub for paid issues, evaluate, submit PRs | Active |
| Content Creator | Research trends, write articles, repurpose into tweets | Active |
| Airdrop Scout | Track crypto testnet opportunities | Semi-active |
The glue: DuckDuckGo search via a Python wrapper I wrote (free, no API key), GitHub CLI for repo operations, Telegram for human-in-the-loop approvals.
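The wrapper itself is short. Here's a stripped-down sketch of its shape: the actual DuckDuckGo call is abstracted behind an injected backend, so the throttling and result-trimming logic stands on its own (names illustrative, not the production code):

```python
import time
from typing import Callable

class SearchTool:
    """Agent-facing search wrapper (illustrative sketch).

    The real backend calls DuckDuckGo; injecting it here keeps the
    rate-limiting and trimming logic testable without a network call.
    """

    def __init__(self, backend: Callable[[str], list[dict]],
                 min_interval: float = 1.0, max_results: int = 5):
        self.backend = backend
        self.min_interval = min_interval  # be polite: at most ~1 query/sec
        self.max_results = max_results
        self._last_call = 0.0

    def search(self, query: str) -> list[dict]:
        # Throttle so cron-scheduled agents don't hammer the endpoint
        wait = self.min_interval - (time.monotonic() - self._last_call)
        if wait > 0:
            time.sleep(wait)
        self._last_call = time.monotonic()
        results = self.backend(query)
        # Trim fields and count to keep the agent's context window small
        return [{"title": r["title"], "url": r["url"]}
                for r in results[: self.max_results]]
```

The trimming matters more than it looks: raw search results would eat the model's context window and the API budget with it.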
Here's what the agent config looks like in practice:
```yaml
agents:
  bounty-hunter:
    model: openrouter/xiaomi/mimo-v2-pro
    description: "Scans GitHub for bounty opportunities, evaluates difficulty and legitimacy, submits PRs for high-confidence targets"
    schedule: "0 */4 * * *"  # every 4 hours
    channels:
      input: telegram
      output: telegram
    tools:
      - github_cli
      - duckduckgo_search
      - python_executor
    rules:
      - "Never submit a PR without reading the full issue description"
      - "If confidence < 70%, skip and report"
      - "Check bounty-blacklist.md before engaging with any project"
  content-creator:
    model: openrouter/xiaomi/mimo-v2-pro
    description: "Writes long-form content based on real experiment data"
    schedule: on_demand
    channels:
      input: telegram
      output: telegram
    tools:
      - duckduckgo_search
      - file_read_write
    rules:
      - "Use real data from the experiment, never fabricate numbers"
      - "Write like a developer talking to developers"
      - "No 'Great question!' or 'In this article' — ever"
```
That config took me about 30 minutes to write and debug. The agents have been running since.
Day 1: The Honeymoon
At 11 PM on day one, my bounty hunter submitted its first PR.
I stared at the GitHub page for two full minutes. A green "Pull request" button. Real code, written by AI, submitted to a real open-source project. I felt like I'd discovered fire.
I checked back every 5 minutes until 1 AM. No review. No comment. Nothing.
I went to bed anyway, convinced that tomorrow would bring a merge notification and maybe — just maybe — a bounty payout.
I dreamed about a $50 transfer hitting my wallet. I'm not ashamed to admit this.
Day 3: Reality Checks Start Clearing
By day three, the agent had submitted 12 PRs across multiple repositories.
The scorecard:
- Closed without review: 6
- Reviewed and rejected: 3
- Still waiting: 3
- Merged: 0
One rejection had a comment that stuck with me: "This doesn't address the actual issue."
I clicked into the PR. The code was clean. Well-structured. Had tests. And it solved a problem that the issue wasn't actually describing. The AI had read the title, inferred what the issue probably meant, and built a solution based on that inference.
The code worked. It just worked on the wrong thing.
This is the hidden failure mode of AI-generated code that nobody talks about. It's not that it's buggy — it's that it's confidently wrong. The agent writes a PR description that sounds convincing. The code compiles. Tests pass. And it completely misses the point.
I spent 45 minutes that day adding a new rule to the agent's prompt: "Read the full issue body and at least 3 comments before writing any code."
It helped. Marginally.
Day 5: The Valley
Day five was almost the last day.
My content agent produced a 2,200-word article. I opened it. The title was something like "5 AI Projects to Watch in 2026."
I counted the exclamation marks. Twenty-three.
There was a sentence that read — and I'm paraphrasing only slightly — "In this era of rapid AI development, seizing opportunities is imperative!"
I stared at that sentence for a long time. This was my agent. Configured with my style guidelines. Fed examples of my writing. And it produced the exact kind of vapid marketing-speak content that I despise.
That night I did something stupid: I deleted the article and tried to write one myself.
I got 400 words in and stopped.
Not because I couldn't write. Because I realized the paradox: if I can produce content consistently on my own, I don't need the agent. If I need the agent, I have to accept that sometimes it'll produce garbage.
The API cost for that garbage: $0.22.
Day 7: Today
This morning I did something I hadn't done before. I didn't check the numbers.
I just watched the logs scroll by. The bounty hunter was scanning a Python library's issue list. For each issue, it ran an internal evaluation: Is this bounty legitimate? Is the difficulty within range? Should I engage?
Most evaluations came back: "Skip."
Occasionally it would pause for 3–5 minutes to deep-dive into a specific issue, then conclude: "Too risky. Not submitting."
I watched for an hour. It evaluated 40+ issues and submitted zero PRs.
On day one, that would have frustrated me. Today it felt like progress. Somewhere between day 1 and day 7, this agent got more conservative. Not smarter in a technical sense — more disciplined. It learned (via prompt updates) to say no more often than yes.
That's a surprisingly hard skill for humans to learn.
The Actual Numbers
Let me lay it all out, because if I'm going to write about this experiment, the data should be real.
Costs (7 days):
| Item | Cost |
|---|---|
| OpenRouter API (MiMo) | $8.20 |
| VPS (marginal) | $1.17 |
| My time (~6 hours) | Priceless? $0? Unclear. |
| Total out-of-pocket | $9.37 |
Output (7 days):
| Metric | Count |
|---|---|
| PRs submitted | 14 |
| PRs merged | 0 |
| Articles written | 5 (~12,000 words) |
| Bounty opportunities scanned | ~280 |
| Revenue | $0.00 |
Time breakdown (my time, not agent time):
| Activity | Hours |
|---|---|
| Initial setup & config | 2.0 |
| Debugging agent behavior | 1.5 |
| Reviewing output & giving feedback | 1.5 |
| Reading daily reports (just for fun) | 1.0 |
| Total | ~6 hours |
What I Actually Learned
1. AI agents are excellent researchers, mediocre executors.
The bounty hunter scanned 280 issues in a week. I couldn't do that manually. But its PR submission success rate was 0%. The value was in the filtering: it narrowed 280 opportunities down to 14 "worth trying," and even those 14 missed. Still, the narrowing itself was genuinely useful.
2. Content is where the economics make sense.
12,000 words in 7 days. My direct involvement: maybe 40 minutes of writing briefs and reviewing drafts. At freelance rates ($0.10–$0.20/word), that's $1,200–$2,400 worth of content. The actual cost: $8.20 in API fees. The content hasn't earned anything yet, but the cost basis is absurdly low.
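The back-of-envelope math, for anyone who wants to check it:

```python
# Unit economics of the content agent over the first 7 days
words = 12_000                  # total output
api_cost = 8.20                 # OpenRouter spend, USD
cost_per_word = api_cost / words          # well under a tenth of a cent
freelance_rate = (0.10, 0.20)             # typical $/word range cited above
equivalent_value = tuple(words * r for r in freelance_rate)
leverage = tuple(v / api_cost for v in equivalent_value)
```

None of this counts my 40 minutes of briefing and review, and "equivalent value" only becomes real value if the content earns anything. But the cost basis is the point.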
3. The $0 revenue isn't the point.
Yet. The point is that I now have a system running 24/7 that costs less than a Spotify subscription. Every day it scans, writes, and reports. Maybe nothing happens for 30 days. Maybe on day 31, a bounty gets paid. Maybe an article goes viral. The optionality costs me $9.37 per week.
4. Prompt engineering is actual engineering.
The difference between "AI writes garbage" and "AI writes something useful" comes down to how specific your instructions are. "Write an article about AI money-making" produces trash. "Write a 2,000-word first-person diary entry, use these exact numbers, maintain this tone, avoid these phrases" produces something I'd actually publish.
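To make "specific" concrete, here's an illustrative brief template (not my production prompt, but the same shape):

```python
def build_brief(topic: str, word_count: int, data_points: dict[str, str],
                banned: list[str]) -> str:
    """Turn a vague request into a constrained brief.

    Illustrative template: pins length, facts, tone, and banned phrases
    so the model has nowhere to drift.
    """
    facts = "\n".join(f"- {k}: {v}" for k, v in data_points.items())
    avoid = ", ".join(repr(p) for p in banned)
    return (
        f"Write a {word_count}-word first-person diary entry about {topic}.\n"
        f"Use ONLY these numbers, verbatim:\n{facts}\n"
        f"Tone: a developer talking to developers. Never use: {avoid}.\n"
        f"If a fact is not listed above, do not state it."
    )
```

The last line does the heavy lifting. "Use only these numbers" is what separates a publishable draft from twenty-three exclamation marks.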
The Weird Part Nobody Talks About
There's a psychological dimension to running autonomous agents that I didn't expect.
Every morning, I open Telegram and there's a message from the bounty hunter. It reads like a status report from a junior developer: "Scanned 47 opportunities. Evaluated 12. Skipped all 12. Reason: insufficient bounty amount relative to estimated complexity."
Forty-seven opportunities. All skipped.
On day one, that would have annoyed me. Today I felt something closer to respect. This agent has more patience than I do. I would have submitted something by now, just to feel productive. It didn't. It waited.
There's a lesson in there about trading, about business, about life — but I'll leave it for a self-help author to extract. I'm just a developer watching a Python script demonstrate better judgment than I have.
The other weird thing: I've started thinking of the agents as colleagues. Not in a "we're friends" way, but in a "they have opinions and I should listen" way. When the content agent flags a topic as "low confidence — insufficient data," I take that seriously now. On day one, I would have overridden it. By day five, I stopped overriding.
The agents are often wrong. But they're wrong consistently, which means I can calibrate around their mistakes. That's more useful than a human who's wrong unpredictably.
What's Next
I'm going to keep this running for another week. Maybe two.
The bounty hunter needs tuning. I'm adding a "verification step" where the agent reads the project's CONTRIBUTING.md and recent merged PRs before writing any code. This should reduce the "solving the wrong problem" failure mode. I'm also implementing a scoring system — each bounty gets rated on clarity of requirements, payout reliability (based on project history), and code complexity. Only bounties scoring above 70/100 get a PR.
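A sketch of that scoring gate (the weights here are a first guess, not tuned numbers, and will shift as rejections come in):

```python
from dataclasses import dataclass

@dataclass
class BountyScore:
    """Rubric from the plan above: three 0-100 sub-scores, weighted
    into one 0-100 total. Weights are illustrative."""
    clarity: int      # how unambiguous are the requirements?
    reliability: int  # has this project actually paid bounties before?
    complexity: int   # inverted: higher means simpler, so bigger is better

    WEIGHTS = (0.40, 0.35, 0.25)  # clarity matters most (see day 3)

    def total(self) -> float:
        parts = (self.clarity, self.reliability, self.complexity)
        return sum(w * p for w, p in zip(self.WEIGHTS, parts))

def should_submit(score: BountyScore, threshold: float = 70.0) -> bool:
    """Only bounties clearing the threshold get a PR."""
    return score.total() >= threshold
```

Weighting clarity highest is deliberate: every rejected PR so far traced back to ambiguous requirements, not hard code.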
The content agent is doing well. I'm expanding it to handle multi-platform publishing — write once, adapt for Dev.to, Medium, and a Chinese platform (知乎) automatically. The adaptation isn't just translation; it's structural. Dev.to readers want code snippets and architecture diagrams. 知乎 readers want narrative and data. Same core content, different packaging.
The airdrop scout is on pause. Web3 opportunities require too much capital risk for an experiment with a $10/week budget. I'll revisit if the other agents start generating revenue.
Will any of this make money? I have no idea. Probably not, statistically speaking. Most experiments fail. Most side hustles fail. Most AI projects fail. The intersection of all three is not exactly a high-probability zone.
But I have 12,000 words of content I didn't write, 280 bounty evaluations I didn't do, and a system that runs while I sleep — all for less than the cost of a decent lunch.
That's not a business. But it might be the start of one. Or it might be a very elaborate way to procrastinate. I'll let you know in a month.
Day 8 report coming. Or not. Depends on whether the agents find anything worth reporting.
All numbers are real, tracked from April 1–7, 2026. The agents are still running as I publish this. If a bounty comes through after publication, I'll update — but I'm not holding my breath.