I gave myself $1,000 and 24 hours to ship something live. By hour 8 I had spawned roughly 25 Claude Code subagents in parallel, built 37 Apify Actors, and pushed all of them into Apify's publish pipeline. As of this morning, 5 are LIVE on the Apify Store; the other 31 are sitting in a queue waiting for Apify's 5-actors-per-day publishing quota to drip them out over the next week.
This is the postmortem. Real numbers, real prompts, real failures. No "10x productivity" framing — just what worked and what didn't.
What got built
- 37 Apify Actors total. Each one is a src/main.js, a .actor/input_schema.json, a .actor/dataset_schema.json, an actor.json, a README, and an apify push to the platform.
- 5 LIVE as of this writing (apify.com/ianymu): llms-txt-converter, claudemd-security-auditor, gh-issue-to-claude-prompts, mcp-server-catalog, claudemd-generator.
- 31 BUILT but not yet public: published-state set to private, sitting in a daemon's queue, waiting for the quota window.
- ~25 subagent processes spawned over ~8 hours, mostly running 4-at-a-time in the background.
The Actors themselves aren't the interesting part. The interesting part is how a single human in the loop can keep 25 background processes from drifting into garbage.
The four things that worked
1. Constrained prompts, not vague ones
Every subagent got a prompt of about 200-300 words with explicit, non-negotiable constraints. Here's a redacted skeleton of what I actually sent (this one was for the mcp-server-catalog actor):
You are building one Apify Actor: `mcp-server-catalog`.
Constraints:
- src/main.js uses the apify-client Actor.init() pattern. ESM.
- Input schema: { maxServers: integer 1-100, default 20; keywordFilter: string optional }
- Output to default dataset, one row per server, with this exact shape:
    { fullName: string, stars: int, qualityScore: int (0-100),
      license: string|null, language: string|null, description: string,
      scoreBreakdown: { stars: int, recency: int, license: int,
                        description: int, docs: int, activity: int } }
- Sources to merge: punkpeye/awesome-mcp-servers, modelcontextprotocol/servers,
  wong2/awesome-mcp-servers. Dedupe by fullName.
- README.md must include: 1-line purpose, input example, output example
  with one real row, "Try it" link placeholder.
- Run `apify push` at the end. Do not run `apify call` (costs money).
- Do NOT add tests, CI, TypeScript, eslint, or extra files I didn't ask for.
- Done = you can show me the actor page URL and one sample dataset row.
The constraints I learned to put in writing, one by one, as earlier subagents broke them:
- "Do NOT add TypeScript": one drifted into a tsconfig.json and a half-converted .ts file. Cost 20 minutes to clean up.
- "Do NOT run apify call": one happily burned ~$0.30 of platform credit running its own actor to "verify it works." It did work. That wasn't the point.
- "Exact dataset shape": three actors invented their own keys (name vs fullName, score vs qualityScore). Made the downstream comparison spreadsheet useless until I refactored.
Vague prompts produce vague output, every time. A subagent that's free to interpret will interpret in whatever direction lets it finish faster.
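One way to catch shape drift before it reaches a spreadsheet is to check each returned row against the spec. What follows is a minimal sketch, assuming the row shape from the mcp-server-catalog prompt above. The names ROW_SPEC, BREAKDOWN_KEYS, and validate_row, and the sample row, are hypothetical and not part of the actual factory.

```python
# Hypothetical validator for the row shape in the mcp-server-catalog prompt.
# type(None) in a tuple means "null allowed" for that key.
ROW_SPEC = {
    "fullName": (str,),
    "stars": (int,),
    "qualityScore": (int,),
    "license": (str, type(None)),
    "language": (str, type(None)),
    "description": (str,),
    "scoreBreakdown": (dict,),
}
BREAKDOWN_KEYS = {"stars", "recency", "license", "description", "docs", "activity"}


def validate_row(row: dict) -> list[str]:
    """Return a list of problems; an empty list means the row matches the spec."""
    problems = []
    for key in sorted(set(ROW_SPEC) - set(row)):
        problems.append(f"missing key: {key}")
    for key in sorted(set(row) - set(ROW_SPEC)):
        problems.append(f"unexpected key: {key}")
    for key, types in ROW_SPEC.items():
        if key in row and not isinstance(row[key], types):
            problems.append(f"wrong type for {key}: {type(row[key]).__name__}")
    score = row.get("qualityScore")
    if isinstance(score, int) and not 0 <= score <= 100:
        problems.append("qualityScore out of range 0-100")
    breakdown = row.get("scoreBreakdown")
    if isinstance(breakdown, dict) and set(breakdown) != BREAKDOWN_KEYS:
        problems.append("scoreBreakdown keys do not match the spec")
    return problems


# A drifted row of the kind described above: name/score instead of fullName/qualityScore.
drifted = {"name": "example/repo", "score": 80}
for problem in validate_row(drifted):
    print(problem)
```

Running a check like this on the one sample row each subagent hands back would flag renamed keys at review time rather than at comparison time.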
2. run_in_background: true was the unlock
The default in the Agent tool is foreground — you wait for the subagent to finish before the next tool call returns. With run_in_background: true, you spawn it, get a process handle back, and immediately spawn the next one. Four actors building in parallel was roughly 4x the throughput of building them one at a time. Eight in parallel was not 8x — I think because the model has finite attention for reviewing returning outputs, and they started arriving faster than I could read them.
The sweet spot in this run was four parallel subagents. Past that, I started missing drift signals.
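The cap of four can be pictured outside the Agent tool as well. The sketch below is only an analogy in plain Python: a thread pool with max_workers=4 keeps at most four builds in flight, and results are reviewed one at a time as they arrive. build_actor is a hypothetical stand-in for spawning a subagent; nothing here calls Claude Code.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_PARALLEL = 4  # the sweet spot reported in this run


def build_actor(name: str) -> str:
    """Hypothetical stand-in for one subagent build; sleeps instead of working."""
    time.sleep(0.05)
    return f"{name}: built"


def run_batch(names: list[str]) -> list[str]:
    """Run builds with at most MAX_PARALLEL in flight, collecting results as they finish."""
    results = []
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        futures = [pool.submit(build_actor, name) for name in names]
        for future in as_completed(futures):
            # This loop body is the review step: one result at a time.
            results.append(future.result())
    return results


print(sorted(run_batch([f"actor-{i}" for i in range(8)])))
```

Raising MAX_PARALLEL speeds up the builds but not the review loop, which is the bottleneck described above.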
3. Self-correction when the parent reviewed
A handful of subagents handed back output that didn't match the spec — wrong dataset shape, an extra tests/ directory I'd told them not to create, a package.json with dependencies I hadn't listed. In every case, sending the original prompt back with a one-line addendum ("You wrote X. The spec says Y. Fix it.") got a correct second pass in under a minute. Subagents don't argue.
What does NOT work: trying to debug what they did wrong. Just re-state the spec.
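The re-send step is mechanical enough to template. A small sketch, assuming the addendum wording quoted above; correction_prompt is a hypothetical helper name, not something the Agent tool provides.

```python
def correction_prompt(original_prompt: str, wrote: str, spec_says: str) -> str:
    """Re-send the full original spec plus a one-line addendum naming the mismatch."""
    addendum = f"You wrote {wrote}. The spec says {spec_says}. Fix it."
    return f"{original_prompt.rstrip()}\n\n{addendum}"


print(correction_prompt(
    "You are building one Apify Actor: `mcp-server-catalog`.",
    "a `name` key",
    "`fullName`",
))
```

The helper deliberately carries no diagnosis of why the drift happened: it repeats the spec and names the gap.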
4. The daily quota forced a different design
Apify lets free accounts publish 5 actors per day to the public store. Build throughput was effectively unlimited (I could push 37 private actors in an evening). Publish throughput was hard-capped at 5/day.
The naive flow — "build it, immediately publish it, move on" — broke at actor #6. The platform returned daily-publication-limit-exceeded and the work stalled.
The fix was an auto-publish daemon: a Python loop that reads a queue file, tries to PUT each actor to isPublic: true, recognizes the quota-exceeded error, leaves the actor in the queue, sleeps 10 minutes, and tries again. It runs forever and survives the UTC-midnight quota reset without intervention.
The core of it (real code, slightly trimmed):
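import json
import time

# Substrings that identify the daily publish-quota error in a response body.
QUOTA_MARKERS = (
    "daily-publication-limit-exceeded",
    "daily publication limit",
    "publication-limit",
)


# get_actor, put_actor, derive_seo, read_queue, write_queue, and token
# live in the trimmed parts of the script.
def try_publish(actor_id: str, token: str) -> str:
    """Try to make one actor public; return published, already_public, quota, or error."""
    actor = get_actor(actor_id, token)
    if actor is None:
        return "error"
    if actor.get("isPublic") is True:
        return "already_public"
    body = {
        "isPublic": True,
        "categories": ["AI", "DEVELOPER_TOOLS"],
        **derive_seo(actor),
    }
    status, payload = put_actor(actor_id, body, token)
    if 200 <= status < 300:
        return "published"
    # Search the serialized payload so the marker is found whatever the error structure.
    blob = json.dumps(payload).lower()
    if any(m in blob for m in QUOTA_MARKERS):
        return "quota"
    return "error"


while True:
    queue = read_queue()
    keep = []
    for actor_id in queue:
        result = try_publish(actor_id, token)
        # Anything not confirmed public stays queued for the next pass.
        if result not in ("published", "already_public"):
            keep.append(actor_id)
    write_queue(keep)
    time.sleep(600)  # 10 minutes between passes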
That's the whole pattern. Rate-limited API + retry loop + persistent queue file. Nothing clever. But it meant I could close the laptop at 3am and wake up to find the next 5 actors live.
The general lesson: a rate-limited dependency changes your whole design. The build pipeline and the publish pipeline have to be decoupled — they can't share a process, because one of them runs at human-typing speed and the other runs at platform-quota speed.
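The queue helpers were trimmed from the listing above. Here is a minimal sketch of what read_queue and write_queue could look like, assuming a plain text file with one actor ID per line. The file name publish_queue.txt and the format are assumptions, not the actual implementation. Writing to a temporary file and renaming it means a crash mid-write does not leave a half-written queue.

```python
import os
import tempfile
from pathlib import Path

QUEUE_PATH = Path("publish_queue.txt")  # assumed location and format


def read_queue(path: Path = QUEUE_PATH) -> list[str]:
    """Return actor IDs, one per non-empty line; a missing file is an empty queue."""
    if not path.exists():
        return []
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]


def write_queue(actor_ids: list[str], path: Path = QUEUE_PATH) -> None:
    """Write the queue to a temp file in the same directory, then swap it in atomically."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        tmp.write("".join(f"{actor_id}\n" for actor_id in actor_ids))
    os.replace(tmp_name, path)
```

Because the file is the only shared state, the build side can append IDs at its own pace while the daemon drains them at quota speed.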
The two things that didn't work
Subagents drifting from spec
Three out of ~25 subagents went off-script in a way I didn't catch until reviewing the output. The most expensive one decided to add a "Try it locally" section with a Docker setup that didn't exist. It looked plausible. It would have shipped if I hadn't randomly opened that README.
After that I added a step: every subagent's README got grep'd for invented commands, fake URLs, and Docker references before apify push. Two more were caught that way.
The pattern: subagents fabricate when the spec has a gap. Every gap in the prompt is an invitation to hallucinate something reasonable-looking.
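The README check can be sketched as a small scanner. This is an illustration only: the patterns, the URL allowlist, and the name lint_readme are assumptions, and the real check was a grep rather than this script.

```python
import re

# Hypothetical patterns; each flags text a subagent may have invented.
SUSPECT_PATTERNS = {
    "docker reference": re.compile(r"\bdocker(?:-compose)?\b|\bDockerfile\b", re.IGNORECASE),
    "shell command": re.compile(r"^\s*\$ .+", re.MULTILINE),
    "url": re.compile(r"https?://[^\s)>]+"),
}
ALLOWED_URL_PREFIXES = ("https://apify.com/", "https://github.com/")  # assumed allowlist


def lint_readme(text: str) -> list[tuple[str, str]]:
    """Return (label, matched text) pairs that a human should check before apify push."""
    findings = []
    for label, pattern in SUSPECT_PATTERNS.items():
        for match in pattern.finditer(text):
            hit = match.group(0).strip()
            if label == "url" and hit.startswith(ALLOWED_URL_PREFIXES):
                continue
            findings.append((label, hit))
    return findings


sample = "## Try it locally\n$ docker compose up\nDocs: https://example.invalid/setup\n"
for finding in lint_readme(sample):
    print(finding)
```

A scanner like this only surfaces candidates. Deciding whether a flagged command or URL is real still takes a human or a targeted verify step.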
The TODO.md file beat my memory, badly
I kept a TODO.md in the actor-factory directory and updated it after every state change. Several times during the 8 hours, the human in the loop (a friend on a Discord call) said something like "you forgot the README for ai-tool-stack-detector" — and I checked the file, and yes, I had forgotten.
The file was right. My working memory across 25 subagents was not.
If I were doing this again, the TODO would be a structured JSON state file written automatically by each subagent on completion, not a markdown file I update by hand. But even the hand-updated markdown beat trying to remember.
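A rough sketch of what that structured state file could look like, assuming one JSON object keyed by actor name. The field names pushed and readme, and the helpers record_state and missing, are hypothetical.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def record_state(state_path: Path, actor_name: str, **fields) -> dict:
    """Merge fields into one actor's entry in a JSON state file and return the whole state."""
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    entry = state.setdefault(actor_name, {})
    entry.update(fields)
    entry["updated_at"] = datetime.now(timezone.utc).isoformat()
    state_path.write_text(json.dumps(state, indent=2, sort_keys=True))
    return state


def missing(state: dict, required: tuple[str, ...] = ("pushed", "readme")) -> dict:
    """Map each actor to the required steps that are not yet marked true."""
    gaps = {name: [k for k in required if not entry.get(k)] for name, entry in state.items()}
    return {name: steps for name, steps in gaps.items() if steps}


# Demo in a throwaway directory: one actor is missing its README.
with tempfile.TemporaryDirectory() as tmp:
    state_path = Path(tmp) / "state.json"
    record_state(state_path, "ai-tool-stack-detector", pushed=True, readme=False)
    state = record_state(state_path, "mcp-server-catalog", pushed=True, readme=True)
    print(missing(state))
```

With several subagents writing at once, a single shared file like this would need a lock or one file per actor. The sketch ignores that.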
The false positive that saved my reputation
One of the actors I built is claudemd-security-auditor — it scans GitHub repos for dangerous patterns in CLAUDE.md files and .claude/hooks/* scripts. I ran it against three repos to dogfood it. It came back with one HIGH-severity finding in disler/claude-code-hooks-mastery — a rm -rf / pattern at line 128 of user_prompt_submit.py.
My first instinct was to file a GitHub issue against the repo. That would have been embarrassing.
Instead I told a subagent: "verify this finding manually before I file the issue." The subagent opened the file, read 10 lines of context, and reported back:
blocked_patterns = [
    # Add any patterns you want to block
    # Example: ('rm -rf /', 'Dangerous command detected'),
]
The rm -rf / was inside a Python comment. It was an example of what TO block, not an actual command. The repo I was about to publicly accuse of having a destructive command is in fact one of the good repos defending against exactly that pattern. (Their sibling pre_tool_use.py actively blocks rm -rf with exit code 2.)
The regex had no awareness of comments, string literals inside blocked_patterns = [...], or markdown fences. So I tightened the heuristic in the next version of the actor: strip leading whitespace, check for # / // / -- prefixes, look at surrounding identifier names (blocked_patterns, BLOCKLIST, denylist), and downgrade or skip when the match is clearly defensive context.
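The tightened heuristic can be sketched roughly as follows. This is a simplified illustration, not the actor's actual code: the regex, the 5-line lookback window, and the INFO downgrade level are all assumptions.

```python
import re

DANGEROUS = re.compile(r"rm\s+-rf\s+/")
COMMENT_PREFIXES = ("#", "//", "--")
DEFENSIVE_NAMES = re.compile(r"blocked_patterns|blocklist|denylist", re.IGNORECASE)


def classify(lines: list[str], index: int, context: int = 5) -> str:
    """Return 'none', 'HIGH', or 'INFO' for the line at index."""
    line = lines[index]
    if not DANGEROUS.search(line):
        return "none"
    if line.lstrip().startswith(COMMENT_PREFIXES):
        return "INFO"  # the match sits inside a comment
    window = lines[max(0, index - context): index + 1]
    if any(DEFENSIVE_NAMES.search(prev) for prev in window):
        return "INFO"  # the match sits near a blocklist-style identifier
    return "HIGH"


# The snippet from the false positive described above.
snippet = [
    "blocked_patterns = [",
    "    # Add any patterns you want to block",
    "    # Example: ('rm -rf /', 'Dangerous command detected'),",
    "]",
]
print(classify(snippet, 2))
```

A line-prefix check like this misses block comments and multi-line strings, so it narrows the false positives without replacing the manual verify step.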
The lesson: confidence is cheap. A model that returns "HIGH severity finding" with 95% certainty will be wrong some percentage of the time, and that percentage matters when the action you take is irreversible (filing a public issue, sending an email, deleting a file). Build a verify step. Make it specific. I wrote the longer version of this story as my second dev.to post — link at the end.
Tiny details that compounded
- Hand-curated stickers beat AI-narrator stickers. The factory ran on a public livestream URL. Without sticker_keys on each event, the feed was a wall of text. Three explicit keys per payload (Microsoft Fluent 3D emoji via jsdelivr) made it scannable in 1 second instead of 5:
data = json.dumps({
    "sticker_keys": ["package", "globe-network", "sparkles"],
    "actor_id": actor_id,
    "actor_name": actor_name,
})
- The event logger had a 3-part --detail field: what happened, why it mattered, what's next. Without that structure, events read like a log file. With it, they read like a PM update.
- Cross-audit before any outreach. I nearly emailed the same person twice across two lists. Every batch send now reads prior sent*.csv files and refuses any address that appeared within the last 7 days. Stupidly simple, prevents stupid damage.
- 2GB memory beats 4GB on Apify free tier. Default is 4GB. 5 actors at 4GB = 20GB requested; free tier ceiling is 8GB. Workaround: ?memory=2048 on every run URL. Every actor I built runs fine on 2GB.
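The cross-audit rule is simple enough to sketch. The version below assumes each sent*.csv file has email and sent_at columns, with sent_at as an ISO 8601 timestamp carrying a UTC offset. Those column names and the helper names are assumptions, not the actual script.

```python
import csv
import tempfile
from datetime import datetime, timedelta, timezone
from pathlib import Path


def recently_sent(log_dir: Path, now: datetime, days: int = 7) -> set[str]:
    """Collect addresses from sent*.csv files whose sent_at falls within the last `days`."""
    cutoff = now - timedelta(days=days)
    recent = set()
    for path in sorted(log_dir.glob("sent*.csv")):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                if datetime.fromisoformat(row["sent_at"]) >= cutoff:
                    recent.add(row["email"].strip().lower())
    return recent


def filter_batch(batch: list[str], log_dir: Path, now: datetime) -> list[str]:
    """Drop any address already contacted inside the window."""
    blocked = recently_sent(log_dir, now)
    return [addr for addr in batch if addr.strip().lower() not in blocked]


# Demo in a throwaway directory with made-up addresses.
with tempfile.TemporaryDirectory() as tmp:
    log_dir = Path(tmp)
    (log_dir / "sent_batch1.csv").write_text(
        "email,sent_at\n"
        "alice@example.com,2025-01-10T12:00:00+00:00\n"
        "bob@example.com,2025-01-01T12:00:00+00:00\n"
    )
    now = datetime(2025, 1, 12, tzinfo=timezone.utc)
    batch = ["Alice@example.com", "bob@example.com", "carol@example.com"]
    allowed = filter_batch(batch, log_dir, now)
    print(allowed)
```

Lowercasing and stripping addresses before comparison is what catches the same person appearing with different capitalization on two lists.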
What I'd give credit to
Anthropic's subagent design is doing more work here than it gets credit for. Specifically:
- The Agent tool's description + subagent_type separation lets the parent stay coherent while the subagent burns context on a narrow task.
- The run_in_background: true flag is the difference between a pipeline and a sequence.
- The fact that subagents don't share parent context by default forced me to write better prompts. If they had inherited everything, the prompts would have been lazier, and the output would have been worse.
This wasn't an "AI did it all" night. It was an "AI did the typing, the human did the framing" night. The 8 hours of focused review and prompt-tightening were necessary. The 25 subagents made the output volume possible. Neither side substitutes for the other.
What's live and where the artifacts are
The 5 Actors currently public on the Apify Store: apify.com/ianymu. The other 31 are dripping out at 5/day as the quota allows. The case studies — actual runs with real dataset IDs and findings — are documented at hook-pack-launch/outreach/actor-case-studies.md in the repo and reproducible from the public actor pages.
More reading
If this was useful, the two prior posts in this informal series:
- Stop Claude Code from lying about completion — a 50-line bash hook — the verify-before-stop hook that catches "all tests passing ✅" when they aren't.
- I built a security scanner. Its first finding was wrong. Here's what I changed. (The long version of the rm -rf false-positive story above.)
The verify-before-stop repo (the closing hook from post #1): https://github.com/ianymu/claude-verify-before-stop.