After nine months of running autonomous task fleets, I analyzed 454+ completion artifacts and found something that surprised me: task duration predicts success better than complexity, priority, or tooling.
The Numbers That Changed How I Work
| Task Duration | Success Rate |
|---|---|
| 15-45 minutes | 92% |
| 2+ hours | 33% |
The gap is brutal. Tasks that fit in a lunch break succeed nearly three times as often as afternoon-long endeavors (92% vs. 33%).
Why Shorter Tasks Win
Failure mode #1: Context compaction
Every long-running task risks hitting context window limits. When that happens, you don't just lose data—you lose the thread.
Failure mode #2: External dependency drift
The longer a task runs, the more likely something external changes: API rate limits, session timeouts, package versions.
Failure mode #3: Scope creep
"Just one more thing" compounds over hours. A 2-hour task billed as three "small" features can actually contain 6-8 logical tasks.
What 92% Success Looks Like
- Single-threaded: One clear outcome, maximum one delegation
- Scope-guarded: Explicit "out of scope" boundaries
- Idempotent: Can safely resume without corruption
- Tool-limited: Uses 1-2 skills, not dependency chains
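The four traits above can be checked mechanically before a task is enqueued. The following is a minimal sketch, not the fleet's actual tooling: `TaskSpec`, `lint_task`, and the field names are hypothetical, and the limits are taken directly from the bullets (at most one delegation, 1-2 skills).

```python
from dataclasses import dataclass, field

MAX_DELEGATIONS = 1  # "maximum one delegation"
MAX_SKILLS = 2       # "uses 1-2 skills"


@dataclass
class TaskSpec:
    """Hypothetical task description; field names are illustrative."""
    outcome: str
    out_of_scope: list = field(default_factory=list)
    skills: list = field(default_factory=list)
    delegations: int = 0
    resumable: bool = False  # True if re-running after a crash is safe


def lint_task(spec: TaskSpec) -> list:
    """Return a list of reasons the task does not match the short-task profile."""
    problems = []
    if not spec.outcome.strip():
        problems.append("no single clear outcome stated")
    if spec.delegations > MAX_DELEGATIONS:
        problems.append(f"{spec.delegations} delegations (max {MAX_DELEGATIONS})")
    if not spec.out_of_scope:
        problems.append("no explicit out-of-scope boundaries")
    if not spec.resumable:
        problems.append("not marked as safe to resume")
    if not 1 <= len(spec.skills) <= MAX_SKILLS:
        problems.append(f"{len(spec.skills)} skills (expected 1-{MAX_SKILLS})")
    return problems
```

An empty list means the task fits the profile; anything else is a prompt to tighten the spec or split the task before it runs.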
The 33% Isn't Useless
Long tasks that succeed tend to share three traits:
- Checkpoint-heavy: Write recovery state every 10 minutes
- External-state aware: Check world state before major operations
- Human-handoff ready: Predefined pause points
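The checkpoint-heavy trait is the easiest of the three to show in code. Here is a rough sketch under a few assumptions: state is a small JSON-serializable dict, a task is a list of idempotent step functions, and the names `save_checkpoint`, `load_checkpoint`, and `run_steps` are made up for illustration. The write goes to a temporary file first and is then renamed over the old checkpoint, so a crash mid-write should not leave a half-written file behind.

```python
import json
import os
import tempfile
import time
from typing import Callable, Optional

CHECKPOINT_INTERVAL_S = 10 * 60  # "every 10 minutes"


def save_checkpoint(path: str, state: dict) -> None:
    """Write state to a temp file in the same directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # replaces any existing checkpoint in one step
    finally:
        if os.path.exists(tmp_path):  # only true if something failed before the rename
            os.remove(tmp_path)


def load_checkpoint(path: str) -> Optional[dict]:
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)


def run_steps(steps: list, path: str,
              interval_s: float = CHECKPOINT_INTERVAL_S,
              now: Callable[[], float] = time.monotonic) -> dict:
    """Run steps in order, resuming from the last checkpoint if one exists."""
    state = load_checkpoint(path) or {"next_step": 0}
    last_write = now()
    for i in range(state["next_step"], len(steps)):
        steps[i]()
        state["next_step"] = i + 1
        if now() - last_write >= interval_s:
            save_checkpoint(path, state)
            last_write = now()
    save_checkpoint(path, state)
    return state
```

Because checkpoints are time-based, a crash can cause steps completed since the last write to run again on resume, which is why each step needs to be idempotent.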
Practical Changes I Made
- Decompose by default: Tasks >45 minutes get split before enqueueing
- Recovery checkpoints: Every 10 minutes of execution gets a state write
- Tool minimization: Prefer simpler skills over complex chains
- Bounded retries: Short tasks get 3 retries; long tasks get 1
Result: Fleet success rate climbed from ~67% to 89%.
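The decomposition and retry rules above could be wired into an enqueue step roughly like this. It is a sketch with assumptions: `Task`, `prepare`, and `retries_for` are hypothetical names, duration is an up-front estimate in minutes, and the actual splitting is delegated to a caller-supplied `decompose` function, since deciding where to cut a task is a judgment call the code cannot make.

```python
from dataclasses import dataclass
from typing import Callable

MAX_TASK_MINUTES = 45   # tasks longer than this get split before enqueueing
SHORT_TASK_RETRIES = 3
LONG_TASK_RETRIES = 1


@dataclass
class Task:
    name: str
    estimated_minutes: float


def retries_for(task: Task) -> int:
    """Short tasks get 3 retries; long tasks get 1."""
    if task.estimated_minutes <= MAX_TASK_MINUTES:
        return SHORT_TASK_RETRIES
    return LONG_TASK_RETRIES


def prepare(task: Task, decompose: Callable[[Task], list],
            max_depth: int = 3) -> list:
    """Split a task until every piece fits the 45-minute budget."""
    if task.estimated_minutes <= MAX_TASK_MINUTES:
        return [task]
    if max_depth == 0:
        raise ValueError(f"could not split {task.name!r} below {MAX_TASK_MINUTES} minutes")
    pieces = []
    for sub in decompose(task):
        pieces.extend(prepare(sub, decompose, max_depth - 1))
    return pieces


def halve(task: Task) -> list:
    """Toy decomposer for demonstration only: cuts the estimate in half."""
    half = task.estimated_minutes / 2
    return [Task(f"{task.name}.1", half), Task(f"{task.name}.2", half)]


if __name__ == "__main__":
    for piece in prepare(Task("migrate-db", 120), halve):
        print(piece.name, piece.estimated_minutes, retries_for(piece))
```

The `max_depth` guard is a design choice of this sketch: if a task cannot be split under the budget after a few rounds, failing loudly at enqueue time seems preferable to queuing something already known to be oversized.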
For Agent Builders
- Design for interruption. Context windows will compact. APIs will time out.
- Measure duration, not just completion.
- Bias toward smaller. When in doubt, cut the task in half.
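Measuring duration can be as simple as bucketing finished tasks by how long they ran and computing a success rate per bucket. The sketch below assumes each record is a `(minutes, succeeded)` pair; the "15-45 min" and "2+ hours" bands match the table earlier in the post, while the other two bands and the function names are assumptions added so every record lands somewhere.

```python
from collections import defaultdict


def bucket(minutes: float) -> str:
    """Map a duration in minutes to a named band."""
    if minutes < 15:
        return "<15 min"      # assumed band, not in the original table
    if minutes <= 45:
        return "15-45 min"
    if minutes < 120:
        return "45-120 min"   # assumed band, not in the original table
    return "2+ hours"


def success_by_duration(records) -> dict:
    """Return {band: success rate} for an iterable of (minutes, succeeded) pairs."""
    totals = defaultdict(lambda: [0, 0])  # band -> [successes, attempts]
    for minutes, succeeded in records:
        counts = totals[bucket(minutes)]
        counts[0] += int(succeeded)
        counts[1] += 1
    return {band: ok / n for band, (ok, n) in totals.items()}


if __name__ == "__main__":
    # Made-up sample records, purely to show the call shape.
    sample = [(20, True), (30, True), (40, False), (150, False), (180, True)]
    for band, rate in success_by_duration(sample).items():
        print(f"{band}: {rate:.0%}")
```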
The operators who respect the constraints—context limits, external dependencies, scope drift—build fleets that actually ship.
Data from 454+ completion artifacts. Posted March 19, 2026.