We run 25+ repositories at Stackbilt. One founder. Issues pile up. The boring stuff — doc fixes, test gaps, type errors — never gets prioritized because there's always something more urgent.
So we built a system where an AI agent picks up labeled GitHub issues, writes the fix, opens a PR, and posts a summary. No human in the loop until code review.
The pipeline
GitHub Issue (labeled "aegis")
→ Issue Watcher (hourly cron)
→ Task Queue (D1)
→ cc-taskrunner (Claude Code session)
→ Auto-PR on auto/{category}/{task-id} branch
→ Session digest
The cc-taskrunner is open source. It pulls tasks from a queue, spins up Claude Code sessions with structured prompts, and handles the lifecycle.
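The lifecycle above can be sketched as a single pass over one queued task. Everything here is illustrative: `Task`, `start_session`, and `open_pr` are hypothetical stand-ins for the real queue record, Claude Code session launcher, and GitHub integration; only the `auto/{category}/{task-id}` branch convention comes from the pipeline description.

```python
from dataclasses import dataclass

@dataclass
class Task:
    task_id: str
    category: str  # e.g. "docs", "tests", "bugfix"
    prompt: str

def branch_name(task: Task) -> str:
    # Branch convention from the pipeline: auto/{category}/{task-id}
    return f"auto/{task.category}/{task.task_id}"

def run_task(task: Task, start_session, open_pr) -> dict:
    """One pass of the taskrunner lifecycle: session -> PR -> digest.

    `start_session` and `open_pr` are injected stand-ins for the real
    Claude Code / GitHub clients (hypothetical signatures).
    """
    transcript = start_session(task.prompt)
    pr_url = open_pr(branch=branch_name(task),
                     title=f"[{task.category}] {task.task_id}")
    # The digest is just a truncated transcript in this sketch.
    return {"task": task.task_id, "pr": pr_url, "digest": transcript[:200]}
```

Dependency injection of the session and PR clients keeps the loop testable without touching GitHub.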
Governance tiers
Not every task should run unsupervised:
- auto_safe — docs, tests, research, refactors → executes immediately
- proposed — bugfixes, features → requires approval
Classification is deterministic. GitHub labels map to categories. No LLM in the classifier.
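A deterministic label-to-tier classifier can be as small as a set lookup. The label names below are taken from the tier list above, but the exact mapping table is an assumption; the real one lives in the repo config.

```python
# Hypothetical label sets; the tiers themselves come from the governance model.
AUTO_SAFE = {"docs", "tests", "research", "refactor"}
PROPOSED = {"bugfix", "feature"}

def classify(labels: list[str]) -> str:
    """Map GitHub labels to a governance tier. No LLM involved.

    Any label from the proposed set wins (the riskier tier dominates),
    and unknown labels also fall through to 'proposed' as the safe default.
    """
    cats = {label.lower() for label in labels}
    if cats & PROPOSED:
        return "proposed"    # requires approval
    if cats & AUTO_SAFE:
        return "auto_safe"   # executes immediately
    return "proposed"        # default to the supervised tier
```

Making the riskier tier dominate means a task labeled both `docs` and `bugfix` still waits for approval.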
Safety hooks
- No interactive prompts (AskUserQuestion blocked)
- No destructive git ops (force push, reset hard blocked)
- No production deploys
- No secret access
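The git-safety hooks amount to a denylist check run before any shell command executes. This is a minimal sketch: the pattern list and hook name are assumptions, and the real hooks run inside Claude Code's tool layer, not as a standalone function.

```python
import re

# Hypothetical denylist covering the destructive git ops named above.
BLOCKED_PATTERNS = [
    re.compile(r"^git\s+push\b.*\s(--force|-f)\b"),  # force push
    re.compile(r"^git\s+reset\s+--hard\b"),          # hard reset
]

def pre_tool_hook(command: str) -> bool:
    """Return True if the shell command may run, False to block it."""
    cmd = command.strip()
    return not any(p.search(cmd) for p in BLOCKED_PATTERNS)
```

Ordinary pushes and soft resets pass through; only the destructive variants are rejected.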
What works well
The system excels at work humans deprioritize: documentation drift, test coverage gaps, type-error cleanup. Tight scope means a high merge rate.
What breaks
completion_signal_missing — the agent finishes but never outputs TASK_COMPLETE. Seen 11+ times a week. Mitigation: scan for new git commits as a secondary completion signal.
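The two-signal completion check can be sketched as follows. The sentinel string `TASK_COMPLETE` comes from the post; `new_commit_count` and its `origin/main` base ref are assumptions about how the commit scan might be wired up.

```python
import subprocess

def new_commit_count(repo_dir: str, base_ref: str = "origin/main") -> int:
    """Count commits on HEAD that are not on base_ref (hypothetical helper)."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "rev-list", "--count", f"{base_ref}..HEAD"],
        capture_output=True, text=True,
    )
    return int(out.stdout) if out.returncode == 0 else 0

def task_completed(session_output: str, commit_count: int) -> bool:
    # Primary signal: the sentinel string in the agent's final message.
    if "TASK_COMPLETE" in session_output:
        return True
    # Secondary signal: commits exist even though the sentinel is missing.
    return commit_count > 0
```

Separating the commit scan from the decision keeps the fallback logic trivially testable.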
Large-file timeouts — files over 800 LOC hit the turn limit. Mitigation: max_turns is now bumped automatically for large files.
Vague prompts — "Improve the auth system" produces scattered changes. Fix: write prompts like tickets for a junior engineer.
Try it yourself
- cc-taskrunner — open source task runner
- Charter — ADF governance framework (Apache-2.0)
- MCP Gateway — OAuth MCP server
Full ecosystem: github.com/Stackbilt-dev