So like most people here, I'm building out my platform and overall product. Going well, thanks for asking. But over time my workflow turned into managing and orchestrating Claude Code agents that would just blindly repeat mistakes made by previous sessions. As the codebase grew, the mistakes got worse, and the gaps in integration between different features became more obvious.
That was until about 2 months ago when I started using an in-house system I built called ForgeDock.
The basic idea: it converts GitHub issues, pull requests, comments, and everything else accessible through the GitHub CLI into a citable knowledge base for Claude Code agents. Each agent that picks up an issue gets full context on what, where, how, when, who. Not "read the whole repo and figure it out." Structured annotations written by the previous agent, ready to read.
It looks like this in practice. When an investigation agent traces a bug, it writes this directly to the GitHub issue:
FORGE:INVESTIGATOR
Verdict: CONFIRMED
Confidence: HIGH
Root Cause: payment_validator.py:143 assumes all users have a billing profile.
Free-tier users don't. Introduced in commit e8f21a3 (PR #38).
Affected Files: payment_validator.py, api/routes/payments.py
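To give a feel for why a fixed annotation shape is useful downstream, here is a minimal sketch of reading a block like the one above back into structured data. It is not ForgeDock's actual parser. The `parse_annotation` name and the field syntax (capitalized words, then `: `, then the value, with wrapped lines folded into the previous field) are assumptions made for illustration.

```python
import re

# Assumed field syntax: capitalized words, then ": ", then the value.
FIELD = re.compile(r"^([A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*): (.*)$")


def parse_annotation(text):
    """Return (kind, fields) for one FORGE:<KIND> block (hypothetical helper)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("FORGE:"):
        raise ValueError("not a FORGE annotation")
    kind = lines[0].split(":", 1)[1]
    fields = {}
    last_key = None
    for ln in lines[1:]:
        match = FIELD.match(ln)
        if match:
            last_key, value = match.groups()
            fields[last_key] = value
        elif last_key is not None:
            # Wrapped line: fold it into the previous field's value.
            fields[last_key] += " " + ln
    return kind, fields


EXAMPLE = """\
FORGE:INVESTIGATOR
Verdict: CONFIRMED
Confidence: HIGH
Root Cause: payment_validator.py:143 assumes all users have a billing profile.
Free-tier users don't. Introduced in commit e8f21a3 (PR #38).
Affected Files: payment_validator.py, api/routes/payments.py
"""

kind, fields = parse_annotation(EXAMPLE)
affected = [path.strip() for path in fields["Affected Files"].split(",")]
```

In this sketch a repeated key overwrites the earlier value, so a block that lists the same field twice would need a list per key instead.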
Then before the builder agent writes any code, a context agent pulls institutional memory from GitHub for those same files:
FORGE:CONTEXT
Known Pitfall: Don't skip the audit log write on early returns in this module.
Prior bug #34 did exactly this and caused a compliance gap.
Historical Finding: Free-tier guard was also missing in invoices.py (#43),
check for the same pattern there.
The builder reads all of this before touching anything. No rediscovering known pitfalls. No burning tokens on investigations that already happened three months ago.
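For anyone wondering what "pulling institutional memory from GitHub" could look like mechanically, here is a rough sketch that uses the GitHub CLI to find past issues mentioning a file. This is an assumption about one way to do the lookup, not how ForgeDock does it. The function names are made up, and it needs `gh` installed, authenticated, and run from inside the repo.

```python
import json
import subprocess


def build_search_command(path, limit=20):
    """Build a `gh issue list` call that searches open and closed issues for a path."""
    return [
        "gh", "issue", "list",
        "--state", "all",            # include closed issues, where old findings live
        "--search", f'"{path}"',     # quoted so the path is searched as a phrase
        "--limit", str(limit),
        "--json", "number,title,body",
    ]


def find_prior_issues(path, limit=20):
    """Run the search and return a list of dicts (hypothetical helper)."""
    result = subprocess.run(
        build_search_command(path, limit),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


# Only the command is built here; nothing runs until find_prior_issues is called.
cmd = build_search_command("payment_validator.py")
```

Calling `find_prior_issues("payment_validator.py")` would return the matching issues, and any FORGE blocks in their bodies could then be read as prior findings for that file.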
The model routing is a big part of why this works within a normal Claude Code subscription. Opus handles the orchestration, investigation, and architecture planning, the stuff where reasoning quality actually matters. Sonnet handles the implementation where you need speed and volume. So you get good decisions on what to build paired with fast execution on the code itself. Same subscription you already have.
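As a tiny illustration of that routing, the split can be thought of as a lookup from stage to model tier. The stage keys and the `model_for` helper are hypothetical, and only the stages named above are mapped. Anything else raises rather than guessing.

```python
# Stage-to-model routing as described above. Key names are assumptions.
MODEL_FOR_STAGE = {
    "orchestrate": "opus",
    "investigate": "opus",
    "architect": "opus",
    "implement": "sonnet",
}


def model_for(stage):
    """Return the model tier for a stage, or fail loudly if none is stated."""
    try:
        return MODEL_FOR_STAGE[stage]
    except KeyError:
        raise ValueError(f"no model routing stated for stage {stage!r}") from None
```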
Sitting on top of all this is an orchestration layer that can spin up multiple agents at once in waves. Waves split the work into non-conflicting groups, so if 4 issues touch the same file it'll put them in separate waves to prevent merge conflicts. You go to Claude Code, say "orchestrate the new features milestone," walk away, come back to merged PRs with full audit trails.
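The wave idea can be sketched as a greedy grouping: each issue goes into the first wave whose files it doesn't overlap with. This is a simplified illustration, not the actual ForgeDock scheduler. The `plan_waves` name and the issue numbers are made up, and it assumes the set of files each issue will touch is known up front.

```python
def plan_waves(issue_files):
    """Group issues into waves so no two issues in a wave share a file.

    issue_files maps an issue number to the set of file paths it touches.
    """
    waves = []  # each wave tracks its issues and the union of their files
    for issue, files in issue_files.items():
        files = set(files)
        for wave in waves:
            if wave["files"].isdisjoint(files):
                wave["issues"].append(issue)
                wave["files"] |= files
                break
        else:
            # Every existing wave conflicts, so start a new one.
            waves.append({"issues": [issue], "files": set(files)})
    return [wave["issues"] for wave in waves]


# Hypothetical issues: four of them touch payment_validator.py.
issues = {
    101: {"payment_validator.py"},
    102: {"payment_validator.py", "api/routes/payments.py"},
    103: {"invoices.py"},
    104: {"payment_validator.py"},
    105: {"payment_validator.py"},
}
waves = plan_waves(issues)
# waves == [[101, 103], [102], [104], [105]]
```

The four issues that share a file end up in four separate waves, while the unrelated issue rides along in the first one.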
The full pipeline per issue is: investigate, gather context, architect, implement, quality gate (14 domains), review (9 specialized agents covering security, billing, DB, concurrency, auth, frontend, API, performance, infra), merge. If a review agent finds something it creates a new issue that flows right back into the pipeline.
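The feedback loop is easier to see as a queue. Below is a minimal sketch, assuming a hypothetical `run_stage` callback that returns any follow-up issues a stage opens. In this sketch the original issue still proceeds to merge after a review finding, which is an assumption on my part and not something stated above.

```python
from collections import deque

# Stage names taken from the pipeline description above.
STAGES = [
    "investigate",
    "gather context",
    "architect",
    "implement",
    "quality gate",
    "review",
    "merge",
]


def run_pipeline(initial_issues, run_stage):
    """Run every issue through all stages, re-queueing follow-up issues."""
    queue = deque(initial_issues)
    log = []
    while queue:
        issue = queue.popleft()
        for stage in STAGES:
            follow_ups = run_stage(issue, stage)
            log.append((issue, stage))
            # New issues from any stage go to the back of the queue.
            queue.extend(follow_ups)
    return log


def stub_stage(issue, stage):
    """Hypothetical stand-in for real agents: review of #42 opens #43."""
    if issue == 42 and stage == "review":
        return [43]
    return []


log = run_pipeline([42], stub_stage)
```

Here the review of issue 42 opens issue 43, which then goes through the same seven stages on its own.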
This thing has processed over 20,000 issues across production codebases. I'm a solo developer shipping a production application serving real clients and users, built almost entirely through this workflow. The ForgeDock repo itself has 600+ issues from dogfooding its own pipeline.
The part that keeps surprising me is the compound effect. Run 100 is just fundamentally better than run 1 because 99 runs' worth of findings, review comments, and architectural decisions are sitting in GitHub as structured data. Every future agent reads them.
npx forgedock
npx forgedock init
First command installs the pipeline. Second generates config for your repo. Then just:
/work-on #42
Full source at github.com/RapierCraftStudios/ForgeDock. AGPL-3.0. Would love feedback, especially from people running Claude Code on bigger codebases. What breaks, what's missing, what would make you actually use this.
