I have spent years practicing extreme programming and TDD. So when AI coding tools became good enough to handle a meaningful share of day-to-day work, I adopted them quickly and enthusiastically.
Then I hit a very predictable wall.
I became the bottleneck.
AI could write code quickly. It could write tests quickly too. But the final question, "is this actually correct?", still landed on me. I had to review the implementation, run the environment, click through flows in the browser, inspect application logs, check database state, and decide whether the output was real or just superficially plausible.
In other words, AI had accelerated generation, but I was still manually carrying too much of the verification burden. The faster the model became, the more manual review and QA work accumulated around me.
That was the moment I started pushing testing even further left, but this time not just in the classic TDD sense. I started pushing the entire validation loop left.
Shifting the validation loop left
I stopped waiting until after implementation to think seriously about testing.
Before writing code, I started requiring AI to produce explicit test plans. After implementation, AI was not allowed to stop at "done". It had to execute the validation plan: drive the browser, inspect logs, check database state, compare outcomes against expectations, create structured tickets for failures, fix them, and rerun the relevant checks until the results converged.
Over time, this stopped feeling like a set of prompting tricks and started feeling like a method. The method included:
- test plans before implementation
- structured docs for QA, security, and UI/UX verification
- ticket-driven repair loops
- doc governance to keep the verification layer from rotting
- reusable Skills that encode repeatable development and validation behavior
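The ticket/fix/retest part of that loop can be sketched in a few lines. This is a minimal illustration only, assuming hypothetical `Check` and `Ticket` types and a stand-in repair step; it is not the actual Orchestrator API:

```python
# Sketch of the ticket-driven repair loop: run checks, file a ticket
# per failure, apply a fix, and re-run until results converge.
# Check, Ticket, and the "fix" step are hypothetical stand-ins.
from dataclasses import dataclass


@dataclass
class Check:
    name: str


@dataclass
class Ticket:
    check: str


def run_check(check: Check, fixes_applied: set) -> bool:
    # Stand-in: a check passes once its fix has been applied.
    return check.name in fixes_applied


def validation_loop(checks, max_rounds=5):
    """Re-run all checks until green, filing a ticket for each failure."""
    fixes_applied: set = set()
    tickets: list[Ticket] = []
    for round_no in range(max_rounds):
        failures = [c for c in checks if not run_check(c, fixes_applied)]
        if not failures:
            return round_no, tickets  # converged
        for check in failures:
            tickets.append(Ticket(check=check.name))   # structured ticket
            fixes_applied.add(check.name)               # stand-in repair
    raise RuntimeError("validation did not converge")


rounds, tickets = validation_loop([Check("ui-flow"), Check("db-state")])
```

The essential property is that "done" is defined by the loop reaching a fixed point, not by the model declaring success.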
At that point I needed a real proving ground.
Why Auth9 mattered
I chose Auth9, a full identity platform, because I wanted something difficult enough to make the method fail if it were weak.
Identity systems are full of dangerous edges: protocol semantics, state transitions, interoperability, security constraints, permission models, and long tails of compatibility work. If a method can help govern that kind of system, it is probably doing something real.
Auth9 was where this approach became concrete. While building it, I kept refining the method itself: how docs should be governed, which checks needed to become standard, how to turn recurring behavior into Skills, and how to keep the ticket/fix/retest loop honest.
As the project evolved through real iterations, I became convinced that this was not just a convenient way to ship features faster. It was becoming a viable way to govern complex software over time.
That was when Agent Orchestrator began.
I did not begin with a plan to build a platform. I began with a method that was proving useful, and I no longer wanted to supervise every step manually. If the method was real, it should be able to keep running after I stepped away from the keyboard. That requirement naturally pulled me toward a control plane.
The mid-March test
By mid-February, I was already using early Orchestrator-style automation inside Auth9. In mid-March, I decided to run a high-risk experiment: I wanted to see whether Orchestrator and this method could actually carry a complex low-level refactor.
I replaced the headless Keycloak setup under Auth9 with a native auth9-oidc engine.
For an identity platform, replacing the underlying OIDC engine is not a cosmetic change. It touches protocol behavior, state flow, interoperability assumptions, and a long list of edge cases that often surface only after the "main" work seems complete.
By then, I was already using the early Orchestrator to help govern the process. It did not magically remove the difficulty, but it did provide a structure around the work: execution flow, task state, logs, tickets, and repeatable validation.
The core replacement landed over three days. OIDC conformance and Keycloak legacy cleanup followed within the same week.
More importantly, the story did not end there. The same method and Orchestrator-assisted workflow helped converge the technical debt that surfaced after the change, and eventually completed the community OIDC Certification tests on the native engine.
That sequence mattered more to me than the initial three-day number. It showed that the method and the tool were not only useful for greenfield development. They could also help govern a high-risk system through long-running, uncomfortable change.
Why the project is called Orchestrator
At the time, the word I cared most about was orchestration.
What I wanted most was a reliable way to orchestrate this method: execute the next step, keep state, record logs, preserve intermediate outputs, stop safely, and resume later. So the project became Agent Orchestrator.
The broader conceptual framing only arrived afterward.
Later, I found a better name for the category
When OpenAI later used the term Harness Engineering, I immediately recognized the shape of the work. Not because I thought I had coined the idea first, but because the term described something I had already been converging on through practice.
The point was larger than orchestration alone. What mattered was the full harness around the agent: workflow, constraints, observability, recovery, and feedback loops.
That is why I now describe Agent Orchestrator as a Harness Engineering control plane. The project name came first; the clearer positioning came later.
What Agent Orchestrator actually is
It is a local-first control plane for shell-based coding agents. Agents, workflows, and step templates are declared in YAML. A daemon (orchestratord) schedules steps, routes work by capability, keeps task state in SQLite, streams logs, and enforces guardrails such as sandboxing and output redaction. The CLI is machine-parseable so agents can drive it too.
Some design choices matter a lot here:
- local-first runtime so the control plane stays close to the repository
- SQLite-backed task state so long-running work remains inspectable
- machine-readable CLI output so agents can participate directly
- declarative YAML resources so the workflow logic lives outside one model session
- support for heterogeneous shell agents so the method does not depend on one vendor
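To give a sense of what "declarative YAML resources" means in practice, here is an illustrative sketch. Every resource kind and field name below is an assumption made for this example, not the project's actual schema; the real resource definitions live in the docs:

```yaml
# Hypothetical manifest sketch -- kinds and fields are illustrative
# assumptions, not the real Agent Orchestrator schema.
kind: agent
name: reviewer
capabilities: [qa, security]     # used for capability-based routing
command: ./run-review-agent.sh   # a shell-based agent
---
kind: workflow
name: validate-feature
steps:
  - name: test-plan              # explicit test plan before implementation
    requires: [qa]
  - name: implement
  - name: verify                 # drive the browser, inspect logs, check DB
    requires: [qa, security]
    on_failure: open-ticket      # feed failures back into the repair loop
```

The design point is that the workflow logic is a durable artifact in the repository, versioned alongside the code it governs, rather than state trapped inside one model session.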
To me, Orchestrator is not another code-generation plugin. It is the control surface that lets this method keep running.
The part I trust most
One reason I trust this framing is that the project has also been used on itself.
Self-bootstrap and self-evolution were important validation paths from early on. If the method is real, it should not only work on downstream projects. It should also survive contact with the control plane itself.
If you are also trying to turn repeated AI-assisted engineering work into something more systematic and more durable, that is exactly the gap I built Orchestrator to address.
You can try it here:
```shell
brew install c9r-io/tap/orchestrator
# or
cargo install orchestrator-cli orchestratord

orchestratord --foreground --workers 2 &
orchestrator init
orchestrator apply -f manifest.yaml
orchestrator task create --goal "My first task"
```
- Docs: docs.c9r.io
- GitHub: github.com/c9r-io/orchestrator
- Auth9: github.com/c9r-io/auth9
- License: MIT
It is open source, still evolving, and I would genuinely like to hear how other people are turning repeated AI-assisted engineering work into something that can survive real software delivery.