A few months ago, my experience with AI assistance was limited to tab completions in Cursor. The kind where you type a function name and the AI guesses the next line. Useful, but not exactly transformative.
Then I got the chance to try something bigger: build an entire CLI project from scratch, letting an AI agent write all the code and documentation, with me supervising.
The project is kdn, a tool for orchestrating AI agents in sandboxed environments. Building a tool for AI agents, using an AI agent. The most direct form of dogfooding I could think of.
Day 1
I started by asking the agent to create a Go CLI using the Cobra framework. The first real commit was already a complete skeleton: root command, version command, module initialization, .gitignore.
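If you've never used Cobra, that skeleton is only a few dozen lines. A minimal sketch of the shape (illustrative; not kdn's actual code):

```go
package main

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
)

func main() {
	rootCmd := &cobra.Command{
		Use:   "kdn",
		Short: "Orchestrate AI agents in sandboxed environments",
	}
	rootCmd.AddCommand(&cobra.Command{
		Use:   "version",
		Short: "Print the version",
		Run: func(cmd *cobra.Command, args []string) {
			// The real tree keeps the version constant in its own package.
			fmt.Println("kdn v0.0.0-dev")
		},
	})
	if err := rootCmd.Execute(); err != nil {
		os.Exit(1)
	}
}
```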
But the architecture wasn't right yet.
Over the next few commits, the agent refactored. main.go moved to cmd/kdn/. The root command moved to pkg/cmd/. The version constant moved to its own package. Each refactoring was small, supervised, and intentional.
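The layout it converged on is the conventional Go project shape, roughly (exact filenames are my guess):

```
kdn/
├── cmd/kdn/main.go          # thin entry point
├── pkg/cmd/root.go          # root and subcommand wiring
└── pkg/version/version.go   # version constant in its own package
```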
What surprised me: the agent didn't just write code. It knew how to structure a Go project. I just had to ask, very clearly.
The agent added unit tests. It added a README, an AGENTS.md file, a Makefile, a dependabot configuration. It added copyright headers. Not because I planned these steps in advance, but because I asked, one thing at a time.
At the end of the day, there were no issues, just PRs. A stream of incremental changes. I was finding my footing: learning how to collaborate with an agent, how much to let it do, when to push back.
Week 1
Over the next few days, I started the real work: having the agent implement the creation and management of instances, and design a runtime abstraction with a fake implementation.
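To give an idea of what that abstraction looks like, here is a sketch of a runtime interface plus a fake, in-memory implementation. The names are invented; kdn's actual interface is richer:

```go
package runtime

import "context"

// Runtime abstracts where an instance actually runs, so the
// CLI never depends on a concrete backend.
type Runtime interface {
	Create(ctx context.Context, name string) error
	Destroy(ctx context.Context, name string) error
	List(ctx context.Context) ([]string, error)
}

// Fake is an in-memory Runtime for tests and early development,
// before any real backend exists.
type Fake struct {
	instances map[string]struct{}
}

func NewFake() *Fake {
	return &Fake{instances: make(map[string]struct{})}
}

func (f *Fake) Create(_ context.Context, name string) error {
	f.instances[name] = struct{}{}
	return nil
}

func (f *Fake) Destroy(_ context.Context, name string) error {
	delete(f.instances, name)
	return nil
}

func (f *Fake) List(_ context.Context) ([]string, error) {
	names := make([]string, 0, len(f.instances))
	for n := range f.instances {
		names = append(names, n)
	}
	return names, nil
}
```

The fake let the CLI and its tests work end to end before any real backend existed.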
The agent wrote tests alongside the code without being asked. Documentation was different: at the end of every coding session, I had to explicitly ask it to update the README and the AGENTS.md. If I didn't, it wouldn't. I learned to always ask at the end of the session, while the agent still had the full context of what it had just built.
Later, when new contributors joined the project, I noticed the same pattern. Some of their PRs arrived without documentation. I ended up adding a complete-pr skill: a checklist the agent could follow to make sure every PR included the necessary doc updates.
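A skill is just a short instruction file. The checklist reads something like this (paraphrased; not the actual file):

```markdown
# complete-pr

Before marking a PR as ready:

- [ ] Unit tests cover the new behavior
- [ ] README updated for any user-facing change
- [ ] AGENTS.md or the relevant skill updated for any new convention
- [ ] Commit message explains the why, not just the what
```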
The agent was impressively capable at this kind of work, as if it had built hundreds of instance managers before. I barely had to push back on algorithmic or architectural choices.
But I had to help the agent choose better design patterns for the code. During the first iterations, I realized that some tests were missing because the agent had built packages around concrete types that could not be mocked, which made unit tests impossible. I helped it introduce an interface-based design with dependency injection so that packages could be mocked in unit tests, and had it document the pattern in the AGENTS.md file to make sure it followed it for every new package.
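In Go, the pattern is small: declare the interface where it is consumed, and inject the implementation through the constructor. A sketch with invented names:

```go
package instance

import "context"

// Runtime is the dependency the manager needs. Declaring the
// interface on the consumer side keeps packages decoupled.
type Runtime interface {
	Create(ctx context.Context, name string) error
}

// Manager receives its Runtime through the constructor
// (dependency injection) instead of instantiating one itself.
type Manager struct {
	rt Runtime
}

func NewManager(rt Runtime) *Manager {
	return &Manager{rt: rt}
}

func (m *Manager) Start(ctx context.Context, name string) error {
	return m.rt.Create(ctx, name)
}
```

In unit tests, Manager gets a fake or a mock; in production, it gets the real backend.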
In particular, I realized the agent was very easily influenced: sometimes, merely asking why it had chosen a given solution was enough to make it swap in a different one, without explaining the change. And because I was relying on it to propose good solutions (it does), I did my best not to steer it through my prompts. I gave only functional specifications and asked it to write the code; if I wasn't happy with the solution, I asked it to propose alternatives; and at the end, I asked it to document the chosen solution in the skills or the AGENTS.md file, so it would remember it.
Month 1
With the foundation in place, the agent was ready to start iterating on more complex, more specific work. I asked it to implement a workspace runtime based on containers using Podman, and a multi-level configuration system for the workspaces.
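The configuration side is the classic precedence merge: workspace settings override user settings, which override built-in defaults. A minimal sketch, with invented field names:

```go
package config

// Config holds the settings a workspace runs with.
// Both fields are illustrative.
type Config struct {
	Image   string // container image for the workspace
	Network bool   // whether the workspace gets network access
}

// Merge applies overlay on top of base: fields explicitly set
// in the overlay win, everything else falls through.
func Merge(base, overlay Config) Config {
	out := base
	if overlay.Image != "" {
		out.Image = overlay.Image
	}
	// A real implementation would use *bool to distinguish
	// "unset" from "false"; simplified here.
	if overlay.Network {
		out.Network = true
	}
	return out
}

// Resolve builds the effective config: built-in defaults,
// then the user-level file, then the workspace-level file.
func Resolve(defaults, user, workspace Config) Config {
	return Merge(Merge(defaults, user), workspace)
}
```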
I wanted to make the agent more independent, so I changed my workflow. I started writing GitHub issues describing what I wanted, then launching the agent with: "work on issue #xxx".
The agent would read the issue, read the codebase, write the code, write tests, write user and agent documentation, and write the commit message. I reviewed, requested changes, and pushed the PR.
For bigger features like the Podman runtime implementation or the multi-level configuration system, I added a step before implementation.
I started with: "work on issue #xxx and start by writing a plan".
The agent produced a structured plan: which files to create, which interfaces to define, which edge cases to handle. I read it, adjusted it, copied it into the PR description. Then I let the agent continue with the implementation.
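A plan for a feature like the Podman runtime would look something like this (illustrative; not an actual plan from the repo):

```markdown
# Plan: Podman workspace runtime

1. Files: pkg/runtime/podman/ with the backend implementation
2. Interfaces: implement the existing Runtime interface
3. Edge cases: podman binary missing, container name collisions,
   leftover containers from a previous run
4. Tests: unit tests against the fake; integration tests gated
   behind a build flag
```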
This two-phase approach changed the quality of the output. The agent would catch design issues before writing a single line of code. And I had a shared understanding of what we were building before we built it.
One more reviewer was in the loop: CodeRabbitAI, an AI that automatically reviews every PR on GitHub. After the agent opened a PR, I would sometimes ask it to address the automated review: "check the reviews done in PR #xxx". The agent would read CodeRabbit's comments and push fixes. Two AIs, reviewing each other's work.
The agent iterated quite comfortably, adding new features to the CLI, documenting every new convention, architectural decision, and pattern in the AGENTS.md file, and every new user-facing feature in the README.
At some point the AGENTS.md file became too large. So I asked the agent to split the file: extract the focused topics into individual skills, and keep AGENTS.md as the entry point.
Same for user documentation. The README contained all of it, but had grown into one very long file. I asked the agent to add a new CI job to publish the README as a multi-page website by splitting the source file on its headings. This way, the agent could keep working on a single README file, while users got multi-page documentation.
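The splitting itself is simple enough to sketch. Something along these lines could run in CI (illustrative; not the project's actual job):

```go
// splitreadme cuts README.md into one page per top-level
// "## " heading, for publishing as a multi-page site.
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	src, err := os.ReadFile("README.md")
	if err != nil {
		panic(err)
	}
	if err := os.MkdirAll("docs", 0o755); err != nil {
		panic(err)
	}
	pages := strings.Split(string(src), "\n## ")
	for i, page := range pages {
		if i > 0 {
			page = "## " + page // restore the heading eaten by Split
		}
		name := fmt.Sprintf("docs/page-%02d.md", i)
		if err := os.WriteFile(name, []byte(page), 0o644); err != nil {
			panic(err)
		}
	}
}
```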
Dogfooding
From the very beginning, I was running the agent inside a container. Not using kdn, which didn't exist yet, but using an existing project: claude-container. It handled the container setup so the agent could work in an isolated environment.
That experience was the direct inspiration for the Podman runtime implementation in kdn.
There was something satisfying about this: a tool for managing AI agents, built by an AI agent running in a container, whose container runtime was inspired by another container tool used to build it.
My role throughout was not to write code. It was to have ideas, write issues, review work, be the first user of the tool, and push back when something was wrong. A different kind of engineering, but engineering nonetheless.