Or: spec-driven AI-assisted development for solo developers.
I did not set out to invent a software development workflow. I was trying to build a small local app with AI help, learn as I went, and avoid turning the project into a mysterious pile of files that technically ran but could no longer be explained by anyone involved.
AI coding tools are powerful. They can generate code quickly, refactor without getting bored, write boilerplate, spot patterns, and produce useful documentation. They can also confidently make changes that are almost right, broaden the scope, fix things you did not ask them to fix, or “helpfully” redesign a part of the system you were trying to keep boring.
For a solo developer, that creates a practical problem. The AI can move faster than your ability to keep the project organized. At first, this feels like productivity. A few sessions later, you may be staring at a codebase wondering when the raccoon got into the wiring.
The issue is not that the AI is bad. The issue is that AI-assisted development needs structure. Not heavy enterprise-process structure. Just enough structure to separate thinking from building, and building from review.
Spec-driven does not mean big upfront design. It means writing just enough specification before spending coding-agent credits.
The casual name is the TB / CC / FR loop:
- TB: Task Brief
- CC: Codex Contract
- FR: Final Review
After introducing those names, I just call it the loop.
In my workflow, ChatGPT is the assistant: planner, tutor, reviewer, and sometimes patient explainer of software engineering terms I probably should have known already. Codex is the coding agent: the builder that gets the final bounded contract.
That separation matters. The assistant can help me think through the mess. Codex should get the cleaned-up version.
My stack
This workflow did not come from an expensive or exotic setup.
My stack is practical:
- Windows as my main environment
- ChatGPT Plus in the browser for planning, review, and architectural discussion
- Codex through its desktop client as the coding agent
- PowerShell for running the project and checking output
- Visual Studio Code with the codebase open beside everything
- LM Studio running a local LLM so I can test my application layer without needing a paid API key for every experiment
LM Studio is useful because I am building an app that talks to a language model. I need to test that loop often. Running a local model lets me test rough application behavior without worrying that every broken model response or half-working route is costing money through an API.
Codex credits matter too. Burning through them because I handed the coding agent a vague contract is not just wasteful; eventually, you run out. That is one reason I started separating architectural discussion from implementation. ChatGPT can help me think, review, and refine the task. Codex should get the bounded contract that is actually ready to build.
This is not a luxury AI engineering stack. It is one paid subscription, local tooling where possible, and a workflow that tries to make the most of both.
Why solo AI coding gets messy
When I started using AI to help build my project, the temptation was simple: explain what I wanted, let the coding agent make changes, test the result, then continue.
That works for tiny changes. It does not scale well once the project starts developing architecture.
Software projects are full of invisible context:
- This part is temporary.
- This ugly thing is legacy, but still required.
- This file should not become a dumping ground.
- This route must preserve its API contract.
- This data shape is intentionally minimal.
- This feature sounds useful, but it belongs later.
- This bug should be fixed, but not inside this task.
A human developer on a team may know this from conversations, code reviews, planning meetings, or institutional memory. A coding agent does not have stable shared context unless you give it one.
But giving it everything creates a different problem. The agent may grab onto the wrong idea from earlier discussion and implement something half-decided, future-facing, or experimental.
So the trick is not “give the AI everything.”
The trick is to give the AI the right bounded slice.
The loop
The loop separates the work into three stages.
A TB, or Task Brief, is where I think through the problem. This is the design space. What are we trying to do? Why does it matter? What are the risks? What does this touch? What should we avoid? Are we building the real system, or just an extension point for the future?
This is the place for discussion, uncertainty, and tradeoffs.
A CC, or Codex Contract, is the reviewed instruction set that Codex receives.
This is not the place for brainstorming. This is the place for instructions.
An FR, or Final Review, happens after Codex reports back. This is not the same kind of review as the CC review. The CC is rigorously reviewed before Codex builds. The FR happens after Codex is done: I manually test the feature, inspect the result, and have the assistant check Codex’s return against the contract.
This is where the human stays in charge.
The loop is:
Think through the task → write the CC → review the CC → let Codex build → manually test and review Codex’s return → patch if needed → move to the next slice.
That may sound formal, but in practice it is lightweight. It keeps the project from becoming a long improvisation session where every step technically makes sense but the whole thing slowly loses shape.
The CC is the contract
The CC is the heart of the workflow.
A good CC is not just a set of instructions. It is a small implementation contract.
It should include:
- the goal
- current context
- known legacy behavior
- non-goals
- likely files involved
- rules that must not be broken
- acceptance criteria
- verification steps
- final report requirements
The non-goals are especially important.
With AI coding, “do not do this” is often as valuable as “do this.” If you do not explicitly say “do not refactor this legacy route,” there is a real chance the agent will decide the route looks messy and try to improve it. Sometimes it will even be right that the code is messy. That does not mean it belongs in the current task.
A CC gives the coding agent a box to work inside.
The box is not there because the agent is weak. The box is there because the agent is strong enough to cause damage quickly.
Review the CC before handing it off
One part of the workflow that matters more than I expected is reviewing the CC before giving it to Codex.
At first, I thought of the CC as something I wrote and then sent. Now I treat it like a small design artifact that deserves review before it becomes implementation fuel.
My process is simple. I read the CC myself while also having the assistant review it. I look for context I do not want Codex to act on. If I see future ideas, speculative comments, or background discussion that should not become code, I remove it or mark it clearly as out of scope.
At the same time, I ask the assistant to look for ambiguity, missing constraints, contradictions, and places where Codex might reasonably misunderstand the task. If the review brings up a good point, I patch the CC before committing it to Codex.
Sometimes this takes one review. Most of the time it takes two. For larger architectural tasks, it may take four or five passes.
That is not wasted time. It is much cheaper to patch a contract than to untangle an implementation that faithfully followed a bad one.
The question is not only: does this contract describe what I want?
It is also: does this contract contain anything I do not want Codex to treat as permission?
That second question matters a lot.
AI coding agents are good at picking up intent, but they are not always good at knowing which parts of a conversation were final, deferred, speculative, rejected, or just me thinking out loud. Reviewing the CC creates a checkpoint between design discussion and implementation.
In normal software development, code review happens after someone writes code. In this workflow, contract review happens before the AI writes code. It does not replace implementation review, but it prevents a surprising number of mistakes from being born in the first place.
Keep the coding agent’s context clean
The coding agent does not need the entire conversation history.
That is intentional.
The broad architectural discussion happens elsewhere. The CC is the distilled result. It gives Codex just enough context to implement the task without dragging in every idea, doubt, future feature, and half-joking comment that came up during planning.
This separation matters because solo developers often think through projects conversationally. That is useful. It is also messy. A conversation can contain today’s plan, tomorrow’s idea, a rejected approach, a metaphor, and a possible future feature that should not be touched for three months.
You do not want all of that in the implementation context.
Instead, Codex should receive something closer to:
Here is the task. Here is the boundary. Here are the contracts. Here is what success means. Here is what you must not touch. Report back when done.
That keeps Codex in executor mode rather than co-architect-with-selective-memory mode.
Documentation is project memory
This workflow depends on good documentation, but not the ceremonial kind.
I am not talking about writing a polished manual nobody reads. I mean living project documents that help make the next decision.
This may be the most important part of the method: with the right CC, the documentation starts to write itself.
I do not manually write most of my documentation. The CC tells Codex which documentation files to update and what kind of change belongs where.
In my project, that usually means:
DEV_ARCHITECTURE.mdDEV_DATA_SHAPES.mdDEV_API_CONTRACTS.mdTECHNICAL_DEBT.md
If we discuss architecture during the TB, the CC tells Codex to update DEV_ARCHITECTURE.md. If the implementation changes an API response, the CC tells it to update DEV_API_CONTRACTS.md. If a data shape changes, it belongs in DEV_DATA_SHAPES.md. If something is intentionally deferred, it goes into TECHNICAL_DEBT.md.
I do not usually review the documentation line by line myself. The Final Review is based on Codex’s return: what files it changed, what behavior it implemented, what it says it verified, and what documentation it reports updating.
Most of the time, I have the assistant review that return and check whether the reported documentation updates match the CC and the implementation summary. If something is missing, too broad, or in the wrong document, I ask for a patch and send that back through the loop.
So the workflow is not:
Write code, then someday write docs.
It is closer to:
Design the slice, build the slice, have Codex update the docs, review Codex’s return, patch if needed, repeat.
These documents are not just for future users. They are for me, for the AI assistant, and for the project itself.
For me, they reduce the amount I have to keep in my head. I can check what the system currently promises, what is legacy, what is intentionally deferred, and what should not be touched yet.
For the AI assistant, they provide current project memory. I can feed the docs into ChatGPT and it can reason from the latest architecture instead of guessing from stale conversation history. That makes the next TB or CC sharper because the assistant is working from documented project truth, not vibes.
For the project, the docs act like a stabilizer. They guide contracts, preserve decisions, and make technical debt explicit. TECHNICAL_DEBT.md is especially useful. It lets me say, “Yes, this is ugly. No, we are not fixing it in this task.”
The docs also solve a practical problem with long AI chats. Eventually, a long conversation can become slow, heavy, or awkward to continue. If the project memory only lives in that chat, starting fresh means losing context. But if the docs are up to date, I can open a fresh chat, upload the docs, add a short context packet, and continue from a clean starting point.
That turns documentation into long-term memory.
Good docs turn the project from a memory test into a map. A sharper map makes the CC sharper. A sharper CC makes the implementation safer. A safer implementation makes the docs easier to update.
That is the loop beginning to reinforce itself.
Use the assistant to learn, not just brainstorm
The assistant is not only there to brainstorm features or polish contracts. For me, it is also part of the learning process.
I am not approaching the project as someone who already knows every software engineering pattern by name. A lot of the value comes from asking what something is called, why a design feels risky, what tradeoff I am making, or how an experienced engineer would describe the problem.
That turns the assistant into more than a planning partner. It becomes a tutor, translator, and reviewer.
Sometimes I ask for the industry term. Sometimes I ask for the simple version. Sometimes I ask for the “software engineer review” of an idea that feels right but may have hidden problems. That is how vague instincts become clearer concepts.
There is one trick I found useful: sometimes you have to be strict with your assistant.
If the answer starts drifting, getting too soft, or turning into pleasant brainstorming when I need implementation discipline, I say something like:
“Review this like a software engineer.”
Or:
“Write the CC like a software engineer.”
That kind of instruction helps reset the tone. It tells the assistant to stop being encouraging for a moment and start looking for ambiguity, missing constraints, scope creep, and failure modes.
That is important for a beginner mindset. Beginner-friendly does not mean vague. It means clear enough that you can learn without pretending you already know everything.
A small example
Imagine you have a feature that is currently hardcoded in a messy legacy file. You know it needs to become a clean, reusable system eventually. The tempting AI instruction is:
“Please refactor this feature into a better system.”
That is dangerous. “Better” can mean anything.
A TB would first clarify the actual goal:
“We are not building the full system yet. We are creating a clean v0 seam. The old legacy data must remain. The new code should read clean records, skip legacy records, and expose a separate API field. No migration. No deletion. No broad refactor.”
Then the CC would translate that into implementation instructions:
“Create a controller for the clean v0 records. Preserve the old API field. Add a new clean projection. Do not rewrite existing files. Do not change JSON loading behavior. Do not refactor unrelated legacy routes. Add docs. Report changed files, validation rules, and risks.”
Before sending that CC, I would review it. If it included a paragraph about future advanced filtering, and I did not want that implemented yet, I would remove it or put it under non-goals. If the assistant pointed out that “clean projection” could be confused with “LLM context,” I would patch the CC to define those as separate outputs.
Only then would I hand it to Codex.
The result is not glamorous, but it is safe. And safe is underrated when you are building with tools that can rewrite half your project before lunch.
Why this helps solo developers
Solo developers have an unusual burden. You are the product owner, architect, developer, tester, reviewer, and documentation department. Now, with AI, you may also be managing one or more very fast assistants.
The loop helps because it creates a small team structure even when you are working alone:
- You, during TB: architect
- You, reviewing CC: editor and risk filter
- ChatGPT: tutor and reviewer
- Codex: implementer
- Docs: shared memory
- You, during FR: tester and reviewer
That separation makes the work feel less like constantly reacting and more like moving through deliberate slices.
When you are learning, you do not always know the perfect architecture upfront. That is fine. The point is not to be a senior engineer cosplaying as a process consultant. The point is to slow the project down just enough that you can understand what changed and why.
A beginner-friendly workflow should not require pretending to know everything. It should make uncertainty manageable.
The real benefit
The real benefit of this workflow is not that it makes AI coding perfect. It does not.
The benefit is that it makes AI coding reviewable.
You can compare the implementation against the CC. You can test the feature manually. You can have the assistant review Codex’s return. You can update the docs. You can reject a change for violating scope even if the code works. You can preserve decisions instead of rediscovering them three sessions later.
It also makes the contract itself reviewable. That is easy to overlook. A clean CC is not just better writing; it is risk reduction.
Without structure, AI-assisted development can feel like accelerating into fog.
With lightweight specs, it feels more like driving with a map. You may still take wrong turns, but at least you know which road you meant to be on.
Closing thought
Vibe coding done right is not about making solo projects more bureaucratic. It is about making them less fragile.
The loop gives the work a rhythm:
Think clearly. Write the contract. Review the contract. Build narrowly. Test manually. Review honestly. Document what changed. Repeat.
That rhythm keeps the human in charge without wasting the AI’s strengths.
The coding agent can move fast. The docs can remember. The CC can define the box. The assistant can help you learn why the box should look that way. The review can catch ambiguity before it becomes code. The FR can protect the project from “almost right.”
And the solo developer can keep building without needing to hold the entire system in their head at once.


Top comments (0)