I’m building a local-first multi-agent workflow for AI-assisted coding
Over the last few days, I have been working heavily on an open-source project called Codex Engineering Workflow Pack (CEWP).
GitHub: https://github.com/SetraTheXX/Codex-Engineering-Workflow-Pack
npm: https://www.npmjs.com/package/@setrathex/codex-engineering-workflow-pack
It started as a simple idea:
What if Codex had a reusable engineering workflow instead of starting from scratch every time?
At first, the project was only a skill pack. The goal was to give Codex structured workflows for things like:
- writing PRDs
- slicing issues
- debugging
- TDD
- handoffs
- architecture review
- codebase analysis
That was useful, but after using AI coding tools more seriously, I started seeing a bigger problem.
AI coding often becomes messy when the workflow is not structured.
A model can generate code quickly, but the process around it is usually unclear:
- What exactly is the task?
- Which files is the AI allowed to edit?
- How do we isolate work?
- How do we review the output?
- What happens if two agents edit overlapping files?
- How do we prevent accidental changes outside the task scope?
- How do we keep an audit trail?
That is the problem I wanted to solve.
The goal
My goal with CEWP is not simply to make AI write more code.
The goal is to make AI-assisted development more structured, auditable, and closer to a real engineering workflow.
Instead of one AI model randomly editing the repo, I want a workflow where:
- A manager role plans the work.
- Worker roles implement isolated tasks.
- A reviewer role checks the outputs.
- File scopes are enforced.
- Dangerous boundaries are guarded.
- The user gets a final report before critical actions.
In other words:
AI should not just “code”. It should work inside a controlled engineering system.
What CEWP became in v0.2
With v0.2.0-beta.1, CEWP is no longer just a skill pack.
It now includes a local-first workflow runtime built around the cewp CLI.
Some of the current pieces are:
- Codex skill pack
- cewp CLI
- Coordinator Mode runtime
- Git worktree isolation
- worker / reviewer roles
- dispatch planning
- guarded execution
- sequential and parallel workers
- reviewer gates
- finalize / cleanup / prune helpers
- operator policy modes
- harness smoke tests
The project is still in beta, but it now behaves much more like a developer tooling product than a collection of prompt files.
Why local-first?
I wanted CEWP to be local-first because AI coding workflows usually happen inside a real repository.
The repo already contains the important context:
- source code
- README files
- docs
- roadmap files
- PRDs
- issues
- tests
- local scripts
- Git history
So CEWP stores its runtime state inside the repo under .cewp/.
A run has its own folder:
.cewp/runs/<run-id>/
That run can contain things like:
- run metadata
- board/task state
- prompts
- reports
- review packets
- adapter output
- event logs
This makes the workflow visible and inspectable. The user can see what happened instead of treating the AI as a black box.
Worktree isolation
One of the most important parts of CEWP is Git worktree isolation.
When workers run, they do not all edit the same working directory.
Each worker can get its own Git worktree.
That matters because parallel AI work can become dangerous very quickly if multiple agents edit the same files in the same directory.
With separate worktrees, each worker has a separate workspace. This makes it easier to inspect, collect, and validate changes.
The basic idea is:
worker-a -> separate worktree -> task A
worker-b -> separate worktree -> task B
reviewer -> checks collected output
This is the foundation for controlled parallel work.
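As a rough illustration of the layout above, here is a minimal Python sketch that builds one `git worktree add` command per worker. It is not CEWP's actual implementation: the directory layout under `.cewp/worktrees/`, the branch naming, and the function name are assumptions made for this example. Only `git worktree add -b <branch> <path> <commit>` itself is standard Git.

```python
from pathlib import PurePosixPath


def worktree_add_command(run_id: str, worker: str, base_commit: str) -> list[str]:
    """Build the git command that gives one worker its own worktree and branch."""
    # Hypothetical layout and branch naming, chosen only for this example.
    path = PurePosixPath(".cewp") / "worktrees" / run_id / worker
    branch = f"cewp/{run_id}/{worker}"
    # `git worktree add -b <branch> <path> <commit>` creates a new branch at
    # <commit> and checks it out into a separate working directory.
    return ["git", "worktree", "add", "-b", branch, str(path), base_commit]


if __name__ == "__main__":
    for worker in ("worker-a", "worker-b"):
        print(" ".join(worktree_add_command("run-001", worker, "HEAD")))
```

Because each worker gets its own branch and directory, its changes can later be diffed against the base commit without touching the other worker's files.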
Parallel workers
The part I spent the most effort on recently was the parallel worker system.
The idea is simple:
If two tasks are independent, two workers should be able to work at the same time.
But “parallel AI agents” only make sense if the system checks the risk first.
Before parallel execution, CEWP checks things like:
- Are the workers using separate worktrees?
- Are their file scopes overlapping?
- Are allowedFiles defined?
- Are forbiddenFiles respected?
- Are output paths separate?
- Are worktree paths safe?
- Will one worker accidentally touch the other worker’s scope?
For example, these should be treated as overlapping:
docs/**
docs/install.md
Because docs/install.md is inside docs/**.
CEWP now detects these cases and blocks unsafe parallel execution.
The goal is not “run everything at once”.
The goal is:
Run things in parallel only when the workflow can prove that the tasks are isolated enough.
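One conservative way to detect the `docs/**` versus `docs/install.md` case is to compare the literal prefix of each pattern, meaning the part before its first wildcard. The Python sketch below assumes that approach; the function names are hypothetical and CEWP's real detection logic may differ. It deliberately errs toward reporting an overlap when it cannot be sure.

```python
import re
from itertools import product

_WILDCARD = re.compile(r"[*?\[]")


def literal_prefix(pattern: str) -> str:
    """Return the part of a glob pattern before its first wildcard character."""
    match = _WILDCARD.search(pattern)
    return pattern if match is None else pattern[: match.start()]


def patterns_may_overlap(a: str, b: str) -> bool:
    """Conservatively decide whether two scope patterns could match the same file."""
    pa, pb = literal_prefix(a), literal_prefix(b)
    a_literal, b_literal = pa == a, pb == b
    if a_literal and b_literal:
        return a == b                 # two plain paths overlap only if identical
    if a_literal:
        return a.startswith(pb)       # plain path sits under the glob's prefix
    if b_literal:
        return b.startswith(pa)
    # Two globs: treat nested or equal prefixes as overlapping (may over-report).
    return pa.startswith(pb) or pb.startswith(pa)


def scopes_overlap(scope_a: list[str], scope_b: list[str]) -> list[tuple[str, str]]:
    """Return every pair of patterns from the two scopes that may overlap."""
    return [(a, b) for a, b in product(scope_a, scope_b) if patterns_may_overlap(a, b)]


if __name__ == "__main__":
    print(scopes_overlap(["README.md", "docs/**"], ["docs/install.md", "src/**"]))
    # prints [('docs/**', 'docs/install.md')]
```

A false positive here only costs parallelism (the tasks run sequentially instead), while a false negative could let two workers edit the same file, so over-reporting is the safer default.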
File scope guardrails
AI coding tools can accidentally edit files outside the intended task.
So CEWP uses task-level file scopes.
A worker task can define:
{
  "allowedFiles": ["README.md", "docs/install.md"],
  "forbiddenFiles": ["package.json"]
}
In v0.2.0-beta.1, real worker execution now requires explicit non-empty allowedFiles.
That means a worker cannot run with an empty scope and freely edit the repo.
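A scope check of this kind can be sketched in a few lines of Python. This is an illustration under assumptions, not CEWP's code: the function name is hypothetical, and it uses `fnmatch`-style matching, where `*` also matches `/`, which may not match the glob semantics CEWP actually uses.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Sequence


def scope_violations(
    changed_files: Iterable[str],
    allowed_files: Sequence[str],
    forbidden_files: Sequence[str] = (),
) -> list[str]:
    """Return the changed paths that fall outside the task's file scope."""
    if not allowed_files:
        # Mirrors the rule that real worker execution needs a non-empty scope.
        raise ValueError("allowedFiles must be a non-empty list")

    def matches(path: str, patterns: Sequence[str]) -> bool:
        return any(fnmatchcase(path, pattern) for pattern in patterns)

    # A path is a violation if it is explicitly forbidden or not explicitly allowed.
    return [
        path
        for path in changed_files
        if matches(path, forbidden_files) or not matches(path, allowed_files)
    ]


if __name__ == "__main__":
    task = {
        "allowedFiles": ["README.md", "docs/install.md"],
        "forbiddenFiles": ["package.json"],
    }
    changed = ["README.md", "package.json", "src/index.js"]
    print(scope_violations(changed, task["allowedFiles"], task["forbiddenFiles"]))
    # prints ['package.json', 'src/index.js']
```

Checking forbidden patterns first means a file stays blocked even if a broad allowed pattern would otherwise cover it.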
CEWP also checks both:
- uncommitted changes
- committed branch changes
This part is important.
At one point, an automated review pointed out a real problem: if a worker committed a file before returning, git status could look clean, and the scope check might miss the change.
That was fixed by recording a baseCommit and checking committed changes since that base.
So the worker cannot bypass scope checks just by committing its changes.
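The idea of combining committed and uncommitted changes can be sketched with two standard Git commands: `git diff --name-only <base>..HEAD` for commits made since the base, and `git status --porcelain` for the working tree. The Python below is a simplified illustration with hypothetical function names, not CEWP's implementation. It does not handle quoted paths with special characters.

```python
import subprocess


def parse_porcelain(status_output: str) -> set[str]:
    """Extract paths from `git status --porcelain` (v1) output."""
    paths: set[str] = set()
    for line in status_output.splitlines():
        if len(line) < 4:
            continue
        path = line[3:]  # skip the two status letters and the separating space
        if " -> " in path:
            # Renames are reported as "old -> new"; both sides count as touched.
            paths.update(path.split(" -> ", 1))
        else:
            paths.add(path)
    return paths


def changed_since(base_commit: str, worktree: str) -> set[str]:
    """Return every path changed in `worktree` since `base_commit`."""

    def git(*args: str) -> str:
        result = subprocess.run(
            ["git", *args], cwd=worktree, check=True, capture_output=True, text=True
        )
        return result.stdout

    # Commits the worker made on its branch after the recorded base.
    committed = set(git("diff", "--name-only", f"{base_commit}..HEAD").splitlines())
    # Staged, unstaged, and untracked files still sitting in the working tree.
    uncommitted = parse_porcelain(git("status", "--porcelain", "--untracked-files=all"))
    return committed | uncommitted
```

The union of both sets is what a scope check would then validate, so a clean `git status` alone is never treated as proof that nothing changed.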
Reviewer gate
The workflow does not end after workers finish.
CEWP collects worker reports and creates a review packet.
The reviewer then checks the output and writes a decision.
Finalize requires:
Decision: PASS
This means finalize is not just “whatever the worker did is accepted”.
There is a separate review gate.
This is important because AI-generated changes should be inspected before becoming the final run state.
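A gate like this can be as small as a strict parse of the reviewer's decision line. The sketch below is an assumption about how such a check might look, not CEWP's parser: the function name is hypothetical, and treating a missing or repeated decision line as a failure is a design choice made for this example.

```python
import re

# Matches a line of the form "Decision: <WORD>" on its own line.
_DECISION = re.compile(r"^Decision:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def can_finalize(review_text: str) -> bool:
    """Allow finalize only if the review contains exactly one decision, and it is PASS."""
    decisions = _DECISION.findall(review_text)
    # Zero decisions (reviewer never ran) or conflicting decisions both block finalize.
    return len(decisions) == 1 and decisions[0] == "PASS"


if __name__ == "__main__":
    print(can_finalize("Summary: docs updated\nDecision: PASS\n"))  # prints True
    print(can_finalize("Summary: scope violation found\n"))         # prints False
```

Failing closed on ambiguous input keeps the gate meaningful: finalize proceeds only on an explicit, unambiguous PASS.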
Operator policy modes
Another thing I wanted was a permission model.
Some users want a safe, step-by-step flow.
Some users want to give more authority to the tool.
I personally often use AI coding tools with a lot of local permission, but only when the task is clear and the repository boundaries are well defined.
So CEWP now has policy modes:
safe
trusted
full-authority
The default is safe.
In safe mode, high-impact actions are blocked.
In full-authority mode, CEWP can run local workflow actions with fewer pauses, such as:
- workers
- reviewer
- pipeline
- finalize
- cleanup
- prune deletion
But full-authority does not disable the guardrails.
Even in full-authority mode:
- allowedFiles still matters
- forbiddenFiles still matters
- worktree isolation still matters
- reviewer gates still matter
- target worktree safety still matters
- no automatic push / publish / release happens
That distinction is important.
Full authority means:
The user trusts CEWP to run the local workflow.
It does not mean:
The AI can do anything with the repository.
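The distinction can be expressed as a small policy check. The Python sketch below is illustrative only. The action names are hypothetical labels for the items listed above, and because the exact permissions of trusted mode are not spelled out here, the sketch leaves them as a caller-supplied set rather than guessing.

```python
from typing import AbstractSet

# Assumption: the local actions listed for full-authority are the "high-impact" ones.
HIGH_IMPACT = frozenset(
    {"workers", "reviewer", "pipeline", "finalize", "cleanup", "prune-delete"}
)
# Never run automatically, regardless of policy mode.
NEVER_AUTOMATIC = frozenset({"push", "publish", "release"})


def is_allowed(
    mode: str, action: str, trusted_actions: AbstractSet[str] = frozenset()
) -> bool:
    """Decide whether `action` may run without a pause under the given policy mode."""
    if action in NEVER_AUTOMATIC:
        return False                      # hard rule, even in full-authority
    if action not in HIGH_IMPACT:
        return True                       # low-impact actions are always fine
    if mode == "full-authority":
        return True
    if mode == "trusted":
        return action in trusted_actions  # placeholder: supplied by the caller
    if mode == "safe":
        return False
    raise ValueError(f"unknown policy mode: {mode}")


if __name__ == "__main__":
    print(is_allowed("safe", "finalize"))            # prints False
    print(is_allowed("full-authority", "finalize"))  # prints True
    print(is_allowed("full-authority", "publish"))   # prints False
```

Note that this check only answers whether an action may start. File scopes, worktree safety, and the reviewer gate are separate checks that still apply after it passes.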
What I hardened in v0.2.0-beta.1
The 0.2.0-beta.1 release was mostly about hardening.
Some of the main improvements were:
- runtime policy enforcement
- required allowedFiles for real worker execution
- better parallel scope overlap detection
- external/absolute targetWorktree path blocking
- safer cleanup behavior
- stronger harness tests
- npm scripts for validation
- clearer docs and release notes
The package now has scripts like:
npm test
npm run smoke
npm run check
npm run pack:dry-run
The harness tests cover things like:
- policy gates
- worktree creation
- committed diff visibility
- outside-allowed file detection
- parallel overlap detection
- target worktree policy
- package surface checks
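To make the target worktree policy concrete, here is a minimal Python sketch of a check that rejects absolute paths and paths that escape the repository. The function name and the exact rules are assumptions for illustration. It only looks at the path string, treats paths as POSIX-style, and does not resolve symlinks, which a production check would also need to consider.

```python
from pathlib import PurePosixPath


def is_safe_target_worktree(target: str) -> bool:
    """Accept only relative, non-empty paths that cannot climb out of the repo."""
    path = PurePosixPath(target)
    if path.is_absolute():
        return False              # e.g. "/tmp/elsewhere"
    if ".." in path.parts:
        return False              # e.g. "../outside" or "a/../../b"
    return len(path.parts) > 0    # reject "" and ".", which point at the repo root


if __name__ == "__main__":
    for candidate in (".cewp/worktrees/worker-a", "/tmp/elsewhere", "../outside"):
        print(candidate, is_safe_target_worktree(candidate))
```

Rejecting every `..` segment is stricter than necessary, but it keeps the rule easy to reason about and easy to test.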
What the workflow looks like
A simplified CEWP flow looks like this:
run init
↓
worktrees create
↓
dispatch plan
↓
dispatch check
↓
dispatch prompts
↓
workers execute
↓
collect
↓
reviewer executes
↓
finalize
The important part is not the command list.
The important part is the model:
plan -> isolate -> execute -> collect -> review -> finalize
That is the workflow I want AI coding tools to follow.
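The model above is essentially a fixed stage order in which no stage may be skipped. A minimal Python sketch of that rule follows. The stage names come from the model, while the function itself is a hypothetical illustration and not part of the cewp CLI.

```python
from typing import Optional, Sequence

STAGES = ("plan", "isolate", "execute", "collect", "review", "finalize")


def next_stage(completed: Sequence[str]) -> Optional[str]:
    """Return the next stage to run, or None once the run is finished."""
    # The completed stages must be an exact prefix of the fixed order,
    # so nothing can be skipped or run out of sequence.
    if tuple(completed) != STAGES[: len(completed)]:
        raise ValueError(f"stages out of order: {list(completed)}")
    if len(completed) == len(STAGES):
        return None
    return STAGES[len(completed)]


if __name__ == "__main__":
    print(next_stage([]))                   # prints plan
    print(next_stage(["plan", "isolate"]))  # prints execute
```

With this shape, "finalize before review" is not a discouraged path but an impossible one.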
The longer-term vision
Right now CEWP is Codex-focused.
But I do not want the idea to stay limited to one model or one tool.
The long-term vision is an adapter-based system where different models can take different roles:
- Codex as manager
- Claude as reviewer
- Gemini as worker
- OpenCode or API-based models as workers
- manual adapters for human-in-the-loop steps
The goal is not to blindly run many models.
The goal is to create a role-based workflow where each model can be used where it makes sense.
For example:
manager -> plans the tasks
worker-a -> implements one scope
worker-b -> implements another scope
reviewer -> checks the result
user -> approves critical boundaries
This also helps with limits and cost. If everything depends on one model, usage limits become a bottleneck. A future adapter system could make CEWP more flexible.
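Since no adapter contract exists yet, the following Python sketch is purely speculative. It shows one possible shape: a single `run` method per adapter, plus a fake adapter that returns canned reports so tests stay deterministic. Every name in it (`Adapter`, `TaskResult`, `FakeAdapter`, `run_roles`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class TaskResult:
    role: str
    report: str


class Adapter(Protocol):
    """Anything that can take a role and a prompt and return a report."""

    def run(self, role: str, prompt: str) -> TaskResult: ...


class FakeAdapter:
    """Deterministic stand-in for a model: returns a canned report per role."""

    def __init__(self, canned: dict[str, str]) -> None:
        self.canned = canned

    def run(self, role: str, prompt: str) -> TaskResult:
        return TaskResult(role=role, report=self.canned[role])


def run_roles(assignments: dict[str, Adapter], prompts: dict[str, str]) -> list[TaskResult]:
    """Run each role's prompt through whichever adapter is assigned to that role."""
    return [assignments[role].run(role, prompts[role]) for role in prompts]


if __name__ == "__main__":
    fake = FakeAdapter({"worker-a": "edited README.md", "reviewer": "Decision: PASS"})
    results = run_roles(
        {"worker-a": fake, "reviewer": fake},
        {"worker-a": "Update the README", "reviewer": "Review the collected output"},
    )
    for result in results:
        print(result.role, "->", result.report)
```

Keeping the contract this narrow would let a different model, or a human-in-the-loop adapter, fill any role without the workflow itself changing.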
What I learned
A few things became clear while building this:
1. AI coding needs boundaries
The more power we give to coding agents, the more important scope boundaries become.
allowedFiles, forbiddenFiles, worktrees, and reviewer gates are not optional details. They are the difference between a useful workflow and a risky one.
2. Parallelism is only useful with isolation
Running two agents at the same time sounds impressive, but it is only useful if their work is isolated.
Otherwise, it just creates confusion faster.
3. Local-first workflows are easier to audit
When the workflow writes reports, events, prompts, and review packets locally, the user can inspect what happened.
This is much better than relying only on chat history.
4. “Full authority” should not mean “no rules”
Some users really do want to give AI more permission.
That is valid.
But the system should still keep hard safety rules. CEWP’s full-authority mode is designed around that idea.
Current status
CEWP is still in beta.
The current version is:
0.2.0-beta.1
Published on npm:
npm install @setrathex/codex-engineering-workflow-pack
GitHub:
https://github.com/SetraTheXX/Codex-Engineering-Workflow-Pack
npm:
https://www.npmjs.com/package/@setrathex/codex-engineering-workflow-pack
What comes next
The next big direction is v0.3.
The main things I want to explore are:
- adapter registry
- fake adapter for deterministic tests
- package install smoke tests
- CI
- better operator docs
- commandless usage patterns
- future Gemini / Claude / OpenCode adapter experiments
The bigger goal is still the same:
Make AI-assisted development more structured, auditable, and safe enough to use on real projects.
Feedback
I am especially interested in feedback from people who use:
- Codex
- Claude Code
- Cursor
- GitHub Copilot
- Gemini
- OpenCode
- other AI coding tools
Questions I am thinking about:
- How would you design file scope rules for AI workers?
- Should full-authority mode ever include commit/push/publish?
- What should a model-independent adapter contract look like?
- How should multi-agent coding workflows be reviewed?
- What would make this easier to try in a real repo?
If this topic interests you, I would love feedback on the project.
GitHub: https://github.com/SetraTheXX/Codex-Engineering-Workflow-Pack