shrey

Posted on • Originally published at shreyshahh.substack.com

Steal my prompt to turn Codex into an Orchestration Manager

The mistake with coding agents is treating them like a single chat window.

You paste a task. The agent writes a patch. You check it. Something is missing. You paste another task. Then another. Then CI fails. Then review comments come in. Then you realize you are still the project manager, the QA loop, the scheduler, and the person remembering what every thread was supposed to do.

The better workflow is to make one Codex thread responsible for orchestration.

Not just the code.

The loop around the code.

You give Codex a defined batch of work and ask it to act as the Orchestration Manager. It breaks the work into threads, gives them goals, checks progress on a heartbeat, watches PRs or handoffs, routes feedback, requires verification, and keeps going until the work is actually complete.

This works for any scoped work.

It can be a bug fix, a refactor, a migration, a documentation cleanup, a test suite repair, a research task, or a backlog of small improvements.

The whole thing starts from one prompt.

The copy-paste prompt

Use this as the starting prompt for the manager thread:

You are the Orchestration Manager for this work.

Your job is to move the work forward until it is actually complete. Do not personally implement everything unless that is the smallest safe path. You are responsible for planning, scheduling, coordination, follow-up, verification, and keeping momentum.

Work scope:
[PASTE PLAN.md, issue list, checklist, bug report, feature list, refactor plan, research brief, or task description here]

Before starting, review the work scope and tell me if anything is unclear, risky, too broad, underspecified, or likely to cause worker threads to collide. If the scope is clear enough, continue. If not, ask the minimum number of questions needed to make it executable.

Responsibilities:
1. Break the work scope into worker-thread tasks that can run independently.
2. Decide which tasks should run now and which should wait.
3. Create or instruct worker threads with clear /goal prompts.
4. Make sure each worker has a narrow scope, a definition of done, and a reporting format.
5. Make sure code changes include tests where practical.
6. Make sure non-code work includes a clear verification step.
7. Make sure workers open PRs, prepare patches, produce artifacts, or deliver handoffs as appropriate.
8. Track worker status, changed files, test results, PR status, CI status, review feedback, and blockers.
9. Create follow-up worker threads when tests fail, review feedback appears, work stalls, artifacts are incomplete, or the implementation does not meet the definition of done.
10. Keep moving until the work is complete, verified, and ready for human review or merge.

Use /goal to keep yourself and the worker threads on track.

Use a heartbeat every 10 minutes.

At every heartbeat:
- check the status of all worker threads
- identify stale or blocked work
- inspect open PRs or current artifacts
- check whether tests, CI, or verification passed
- check whether review feedback needs action
- decide the next action
- create follow-up work if needed

Worker threads should also check in immediately when:
- they open or update a PR
- they produce a draft, patch, report, or artifact
- tests pass or fail
- CI fails
- review feedback is addressed
- they are blocked
- they believe their work is complete

Every worker thread should report back in this format:

Status:
Done / Blocked / Needs review

Summary:
What changed or what was produced in plain English.

Files or artifacts changed:
- ...

Verification:
- command, check, review step, or result

PR or handoff:
- link, artifact path, or not created yet

Open issues:
- none, or list blockers

Recommended next action:
- merge
- review
- create follow-up thread
- wait for another thread
- manual test needed
- publish artifact
- stop

Rules:
- Keep worker scopes small.
- Avoid assigning two workers to edit the same files unless there is a clear reason.
- Do not mark work complete just because something was written.
- Work is complete only when the definition of done is met and verification has passed.
- If UI behavior is involved, require manual testing or a clear test plan before calling it done.
- If research or writing is involved, require source review and a final editorial pass before calling it done.
- If a worker thread becomes unresponsive, start a replacement thread and preserve the current state.
- If you are unsure whether the next step is safe, pause and ask.

Start by producing:
1. The proposed worker-thread schedule
2. The /goal prompt for each worker
3. The heartbeat plan
4. The verification plan
5. The first action you will take

That is the workflow.

One prompt turns a Codex thread into the Orchestration Manager for a piece of work.
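The worker report format in the prompt is regular enough to read mechanically. Here is a minimal Python sketch, assuming workers reply in plain text using exactly the section headers from the prompt. `WorkerReport` and `parse_report` are hypothetical names for illustration and are not part of Codex.

```python
from dataclasses import dataclass, field

# Section headers copied from the report format in the prompt.
HEADERS = [
    "Status",
    "Summary",
    "Files or artifacts changed",
    "Verification",
    "PR or handoff",
    "Open issues",
    "Recommended next action",
]


@dataclass
class WorkerReport:
    """Hypothetical container: one list of lines per report section."""
    sections: dict[str, list[str]] = field(default_factory=dict)

    @property
    def status(self) -> str:
        lines = self.sections.get("Status", [])
        return lines[0] if lines else "Unknown"


def parse_report(text: str) -> WorkerReport:
    """Split a plain-text worker report into its named sections."""
    report = WorkerReport()
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and line[:-1] in HEADERS:
            # A known header starts a new section.
            current = line[:-1]
            report.sections[current] = []
        elif current is not None:
            # Drop a leading "- " bullet marker, keep the rest as-is.
            report.sections[current].append(line.removeprefix("- "))
    return report
```

A report that is missing its Status section comes back as "Unknown", which the manager can treat as a reason to ask the worker again.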

Why this works

Most agent workflows break at the handoff points.

The first output might be decent. The problem is everything around the output:

Did the agent run tests?
Did it verify the result?
Did it open a PR?
Did CI pass?
Did review comments come in?
Did a worker get stuck?
Did two agents edit the same file?
Did anyone follow up after the first attempt failed?

That is where the human usually gets dragged back in.

The Orchestration Manager prompt moves those handoff points into the agent's job description.

The manager thread is not there to be clever. It is there to keep state and keep pushing.

The mental model

Think of the system as three layers.

Human
-> defines the work and reviews judgment calls

Orchestration Manager
-> schedules work, checks progress, tracks handoffs, starts follow-ups

Worker threads
-> execute narrow tasks in their own context or worktree

The human still owns taste, judgment, and final approval.

The manager owns momentum.

The workers own execution.

That separation matters because coding agents get worse when one thread has to be planner, implementer, reviewer, tester, project manager, and memory all at once.

A manager thread should stay at the work-loop level. A worker thread should stay at the task level.

The work scope has to be real

This workflow works best when the work is already defined.

Bad input:

Make the app better.

Better input:

Work scope:
1. Replace the legacy billing webhook handler with the new event router
2. Add tests for invoice.paid, invoice.failed, subscription.updated, and subscription.deleted
3. Backfill missing webhook fixtures
4. Update the deployment runbook with the new rollback steps
5. Open a PR with the migration, tests, risks, and verification notes
6. Address review comments until the PR is ready to merge

That scope is specific enough for an Orchestration Manager to split into useful worker threads:

Worker 1: event router implementation
Worker 2: webhook fixture backfill
Worker 3: test coverage
Worker 4: runbook update
Worker 5: PR review follow-up

The manager can help refine a plan, but it cannot read your mind.

If you give it vague work, it will create vague workers. If you give it crisp work, it can schedule useful threads.
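One way to see whether a proposed split is crisp enough is to check that no two workers plan to touch the same files. The Python sketch below is illustrative only: `find_collisions` is a hypothetical helper, and the file paths are invented for the billing example rather than taken from a real repository.

```python
from itertools import combinations


def find_collisions(plan: dict[str, set[str]]) -> list[tuple[str, str, set[str]]]:
    """Return every pair of workers whose planned file sets overlap."""
    collisions = []
    for (a, files_a), (b, files_b) in combinations(plan.items(), 2):
        shared = files_a & files_b
        if shared:
            collisions.append((a, b, shared))
    return collisions


# Hypothetical file assignments; the paths are made up for illustration.
plan = {
    "event router implementation": {"billing/router.py", "billing/handlers.py"},
    "webhook fixture backfill": {"tests/fixtures/webhooks.json"},
    "test coverage": {"tests/test_router.py", "tests/fixtures/webhooks.json"},
}

for a, b, shared in find_collisions(plan):
    print(f"{a} and {b} both touch: {sorted(shared)}")
```

In this made-up plan the fixture backfill and the test coverage workers share a fixtures file, so the manager would either sequence them or merge them into one worker.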

The worker goal matters

The worker prompt should not be a feature request.

It should be a finish line.

Weak:

Fix the billing webhook code.

Better:

/goal Replace the legacy billing webhook handler with the new event router. Keep changes limited to webhook routing, billing event handlers, and directly related tests. Preserve existing behavior for invoice.paid, invoice.failed, subscription.updated, and subscription.deleted. Add or update tests proving the migrated paths work. Report changed files, test results, blockers, and recommended next action. Stop only when the implementation is PR-ready or you are blocked.

That one prompt gives the worker an objective, a scope, a test expectation, a report format, and a stop condition.

The manager can generate these goals automatically, but the goal shape should stay this explicit.
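To keep that shape explicit, the goal can be treated as a small record with required parts. This is a sketch under assumptions: `WorkerGoal` and its field names are hypothetical, and the only thing taken from the article is that the line starts with `/goal` and carries a task, scope, test expectation, report format, and stop condition.

```python
from dataclasses import dataclass


@dataclass
class WorkerGoal:
    """Hypothetical record for the parts of an explicit worker goal."""
    objective: str
    scope: str
    verification: str
    report: str
    stop_condition: str

    def render(self) -> str:
        """Join the parts into one /goal line, rejecting empty parts."""
        parts = [self.objective, self.scope, self.verification,
                 self.report, self.stop_condition]
        if any(not p.strip() for p in parts):
            raise ValueError("every part of the goal must be filled in")
        return "/goal " + " ".join(p.strip() for p in parts)


goal = WorkerGoal(
    objective="Replace the legacy billing webhook handler with the new event router.",
    scope="Keep changes limited to webhook routing, billing event handlers, "
          "and directly related tests.",
    verification="Add or update tests proving the migrated paths work.",
    report="Report changed files, test results, blockers, and recommended next action.",
    stop_condition="Stop only when the implementation is PR-ready or you are blocked.",
)
print(goal.render())
```

Refusing to render a goal with a blank part is the design choice here: a worker without a stop condition or a verification step is the "feature request" the section warns against.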

Heartbeats are the safety net

The heartbeat is what makes this feel different from a normal chat.

Every 10 minutes, the manager checks the system:

What is active?
What is blocked?
What changed?
Which PRs or artifacts exist?
Did tests pass?
Did CI fail?
Did review feedback appear?
What needs a follow-up worker?

The heartbeat should not be the only way information moves.

Workers should check in immediately when they hit a milestone. A heartbeat catches stale work. Worker check-ins keep the loop responsive.

Use both.
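The split between heartbeats and check-ins can be sketched in Python as a triage pass over worker state. Everything named here is an assumption for illustration: the `WorkerState` record, the extra "Active" status, and the rule that a worker is stale once it has gone longer than one heartbeat interval without a check-in. The agent does this in natural language; the code only shows the logic.

```python
from dataclasses import dataclass

HEARTBEAT_SECONDS = 10 * 60  # the 10-minute interval from the prompt


@dataclass
class WorkerState:
    name: str
    status: str          # "Active" (assumed), "Done", "Blocked", or "Needs review"
    last_checkin: float  # seconds since some fixed reference time


def heartbeat(workers: list[WorkerState], now: float) -> dict[str, list[str]]:
    """Sort workers into the buckets the manager acts on at each heartbeat."""
    buckets = {"stale": [], "blocked": [], "needs_review": [], "ok": []}
    for w in workers:
        if w.status == "Blocked":
            buckets["blocked"].append(w.name)
        elif w.status == "Needs review":
            buckets["needs_review"].append(w.name)
        elif w.status != "Done" and now - w.last_checkin > HEARTBEAT_SECONDS:
            # No check-in for more than one interval: treat as stale.
            buckets["stale"].append(w.name)
        else:
            buckets["ok"].append(w.name)
    return buckets
```

A milestone check-in simply refreshes `last_checkin`, which is why a responsive worker never shows up in the stale bucket.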

Feedback is part of the workflow

A lot of agent workflows stop at "PR opened" or "draft produced."

That is too early.

The Orchestration Manager should treat feedback as new work.

If review comments appear, the manager should classify them:

bug
style
test request
architecture concern
documentation request
unclear feedback

Then it should either assign all comments to one worker or split them across follow-up workers.

Example follow-up goal:

/goal Address the unresolved review comments related to the billing event router. Only modify files needed for those comments. Add or update tests if behavior changes. Run the relevant test suite. Report which comments were resolved, changed files, test results, and whether any review feedback remains. Stop only when these comments are resolved or blocked.

This is where the workflow becomes useful.

The first pass is rarely the final pass. The manager exists to keep the second and third passes from being forgotten.
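The classify-then-route step can be sketched as follows. The categories come from the list above; the routing policy is an assumption made for illustration: unclear feedback goes back to the human, a small batch goes to one follow-up worker, and a larger batch is split by category. `route` and `split_threshold` are hypothetical names.

```python
from collections import defaultdict
from enum import Enum


class Category(Enum):
    BUG = "bug"
    STYLE = "style"
    TEST_REQUEST = "test request"
    ARCHITECTURE = "architecture concern"
    DOCUMENTATION = "documentation request"
    UNCLEAR = "unclear feedback"


def route(comments: list[tuple[str, Category]],
          split_threshold: int = 3) -> dict[str, list[str]]:
    """Group already-classified review comments into follow-up assignments."""
    clear_count = sum(1 for _, cat in comments if cat is not Category.UNCLEAR)
    routed = defaultdict(list)
    for text, cat in comments:
        if cat is Category.UNCLEAR:
            # Assumed policy: do not guess, ask the human.
            routed["ask human"].append(text)
        elif clear_count <= split_threshold:
            # Small batch: one worker handles everything.
            routed["follow-up worker"].append(text)
        else:
            # Large batch: one worker per category.
            routed[f"follow-up worker: {cat.value}"].append(text)
    return dict(routed)
```

Each resulting group would then get its own follow-up goal in the same shape as the example above.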

Checks and balances

Do not confuse this with fully trusting the agent.

The manager thread can stall. Worker threads can hallucinate completion. Heartbeats can keep a bad loop alive. UI changes can look correct in code and still feel wrong in the app. Research can sound complete while missing the primary source.

So the prompt should force checks:

Code changes need tests where practical.
PRs need review.
CI status matters.
UI work needs manual testing or a clear test plan.
Research work needs source review.
Writing work needs an editorial pass.
Unresponsive threads should be replaced.
The manager should pause when the next step is unsafe.

The point is not to remove human judgment.

The point is to remove human babysitting.
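Those checks amount to a completion gate: work is not done until the evidence for its kind of work is in. This is a rough Python sketch with hypothetical names (`Evidence`, `missing_checks`), and it assumes tests are always practical for code, which the prompt itself leaves as a judgment call.

```python
from dataclasses import dataclass

# Which checks each kind of work needs, following the list above.
REQUIRED = {
    "code": ["tests_passed", "ci_passed", "reviewed"],
    "ui": ["tests_passed", "ci_passed", "reviewed", "manual_test_or_plan"],
    "research": ["sources_reviewed"],
    "writing": ["editorial_pass"],
}


@dataclass
class Evidence:
    kind: str  # one of the keys in REQUIRED
    tests_passed: bool = False
    ci_passed: bool = False
    reviewed: bool = False
    manual_test_or_plan: bool = False
    sources_reviewed: bool = False
    editorial_pass: bool = False


def missing_checks(e: Evidence) -> list[str]:
    """List the checks still outstanding before the work may be called done."""
    if e.kind not in REQUIRED:
        raise ValueError(f"unknown kind of work: {e.kind}")
    return [name for name in REQUIRED[e.kind] if not getattr(e, name)]


def is_complete(e: Evidence) -> bool:
    return not missing_checks(e)
```

A UI change with green tests, green CI, and an approved review still fails the gate until manual testing or a test plan exists, which matches the rule in the prompt.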

The reusable pattern

The pattern is simple:

Define the work.
Start an Orchestration Manager Codex thread.
Ask it to schedule worker threads.
Use /goal for every worker.
Run a 10-minute heartbeat.
Require verification, tests, review, or artifacts depending on the work.
Spawn follow-up workers when anything fails.
Repeat until the work is complete.

That is a better default than one giant prompt.

A single agent gives you an attempt.

An Orchestration Manager gives you an operating loop.

The human still decides what matters.

The agent keeps the work moving.
