Many people use AI for coding by placing the whole workflow inside one chat: describe the task, ask the agent to read the repository, edit files, run tests, and summarize the result.
That works for small experiments. It becomes fragile in long-running projects, shared repositories, production systems, or professional software automation. The problem is not only whether the model is smart enough. The problem is that the model is being asked to own too much of the delivery process.
The better pattern is to place AI capability inside a project-specific delivery pipeline.
AI is the worker. The project pipeline constrains, validates, records, and escalates.
Why project-specific matters
A general AI tool cannot know a project's real risk boundaries by default.
In a trading system, payout, KYC, funded accounts, order states, and production release are hard boundaries. In a SketchUp modeling tool, the real boundaries are the structured design model, source evidence, bridge trace, SketchUp execution, and visual review. In a personal knowledge publishing system, the boundaries become source traceability, bilingual publication candidates, site rendering, and deployment ownership.
These constraints do not come from a generic model. They come from project truth.
So the goal is not to build a more general replacement for Codex or Claude Code. The goal is to build a stable AI delivery pipeline inside a real project.
What the pipeline owns
A useful project-specific AI delivery pipeline must answer questions like these:
- Is this request mature enough to execute?
- Should AI run automatically, analyze only, move fast under guardrails, or only run a spike?
- What context should AI receive before execution?
- What can AI change, and what is out of bounds?
- When must the AI stop and ask for a human?
- What evidence proves the work is complete?
- Should the result become a PR, a release candidate, a knowledge note, or only an experiment record?
If the project does not answer these questions through its own mechanisms, the AI is still improvising inside a chat.
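To make this concrete, here is a minimal sketch of what a task contract could capture, assuming a Python-based harness. The field names (allowed_paths, required_evidence, escalate_on, and so on) are illustrative, not a fixed schema; the real fields come from project truth.

```python
# A minimal sketch of a task contract. Field names are illustrative only.
from dataclasses import dataclass, field
from enum import Enum


class ExecutionMode(Enum):
    AUTO = "auto"             # low risk: run without a human in the loop
    ANALYZE_ONLY = "analyze"  # produce a plan or report, change nothing
    GUARDED = "guarded"       # full speed, but only inside hard boundaries
    SPIKE = "spike"           # exploratory work, never production-ready by itself


@dataclass
class TaskContract:
    task_id: str
    goal: str                  # what "done" means, in one sentence
    mode: ExecutionMode
    context_paths: list[str] = field(default_factory=list)      # the narrow context package
    allowed_paths: list[str] = field(default_factory=list)      # what AI may change
    forbidden_paths: list[str] = field(default_factory=list)    # hard boundaries, out of bounds
    required_evidence: list[str] = field(default_factory=list)  # tests, screenshots, logs, traces...
    escalate_on: list[str] = field(default_factory=list)        # conditions that require a human
    output_kind: str = "pr"    # pr | release-candidate | knowledge-note | experiment-record
```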
A minimal structure
I break the pipeline into a few parts:

- Task Intake turns discussion into an executable task contract.
- Execution Mode Router decides how much autonomy AI gets.
- Context Package gives the AI the narrow context it should see.
- Work Isolation keeps AI execution inside a branch, worktree, slot workspace, or sandbox.
- Stage-Gated Worker separates triage, analysis, implementation, validation, evidence packaging, and handoff.
- Evidence Contract requires tests, screenshots, API output, logs, traces, or other reviewable proof.
- Human Gate puts humans at real risk boundaries.
- Feedback Capture turns repeated failures into rules, tests, skills, templates, or knowledge base entries.
Together, these parts are what I mean by a harness. It is not a prompt. It is not a single tool. It is the project control layer that lets AI participate in delivery.
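A rough sketch of how a stage-gated worker could wire these parts together, reusing the TaskContract above. Here run_agent, collect_evidence, and notify_human are hypothetical placeholders for whatever agent runner, evidence tooling, and escalation channel a project actually uses.

```python
# A sketch of a stage-gated worker loop. The three callables passed in are
# hypothetical placeholders for the project's real agent runner, evidence
# tooling, and escalation channel.
def deliver(task, run_agent, collect_evidence, notify_human):
    # Triage: refuse immature requests instead of improvising.
    if not task.goal or not task.required_evidence:
        return notify_human(task, reason="task contract is not mature enough to execute")

    # Analysis: always produce a plan; in analyze-only mode, stop here.
    plan = run_agent(task, stage="analysis")
    if task.mode.value == "analyze":
        return plan

    # Implementation: execute inside work isolation (branch, worktree, sandbox).
    result = run_agent(task, stage="implementation", plan=plan)

    # Human gate: stop at real risk boundaries.
    if any(path in result.touched_paths for path in task.forbidden_paths):
        return notify_human(task, reason="AI touched a forbidden boundary", payload=result)

    # Validation and evidence packaging: the evidence contract decides what counts as proof.
    evidence = collect_evidence(task, result)
    missing = [kind for kind in task.required_evidence if kind not in evidence]
    if missing:
        return notify_human(task, reason=f"missing evidence: {missing}", payload=result)

    # Handoff: PR, release candidate, knowledge note, or experiment record.
    return {"output_kind": task.output_kind, "result": result, "evidence": evidence}
```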
Where TDD fits
TDD is useful, but it is not the whole answer.
When a behavior is clear and testable, writing tests before implementation is a strong pattern. But many real tasks are not function-level exercises. Frontend changes need screenshots. Data-link changes need API or log proof. SketchUp modeling needs structured model diffs and visual review. Knowledge publication needs source trace and bilingual route validation.
So the better rule is not "everything must be TDD." The better rule is "every delivery must have an evidence contract."
Tests are one kind of evidence. They are not the only kind.
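One way to express an evidence contract is a simple mapping from task type to required proof. The task types and evidence kinds below mirror the examples above and are assumptions, not a fixed taxonomy.

```python
# A sketch of an evidence contract keyed by task type. The task types and
# evidence kinds are illustrative; each project defines its own.
EVIDENCE_CONTRACT = {
    "backend-behavior": ["unit-tests", "api-output"],
    "frontend-change":  ["screenshots", "e2e-run"],
    "data-link":        ["api-output", "logs"],
    "sketchup-model":   ["model-diff", "visual-review"],
    "knowledge-note":   ["source-trace", "bilingual-route-check"],
}


def required_evidence(task_type: str) -> list[str]:
    # Unknown task types get no default: they must be classified before delivery.
    return EVIDENCE_CONTRACT.get(task_type, [])
```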
Why this is more stable than vibe coding
Vibe coding is fast. Its weakness is that boundaries and evidence are usually too thin.
A project-specific AI delivery pipeline does not reject speed. It puts speed on rails.
Low-risk tasks can auto-run. Complex but bounded tasks can run in guarded full-speed mode. High-risk tasks should stop at analysis or human confirmation. Exploratory work can be a spike, but it should not be treated as production-ready delivery.
AI can still move quickly. It just does not get to mix immature requirements, high-risk actions, and unverified completion into one vague "done."
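As a sketch, that routing rule can stay very small. The risk labels below are assumptions about how a project might classify its tasks, not a standard scale.

```python
# A sketch of an execution mode router: risk level in, autonomy level out.
# The risk labels ("low", "bounded", "exploratory") are illustrative.
def route(risk: str, requirements_mature: bool) -> str:
    if not requirements_mature:
        return "analyze"   # immature requests never auto-run
    if risk == "low":
        return "auto"      # low-risk tasks can run unattended
    if risk == "bounded":
        return "guarded"   # full speed, inside hard boundaries
    if risk == "exploratory":
        return "spike"     # never treated as production-ready delivery
    return "analyze"       # high or unknown risk stops at analysis or human confirmation
```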
The core idea
The future value is not just making AI more impressive inside a chat window. The value is making projects better at using AI inside their delivery systems.
Models will change. CLI tools will change. MCP, hooks, skills, and subagents will change.
The durable asset is the project mechanism: how tasks are defined, how context is provided, how execution is constrained, how evidence is collected, how humans intervene, and how failures improve the next run.
That is the value of a project-specific AI delivery pipeline.
Originally published on my personal site:
https://marlinbian-site.pages.dev/writing/project-specific-ai-delivery-pipeline/