Vladimir Panov

Posted on May 10 • Edited on May 17

AI-Native Software Delivery

A process for shipping software when AI generates most of the implementation and humans own intent, validation, and behavioral guarantees.

This is a rewrite of an earlier article that introduced the idea. The thinking has matured into a working, clone-and-go process toolkit:

github.com/DominicTylor/ai-software-process

What follows is the methodology in its current shape — what changed since the first article, why those changes mattered, and what the resulting process looks like in practice.

The problem this process exists to solve

AI now generates a meaningful share of production code, and that share keeps growing. Engineers spend less time writing code and more time reviewing what was generated, often less deeply than we like to admit. The question stops being "is this code well-written?" and becomes "does this code do what we said it should do?" — and the systems most engineering teams use today were designed for a world where humans wrote everything by hand.

A few pains, all interlocking:

Story trackers describe intent in plain English in one place; the code that implements it lives somewhere else; the tests that verify it live somewhere else again. Each drifts at its own rate. A Jira ticket that was current six months ago describes behavior that the code no longer has — and nobody knows when it stopped being accurate.

BDD frameworks tried to bridge this. They duplicated scenarios in Gherkin (.feature files) alongside the executable code. Two artifacts that say the same thing must be kept in sync, and they never are. The feature file becomes documentation; the code becomes the truth; the gap between them is where bugs live.

Decision history fragments across chat threads, old tickets, deleted documents, and the heads of senior engineers. When a new contributor asks "why did we drop password authentication?", the answer is gone. The current code shows what we do; nothing shows why.

Tests check internal state — Redis keys, database rows, in-memory variables — instead of customer-observable behavior. They pass while the user-visible flow is broken, because they were written to be easy rather than honest.

And underneath all of this: when AI generates the implementation, none of the safeguards a team built around human authorship apply the same way. Code review catches less because reviewers skim more. Linters and type checks don't catch semantic drift from the spec. The acceptance criteria — if they exist — are in a ticket nobody reads.

The standard practices were designed for a different world. The reshape is not a productivity tweak; it is a structural change in where trust lives.

What changes

Implementation-centric engineering optimized for the people writing code. Result-centric engineering optimizes for the properties the product must hold. The shift is concrete:

Humans increasingly own intent, validation, and behavioral guarantees. AI increasingly generates the code that satisfies them. The center of human attention moves from "did I write this well?" to "have I described what must hold, and is it verified?"

This is not a productivity argument. The argument is that as AI-generated code becomes the default, the only durable form of trust is executable behavioral validation: a machine-checkable through-line from customer intent to verified behavior. Everything in the process below serves that through-line.

Stories are folders, not tickets

The unit of work is not a row in a tracker. It is a folder in git.

stories/auth/user-signup/
  user-spec.md           ← what the user wants, who they are, what must hold
  e2e/
    signup-via-github.spec.ts
    signup-via-magic-link.spec.ts
  perf/                  (optional)
  security/              (optional)
  a11y/                  (optional)

The user-spec describes intent, personas, high-level user goals, functional constraints, architect tech notes, quality-gate notes, scenario references, and which platform invariants the Story is subject to. What it deliberately does not contain: step-by-step scenarios in prose (those are in the commented tests), changelogs (in git), resolved-question annotations (in git), future-state plans (elsewhere), implementation choices (in tech specs in code repos).

A Story always describes current behavior — not history, not roadmap, not the journey of how it got here. If the system behaves this way right now, it goes in the spec. Otherwise, it goes elsewhere. That single rule eliminates a class of drift that older spec formats accept by default.

A scenario, concretely

Acceptance criteria live as commented executable tests:

test('User signs up via GitHub for the first time', async ({ user }) => {
  // # User opens the signup page
  await user.opensSignupPage();

  // # User sees three auth options with GitHub marked as recommended
  await user.seesAuthOptions({ recommended: 'github' });

  // # User clicks "Continue with GitHub"
  await user.clicksContinueWithGitHub();

  // # System completes OAuth and lands the user on an empty workspace dashboard
  await user.expectsDashboardWithEmptyWorkspace();
});

The Owner writes the comments first. An AI helper, or the Implementer, fills in executable code under each comment without removing it. The comment stays as documentation; the code stays as verification; they live in the same file for the life of the test.

Three properties fall out of this shape with no extra effort:

The Owner reads only the comments to verify acceptance criteria.

The Quality Gate Specialist reads comments and code together to verify they agree.

Drift between described behavior and verified behavior has nowhere to hide: the two share a file and appear in the same diff. There is no second artifact to keep synchronized.

A scenario that has only comments and no code under them is wrapped in test.todo(). It appears in the test runner as TODO. The Story's observable state — what's drafted, what's verified, what's still pending — is whatever the test runner reports. Not a status field; the report.
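The idea that a Story's state is whatever the runner reports can be sketched as a small summarizer. This is a minimal sketch under stated assumptions, not part of the toolkit: the `ScenarioResult` shape, the `summarizeStory` name, and the second scenario title are hypothetical, and a real runner's report format will differ.

```typescript
// Hypothetical shape of one scenario entry in a test runner report.
type ScenarioStatus = 'passed' | 'failed' | 'todo';

interface ScenarioResult {
  title: string;
  status: ScenarioStatus;
}

interface StorySummary {
  verified: string[]; // scenarios with code that passes
  failing: string[];  // scenarios with code that currently fails
  pending: string[];  // comment-only scenarios reported as TODO
}

// Derive the Story's observable state purely from the report.
function summarizeStory(results: ScenarioResult[]): StorySummary {
  const titlesWith = (status: ScenarioStatus): string[] =>
    results.filter((r) => r.status === status).map((r) => r.title);

  return {
    verified: titlesWith('passed'),
    failing: titlesWith('failed'),
    pending: titlesWith('todo'),
  };
}

const summary = summarizeStory([
  { title: 'User signs up via GitHub for the first time', status: 'passed' },
  { title: 'User signs up via magic link', status: 'todo' }, // illustrative title
]);
console.log(summary);
```

Nothing here is stored anywhere: the summary is recomputed from the report each time, which is the point of observing state rather than declaring it.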

Frameworks as bilateral contracts

Notice the test above. No selectors. No data-testid strings. No sleeps. No mocks. The verbs — user.opensSignupPage(), user.clicksContinueWithGitHub() — come from a framework owned by the Quality Gate Specialist.

The framework's PageObjects hold one end of a bilateral contract:

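// frameworks/e2e/page-objects/login-page.ts
import type { Page } from '@playwright/test'; // assumes a Playwright-style Page

export class LoginPage {
  constructor(private readonly page: Page) {}

  // Types the email into the input the UI must expose as data-testid="login-email".
  async entersEmail(email: string) {
    await this.page.fill('[data-testid="login-email"]', email);
  }
}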

The test-id "login-email" lives in this file, in plain code. Code-perimeter implementation reads this file to know what the UI must expose. There is no separate test-id registry; the framework's PageObjects are the registry.

When the Story uses user.entersLoginEmail(...), the framework declares that the UI must render data-testid="login-email" on the login form's email input. The code-perimeter team reads the framework to know what to build. The team building the test reads the framework to know what verbs are available.

Same pattern for other frameworks. A probe.scanTable('account', { where: 'password IS NOT NULL' }) call in the security framework rests on a database probe helper that names the table and column explicitly; code that creates the schema reads that helper to know what names are expected.

This is the through-line from intent to verified code: scenarios consume framework verbs, framework PageObjects declare identifiers, code-perimeter implementation honors those identifiers. The contract is not implicit. It is readable code on both sides.
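Because the PageObjects act as the registry, the identifiers the UI must expose can in principle be read out of them mechanically. The sketch below is one hedged way to do that, assuming selectors are written as literal `[data-testid="..."]` strings; `extractTestIds` and `missingTestIds` are hypothetical helpers, not functions from the toolkit.

```typescript
// Collect every data-testid value that a PageObject source file mentions.
// Simplification: only literal [data-testid="..."] selectors are matched,
// so ids that are built dynamically would be missed.
function extractTestIds(pageObjectSource: string): string[] {
  const pattern = /\[data-testid="([^"]+)"\]/g;
  const ids: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(pageObjectSource)) !== null) {
    if (!ids.includes(match[1])) {
      ids.push(match[1]);
    }
  }
  return ids;
}

// Report ids that the framework expects but the rendered markup does not expose.
function missingTestIds(expected: string[], renderedHtml: string): string[] {
  return expected.filter((id) => !renderedHtml.includes(`data-testid="${id}"`));
}

const pageObjectSource = [
  'class LoginPage {',
  '  async entersEmail(email: string) {',
  "    await this.page.fill('[data-testid=\"login-email\"]', email);",
  '  }',
  '}',
].join('\n');

const expectedIds = extractTestIds(pageObjectSource);
console.log(expectedIds); // → ['login-email']
console.log(missingTestIds(expectedIds, '<input data-testid="login-email" />')); // → []
```

A check like this could run on the code side to confirm the UI honors its end of the contract, while the framework file stays the only place the identifier is declared.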

Constitution: platform-wide invariants

Stories describe features. Some rules apply across all features — they are invariants of the platform, not properties of any one Story. Those live in a separate constitution.md document at the repository root.

A constitution is short, prose, self-contained:

No user password is ever stored, transmitted, or accepted by any code path.

The SMTP capture service never originates outgoing connections on ports 25, 465, or 587.

No request authenticated by tenant A can read or modify data belonging to tenant B. Cross-tenant access attempts return 404 or 403, never 500 and never partial data.

All persistent data is encrypted at rest under platform-managed keys.

The constitution declares; it does not enumerate enforcement. Each rule has an owning Story under stories/ — same shape as any other Story, but with an attacker or system-probe persona, scenarios that attempt to violate the rule, and assertions that the violation is refused.
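As an illustration of what an owning Story's assertion might check, take the cross-tenant rule above. The sketch assumes a probe that records the HTTP status and body of an attempted violation; `ProbeResponse`, `isRefused`, and the marker strings are hypothetical names introduced here, and a real Story would drive the request through the security framework's verbs.

```typescript
// Hypothetical minimal view of a response seen by an attacker-persona probe.
interface ProbeResponse {
  status: number;
  body: string;
}

// A cross-tenant attempt counts as refused only if the status is 403 or 404
// and the body contains none of the other tenant's marker values.
function isRefused(response: ProbeResponse, forbiddenMarkers: string[]): boolean {
  const statusOk = response.status === 403 || response.status === 404;
  const leaks = forbiddenMarkers.some((marker) => response.body.includes(marker));
  return statusOk && !leaks;
}

// Marker values that would only appear if tenant B's data leaked (illustrative).
const tenantBMarkers = ['tenant-b-project-name'];

console.log(isRefused({ status: 404, body: 'Not found' }, tenantBMarkers)); // → true
console.log(isRefused({ status: 500, body: 'Internal error' }, tenantBMarkers)); // → false
console.log(isRefused({ status: 403, body: 'tenant-b-project-name' }, tenantBMarkers)); // → false
```

The third case matters: a refusal status that still carries another tenant's data counts as a violation, matching the "never partial data" clause.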

A Story may reference a constitution rule it depends on (enforces: no-passwords in frontmatter) for traceability. The constitution itself does not reference back. It does not need to know how each rule is verified or by which Story.

The Architect holds final word over constitution.md. When a Story's design would conflict with a rule, the Architect-review skill blocks the PR. If the team genuinely wants to change the rule, that goes through a separate constitution PR — explicit, reviewed, recorded.

Decision history in commit messages

Every behavioral change in the system is captured in a structured git commit message:

behavior: drop password auth from auth flow

Why: minimize attack surface; password storage adds risk for a developer-tool
audience that is comfortable with OAuth and magic links.
Considered: keep with bcrypt, move to passkeys only, drop entirely.
Chose: drop entirely; OAuth + magic link cover all signup and login paths.
Affects: stories/auth/user-signup/, constitution.md §3.2.

A commit-msg git hook enforces the Why / Considered / Chose / Affects shape on every commit whose subject starts with behavior:. Non-behavioral commits (chore, fix, docs) are not gated. The hook is local; a matching CI check protects main against commits that bypassed the hook.
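The gating rule can be sketched as a pure function that a hook or CI check might call. This is an illustrative sketch, not the toolkit's actual hook: `missingSections` is a hypothetical name, and the check only looks for lines that start with each section label.

```typescript
const REQUIRED_SECTIONS = ['Why', 'Considered', 'Chose', 'Affects'];

// Return the names of required sections missing from a commit message.
// Commits whose subject does not start with "behavior:" are not gated.
function missingSections(message: string): string[] {
  const lines = message.split('\n');
  const subject = lines[0] || '';
  if (!subject.startsWith('behavior:')) {
    return [];
  }
  return REQUIRED_SECTIONS.filter(
    (section) => !lines.some((line) => line.startsWith(`${section}:`)),
  );
}

const completeMessage = [
  'behavior: drop password auth from auth flow',
  '',
  'Why: minimize attack surface.',
  'Considered: keep with bcrypt, move to passkeys only, drop entirely.',
  'Chose: drop entirely.',
  'Affects: stories/auth/user-signup/, constitution.md §3.2.',
].join('\n');

console.log(missingSections(completeMessage)); // → []
console.log(missingSections('behavior: drop password auth\n\nWhy: less risk.'));
// → ['Considered', 'Chose', 'Affects']
console.log(missingSections('chore: bump dependencies')); // → []
```

Keeping the rule in one function means the local hook and the CI check can share it, so a commit that bypassed the hook is judged by the same logic on the PR.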

Significant decisions are tagged for direct addressability:

git tag decision/no-password-auth <sha>
git tag -l "decision/*"

To answer "why did we decide X?", run git log --grep="X", follow any decision/* tags, and read the structured sections of the relevant commit. There is a /decision-search skill that makes this queryable in natural language, but the storage layer is just git.

This rule has one specific consequence: no artifact in the repository contains a changelog. No "Resolved on 2026-05-11" annotations inside specs. No version-history blocks. Specs describe current behavior. The history of why behavior is current lives in commit messages, addressable through tags.
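A rule like this lends itself to a mechanical check. The following sketch flags lines in a spec that look like leaked history; the two patterns and the `findHistoryAnnotations` name are assumptions chosen for illustration, and the repository's own spec validation may work differently.

```typescript
// Patterns that suggest history has leaked into a spec. Both are assumptions
// chosen for illustration; a real check would need a list tuned to the repo.
const HISTORY_PATTERNS: RegExp[] = [
  /\bResolved on \d{4}-\d{2}-\d{2}\b/i,
  /^#+\s*(Changelog|Version history)\b/i,
];

interface Finding {
  line: number; // 1-based line number in the spec
  text: string;
}

// Return every line of the spec that matches one of the history patterns.
function findHistoryAnnotations(specText: string): Finding[] {
  const findings: Finding[] = [];
  specText.split('\n').forEach((text, index) => {
    if (HISTORY_PATTERNS.some((pattern) => pattern.test(text))) {
      findings.push({ line: index + 1, text: text.trim() });
    }
  });
  return findings;
}

// Illustrative spec content, not taken from the repository.
const specText = [
  '# User signup',
  'Users sign up via GitHub or magic link.',
  'Resolved on 2026-05-11: open question about link expiry.',
].join('\n');

console.log(findHistoryAnnotations(specText));
```

For the sample above, the only finding is line 3; the fix is to delete the annotation and let the commit message carry the reasoning.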

State is observed, not declared

A Story has no status: "approved" field. The tracked unit is a change vector, a (branch, PR) pair against the master repository, and its state is whatever git and the forge (the Git hosting platform) say it is.

Branch                                      PR             Vector state
Doesn't exist (or merged into main)         (none)         Doesn't exist, or is live
Exists, no PR                               (none)         Private work in progress
Exists, PR open (any state)                 Draft / Open   In review or iteration
Exists, PR open, all approvals + CI green   Open           Ready to merge
Merged                                      Merged         Live

There is no status field anywhere. No external state store. git branch -a plus the forge's PR list answers "who's working on what, what's in review, what's blocked, what just shipped" — without a separate tool.
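The mapping from observed facts to vector state can be written down as a small function. This is a sketch under assumptions: the `VectorFacts` fields stand in for whatever git and the forge's API actually return, and `vectorState` is a hypothetical helper rather than part of the toolkit.

```typescript
type PrState = 'none' | 'draft' | 'open' | 'merged';

// Facts read from git and the forge; nothing here is stored by the process.
interface VectorFacts {
  branchExists: boolean;
  pr: PrState;
  approvalsComplete: boolean;
  ciGreen: boolean;
}

type VectorState =
  | 'does not exist'
  | 'private work in progress'
  | 'in review or iteration'
  | 'ready to merge'
  | 'live';

// Derive the state of a change vector from what git and the forge report.
function vectorState(facts: VectorFacts): VectorState {
  if (facts.pr === 'merged') return 'live';
  // Without a merged PR, a missing branch is treated as "does not exist".
  if (!facts.branchExists) return 'does not exist';
  if (facts.pr === 'none') return 'private work in progress';
  if (facts.pr === 'open' && facts.approvalsComplete && facts.ciGreen) {
    return 'ready to merge';
  }
  return 'in review or iteration';
}

console.log(
  vectorState({ branchExists: true, pr: 'open', approvalsComplete: true, ciGreen: false }),
); // → 'in review or iteration'
```

The function takes no stored status as input, so two people running it against the same repository and forge get the same answer.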

Parallel work follows from this naturally. Multiple branches mean multiple vectors in flight. Two vectors touching the same area are a coordination signal — usually a sign that two people are attempting the same change without realizing it. Tooling on top of the process (a dashboard, a query) can surface this; the process itself does not block it.
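The kind of tooling mentioned above could be as small as the sketch below, which groups in-flight branches by the Story folder they touch. It assumes the stories/<area>/<story>/ layout shown earlier and that each vector's changed paths are already known (for example from a diff against main); `Vector`, `storyFolder`, and `findOverlaps` are hypothetical names.

```typescript
interface Vector {
  branch: string;
  changedPaths: string[];
}

interface Overlap {
  story: string;
  branches: string[];
}

// Map a changed path to its Story folder, or null if it is not inside one.
// Assumes the stories/<area>/<story>/... layout.
function storyFolder(path: string): string | null {
  const parts = path.split('/');
  if (parts[0] !== 'stories' || parts.length < 4) return null;
  return parts.slice(0, 3).join('/');
}

// Find Story folders touched by more than one in-flight branch.
function findOverlaps(vectors: Vector[]): Overlap[] {
  const branchesByStory: Record<string, string[]> = {};
  for (const vector of vectors) {
    for (const path of vector.changedPaths) {
      const story = storyFolder(path);
      if (story === null) continue;
      const branches = branchesByStory[story] || (branchesByStory[story] = []);
      if (!branches.includes(vector.branch)) branches.push(vector.branch);
    }
  }
  return Object.keys(branchesByStory)
    .filter((story) => branchesByStory[story].length > 1)
    .map((story) => ({ story, branches: branchesByStory[story] }));
}

// Branch names are illustrative.
const overlaps = findOverlaps([
  { branch: 'signup-copy', changedPaths: ['stories/auth/user-signup/user-spec.md'] },
  { branch: 'signup-magic-link', changedPaths: ['stories/auth/user-signup/e2e/signup-via-magic-link.spec.ts'] },
  { branch: 'constitution-tweak', changedPaths: ['constitution.md'] },
]);
console.log(overlaps);
```

The result is a signal, not a gate: it tells two contributors to talk, and leaves the decision about how to proceed to them.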

Master and code perimeters: asymmetric awareness

A non-trivial project usually has two physical territories: the master repository where Stories, frameworks, and constitution live; and one or more code repositories where implementation, tech specs, and code review live. In small projects these can coexist in one physical repo; the boundary remains conceptual.

Awareness flows in one direction only.

The master perimeter never queries, inspects, or coordinates with code repositories. Its references to them are descriptive ("this Story affects services A and B"), not operational. A skill running in the master perimeter never opens a code repository's pull request list, never reads code-side tech specs to decide what to do, never coordinates code-side merges.

The code perimeter, in contrast, reads the master perimeter as its source of truth. To produce a tech spec or implementation, an engineering agent in a code repository reads the corresponding Story's user-spec, the architect tech notes inside it, the quality gates, the constitution rules it must respect, and the framework's PageObjects to know what identifiers to expose.

This asymmetry keeps responsibilities clean. The master perimeter never has to know about deployment topology, CI runners, or how many code repositories the company has. The code perimeter never has to argue with product about what a feature should do.

Roles as areas of final word

The process names six roles: Owner, Architect, Quality Gate Specialist, Implementer, Code Owner, UI/UX Specialist. These are responsibilities, not positions. In a one-person project, one human holds all six and switches modes consciously. In a fifteen-person team, they tend toward distinct people. The role structure stays the same regardless of headcount.

Stories are written collaboratively. Anyone with something to contribute writes into the Story — Owner sets intent, Architect adds tech notes, Quality Gate Specialist refines scenarios, UI/UX Specialist adds accessibility constraints. A Story does not "belong" to a role.

What roles hold is areas of final word — domains where, when a decision is contested, that role's approval is required to ship. These are enforced through CODEOWNERS rules on path patterns, not through social convention. Owner over stories/**/user-spec.md, Architect over constitution.md, Quality Gate Specialist over frameworks/** and scenario folders, and so on.
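To make the mapping concrete, here is the same idea expressed as a lookup function. In the process itself this lives in a CODEOWNERS file; the TypeScript below only mirrors it for illustration. The regular expressions are stand-ins for the path patterns named above, and treating e2e/, perf/, security/, and a11y/ as the "scenario folders" is an assumption based on the Story layout shown earlier.

```typescript
type Role = 'Owner' | 'Architect' | 'Quality Gate Specialist';

interface FinalWordRule {
  pattern: RegExp;
  role: Role;
}

// Regex stand-ins for the path patterns. The first matching rule wins.
const FINAL_WORD_RULES: FinalWordRule[] = [
  { pattern: /^constitution\.md$/, role: 'Architect' },
  { pattern: /^stories\/.+\/user-spec\.md$/, role: 'Owner' },
  { pattern: /^frameworks\//, role: 'Quality Gate Specialist' },
  { pattern: /^stories\/.+\/(e2e|perf|security|a11y)\//, role: 'Quality Gate Specialist' },
];

// Return the role whose approval is required for a path, or null if none applies.
function finalWordRole(path: string): Role | null {
  const rule = FINAL_WORD_RULES.find((r) => r.pattern.test(path));
  return rule ? rule.role : null;
}

console.log(finalWordRole('stories/auth/user-signup/user-spec.md')); // → 'Owner'
console.log(finalWordRole('frameworks/e2e/page-objects/login-page.ts')); // → 'Quality Gate Specialist'
console.log(finalWordRole('constitution.md')); // → 'Architect'
```

Whatever form the rules take, the property that matters is that the forge enforces them on every PR, so final word does not depend on anyone remembering to ask.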

A second pattern governs how non-final-word roles still carry real weight: mandatory review with required engagement. Horizontal roles — Architect, Quality Gate Specialist, UI/UX Specialist — review every relevant Story automatically. Their comments are blocking. The Owner is free to overrule their advice in the Owner's own product domain, but only by explicit acknowledgment — a written statement that the risk is read, accepted, and carried. Silent dismissal is not allowed.

When a horizontal role believes an Owner's acknowledgment underestimates a systemic risk, the open path is to escalate by opening a constitution PR. The discussion moves from Story scope (Owner's call) to platform scope (Architect's call). The right argument resolves at the right altitude.

Skills and sub-agents

Skills are AI-assisted helpers — focused operations a contributor invokes like commands: /spec-brainstorm, /architect-review, /scenario-implement, /decision-search. Sub-agents are role-specialists with deeper expertise in a single domain (spec-spec for product consistency, architect-spec for constitution and system invariants, quality-spec for frameworks and coverage, ui-ux-spec for visible-state completeness, decision-historian for git history). Skills invoke sub-agents when judgment in a domain is required.

The operational principle is: skills are soft assistance inside a branch, hard gates on the pull request. While work is private, every skill is invokable and ignorable. An author can use them or not. Once a PR is opened, the horizontal-role review skills run automatically and their comments are blocking. Resolution takes one of two forms — fix the concern, or write an explicit acknowledgment of the risk. There is no third option.
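The "two forms of resolution" rule can be modeled as a discriminated union, which makes the absence of a third option visible in the type. This is a sketch under assumptions: `BlockingComment`, `Resolution`, and `canMerge` are hypothetical names, and the sample reviewer and concern are illustrative.

```typescript
// A blocking comment is open, fixed, or acknowledged in writing.
type Resolution =
  | { kind: 'open' }
  | { kind: 'fixed' }
  | { kind: 'acknowledged'; statement: string };

interface BlockingComment {
  reviewer: string;
  concern: string;
  resolution: Resolution;
}

// A comment stays unresolved until it is fixed or carries a non-empty
// written acknowledgment. An empty acknowledgment is a silent dismissal.
function unresolvedComments(comments: BlockingComment[]): BlockingComment[] {
  return comments.filter((comment) => {
    const resolution = comment.resolution;
    if (resolution.kind === 'fixed') return false;
    if (resolution.kind === 'acknowledged') {
      return resolution.statement.trim().length === 0;
    }
    return true;
  });
}

function canMerge(comments: BlockingComment[]): boolean {
  return unresolvedComments(comments).length === 0;
}

const reviewComments: BlockingComment[] = [
  {
    reviewer: 'architect-review',
    concern: 'Scenario does not cover expired links',
    resolution: { kind: 'acknowledged', statement: 'Risk read and accepted by Owner.' },
  },
  { reviewer: 'quality-review', concern: 'Missing negative case', resolution: { kind: 'open' } },
];

console.log(canMerge(reviewComments)); // → false
```

The same gate applies whether the PR was written by hand or generated, which is what keeps the destination identical for both kinds of contributor.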

This separation matters. A contributor who prefers to write everything by hand is not punished. A contributor who relies heavily on AI is not given a shortcut around the gates. The path is different; the destination is the same.

What this process is not

To prevent drift toward familiar models:

Not Jira-as-spec. A Story is not a ticket. Stories live in git and always describe current behavior. There is no parallel ticket queue describing the same work.

Not BDD with Gherkin. Scenarios are first-class TypeScript inside test files, not a parallel .feature layer that has to be kept in sync. One source of truth.

Not test-after. Acceptance criteria are written before or alongside implementation, as commented executable tests. Code is generated to satisfy them, not the other way around.

Not waterfall, despite the milestone names. Stages cycle. Implementation can expose gaps in scenarios; review can redirect back to spec. The process recognizes legitimate return points instead of pretending the flow is linear.

Not vendor-locked. The methodology shape is independent of any specific AI tool. The current incarnation uses Claude Code; the same shape works with any AI environment given equivalent primitives.

Not headcount-prescriptive. Roles are responsibilities. One person can hold all six. A team of fifteen can split them across people. The process does not require any specific organizational chart.

Where to start

The process lives at github.com/DominicTylor/ai-software-process.

The repository is a clone-and-go Claude Code toolkit. process.md is the full canon. constitution.md is a working template adopters rewrite in place. templates/story/ holds the per-Story scaffolds skills consume when creating new artifacts. .claude/skills/ and .claude/agents/ ship a complete master-perimeter toolkit (twelve skills, five sub-agents) plus a code-perimeter starter pack drawn from a real TypeScript/Node monorepo. .githooks/commit-msg enforces the structured commit format locally; .github/workflows/ enforces it (and spec validation, and ai-review) on the PR.

Adopt by forking, rewriting the constitution and high-level project description in place, and piloting a Story end-to-end. MIT No Attribution license — no permission, no obligation to credit, no friction. Attribution back is appreciated but never required.

If anything here resonated, the repository is where it goes from idea to practice.
