
idavidov13

Posted on • Originally published at idavidov.eu

AI-Native Workflow: The Operating Manual for Your Agent

Imagine your first day on a new team. The codebase is well organized, the docs are written, the linter is configured.

Nobody, however, tells you how the team actually works: which channel is for what, when to ask before pushing, when the senior dev expects a review.

That's exactly what a new engineer faces when they first open the scaffold we've been building across this series. The folders are clean, CLAUDE.md is loaded, the skills are sitting in .claude/skills/.

Then you open the chat, type your first prompt, and... what are you supposed to say?

The previous six articles built the scaffold. The first gave it structure, the second explained what makes an agent different from a chatbot, and the third gave the agent its rules.

The fourth gave it deep expertise. The fifth gave it eyes. The sixth showed all of those parts working together on a real task.

But none of that tells you how to drive the machine. That's what this article is about, and it kicks off a new sub-series inside Agentic QA called Working With the Agent.


πŸ—ΊοΈ The Operating Manual: What ai-native-workflow Is

ai-native-workflow is the meta skill that ties every other skill together. Where the deep skills tell the agent how to write a page object or an API test, the meta skill tells the agent how to behave on this scaffold. It's the operating manual for the machine you've been handed.

Think of the system as three layers, each loaded at a different time.

| Layer | What it is | When it loads |
| --- | --- | --- |
| L1: Orchestrator | `CLAUDE.md` - the constitution, the workflow, the skills index | Always |
| L2: Specialized skills | `.claude/skills/{name}/SKILL.md` - deep rules, phased instructions | Triggered by your wording |
| L3: Code conventions | The actual TypeScript - fixtures, page objects, enums, factories | Read on demand from the repo |

You usually don't think about which skill loads. You describe the work in plain language and the meta skill routes the request to the right specialist. The orchestrator is the table of contents, the skills are the brain, the code is the truth.
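
A rough sketch of how the three layers map onto the repository. `CLAUDE.md` and `.claude/skills/` are real; the rest of the tree is illustrative:

```text
CLAUDE.md                      ← L1: always loaded (constitution + skills index)
.claude/skills/
  api-testing/SKILL.md         ← L2: loaded when your wording matches
  page-objects/SKILL.md
  ...
pages/  enums/  fixtures/      ← L3: read on demand - the code is the truth
test-data/
```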

Diagram of the three layers - orchestrator, skills, code - and how each is loaded at the right moment


🀝 The Conversation Contract

This is the single most important habit to internalize. Every non-trivial task on the scaffold follows the same five-step loop, called audit-then-edit.

  1. You state the goal in plain language.
  2. The agent loads the relevant skill and proposes scope - what will change, in which files, why, with trade-offs.
  3. You approve, modify, or reject.
  4. The agent applies the change.
  5. The agent reports what landed and asks whether to commit.

For one-line fixes and obvious typos there's a faster path called direct mode - the agent just does it. You can opt into direct mode for a whole session by saying "just do it" once.

The contract has hard stops baked in. The agent must stop and ask whenever a path, an enum value, a message, or an endpoint is unknown. It must refuse to ship guessed selectors, hardcoded credentials, suppressed test failures, `any` types, XPath, or `page.waitForTimeout(...)`. These aren't preferences. They're refusals.
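
A minimal sketch of what a hard-stop gate looks like in spirit. The patterns and the `refusals` function here are illustrative, not the scaffold's actual implementation:

```typescript
// Illustrative hard-stop list: each entry is a refusal, not a warning.
const FORBIDDEN: Array<{ pattern: RegExp; reason: string }> = [
  { pattern: /page\.waitForTimeout\(/, reason: "fixed waits forbidden - use web-first assertions" },
  { pattern: /\bxpath=|\/\/\w+\[/, reason: "XPath selectors are forbidden" },
  { pattern: /:\s*any\b/, reason: "`any` types are forbidden" },
  { pattern: /\.skip\(/, reason: "suppressing a failing test is a refusal, not a fix" },
];

// Returns the reasons a generated snippet must be refused (empty = clean).
function refusals(generatedCode: string): string[] {
  return FORBIDDEN.filter(({ pattern }) => pattern.test(generatedCode)).map(
    ({ reason }) => reason,
  );
}
```

The point of modeling them as refusals rather than lint warnings: the agent never negotiates its way past them.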

Audit-then-edit is the difference between an agent that helps you and an agent that surprises you.

Diagram of the audit-then-edit loop - five steps cycling between human and AI agent


🧭 How the Right Skill Loads Itself

You don't pick the skill. The way you phrase the task picks it for you. This is the full routing table the meta skill uses to dispatch work.

| You say... | First skill that loads | Then chains to |
| --- | --- | --- |
| "Add tests for POST /api/..." | api-testing | data-strategy, enums, type-safety, debugging |
| "Add a page object for the settings page" | page-objects | selectors, playwright-cli, enums, fixtures |
| "How do I add Y? / Generate the prompt for X" | common-tasks | the matching specialized skill |
| "Test is failing / behaving unexpectedly" | debugging | api-testing, selectors, fixtures, refactor-values |
| "Rename this enum value / change a static-data row" | refactor-values | enums or data-strategy, then debugging |
| "Create a new factory" | data-strategy | type-safety, api-testing |
| "Add a helper / fixture" | helpers or fixtures | api-testing Phase 8 (promotion criteria) |
| "Add an env var / config / utility URL" | config | enums, type-safety |
| "Add an enum / endpoint / message" | enums | playwright-cli for live-text verification |
| "Refactor a Zod schema / any to typed" | type-safety | api-testing |
| "Add a new spec file / tagging question" | test-standards | data-strategy, api-testing, page-objects |

If nothing matches, the agent defaults to common-tasks or asks you. The lesson: describe the work, don't name the skill. Naming a skill is a fallback for when the agent loaded the wrong one.
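
The routing idea fits in a few lines of TypeScript. The `ROUTES` patterns and `routeSkill` function below are an illustrative simplification of the meta skill's matching, not its real code:

```typescript
// Phrasing in, skill name out. First matching pattern wins.
const ROUTES: Array<[RegExp, string]> = [
  [/api tests|POST \/api|GET \/api/i, "api-testing"],
  [/page object/i, "page-objects"],
  [/failing|flaky|unexpected/i, "debugging"],
  [/rename .*enum|static-data/i, "refactor-values"],
  [/factory/i, "data-strategy"],
  [/env var|config/i, "config"],
];

function routeSkill(prompt: string): string {
  const hit = ROUTES.find(([pattern]) => pattern.test(prompt));
  return hit ? hit[1] : "common-tasks"; // fallback: common-tasks, or ask the human
}
```

Notice the fallback is in the code path, not an error: an unmatched prompt still lands somewhere safe.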


πŸ”„ The 7-Phase Rhythm of Every Task

Once a skill loads, the work moves through the same seven phases. The rhythm is the same whether you're adding a test, refactoring an enum, or hunting a flaky failure.

  1. Identify the work category. New artifact, edit, refactor, debug, or investigation.
  2. Explore before generating. playwright-cli for UI, OpenAPI for API, ls pages/ and ls enums/ for repo conventions. If exploration is impossible, the agent stops and notifies you.
  3. Propose scope. What, where, why. You approve before any file changes.
  4. Apply the critical rules from each loaded skill. These are hard stops, not suggestions.
  5. Verify against the skill's checklist - the api-testing coverage matrix, the page-objects fixture registration, the refactor-values tsc + eslint + targeted tests gate.
  6. Run the affected tests. `npx playwright test <file>`, never the full suite. On red, the debugging skill loads.
  7. Commit with a why message. Title imperative and specific, body lists substantive changes, one logical change per commit.

Notice that exploration is phase two, not phase four. The scaffold treats "look before you build" as non-negotiable.
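
The gating between phases can be sketched like this. `runTask` and its flags are hypothetical, but they capture the two stops that matter: no exploration, no generation; no approval, no edits:

```typescript
// The seven phases in order, as named in the article.
type Phase = "identify" | "explore" | "propose" | "apply" | "verify" | "test" | "commit";

const PHASES: Phase[] = ["identify", "explore", "propose", "apply", "verify", "test", "commit"];

// Returns the phases that actually ran before a hard stop (or all seven).
function runTask(canExplore: boolean, approved: boolean): Phase[] {
  const done: Phase[] = [];
  for (const phase of PHASES) {
    if (phase === "explore" && !canExplore) return done; // stop and notify the human
    if (phase === "apply" && !approved) return done;     // no approval, no file changes
    done.push(phase);
  }
  return done;
}
```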

Diagram of the seven-phase task pipeline from identify to commit


🧱 Five Principles That Make the Scaffold AI-Native

Underneath all the skills and phases, five principles do the heavy lifting. They exist so the agent never has to guess.

  1. Single source of truth per value class. URLs and credentials live in `process.env.*`. Endpoint paths and UI messages live in `enums/{area}/`. Universal invalid values live in `test-data/static/util/invalid-values.ts`. Dynamic happy-path data comes from Faker factories in `test-data/factories/{area}/`. There is exactly one right place for every kind of value.
  2. Hard-stop forbidden patterns. Every Critical block in every skill has a list of NEVER rules with concrete anti-examples. They trigger refusal, not warnings.
  3. Mandatory exploration discipline. playwright-cli for UI, OpenAPI or docs first for API. No guessing selectors. No inventing endpoints.
  4. Strict folder discipline. Every artifact has exactly one home. The folder layout maps cleanly to skill names so keyword routing works.
  5. Phased instructions inside skills. You don't invent a workflow per task. You follow the phases the skill already defined.
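
A condensed sketch of the first principle. The names (`ApiEndpoints`, `buildProduct`) are assumptions, and the Faker call is stubbed with a counter so the sketch is self-contained:

```typescript
// enums/{area}/ - endpoint paths and UI messages live here, nowhere else.
enum ApiEndpoints {
  PRODUCTS = "/api/products",
}

// test-data/static/util/invalid-values.ts - universal invalid values.
const INVALID_STRINGS = ["", " ", "<script>alert(1)</script>"];

// test-data/factories/{area}/ - dynamic happy-path data.
// The real repo uses Faker; a counter stands in for it here.
let seq = 0;
function buildProduct(overrides: Partial<{ name: string; price: number }> = {}) {
  seq += 1;
  return { name: `Product ${seq}`, price: 9.99, ...overrides };
}
```

One home per value class is what lets a refactor touch exactly one file.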

The result is consistency. The same prompt, given on a Monday or a Friday, produces the same shape of output.

Diagram of the five principles as pillars supporting an AI-native platform


🎬 A Worked Example: From Prompt to Commit

Say you ask: "Add API tests for POST /api/products."

`CLAUDE.md` is already loaded. The wording routes through common-tasks to api-testing. The agent confirms an OpenAPI spec exists for /api/products, then runs `ls fixtures/api/schemas/`, `ls tests/`, and `ls enums/` to ground itself in the conventions.

It proposes scope: schema name and location, factory name and location, spec structure, the full status-code coverage matrix, the validation tiers. You approve.

It applies the change using `z.strictObject()` from type-safety, `expect(Schema.parse(body)).toBeTruthy()` from api-testing, the `ApiEndpoints.PRODUCTS` enum from enums, and a Faker factory from data-strategy. It tags the spec with `@api` per test-standards. It verifies against the api-testing Phase 5 coverage matrix.

It runs `npx playwright test tests/{area}/api/products.spec.ts`. Green. It commits with the message `Add POST /api/products tests with full coverage matrix` and asks whether to continue.

That's the whole loop. Seven phases, three skills, one approval point.
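
The coverage matrix the agent verifies against can be pictured as data. These rows and the `matrixCovers` check are illustrative, not the api-testing skill's actual list:

```typescript
// Status-code coverage for POST /api/products - rows are illustrative.
const coverageMatrix = [
  { status: 201, scenario: "valid payload from the factory" },
  { status: 400, scenario: "each invalid value from invalid-values.ts" },
  { status: 401, scenario: "missing or bad auth token" },
  { status: 404, scenario: "unknown resource in the path" },
];

// Phase 5 gate: every expected status code needs at least one scenario.
function matrixCovers(expected: number[]): boolean {
  return expected.every((code) => coverageMatrix.some((row) => row.status === code));
}
```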


🚧 Common Gotchas And How to Steer Out of Them

A few things go sideways often enough to be worth naming.

  • The agent generates something off-convention. It loaded the wrong skill. Name the skill in your next prompt: "use the api-testing skill". Then ask it to redo.
  • The agent invents a folder, an enum value, or an env var. Reject it. Require re-verification with ls, the OpenAPI spec, or playwright-cli. The fix is exploration, not regeneration.
  • The agent suppresses a failing test with .skip, raised timeouts, or weakened assertions. Reject the suppression. Require the debugging skill's root-cause phases instead.
  • Cursor and Claude Code give different answers. The .claude/skills/ directory is the canonical source. The mirrors in .cursor/skills/ and .github/instructions/ may lag. When in doubt, defer to .claude/.

The pattern in all four: when the agent goes off the rails, the fix is to push it back into the workflow, not to do the work yourself.


πŸš€ Get Started

You have everything you need to start working with the agent on a real task.

You can find the public README for the scaffold on GitHub: Playwright Scaffold.

You can get access to the private GitHub repository here: Get Access.


πŸ™πŸ» Thank you for reading! This article opens the Working With the Agent sub-series inside Agentic QA. The next ones will go deeper into the daily moments of working with an AI agent on the scaffold, one habit at a time.
