Sound familiar? In the first article, we set up a project scaffold designed for AI. But a good structure only gets you so far if the AI is just a code suggester. Useful, but not transformative. You still have to know what to ask, verify what it wrote, adapt it to your project, and repeat for every file.
In the second article, we saw what makes an AI agent different from a chatbot. It reads your code, takes actions, and works inside your project. But here's the catch: an agent is only as good as the instructions it follows.
In the third article, we saw how CLAUDE.md gives the agent its rules and workflow. But rules without depth only get you so far. "Use the Page Object Model" is a rule, but how exactly do you structure a page object? What's the difference between a locator getter and an action method? How do you compose components into page objects?
In the fourth article, we gave the agent deep expertise through skill files. Now it knows how to build page objects, selectors, and fixtures. But there's still a gap: the agent has never seen your application.
Here's a scenario every automation engineer knows. You ask an AI to generate a page object for your login page. It confidently produces:
```typescript
get loginButton() {
  return this.page.getByTestId('login-btn');
}
```
You run the test. It fails. The button doesn't have that test ID. It never did. The AI made it up because it had no way to know the real structure of your page.
This is the core problem with AI code generation for UI testing: the AI is writing about a UI it has never seen. The result is locators that look generally correct but don't work.
The scaffold's answer to this is a principle called explore first, and a tool called playwright-cli.
🤖 What Is playwright-cli?
playwright-cli is a browser automation CLI that lets the AI agent control a real browser. Navigate to URLs, read the page's DOM, discover element roles and labels, take screenshots, and extract structured information.
When an agent has playwright-cli, it doesn't have to guess what's on your login page. It can go look.
```shell
# The agent runs something like this before generating code
playwright-cli "Navigate to https://myapp.com/login and list all interactive
elements: their role, accessible name, and any associated label text"
```
What comes back is a real inventory of the page:
```yaml
- role: heading
  name: "Sign in"
  level: 1
- role: textbox
  name: "Email address"
  required: true
- role: textbox
  name: "Password"
  required: true
- role: button
  name: "Sign in"
- role: link
  name: "Forgot your password?"
- role: link
  name: "Create an account"
```
Now when the agent writes a page object, it uses real information:
```typescript
get emailInput() {
  return this.page.getByLabel('Email address');
}

get passwordInput() {
  return this.page.getByLabel('Password');
}

get signInButton() {
  return this.page.getByRole('button', { name: 'Sign in' });
}
```
These locators work on the first run. No guessing, no iteration, no debugging brittle selectors.
🗺️ The Explore-First Workflow
The scaffold's CLAUDE.md makes exploration a required step before any code generation:
For UI pages:
1. Use playwright-cli to navigate to the target URL
2. Discover: element roles, accessible names, label text, form structure
3. Note any dynamic content or state-dependent elements
4. Only then: generate the page object
For API endpoints:
1. Make a real request to the endpoint
2. Capture: field names, data types, optional vs required fields
3. Note the exact error response structure
4. Only then: generate the Zod schema
Skip exploration only when the user has already provided the exact structure. For everything else, the agent goes and looks first.
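The "capture field names and data types" step of the API workflow can be sketched in a few lines. This is an illustrative sketch, not part of the scaffold: given a response captured during exploration (the values here are invented), it derives the field-to-type inventory that a schema is then written from.

```typescript
// A response captured during exploration (values invented for illustration).
const captured = {
  id: 42,
  email: "test@example.com",
  token: "eyJ...",
  expiresAt: "2026-03-01T00:00:00.000Z",
};

// Step 2 of the API workflow: map each field name to its observed JSON type.
const inventory = Object.fromEntries(
  Object.entries(captured).map(([key, value]) => [key, typeof value]),
);

console.log(inventory);
// e.g. id -> "number", email -> "string", expiresAt -> "string"
```

The point is that the types come from an observed response, not from the agent's imagination.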
🧩 What the Agent Discovers Beyond Selectors
A browser exploration session doesn't just find locators. A thorough agent also discovers:
Navigation flows. What happens after you click "Sign in"? Where does the page go? What element should the test assert against to confirm success?
Form validation. Does the form validate on blur or on submit? What do the error messages actually say? ("Email is required" or "Please enter your email address"?)
Dynamic content. Is there a loading spinner? A toast notification? An element that only appears after an API call? These affect how the test should wait for state.
Page structure. Is the "Settings" link in a sidebar, a dropdown, or a navigation bar? This determines whether it belongs in the page object or a shared component.
All of this context shapes better tests: tests that reflect how the application actually works, not how the agent imagined it might work.
🌐 Exploration for APIs
The same principle applies to API testing. When no API documentation is available, the agent makes a real request to the endpoint before writing a Zod schema:
```
# The agent calls the actual API
POST /auth/login
{ "email": "test@example.com", "password": "secret" }
```

Response:

```json
{
  "id": 42,
  "email": "test@example.com",
  "firstName": "Test",
  "lastName": "User",
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "expiresAt": "2026-03-01T00:00:00.000Z"
}
```
Now the schema is built from reality:
```typescript
export const LoginResponseSchema = z.strictObject({
  id: z.number(),
  email: z.string().email(),
  firstName: z.string(),
  lastName: z.string(),
  token: z.string(),
  expiresAt: z.string().datetime(),
});
```
Every field name is correct. Every type is verified. And z.strictObject() means that if the API later adds an unexpected field, the test flags it immediately.
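If you want to see what "strict" buys you without pulling in Zod, the core behavior is easy to sketch: a strict parser reports any key it doesn't know about, instead of silently ignoring it. This is a minimal stand-in for illustration, not Zod's implementation:

```typescript
// Minimal sketch of strict-object behavior: any key not declared in the
// contract is reported instead of being silently ignored.
const declaredKeys = ["id", "email", "firstName", "lastName", "token", "expiresAt"];

function unexpectedKeys(response: Record<string, unknown>): string[] {
  return Object.keys(response).filter((key) => !declaredKeys.includes(key));
}

// A response matching the contract produces no findings...
console.log(unexpectedKeys({
  id: 42, email: "t@e.st", firstName: "Test", lastName: "User",
  token: "x", expiresAt: "2026-03-01T00:00:00.000Z",
})); // -> no unexpected keys

// ...while a newly added field is flagged immediately.
console.log(unexpectedKeys({
  id: 42, email: "t@e.st", firstName: "Test", lastName: "User",
  token: "x", expiresAt: "2026-03-01T00:00:00.000Z", role: "admin",
})); // -> "role" is flagged
```

That early warning is exactly what turns a silent contract drift into a failing test.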
⚠️ When Exploration Reveals Surprises
Sometimes what the agent finds is not what you expected, and that's valuable information.
The label on the form says "E-mail" (with a hyphen), not "Email". The button says "Log in", not "Login". The error message says "Your password is incorrect." with a trailing period. These small differences matter for locators and assertions.
Without exploration, the agent would have guessed and been wrong. With exploration, it finds the truth, and so do you.
🧑‍💻 Your Role in the Explore-First Loop
With this workflow, your job shifts. You're not writing locators. You're directing exploration.
Before: You open DevTools, inspect the DOM, copy selectors, paste them, test them, adjust them.
After: You tell the agent which page to explore and what to generate. You review the output, confirm the locators look right, and run the tests.
The agent does the tedious part. You do the thinking part.
🙏🏻 Thank you for reading! The explore-first principle is what makes AI-generated tests reliable instead of plausible. In the final article, we bring everything together: a complete end-to-end example of an agentic QA session, from prompt to passing test.
You can find the Public README.md file for the scaffold on GitHub: Playwright Scaffold
You can get access to the private GitHub repository here: Get Access