<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Darpan Shah</title>
    <description>The latest articles on DEV Community by Darpan Shah (@darpancshah).</description>
    <link>https://dev.to/darpancshah</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3866246%2Fca34fd60-ba60-43ad-853e-120e03bafa4c.PNG</url>
      <title>DEV Community: Darpan Shah</title>
      <link>https://dev.to/darpancshah</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/darpancshah"/>
    <language>en</language>
    <item>
      <title>Agent-Driven E2E Testing with Cypress: A Practical Guide to Harness Engineering with Cursor Subagents</title>
      <dc:creator>Darpan Shah</dc:creator>
      <pubDate>Tue, 07 Apr 2026 22:13:28 +0000</pubDate>
      <link>https://dev.to/cypress/agent-driven-e2e-testing-with-cypress-a-practical-guide-to-harness-engineering-with-cursor-5fob</link>
      <guid>https://dev.to/cypress/agent-driven-e2e-testing-with-cypress-a-practical-guide-to-harness-engineering-with-cursor-5fob</guid>
      <description>&lt;p&gt;Teams have done end-to-end testing deliberately for years: exploring the app, writing tests from what they see, fixing failures in focused sessions. That's skilled work, not guesswork.&lt;/p&gt;

&lt;p&gt;The hard part is usually organizational. Knowledge sits in people's heads or scattered across chat histories and tickets. What you see on a live screen is tough to describe clearly to whoever writes the automated test. Each new flow forces everyone to reload the same context from scratch.&lt;/p&gt;

&lt;p&gt;Agent-driven development doesn't replace that judgment. It packages skilled work into narrow roles (explore, implement, execute, repair) with clear inputs and outputs. Quality builds over time instead of starting from zero every sprint.&lt;/p&gt;

&lt;p&gt;This approach mirrors &lt;strong&gt;harness engineering&lt;/strong&gt;: the system around the agents that makes them reliable, not just capable.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Is a Harness, and Why Does It Matter?
&lt;/h3&gt;

&lt;p&gt;The term "harness" has emerged as shorthand for everything in an AI agent system except the model itself. Put simply: &lt;strong&gt;Agent = Model + Harness&lt;/strong&gt;. Anthropic's engineering team describes the problem this way: "The core challenge of long-running agents is that they must work in discrete sessions, and each new session begins with no memory of what came before." They compare it to a software project staffed by engineers working in shifts, where each new engineer arrives with no memory of what happened on the previous shift. Without structure, agents drift, repeat work, or declare victory too early.&lt;/p&gt;

&lt;p&gt;Their solution? A two-fold approach: an initializer agent that sets up the environment on the first run, and a coding agent that makes incremental progress in every session, while leaving clear artifacts for the next session.&lt;/p&gt;

&lt;p&gt;For coding agents specifically, Martin Fowler's team breaks harness engineering into key components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context engineering&lt;/strong&gt;: Provides us with the means to make guides and sensors available to the agent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architectural constraints&lt;/strong&gt;: Rules that mechanically enforce quality (not just suggestions).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feedback loops&lt;/strong&gt;: The human's job is to steer the agent by iterating on the harness. Whenever an issue happens multiple times, the feedforward and feedback controls should be improved to make the issue less probable or even prevent it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Here's the counterintuitive insight: increasing trust and reliability in AI-generated code requires constraining the solution space rather than expanding it. Narrow roles, explicit handoffs, and clear boundaries make agents more productive, not less.&lt;/em&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  How This Applies to E2E Testing with Cypress
&lt;/h3&gt;

&lt;p&gt;This article describes four agents specialized for &lt;strong&gt;E2E testing&lt;/strong&gt; using Cypress and how they form a closed loop:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;cypress-browser-explorer&lt;/strong&gt;: Maps UI flows with live browser tooling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;cypress-builder&lt;/strong&gt;: Implements specs per team conventions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;cypress-runner&lt;/strong&gt;: Executes tests consistently&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;cypress-debugger&lt;/strong&gt;: Classifies failures and applies fixes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each agent produces a structured artifact (exploration report, spec file, run summary, debug notes) that becomes the input for the next agent. This is the harness in action: each step creates a plan that keeps the next agent on track.&lt;/p&gt;

&lt;p&gt;In Cursor, each of these agents maps directly to a custom &lt;a href="https://cursor.com/docs/subagents" rel="noopener noreferrer"&gt;subagent&lt;/a&gt; -- a markdown file in &lt;code&gt;.cursor/agents/&lt;/code&gt; with a name, description, and focused prompt. The explorer subagent leverages Cursor's built-in &lt;a href="https://cursor.com/docs/agent/tools/browser" rel="noopener noreferrer"&gt;browser tool&lt;/a&gt; to navigate your app, take snapshots, read the live DOM, and capture network activity without leaving the IDE. That means the exploration report isn't hand-written -- it's generated from real page state.&lt;/p&gt;

&lt;p&gt;It seems reasonable that specialized agents like a testing agent, a quality assurance agent, or a code cleanup agent could do an even better job at sub-tasks across the software development lifecycle. That's exactly what this workflow does for E2E automation when Cypress is your tool.&lt;/p&gt;

&lt;p&gt;Evidence from the real UI flows into code. Code gets verified by a standard test run. Failures get handled with clear escalation rules instead of improvisation.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Feedback Loop
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fka7lzijupyz59ydgsmay.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fka7lzijupyz59ydgsmay.png" alt="The feedback loop: Explore → build → run; on failure, debug and re-run; if the UI changed, explore again and rebuild" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The loop in one sentence: Explore → build → run; on failure, debug and re-run; if the UI changed, explore again and rebuild.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This closed loop is where the efficiency gains come from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Less rework&lt;/strong&gt;: Selectors and URLs come from live exploration, not memory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Faster green builds&lt;/strong&gt;: Runner standardizes execution; debugger applies evidence-based fixes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clear escalation&lt;/strong&gt;: Stale DOM leads to re-explore; flaky patterns get documented&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single-test discipline&lt;/strong&gt;: Fix one failure, re-run, then move on&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Four Agents at a Glance
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Primary Inputs&lt;/th&gt;
&lt;th&gt;Primary Outputs&lt;/th&gt;
&lt;th&gt;Must Not&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;cypress-browser-explorer&lt;/td&gt;
&lt;td&gt;Map scoped UI flows using Cursor's browser tool&lt;/td&gt;
&lt;td&gt;URL, steps, ticket scope&lt;/td&gt;
&lt;td&gt;Exploration report with selectors, network map&lt;/td&gt;
&lt;td&gt;Wander outside scope; invent selectors without proof&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cypress-builder&lt;/td&gt;
&lt;td&gt;Implement specs per team rules&lt;/td&gt;
&lt;td&gt;Exploration report&lt;/td&gt;
&lt;td&gt;Spec and support code; handoff to runner&lt;/td&gt;
&lt;td&gt;Skip exploration for unfamiliar pages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cypress-runner&lt;/td&gt;
&lt;td&gt;Execute tests consistently&lt;/td&gt;
&lt;td&gt;Spec path, tags/env&lt;/td&gt;
&lt;td&gt;Pass/fail summary with failure context&lt;/td&gt;
&lt;td&gt;Fix failing tests (send to debugger)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cypress-debugger&lt;/td&gt;
&lt;td&gt;Classify failures, apply fixes&lt;/td&gt;
&lt;td&gt;Failure output, artifacts&lt;/td&gt;
&lt;td&gt;Code changes; handoff to runner or explorer&lt;/td&gt;
&lt;td&gt;Invent selectors when DOM has changed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Important&lt;/strong&gt;: These agents are blueprints, not universal standards. Your stack, auth flow, and naming conventions will differ. Expect to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Edit agent instructions to reference your scripts and config&lt;/li&gt;
&lt;li&gt;Pair agents with project rules (lint, selector policy, test ID format)&lt;/li&gt;
&lt;li&gt;Add or trim steps where your org needs tighter guardrails&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The value is the shape of the workflow and clean handoffs, not a one-size-fits-all prompt.&lt;/p&gt;

&lt;h3&gt;
  
  
  Handoff Templates: Structured Artifacts That Bridge Context
&lt;/h3&gt;

&lt;p&gt;The key insight here was finding a way for agents to quickly understand the state of work when starting with a fresh context window. Structured handoffs are what prevent "context amnesia" between agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explorer → Builder&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Handoff to cypress-builder&lt;/span&gt;

Prompt: "Create cypress/e2e/[feature].cy.js using this exploration report:
&lt;span class="p"&gt;-&lt;/span&gt; Scope source: [quote from ticket/steps]
&lt;span class="p"&gt;-&lt;/span&gt; URL map: [ordered list]
&lt;span class="p"&gt;-&lt;/span&gt; Selector inventory: [element, purpose, selector, stability]
&lt;span class="p"&gt;-&lt;/span&gt; Network map: [method, pattern, suggested alias]"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Builder → Runner&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Handoff to cypress-runner&lt;/span&gt;

Prompt: "Run &lt;span class="nt"&gt;&amp;lt;spec&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt; to verify the new/updated spec."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Runner → Debugger (on failure)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Handoff to cypress-debugger&lt;/span&gt;

Prompt: "Triage these E2E test failures (Cypress):

&lt;span class="gs"&gt;**Failing specs:**&lt;/span&gt; cypress/e2e/&lt;span class="nt"&gt;&amp;lt;spec&amp;gt;&lt;/span&gt;.cy.js

&lt;span class="gs"&gt;**Failures:**&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; [TEST-ID] &lt;span class="nt"&gt;&amp;lt;describe&amp;gt;&lt;/span&gt; &amp;gt; &lt;span class="nt"&gt;&amp;lt;it&amp;gt;&lt;/span&gt;
   Error: &lt;span class="nt"&gt;&amp;lt;message&amp;gt;&lt;/span&gt;
   Screenshot: cypress/screenshots/&lt;span class="nt"&gt;&amp;lt;path&amp;gt;&lt;/span&gt;

&lt;span class="gs"&gt;**Notes:**&lt;/span&gt; &lt;span class="nt"&gt;&amp;lt;auth&lt;/span&gt; &lt;span class="na"&gt;errors&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt; &lt;span class="na"&gt;timeouts&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt; &lt;span class="na"&gt;etc.&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Debugger → Runner (after fix)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Handoff to cypress-runner&lt;/span&gt;

Prompt: "Re-run &lt;span class="nt"&gt;&amp;lt;spec&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt; to verify the fix for [TEST-ID]."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Debugger → Explorer (stale DOM)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Handoff to cypress-browser-explorer&lt;/span&gt;

Prompt: "Re-explore &lt;span class="nt"&gt;&amp;lt;URL&lt;/span&gt;&lt;span class="err"&gt;/&lt;/span&gt;&lt;span class="na"&gt;flow&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt; because selectors are stale for &lt;span class="nt"&gt;&amp;lt;spec&amp;gt;&lt;/span&gt;. Return updated report to builder."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Explorer Report Checklist
&lt;/h3&gt;

&lt;p&gt;When using the explorer agent, require a report that includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scope source&lt;/strong&gt;: Ticket, pasted steps, or URL/feature&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flow summary&lt;/strong&gt;: Scoped path, completion or blocked state&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;URL map&lt;/strong&gt;: Ordered URLs visited&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Selector inventory&lt;/strong&gt;: Element, purpose, selector, stability rating&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network map&lt;/strong&gt;: Method, pattern, suggested intercept alias&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test strategy&lt;/strong&gt;: E2E vs shift-left rationale per scenario&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Notes&lt;/strong&gt;: Gaps, fragile selectors, missing test hooks&lt;/li&gt;
&lt;/ul&gt;
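
&lt;p&gt;To make the selector inventory and network map entries concrete, here is a sketch of how those two sections might look in a returned report. The login page, selectors, endpoint, and alias are all hypothetical placeholders, not output from a real run; the columns simply follow the checklist above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;## Selector inventory (hypothetical login flow)

| Element       | Purpose             | Selector                     | Stability |
|---------------|---------------------|------------------------------|-----------|
| Email input   | Enter login email   | [data-testid="login-email"]  | High      |
| Submit button | Submit login form   | [data-testid="login-submit"] | High      |
| Error banner  | Assert failed login | [role="alert"]               | Medium    |

## Network map (hypothetical)

| Method | Pattern    | Suggested alias |
|--------|------------|-----------------|
| POST   | /api/login | post:login      |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;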




&lt;h3&gt;
  
  
  Steering the Harness: How to Keep Agents Aligned
&lt;/h3&gt;

&lt;p&gt;Rather than personally inspecting what the agents produce, we can make them better at producing it. The collection of specifications, quality checks, and workflow guidance that governs how agents carry out the work is the agent's harness. The emerging practice of building and maintaining these harnesses, Harness Engineering, is how humans work on the loop.&lt;/p&gt;

&lt;p&gt;This is working "on the loop" rather than just "in the loop." You're not micromanaging every output. You're improving the harness so agents naturally produce better results.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practical steps:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scope every request&lt;/strong&gt;: URL, role, numbered steps, or ticket excerpt. The explorer especially needs to know what path to follow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Encode standards in the repo&lt;/strong&gt;: Lint rules, skills files, and agent instructions should match. Otherwise the model follows whatever file it read most recently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use explicit handoffs&lt;/strong&gt;: Paste the structured blocks so the next agent gets data, not a summary.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Review diffs like any PR&lt;/strong&gt;: Generated specs need scrutiny, especially auth, network mocks, and assertions on money or permissions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep secrets out of chat&lt;/strong&gt;: Credentials belong in &lt;code&gt;.env&lt;/code&gt; or your secret manager.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Turn fixes into constraints&lt;/strong&gt;: When an agent makes a mistake, take the time to engineer a fix so that the agent does not make it again: add a lint rule, update the instructions, or create a check.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Review Gates: Keeping Humans on the Loop
&lt;/h3&gt;

&lt;p&gt;Agents execute evaluations automatically, but human oversight remains important for initial calibration and quality validation. Keep humans at judgment points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;After build&lt;/strong&gt;: Review spec structure, selector quality, and assertion coverage before treating a run as final.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;After green&lt;/strong&gt;: Quick coverage and risk check before merge.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;After repeated debug failure&lt;/strong&gt;: If the same failure persists after three fix attempts, escalate to a person.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agents handle the repetitive cycle. Engineers keep the judgment calls.&lt;/p&gt;
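
&lt;p&gt;The third gate is the easiest one to write directly into the debugger's instructions. A minimal sketch, assuming a section like this lives in &lt;code&gt;cypress-debugger.md&lt;/code&gt;; the section title and wording are illustrative rather than copied from a shipped agent file:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;## Escalation Rules

- Fix one failing test at a time, then hand off to cypress-runner to re-run it
- If the failure comes from a changed DOM, do not invent selectors;
  hand off to cypress-browser-explorer for a fresh report
- Track fix attempts per [TEST-ID]. After three failed attempts on the same
  failure, stop and report to the user: the error, the fixes tried, and the
  screenshot paths
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;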




&lt;h3&gt;
  
  
  Team-Owned Content
&lt;/h3&gt;

&lt;p&gt;The harness above doesn't define these items. Your team documents them in skills, rules, or extended agent files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Authentication flows, secrets file layout, and command patterns&lt;/li&gt;
&lt;li&gt;Exact run commands (local vs Docker), CI script names&lt;/li&gt;
&lt;li&gt;Tag/grep filters, base URLs per environment&lt;/li&gt;
&lt;li&gt;Selector policy beyond "prefer stable hooks" (&lt;code&gt;data-*&lt;/code&gt;, roles, aria)&lt;/li&gt;
&lt;li&gt;Test ID formats, coverage scripts, lint and Cypress config conventions&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Why This Approach Works
&lt;/h3&gt;

&lt;p&gt;The principles from Anthropic and Martin Fowler's research explain why the four-agent pattern is effective:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Constraints as multipliers&lt;/strong&gt;: Paradoxically, constraining the solution space makes agents more productive, not less. When an agent can generate anything, it wastes tokens exploring dead ends.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured artifacts bridge context&lt;/strong&gt;: Structured progress files and feature lists let a new agent quickly understand the state of work, analogous to a shift handoff between engineers who've never met.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feedback loops catch issues early&lt;/strong&gt;: Run follows build. Debug follows failure. Re-explore only when needed. This order cuts rework.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clean escalation prevents endless retries&lt;/strong&gt;: If the DOM is wrong, hand to explorer. If three fixes fail, hand to a human. No guessing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The harness evolves&lt;/strong&gt;: Coding agents make it much cheaper to build more custom controls and more custom static analysis. Agents can help write structural tests, generate draft rules from observed patterns, scaffold custom linters, or create how-to guides from codebase archaeology.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Implementing This in Cursor with Subagents and Browser
&lt;/h3&gt;

&lt;p&gt;The four-agent workflow maps to four Cursor subagents: one markdown file per role under &lt;code&gt;.cursor/agents/&lt;/code&gt;, each with YAML frontmatter (name, description, model, and any optional fields you need) plus a focused prompt body. The creation steps are always the same; only the name, description, and instructions change to match explorer, builder, runner, or debugger.&lt;/p&gt;

&lt;p&gt;Below is one example (the browser explorer). The other three files use the identical shape; plug in the responsibilities from the agent table and handoff templates earlier in this article instead of pasting four full prompts here.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cypress-browser-explorer&lt;/span&gt;
&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;inherit&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Explores the application UI using browser tools to discover selectors, network calls, and page flows for Cypress test development. Use when exploring a new feature, finding selectors, mapping user flows, building new tests, or when the user says to explore a page. ALWAYS launch the browser - never assume selectors without navigating and snapshotting.&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

You are a browser exploration specialist for E2E tests using Cypress.

When invoked:
&lt;span class="p"&gt;
1.&lt;/span&gt; &lt;span class="gs"&gt;**Authenticate**&lt;/span&gt; if the target page requires login (see Authentication below)
&lt;span class="p"&gt;2.&lt;/span&gt; &lt;span class="gs"&gt;**Navigate**&lt;/span&gt; to the target URL or flow entry point
&lt;span class="p"&gt;3.&lt;/span&gt; &lt;span class="gs"&gt;**Take a snapshot**&lt;/span&gt; to capture the page structure
&lt;span class="p"&gt;4.&lt;/span&gt; &lt;span class="gs"&gt;**Follow the exploration checklist**&lt;/span&gt; below for every flow

&lt;span class="gu"&gt;## Exploration Checklist&lt;/span&gt;

&lt;span class="gu"&gt;### Page URLs&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Record the entry page URL
&lt;span class="p"&gt;-&lt;/span&gt; Navigate through each step of the flow, recording intermediate URLs
&lt;span class="p"&gt;-&lt;/span&gt; Record the confirmation/success page URL

&lt;span class="gu"&gt;### Selectors (capture in priority order)&lt;/span&gt;
Priority order:
&lt;span class="p"&gt;1.&lt;/span&gt; &lt;span class="sb"&gt;`[data-cy]`&lt;/span&gt;, &lt;span class="sb"&gt;`[data-test]`&lt;/span&gt;, &lt;span class="sb"&gt;`[data-testid]`&lt;/span&gt; -- purpose-built for testing
&lt;span class="p"&gt;2.&lt;/span&gt; Any other &lt;span class="sb"&gt;`[data-*]`&lt;/span&gt; attribute -- stable, not styling-dependent
&lt;span class="p"&gt;3.&lt;/span&gt; Any &lt;span class="sb"&gt;`[test-*]`&lt;/span&gt; attribute (e.g. &lt;span class="sb"&gt;`test-auto`&lt;/span&gt;, &lt;span class="sb"&gt;`test-id`&lt;/span&gt;) -- also for testing
&lt;span class="p"&gt;4.&lt;/span&gt; &lt;span class="sb"&gt;`[role="..."]`&lt;/span&gt;, &lt;span class="sb"&gt;`[aria-label="..."]`&lt;/span&gt;, &lt;span class="sb"&gt;`[aria-labelledby]`&lt;/span&gt; -- semantic/accessible
&lt;span class="p"&gt;5.&lt;/span&gt; &lt;span class="sb"&gt;`label[for="..."]`&lt;/span&gt; + associated input -- form elements
&lt;span class="p"&gt;6.&lt;/span&gt; Stable visible text via &lt;span class="sb"&gt;`cy.contains()`&lt;/span&gt; -- only when text itself is the assertion
&lt;span class="p"&gt;7.&lt;/span&gt; Tag + attribute combos (e.g. &lt;span class="sb"&gt;`input[name="email"]`&lt;/span&gt;) -- last resort

&lt;span class="gs"&gt;**Never use**&lt;/span&gt;: CSS classes, generated IDs, tag names alone, XPath, positional selectors

&lt;span class="gu"&gt;### Network Calls&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Monitor network requests during the flow using browser tools
&lt;span class="p"&gt;-&lt;/span&gt; For each significant API call, record:
&lt;span class="p"&gt;  -&lt;/span&gt; HTTP method and URL pattern
&lt;span class="p"&gt;  -&lt;/span&gt; Suggested intercept alias (e.g., &lt;span class="sb"&gt;`get:cart-items`&lt;/span&gt;, &lt;span class="sb"&gt;`post:place-order`&lt;/span&gt;)
&lt;span class="p"&gt;  -&lt;/span&gt; Whether the response contains data needed for assertions
&lt;span class="p"&gt;-&lt;/span&gt; Pay attention to: auth calls, data fetching, form submissions, redirects

&lt;span class="gu"&gt;## Authentication&lt;/span&gt;

When the target page requires login (e.g. &lt;span class="sb"&gt;`/dashboard`&lt;/span&gt;, &lt;span class="sb"&gt;`/account`&lt;/span&gt;, any page that
redirects to &lt;span class="sb"&gt;`/login`&lt;/span&gt;), authenticate &lt;span class="gs"&gt;**before**&lt;/span&gt; exploring. Never ask the user
for credentials -- resolve them from project files.

&lt;span class="gu"&gt;### Credential Resolution (priority order)&lt;/span&gt;
&lt;span class="p"&gt;
1.&lt;/span&gt; &lt;span class="gs"&gt;**`.env`**&lt;/span&gt; file in the project root -- parse &lt;span class="sb"&gt;`KEY=VALUE`&lt;/span&gt; lines.
&lt;span class="p"&gt;2.&lt;/span&gt; &lt;span class="gs"&gt;**`cypress.env.json`**&lt;/span&gt; in the project root -- parse JSON object.

&lt;span class="gu"&gt;## Handoff to cypress-builder&lt;/span&gt;

Prompt: "Create cypress/e2e/[feature].cy.js using this exploration report:
&lt;span class="p"&gt;-&lt;/span&gt; Scope source: [quote from ticket/steps]
&lt;span class="p"&gt;-&lt;/span&gt; URL map: [ordered list]
&lt;span class="p"&gt;-&lt;/span&gt; Selector inventory: [element, purpose, selector, stability]
&lt;span class="p"&gt;-&lt;/span&gt; Network map: [method, pattern, suggested alias]
&lt;span class="p"&gt;-&lt;/span&gt; Draft spec: [snippet if applicable]
"

&lt;span class="gu"&gt;## Output Format&lt;/span&gt;

Return a structured report:
&lt;span class="p"&gt;1.&lt;/span&gt; &lt;span class="gs"&gt;**Scope source:**&lt;/span&gt; Ticket, pasted steps, or URL/feature
&lt;span class="p"&gt;2.&lt;/span&gt; &lt;span class="gs"&gt;**Flow summary:**&lt;/span&gt; Scoped path, completion or blocked state
&lt;span class="p"&gt;3.&lt;/span&gt; &lt;span class="gs"&gt;**URL map:**&lt;/span&gt; Ordered URLs visited
&lt;span class="p"&gt;4.&lt;/span&gt; &lt;span class="gs"&gt;**Selector inventory:**&lt;/span&gt; Element, purpose, selector, stability rating
&lt;span class="p"&gt;5.&lt;/span&gt; &lt;span class="gs"&gt;**Network map:**&lt;/span&gt; Method, pattern, suggested intercept alias
&lt;span class="p"&gt;6.&lt;/span&gt; &lt;span class="gs"&gt;**Test strategy:**&lt;/span&gt; E2E vs shift-left rationale per scenario
&lt;span class="p"&gt;7.&lt;/span&gt; &lt;span class="gs"&gt;**Notes:**&lt;/span&gt; Gaps, fragile selectors, missing test hooks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Save as &lt;code&gt;.cursor/agents/cypress-browser-explorer.md&lt;/code&gt;. Add &lt;code&gt;cypress-builder.md&lt;/code&gt;, &lt;code&gt;cypress-runner.md&lt;/code&gt;, and &lt;code&gt;cypress-debugger.md&lt;/code&gt; the same way, then invoke with &lt;code&gt;/cypress-browser-explorer&lt;/code&gt; (and so on) or let the parent agent delegate from each file’s description.&lt;/p&gt;
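
&lt;p&gt;As a rough illustration of "the same way", a runner file might look like the sketch below. It reuses the frontmatter shape from the explorer and the responsibilities from the agent table. The &lt;code&gt;npx cypress run --spec&lt;/code&gt; invocation and the wording of each rule are assumptions, so swap in your own scripts, tags, and environment flags.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;---
name: cypress-runner
model: inherit
description: Runs Cypress specs and reports pass/fail results with failure context. Use after a spec is created or changed, or when the user asks to run tests. Never edits specs or support code.
---

You are a test execution specialist for E2E tests using Cypress.

When invoked:

1. **Confirm** the spec path and any tags/env passed in the handoff
2. **Run** the spec (assumed command; replace with your project script):
   `npx cypress run --spec &amp;lt;spec path&amp;gt;`
3. **Summarize** the result using the output format below
4. **Hand off** to cypress-debugger on failure; do not attempt a fix

## Output Format

1. **Specs run:** Paths and pass/fail counts
2. **Failures:** [TEST-ID] &amp;lt;describe&amp;gt; &amp;gt; &amp;lt;it&amp;gt;, error message, screenshot path
3. **Notes:** Auth errors, timeouts, etc.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;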

&lt;p&gt;&lt;strong&gt;Cursor's browser tool powers the explorer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The explorer subagent is where Cursor's built-in browser tool becomes essential. Rather than asking an engineer to describe what's on screen, the agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Navigates&lt;/strong&gt; directly to URLs and follows multi-step flows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Takes snapshots&lt;/strong&gt; of live DOM state, capturing element structure, attributes, and text&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reads selectors&lt;/strong&gt; from the actual page -- &lt;code&gt;data-testid&lt;/code&gt;, ARIA roles, form labels -- instead of guessing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Captures network activity&lt;/strong&gt; to identify API calls that need &lt;code&gt;cy.intercept()&lt;/code&gt; aliases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means the exploration report is evidence-based from the start. Selectors come from the real DOM, not from memory or other sources that may be out of date. When the debugger detects stale selectors and hands back to the explorer, the browser tool re-navigates and captures the current state -- closing the feedback loop with live data.&lt;/p&gt;




&lt;h3&gt;
  
  
  Why subagents fit this workflow
&lt;/h3&gt;

&lt;p&gt;Cursor subagents provide three properties that align with the harness model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context isolation&lt;/strong&gt;: Each subagent gets its own context window. The explorer's noisy DOM snapshots and network logs don't pollute the builder's context. The debugger's stack traces don't crowd the runner. This is the same isolation principle the harness pattern demands.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel execution&lt;/strong&gt;: Multiple subagents can run simultaneously, cutting wall-clock time on multi-spec work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured handoffs&lt;/strong&gt;: A subagent returns a final message to the parent agent. That message is the handoff artifact -- the exploration report, the run summary, the debug notes. The templates in this article become the return format each subagent follows.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  The Orchestration Pattern
&lt;/h3&gt;

&lt;p&gt;The parent agent acts as an orchestrator, coordinating the four subagents in sequence:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Invoke &lt;code&gt;/cypress-browser-explorer&lt;/code&gt; with URL and steps -- get exploration report&lt;/li&gt;
&lt;li&gt;Pass the report to &lt;code&gt;/cypress-builder&lt;/code&gt; -- get spec files&lt;/li&gt;
&lt;li&gt;Hand spec paths to &lt;code&gt;/cypress-runner&lt;/code&gt; -- get pass/fail summary&lt;/li&gt;
&lt;li&gt;On failure, send details to &lt;code&gt;/cypress-debugger&lt;/code&gt; -- get fixes, then back to step 3&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each handoff uses the structured templates from earlier in this article. The parent agent doesn't need deep knowledge of Cypress APIs—it routes data between specialists. This is the same orchestrator pattern Cursor's documentation recommends for complex workflows.&lt;/p&gt;

&lt;p&gt;If you use &lt;a href="https://docs.cypress.io/cloud/integrations/cloud-mcp" rel="noopener noreferrer"&gt;Cypress MCP&lt;/a&gt;, you can also point &lt;code&gt;/cypress-debugger&lt;/code&gt; at MCP tools &lt;strong&gt;to fetch failures from Cypress Cloud&lt;/strong&gt;. The debugger triages, patches the spec or support code, then uses the &lt;strong&gt;Debugger → Runner handoff&lt;/strong&gt; to re-run and stays in that loop until failures are addressed. That keeps run, fail, fetch, fix, re-run inside one workflow.&lt;/p&gt;




&lt;h3&gt;
  
  
  Closing
&lt;/h3&gt;

&lt;p&gt;Treating exploration, implementation, execution, and repair as separate agent roles mirrors how strong teams already work. The harness makes this pattern repeatable and easy to hand off inside the IDE.&lt;/p&gt;

&lt;p&gt;The largest efficiency win is the closed loop: run follows build, debug follows failure, re-explore only when the page structure actually changed.&lt;/p&gt;

&lt;p&gt;The most effective harnesses don't just constrain the agent. They create an environment where the agent naturally produces better output with less correction needed. This is a critical insight. The best harnesses aren't restrictive. They're enabling.&lt;/p&gt;

&lt;p&gt;Since shipping these specialized Cypress agents, I have hardly written tests by hand. The agents produce specs; I review them, merge when they are right, and when something drifts or misfires I adjust the agent definitions, skills, or prompts so the next run is better. The work shifts from typing cy.* to curating the harness -- continuous improvement on the automation itself, not just on individual tests.&lt;/p&gt;

&lt;p&gt;The loop is sequential, but each step stays small: one subagent, one job, less noise in context than doing it all in a single chat.&lt;/p&gt;

&lt;p&gt;Agent-driven development pays off when agents are blueprints you maintain. With Cursor subagents, those blueprints live in your repo as markdown files -- versioned, reviewable, and shared across the team. The browser tool gives the explorer agent direct access to your running app, so the entire loop from live UI to green test stays inside the IDE. Tighten instructions as your app and pipeline evolve. Keep guidance in the loop so automation stays trustworthy, not just clever.&lt;/p&gt;




&lt;h3&gt;
  
  
  References
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Anthropic: &lt;a href="https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents" rel="noopener noreferrer"&gt;Effective Harnesses for Long-Running Agents&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Martin Fowler: &lt;a href="https://martinfowler.com/articles/harness-engineering.html" rel="noopener noreferrer"&gt;Harness engineering for coding agent users&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Cursor: &lt;a href="https://cursor.com/docs/subagents" rel="noopener noreferrer"&gt;Subagents&lt;/a&gt; and &lt;a href="https://cursor.com/docs/agent/tools/browser" rel="noopener noreferrer"&gt;Browser Tool&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>testing</category>
      <category>cursor</category>
      <category>cypress</category>
    </item>
  </channel>
</rss>
