Gergely Szerovay for This is Angular

Originally published at aiboosted.dev

Is This the Future of E2E Testing? How AI Automates Browser Tests from Plain English Requirements

End-to-end testing has always been the gold standard for verifying that your application actually works the way users experience it. But let's be honest: writing and maintaining E2E tests is often a thankless job. The pain points are real and numerous: flaky selectors, timing issues, and brittle test code that breaks with every UI change.

What if instead of writing test code, you could simply describe what you want to test in plain language, and have an AI assistant execute those tests for you, then generate a detailed report of what happened? That's exactly what becomes possible when you combine the Chrome DevTools MCP server with AI-powered testing workflows.

This is Part 2 of our Chrome DevTools MCP series. In Part 1 (Chrome DevTools MCP Server Guide), we explored the fundamentals of the Chrome DevTools MCP server and its tools for browser automation. Now, we're going to put those tools to work in a complete, practical workflow: taking a product requirements document, generating test scenarios, executing them automatically, and producing comprehensive test reports, all orchestrated by your AI assistant.

By the end of this article, you'll understand how to transform requirements documents into executable test scenarios and leverage AI to execute Gherkin-style tests via browser automation.

Let's dive in with a real example: a Tic-Tac-Toe game. I've generated both React and Angular versions of this demo app in the companion GitHub repository, so you can follow along with whichever framework you prefer. The repository contains everything you need: the complete PRD, architecture documentation, generated Gherkin scenarios, and test reports.

The Traditional E2E Testing Workflow (The Old Way)

Before we explore the AI-assisted approach, let's acknowledge how E2E testing typically works and why it's often neglected. The traditional workflow starts with writing specifications or a PRD to document what the application should do. Then you manually create test scenarios by thinking through user flows and edge cases. Next comes the real work: writing Puppeteer, Playwright, or Cucumber Gherkin scripts to translate those scenarios into code, followed by hours of debugging flaky selectors and timing issues. Finally, you maintain that test code as the UI inevitably changes, updating selectors, refactoring tests, and fixing breakages.
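To make the contrast concrete, here's a rough sketch of what a hand-written Playwright test for a Tic-Tac-Toe win condition might look like. The selectors, status text, and port are illustrative, not taken from the companion repository:

```typescript
import { test, expect } from '@playwright/test';

test('X wins with three in a row', async ({ page }) => {
  await page.goto('http://127.0.0.1:4305/');

  // Brittle: these CSS selectors break the moment the markup changes
  const cells = page.locator('.board .cell');
  await cells.nth(0).click(); // X
  await cells.nth(3).click(); // O
  await cells.nth(1).click(); // X
  await cells.nth(4).click(); // O
  await cells.nth(2).click(); // X completes the top row

  await expect(page.locator('.status')).toHaveText('Player X wins!');
});
```

Every renamed class, restructured layout, or timing change is a potential breakage you get to debug by hand.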

Introducing the AI-Assisted Testing Workflow

Here's what changes when you let AI orchestrate your E2E testing using the Chrome DevTools MCP server:

The Five-Step Process

  1. Document requirements: write a PRD (you're doing this anyway)
  2. Generate Gherkin scenarios: AI translates requirements into test scenarios
  3. Execute tests: AI drives the browser via Chrome DevTools MCP, with no test code to write or maintain
  4. Generate test reports: AI documents what happened, step by step
  5. Iterate and maintain: update requirements and regenerate tests as needed

Why This Works

The key insight is that we're working in natural language throughout the entire process. Your requirements get written in natural language (PRD), then translated into natural language test scenarios (Gherkin). The AI orchestrates test execution by understanding these natural language instructions, and finally generates test reports in natural language that anyone can read. There's no translation layer into brittle test code and no fighting with selectors: just clear descriptions of what should happen, executed and verified automatically.

Let's see this in action with our Tic-Tac-Toe example.

Step 1: Foundation: The Product Requirements

Every good testing strategy starts with clear requirements. For our Tic-Tac-Toe game, we have a comprehensive Product Requirements Document (PRD) that lays everything out. You can find the complete PRD in the GitHub repository.

The Product Requirements Document (PRD)

Our PRD covers everything we need to know about how the game should behave:

User Stories organized by category:

  • Core Gameplay: Placing marks, seeing turns, winning, draws
  • Game Management: New game, undo, move counter
  • User Experience: Hover effects, mobile support, accessibility

Functional Requirements (FR-1 through FR-19) that specify exact behaviors:

  • FR-1: The game must support two players: X and O
  • FR-2: Player X always goes first
  • FR-3: Players alternate turns
  • FR-4: A player wins by placing three marks in a row
  • FR-5: The game ends in a draw if all cells are filled
  • FR-6: Cells cannot be overwritten once marked
  • ...and so on

Non-Functional Requirements covering performance, usability, and accessibility:

  • NFR-8: All interactive elements must have proper ARIA labels
  • NFR-9: Game must be fully keyboard navigable
  • NFR-10: Screen readers must be able to announce game state

Why This Documentation Matters

This PRD serves as the single source of truth for both development and testing. When requirements are this clear and structured, AI can extract testable behaviors, generate appropriate test scenarios, understand the expected outcomes, and verify that the implementation matches the specification.

The investment in writing good requirements pays dividends throughout the entire development lifecycle, but especially in testing. Think of it as building a solid foundation: everything else gets easier when you start with clarity.

Step 2: From Requirements to Gherkin Scenarios

Gherkin is a business-readable, domain-specific language for describing software behaviors without detailing how those behaviors are implemented. It uses a simple structure:

```gherkin
Feature: Description of the feature
  Scenario: Description of a specific behavior
    Given [initial context]
    When [action taken]
    Then [expected outcome]
```

If you're new to Gherkin, the Gherkin reference has everything you need to know about the syntax and best practices. But don't worry, you'll pick up the basics just by following along with our examples.

The Translation Process

Here's the simple prompt I used to transform our PRD into executable test scenarios:

```
Create Gherkin feature files based on the PRD (Product Requirements Document) located at:
`tic-tac-toe-react/src/features/tic-tac-toe/PRD.md`

Save the generated feature files to the following directory:
`e2e`

Additionally, create a README.md file in the same `e2e` folder that explains:
- How each feature file maps to specific PRD requirements
- The relationship between test scenarios and product specifications
```

That's all it took. The AI read through the PRD, identified testable behaviors, and organized them into four focused feature files covering everything from basic gameplay to accessibility.

Let me show you how the AI translates requirements into Gherkin. The following examples are from the core-gameplay.feature file in our repository.

From this requirement:

FR-2: Player X always goes first

To this Gherkin scenario:

```gherkin
Scenario: First move is always Player X
  When the game starts
  Then the status should display "Player X's turn"
  And all cells should be empty
  And the move counter should show "0"
```

From this user story:

As a player, I want to click on an empty cell to place my mark, so I can make my move

To this Gherkin scenario:

```gherkin
Scenario: Player places mark in empty cell
  When Player X clicks on cell 0
  Then cell 0 should display "X"
  And it should be Player O's turn
  And the move counter should show "1"
```

Pretty straightforward, right? The AI maintains the intent of each requirement while converting it into a testable format that anyone on your team can read and understand.

For our Tic-Tac-Toe game, the AI organized everything into four feature files covering core gameplay, game outcomes, controls, and accessibility. Rather than walk through all of them in detail, let me show you what the core gameplay scenarios look like; this will give you a good feel for how comprehensive these tests become.

The Traceability Matrix

One of the most valuable outputs is the traceability matrix that links each requirement to its corresponding test scenarios. You can see the complete traceability matrix in the repository:

| Requirement ID | Test Scenario | Feature File | Status |
| --- | --- | --- | --- |
| FR-1 | Two players support | core-gameplay.feature | |
| FR-2 | X goes first | core-gameplay.feature | |
| FR-3 | Turn alternation | core-gameplay.feature | |
| FR-4 | Win detection | game-outcomes.feature | |
| FR-5 | Draw detection | game-outcomes.feature | |
| NFR-8 | ARIA labels | accessibility.feature | |
| NFR-9 | Keyboard navigation | accessibility.feature | |

This matrix ensures complete test coverage and makes it easy to verify that every requirement has been tested.

Step 3: AI Executes Tests Using Chrome DevTools MCP

Now comes the magic: executing these Gherkin scenarios automatically using the Chrome DevTools MCP server. Let's walk through the actual conversation that orchestrated the entire test suite execution.

The Initial Request

Before running tests, you need your application running locally. In our case, we have both React and Angular implementations available in the GitHub repository. To start the React version, run npm start from the tic-tac-toe-react directory; it'll launch on port 4305. For Angular, navigate to tic-tac-toe-ng and use ng serve. Once your app is running, you can begin testing.
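You'll also need the Chrome DevTools MCP server wired up in your AI assistant, which we covered in Part 1. As a quick reminder, a minimal configuration entry usually looks something like this; the exact file name and location depend on which AI assistant you use:

```json
{
  "mcpServers": {
    "chrome-devtools": {
      "command": "npx",
      "args": ["chrome-devtools-mcp@latest"]
    }
  }
}
```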

Here's the prompt I used to kick off the test execution:

```
I have a React app located in the `tic-tac-toe-react` directory that's running on http://127.0.0.1:4305/.

Please use the Chrome DevTools MCP server to:
1. Execute the test steps defined in `e2e/core-gameplay.feature`
2. Generate a test report and save it to `e2e/core-gameplay.report.md`

For each Gherkin step in the report, include a text result indicating whether it passed or failed (and any relevant details).
```

That's it. No test code to write.

Advanced Approach: Using Subagents for Large Test Suites

For complex applications with many test scenarios, there's a more scalable approach using subagents. Instead of executing all test steps in a single AI conversation context, you can delegate each scenario to its own subagent. The main orchestrator agent maintains only the Gherkin file, coordinates subagent executions, and generates the final consolidated report, while each subagent independently runs its assigned scenario's steps.

This architecture provides significant benefits for large test suites. The main agent's context stays clean and focused on orchestration rather than getting cluttered with hundreds of detailed test execution steps. Each scenario executes in isolation with its own fresh context, preventing context pollution and token limit issues. Most importantly, this approach allows you to run comprehensive test suites with many scenarios and steps without hitting context length limitations.

Here's how you structure the prompt for subagent-based testing. The following example is for the Angular app and GitHub Copilot:

```
I have an Angular app located in the `tic-tac-toe-ng` directory
that's running on http://127.0.0.1:4200/.

Please use the Chrome DevTools MCP server to execute the test
scenarios defined in `e2e/core-gameplay.feature`.

For each scenario in the feature file:
1. Use `runSubagent` to execute the test steps for that specific scenario
2. Each subagent should test the scenario independently and return
   a detailed text result indicating whether it passed or failed, including:
   - Pass/fail status for each Gherkin step in the scenario
   - Any relevant details, observations, or error messages
   - Verification data (e.g., DOM state, element content, computed styles)

After all subagents complete:
3. Aggregate all the scenario results and generate a comprehensive test report
4. Save the final report to `e2e/core-gameplay.report-ng-subagents.md`

The report should include the pass/fail status and details for
each scenario returned by the subagents.
```

This subagent approach becomes essential as your test suite grows beyond a dozen scenarios or when individual scenarios involve complex multi-step interactions.

How AI Interprets Gherkin Steps

The AI reads the Gherkin scenarios and translates them into Chrome DevTools MCP tool calls. Let's look at a concrete example from the actual test execution:

Example: Preventing Cell Overwrites

Gherkin:

```gherkin
Given Player X clicks on cell 4
When Player O attempts to click on cell 4
Then cell 4 should still display "X"
And it should still be Player O's turn
```

AI's internal process:

```
1. navigate_page to http://127.0.0.1:4305/
2. take_snapshot → Find cell 4 (uid: "48_8")
3. click uid "48_8"
4. take_snapshot → Verify cell 4 now shows "Cell with X" (disabled)
5. click uid "49_8" → Attempt to click again
   Result: Click timed out (element disabled, click prevented)
6. take_snapshot → Verify state unchanged:
   - Cell 4: Still "Cell with X"
   - Status: Still "Current player: O"
   - Move counter: Still "Moves: 1"
```

The insight: The click timeout actually confirms the test passed! The cell is disabled, so the click cannot complete, which is exactly the expected behavior.

The Power of Accessibility-Driven Testing

Notice how every test relies on the accessibility tree rather than implementation details. This approach provides remarkable stability, since accessibility properties remain unchanged even when you refactor CSS or restructure DOM elements. Tests verify what users actually experience rather than just confirming that certain classes are applied. When your accessibility tree is well-structured, you're simultaneously ensuring your app is accessible to screen reader users. And the snapshots themselves are immediately understandable, showing labels like "Button", "Empty cell", and "Cell with X" that make sense to humans, instead of cryptic selectors or implementation details.
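Getting there mostly means giving every interactive element a meaningful accessible name. As a hypothetical sketch (not the repository's actual component), a React cell might look like this:

```tsx
// Hypothetical cell component: the aria-label is the accessible name
// that shows up in Chrome DevTools MCP snapshots.
type CellProps = {
  index: number;
  value: 'X' | 'O' | null;
  onPlay: (index: number) => void;
};

function Cell({ index, value, onPlay }: CellProps) {
  return (
    <button
      aria-label={value ? `Cell with ${value}` : 'Empty cell'}
      disabled={value !== null} // marked cells can't be clicked again
      onClick={() => onPlay(index)}
    >
      {value}
    </button>
  );
}
```

The take_snapshot tool surfaces exactly these labels, so your tests and your screen reader users see the same thing.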

Step 4: The Test Report: Automated Documentation

After executing all tests, the AI generates a comprehensive test report in Markdown format. Let's examine the structure and value of this automatically generated documentation. You can view the complete test report in the repository.

The report opens with a summary table that immediately tells you the health of your application at a glance.

For each Gherkin step, the report includes the following (see the sketch after this list):

  1. Pass/Fail status (✅ or ❌)
  2. Detailed result description explaining what was observed
  3. Element UIDs referenced during the test
  4. Evidence from Chrome DevTools interactions
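The exact layout depends on your prompt (and any report template you provide), but a generated entry tends to have a shape like this. The content below is a placeholder illustration, not an actual result from the repository:

```markdown
### Scenario: Player places mark in empty cell

| Step                                  | Result  |
| ------------------------------------- | ------- |
| When Player X clicks on cell 0        | ✅ Pass |
| Then cell 0 should display "X"        | ✅ Pass |
| And it should be Player O's turn      | ✅ Pass |
| And the move counter should show "1"  | ✅ Pass |

Evidence: post-click snapshot showed "Cell with X", status
"Current player: O", and move counter "Moves: 1".
```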

Limitations and Considerations

While this AI-assisted testing approach offers significant advantages, let's be realistic about where it fits in your testing strategy.

Standardization for Production Use

For real-world usage in production environments, you'll want to establish consistency across your test artifacts. This means defining report templates that standardize how test results are formatted and presented, making it easier to compare results across different test runs and share reports with stakeholders. You'll also want to establish a coding standard for Gherkin files that defines naming conventions, step patterns, and structural guidelines, ensuring that AI-generated scenarios follow your team's conventions and integrate smoothly with your existing test documentation.

These standards become especially important when multiple team members are generating tests or when you're maintaining a large test suite over time. Consider creating example templates and style guides that you can reference in your prompts to the AI.

Requirements Are Your Foundation

The quality of your tests directly reflects the quality of your requirements. Vague or incomplete PRDs lead to vague or incomplete tests; there's no magic here. This approach shines when your requirements are specific, testable, and well-structured. The good news? The discipline of writing better requirements benefits your entire project, not just testing.

Complex Interactions May Need Detailed Steps

Simple interactions like clicking buttons and filling forms work automatically. But complex scenarios (think multi-step wizards, drag-and-drop operations, or dynamic content loading) often need more granular Gherkin steps. Instead of "When the user completes the checkout flow," you'll want to break it down: "When the user adds items to cart, And navigates to checkout, And fills in shipping address..." The AI can handle complexity, but it needs clear, step-by-step instructions, as in the sketch below.
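As a hypothetical example (this scenario isn't part of the Tic-Tac-Toe suite), a granular checkout flow might read:

```gherkin
Scenario: User completes checkout with a saved payment method
  Given the user has "Wireless Mouse" in the cart
  When the user navigates to the checkout page
  And fills in the shipping address form
  And selects the saved payment method
  And confirms the order
  Then the order confirmation page should be displayed
  And the cart should be empty
```

Each step is small enough for the AI to map onto a handful of MCP tool calls and verify independently.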

Human Review Still Matters

AI can generate tests from your documented requirements, but it won't catch missing requirements, undocumented edge cases, or critical integration points you forgot to mention. A human reviewer should verify test coverage, ensure both happy paths and error cases are tested, and confirm that scenarios match real-world usage patterns.

Complement, Don't Replace

E2E tests through Chrome DevTools MCP excel at verifying user-facing behaviors in the actual browser. They complement but don't replace unit tests for isolated functions, integration tests for component interactions, or API tests for backend functionality. Think of this as the top layer of your testing pyramid (or testing trophy); you still need the foundation underneath.

About the Author

My name is Gergely Szerovay, and I've worked as a data scientist and full-stack developer for many years. Currently, I work as a frontend tech lead focusing on Angular-based development. As part of my role, I'm constantly following how Angular and the frontend development scene in general are evolving.

To share my knowledge, I started the Angular Addicts monthly newsletter and publication in 2022, sending subscribers the best resources each month. Whether you're a seasoned Angular developer or just getting started, I've got you covered.

In the past year, with the rise of generative AI, our software development workflows have evolved rapidly. To closely follow this evolution, I decided to start building AI tools in public and publish my progress on AIBoosted.dev.


Learn more about Angular, TypeScript, React, and how to build AI-assisted development workflows that make you more productive!
