DEV Community: Denis Skvortsov

Selective test execution mechanism with Playwright using GitHub Actions

Denis Skvortsov — Sat, 24 May 2025 09:49:04 +0000

TL;DR

Selective test execution: run only the tests related to the actual code changes.

Saves time and resources: speeds up the process and reduces CI/CD load, especially in cloud environments.

Works for both monorepos and split repositories: the solution fits projects with either a monorepo or separate frontend/backend repositories.

Uses GitHub Actions and Playwright: configures CI/CD to filter tests by tags and run only the relevant ones.

Example implementation: available in a public GitHub repository.

Introduction

When I first faced the task of setting up testing, I realized that running the entire test suite after every code change is not just slow - it's wasteful, especially in large projects where microservices, frontend, and backend each have their own test sets. In such setups, running all tests for every commit or PR is unnecessary and significantly slows down the development cycle.

For example, if a frontend developer changes a button or a small UI component, why would we run all backend tests? Or if a backend developer updates a single endpoint, there’s no point in triggering all UI tests - they’re completely unrelated. In those cases, tests become unnecessary overhead that slows down progress.

In some teams, test success is a hard requirement before merging code. If you can’t merge until all tests pass - but your change affects only a small part of the system - triggering all tests becomes a bottleneck.

To solve this, I implemented selective test execution - running only the tests that are actually affected by code changes. This approach helps save both time and infrastructure resources, making the testing and release process faster. In this article, I’ll share my experience and show how to set up such a mechanism using Playwright and GitHub Actions - whether you’re working in a monorepo or with separate frontend/backend repositories.

What problem are we solving?

Running the full e2e test suite on every code change is not always justified. In projects that include frontend, backend and shared modules, this often leads to problems like:

Developers waiting for CI feedback longer than it takes to write the fix itself
CI infrastructure usage skyrockets (and if you’re in the cloud - so do the costs)
Tests are triggered that have nothing to do with the actual changes

Typical scenarios:

A frontend developer changes a visual component, but all backend test chains are executed
A backend developer tweaks business logic, and the entire UI test suite runs too
A shared utility is updated, and suddenly no one knows whether to run a full regression or not

It gets even worse when tests are a mandatory check before merging. Even a minor change can block the entire process - while dozens or even hundreds of unrelated tests are waiting to complete.

My goal wasn’t just to speed up testing. I wanted CI to run only what truly matters. Automatically. Without manual rules or exception lists.

How I implemented it

To demonstrate how you can run only the relevant e2e tests, I put together a small monorepository with a simple but flexible architecture. This isn’t a production-ready setup - it’s a demo project, intentionally simplified to make the mechanism easy to understand and adapt to any real-world structure.

My main goals were:

Everything should work transparently - no magic config files
The implementation should be reusable - something you can easily apply to another project
The structure should remain flexible - easy to extend with new services without rewriting the pipeline

Project structure:

├── frontend/                    # Frontend applications
│   └── apps/                    # Frontend microservices
│       ├── microservice1/
│       ├── microservice2/
│       └── shared/              # Common frontend components
│
├── backend/                     # Backend services
│   └── apps/                    # Backend microservices
│       ├── microservice3/
│       ├── microservice4/
│       └── microservice5/
│
├── .github/                     # GitHub Actions
│   ├── workflows/               # CI/CD configuration
│   │   └── e2e-runner.yml       # Selective test runner
│   └── preconditions/           # Reusable actions
│       └── e2e/                 # E2E tests environment setup
│           └── action.yml       # Composite action for environment setup
│
└── tests/                       # Tests
    └── e2e/                     # E2E tests

Everything is split into logical areas - just like in real projects. There's frontend, backend, a shared area and a dedicated folder for e2e tests. The entire pipeline runs on GitHub Actions. Dependencies are installed using a reusable precondition action. The core logic is tag-based.

The tests themselves are intentionally primitive - because this article isn’t about test coverage or scenarios, it’s about the mechanism for running only what matters. Everything else is just context.

1. Tagging tests

To determine which tests should run, I use tags directly in the test definitions. The logic is simple: if a PR changes apps/microservice3, the CI system looks for e2e tests tagged with @apps/microservice3 and runs only those.

Each tag is tied to a specific microservice or module. For example:

  test('Test 3', { tag: '@apps/microservice3' }, async ({ ui }) => {
    await ui.google.goto();
    await ui.google.openAppsMenu();
    await ui.google.assertServiceVisible('Maps');
    await ui.google.assertServiceVisible('Gmail');
  });

If microservice3 is affected, this test will be included. If another service is changed, it will be skipped, even if it's in the same file.

I chose the format @apps/<service-name> for two reasons:

It matches the actual project structure.
It’s easy to extract from file paths using grep and sed.

Another example:

  test('Test 1', { tag: '@apps/microservice1' }, async ({ ui }) => {
    await ui.google.goto();
    await ui.google.openAppsMenu();
    await ui.google.assertServiceVisible('YouTube');
    await ui.google.assertServiceVisible('YouTube Music');
  });

If a test is not tagged, it won’t be picked up during selective test runs. For example:

test('Test 5', async ({ ui }) => {
  await ui.google.goto();
  await ui.google.openAppsMenu();
  await ui.google.assertServiceVisible('Calendar');
});

This test will only run during a full test execution (for example, when shared or tests/e2e changes are detected).

Tagging is a core part of this mechanism. It doesn’t require additional tooling and can be maintained manually if needed.

Playwright docs on tags: Test annotations

2. Analyzing changes and mapping to tags

The next step is to determine what exactly has changed in the PR and which tests should be triggered. This is done in GitHub Actions using a simple git diff.

What we look for:

Changes in paths like apps/microserviceX → means a specific service was affected
Changes in shared → means potentially all services are affected
Changes in tests/e2e → likely the tests themselves were modified, so the full suite should be executed

Here’s a simplified example from the Find changes step:

changed_apps=$(git diff --name-only origin/main HEAD | grep -E "/apps/[^/]+/" || true)
changed_test_files=$(git diff --name-only origin/main HEAD | grep -E "^tests/e2e/" || true)

Then we normalize the paths. For example, frontend/apps/microservice1/pages/page.tsx becomes apps/microservice1, which maps directly to the tag @apps/microservice1.

full_paths=$(echo "$changed_apps" | sed -E 's#^(.*/apps/[^/]+)/.*#\1#' | sort -u | paste -sd "|" -)
test_paths=$(echo "$full_paths" | tr '|' '\n' | sed -E 's#.*(apps/[^/]+)#\1#' | paste -sd "|" -)
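If the sed expressions are hard to read, the same normalization can be sketched in TypeScript. This is a hypothetical helper (changedScope is not part of the workflow), assuming the changed paths are already available as an array, for example the lines printed by git diff --name-only. It follows the same steps: keep paths that contain /apps/<name>/, cut them after the service folder, sort and deduplicate, then strip the frontend/ or backend/ prefix and join the results with |.

```typescript
// Hypothetical helper mirroring the grep + sed pipeline from the "Find changes" step.
// Input: the lines printed by `git diff --name-only origin/main HEAD`.
// Output: a "|"-separated scope such as "apps/microservice4|apps/microservice1".
function changedScope(changedFiles: string[]): string {
  // Same filter as grep -E "/apps/[^/]+/": an "apps" segment followed by a service folder.
  const appFiles = changedFiles.filter((file) => /\/apps\/[^/]+\//.test(file));

  // Same as sed -E 's#^(.*/apps/[^/]+)/.*#\1#': cut the path right after the service folder.
  const servicePaths = appFiles.map((file) => file.replace(/^(.*\/apps\/[^/]+)\/.*/, "$1"));

  // Same as sort -u on the full paths (code-unit order rather than locale collation).
  const uniqueServicePaths = Array.from(new Set(servicePaths)).sort();

  // Same as sed -E 's#.*(apps/[^/]+)#\1#': drop the frontend/ or backend/ prefix.
  const scopes = uniqueServicePaths.map((path) => path.replace(/.*(apps\/[^/]+)$/, "$1"));

  // Unlike the bash version, deduplicate again in case two areas share a service name.
  return Array.from(new Set(scopes)).join("|");
}
```

For example, changes to frontend/apps/microservice1/pages/page.tsx and backend/apps/microservice4/src/handler.ts produce apps/microservice4|apps/microservice1, the same order sort -u gives in the bash version.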

Then what happens:

If the changes include shared or tests/e2e, we skip filtering and run all tests
If only specific services were changed, we convert the paths into tags, which are passed to Playwright via --grep

The result is saved into a GitHub Actions output variable called test_scope, which is used in later steps:

echo "test_scope=$test_paths" >> $GITHUB_OUTPUT

If test_scope ends up empty, it means either:

No relevant changes were found
Or no tests are tagged to match the changes

In such cases, you can either skip test execution or fall back to running the full suite - depending on your project’s policy.
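The same decision can be expressed as a small pure function, which makes the policy easy to unit-test outside CI. This is a hypothetical TypeScript sketch, not code from the workflow: planTests and TestPlan are illustrative names, and it encodes this article's policy that an empty scope falls back to the full suite. One deliberate difference: it checks for an apps/shared entry exactly, whereas grep "shared" in the workflow would also match a service whose name merely contains that word.

```typescript
// Hypothetical representation of the two test jobs in the workflow.
type TestPlan =
  | { kind: "all" }
  | { kind: "selective"; scope: string };

// scope: "|"-separated entries such as "apps/microservice1|apps/microservice4".
// changedE2eFiles: paths under tests/e2e/ touched by the PR.
function planTests(scope: string, changedE2eFiles: string[]): TestPlan {
  const entries = scope.split("|").filter((entry) => entry.length > 0);

  // Shared code or edited tests mean filtering is skipped.
  const touchesShared = entries.includes("apps/shared");
  const touchesTests = changedE2eFiles.length > 0;

  // An empty scope falls back to the full suite (the policy used in this article).
  if (touchesShared || touchesTests || entries.length === 0) {
    return { kind: "all" };
  }
  return { kind: "selective", scope: entries.join("|") };
}
```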

3. Running tests with `grep`

Once we’ve determined which parts of the code have changed and built a list of tags, the next step is simply passing those tags to Playwright. For that, we use the built-in --grep flag, which filters tests by a regular expression matched against each test's title, including its tags.

Example run:

If the PR affects both apps/microservice1 and apps/microservice4, the resulting scope looks like this:

test_scope=apps/microservice1|apps/microservice4

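Then the tests are run like this (the workflow below passes the scope without the @ prefix, which also works because --grep is a regular expression that only needs to match part of a test's title and tags):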

npx playwright test --grep "@apps/microservice1|@apps/microservice4" || true
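One thing to watch for: --grep is an unanchored regular expression, so @apps/microservice1 also matches a test tagged @apps/microservice10. According to the Playwright docs for testConfig.grep, the pattern is tested against a string made of the project name, file name, describe titles, test title and tags, separated by spaces. Assuming that format, a hypothetical helper like grepPatternForScope can build a pattern that only matches whole tags:

```typescript
// Escape characters that have a special meaning in regular expressions.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: turn "apps/microservice1|apps/microservice4" into a --grep pattern
// that matches whole tags only, e.g. @apps/microservice1 but not @apps/microservice10.
function grepPatternForScope(scope: string): string {
  const alternatives = scope
    .split("|")
    .filter((entry) => entry.length > 0)
    .map(escapeRegExp);
  // "@" marks the start of the tag; the lookahead requires whitespace or end of string after it.
  return `@(?:${alternatives.join("|")})(?=\\s|$)`;
}
```

For the scope above it returns @(?:apps/microservice1|apps/microservice4)(?=\s|$). Quote the pattern when passing it on the command line (single quotes are safest), since characters like ( and | would otherwise be interpreted by the shell.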

Why || true?

Because --grep might not match any tests at all - for example:

The service is new and doesn’t have tests yet
The change is minor and not yet covered
Tests exist but lack proper tags

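In these cases, we don’t want the CI to fail. It’s okay to skip test execution if no relevant tests were found. Failure should only happen when tests exist and they fail - not when there are simply no tests to run. Keep in mind, though, that || true swallows every non-zero exit code, including genuine test failures, so a selective run stays green even when the selected tests fail. If you only want to tolerate the "no tests found" case, Playwright's --pass-with-no-tests flag is a narrower alternative.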

When do we run all tests?

If the changes include any of the following:

A shared directory (frontend or backend)
Files in tests/e2e
Or if test_scope is completely empty (depending on your policy)

We simply skip filtering and run the full suite:

if: needs.detect-changes.outputs.test_scope == ''

And then:

npx playwright test

All of this is wrapped in a pipeline with three jobs: detect-changes computes test_scope, and then only one of selective-tests and all-tests runs, depending on whether test_scope is empty.

What we get in the end

After implementing selective test execution on GitHub Actions, the benefits were immediately obvious - both in terms of human effort and infrastructure usage.

Faster builds
CI no longer runs the entire test suite for every little change. If only one microservice is affected, only its e2e tests are triggered. As a result:

Tests run significantly faster
You get feedback almost immediately
Releases are no longer blocked by irrelevant checks

Resource savings
If you're running tests in the cloud, CI/CD can get expensive - especially when all tests are triggered for every pull request. Selective execution helps save actual money by avoiding unnecessary resource usage.

Easy to maintain
You don’t need to manually build complex test mappings. The entire system is driven by:

A simple tag in each test (@apps/xxx)
git diff and bash (2 lines of code)
grep

The approach is flexible and adaptable to any structure. You can filter by modules, features, user flows, folders, roles - whatever makes sense in your project.

Works out of the box
This mechanism performs well in:

Monorepos
Projects with split frontend and backend
Any project where e2e tests are a mandatory merge requirement

Want real-world scenarios?
You can explore common situations - shared modules, new services, test changes, combinations - in the project’s README.

Example GitHub Actions workflow

name: E2E Tests
on:
  pull_request:
    branches: [ main ]

jobs:
  detect-changes:
    runs-on: ubuntu-latest
    outputs:
      test_scope: ${{ steps.scope.outputs.test_scope }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Find changes
        id: changes
        run: |
          echo "🔄  Analyzing changes in PR..."
          changed_apps=$(git diff --name-only origin/main HEAD | grep -E "/apps/[^/]+/" || true)
          changed_test_files=$(git diff --name-only origin/main HEAD | grep -E "^tests/e2e/" || true)

          echo "📦 Files changed:"
          [ -z "$changed_apps" ] && echo "  No changes in files" || echo "$changed_apps" | sed 's/^/  /'

          echo "🧪 E2E tests:"
          [ -z "$changed_test_files" ] && echo "  No changes in e2e tests" || echo "$changed_test_files" | sed 's/^/  /'

          echo "✨ Affected services:"
          full_paths=$(echo "$changed_apps" | sed -E 's#^(.*/apps/[^/]+)/.*#\1#' | sort -u | paste -sd "|" -)
          [ -z "$full_paths" ] && echo " No changes in services" || echo "$full_paths" | tr '|' '\n' | sed 's/^/  /'

          test_paths=$(echo "$full_paths" | tr '|' '\n' | sed -E 's#.*(apps/[^/]+)#\1#' | paste -sd "|" -)
          test_files=$(echo "$changed_test_files" | paste -sd "|" -)

          echo "test_paths=${test_paths}" >> $GITHUB_OUTPUT
          echo "changed_test=${test_files}" >> $GITHUB_OUTPUT

      - name: Check shared modules and modified e2e tests
        id: shared
        env:
          # Read step outputs from env vars instead of interpolating them into the script,
          # so unusual file names in a PR can't be executed as shell code
          TEST_PATHS: ${{ steps.changes.outputs.test_paths }}
          CHANGED_TEST: ${{ steps.changes.outputs.changed_test }}
        run: |
          test_paths="$TEST_PATHS"
          changed_test="$CHANGED_TEST"

          echo "🔄 Checking shared modules and e2e tests:"

          has_shared=$(echo "$test_paths" | tr '|' '\n' | grep -q "shared" && echo "true" || echo "false")
          has_e2e_changes=$([ ! -z "$changed_test" ] && echo "true" || echo "false")

          $has_shared && echo "⚠️ Changes in shared modules detected" && echo "$test_paths" | tr '|' '\n' | grep "shared" | sed 's/^/  /'
          $has_e2e_changes && echo "⚠️ Changes in e2e tests detected" && echo "$changed_test" | tr '|' '\n' | sed 's/^/  /'

          { $has_shared || $has_e2e_changes; } && test_paths="" || echo "✅ No changes in shared modules or e2e tests"

          echo "test_paths=$test_paths" >> $GITHUB_OUTPUT

      - name: Set final scope
        id: scope
        env:
          # Same approach: pass the previous step's output through an env var
          TEST_PATHS: ${{ steps.shared.outputs.test_paths }}
        run: |
          test_paths="$TEST_PATHS"
          echo "🔄 Result:"
          [ -z "$test_paths" ] && echo " ✅ All tests will be run" || {
            echo " ✅ Running tests for:"
            echo "$test_paths" | tr '|' '\n' | sed 's/^/    /'
          }
          echo "test_scope=$test_paths" >> $GITHUB_OUTPUT

  selective-tests:
    needs: detect-changes
    if: needs.detect-changes.outputs.test_scope != ''
    timeout-minutes: 60
    runs-on: ubuntu-latest
    container:
      image: mcr.microsoft.com/playwright:v1.52.0
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/preconditions/e2e
      - name: Run selective tests
        env:
          # The scope is derived from PR file names too, so read it from an env var
          TEST_SCOPE: ${{ needs.detect-changes.outputs.test_scope }}
        run: npx playwright test --grep "$TEST_SCOPE" || true

  all-tests:
    needs: detect-changes
    if: needs.detect-changes.outputs.test_scope == ''
    timeout-minutes: 60
    runs-on: ubuntu-latest
    container:
      image: mcr.microsoft.com/playwright:v1.52.0
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/preconditions/e2e
      - name: Run all tests
        run: npx playwright test

Can you run only the tests that were changed?

This is a common question that usually comes up first:
"If we already know which files have changed - why not run only the tests that were also modified?"

At first glance, it sounds reasonable. But in practice it’s not always a good idea.

Why I didn’t go down that path

In my setup, I run all tests related to the changed components, not just the .spec.ts files that were edited in the PR. The reason is simple - in some projects, tests are not properly isolated.

A typical example:

one test modifies some data,
another test relies on the state left behind by the first one,
or there’s a shared setUp that affects the behavior of all tests.

This is an anti-pattern, of course, but it's still common - especially in older or fast-growing projects.

If you run only the modified test files, you can easily end up with a green build - even though the logic is actually broken.
That’s why I went with a safer approach:
we filter by the affected areas, but within that area, we run all the tests - even if the test files themselves weren't changed.

What about `--only-changed`?

Playwright does support an --only-changed flag, which runs only the test files that changed relative to a git ref such as main.
It can be helpful as a temporary solution or for small PRs.

But it’s important to understand its limits: the flag only knows about test files and the files they import.
It picks up changed test files and, according to Playwright's release notes, test files that import a changed file, but it has no way of knowing which application code a test exercises in the browser.

So if you modify a component in frontend/apps/microservice1, --only-changed won’t pick up the tests tagged @apps/microservice1, because neither those test files nor anything they import has changed.

What you can do if you want to take it further

If your tests are well-isolated and your project architecture is clean, it’s possible to run only those tests that are truly affected by a change.

At the git diff stage, you can track not only changes in app code, but also updates to shared modules or e2e utilities. To identify which tests depend on those changes, you can build a dependency graph between source files and test files - using tools like ts-morph or dependency-cruiser. This graph reveals which .spec.ts files import or transitively rely on the modified code.
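As a rough illustration of the reverse lookup, here is a minimal TypeScript sketch. It assumes you already have the import graph as a map from each file to the files it imports directly (extracting it with ts-morph or dependency-cruiser is not shown); affectedSpecs and the sample paths are hypothetical. Starting from the changed files, it walks the graph backwards and collects every .spec.ts file that depends on them directly or transitively.

```typescript
// Hypothetical import graph: file -> files it imports directly.
type ImportGraph = Record<string, string[]>;

// Return every .spec.ts file that depends on one of the changed files, directly or transitively.
function affectedSpecs(graph: ImportGraph, changedFiles: string[]): string[] {
  // Invert the graph: file -> files that import it.
  const importers = new Map<string, string[]>();
  for (const [file, imports] of Object.entries(graph)) {
    for (const imported of imports) {
      const list = importers.get(imported) ?? [];
      list.push(file);
      importers.set(imported, list);
    }
  }

  // Breadth-first search from the changed files along "is imported by" edges.
  const visited = new Set<string>(changedFiles);
  const queue = [...changedFiles];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const importer of importers.get(current) ?? []) {
      if (!visited.has(importer)) {
        visited.add(importer);
        queue.push(importer);
      }
    }
  }

  // Changed spec files are in the starting set, so they count as affected too.
  return Array.from(visited).filter((file) => file.endsWith(".spec.ts")).sort();
}
```

With a graph where login.spec.ts imports helpers/auth.ts and search.spec.ts imports a page object that imports helpers/auth.ts, a change to helpers/auth.ts selects both specs.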

This approach works best when your test structure is modular and the dependency graph is accurate and regularly maintained. Without that, the risk of silently skipping important tests increases.

That’s why in many cases, it's safer and more predictable to run all tests within the affected scope - even if only a small change was made.
It keeps things simple and reduces the chance of hidden regressions in CI.

Conclusion

Selective execution of e2e tests isn’t a silver bullet - but it’s an effective way to reduce build time, lower CI load, and speed up the development cycle. It’s especially valuable in projects where tests are a required condition for merging, and every minute of CI time matters.

All you need is to:

tag your tests appropriately
analyze changes in the PR
and run tests using the --grep filter

The solution is simple, flexible and reusable. You can apply it in monorepos, in projects with separate frontend and backend repos or even as a shared internal standard within your team.

If you want to take this further - like running only the truly modified tests - that’s also possible if your architecture allows it.

If you’ve solved a similar problem differently - I’d love to hear how you approached it. Feel free to share your solutions.

Practical use of Cursor and MCP Playwright in test automation

Denis Skvortsov — Wed, 23 Apr 2025 11:12:02 +0000

Introduction
This article is not a documentation review. It's my personal experience working with the Cursor and Playwright MCP tools for frontend test automation (JavaScript/TypeScript). I want to share how these tools truly help in everyday work, especially when it comes to writing or improving automated tests.
Test automation is evolving quickly, and perhaps in the near future we will be able to automate tests almost by sheer willpower. For now, I'll share what I've already found. Many developers use AI tools such as GitHub Copilot for code generation, but for me, Cursor combined with Playwright MCP provides much more context. These tools are far from perfect (sometimes they change code that didn't need changing, or don't work exactly as expected), but they are still a powerful addition to the process.
In this article, I will show how to use Cursor and Playwright MCP in practice, with test examples and explanations of where and how these tools help, as well as situations where they are better avoided.
Here are the key points I would like to highlight in this article that may be useful when working with Cursor and MCP Playwright:
Cursor can add data-testid attributes to the frontend code, or getByRole locators to the tests, based on a screenshot of the page where you highlight the area to change. This is especially useful for beginner automation engineers.
MCP Playwright is useful for properly labeled pages, where elements have accessible attributes (e.g., data-testid, aria-label).
MCP Playwright is ideal for writing simple scripts that do not require complex preconditions and works well for basic E2E testing.
If you're using Cursor and MCP Playwright together, the best solution is to organize a monorepository, where both the frontend and automated tests are located in one repository.

Additionally, it's important to remember that while these tools speed up the process, they do not replace engineering work, especially for more complex scenarios and integration tests.

Test Architecture: Basic principles

Before implementing MCP Playwright, it's important to revisit your testing pyramid and define what e2e tests mean in your context. The testing pyramid can vary for different teams and projects. E2E and integration tests can overlap, but MCP Playwright is useful only for simple scenarios.

My tests and examples will be based on the website google.com, and I will use TypeScript for writing the tests. Google provides good accessibility for element recognition through MCP Playwright, which allows easy creation of stable tests for basic operations, such as searching on the page. I will also use the test architecture from the article Simple and Effective E2E Test Architecture for Playwright.

How MCP Playwright and Cursor fit different types of tests:
E2E Tests: MCP Playwright is ideal for simple e2e tests where you need to check UI interactions, especially if the page is properly labeled. For example, checking a button or an input field. Here, MCP automatically generates code with the correct locators, making test writing easier. However, it's important to remember that MCP Playwright works best with simple scenarios.
Integration Tests: For integration tests where you need to interact with both the UI and the API, traditional automation methods are better. Cursor can help with generating templates and structuring mocks, but the logic of interacting between multiple modules and handling errors requires an engineering approach.
Unit Tests: Unit tests remain the responsibility of developers, but for automation engineers, Cursor can be a useful tool for speeding up test creation, especially for typical cases.

Example of using MCP Playwright in e2e tests
Open Google
Search for "Playwright MCP"
Verify that the search results contain the text "Model Context Protocol"
Open Google again
Search for "Playwright automation"
Verify that the search results contain the word "testing"

This test interacts solely with the UI. For such simple scenarios, MCP Playwright will generate the necessary code, ensuring stable and fast test execution.

Integration Tests: An AQA Engineer's perspective
When it comes to more complex integration tests, Cursor can provide a test template and a structure for mocks, but for complex logic, such as error handling or interaction between multiple modules, you need to apply engineering thinking.
Example
Cursor generated a test for a successful scenario, but it didn't account for the negative cases related to errors, so I had to refine it manually and add error handling.

This is an example of how Cursor helps with templates, but you still need to tweak it for more complex scenarios. Also, Cursor tends to overcomplicate solutions, so it's important to monitor what it generates and double-check its work.
Cursor works well in monorepositories and can reuse existing UI and API methods. To limit the complexity of the generated code, you can use Cursor rules (see the Cursor Rules documentation).
Of course, this is not a solution to all problems, but it can significantly simplify your work.

Unit tests: How Cursor helps speed up test creation
Unit tests are primarily the responsibility of developers, but as an automation engineer, you can also get involved, especially if you actively participate in writing code and understand its logic. Cursor helps generate tests quickly, saving time, especially for common cases.
Example from Practice
With Cursor, I generated several tests for functions like formatCurrency and parseQuery.

Cursor suggested edge cases like null, NaN, and empty strings, which are often forgotten in tests. This saves time, reduces human error, and helps make tests more comprehensive.
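To illustrate the kind of edge cases mentioned above, here is a minimal sketch. The formatCurrency implementation and its contract (invalid input such as null, undefined, NaN or an empty string returns an empty string) are hypothetical assumptions for this example, not the project's actual function, and the checks use a tiny assertion helper instead of a test framework so the snippet runs on its own.

```typescript
// Hypothetical function under test: formats a number as US dollars.
// Assumption: invalid input (null, undefined, NaN, empty or non-numeric string) yields "".
function formatCurrency(value: number | string | null | undefined): string {
  if (value === null || value === undefined) return "";
  const amount = typeof value === "string" ? (value.trim() === "" ? NaN : Number(value)) : value;
  if (Number.isNaN(amount)) return "";
  return new Intl.NumberFormat("en-US", { style: "currency", currency: "USD" }).format(amount);
}

// Minimal assertion helper so the example runs without a test framework.
function expectEqual(actual: string, expected: string, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected "${expected}", got "${actual}"`);
  }
}

// The happy path a first draft usually covers...
expectEqual(formatCurrency(1234.5), "$1,234.50", "number");
expectEqual(formatCurrency("99"), "$99.00", "numeric string");
// ...and the edge cases that are easy to forget.
expectEqual(formatCurrency(null), "", "null");
expectEqual(formatCurrency(undefined), "", "undefined");
expectEqual(formatCurrency(NaN), "", "NaN");
expectEqual(formatCurrency(""), "", "empty string");
expectEqual(formatCurrency("abc"), "", "non-numeric string");
```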

How to use Cursor to add data-testid or modify buttons on the frontend

Cursor has a great feature for adding a screenshot of the page to a request, which is especially useful if you have a monorepository for both frontend and e2e tests. It lets you fix a component and label it in an accessible way right while writing a script. Of course, after making changes you should verify how the page renders, but this significantly simplifies the process for beginner automation engineers or those just getting familiar with the frontend.

Monorepositories - A great solution for these tools

Although I haven't had the opportunity to use these tools in a single monorepository containing both frontend and backend, keeping all automated tests in the same repository as the frontend has proven to be an excellent setup in practice. First, you can immediately label the necessary elements on the frontend; second, with Cursor you can quickly find the required API request or examine how the frontend is structured overall.
Using a monorepository in combination with these tools gives you flexibility and improves collaboration between teams, especially in terms of writing and maintaining tests.

When to use MCP Playwright sensibly
MCP Playwright is really good for testing interfaces when the page is properly labeled, and the scenarios are simple. It speeds up testing and eliminates the need to manually write locators.

Use MCP Playwright if:
The UI is properly labeled with accessibility attributes.
The page being tested is simple: a form, a button, or a basic navigation.
The test logic is as simple as possible.

Don't Use MCP Playwright if:
The interface is custom, and the elements do not have accessible attributes.
The test logic is complex, involving multiple interactions or APIs.
You need to check internal values, not just the visual state.

Conclusion
MCP Playwright and Cursor are powerful tools that can significantly speed up test creation and help with routine tasks. However, they cannot replace the work of an engineer, especially in more complex scenarios.
Cursor is excellent for generating test templates and working with common cases. It helps quickly set up the structure of tests but requires refinement for more complex scenarios, such as error handling or complex logic.
MCP works perfectly when the page is properly labeled with accessibility attributes. It eliminates the need to manually write locators, speeding up the testing process. But if the elements on the page are dynamic or not properly labeled, MCP may not be as reliable, and you should consider using other methods for element search.

In any case, while these tools significantly speed up testing, they do not replace full engineering work. It's important to remember that for complex scenarios or integration tests, despite the convenience of these tools, you will still need manual intervention for correct logic setup.
If you have already used MCP and Cursor in your projects, I'd be interested to hear how these tools helped you in real-world cases.