Kevin Gomes

Posted on Apr 11

Test Automation (Playwright + Claude + GitHub Actions + GitHub Pages)

#automation #cicd #claude #testing

📊 Live Allure test report: https://kevingomes17.github.io/dashboard-and-playwright/

Republished from CI on every push to main. Each run appends to a 30-run history kept in the gh-pages branch — pass-rate trends, duration trends, and per-test history arrows accumulate over time.

Overview

A working example of an end-to-end testing pipeline that pairs a real React + Vite dashboard with a Playwright suite generated by Claude Code, executed in GitHub Actions, and reported via Allure with trend history published to GitHub Pages.

The system-under-test is a microservices observability dashboard (base-dashboard/) — five widgets fed by a deterministic mock data layer. The interesting half of the project is how it gets tested: how the tests are authored, how they run in CI, and how the report becomes a long-lived public artifact instead of a single-run zip file you have to download to read.

Two sibling projects in one repo:

dashboard-and-playwright/
├── base-dashboard/      → React 19 + Vite + Tailwind dashboard (system-under-test)
├── playwright/          → Standalone Playwright e2e project
└── .github/workflows/   → GitHub Actions: test → generate Allure → publish to Pages

Architectural Goals

AI-assisted test authoring. Use Claude Code with the playwright-cli skill to generate *.spec.ts files from natural-language flow descriptions, eliminating the boilerplate cycle of "open browser, find selector, copy-paste, write assertion".
Deterministic fixtures, not stub servers. The dashboard's mock data layer uses a seeded RNG so charts and traces produce identical values every render. No network mocking, no test-only data-testid attributes — tests use real text and ARIA roles.
Two complementary reporters, not one. Keep Playwright's built-in HTML reporter (the only place to access the interactive trace viewer) and add Allure for severity/feature/owner grouping, behavior-driven navigation, and trend graphs across runs.
History as a first-class artifact. Most CI pipelines treat each run as standalone — you download a zip, unzip it, look at it, throw it away. Publishing Allure to GitHub Pages keeps history accumulating in the gh-pages branch so flakiness and regressions show up as trends, not point-in-time snapshots.
Zero authentication friction. The published report is just a static site on Pages — anyone with the URL can browse pass-rate trends, severity grouping, and per-test history without logging into GitHub.

System Architecture

┌──────────────────────────────────────────────────────────────────┐
│                  1. Test Generation (local)                       │
│                                                                  │
│   Developer prompt ──→ Claude Code ──→ playwright-cli (browser)  │
│                                                                  │
│   "Verify the latency chart filters by service"                  │
│        │                                                         │
│        ▼                                                         │
│   playwright-cli open http://localhost:5173                      │
│   playwright-cli snapshot          (a11y tree → element refs)    │
│   playwright-cli click e42         (each action prints TS code)  │
│        │                                                         │
│        ▼                                                         │
│   playwright/tests/charts.spec.ts  (committed to git)            │
└──────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────────┐
│                  2. Test Execution                                │
│                                                                  │
│   Local                          GitHub Actions (Ubuntu)         │
│   ─────                          ────────────────────             │
│   npx playwright test            actions/checkout@v4              │
│   ↓                              actions/setup-node@v4            │
│   webServer auto-starts          actions/setup-java@v4 (Temurin)  │
│   `npm run dev` in               npm ci × 2                       │
│   ../base-dashboard              playwright install --with-deps   │
│   ↓                              npx playwright test              │
│   8 tests, ~2.5s                 8 tests, retries: 2              │
└──────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────────┐
│                  3. Dual Reporting Layer                          │
│                                                                  │
│   Built-in HTML reporter         allure-playwright reporter      │
│   ──────────────────────         ──────────────────────────       │
│   playwright-report/             allure-results/                  │
│   ├─ index.html                  ├─ <uuid>-result.json × 8        │
│   ├─ trace.zip per failure       ├─ <uuid>-attachment.*           │
│   └─ Interactive trace viewer    └─ epic/feature/story/severity   │
│      (DOM snapshots, network,                                     │
│       action timeline)                                            │
└──────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────────┐
│                  4. History Merge + Publish                       │
│                                                                  │
│   actions/checkout@v4 (gh-pages branch, continue-on-error)       │
│      ↓                                                            │
│   cp -R gh-pages/history/* playwright/allure-results/history/    │
│      ↓                                                            │
│   npx allure generate allure-results -o allure-history --clean   │
│      ↓                                                            │
│   peaceiris/actions-gh-pages@v4 (only on main)                   │
│      ↓                                                            │
│   gh-pages branch  ──→  https://kevingomes17.github.io/           │
│                                  dashboard-and-playwright/        │
└──────────────────────────────────────────────────────────────────┘

Core Components

1. AI-Assisted Test Generation (Claude Code + playwright-cli)

The repo ships with a playwright-cli skill at .claude/skills/playwright-cli/SKILL.md plus reference docs (test-generation.md, playwright-tests.md, element-attributes.md, etc.). When Claude Code is started from the repo root, it auto-discovers the skill and can drive a real browser via the globally-installed playwright-cli tool.

The workflow is conversational, not boilerplate-driven:

The developer describes a flow in plain English: "Generate a Playwright test that opens the dashboard, switches the latency chart to the payments service, and verifies the chart re-renders. Save it to playwright/tests/payments-filter.spec.ts."
Claude opens a browser (playwright-cli open http://localhost:5173) and reads the accessibility tree (playwright-cli snapshot), which lists every interactive element with stable refs (e1, e2, …).
Each action Claude takes (click e42, fill e7 "payments") prints the equivalent Playwright TypeScript: await page.getByRole('combobox').click(). The skill is configured so Claude collects these lines.
Claude stitches the lines into a @playwright/test spec, applying this repo's conventions automatically:
- Scope queries to page.locator("main") (the sidebar nav has its own "Services"/"Traces" buttons that would otherwise collide).
- Use getByRole("combobox") for Base UI Select triggers and getByRole("option", { name, exact: true }) for items rendered into a portal.
- Apply Allure metadata via the local tagTest({ feature, story, severity }) helper (covered below).
Claude runs the new test once with npx playwright test path/to/new.spec.ts to verify it passes before committing.

The result: a 10-line natural-language prompt produces a 30–60 line spec that follows the repo's conventions and works on the first run, because Claude validates against the live mock data while writing it.
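To make the stitching step concrete, here is a minimal sketch of what wrapping the recorded lines into a spec amounts to. The helper name stitchSpec and its input shape are hypothetical: Claude does this step conversationally rather than by calling a function like this, and the sample action lines (including the "payments" option name) are illustrative, not output copied from playwright-cli.

```typescript
// Hypothetical illustration: wrap recorded action lines in a @playwright/test spec
// that follows the repo conventions described above.
interface SpecInput {
  title: string;
  feature: string;
  story: string;
  severity: "blocker" | "critical" | "normal" | "minor" | "trivial";
  actions: string[]; // one Playwright statement per line
}

function stitchSpec(input: SpecInput): string {
  const q = (value: string) => JSON.stringify(value);
  const body = [
    `await tagTest({ feature: ${q(input.feature)}, story: ${q(input.story)}, severity: ${q(input.severity)} })`,
    `await page.goto("/")`,
    `// Scope to <main> so same-named sidebar buttons do not match.`,
    `const main = page.locator("main")`,
    ...input.actions,
  ];
  return [
    `import { test, expect } from "@playwright/test"`,
    `import { tagTest } from "./_allure"`,
    ``,
    `test(${q(input.title)}, async ({ page }) => {`,
    ...body.map((line) => `  ${line}`),
    `})`,
    ``,
  ].join("\n");
}

const spec = stitchSpec({
  title: "latency chart filters by service",
  feature: "Latency chart",
  story: "Service filter",
  severity: "normal",
  actions: [
    'await main.getByRole("combobox").click()',
    // Options render into a portal, so they are queried from page, not main.
    'await page.getByRole("option", { name: "payments", exact: true }).click()',
    'await expect(main.getByText("Latency (p50 / p95 / p99)")).toBeVisible()',
  ],
});
console.log(spec);
```

The printed text is the kind of file that would land in playwright/tests/; the real value of the conversational workflow is that Claude also runs it once against the live dashboard before committing.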

2. Deterministic Mock Data (the secret to stable tests)

The dashboard's data layer (base-dashboard/apps/web/src/lib/metrics/mock-client.ts) is a typed MetricsClient interface with a MockMetricsClient implementation that uses a seeded mulberry32 RNG. The same input always produces the same chart values, error counts, and trace spans. This is the foundation that makes everything else possible:

Tests use real visible text as locators (getByText("Latency (p50 / p95 / p99)")) instead of data-testid attributes added solely for testing.
No request-mocking layer, no MSW config, no fixture files — the deterministic source IS the fixture.
Allure trends are meaningful — pass-rate variance reflects real regressions, not flaky randomness.

When the project moves to a real backend, the MetricsClient interface becomes the seam: drop in a PrometheusMetricsClient for production, keep MockMetricsClient for tests.
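As a rough sketch of the idea, assuming nothing about the repo's actual code beyond the names MetricsClient, MockMetricsClient, and mulberry32: the generator below is the widely used mulberry32 algorithm, while hashSeed, LatencyPoint, the latencySeries method, and the value ranges are all invented for illustration.

```typescript
// mulberry32: a small 32-bit seeded PRNG. Same seed, same sequence.
function mulberry32(seed: number): () => number {
  let a = seed | 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

// Hypothetical shapes; the real interface in mock-client.ts may differ.
interface LatencyPoint { minute: number; p50: number; p95: number; p99: number }

interface MetricsClient {
  latencySeries(service: string, points: number): LatencyPoint[];
}

// FNV-1a hash so each service name maps to its own stable seed.
function hashSeed(text: string): number {
  let h = 2166136261;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 16777619);
  }
  return h >>> 0;
}

class MockMetricsClient implements MetricsClient {
  private readonly baseSeed: number;

  constructor(baseSeed = 42) {
    this.baseSeed = baseSeed;
  }

  latencySeries(service: string, points: number): LatencyPoint[] {
    // A fresh generator per call: the output depends only on the inputs.
    const rand = mulberry32(this.baseSeed ^ hashSeed(service));
    return Array.from({ length: points }, (_, minute) => {
      const p50 = 20 + Math.round(rand() * 30);
      const p95 = p50 + Math.round(rand() * 60);
      const p99 = p95 + Math.round(rand() * 120);
      return { minute, p50, p95, p99 };
    });
  }
}

const first = new MockMetricsClient().latencySeries("payments", 3);
const second = new MockMetricsClient().latencySeries("payments", 3);
console.log(JSON.stringify(first) === JSON.stringify(second)); // prints true
```

Because the generator is re-seeded on every call, render order and re-renders cannot change the values, which is what lets assertions target concrete text on the page.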

3. Local Test Execution

cd playwright
npx playwright test                    # 8 tests, headless, ~2.5s
npx playwright test --ui               # interactive runner, time-travel
npx playwright test --headed           # watch the browser drive itself

The playwright.config.ts has a webServer block that auto-starts the Vite dev server in ../base-dashboard:

webServer: {
  command: "npm run dev",
  cwd: "../base-dashboard",
  url: "http://localhost:5173",
  reuseExistingServer: !process.env.CI,
  timeout: 120_000,
}

reuseExistingServer: !process.env.CI means that locally Playwright attaches to an already-running npm run dev if there is one and starts the server itself otherwise; in CI it always starts a fresh process. One config, both environments.
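The decision can be modeled as a small truth table. This is a sketch of the behavior as I understand it from Playwright's documentation, not Playwright's own code; the function and type names are made up for illustration.

```typescript
type WebServerOutcome = "reuse-existing" | "start-new" | "error-url-in-use";

// Models what happens for a given reuseExistingServer value, depending on
// whether something already answers on the configured url.
function webServerOutcome(
  reuseExistingServer: boolean,
  urlAlreadyResponding: boolean,
): WebServerOutcome {
  if (urlAlreadyResponding) {
    // With reuse disabled, an occupied url is treated as an error.
    return reuseExistingServer ? "reuse-existing" : "error-url-in-use";
  }
  return "start-new";
}

// Same expression as in the config: reuse everywhere except CI.
function outcomeFor(env: { CI?: string }, urlAlreadyResponding: boolean): WebServerOutcome {
  return webServerOutcome(!env.CI, urlAlreadyResponding);
}

console.log(outcomeFor({}, true));              // local, dev server already up
console.log(outcomeFor({}, false));             // local, nothing running yet
console.log(outcomeFor({ CI: "true" }, false)); // CI runner, clean machine
```

The fourth combination (CI with the url already taken) is the one that fails loudly, which is usually what you want on a runner that should start clean.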

4. Allure Reporter Wiring

The playwright/playwright.config.ts reporter array runs all reporters in parallel — none of them disable the others:

reporter: process.env.CI
  ? [
      ["html", { open: "never" }],
      ["allure-playwright", { resultsDir: "allure-results", detail: true }],
      ["github"],
    ]
  : [
      ["list"],
      ["allure-playwright", { resultsDir: "allure-results" }],
    ],

Two npm packages are needed, and the distinction trips a lot of people up:

allure-playwright — the reporter plugin. Hooks into Playwright's reporter API and writes raw JSON results into allure-results/. Without this, Playwright has no way to emit Allure-format output.
allure-commandline — a Node wrapper around the Java-based Allure CLI. Brings the allure binary into node_modules/.bin/ so you can run npm run allure:generate/open/serve locally without a system-wide install. Requires a JRE on PATH (the bundled binary is just a launcher).

Both go in the devDependencies of playwright/package.json, which also exposes three scripts:

"allure:generate": "allure generate allure-results -o allure-report --clean",
"allure:open":     "allure open allure-report",
"allure:serve":    "allure serve allure-results"

allure:serve is the most useful one locally — it generates a temp report from allure-results/ and opens a browser tab in one command.

5. Test Annotations for Behavior-Driven Grouping

The Allure report's killer organizational view is Behaviors — tests grouped as Epic → Feature → Story. To populate it, each test calls a tiny helper at the top of its body:

// playwright/tests/_allure.ts
import { epic, feature, owner, severity, story } from "allure-js-commons"

export type Severity = "blocker" | "critical" | "normal" | "minor" | "trivial"

export async function tagTest(opts: { feature: string; story: string; severity: Severity }) {
  await epic("Dashboard")
  await owner("dashboard-team")
  await feature(opts.feature)
  await story(opts.story)
  await severity(opts.severity)
}

The functions are imported from allure-js-commons directly (allure-playwright's own allure re-export is marked deprecated in current versions). Because _allure.ts does not end in .spec.ts or .test.ts, Playwright's default testMatch never picks it up as a test file; the underscore prefix is only a visual cue that it is a helper.

Per-test usage is one block at the top of the test body:

test("latency chart filters by service", async ({ page }) => {
  await tagTest({
    feature: "Latency chart",
    story: "Service filter",
    severity: "normal",
  })
  // ...test body unchanged
})

Result: the Allure report's Behaviors tab shows Dashboard → Latency chart → Service filter → latency chart filters by service with severity badges, and the Categories view groups failures by defect category (product defects vs. test defects by default).
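To show why those labels are enough to build the Behaviors tree, here is a minimal sketch of the grouping. Allure's result files store labels as name/value pairs; the ResultLike type below is a heavily simplified assumption rather than Allure's full schema, and the real grouping happens inside allure generate.

```typescript
interface Label { name: string; value: string }
interface ResultLike { name: string; labels: Label[] }

// epic -> feature -> story -> test names
type BehaviorTree = Map<string, Map<string, Map<string, string[]>>>;

function labelOf(result: ResultLike, name: string): string {
  return result.labels.find((l) => l.name === name)?.value ?? "(none)";
}

function groupByBehavior(results: ResultLike[]): BehaviorTree {
  const tree: BehaviorTree = new Map();
  for (const result of results) {
    const epic = labelOf(result, "epic");
    const feature = labelOf(result, "feature");
    const story = labelOf(result, "story");

    const features: Map<string, Map<string, string[]>> = tree.get(epic) ?? new Map();
    tree.set(epic, features);
    const stories: Map<string, string[]> = features.get(feature) ?? new Map();
    features.set(feature, stories);
    const tests: string[] = stories.get(story) ?? [];
    stories.set(story, tests);
    tests.push(result.name);
  }
  return tree;
}

const tree = groupByBehavior([
  {
    name: "latency chart filters by service",
    labels: [
      { name: "epic", value: "Dashboard" },
      { name: "feature", value: "Latency chart" },
      { name: "story", value: "Service filter" },
      { name: "severity", value: "normal" },
    ],
  },
]);
console.log(tree.get("Dashboard")?.get("Latency chart")?.get("Service filter"));
```

A test that never calls tagTest would land under "(none)" in this sketch, which mirrors why untagged tests are hard to find in the Behaviors view.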

6. CI/CD: GitHub Actions Pipeline

The full workflow (.github/workflows/playwright.yml) runs on every push to main and on every pull_request:

name: Playwright tests
on:
  push:
    branches: [main]
  pull_request:

permissions:
  contents: write

jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
          cache-dependency-path: |
            base-dashboard/package-lock.json
            playwright/package-lock.json
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: "17"

      - name: Install base-dashboard dependencies
        working-directory: base-dashboard
        run: npm ci
      - name: Install Playwright dependencies
        working-directory: playwright
        run: npm ci
      - name: Install Chromium
        working-directory: playwright
        run: npx playwright install --with-deps chromium

      - name: Run Playwright tests
        working-directory: playwright
        run: npx playwright test

Key choices:

Two npm ci steps because base-dashboard/ and playwright/ are sibling projects, not workspace members. Both lockfiles are listed in cache-dependency-path so the npm cache invalidates correctly.
actions/setup-java@v4 with Temurin 17 — required by the Allure CLI which is a JVM application. The npm-installed allure-commandline is just a launcher; it needs a real JRE on PATH.
permissions: contents: write at the workflow level (it applies to every job) is required so the default GITHUB_TOKEN can later push to the gh-pages branch. Without it the deploy step fails with a 403.
No browser matrix: Chromium only. That covers roughly 70% of users and keeps iteration fast. It is easy to extend later by adding entries to the projects array.

7. History Merging & Allure Report Generation

This is the part that makes Allure trends work. Allure history isn't magic — it's a history/ subfolder containing JSON files (history.json, history-trend.json, categories-trend.json, etc.). To get trends across runs, the previous run's history/ folder has to be copied into the current run's allure-results/ before allure generate runs.

      - name: Get Allure history from gh-pages
        if: ${{ !cancelled() }}
        uses: actions/checkout@v4
        with:
          ref: gh-pages
          path: gh-pages
        continue-on-error: true

      - name: Merge previous Allure history into new results
        if: ${{ !cancelled() }}
        run: |
          mkdir -p playwright/allure-results/history
          if [ -d gh-pages/history ]; then
            cp -R gh-pages/history/* playwright/allure-results/history/ || true
          fi

      - name: Build Allure report
        if: ${{ !cancelled() }}
        working-directory: playwright
        run: npx allure generate allure-results -o ../allure-history --clean

A few notes on what's going on:

continue-on-error: true on the gh-pages checkout — the very first workflow run doesn't have a gh-pages branch yet, so the checkout 404s. The flag turns the failure into a warning so the workflow proceeds.
The if [ -d ... ] guard in the merge step makes the same first-run case a no-op instead of a failure on cp.
if: ${{ !cancelled() }} on every Allure step — this ensures the Allure report is generated and published even when tests fail. Failed runs are exactly when you want the trend data the most.
No third-party Docker action. An earlier draft used simple-elf/allure-report-action@v1.9 which was tempting because it looked turnkey, but its Dockerfile pulls openjdk:8-jre-alpine — an image Docker Hub deprecated and removed in 2023. The action is unmaintained. Replacing it with two plain shell steps eliminated the dependency, made the pipeline transparent, and removed a Docker pull from every CI run.
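For running the same merge outside CI (for example against a local clone of gh-pages), the shell step translates to a few lines of Node. This is a sketch under assumptions: the function name mergeAllureHistory is invented here, and it relies on fs.cpSync, which needs a reasonably recent Node version.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Copies <pagesDir>/history into <resultsDir>/history.
// Returns the number of top-level entries copied; 0 on the first run,
// when the published site has no history folder yet.
function mergeAllureHistory(pagesDir: string, resultsDir: string): number {
  const source = path.join(pagesDir, "history");
  const target = path.join(resultsDir, "history");

  fs.mkdirSync(target, { recursive: true }); // mirrors `mkdir -p`

  // Mirrors the `[ -d gh-pages/history ]` guard.
  if (!fs.existsSync(source) || !fs.statSync(source).isDirectory()) {
    return 0;
  }

  const entries = fs.readdirSync(source);
  for (const entry of entries) {
    // Mirrors `cp -R`: existing files in the target are overwritten.
    fs.cpSync(path.join(source, entry), path.join(target, entry), { recursive: true });
  }
  return entries.length;
}

// Usage from the repo root, before `npx allure generate`:
//   mergeAllureHistory("gh-pages", "playwright/allure-results")
```

Returning the copy count makes the first-run case visible in logs: a 0 means trends will start from the current run.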

8. Publishing to GitHub Pages

      - name: Upload Allure report
        if: ${{ !cancelled() }}
        uses: actions/upload-artifact@v4
        with:
          name: allure-report
          path: allure-history
          retention-days: 14

      - name: Deploy Allure report to gh-pages
        if: ${{ !cancelled() && github.ref == 'refs/heads/main' }}
        uses: peaceiris/actions-gh-pages@v4
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_branch: gh-pages
          publish_dir: allure-history

Two things happen for every run:

actions/upload-artifact@v4 publishes the report as the allure-report artifact at the bottom of the run summary page. This works for every run including PRs — reviewers can grab the artifact, unzip, and open index.html in any browser without GitHub Pages access.
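peaceiris/actions-gh-pages@v4 pushes allure-history/ to the gh-pages branch as a new commit that replaces the branch's previous contents, but only on main. (With the action's default settings this is a regular push, not a force-push.) The if guard github.ref == 'refs/heads/main' keeps PR runs from polluting the live site. PRs see the artifact; merges to main update Pages.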

A successful main run produces a fully-static Allure site at https://<owner>.github.io/<repo>/ within ~30–60 seconds of the workflow finishing.
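The two if conditions are easy to misread, so here is the same gating written as a plain function. It is only a model of the expressions in the workflow: GitHub evaluates them itself, and the names RunContext, ReportPlan, and planReportSteps are illustrative.

```typescript
interface RunContext {
  cancelled: boolean; // what !cancelled() negates
  ref: string;        // github.ref
}

interface ReportPlan {
  uploadArtifact: boolean;
  deployToPages: boolean;
}

function planReportSteps(ctx: RunContext): ReportPlan {
  const notCancelled = !ctx.cancelled;
  return {
    // Runs even when the test step failed; only a cancelled run skips it.
    uploadArtifact: notCancelled,
    // Same, plus the branch guard that keeps PR runs off the live site.
    deployToPages: notCancelled && ctx.ref === "refs/heads/main",
  };
}

// A push to main deploys; a pull request (hypothetical PR number 12) only uploads.
console.log(planReportSteps({ cancelled: false, ref: "refs/heads/main" }));
console.log(planReportSteps({ cancelled: false, ref: "refs/pull/12/merge" }));
```

Note that test failure does not appear in the inputs at all: that is the point of using !cancelled() instead of the default success() condition.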

9. Viewing the Live Report

The first time you push the workflow there's a one-time GitHub repo setup:

Settings → Pages → Source → "Deploy from a branch" → Branch gh-pages → Folder / (root) → Save.
After the first successful main run creates the gh-pages branch, the green banner at the top of the same Settings → Pages screen shows "Your site is live at …" within ~1 minute.

For this project the live URL is https://kevingomes17.github.io/dashboard-and-playwright/. Useful deep links once you're there:

#suites — file/describe-block view (matches the spec layout)
#behaviors — epic → feature → story view (where the tagTest annotations pay off)
#categories — failures grouped by Allure category
#graph — pass-rate trend, severity pie, duration chart (the killer feature; populated after the second main run)
#timeline — Gantt-style view of every test in the run

Per-test detail pages show a history arrow in the top-right that lists previous runs with status, duration, and direct links — the closest thing to "git blame for test results" any reporter offers.

Data Flow

Developer describes a flow in natural language to Claude Code in the repo root.
Claude opens a browser via playwright-cli, reads the accessibility tree, drives the page, and emits Playwright TypeScript for each action.
Claude stitches the actions into a *.spec.ts file under playwright/tests/, applies the local conventions (tagTest, page.locator("main"), role-based locators), and runs the test once to verify.
Developer commits and pushes. GitHub Actions checks out the repo, installs Node + Java + Chromium, and runs npx playwright test from playwright/.
Playwright's webServer config auto-starts npm run dev in ../base-dashboard and waits for http://localhost:5173. Tests execute against the deterministic mock data.
Two reporters write in parallel: the built-in HTML reporter into playwright-report/, the Allure reporter into allure-results/.
The workflow checks out the gh-pages branch into gh-pages/ (continue-on-error for the first run).
A shell step copies gh-pages/history/* into playwright/allure-results/history/ so the about-to-run allure generate picks up the previous run's trend data.
npx allure generate produces a fully-static allure-history/ site that includes both the new run and the merged history.
actions/upload-artifact@v4 uploads allure-history/ as the allure-report artifact (every run).
peaceiris/actions-gh-pages@v4 pushes allure-history/ to the gh-pages branch (only on main).
GitHub Pages picks up the new commit and refreshes the live site within ~30–60 seconds.

Key Libraries & Tools

@playwright/test — End-to-end testing framework with built-in test runner, deterministic auto-waits, ARIA-aware locators, parallel workers, retries, and webServer lifecycle management.
allure-playwright — Reporter plugin that hooks into Playwright's reporter API and writes raw JSON results plus attachments into allure-results/.
allure-commandline — npm wrapper around the Java-based Allure CLI; brings the allure binary into node_modules/.bin/ for local report generation.
allure-js-commons — The non-deprecated facade exporting epic, feature, story, severity, owner, link, etc. for in-test annotations.
Claude Code + playwright-cli skill — AI-assisted test authoring. The skill at .claude/skills/playwright-cli/SKILL.md lets Claude drive a browser, read the accessibility tree, and emit Playwright TypeScript.
GitHub Actions — CI runtime. Key actions: actions/checkout@v4, actions/setup-node@v4, actions/setup-java@v4, actions/upload-artifact@v4, peaceiris/actions-gh-pages@v4.
GitHub Pages — Static hosting for the published Allure report. The gh-pages branch is the publish source.

Configuration Highlights

The full playwright.config.ts driving everything above:

import { defineConfig, devices } from "@playwright/test"

const PORT = 5173
const BASE_URL = `http://localhost:${PORT}`

export default defineConfig({
  testDir: "./tests",
  fullyParallel: true,
  forbidOnly: !!process.env.CI,
  retries: process.env.CI ? 2 : 0,
  workers: process.env.CI ? 1 : undefined,
  reporter: process.env.CI
    ? [
        ["html", { open: "never" }],
        ["allure-playwright", { resultsDir: "allure-results", detail: true }],
        ["github"],
      ]
    : [
        ["list"],
        ["allure-playwright", { resultsDir: "allure-results" }],
      ],
  use: {
    baseURL: BASE_URL,
    trace: "on-first-retry",
    screenshot: "only-on-failure",
  },
  projects: [
    {
      name: "chromium",
      use: { ...devices["Desktop Chrome"] },
    },
  ],
  webServer: {
    command: "npm run dev",
    cwd: "../base-dashboard",
    url: BASE_URL,
    reuseExistingServer: !process.env.CI,
    timeout: 120_000,
    stdout: "pipe",
    stderr: "pipe",
  },
})
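
How retries and trace: "on-first-retry" interact is worth spelling out, since it decides which attempts produce a trace. The sketch below models the documented behavior with invented helper names (planAttempts, classify); it is not Playwright code.

```typescript
type Outcome = "passed" | "flaky" | "failed";

interface AttemptPlan {
  attempt: number;      // 0 is the initial run
  isRetry: boolean;
  recordsTrace: boolean;
}

// With trace: "on-first-retry", only attempt 1 records a trace.
function planAttempts(retries: number): AttemptPlan[] {
  return Array.from({ length: retries + 1 }, (_, attempt) => ({
    attempt,
    isRetry: attempt > 0,
    recordsTrace: attempt === 1,
  }));
}

// A test that fails first and passes on a retry is reported as flaky.
function classify(attemptResults: boolean[]): Outcome {
  const passedAt = attemptResults.indexOf(true);
  if (passedAt === -1) return "failed";
  return passedAt === 0 ? "passed" : "flaky";
}

const ciAttempts = planAttempts(2);    // retries: 2 in CI
const localAttempts = planAttempts(0); // retries: 0 locally
console.log(ciAttempts.filter((a) => a.recordsTrace).length);    // prints 1
console.log(localAttempts.filter((a) => a.recordsTrace).length); // prints 0
console.log(classify([false, true]));                            // prints flaky
```

One consequence: with this config a local run never records a trace, because there is no retry. Passing --trace on to npx playwright test is the usual way to get one locally.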

Lessons & Trade-offs

A few things I'd flag for anyone setting up the same pipeline:

Don't pick "Allure vs. built-in HTML" — run both. Playwright's interactive trace viewer (DOM snapshots, network log, action timeline replay) is the single most valuable failure-debugging surface in any e2e framework, and Allure cannot replicate it. Allure's strengths (history, severity grouping, BDD navigation) are orthogonal. Stack the reporters in the array; they don't fight.
Determinism first, locators second. I spent zero time managing test flakiness on this project, and the entire reason is the seeded RNG in the mock data layer. If your fixtures are random, even the best locator strategy in the world produces tests that are fragile by design. Make the source-of-truth deterministic before you write the first assertion.
Skip Docker actions for Allure. The community space has several "one-step Allure for GitHub Actions" Docker actions. Most of them pin to old, unmaintained base images that get yanked from Docker Hub eventually. Two shell steps (cp + npx allure generate) take 30 seconds to write, never break, and you can read what they do.
Allure's history needs a place to live. If you only ever upload Allure as a per-run artifact, you're not getting the feature people install Allure for. Either publish to Pages (this project), to an S3 bucket, or to Allure TestOps. Without persistent history, most of the report's value is left on the table.
if: ${{ !cancelled() }} on every report step. A failing test run is exactly when you want the report. Default if: success() skips the Allure steps when tests fail, which is the opposite of what you want.
Java is unavoidable in this setup. The Allure CLI used here is a JVM app, and allure-commandline doesn't change that: it's just a launcher. CI needs actions/setup-java; dev machines need a JRE.

Summary

This project is a small, complete reference for what a modern Playwright + Allure pipeline looks like in 2026 — natural-language test authoring with Claude Code, deterministic mock data that makes locator strategies stable, two reporters running in parallel (the built-in HTML for trace viewing, Allure for grouping and trends), and a published GitHub Pages site that accumulates history across every push to main. The interesting bits aren't the framework choices — they're the seams: where determinism comes from, why two reporters are better than one, how history actually gets merged across CI runs, and which third-party actions to avoid. The live report at https://kevingomes17.github.io/dashboard-and-playwright/ is the same site you'd build for your own project by following this pattern — clone the workflow, adjust the test mappings, and you've got a public test dashboard with no infrastructure to maintain beyond a gh-pages branch.
