Live Allure test report: https://kevingomes17.github.io/dashboard-and-playwright/
Republished from CI on every push to main. Each run appends to a 30-run history kept in the gh-pages branch, so pass-rate trends, duration trends, and per-test history arrows accumulate over time.
Overview
A working example of an end-to-end testing pipeline that pairs a real React + Vite dashboard with a Playwright suite generated by Claude Code, executed in GitHub Actions, and reported via Allure with trend history published to GitHub Pages.
The system-under-test is a microservices observability dashboard (base-dashboard/) with five widgets fed by a deterministic mock data layer. The interesting half of the project is how it gets tested: how the tests are authored, how they run in CI, and how the report becomes a long-lived public artifact instead of a single-run zip file you have to download to read.
Two sibling projects in one repo:
dashboard-and-playwright/
├── base-dashboard/      React 19 + Vite + Tailwind dashboard (system-under-test)
├── playwright/          Standalone Playwright e2e project
└── .github/workflows/   GitHub Actions: test → generate Allure → publish to Pages
Architectural Goals
- AI-assisted test authoring. Use Claude Code with the playwright-cli skill to generate *.spec.ts files from natural-language flow descriptions, eliminating the boilerplate cycle of "open browser, find selector, copy-paste, write assertion".
- Deterministic fixtures, not stub servers. The dashboard's mock data layer uses a seeded RNG so charts and traces produce identical values every render. No network mocking, no test-only data-testid attributes; tests use real text and ARIA roles.
- Two complementary reporters, not one. Keep Playwright's built-in HTML reporter (the only place to access the interactive trace viewer) and add Allure for severity/feature/owner grouping, behavior-driven navigation, and trend graphs across runs.
- History as a first-class artifact. Most CI pipelines treat each run as standalone: you download a zip, unzip it, look at it, throw it away. Publishing Allure to GitHub Pages keeps history accumulating in the gh-pages branch so flakiness and regressions show up as trends, not point-in-time snapshots.
- Zero authentication friction. The published report is just a static site on Pages, so anyone with the URL can browse pass-rate trends, severity grouping, and per-test history without logging into GitHub.
System Architecture
+------------------------------------------------------------------+
| 1. Test Generation (local)                                       |
|                                                                  |
| Developer prompt --> Claude Code --> playwright-cli (browser)    |
|                                                                  |
|   "Verify the latency chart filters by service"                  |
|           |                                                      |
|           v                                                      |
|   playwright-cli open http://localhost:5173                      |
|   playwright-cli snapshot   (a11y tree -> element refs)          |
|   playwright-cli click e42  (each action prints TS code)         |
|           |                                                      |
|           v                                                      |
|   playwright/tests/charts.spec.ts  (committed to git)            |
+------------------------------------------------------------------+
                                 |
                                 v
+------------------------------------------------------------------+
| 2. Test Execution                                                |
|                                                                  |
| Local                         GitHub Actions (Ubuntu)            |
| -----                         -----------------------            |
| npx playwright test           actions/checkout@v4                |
|       |                       actions/setup-node@v4              |
| webServer auto-starts         actions/setup-java@v4 (Temurin)    |
| `npm run dev` in              npm ci × 2                         |
| ../base-dashboard             playwright install --with-deps     |
|       |                       npx playwright test                |
| 8 tests, ~2.5s                8 tests, retries: 2                |
+------------------------------------------------------------------+
                                 |
                                 v
+------------------------------------------------------------------+
| 3. Dual Reporting Layer                                          |
|                                                                  |
| Built-in HTML reporter          allure-playwright reporter       |
| ----------------------          --------------------------       |
| playwright-report/              allure-results/                  |
| |- index.html                   |- <uuid>-result.json × 8        |
| |- trace.zip per failure        |- <uuid>-attachment.*           |
| `- Interactive trace viewer     `- epic/feature/story/severity   |
|    (DOM snapshots, network,                                      |
|     action timeline)                                             |
+------------------------------------------------------------------+
                                 |
                                 v
+------------------------------------------------------------------+
| 4. History Merge + Publish                                       |
|                                                                  |
| actions/checkout@v4  (gh-pages branch, continue-on-error)        |
|         |                                                        |
| cp -R gh-pages/history/* playwright/allure-results/history/      |
|         |                                                        |
| npx allure generate allure-results -o allure-history --clean     |
|         |                                                        |
| peaceiris/actions-gh-pages@v4  (only on main)                    |
|         |                                                        |
| gh-pages branch --> https://kevingomes17.github.io/              |
|                     dashboard-and-playwright/                    |
+------------------------------------------------------------------+
Core Components
1. AI-Assisted Test Generation (Claude Code + playwright-cli)
The repo ships with a playwright-cli skill at .claude/skills/playwright-cli/SKILL.md plus reference docs (test-generation.md, playwright-tests.md, element-attributes.md, etc.). When Claude Code is started from the repo root, it auto-discovers the skill and can drive a real browser via the globally-installed playwright-cli tool.
The workflow is conversational, not boilerplate-driven:
- The developer describes a flow in plain English: "Generate a Playwright test that opens the dashboard, switches the latency chart to the payments service, and verifies the chart re-renders. Save it to playwright/tests/payments-filter.spec.ts."
- Claude opens a browser (playwright-cli open http://localhost:5173) and reads the accessibility tree (playwright-cli snapshot), which lists every interactive element with stable refs (e1, e2, …).
- Each action Claude takes (click e42, fill e7 "payments") prints the equivalent Playwright TypeScript: await page.getByRole('combobox').click(). The skill is configured so Claude collects these lines.
- Claude stitches the lines into a @playwright/test spec, applying this repo's conventions automatically:
  - Scope queries to page.locator("main") (the sidebar nav has its own "Services"/"Traces" buttons that would otherwise collide).
  - Use getByRole("combobox") for Base UI Select triggers and getByRole("option", { name, exact: true }) for items rendered into a portal.
  - Apply Allure metadata via the local tagTest({ feature, story, severity }) helper (covered below).
- Claude runs the new test once with npx playwright test path/to/new.spec.ts to verify it passes before committing.
The result: a 10-line natural-language prompt produces a 30 to 60 line spec that follows the repo's conventions and works on the first run, because Claude validates against the live mock data while writing it.
2. Deterministic Mock Data (the secret to stable tests)
The dashboard's data layer (base-dashboard/apps/web/src/lib/metrics/mock-client.ts) is a typed MetricsClient interface with a MockMetricsClient implementation that uses a seeded mulberry32 RNG. The same input always produces the same chart values, error counts, and trace spans. This is the foundation that makes everything else possible:
- Tests use real visible text as locators (getByText("Latency (p50 / p95 / p99)")) instead of data-testid attributes added solely for testing.
- No request-mocking layer, no MSW config, no fixture files: the deterministic source IS the fixture.
- Allure trends are meaningful: pass-rate variance reflects real regressions, not flaky randomness.
When the project moves to a real backend, the MetricsClient interface becomes the seam: drop in a PrometheusMetricsClient for production, keep MockMetricsClient for tests.
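To make the determinism claim concrete, here is a minimal sketch of a seeded data source. The mulberry32 function is the widely circulated public-domain generator; everything else (the MetricsClient shape, SeededMetricsClient, latencySeries, seedFor) is a hypothetical stand-in and not the repo's actual mock-client.ts.

```typescript
// mulberry32: a small 32-bit seeded PRNG. Same seed, same sequence.
function mulberry32(seed: number): () => number {
  let state = seed | 0;
  return () => {
    state = (state + 0x6d2b79f5) | 0;
    let t = Math.imul(state ^ (state >>> 15), 1 | state);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical interface; the repo's MetricsClient may look different.
interface MetricsClient {
  latencySeries(service: string, points: number): number[];
}

// Mix the service name into the seed so each service gets its own stable series.
function seedFor(service: string, baseSeed: number): number {
  let hash = baseSeed | 0;
  for (let i = 0; i < service.length; i++) {
    hash = Math.imul(hash ^ service.charCodeAt(i), 16777619);
  }
  return hash;
}

class SeededMetricsClient implements MetricsClient {
  private readonly baseSeed: number;

  constructor(baseSeed: number) {
    this.baseSeed = baseSeed;
  }

  latencySeries(service: string, points: number): number[] {
    const next = mulberry32(seedFor(service, this.baseSeed));
    // Latencies between 50 and 250 ms, rounded so assertions stay readable.
    return Array.from({ length: points }, () => Math.round(50 + next() * 200));
  }
}

const seriesA = new SeededMetricsClient(42).latencySeries("payments", 5);
const seriesB = new SeededMetricsClient(42).latencySeries("payments", 5);
console.log(JSON.stringify(seriesA) === JSON.stringify(seriesB)); // true: same seed, same series
```

Because two independently constructed clients agree on every value, a test can assert on rendered numbers without any fixture file or network mock.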
3. Local Test Execution
cd playwright
npx playwright test # 8 tests, headless, ~2.5s
npx playwright test --ui # interactive runner, time-travel
npx playwright test --headed # watch the browser drive itself
The playwright.config.ts has a webServer block that auto-starts the Vite dev server in ../base-dashboard:
webServer: {
  command: "npm run dev",
  cwd: "../base-dashboard",
  url: "http://localhost:5173",
  reuseExistingServer: !process.env.CI,
  timeout: 120_000,
}
reuseExistingServer: !process.env.CI means that locally Playwright attaches to an already-running npm run dev if one is listening on the URL and starts one otherwise; in CI it always starts a fresh process. One config, both environments.
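The decision can be modelled as a small pure function. This is only a sketch of the documented reuseExistingServer behaviour, not Playwright's source code; webServerAction, actionFor, and the three result labels are made-up names. Note the fourth case: with reuse disabled, Playwright refuses to run if something is already listening on the URL.

```typescript
type WebServerAction = "reuse-existing" | "start-new" | "fail-url-in-use";

// Models the documented reuseExistingServer semantics (an illustration only).
function webServerAction(reuseExistingServer: boolean, urlAlreadyResponding: boolean): WebServerAction {
  if (!urlAlreadyResponding) {
    return "start-new"; // nothing is listening, so run `command`
  }
  return reuseExistingServer ? "reuse-existing" : "fail-url-in-use";
}

// Same expression as the config: reuseExistingServer: !process.env.CI
function actionFor(env: { CI?: string }, urlAlreadyResponding: boolean): WebServerAction {
  return webServerAction(!env.CI, urlAlreadyResponding);
}

console.log(actionFor({}, true));              // reuse-existing
console.log(actionFor({}, false));             // start-new
console.log(actionFor({ CI: "true" }, false)); // start-new
console.log(actionFor({ CI: "true" }, true));  // fail-url-in-use
```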
4. Allure Reporter Wiring
The reporter array in playwright/playwright.config.ts runs every listed reporter side by side; none of them disables the others:
reporter: process.env.CI
  ? [
      ["html", { open: "never" }],
      ["allure-playwright", { resultsDir: "allure-results", detail: true }],
      ["github"],
    ]
  : [
      ["list"],
      ["allure-playwright", { resultsDir: "allure-results" }],
    ],
Two npm packages are needed, and the distinction trips a lot of people up:
- allure-playwright: the reporter plugin. Hooks into Playwright's reporter API and writes raw JSON results into allure-results/. Without this, Playwright has no way to emit Allure-format output.
- allure-commandline: a Node wrapper around the Java-based Allure CLI. Brings the allure binary into node_modules/.bin/ so you can run npm run allure:generate/open/serve locally without a system-wide install. Requires a JRE on PATH (the bundled binary is just a launcher).
Both go in the devDependencies of playwright/package.json, which exposes three scripts:
"allure:generate": "allure generate allure-results -o allure-report --clean",
"allure:open": "allure open allure-report",
"allure:serve": "allure serve allure-results"
allure:serve is the most useful one locally: it generates a temp report from allure-results/ and opens a browser tab in one command.
5. Test Annotations for Behavior-Driven Grouping
The Allure report's killer organizational view is Behaviors: tests grouped as Epic → Feature → Story. To populate it, each test calls a tiny helper at the top of its body:
// playwright/tests/_allure.ts
import { epic, feature, owner, severity, story } from "allure-js-commons"

export type Severity = "blocker" | "critical" | "normal" | "minor" | "trivial"

export async function tagTest(opts: { feature: string; story: string; severity: Severity }) {
  await epic("Dashboard")
  await owner("dashboard-team")
  await feature(opts.feature)
  await story(opts.story)
  await severity(opts.severity)
}
The functions are imported from allure-js-commons directly (allure-playwright's own allure re-export is marked deprecated in current versions). Because the helper file is named _allure.ts rather than *.spec.ts, Playwright's default testMatch never treats it as a test file; the underscore prefix is just a visual cue that it is a helper.
Per-test usage is one block at the top of the test body:
test("latency chart filters by service", async ({ page }) => {
  await tagTest({
    feature: "Latency chart",
    story: "Service filter",
    severity: "normal",
  })
  // ...test body unchanged
})
Result: the Behaviors tab of the Allure report shows Dashboard → Latency chart → Service filter → latency chart filters by service with severity badges, and the Categories view groups failed tests by defect category (by default, product defects and test defects).
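The grouping itself is easy to picture as a nested map keyed by the three labels. The sketch below is an illustration of that idea only; it is not how Allure stores or renders results, and TaggedTest, BehaviorTree, groupByBehavior, and the second sample test are all hypothetical.

```typescript
type Severity = "blocker" | "critical" | "normal" | "minor" | "trivial";

interface TaggedTest {
  title: string;
  epic: string;
  feature: string;
  story: string;
  severity: Severity;
}

// epic -> feature -> story -> test titles
type BehaviorTree = Map<string, Map<string, Map<string, string[]>>>;

function groupByBehavior(tests: TaggedTest[]): BehaviorTree {
  const tree: BehaviorTree = new Map();
  for (const t of tests) {
    const features = tree.get(t.epic) ?? new Map<string, Map<string, string[]>>();
    tree.set(t.epic, features);
    const stories = features.get(t.feature) ?? new Map<string, string[]>();
    features.set(t.feature, stories);
    const titles = stories.get(t.story) ?? [];
    stories.set(t.story, titles);
    titles.push(t.title);
  }
  return tree;
}

const behaviorTree = groupByBehavior([
  { title: "latency chart filters by service", epic: "Dashboard", feature: "Latency chart", story: "Service filter", severity: "normal" },
  // Hypothetical second test, added only to show a sibling feature.
  { title: "error table sorts by count", epic: "Dashboard", feature: "Error table", story: "Sorting", severity: "minor" },
]);

console.log([...(behaviorTree.get("Dashboard")?.keys() ?? [])]); // the two feature names
```

Hard-coding the epic and owner inside tagTest, as the helper does, means every test lands under the same top-level node without each spec having to repeat it.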
6. CI/CD: GitHub Actions Pipeline
The full workflow (.github/workflows/playwright.yml) runs on every push to main and on every pull_request:
name: Playwright tests
on:
  push:
    branches: [main]
  pull_request:
permissions:
  contents: write
jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
          cache-dependency-path: |
            base-dashboard/package-lock.json
            playwright/package-lock.json
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: "17"
      - name: Install base-dashboard dependencies
        working-directory: base-dashboard
        run: npm ci
      - name: Install Playwright dependencies
        working-directory: playwright
        run: npm ci
      - name: Install Chromium
        working-directory: playwright
        run: npx playwright install --with-deps chromium
      - name: Run Playwright tests
        working-directory: playwright
        run: npx playwright test
Key choices:
- Two npm ci steps because base-dashboard/ and playwright/ are sibling projects, not workspace members. Both lockfiles are listed in cache-dependency-path so the npm cache invalidates correctly.
- actions/setup-java@v4 with Temurin 17: required by the Allure CLI, which is a JVM application. The npm-installed allure-commandline is just a launcher; it needs a real JRE on PATH.
- permissions: contents: write at the workflow level: required so the default GITHUB_TOKEN can later push to the gh-pages branch. Without it the deploy step fails with a 403.
- No browser matrix: Chromium only. ~70% of users, fastest iteration. Easy to extend by adding entries to the projects array later.
7. History Merging & Allure Report Generation
This is the part that makes Allure trends work. Allure history isn't magic; it's a history/ subfolder containing JSON files (history.json, history-trend.json, categories-trend.json, etc.). To get trends across runs, the previous run's history/ folder has to be copied into the current run's allure-results/ before allure generate runs.
      - name: Get Allure history from gh-pages
        if: ${{ !cancelled() }}
        uses: actions/checkout@v4
        with:
          ref: gh-pages
          path: gh-pages
        continue-on-error: true
      - name: Merge previous Allure history into new results
        if: ${{ !cancelled() }}
        run: |
          mkdir -p playwright/allure-results/history
          if [ -d gh-pages/history ]; then
            cp -R gh-pages/history/* playwright/allure-results/history/ || true
          fi
      - name: Build Allure report
        if: ${{ !cancelled() }}
        working-directory: playwright
        run: npx allure generate allure-results -o ../allure-history --clean
A few notes on what's going on:
- continue-on-error: true on the gh-pages checkout: the very first workflow run doesn't have a gh-pages branch yet, so the checkout fails. The flag turns the failure into a warning so the workflow proceeds.
- The if [ -d ... ] guard in the merge step makes the same first-run case a no-op instead of a failure on cp.
- if: ${{ !cancelled() }} on every Allure step: this ensures the Allure report is generated and published even when tests fail. Failed runs are exactly when you want the trend data the most.
- No third-party Docker action. An earlier draft used simple-elf/allure-report-action@v1.9, which was tempting because it looked turnkey, but its Dockerfile pulls openjdk:8-jre-alpine, an image Docker Hub deprecated and removed in 2023. The action is unmaintained. Replacing it with two plain shell steps eliminated the dependency, made the pipeline transparent, and removed a Docker pull from every CI run.
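For readers who prefer a script to inline shell, the merge step can also be sketched in Node. This is an assumed equivalent of the mkdir/cp pair above, not something the repo ships; mergeAllureHistory is a hypothetical name, and the sketch assumes history/ holds only flat files.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Copy the previous report's history/ files into the fresh results folder.
// Returns the copied file names; an empty list on the very first run.
function mergeAllureHistory(previousReportDir: string, resultsDir: string): string[] {
  const source = path.join(previousReportDir, "history");
  const target = path.join(resultsDir, "history");

  // Equivalent of `mkdir -p`: succeeds even if the folder already exists.
  fs.mkdirSync(target, { recursive: true });

  // Equivalent of the `[ -d gh-pages/history ]` guard for the first run.
  if (!fs.existsSync(source) || !fs.statSync(source).isDirectory()) {
    return [];
  }

  const copied: string[] = [];
  for (const entry of fs.readdirSync(source, { withFileTypes: true })) {
    if (!entry.isFile()) continue; // assumption: history/ contains flat JSON files
    fs.copyFileSync(path.join(source, entry.name), path.join(target, entry.name));
    copied.push(entry.name);
  }
  return copied.sort();
}

// Usage sketch against throwaway temp folders.
const workDir = fs.mkdtempSync(path.join(os.tmpdir(), "allure-merge-"));
const firstRun = mergeAllureHistory(path.join(workDir, "gh-pages"), path.join(workDir, "results-1"));

fs.mkdirSync(path.join(workDir, "gh-pages", "history"), { recursive: true });
fs.writeFileSync(path.join(workDir, "gh-pages", "history", "history-trend.json"), "[]");
const laterRun = mergeAllureHistory(path.join(workDir, "gh-pages"), path.join(workDir, "results-2"));

console.log(firstRun, laterRun);
```

The first call mirrors the first-ever workflow run, where no gh-pages checkout exists and nothing is copied; the second mirrors every later run.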
8. Publishing to GitHub Pages
      - name: Upload Allure report
        if: ${{ !cancelled() }}
        uses: actions/upload-artifact@v4
        with:
          name: allure-report
          path: allure-history
          retention-days: 14
      - name: Deploy Allure report to gh-pages
        if: ${{ !cancelled() && github.ref == 'refs/heads/main' }}
        uses: peaceiris/actions-gh-pages@v4
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_branch: gh-pages
          publish_dir: allure-history
Two publishing steps are defined:
- actions/upload-artifact@v4 publishes the report as the allure-report artifact at the bottom of the run summary page. This works for every run including PRs: reviewers can download the artifact, unzip it, and view it with allure open, without GitHub Pages access. (The report loads its data over HTTP, so opening index.html straight from disk usually shows an empty page.)
- peaceiris/actions-gh-pages@v4 publishes allure-history/ to the gh-pages branch, but only on main. The if guard github.ref == 'refs/heads/main' keeps PR runs from polluting the live site. PRs see the artifact; merges to main update Pages.
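Which step runs in which situation is easier to see as plain boolean logic. The functions below are a simplified model of the two if: expressions and of the implicit success() default; RunContext and the function names are invented for this sketch.

```typescript
interface RunContext {
  cancelled: boolean;         // the workflow run was cancelled
  earlierStepFailed: boolean; // e.g. `npx playwright test` exited non-zero
  ref: string;                // value of github.ref
}

const MAIN_REF = "refs/heads/main";

// A step without an `if:` behaves as if it had `if: success()`.
function runsByDefault(ctx: RunContext): boolean {
  return !ctx.cancelled && !ctx.earlierStepFailed;
}

// if: ${{ !cancelled() }}
function runsUploadArtifact(ctx: RunContext): boolean {
  return !ctx.cancelled;
}

// if: ${{ !cancelled() && github.ref == 'refs/heads/main' }}
function runsDeployToPages(ctx: RunContext): boolean {
  return !ctx.cancelled && ctx.ref === MAIN_REF;
}

const failedPushToMain: RunContext = { cancelled: false, earlierStepFailed: true, ref: MAIN_REF };
// Pull request runs see a merge ref; the number 12 is a made-up example.
const greenPullRequest: RunContext = { cancelled: false, earlierStepFailed: false, ref: "refs/pull/12/merge" };

console.log(runsByDefault(failedPushToMain), runsUploadArtifact(failedPushToMain), runsDeployToPages(failedPushToMain));
// false true true
console.log(runsUploadArtifact(greenPullRequest), runsDeployToPages(greenPullRequest));
// true false
```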
A successful main run produces a fully static Allure site at https://<owner>.github.io/<repo>/ within roughly 30 to 60 seconds of the workflow finishing.
9. Viewing the Live Report
The first time you push the workflow there's a one-time GitHub repo setup:
- Settings → Pages → Source → "Deploy from a branch" → Branch gh-pages → Folder / (root) → Save.
- After the first successful main run creates the gh-pages branch, the green banner at the top of the same Settings → Pages screen shows "Your site is live at …" within about a minute.
For this project the live URL is https://kevingomes17.github.io/dashboard-and-playwright/. Useful deep links once you're there:
- #suites: file/describe-block view (matches the spec layout)
- #behaviors: epic → feature → story view (where the tagTest annotations pay off)
- #categories: failures grouped by Allure category
- #graph: pass-rate trend, severity pie, duration chart (the killer feature; populated after the second main run)
- #timeline: Gantt-style view of every test in the run
Per-test detail pages show a history arrow in the top-right that lists previous runs with status, duration, and direct links: the closest thing to "git blame for test results" any reporter offers.
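The pass-rate trend on the graph page comes from the history files carried over between runs. As a rough sketch, the same number can be computed by hand. The TrendEntry shape below is an assumption about what history-trend.json entries look like (a data object with per-status counts), and the sample values are invented for illustration; check a real file before relying on it.

```typescript
// Assumed shape of one history-trend.json entry; verify against a real report.
interface TrendEntry {
  buildOrder?: number;
  data: {
    passed: number;
    failed: number;
    broken: number;
    skipped: number;
    unknown: number;
    total: number;
  };
}

// Pass rate per run as a percentage with one decimal place.
function passRates(trend: TrendEntry[]): number[] {
  return trend.map((entry) =>
    entry.data.total === 0 ? 0 : Math.round((entry.data.passed / entry.data.total) * 1000) / 10,
  );
}

// Invented sample: one fully green run and one run with a single failure.
const sampleTrend: TrendEntry[] = [
  { buildOrder: 2, data: { passed: 8, failed: 0, broken: 0, skipped: 0, unknown: 0, total: 8 } },
  { buildOrder: 1, data: { passed: 7, failed: 1, broken: 0, skipped: 0, unknown: 0, total: 8 } },
];

console.log(passRates(sampleTrend)); // 100 and 87.5
```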
Data Flow
- Developer describes a flow in natural language to Claude Code in the repo root.
- Claude opens a browser via playwright-cli, reads the accessibility tree, drives the page, and emits Playwright TypeScript for each action.
- Claude stitches the actions into a *.spec.ts file under playwright/tests/, applies the local conventions (tagTest, page.locator("main"), role-based locators), and runs the test once to verify.
- Developer commits and pushes. GitHub Actions checks out the repo, installs Node + Java + Chromium, and runs npx playwright test from playwright/.
- Playwright's webServer config auto-starts npm run dev in ../base-dashboard and waits for http://localhost:5173. Tests execute against the deterministic mock data.
- Two reporters write in parallel: the built-in HTML reporter into playwright-report/, the Allure reporter into allure-results/.
- The workflow checks out the gh-pages branch into gh-pages/ (continue-on-error for the first run).
- A shell step copies gh-pages/history/* into playwright/allure-results/history/ so the about-to-run allure generate picks up the previous run's trend data.
- npx allure generate produces a fully static allure-history/ site that includes both the new run and the merged history.
- actions/upload-artifact@v4 uploads allure-history/ as the allure-report artifact (every run).
- peaceiris/actions-gh-pages@v4 publishes allure-history/ to the gh-pages branch (only on main).
- GitHub Pages picks up the new commit and refreshes the live site within roughly 30 to 60 seconds.
Key Libraries & Tools
- @playwright/test: End-to-end testing framework with built-in test runner, deterministic auto-waits, ARIA-aware locators, parallel workers, retries, and webServer lifecycle management.
- allure-playwright: Reporter plugin that hooks into Playwright's reporter API and writes raw JSON results plus attachments into allure-results/.
- allure-commandline: npm wrapper around the Java-based Allure CLI; brings the allure binary into node_modules/.bin/ for local report generation.
- allure-js-commons: The non-deprecated facade exporting epic, feature, story, severity, owner, link, etc. for in-test annotations.
- Claude Code + playwright-cli skill: AI-assisted test authoring. The skill at .claude/skills/playwright-cli/SKILL.md lets Claude drive a browser, read the accessibility tree, and emit Playwright TypeScript.
- GitHub Actions: CI runtime. Key actions: actions/checkout@v4, actions/setup-node@v4, actions/setup-java@v4, actions/upload-artifact@v4, peaceiris/actions-gh-pages@v4.
- GitHub Pages: Static hosting for the published Allure report. The gh-pages branch is the publish source.
Configuration Highlights
The full playwright.config.ts driving everything above:
import { defineConfig, devices } from "@playwright/test"

const PORT = 5173
const BASE_URL = `http://localhost:${PORT}`

export default defineConfig({
  testDir: "./tests",
  fullyParallel: true,
  forbidOnly: !!process.env.CI,
  retries: process.env.CI ? 2 : 0,
  workers: process.env.CI ? 1 : undefined,
  reporter: process.env.CI
    ? [
        ["html", { open: "never" }],
        ["allure-playwright", { resultsDir: "allure-results", detail: true }],
        ["github"],
      ]
    : [
        ["list"],
        ["allure-playwright", { resultsDir: "allure-results" }],
      ],
  use: {
    baseURL: BASE_URL,
    trace: "on-first-retry",
    screenshot: "only-on-failure",
  },
  projects: [
    {
      name: "chromium",
      use: { ...devices["Desktop Chrome"] },
    },
  ],
  webServer: {
    command: "npm run dev",
    cwd: "../base-dashboard",
    url: BASE_URL,
    reuseExistingServer: !process.env.CI,
    timeout: 120_000,
    stdout: "pipe",
    stderr: "pipe",
  },
})
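The interplay of retries: 2 and trace: "on-first-retry" is worth spelling out, since it decides both how a test is classified and when a trace exists. The sketch below is a simplified model, not Playwright's runner; runWithRetries and RetryResult are hypothetical names, while the outcome labels follow the expected/flaky/unexpected vocabulary Playwright uses in its reports.

```typescript
type AttemptStatus = "passed" | "failed";
type Outcome = "expected" | "flaky" | "unexpected";

interface RetryResult {
  outcome: Outcome;
  attempts: number;
  tracedAttempt: number | null; // zero-based attempt index that recorded a trace
}

function runWithRetries(attempt: (index: number) => AttemptStatus, retries: number): RetryResult {
  let tracedAttempt: number | null = null;
  for (let index = 0; index <= retries; index++) {
    if (index === 1) tracedAttempt = index; // trace: "on-first-retry"
    if (attempt(index) === "passed") {
      return { outcome: index === 0 ? "expected" : "flaky", attempts: index + 1, tracedAttempt };
    }
  }
  return { outcome: "unexpected", attempts: retries + 1, tracedAttempt };
}

const ciRetries = 2; // retries: process.env.CI ? 2 : 0
const stableRun = runWithRetries(() => "passed", ciRetries);
const flakyRun = runWithRetries((index) => (index === 0 ? "failed" : "passed"), ciRetries);
const brokenRun = runWithRetries(() => "failed", ciRetries);

console.log(stableRun.outcome, flakyRun.outcome, brokenRun.outcome); // expected flaky unexpected
```

One consequence of this setting: with retries at 0, as in local runs, no attempt is ever a retry, so no trace is recorded for a failure.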
Lessons & Trade-offs
A few things I'd flag for anyone setting up the same pipeline:
- Don't pick "Allure vs. built-in HTML": run both. Playwright's interactive trace viewer (DOM snapshots, network log, action timeline replay) is the single most valuable failure-debugging surface in any e2e framework, and Allure cannot replicate it. Allure's strengths (history, severity grouping, BDD navigation) are orthogonal. Stack the reporters in the array; they don't fight.
- Determinism first, locators second. I spent zero time managing test flakiness on this project, and the entire reason is the seeded RNG in the mock data layer. If your fixtures are random, even the best locator strategy in the world produces tests that are fragile by design. Make the source-of-truth deterministic before you write the first assertion.
- Skip Docker actions for Allure. The community space has several "one-step Allure for GitHub Actions" Docker actions. Most of them pin to old, unmaintained base images that get yanked from Docker Hub eventually. Two shell steps (cp + npx allure generate) take 30 seconds to write, never break, and you can read what they do.
- Allure's history needs a place to live. If you only ever upload Allure as a per-run artifact, you're not getting the feature people install Allure for. Either publish to Pages (this project), to an S3 bucket, or to Allure TestOps. Without persistent history, most of the report's value is left on the table.
- if: ${{ !cancelled() }} on every report step. A failing test run is exactly when you want the report. The default condition, success(), skips the Allure steps when tests fail, which is the opposite of what you want.
- Java is unavoidable for the Allure CLI used here. The CLI is a JVM app. allure-commandline doesn't change that; it's just a launcher. CI needs actions/setup-java; dev machines need a JRE. With this toolchain there is no pure-Node way around that.
Summary
This project is a small, complete reference for what a modern Playwright + Allure pipeline looks like in 2026: natural-language test authoring with Claude Code, deterministic mock data that makes locator strategies stable, two reporters running in parallel (the built-in HTML for trace viewing, Allure for grouping and trends), and a published GitHub Pages site that accumulates history across every push to main. The interesting bits aren't the framework choices; they're the seams: where determinism comes from, why two reporters are better than one, how history actually gets merged across CI runs, and which third-party actions to avoid. The live report at https://kevingomes17.github.io/dashboard-and-playwright/ is the same site you'd build for your own project by following this pattern: clone the workflow, adjust the test mappings, and you've got a public test dashboard with no infrastructure to maintain beyond a gh-pages branch.