web4browser

Posted on May 30

Playwright Evidence Bundles for Browser Automation Debugging

#webdev #testing #playwright #automation

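When a Playwright run fails, you usually get an error message, a stack trace, and maybe a screenshot. That is useful, but it is rarely enough.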

A timeout does not tell you which browser profile was active.
A failed click does not tell you whether the account was still logged in.
A navigation error does not tell you whether the proxy changed.
A screenshot does not explain the retry history.

For simple tests, an error stack may be enough.

For real browser automation workflows, especially workflows involving accounts, proxies, persistent profiles, or AI agents, the useful artifact is not just the error.

It is the evidence bundle.

An evidence bundle is a small folder created after each browser automation run. It keeps the inputs, browser state, screenshots, traces, network symptoms, and final result in a form that a developer, operator, or agent can review later.

This is the kind of operational layer teams usually need when they move from isolated scripts toward a browser automation workspace for multi-account teams.

Why a failed browser run needs more than an error message

A Playwright error usually answers one question:

Where did the script stop?

But debugging browser automation usually requires better questions.

Which account was used?
Which profile was attached?
Which proxy region was expected?
Was the session already logged in?
Did the page redirect?
Did a verification prompt appear?
Did the retry change anything?

Without this context, teams waste time guessing.

One person says the selector changed. Another says the proxy failed. Someone else says the account was logged out. The script owner reruns the task locally and cannot reproduce the problem.

That is not only a coding problem.

It is an evidence problem.

The minimum evidence bundle

A useful evidence bundle does not need to be large.

A small folder like this is often enough:

runs/
  run_2026_05_30_001/
    input.json
    environment.json
    screenshot.png
    page.html
    browser-state.json
    trace.zip
    network-summary.json
    result.json
    notes.md

Each file has a job.

input.json records what the script was asked to do.

environment.json records the browser profile, proxy expectation, runtime mode, and account context.

screenshot.png shows the visible page when the run stopped.

page.html helps inspect unexpected UI changes.

browser-state.json records the URL, page title, and timestamp at the moment the run stopped.

trace.zip gives Playwright’s timeline for reproducible debugging.

network-summary.json records request failures, status codes, redirects, and timeout patterns.

result.json gives the final machine-readable status.

notes.md gives humans a short explanation.

The goal is not to save everything.

The goal is to save enough context so the next person does not start from zero.
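If you want that folder created automatically at the start of each run, a minimal Node.js sketch could look like this. It assumes the `runs/run_YYYY_MM_DD_NNN` naming shown above and uses only the built-in `fs` and `path` modules; `createRunDir` is a hypothetical helper name, not part of Playwright.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: create the next numbered run folder for a given day,
// e.g. runs/run_2026_05_30_001, runs/run_2026_05_30_002, ...
function createRunDir(baseDir, date = new Date()) {
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  // Uses the local date; switch to getUTC* methods if runners span time zones
  const day = `${date.getFullYear()}_${pad(date.getMonth() + 1)}_${pad(date.getDate())}`;
  const prefix = `run_${day}_`;

  fs.mkdirSync(baseDir, { recursive: true });

  // Find the highest sequence number already used for this day
  const used = fs
    .readdirSync(baseDir)
    .filter((name) => name.startsWith(prefix))
    .map((name) => Number.parseInt(name.slice(prefix.length), 10))
    .filter((n) => Number.isInteger(n));
  const next = used.length > 0 ? Math.max(...used) + 1 : 1;

  const runId = `${prefix}${pad(next, 3)}`;
  const runDir = path.join(baseDir, runId);
  // No `recursive` here: this throws if another process already created it
  fs.mkdirSync(runDir);
  return { runId, runDir };
}
```

Call it once per run and pass `runDir` to every later step, so all artifacts for one run land in the same folder.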

Save the inputs before the browser starts

The first rule is simple:

Save the inputs before doing browser work.

If the run fails later, you should not need to reconstruct what happened from memory, console logs, or chat messages.

{
  "run_id": "run_2026_05_30_001",
  "task": "check_dashboard_status",
  "account_id": "acct_us_018",
  "profile_id": "profile_us_018",
  "proxy_region": "US",
  "target_url": "https://example.com/dashboard",
  "headless": false
}

This file helps answer basic questions quickly.

Was the task pointed at the correct URL?
Was the right account selected?
Was the expected region correct?
Was the script running in headless mode or visible mode?

For multi-account automation, this matters a lot.

The same Playwright code can be safe in one account and risky in another. The difference is often not in the script. It is in the context around the script.
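A minimal sketch of "save the inputs first", assuming the field names from the `input.json` example above. The required-field list and the `writeInputs` name are illustrative assumptions; the idea is that the file is on disk before any browser is launched, and that an obviously incomplete task fails before it touches an account.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Fields taken from the input.json example; adjust them to your own tasks.
const REQUIRED_FIELDS = ["run_id", "task", "account_id", "profile_id", "proxy_region", "target_url"];

// Hypothetical helper: validate and persist the run inputs.
// Call it before launching or attaching to a browser.
function writeInputs(runDir, input) {
  const missing = REQUIRED_FIELDS.filter(
    (key) => input[key] === undefined || input[key] === ""
  );
  if (missing.length > 0) {
    throw new Error(`input.json is missing: ${missing.join(", ")}`);
  }
  // The URL constructor throws a TypeError for malformed URLs,
  // so a bad target fails here instead of inside page.goto()
  new URL(input.target_url);

  const file = path.join(runDir, "input.json");
  fs.writeFileSync(file, JSON.stringify(input, null, 2));
  return file;
}
```

In a real run you would call `writeInputs` right after creating the run folder and before `chromium.launch()` or attaching a persistent profile.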

Capture browser state at the moment of failure

A screenshot is helpful, but it is only one layer.

At the moment a task stops, capture a few browser facts:

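import fs from "node:fs";

// Collect a few facts about the page at the moment the task stopped
const state = {
  url: page.url(),
  title: await page.title(),
  timestamp: new Date().toISOString()
};

// What the browser visibly showed, including content below the fold
await page.screenshot({
  path: `${runDir}/screenshot.png`,
  fullPage: true
});

// Rendered HTML, for inspecting unexpected UI changes
await fs.promises.writeFile(
  `${runDir}/page.html`,
  await page.content()
);

await fs.promises.writeFile(
  `${runDir}/browser-state.json`,
  JSON.stringify(state, null, 2)
);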

This gives reviewers more than a frozen image.

They can inspect the URL, page title, rendered HTML, and visible state together.

Be careful with sensitive data.

Do not casually save raw cookies, tokens, passwords, wallet information, private messages, or payment pages. If session information is needed, save a redacted summary, not the secret itself.

Good evidence should improve debugging without creating a new security problem.
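One way to follow this advice is to store a summary of the session rather than the cookies themselves. The sketch below assumes its input has the shape of the array returned by Playwright's `browserContext.cookies()` (objects with `name`, `value`, `domain`, `expires`, and so on). `summarizeCookies` is a hypothetical helper: it keeps names, domains, and expiry state, and drops every value.

```javascript
// Hypothetical helper: turn cookies into a redacted, reviewable summary.
// Input shape assumed to match Playwright's browserContext.cookies() result.
function summarizeCookies(cookies, now = Date.now()) {
  return cookies.map((c) => {
    const isSession = c.expires === -1;
    return {
      name: c.name,
      domain: c.domain,
      session_cookie: isSession,
      // Playwright reports expiry in Unix seconds; -1 marks a session cookie
      expired: !isSession && c.expires * 1000 < now,
      // Record only whether a value exists; the value itself is never copied
      has_value: typeof c.value === "string" && c.value.length > 0
    };
  });
}
```

A run could then write `JSON.stringify(summarizeCookies(await context.cookies()), null, 2)` to a file such as `session-summary.json` (a hypothetical name) to answer "was the session still logged in?" without storing secrets.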

Use Playwright trace for reproducible debugging

Playwright trace is one of the best tools for complex browser runs.

It can capture screenshots, DOM snapshots, and source information across the run.

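// Start tracing on the browser context before any navigation
await context.tracing.start({
  screenshots: true, // screenshots along the timeline
  snapshots: true,   // DOM snapshots for each action
  sources: true      // source files, shown in the trace viewer
});

try {
  // targetUrl and runTask come from your own task code
  await page.goto(targetUrl);
  await runTask(page);
} finally {
  // Export the trace even when runTask throws.
  // Inspect it later with: npx playwright show-trace trace.zip
  await context.tracing.stop({
    path: `${runDir}/trace.zip`
  });
}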

The trace is especially useful when the failure is not obvious from the final screenshot.

Maybe the page redirected twice.
Maybe a modal appeared and disappeared.
Maybe a request stalled before the selector failed.
Maybe the script clicked the wrong matching element.

A final screenshot may miss that story.

A trace can preserve it.
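Traces can get large. If you only want to keep them for runs that need review, one option is to export the trace as above and delete it when the run succeeds. The sketch below wraps that pattern. `runWithTrace` is a hypothetical helper, and `context` can be anything with Playwright-style `tracing.start()` and `tracing.stop()` methods, which also makes it easy to exercise with a stub.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: trace every run, keep trace.zip only if the task throws.
async function runWithTrace(context, runDir, task) {
  const tracePath = path.join(runDir, "trace.zip");
  await context.tracing.start({ screenshots: true, snapshots: true, sources: true });

  let failed = false;
  try {
    return await task();
  } catch (error) {
    failed = true;
    throw error; // let the caller record the failure as usual
  } finally {
    await context.tracing.stop({ path: tracePath });
    if (!failed) {
      // Successful run: drop the trace to save disk space
      fs.rmSync(tracePath, { force: true });
    }
  }
}
```

The trade-off is that a successful run leaves no timeline behind, so this fits best once a workflow is stable and failures are the main thing worth reviewing.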

Record network symptoms separately

Network problems are often misdiagnosed as selector problems.

A page may fail because the selector changed.

But it may also fail because proxy authentication failed, DNS resolution changed, a request timed out, or the server returned 403 (Forbidden) or 429 (Too Many Requests).

A small network summary helps separate these cases.

{
  "failed_requests": 3,
  "status_codes": {
    "200": 18,
    "403": 2,
    "429": 1
  },
  "timeouts": 1,
  "redirect_count": 4,
  "proxy_region_expected": "US",
  "proxy_region_observed": "US"
}
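A summary like this can be collected from Playwright's page events while the task runs. The sketch below listens to the `response` and `requestfailed` events and reads `response.status()` and `request.failure()`. Two assumptions are worth stating: treating failure text containing `TIMED_OUT` as a timeout relies on Chromium-style error strings (other engines word failures differently), and 3xx responses are counted as redirect hops. The proxy region fields cannot be observed from these events, so they are passed in from a separate check. `recordNetwork` is a hypothetical helper.

```javascript
// Hypothetical helper: count network symptoms from Playwright page events.
function recordNetwork(page) {
  const summary = {
    failed_requests: 0,
    status_codes: {},
    timeouts: 0,
    redirect_count: 0
  };

  const onResponse = (response) => {
    const code = String(response.status());
    summary.status_codes[code] = (summary.status_codes[code] || 0) + 1;
    // Count 3xx responses as hops in a redirect chain
    if (response.status() >= 300 && response.status() < 400) {
      summary.redirect_count += 1;
    }
  };

  const onRequestFailed = (request) => {
    summary.failed_requests += 1;
    const failure = request.failure();
    // Assumption: Chromium-style error text such as net::ERR_TIMED_OUT
    if (failure && /TIMED_OUT/i.test(failure.errorText)) {
      summary.timeouts += 1;
    }
  };

  page.on("response", onResponse);
  page.on("requestfailed", onRequestFailed);

  return {
    // Stop listening and return the summary plus the proxy fields you supply
    finish(proxyRegionExpected, proxyRegionObserved) {
      page.off("response", onResponse);
      page.off("requestfailed", onRequestFailed);
      return {
        ...summary,
        proxy_region_expected: proxyRegionExpected,
        proxy_region_observed: proxyRegionObserved
      };
    }
  };
}
```

Attach it right after creating the page, then write the value returned by `finish()` to `network-summary.json` when the run stops.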

This does not need to replace full logs.

It gives the reviewer a fast signal.

If the page failed after several 429 responses, the fix may involve pacing or retry behavior.

If the observed proxy region does not match the expected region, the issue may be environment binding.

If only one static asset failed, the problem may not be the browser task at all.
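These rules of thumb can be written as a small triage function, so every network summary comes with a first hypothesis. The hint strings and conditions are illustrative assumptions, not established thresholds, and `triageNetwork` is a hypothetical name. Because the summary format above does not record which request failed, the "one static asset" case is approximated as a single failed request with no 4xx or 5xx responses.

```javascript
// Hypothetical triage: map a network summary to first hypotheses for review.
function triageNetwork(summary) {
  const codes = summary.status_codes || {};
  const count = (code) => codes[String(code)] || 0;
  const hints = [];

  if (
    summary.proxy_region_expected &&
    summary.proxy_region_observed &&
    summary.proxy_region_expected !== summary.proxy_region_observed
  ) {
    hints.push("environment_binding: proxy region mismatch");
  }
  if (count(429) > 0) {
    hints.push("pacing: server returned 429 Too Many Requests");
  }
  if (count(403) > 0) {
    hints.push("access: server returned 403 Forbidden");
  }
  if (summary.timeouts > 0) {
    hints.push("network: request timeouts before the failure");
  }

  // Approximation: a lone failed request with no 4xx/5xx responses
  // is flagged as possibly unrelated to the task itself.
  const errorResponses = Object.keys(codes)
    .filter((code) => Number(code) >= 400)
    .reduce((sum, code) => sum + codes[code], 0);
  if (hints.length === 0 && summary.failed_requests === 1 && errorResponses === 0) {
    hints.push("isolated: single failed request, may be unrelated to the task");
  }

  return hints.length > 0 ? hints : ["no network signal: check selectors and page state"];
}
```

For the example summary above, this returns three hints (pacing, access, and network timeouts), which is a reasonable order in which to start looking.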

Write a result file that humans and agents can read

The final output should be structured.

A result file helps humans review the run and helps automation systems decide what to do next.

{
  "status": "review_required",
  "stop_reason": "verification_prompt",
  "last_url": "https://example.com/security-check",
  "screenshot": "screenshot.png",
  "trace": "trace.zip",
  "retry_count": 1,
  "next_action": "human_review"
}

This is more useful than a loose console message.

A human can scan it quickly.

An AI agent can act on it without parsing free-form log text.

A workflow system can route it to the right next step.
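A sketch of how a runner might produce that file, assuming a small lookup from `stop_reason` to `status` and `next_action`. Only the `verification_prompt` row mirrors the example above; the other stop reasons and actions are hypothetical placeholders, and `writeResult` is a hypothetical helper. Artifacts are listed only if they actually exist in the run folder, so the file never points at a missing screenshot or trace.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical routing table. Only "verification_prompt" mirrors the example
// above; the other rows are placeholders for your own stop reasons.
const ROUTES = {
  completed:           { status: "passed",          next_action: "none" },
  verification_prompt: { status: "review_required", next_action: "human_review" },
  logged_out:          { status: "review_required", next_action: "human_review" },
  rate_limited:        { status: "failed",          next_action: "retry_later" }
};

function writeResult(runDir, { stopReason, lastUrl, retryCount }) {
  // Unknown stop reasons default to human review rather than an automatic retry
  const route = ROUTES[stopReason] || { status: "review_required", next_action: "human_review" };
  // Reference an artifact only if the file is really there
  const artifact = (name) => (fs.existsSync(path.join(runDir, name)) ? name : null);

  const result = {
    status: route.status,
    stop_reason: stopReason,
    last_url: lastUrl,
    screenshot: artifact("screenshot.png"),
    trace: artifact("trace.zip"),
    retry_count: retryCount,
    next_action: route.next_action
  };
  fs.writeFileSync(path.join(runDir, "result.json"), JSON.stringify(result, null, 2));
  return result;
}
```

Defaulting unknown stop reasons to human review is a deliberately conservative choice: a new failure mode should be looked at before anything retries it against a real account.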

For browser automation, this is a major difference. The run is no longer just a script that passed or failed. It becomes a reviewed operation with a clear state.

That is also why Web4 Browser focuses on keeping browser profiles, proxies, tasks, logs, and review states connected instead of treating automation as isolated script execution.

When an evidence bundle is too much

Not every Playwright script needs a full evidence bundle.

If you are testing a public page locally, a simple screenshot and stack trace may be enough.

If you are experimenting with selectors, saving a trace for every run may be overkill.

Evidence bundles become useful when the workflow has real operational context.

Use them for:

Logged-in account workflows
Proxy-aware browser automation
Multi-account tasks
AI browser agent actions
Scheduled browser jobs
Human-reviewed operations
Long-running workflows
Tasks where failure history matters

Skip them for:

Throwaway local experiments
Simple public page checks
One-time selector tests
Low-risk scripts with no account context

The point is not to make every script heavy.

The point is to make important browser runs reviewable.

A practical rule

Here is a simple rule:

If a failed run would make someone ask, “What exactly happened?”, save an evidence bundle.

If a failed run could affect an account, a user session, a proxy route, a workflow queue, or a human review process, save an evidence bundle.

If the task may later be called by an AI agent, save structured evidence by default.
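The rule can also be encoded so a runner picks an evidence level automatically. This is a sketch: the flag names on the `task` object and the `"full"`, `"structured"`, and `"minimal"` levels are illustrative assumptions, not a standard, and `evidenceLevel` is a hypothetical helper.

```javascript
// Hypothetical decision helper following the practical rule above.
function evidenceLevel(task) {
  // A failure here could affect an account, session, proxy route,
  // workflow queue, or human review process
  const affectsSharedState = Boolean(
    task.usesAccount ||
    task.usesUserSession ||
    task.usesProxy ||
    task.enqueuedInWorkflow ||
    task.humanReviewed
  );

  if (affectsSharedState || task.scheduled || task.longRunning) {
    return "full"; // input, environment, screenshot, HTML, trace, network, result, notes
  }
  if (task.agentCallable) {
    return "structured"; // at least input.json and result.json by default
  }
  return "minimal"; // screenshot and stack trace for throwaway checks
}
```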

Browser automation is not only about making the browser move.

It is about knowing what happened after it moved.

Conclusion

A Playwright script that works once is useful.

A Playwright workflow that can be reviewed, reproduced, and fixed is much more useful.

Error messages tell you where a script stopped. Evidence bundles tell you what the run actually looked like: the input, the browser state, the network symptoms, the trace, the stop reason, and the next action.

That is the difference between debugging by memory and debugging from proof.

For more practical notes on profiles, proxy checks, MCP workflows, and account-aware automation, see these browser automation and profile workflow notes.

DEV Community