Visual Verification for AI Agents: How to Confirm Web Actions Actually Worked
AI agents that click buttons and fill forms have a blind spot: they can't tell whether an action worked unless they check the resulting page. A form submission might return a 200 but still show an error message. A login might redirect to an unexpected page. A delete might require a confirmation step the agent didn't anticipate.
The fix is visual verification: after each action, screenshot the current state and let the model evaluate whether the expected outcome happened.
The problem with optimistic agents
// This agent is optimistic — it assumes actions work
const result = await agent.invoke("Login to https://example.com with email test@example.com password abc123");
// The agent says "Done" but has no way to confirm login succeeded
// The actual page might show "Invalid credentials" or require 2FA
Add a verification step
import fetch from "node-fetch";
const PAGEBOLT_KEY = process.env.PAGEBOLT_API_KEY;
async function screenshotPage(url) {
const res = await fetch("https://pagebolt.dev/api/v1/screenshot", {
method: "POST",
headers: { "x-api-key": PAGEBOLT_KEY, "Content-Type": "application/json" },
body: JSON.stringify({ url, blockBanners: true }),
});
if (!res.ok) throw new Error(`Screenshot failed: ${res.status}`);
return Buffer.from(await res.arrayBuffer());
}
async function inspectPage(url) {
const res = await fetch("https://pagebolt.dev/api/v1/inspect", {
method: "POST",
headers: { "x-api-key": PAGEBOLT_KEY, "Content-Type": "application/json" },
body: JSON.stringify({ url }),
});
if (!res.ok) throw new Error(`Inspect failed: ${res.status}`);
return res.json();
}
// Verification tool — screenshot the result and ask the model to evaluate it
async function verifyAction(url, expectedOutcome) {
// The two requests are independent, so run them concurrently
const [screenshot, inspection] = await Promise.all([screenshotPage(url), inspectPage(url)]);
// Pass screenshot + inspection to the model with a specific question
return {
screenshot, // raw PNG bytes
elements: inspection.elements,
question: `After the action, is the current page consistent with: "${expectedOutcome}"?
Look for: success messages, error messages, redirects, modal dialogs, or missing expected content.
Answer: Yes/No and explain what you see.`,
};
}
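verifyAction builds the question but leaves the model's reply unparsed. As a minimal sketch, assuming the model follows the requested "Yes/No and explain" format, a small parser can turn that reply into a structured verdict. `parseVerdict` is a hypothetical helper, not part of PageBolt or the Anthropic SDK. Any reply that does not start with Yes or No is treated as inconclusive rather than as success.

```javascript
// Hypothetical helper: turn the model's free-text reply into a structured verdict.
// Assumes the reply begins with "Yes" or "No" (optionally prefixed by "Answer:"),
// as the verification question requests.
function parseVerdict(replyText) {
  const text = (replyText || "").trim();
  const match = text.match(/^(?:answer:\s*)?(yes|no)\b/i);
  if (!match) {
    // Ambiguous replies are never treated as success.
    return { verified: false, conclusive: false, explanation: text };
  }
  const verified = match[1].toLowerCase() === "yes";
  // Drop the verdict word and any punctuation that follows it.
  const explanation = text.slice(match[0].length).replace(/^[\s.,:;-]+/, "");
  return { verified, conclusive: true, explanation };
}
```

Keeping `conclusive` separate from `verified` lets the caller distinguish "the model said No" from "the model did not answer the question", which may call for a re-ask rather than a retry of the action.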
Full loop: act → verify → retry
import Anthropic from "@anthropic-ai/sdk";
import fetch from "node-fetch";
const claude = new Anthropic();
const PAGEBOLT_KEY = process.env.PAGEBOLT_API_KEY;
// Tool definitions
const tools = [
{
name: "navigate_and_screenshot",
description: "Navigate to a URL and take a screenshot of the result",
input_schema: {
type: "object",
properties: {
url: { type: "string", description: "URL to navigate to" },
expected: { type: "string", description: "What you expect to see (used to verify success)" },
},
required: ["url"],
},
},
{
name: "inspect_page",
description: "Get all interactive elements and CSS selectors on the current page. Use before filling forms or clicking buttons.",
input_schema: {
type: "object",
properties: {
url: { type: "string" },
},
required: ["url"],
},
},
{
name: "run_sequence_and_verify",
description: "Run a multi-step browser sequence and screenshot the final state for verification",
input_schema: {
type: "object",
properties: {
url: { type: "string", description: "Starting URL" },
steps: {
type: "array",
description: "Browser steps to execute",
items: {
type: "object",
properties: {
action: { type: "string", enum: ["click", "fill", "navigate", "wait", "screenshot"] },
selector: { type: "string" },
value: { type: "string" },
url: { type: "string" },
ms: { type: "number" },
},
required: ["action"],
},
},
expected: { type: "string", description: "Expected state after all steps complete" },
},
required: ["url", "steps"],
},
},
];
async function executeTool(name, input) {
const headers = { "x-api-key": PAGEBOLT_KEY, "Content-Type": "application/json" };
if (name === "navigate_and_screenshot") {
const res = await fetch("https://pagebolt.dev/api/v1/screenshot", {
method: "POST",
headers,
body: JSON.stringify({ url: input.url, blockBanners: true }),
});
if (!res.ok) return { error: `Screenshot failed: ${res.status}` };
const bytes = Buffer.from(await res.arrayBuffer());
return {
screenshot_b64: bytes.toString("base64"),
url: input.url,
expected: input.expected,
note: "Use the screenshot to verify whether the expected outcome was achieved.",
};
}
if (name === "inspect_page") {
const res = await fetch("https://pagebolt.dev/api/v1/inspect", {
method: "POST",
headers,
body: JSON.stringify({ url: input.url }),
});
if (!res.ok) return { error: `Inspect failed: ${res.status}` };
const data = await res.json();
return {
elementCount: data.elements?.length,
elements: (data.elements || []).slice(0, 30).map((el) => ({
tag: el.tag,
role: el.role,
text: (el.text || "").slice(0, 80),
selector: el.selector,
})),
};
}
if (name === "run_sequence_and_verify") {
// Build sequence steps + screenshot at end
const sequenceSteps = [
{ action: "navigate", url: input.url },
...input.steps,
{ action: "screenshot" },
];
const res = await fetch("https://pagebolt.dev/api/v1/sequence", {
method: "POST",
headers,
body: JSON.stringify({ steps: sequenceSteps }),
});
if (!res.ok) return { error: `Sequence failed: ${res.status} ${await res.text()}` };
const data = await res.json();
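// Take the final screenshot: find() would return the first one if the steps include earlier screenshots
const lastScreenshot = data.outputs?.filter((o) => o.type === "screenshot").pop();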
return {
outputs: data.outputs?.length,
screenshot_b64: lastScreenshot?.data,
expected: input.expected,
note: "Check the screenshot to verify whether the expected outcome was achieved.",
};
}
return { error: `Unknown tool: ${name}` };
}
async function runVerifyingAgent(task) {
console.log(`\nTask: ${task}\n`);
const messages = [{
role: "user",
content: `${task}
IMPORTANT: After every action that changes page state, capture a screenshot of the result (run_sequence_and_verify returns one for the final state; navigate_and_screenshot loads a URL directly) and verify it looks correct. Do not assume actions succeeded without visual confirmation.`,
}];
let iterations = 0;
const MAX_ITERATIONS = 10;
while (iterations < MAX_ITERATIONS) {
iterations++;
const response = await claude.messages.create({
model: "claude-sonnet-4-6",
max_tokens: 4096,
tools,
messages,
});
messages.push({ role: "assistant", content: response.content });
// If done, return final response
if (response.stop_reason === "end_turn") {
const textContent = response.content.find((c) => c.type === "text");
return textContent?.text || "Task complete";
}
// Execute tool calls
const toolUses = response.content.filter((c) => c.type === "tool_use");
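// No tool calls and no end_turn (e.g. max_tokens): report the real reason instead of falling through to "Max iterations reached"
if (toolUses.length === 0) return `Stopped without completing (stop_reason: ${response.stop_reason})`;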
const toolResults = await Promise.all(
toolUses.map(async (tu) => {
console.log(` → ${tu.name}(${JSON.stringify(tu.input).slice(0, 80)}...)`);
const result = await executeTool(tu.name, tu.input);
// If the tool returned a screenshot, include it as an image for the model to see
if (result.screenshot_b64) {
return {
type: "tool_result",
tool_use_id: tu.id,
content: [
{
type: "image",
source: { type: "base64", media_type: "image/png", data: result.screenshot_b64 },
},
{
type: "text",
text: `Screenshot captured. Expected: "${result.expected || 'not specified'}". Evaluate whether this matches expectations.`,
},
],
};
}
return {
type: "tool_result",
tool_use_id: tu.id,
content: JSON.stringify(result),
};
})
);
messages.push({ role: "user", content: toolResults });
}
return "Max iterations reached";
}
// Run it
const outcome = await runVerifyingAgent(
"Go to https://example.com, inspect the page to find all links, " +
"then screenshot the page and tell me if there are any broken layout issues."
);
console.log("\nResult:", outcome);
Verification patterns
Pattern 1: check for expected text
// After form submission, verify the success message appeared
const verification = await runVerifyingAgent(
"Submit the contact form at https://example.com/contact with " +
"name: 'Test User', email: 'test@test.com', message: 'Hello'. " +
"After submitting, screenshot the page and confirm a success message appeared."
);
Pattern 2: check page title or URL changed
// After login, verify redirect happened
const verification = await runVerifyingAgent(
"Log in to https://example.com/login. After submitting credentials, " +
"screenshot the result and verify the URL changed to /dashboard (not still on /login)."
);
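If the final URL is available to your code, this check can also be done deterministically, before or alongside asking the model. The sketch below assumes you already have the post-action URL as a string. How to obtain it is not shown in the original, and `didRedirectTo` is a hypothetical helper name. It relies only on the standard `URL` class.

```javascript
// Hypothetical helper: compare the URL after an action against an expected path.
// Assumes `finalUrl` is the page URL after the action, obtained by whatever means your tooling provides.
function didRedirectTo(startUrl, finalUrl, expectedPath) {
  const start = new URL(startUrl);
  const end = new URL(finalUrl, start); // resolves relative URLs against the start URL
  // Ignore trailing slashes so "/dashboard/" and "/dashboard" compare equal.
  const normalize = (p) => p.replace(/\/+$/, "") || "/";
  return {
    changed: end.href !== start.href,
    sameOrigin: end.origin === start.origin,
    onExpectedPath: normalize(end.pathname) === normalize(expectedPath),
  };
}
```

For the login example, `didRedirectTo("https://example.com/login", finalUrl, "/dashboard")` reports whether the page left /login and landed on /dashboard. A screenshot is still worth taking, since a correct URL can render an error page.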
Pattern 3: element presence check
// Inspect after action to verify element appeared/disappeared
const verification = await runVerifyingAgent(
"Click the 'Delete account' button at https://example.com/settings. " +
"After clicking, inspect the page and verify a confirmation dialog appeared " +
"(look for elements with 'confirm' or 'are you sure' text)."
);
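The same check can be sketched in code against the inspect output. This assumes each element carries the `role`, `text`, and `selector` fields that `executeTool` maps for the inspect_page tool. `findConfirmation` is a hypothetical helper, and the phrase list is illustrative rather than exhaustive.

```javascript
// Hypothetical helper: look for a confirmation prompt in a list of inspected elements.
// Assumes elements shaped like { tag, role, text, selector }, as in the inspect_page tool output.
const CONFIRM_PHRASES = ["confirm", "are you sure"]; // illustrative; extend for your UI

function findConfirmation(elements, phrases = CONFIRM_PHRASES) {
  return (elements || []).filter((el) => {
    const text = (el.text || "").toLowerCase();
    // ARIA dialog roles count as a match even when the element has no text of its own.
    const isDialog = el.role === "dialog" || el.role === "alertdialog";
    return isDialog || phrases.some((p) => text.includes(p));
  });
}
```

An empty result after clicking "Delete account" suggests that no confirmation step appeared, which is worth surfacing before the agent continues.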
Why this matters for production agents
Unverified agents produce silent failures — they report success but leave the system in an unknown state. Visual verification closes the loop:
- Act → run the browser action
- Capture → screenshot the result
- Evaluate → model assesses whether expected outcome occurred
- Retry or escalate → if verification fails, retry with a different approach or surface the failure
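The loop above can be sketched as a small wrapper. This is a minimal sketch under stated assumptions, not the article's implementation: `act` and `verify` are caller-supplied async functions (hypothetical names), `verify` is assumed to resolve to an object with a boolean `verified` field, and `maxAttempts` is an illustrative default.

```javascript
// Hypothetical wrapper around the act → capture → evaluate → retry-or-escalate loop.
// `act(attempt)` performs the browser action; `verify(actionResult)` captures and evaluates
// the outcome, resolving to { verified: boolean, explanation?: string }.
async function actWithVerification({ act, verify, maxAttempts = 3 }) {
  const failures = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const actionResult = await act(attempt);
      const verdict = await verify(actionResult);
      if (verdict.verified) {
        return { status: "verified", attempts: attempt, failures };
      }
      failures.push({ attempt, reason: verdict.explanation || "verification failed" });
    } catch (err) {
      // A thrown error counts as a failed attempt, not as success.
      failures.push({ attempt, reason: String(err.message || err) });
    }
  }
  // Surface the failure instead of reporting "done".
  return { status: "escalate", attempts: maxAttempts, failures };
}
```

Passing the attempt number to `act` lets the caller vary the approach on each retry, and the `failures` list gives whoever handles the escalation a record of what was tried.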
This is the difference between an agent that says "done" and one that's actually done.
Try it free — 100 requests/month, no credit card. → Get started in 2 minutes