Custodia-Admin

Posted on • Originally published at pagebolt.dev

Monitoring AI Agent Actions in Production: A Developer's Guide

You deploy an AI agent to production. It's supposed to fill out forms, make API calls, and report back. For the first week, everything works. Then on Wednesday, a customer reports: "The agent submitted my form twice and now my data is corrupted."

You check the logs. Your agent says:

2026-03-17T14:32:15Z Agent started task
2026-03-17T14:32:18Z Form filled
2026-03-17T14:32:19Z Submit clicked
2026-03-17T14:32:20Z Task completed

But the logs don't answer the questions that matter: What did the agent actually see on screen? Did the form really fill? Did the submit click register? Or did the page freeze after your agent clicked?

Text logs alone aren't enough. You need to see what your agent saw.

The Problem: Blind Agents

Right now, your agent monitoring probably includes:

  • Log output (text statements)
  • API call traces (what endpoints were hit)
  • Error messages (if something broke)

But none of this answers: What did the UI actually show the agent?

Common blind spots:

  • Form validation errors that logs missed
  • Page redirects your agent didn't expect
  • Visual elements (buttons, links) that moved or disappeared
  • Stale page state (cached HTML)

Result: Agents make mistakes and you have no visual evidence of what went wrong.

The Solution: Screenshot-Based Monitoring

Every time your agent takes an action, capture a screenshot. The screenshots serve two purposes: the agent can verify what it is seeing, and you can audit what happened afterward.

Here's the pattern:

// Requires node-fetch v2 (v3 is ESM-only and cannot be loaded with require)
const fetch = require('node-fetch');
const fs = require('fs');

class MonitoredAgent {
  constructor(apiKey) {
    this.apiKey = apiKey;
    this.screenshots = [];
  }

  async captureScreenshot(url, label) {
    const response = await fetch('https://api.pagebolt.com/v1/screenshot', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        url: url,
        viewport: { width: 1280, height: 720 },
        format: 'png'
      })
    });

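    // Fail loudly instead of saving an error response body as a .png
    if (!response.ok) {
      throw new Error(`Screenshot request failed: ${response.status} ${response.statusText}`);
    }

    // response.buffer() is a node-fetch v2 method
    const buffer = await response.buffer();
    const filename = `screenshot-${label}-${Date.now()}.png`;
    // Create the output directory on first use so writeFileSync does not throw
    fs.mkdirSync('./audit-trail', { recursive: true });
    fs.writeFileSync(`./audit-trail/${filename}`, buffer);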

    this.screenshots.push({
      timestamp: new Date().toISOString(),
      label: label,
      filename: filename,
      url: url
    });

    return buffer;
  }

  async executeTask(url) {
    console.log(`Agent starting task on ${url}`);

    // Screenshot 1: Initial state
    await this.captureScreenshot(url, 'initial-state');

    // Agent does work (fill form, click button, etc.)
    // fillForm() and clickSubmit() are your agent's own methods; they are not defined in this class
    await this.fillForm(url);

    // Screenshot 2: After form fill
    await this.captureScreenshot(url, 'after-fill');

    // Click submit
    await this.clickSubmit(url);

    // Screenshot 3: After submit
    await this.captureScreenshot(url, 'after-submit');

    // Wait for confirmation
    await new Promise(resolve => setTimeout(resolve, 2000));

    // Screenshot 4: Final state
    await this.captureScreenshot(url, 'final-state');

    return this.getAuditTrail();
  }

  getAuditTrail() {
    return {
      task: 'form_submission',
      timestamp: new Date().toISOString(),
      screenshots: this.screenshots,
      status: 'completed'
    };
  }
}

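// Usage (wrapped in an async function because CommonJS has no top-level await)
async function main() {
  const agent = new MonitoredAgent(process.env.PAGEBOLT_API_KEY);
  const auditTrail = await agent.executeTask('https://example.com/form');

  // Save audit trail
  fs.mkdirSync('./audit-trails', { recursive: true });
  fs.writeFileSync(
    `./audit-trails/${Date.now()}.json`,
    JSON.stringify(auditTrail, null, 2)
  );
}

main().catch(console.error);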
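
The pattern above stores screenshots but never compares them. One cheap check the agent could run itself is whether two consecutive captures are byte-identical, which suggests the action between them had no visible effect. This is only a sketch: `detectNoOpSteps` and the `buffer` field on each capture are assumptions for illustration, and the class above keeps only metadata in `this.screenshots`, so you would need to retain the buffers to use it. Differing bytes do not prove success, since clocks or animations can change pixels on their own.

```javascript
// Hypothetical helper: flags consecutive captures whose image bytes are identical.
// Assumes each capture is { label, buffer } where buffer is a Node.js Buffer.
function detectNoOpSteps(captures) {
  const suspicious = [];
  for (let i = 1; i < captures.length; i++) {
    const prev = captures[i - 1];
    const curr = captures[i];
    // Buffer.prototype.equals does a byte-for-byte comparison
    if (prev.buffer.equals(curr.buffer)) {
      suspicious.push({ from: prev.label, to: curr.label });
    }
  }
  return suspicious;
}

// Example with stand-in bytes instead of real PNG data
const captures = [
  { label: 'initial-state', buffer: Buffer.from('page-empty') },
  { label: 'after-fill', buffer: Buffer.from('page-filled') },
  { label: 'after-submit', buffer: Buffer.from('page-filled') }
];
console.log(detectNoOpSteps(captures));
// [ { from: 'after-fill', to: 'after-submit' } ]
```

A flagged pair like this one is a prompt to look closer, for example at a disabled submit button, not a verdict.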

Real-World Example: Procurement Workflow Agent

Let's say you have an agent that processes purchase requests. At each decision point, capture a screenshot:

async function procurementAgent(prUrl) {
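  // extractAmount(), submitForApproval() and autoApprove() are assumed to be
  // implemented by your agent; they are not part of MonitoredAgent above
  const agent = new MonitoredAgent(process.env.PAGEBOLT_API_KEY);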

  try {
    // Step 1: Read requisition
    await agent.captureScreenshot(prUrl, 'read-requisition');
    const amount = await agent.extractAmount(prUrl);

    // Step 2: Check approval rules
    const requiresApproval = amount > 10000;
    await agent.captureScreenshot(prUrl, 'approval-check');

    // Step 3: Route for approval
    if (requiresApproval) {
      await agent.submitForApproval(prUrl);
      await agent.captureScreenshot(prUrl, 'submitted-for-approval');
    } else {
      await agent.autoApprove(prUrl);
      await agent.captureScreenshot(prUrl, 'auto-approved');
    }

    // Step 4: Final state
    await agent.captureScreenshot(prUrl, 'final-state');

    // Return complete audit trail
    return agent.getAuditTrail();

  } catch (error) {
    // On error, capture a final screenshot, but don't let a failed capture mask the original error
    await agent.captureScreenshot(prUrl, 'error-state').catch(() => {});
    throw error;
  }
}

Audit trail includes:

  • Screenshot at each decision point
  • Timestamps of each action
  • URL state at each step
  • Complete visual record of what the agent saw

For compliance: "Here's what the agent saw when it approved the purchase." Auditors can literally see the UI the agent was interacting with.
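
A screenshot shows what the agent saw, but not which rule it applied. A minimal sketch of recording the decision next to the screenshot label might look like this. `decideRouting` and the record's field names are hypothetical; the 10,000 threshold and the labels come from the procurement example. Note that `amount > 10000` means a request for exactly 10,000 is auto-approved.

```javascript
// Hypothetical helper: turns the approval rule into an auditable record.
// The threshold defaults to the 10000 used in the procurement example.
function decideRouting(amount, threshold = 10000) {
  if (typeof amount !== 'number' || Number.isNaN(amount)) {
    // Refuse to decide on an amount the agent failed to parse
    throw new TypeError(`Invalid amount: ${amount}`);
  }
  const requiresApproval = amount > threshold;
  return {
    amount,
    threshold,
    requiresApproval,
    // Matches the screenshot labels used in procurementAgent
    screenshotLabel: requiresApproval ? 'submitted-for-approval' : 'auto-approved',
    decidedAt: new Date().toISOString()
  };
}

console.log(decideRouting(12500).screenshotLabel); // submitted-for-approval
console.log(decideRouting(10000).screenshotLabel); // auto-approved
```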

Why Screenshots > Text Logs

Text logs tell you what the agent thinks happened:

[14:32:19] Form field "amount" filled with value "5000"
[14:32:20] Submit button clicked
[14:32:21] Request successful

Screenshots show what actually happened:

  • Screenshot 1: Form loaded correctly
  • Screenshot 2: Form filled (but validation error showing in red)
  • Screenshot 3: Submit button is disabled (grayed out)
  • Screenshot 4: Modal popup blocked submission

Huge difference. The agent's logs say "submit clicked" but the screenshot shows "button is disabled." Text logs are incomplete.
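
To line the two records up, each log entry can be paired with the screenshot closest to it in time. The sketch below assumes both carry ISO 8601 timestamps, as the `screenshots` array in `MonitoredAgent` does. `pairLogsWithScreenshots` and the log entry shape are illustrative, not part of any library.

```javascript
// Hypothetical helper: for each log entry, find the screenshot closest in time.
// Assumes logs are { timestamp, message } and screenshots are { timestamp, label },
// with ISO 8601 timestamps that Date.parse can read.
function pairLogsWithScreenshots(logs, screenshots) {
  return logs.map((log) => {
    const logTime = Date.parse(log.timestamp);
    let nearest = null;
    let smallestGap = Infinity;
    for (const shot of screenshots) {
      const gap = Math.abs(Date.parse(shot.timestamp) - logTime);
      if (gap < smallestGap) {
        smallestGap = gap;
        nearest = shot;
      }
    }
    return {
      message: log.message,
      screenshot: nearest ? nearest.label : null,
      gapMs: nearest ? smallestGap : null
    };
  });
}

const logs = [
  { timestamp: '2026-03-17T14:32:18Z', message: 'Form filled' },
  { timestamp: '2026-03-17T14:32:19Z', message: 'Submit clicked' }
];
const shots = [
  { timestamp: '2026-03-17T14:32:18.200Z', label: 'after-fill' },
  { timestamp: '2026-03-17T14:32:19.400Z', label: 'after-submit' }
];
console.log(pairLogsWithScreenshots(logs, shots));
```

A large `gapMs` is itself a signal: it means a logged action has no screenshot near it.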

Governance & Compliance

In regulated industries (fintech, healthcare, insurance), visual proof matters:

  • Audit trail: Regulators want to see: "Here's the exact UI state when the agent made decision X"
  • Debugging: Support says "agent rejected my application" → you show 5 screenshots proving why
  • Liability: "Agent made wrong decision" → you have visual evidence of exactly what information was available

Screenshots are the governance layer that text logs can't provide.
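
For the record to hold up in an audit, it helps to show the image files were not altered after capture. One common approach, sketched here as an assumption rather than anything PageBolt provides, is to store a SHA-256 digest of each screenshot in the audit trail and recompute it at review time. `digestOf`, `verifyScreenshot`, and the `sha256` field are hypothetical names.

```javascript
const crypto = require('crypto');

// Hypothetical helper: SHA-256 digest of a screenshot buffer, hex-encoded.
function digestOf(buffer) {
  return crypto.createHash('sha256').update(buffer).digest('hex');
}

// Hypothetical helper: true if the file bytes still match the recorded digest.
function verifyScreenshot(entry, buffer) {
  return entry.sha256 === digestOf(buffer);
}

// Example with stand-in bytes instead of a real PNG
const original = Buffer.from('screenshot-bytes');
const entry = { label: 'auto-approved', sha256: digestOf(original) };

console.log(verifyScreenshot(entry, original));                   // true
console.log(verifyScreenshot(entry, Buffer.from('edited-bytes'))); // false
```

In `captureScreenshot`, the digest would be computed from `buffer` and added to the object pushed onto `this.screenshots`.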

Cost & Implementation

PageBolt approach:

  • Starter plan: $29/month (5,000 screenshots)
  • Typical agent: 4-10 screenshots per task
  • Volume: 500 tasks/month = 2,000-5,000 screenshots ✓ Covered by Starter
  • Total: $29/month

Self-hosted approach:

  • Puppeteer + Node.js: $50-100/month infrastructure
  • Storage: $5-10/month for screenshots
  • DevOps overhead: 2-4 hours/month
  • Total: $55-110/month + time

For production AI agents, PageBolt's $29/month Starter plan is both cheaper and simpler.
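
The volume estimate is simple multiplication, and it is worth rerunning with your own numbers. The sketch below uses the figures quoted in this section (4-10 screenshots per task, 5,000 screenshots on Starter); `estimateMonthlyScreenshots` is an illustrative name.

```javascript
// Hypothetical helper: screenshot volume range for a month of agent tasks.
function estimateMonthlyScreenshots(tasksPerMonth, minPerTask = 4, maxPerTask = 10) {
  return {
    min: tasksPerMonth * minPerTask,
    max: tasksPerMonth * maxPerTask
  };
}

// Starter plan limit as quoted in this article
const STARTER_LIMIT = 5000;

const estimate = estimateMonthlyScreenshots(500);
console.log(estimate);                      // { min: 2000, max: 5000 }
console.log(estimate.max <= STARTER_LIMIT); // true
```

At 10 screenshots per task, 500 tasks sits exactly at the limit, so any growth in task count or extra captures on errors would exceed it.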

Getting Started

  1. Sign up free at pagebolt.dev/pricing — 100 screenshots/month
  2. Add the captureScreenshot() function to your agent
  3. Call it before/after critical actions
  4. Save screenshots + metadata to your audit trail
  5. Keep the audit trail as your compliance and governance record

Your agent is now fully transparent. Every action, every decision, every page state — captured and auditable.

Start free: pagebolt.dev/pricing. 100 screenshots/month, no credit card required.