Custodia-Admin

Posted on Mar 17 • Edited on Mar 25 • Originally published at pagebolt.dev

Monitoring AI Agent Actions in Production: A Developer's Guide

#aiagents #monitoring #governance #observability

You deploy an AI agent to production. It's supposed to fill out forms, make API calls, and report back. For the first week, everything works. Then on Wednesday, a customer reports: "The agent submitted my form twice and now my data is corrupted."

You check the logs. Your agent says:

2026-03-17T14:32:15Z Agent started task
2026-03-17T14:32:18Z Form filled
2026-03-17T14:32:19Z Submit clicked
2026-03-17T14:32:20Z Task completed

But the logs don't answer: What did the agent actually see on screen? Did the form really fill? Did the submit button click? Or did the page freeze after your agent clicked?

Text logs alone aren't enough. You need to see what your agent saw.

The Problem: Blind Agents

Right now, your agent monitoring probably includes:

Log output (text statements)
API call traces (what endpoints were hit)
Error messages (if something broke)

But none of this answers: What did the UI actually show the agent?

Common blind spots:

Form validation errors that logs missed
Page redirects your agent didn't expect
Visual elements (buttons, links) that moved or disappeared
Stale page state (cached HTML)

Result: Agents make mistakes and you have no visual evidence of what went wrong.

The Solution: Screenshot-Based Monitoring

Every time your agent takes an action, capture a screenshot. The screenshots serve two purposes: the agent can verify what it is actually seeing, and you can audit what happened afterward.

Here's the pattern:

// Requires node-fetch v2; v3 is ESM-only and cannot be loaded with require()
const fetch = require('node-fetch');
const fs = require('fs');

class MonitoredAgent {
  constructor(apiKey) {
    this.apiKey = apiKey;
    this.screenshots = [];
  }

  async captureScreenshot(url, label) {
    const response = await fetch('https://api.pagebolt.com/v1/screenshot', {
      method: 'POST',
      headers: {
        'x-api-key': this.apiKey,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        url: url,
        viewport: { width: 1280, height: 720 },
        format: 'png'
      })
    });

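    // Fail loudly instead of saving an error response as a .png file
    if (!response.ok) {
      throw new Error(`Screenshot request failed: ${response.status} ${response.statusText}`);
    }

    const buffer = await response.buffer();
    const filename = `screenshot-${label}-${Date.now()}.png`;
    fs.mkdirSync('./audit-trail', { recursive: true }); // make sure the output directory exists
    fs.writeFileSync(`./audit-trail/${filename}`, buffer);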

    this.screenshots.push({
      timestamp: new Date().toISOString(),
      label: label,
      filename: filename,
      url: url
    });

    return buffer;
  }

  async executeTask(url) {
    console.log(`Agent starting task on ${url}`);

    // Screenshot 1: Initial state
    await this.captureScreenshot(url, 'initial-state');

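    // Agent does work (fill form, click button, etc.)
    // fillForm() and clickSubmit() are placeholders: implement them with your own automation layer
    await this.fillForm(url);

    // Screenshot 2: After form fill
    await this.captureScreenshot(url, 'after-fill');

    // Click submit (placeholder, see note above)
    await this.clickSubmit(url);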

    // Screenshot 3: After submit
    await this.captureScreenshot(url, 'after-submit');

    // Wait for confirmation
    await new Promise(resolve => setTimeout(resolve, 2000));

    // Screenshot 4: Final state
    await this.captureScreenshot(url, 'final-state');

    return this.getAuditTrail();
  }

  getAuditTrail() {
    return {
      task: 'form_submission',
      timestamp: new Date().toISOString(),
      screenshots: this.screenshots,
      status: 'completed'
    };
  }
}

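// Usage (wrapped in an async function because CommonJS has no top-level await)
async function main() {
  const agent = new MonitoredAgent(process.env.PAGEBOLT_API_KEY);
  const auditTrail = await agent.executeTask('https://example.com/form');

  // Save audit trail
  fs.mkdirSync('./audit-trails', { recursive: true });
  fs.writeFileSync(
    `./audit-trails/${Date.now()}.json`,
    JSON.stringify(auditTrail, null, 2)
  );
}

main().catch(console.error);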
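
The pattern above says the agent itself can use screenshots to verify what it is seeing. One simple check, sketched below, is to hash two consecutive captures and flag the step if nothing changed. `hashScreenshot` and `pageChanged` are hypothetical helper names, not part of any PageBolt SDK, and the comparison assumes the page has no dynamic content (clocks, ads, animations) that would change the bytes on its own.

```javascript
const crypto = require('crypto');

// Return a SHA-256 hex digest of a screenshot buffer.
function hashScreenshot(buffer) {
  return crypto.createHash('sha256').update(buffer).digest('hex');
}

// Hypothetical helper: byte-identical captures suggest the action
// had no visible effect on the page.
function pageChanged(beforeBuffer, afterBuffer) {
  return hashScreenshot(beforeBuffer) !== hashScreenshot(afterBuffer);
}

// Stand-in buffers are used here instead of real PNG data.
const before = Buffer.from('initial-state');
const afterFill = Buffer.from('after-fill');
console.log(pageChanged(before, afterFill)); // true
console.log(pageChanged(before, Buffer.from('initial-state'))); // false
```

Since captureScreenshot() already returns the buffer, the agent could compare the 'initial-state' and 'after-fill' captures and stop before submitting if they are identical. Treat a match as a warning sign rather than proof, because an exact byte comparison is a blunt instrument.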

Real-World Example: Procurement Workflow Agent

Let's say you have an agent that processes purchase requests. At each decision point, capture a screenshot:

async function procurementAgent(prUrl) {
  const agent = new MonitoredAgent(process.env.PAGEBOLT_API_KEY);

  try {
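    // Step 1: Read requisition
    // Note: apart from captureScreenshot() and getAuditTrail(), the agent methods called
    // in this function are not defined on MonitoredAgent above; implement them for your workflow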
    await agent.captureScreenshot(prUrl, 'read-requisition');
    const amount = await agent.extractAmount(prUrl);

    // Step 2: Check approval rules
    const requiresApproval = amount > 10000;
    await agent.captureScreenshot(prUrl, 'approval-check');

    // Step 3: Route for approval
    if (requiresApproval) {
      await agent.submitForApproval(prUrl);
      await agent.captureScreenshot(prUrl, 'submitted-for-approval');
    } else {
      await agent.autoApprove(prUrl);
      await agent.captureScreenshot(prUrl, 'auto-approved');
    }

    // Step 4: Final state
    await agent.captureScreenshot(prUrl, 'final-state');

    // Return complete audit trail
    return agent.getAuditTrail();

  } catch (error) {
    // On error, capture a final screenshot; ignore a failed capture so the original error is rethrown
    await agent.captureScreenshot(prUrl, 'error-state').catch(() => {});
    throw error;
  }
}

Audit trail includes:

Screenshot at each decision point
Timestamps of each action
URL state at each step
Complete visual record of what the agent saw
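
Before handing a trail like this to an auditor, it may help to confirm that it is complete. The sketch below checks the fields that captureScreenshot() records and that timestamps never go backwards. `validateAuditTrail` is a hypothetical helper, and the rules it enforces are assumptions about what a "complete" trail means, not a compliance standard.

```javascript
// Return a list of problems found in an audit trail (empty list = looks complete).
function validateAuditTrail(trail) {
  const problems = [];
  if (!Array.isArray(trail.screenshots) || trail.screenshots.length === 0) {
    problems.push('no screenshots recorded');
    return problems;
  }
  let previous = -Infinity;
  trail.screenshots.forEach((shot, index) => {
    // Every entry should carry the metadata captureScreenshot() stores.
    for (const field of ['timestamp', 'label', 'filename', 'url']) {
      if (!shot[field]) problems.push(`screenshot ${index} is missing ${field}`);
    }
    const time = Date.parse(shot.timestamp);
    if (shot.timestamp && Number.isNaN(time)) {
      problems.push(`screenshot ${index} has an invalid timestamp`);
    } else if (time < previous) {
      problems.push(`screenshot ${index} is out of order`);
    }
    if (!Number.isNaN(time)) previous = time;
  });
  return problems;
}

// Example: the second capture is stamped earlier than the first.
const sample = {
  screenshots: [
    { timestamp: '2026-03-17T14:32:15Z', label: 'read-requisition', filename: 'a.png', url: 'https://example.com/pr/1' },
    { timestamp: '2026-03-17T14:32:14Z', label: 'approval-check', filename: 'b.png', url: 'https://example.com/pr/1' }
  ]
};
console.log(validateAuditTrail(sample)); // [ 'screenshot 1 is out of order' ]
```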

For compliance: "Here's what the agent saw when it approved the purchase." Auditors can literally see the UI the agent was interacting with.

Why Screenshots > Text Logs

Text logs tell you what the agent thinks happened:

[14:32:19] Form field "amount" filled with value "5000"
[14:32:20] Submit button clicked
[14:32:21] Request successful

Screenshots show what actually happened:

Screenshot 1: Form loaded correctly
Screenshot 2: Form filled (but validation error showing in red)
Screenshot 3: Submit button is disabled (grayed out)
Screenshot 4: Modal popup blocked submission

The difference matters. The agent's log says "submit clicked," but the screenshot shows the button was disabled. Text logs record what the agent attempted, not how the page responded.
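
To make that comparison routine, each log line can be paired with the screenshot captured closest to it in time. The sketch below assumes both carry ISO 8601 timestamps, as the audit trail entries in this article do. `nearestScreenshot` is a hypothetical helper name.

```javascript
// For a log timestamp, return the screenshot captured closest in time,
// so a reviewer can check the log's claim against the image.
function nearestScreenshot(logTimestamp, screenshots) {
  const target = Date.parse(logTimestamp);
  let best = null;
  let bestGap = Infinity;
  for (const shot of screenshots) {
    const gap = Math.abs(Date.parse(shot.timestamp) - target);
    if (gap < bestGap) {
      best = shot;
      bestGap = gap;
    }
  }
  return best; // null when the list is empty
}

const shots = [
  { timestamp: '2026-03-17T14:32:16Z', label: 'initial-state' },
  { timestamp: '2026-03-17T14:32:18Z', label: 'after-fill' },
  { timestamp: '2026-03-17T14:32:21Z', label: 'after-submit' }
];

// The "Submit button clicked" log line at 14:32:20 maps to the after-submit capture.
console.log(nearestScreenshot('2026-03-17T14:32:20Z', shots).label); // after-submit
```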

Governance & Compliance

In regulated industries (fintech, healthcare, insurance), visual proof matters:

Audit trail: Regulators want to see: "Here's the exact UI state when the agent made decision X"
Debugging: Support says "agent rejected my application" → you show 5 screenshots proving why
Liability: "Agent made wrong decision" → you have visual evidence of exactly what information was available

Screenshots are the governance layer that text logs can't provide.

Cost & Implementation

PageBolt approach:

Starter plan: $29/month (5,000 screenshots)
Typical agent: 4-10 screenshots per task
Volume: 500 tasks/month = 2,000-5,000 screenshots ✓ Covered by Starter
Total: $29/month

Self-hosted approach:

Puppeteer + Node.js: $50-100/month infrastructure
Storage: $5-10/month for screenshots
DevOps overhead: 2-4 hours/month
Total: $55-110/month + time

For production AI agents at this volume, PageBolt's $29/month Starter plan costs less than the $55-110/month self-hosted estimate and requires no infrastructure maintenance.
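
The volume estimate above is simple multiplication, so it is easy to rerun for your own workload. This sketch uses the article's figures (500 tasks, 4-10 screenshots per task, a 5,000-screenshot quota); `estimateUsage` is a hypothetical helper name.

```javascript
// Estimate monthly screenshot volume and whether a plan's quota covers the worst case.
function estimateUsage(tasksPerMonth, minPerTask, maxPerTask, planQuota) {
  const low = tasksPerMonth * minPerTask;
  const high = tasksPerMonth * maxPerTask;
  return { low, high, covered: high <= planQuota };
}

console.log(estimateUsage(500, 4, 10, 5000));
// { low: 2000, high: 5000, covered: true }

// At 600 tasks/month the upper bound (6,000) exceeds the same quota.
console.log(estimateUsage(600, 4, 10, 5000).covered); // false
```

Note that 500 tasks at 10 screenshots each lands exactly on the quota, so any growth in task count or captures per task would push the upper bound past it.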

Getting Started

Sign up free at pagebolt.dev/pricing — 100 screenshots/month
Add the captureScreenshot() function to your agent
Call it before/after critical actions
Save screenshots + metadata to your audit trail
Compliance + governance layer ready
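
The "before/after critical actions" step can be wrapped in a small helper so no call site forgets one of the two captures. This is a minimal sketch: `withScreenshots` is a hypothetical name, and it assumes only that the agent object exposes an async captureScreenshot(url, label) method like the class shown earlier.

```javascript
// Run an action with a screenshot captured before and after it.
// `agent` is any object with an async captureScreenshot(url, label) method.
async function withScreenshots(agent, url, label, action) {
  await agent.captureScreenshot(url, `${label}-before`);
  try {
    return await action();
  } finally {
    // Runs on success and on failure, so a thrown error still leaves evidence.
    await agent.captureScreenshot(url, `${label}-after`);
  }
}
```

Inside executeTask(), the submit step could then be written as `await withScreenshots(this, url, 'submit', () => this.clickSubmit(url));`. One design caveat: if the second capture itself fails, its error replaces any error thrown by the action, so you may want to catch and log capture failures separately.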

Your agent is now fully transparent. Every action, every decision, every page state — captured and auditable.

Start free: pagebolt.dev/pricing. 100 screenshots/month, no credit card required.

Top comments (2)

Tijo Gaucher • Apr 13

this hits close to home lol. the gap between "agent started task" and actually knowing what it did is so real. we had an agent silently retrying a failed action for like 40 min before anyone noticed. observability is the unsexy part of agents that nobody wants to build but everybody needs

Armorer Labs • Jun 13

Action monitoring is the right level for agents. Request monitoring alone misses the thing people actually care about: what did the agent do?

I would separate three streams: raw traces for debugging, normalized action records for review, and policy/approval events for governance. The useful product surface is usually built from normalized action records, not raw logs. Disclosure: I'm building Armorer Guard, so this maps closely to how we think about tool calls.