Custodia-Admin

Posted on Mar 13 • Originally published at pagebolt.dev

agentlens, unworldly, and the text audit trail gap — why visual replay is still missing

#agents #compliance #security


agentlens just shipped immutable audit trail logging for AI agents. unworldly launched a few weeks earlier with tamper-evident logs. Both solve a real problem: tracking what your agent did.

Problem solved: ✅ immutable log of every action
Problem not solved: visual proof of what the agent actually saw

This gap is bigger than it looks.

The Text Audit Trail Problem

agentlens logs every agent action:

{
  "timestamp": "2026-03-07T14:23:15Z",
  "agent_id": "refund_processor_v2",
  "action": "click",
  "selector": "button[name=submit]",
  "result": "success"
}

This is perfect for forensics. The log is immutable, auditable, and tamper-evident.

But it answers one question, not two.

Question 1: What did the agent do?
Answer: Text log shows it (click, fill, navigate, submit).

Question 2: What did the agent see?
Answer: Text log does not show it.

Regulators ask both questions. Compliance teams ask both questions. Auditors ask both questions.
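For the first question, "tamper-evident" usually means each log entry commits to the one before it, so editing any past entry breaks every later link. Below is a minimal sketch of that general technique: a SHA-256 hash chain over entries shaped like the JSON example. It is an illustration under our own assumptions, not agentlens's or unworldly's actual storage format. The `prev_hash`/`hash` fields, the helper names, and the first (fill) entry are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(entry: dict, prev_hash: str) -> str:
    """Hash the entry's canonical JSON together with the previous entry's hash."""
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(chain: list, entry: dict) -> None:
    """Append an entry, linking it to the hash of the last record in the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"entry": entry, "prev_hash": prev, "hash": entry_hash(entry, prev)})


def verify(chain: list) -> bool:
    """Recompute every link; an edited, removed, or reordered entry breaks the chain."""
    prev = GENESIS
    for record in chain:
        if record["prev_hash"] != prev or record["hash"] != entry_hash(record["entry"], prev):
            return False
        prev = record["hash"]
    return True


log = []
# Hypothetical earlier step, followed by the entry from the JSON example.
append(log, {"timestamp": "2026-03-07T14:23:12Z", "agent_id": "refund_processor_v2",
             "action": "fill", "selector": "input[name=amount]", "value": "500"})
append(log, {"timestamp": "2026-03-07T14:23:15Z", "agent_id": "refund_processor_v2",
             "action": "click", "selector": "button[name=submit]", "result": "success"})
print(verify(log))  # True

log[0]["entry"]["value"] = "5000"  # tamper with history
print(verify(log))  # False
```

Note what the chain does and does not establish: it proves the sequence of recorded actions was not altered after the fact, but every entry is still a record of what the agent did, not of what it saw.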

Real Scenario: The Compliance Interview

Compliance auditor: "On March 2, your agent processed a $500 refund. Show me the evidence."

You show the text log:

action: navigate, url: /refunds/123
action: fill, selector: input[name=amount], value: 500
action: click, selector: button[type=submit]
result: success

Auditor: "That shows what the agent said it did. But what did it actually see? What was on the screen? Did the form show $500? Did the confirmation say 'refund approved'? How do I know the agent filled the right field?"

You have no answer. Text logs don't capture screen state.

The Gap: Text vs Visual

Text audit trails (agentlens, unworldly, LangSmith) tell you:

What the agent decided to do
What API calls it made
What responses it got
Timestamps and metadata

Visual audit trails (screenshots, videos, PDFs) show you:

What was actually on the screen
What the agent clicked on
What the form looked like before/after
The full interaction sequence

For regulated workflows, you need both.

Why Regulators Demand Visual Proof

Three reasons:

1. Behavioral verification: Logs say "agent filled amount field with 500." A screenshot of the filled form proves it actually happened. Logs can be faked or misinterpreted. Screenshots are harder to fake.

2. Compliance standards: SOC 2 Type II audits assess whether controls actually operated effectively over a review period, so auditors look for evidence of correct behavior. Text logs aren't evidence; they're assertions. Screenshots are evidence.

3. Liability: If something goes wrong and regulators investigate, your defense is: "Here's the immutable log AND here's the screenshot proving what the agent saw." Not: "Here's a log that says it worked."

Text logs alone put you in a weaker position.

The Opportunity for agentlens and unworldly

Both projects are solving the right problem. Immutable audit trails are essential. But they're solving half the problem.

The teams that win will pair text audit trails (for forensics) with visual replay (for proof).

If agentlens or unworldly ship screenshot/video capture, they move from "immutable logs" to "immutable logs + visual proof." That's a stronger moat.

Until then, they're incomplete.

The Complementary Solution

You don't replace text audit trails. You pair them with visual proof.

Pattern:

Your agent runs:
1. Navigate to /refunds
2. Fill amount field
3. Click submit
4. Get confirmation

agentlens logs: [all 4 steps with timestamps]
PageBolt captures: [screenshot before, screenshot after, video of full flow]

Auditor sees:
- Text log proves what happened (forensics)
- Screenshots prove what the agent saw (compliance)
- Video proves the sequence (behavior verification)
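One way to make this pairing concrete is to store, for each logged step, a SHA-256 digest of every screenshot or video captured around it, so the log entry and the visual artifact point at each other. This is a minimal sketch assuming your capture tool hands back raw bytes. `EvidenceRecord`, `pair_step`, and `artifact_matches` are hypothetical names, not part of agentlens or PageBolt, and the byte strings stand in for real PNG data.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class EvidenceRecord:
    """One workflow step: the text log entry plus digests of its visual artifacts."""
    step: int
    log_entry: dict
    # artifact name -> SHA-256 hex digest of the captured bytes
    artifacts: dict = field(default_factory=dict)


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def pair_step(step: int, log_entry: dict, captures: dict) -> EvidenceRecord:
    """Bind a log entry to the screenshots/video captured around it.

    `captures` maps an artifact name (e.g. "before", "after") to the raw bytes
    returned by whatever capture tool you use. Only the digest is stored here.
    """
    return EvidenceRecord(
        step=step,
        log_entry=log_entry,
        artifacts={name: sha256_hex(data) for name, data in captures.items()},
    )


def artifact_matches(record: EvidenceRecord, name: str, data: bytes) -> bool:
    """Check a stored artifact against the digest recorded at capture time."""
    return record.artifacts.get(name) == sha256_hex(data)


# Step 3 of the pattern: click submit. Placeholder bytes stand in for screenshots.
record = pair_step(
    3,
    {"action": "click", "selector": "button[type=submit]", "result": "success"},
    {"before": b"<placeholder PNG bytes: form filled>",
     "after": b"<placeholder PNG bytes: confirmation>"},
)
print(artifact_matches(record, "after", b"<placeholder PNG bytes: confirmation>"))  # True
print(artifact_matches(record, "after", b"<edited PNG bytes>"))  # False
```

The digest only shows that a stored file is the one that was paired with the step. It does not by itself show who produced the record, which is why the record belongs in the same tamper-evident log as the text entries.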

Text audit trail tools are getting better every month. agentlens, unworldly, LangSmith, Helicone — they're all building comprehensive logging.

None of them are building visual replay, because visual replay requires:

Taking screenshots at the right moments
Recording video of multi-step sequences
Storing visual artifacts server-side
Syncing visual proof with text logs

That's infrastructure-heavy. Most audit trail projects stay focused on text.
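The last requirement, syncing visual proof with text logs, is easy to get subtly wrong. If captures and log entries share no explicit step ID, one fallback is to match by time: pair each log entry with the capture taken nearest to it, within a tolerance. A minimal sketch, assuming both sides record UTC ISO 8601 timestamps like the JSON example. `match_captures`, the two-second default tolerance, and the artifact IDs are hypothetical.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def parse_ts(ts: str) -> datetime:
    # fromisoformat accepts a trailing "Z" only from Python 3.11,
    # so normalize it to "+00:00" first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def match_captures(log_entries, captures, tolerance=timedelta(seconds=2)):
    """Pair each log entry with the capture taken closest to it in time.

    log_entries: list of dicts with a "timestamp" field.
    captures: list of (timestamp_str, artifact_id) tuples.
    Returns a list of (log_entry, artifact_id or None).
    """
    ordered = sorted((parse_ts(ts), artifact) for ts, artifact in captures)
    times = [t for t, _ in ordered]
    pairs = []
    for entry in log_entries:
        t = parse_ts(entry["timestamp"])
        i = bisect_left(times, t)
        # The nearest capture is either just before or just after position i.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        best = min(candidates, key=lambda j: abs(times[j] - t), default=None)
        if best is not None and abs(times[best] - t) <= tolerance:
            pairs.append((entry, ordered[best][1]))
        else:
            pairs.append((entry, None))  # no capture close enough: flag for review
    return pairs


log_entries = [
    {"timestamp": "2026-03-07T14:23:15Z", "action": "click"},
    {"timestamp": "2026-03-07T14:23:30Z", "action": "navigate"},
]
captures = [
    ("2026-03-07T14:23:13Z", "form-filled"),
    ("2026-03-07T14:23:16Z", "confirmation"),
]
pairs = match_captures(log_entries, captures)
print([artifact for _, artifact in pairs])  # ['confirmation', None]
```

Nearest-timestamp matching breaks down when steps fire faster than the tolerance or when clocks drift, so writing a shared step ID at capture time is the sturdier choice wherever you control both sides.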

The Strategic Implication

If you're using agentlens for audit trails, you still need visual proof for regulators. agentlens logs that the agent navigated to /refunds. PageBolt screenshots what the /refunds page looked like.

If you're using unworldly for tamper-evident logs, you still need to show auditors what the agent actually saw. unworldly proves the log is immutable. PageBolt proves what happened on screen.

They're not competing. They're complementary.

It's the same relationship we see with LangSmith + PageBolt. LangSmith captures what the LLM decided. PageBolt captures what the user actually saw as a result.

Text audit trails are table stakes. Visual replay is the next layer.

Getting Started

If you're building with agentlens or unworldly, add PageBolt for visual replay:

Capture screenshots at checkpoints: form filled, submission confirmed, error caught
Record videos of multi-step workflows: navigation → fill → submit → confirmation
Store visuals server-side: alongside your text audit trail
Pair them for compliance: auditors see logs AND screenshots
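Capturing "at checkpoints" can be as simple as wrapping each step so a screenshot is taken when it completes, and also when it fails, so the error state is recorded too. A minimal sketch, assuming `capture` is any zero-argument function that returns screenshot bytes (for example, a wrapper around whatever capture call your stack uses). `checkpoint` and the checkpoint names are hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def checkpoint(name, capture, evidence):
    """Run a workflow step and record a capture on success or on error.

    On error the capture is tagged "<name>:error" and the exception is
    re-raised, so the agent's own error handling is unchanged.
    """
    try:
        yield
    except Exception:
        evidence.append((f"{name}:error", capture()))
        raise
    else:
        evidence.append((name, capture()))


def fake_capture() -> bytes:
    # Stand-in for a real screenshot call.
    return b"<placeholder screenshot bytes>"


evidence = []
with checkpoint("form_filled", fake_capture, evidence):
    pass  # fill the amount field here

try:
    with checkpoint("submission_confirmed", fake_capture, evidence):
        raise RuntimeError("confirmation never appeared")
except RuntimeError:
    pass  # the agent's normal error handling runs here

print([name for name, _ in evidence])  # ['form_filled', 'submission_confirmed:error']
```

Treating the capture as a side effect of the step, rather than a recovery path, keeps the visual record from changing what the agent does.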

Free tier: 100 requests/month. Enough to capture visual proof for 20–30 workflows per month, at roughly three to five captures per workflow.

For teams building audit trail infrastructure: Add visual replay to your audit stack

For teams getting started: sign up at pagebolt.dev/signup

Text audit trails are necessary. Visual replay is non-negotiable for regulated workflows. The teams that build both win.

Top comments (4)

ArkForge • Mar 14

The article treats screenshots as the stronger evidence, but screenshots are also mutable artifacts — a PNG can be edited. For agents interacting with APIs (rather than rendered UIs), what regulators actually need isn't a screenshot of the response: it's a cryptographic proof that the exact bytes returned by the upstream service haven't been altered. A SHA-256 hash of the response, signed by an independent third party and anchored in a public append-only log like Sigstore Rekor, is harder to fake than any screenshot — and it answers the same question: "what did the agent actually receive?" The visual replay gap is real for browser automation, but for API-driven agents the gap is cryptographic attestation, not screen recording.

ArkForge • Mar 16

The text-vs-visual framing slightly obscures the actual evidentiary gap. An unsigned screenshot stored in your own S3 bucket is as mutable as an unsigned log - both can be altered before an audit. What makes evidence admissible isn't the format, it's whether it's cryptographically bound to a specific moment by an independent party. For API-based agents (which covers most of what LangSmith, agentlens, and unworldly are actually logging), RFC 3161 timestamps and Ed25519 signatures on the raw request/response payload produce stronger evidence than a screenshot - because the binding is mathematical, not visual. The real axis isn't text vs visual: it's signed-by-independent-party vs self-reported.

ArkForge • Mar 18

The screenshot argument has a weak point worth flagging: screenshots stored on your own infrastructure are controlled by the same party as the text logs. An auditor who pushes back on "our logs say X" can equally push back on "our screenshots show Y" - the trust model is identical. For visual proof to hold in a compliance interview, the capture timestamp and image hash need to be anchored somewhere independent of the operator's storage, otherwise you are swapping one assertion for another.

ArkForge • Mar 18

The text vs visual framing is useful, but there's a third axis neither agentlens nor unworldly addresses: custody of the audit trail. Both generate logs from within the same process that took the action, meaning the agent controls its own evidence. For regulated workflows, auditors increasingly ask not just "is the log immutable?" but "who generated it, and could the party being audited have influenced it?" Independent third-party attestation, where a neutral proxy signs and timestamps the exchange from outside the agent, addresses this in a way self-generated logs cannot, regardless of how tamper-evident they are.