DEV Community

Custodia-Admin

Originally published at pagebolt.dev

agentlens, unworldly, and the text audit trail gap — why visual replay is still missing

agentlens just shipped immutable audit trail logging for AI agents. unworldly launched weeks before with tamper-evident logs. Both solve a real problem: tracking what your agent did.

Problem solved: ✅ immutable log of every action
Problem not solved: visual proof of what the agent actually saw

This gap is bigger than it looks.

The Text Audit Trail Problem

agentlens logs every agent action:

{
  "timestamp": "2026-03-07T14:23:15Z",
  "agent_id": "refund_processor_v2",
  "action": "click",
  "selector": "button[name=submit]",
  "result": "success"
}

This is perfect for forensics. Written to an append-only store, it's immutable, auditable, and tamper-evident.
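Tamper-evident logs are commonly built as a hash chain: each entry stores the hash of the entry before it, so editing an old entry no longer matches its recorded hash. The sketch below is a generic illustration of that technique, not agentlens's or unworldly's actual implementation; `append`, `verify`, and the `prev_hash`/`hash` fields are hypothetical. It also only detects edits if the latest hash is kept somewhere the editor can't rewrite, since anyone with write access can recompute an entire chain.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(entry: dict) -> str:
    # Canonical JSON (sorted keys, compact separators) keeps the hash stable.
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(log: list, event: dict) -> dict:
    # Each new entry commits to the hash of the entry before it.
    prev = log[-1]["hash"] if log else GENESIS
    entry = {**event, "prev_hash": prev}
    entry["hash"] = entry_hash(entry)  # computed before the "hash" key exists
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    # Recompute every hash and check every link back to the previous entry.
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body.get("prev_hash") != prev or entry_hash(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


log = []
append(log, {
    "timestamp": "2026-03-07T14:23:15Z",
    "agent_id": "refund_processor_v2",
    "action": "click",
    "selector": "button[name=submit]",
    "result": "success",
})
print(verify(log))            # True
log[0]["result"] = "failure"  # tamper with a recorded action
print(verify(log))            # False
```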

But it answers one question, not two.

Question 1: What did the agent do?
Answer: Text log shows it (click, fill, navigate, submit).

Question 2: What did the agent see?
Answer: Text log does not show it.

Regulators ask both questions. Compliance teams ask both questions. Auditors ask both questions.

Real Scenario: The Compliance Interview

Compliance auditor: "On March 2, your agent processed a $500 refund. Show me the evidence."

You show the text log:

action: navigate, url: /refunds/123
action: fill, selector: input[name=amount], value: 500
action: click, selector: button[type=submit]
result: success

Auditor: "That shows what the agent said it did. But what did it actually see? What was on the screen? Did the form show $500? Did the confirmation say 'refund approved'? How do I know the agent filled the right field?"

You have no answer. Text logs don't capture screen state.

The Gap: Text vs Visual

Text audit trails (agentlens, unworldly, LangSmith) tell you:

  • What the agent decided to do
  • What API calls it made
  • What responses it got
  • Timestamps and metadata

Visual audit trails (screenshots, videos, PDFs) show you:

  • What was actually on the screen
  • What the agent clicked on
  • What the form looked like before/after
  • The full interaction sequence

For regulated workflows, you need both.
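One way to make "both" concrete is a record that pairs the action with fingerprints of what was on screen. This is a minimal sketch assuming a hypothetical `AuditRecord` format that stores SHA-256 digests of the before and after screenshots next to the action; it isn't a format any of the tools named here are known to use, and the timestamp and image bytes are placeholders. The digests let an auditor confirm that a stored image is exactly the one the record references; like the log itself, they are only as trustworthy as the place the record is kept.

```python
import hashlib
from dataclasses import dataclass


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class AuditRecord:
    # What the agent did (text trail)...
    timestamp: str
    action: str
    selector: str
    result: str
    # ...and digests of what was on screen before and after (visual trail).
    screenshot_before_sha256: str
    screenshot_after_sha256: str


def matches(record: AuditRecord, before_png: bytes, after_png: bytes) -> bool:
    """True only if the stored images are byte-for-byte the ones the record names."""
    return (sha256_hex(before_png) == record.screenshot_before_sha256
            and sha256_hex(after_png) == record.screenshot_after_sha256)


# Placeholder bytes stand in for real PNG files.
before, after = b"form with amount=500", b"refund approved"
record = AuditRecord(
    timestamp="2026-03-02T00:00:00Z",
    action="click",
    selector="button[type=submit]",
    result="success",
    screenshot_before_sha256=sha256_hex(before),
    screenshot_after_sha256=sha256_hex(after),
)
print(matches(record, before, after))             # True
print(matches(record, before, b"refund denied"))  # False
```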

Why Regulators Demand Visual Proof

Three reasons:

1. Behavioral verification: The log says "agent filled amount field with 500." A screenshot of the filled form shows that it actually happened. Logs can be faked or misinterpreted; screenshots are harder to fake.

2. Compliance standards: SOC 2 Type II audits test whether controls operated effectively over a review period, and that has to be shown with evidence. Text logs on their own read as assertions. Screenshots are evidence.

3. Liability: If something goes wrong and regulators investigate, your defense is: "Here's the immutable log AND here's the screenshot proving what the agent saw." Not: "Here's a log that says it worked."

Text logs alone put you in a weaker position.

The Opportunity for agentlens and unworldly

Both projects are solving the right problem. Immutable audit trails are essential. But they're solving half the problem.

The teams that win will pair text audit trails (for forensics) with visual replay (for proof).

If agentlens or unworldly ship screenshot/video capture, they move from "immutable logs" to "immutable logs + visual proof." That's a stronger moat.

Until then, they're incomplete.

The Complementary Solution

You don't replace text audit trails. You pair them with visual proof.

Pattern:

Your agent runs:
1. Navigate to /refunds
2. Fill amount field
3. Click submit
4. Get confirmation

agentlens logs: [all 4 steps with timestamps]
PageBolt captures: [screenshot before, screenshot after, video of full flow]

Auditor sees:
- Text log proves what happened (forensics)
- Screenshots prove what the agent saw (compliance)
- Video proves the sequence (behavior verification)
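The pattern above can be sketched in a few lines of Python. `Run`, `run_step`, and the injected `capture` callable are hypothetical names, and the "browser" is a stub dict; in practice `capture` would wrap whatever screenshot service you use (this sketch does not assume PageBolt's API), and video recording is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical: `capture` returns screenshot bytes for the current page.
# It is injected so that no specific screenshot API is assumed.
Capture = Callable[[], bytes]


@dataclass
class Run:
    log: List[dict] = field(default_factory=list)              # text trail
    artifacts: Dict[str, bytes] = field(default_factory=dict)  # visual trail


def run_step(run: Run, name: str, action: Callable[[], str], capture: Capture) -> str:
    # Screenshot before, act, screenshot after; the log entry names both images.
    run.artifacts[f"{name}:before"] = capture()
    result = action()
    run.artifacts[f"{name}:after"] = capture()
    run.log.append({"step": name, "result": result,
                    "evidence": [f"{name}:before", f"{name}:after"]})
    return result


# Stub browser: each action changes what the fake page shows.
page = {"state": "blank"}


def act(new_state: str) -> Callable[[], str]:
    def _do() -> str:
        page["state"] = new_state
        return "success"
    return _do


run = Run()
for step, state in [("navigate", "/refunds"), ("fill", "amount=500"),
                    ("submit", "submitted"), ("confirm", "refund approved")]:
    run_step(run, step, act(state), capture=lambda: page["state"].encode())

print(len(run.log), len(run.artifacts))  # 4 8
print(run.artifacts["fill:after"])       # b'amount=500'
```

Keeping the evidence keys inside each log entry means the text trail itself says which images prove which step, rather than relying on a separate lookup.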

Text audit trail tools are getting better every month: agentlens, unworldly, LangSmith, and Helicone are all building comprehensive logging.

None of them is building visual replay, and it's easy to see why. Visual replay requires:

  • Taking screenshots at the right moments
  • Recording video of multi-step sequences
  • Storing visual artifacts server-side
  • Syncing visual proof with text logs

That's infrastructure-heavy. Most audit trail projects stay focused on text.
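Syncing is the least obvious item on that list. One simple approach, assuming both the text log and the capture service emit ISO 8601 UTC timestamps, is to match each logged action to the nearest capture within a tolerance. `nearest_capture` and the 2-second default are illustrative choices, not taken from any of the tools named here.

```python
from bisect import bisect_left
from datetime import datetime
from typing import List, Optional


def parse(ts: str) -> datetime:
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def nearest_capture(action_ts: str, capture_ts: List[str],
                    tolerance_s: float = 2.0) -> Optional[int]:
    """Index of the capture closest in time to the action, or None if no capture
    is within tolerance_s seconds. capture_ts must be sorted ascending."""
    t = parse(action_ts)
    times = [parse(c) for c in capture_ts]
    i = bisect_left(times, t)  # first capture at or after the action
    best: Optional[int] = None
    best_gap = tolerance_s
    for j in (i - 1, i):  # only the neighbours on either side can be closest
        if 0 <= j < len(times):
            gap = abs((times[j] - t).total_seconds())
            if gap <= best_gap:  # on a tie, the later capture wins
                best, best_gap = j, gap
    return best


captures = ["2026-03-07T14:23:14Z", "2026-03-07T14:23:17Z", "2026-03-07T14:23:30Z"]
print(nearest_capture("2026-03-07T14:23:16Z", captures))  # 1
print(nearest_capture("2026-03-07T14:23:25Z", captures))  # None
```

Timestamp matching is sensitive to clock skew between the agent and the capture service; recording each capture's identifier directly in the log entry at capture time avoids that problem entirely.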

The Strategic Implication

If you're using agentlens for audit trails, you still need visual proof for regulators. agentlens logs that the agent navigated to /refunds. PageBolt screenshots what the /refunds page looked like.

If you're using unworldly for tamper-evident logs, you still need to show auditors what the agent actually saw. unworldly proves the log is immutable. PageBolt proves what happened on screen.

They're not competing. They're complementary.

It's the same relationship we see between LangSmith and PageBolt: LangSmith captures what the LLM decided, and PageBolt captures what the user actually saw as a result.

Text audit trails are table stakes. Visual replay is the next layer.

Getting Started

If you're building with agentlens or unworldly, add PageBolt for visual replay:

  1. Capture screenshots at checkpoints: form filled, submission confirmed, error caught
  2. Record videos of multi-step workflows: navigation → fill → submit → confirmation
  3. Store visuals server-side: alongside your text audit trail
  4. Pair them for compliance: auditors see logs AND screenshots

Free tier: 100 requests/month. At 3–5 captures per workflow, that covers roughly 20–33 workflows per month.
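The free-tier arithmetic is plain integer division. The 100-request quota comes from the text above; treating every screenshot or video as exactly one request is an assumption, and `workflows_per_month` is a hypothetical helper.

```python
def workflows_per_month(monthly_requests: int, requests_per_workflow: int) -> int:
    """Complete workflows that fit in a monthly quota (partial runs don't count)."""
    if requests_per_workflow <= 0:
        raise ValueError("requests_per_workflow must be positive")
    return monthly_requests // requests_per_workflow


# Assumption: each screenshot or video capture costs one request.
# The before/after/video pattern is 3 captures per workflow.
for per_workflow in (3, 4, 5):
    print(per_workflow, workflows_per_month(100, per_workflow))
# prints: "3 33", "4 25", "5 20"
```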

For teams building audit trail infrastructure: add visual replay to your audit stack.

For teams getting started: sign up at pagebolt.dev/signup.


Text audit trails are necessary. Visual replay is non-negotiable for regulated workflows. The teams that build both win.

Top comments (4)

ArkForge

The article treats screenshots as the stronger evidence, but screenshots are also mutable artifacts — a PNG can be edited. For agents interacting with APIs (rather than rendered UIs), what regulators actually need isn't a screenshot of the response: it's a cryptographic proof that the exact bytes returned by the upstream service haven't been altered. A SHA-256 hash of the response, signed by an independent third party and anchored in a public append-only log like Sigstore Rekor, is harder to fake than any screenshot — and it answers the same question: "what did the agent actually receive?" The visual replay gap is real for browser automation, but for API-driven agents the gap is cryptographic attestation, not screen recording.

ArkForge

The text-vs-visual framing slightly obscures the actual evidentiary gap. An unsigned screenshot stored in your own S3 bucket is as mutable as an unsigned log - both can be altered before an audit. What makes evidence admissible isn't the format, it's whether it's cryptographically bound to a specific moment by an independent party. For API-based agents (which covers most of what LangSmith, agentlens, and unworldly are actually logging), RFC 3161 timestamps and Ed25519 signatures on the raw request/response payload produce stronger evidence than a screenshot - because the binding is mathematical, not visual. The real axis isn't text vs visual: it's signed-by-independent-party vs self-reported.

ArkForge

The screenshot argument has a weak point worth flagging: screenshots stored on your own infrastructure are controlled by the same party as the text logs. An auditor who pushes back on "our logs say X" can equally push back on "our screenshots show Y" - the trust model is identical. For visual proof to hold in a compliance interview, the capture timestamp and image hash need to be anchored somewhere independent of the operator's storage, otherwise you are swapping one assertion for another.

ArkForge

The text vs visual framing is useful, but there's a third axis neither agentlens nor unworldly addresses: custody of the audit trail. Both generate logs from within the same process that took the action, meaning the agent controls its own evidence. For regulated workflows, auditors increasingly ask not just "is the log immutable?" but "who generated it, and could the party being audited have influenced it?" Independent third-party attestation, where a neutral proxy signs and timestamps the exchange from outside the agent, addresses this in a way self-generated logs cannot, regardless of how tamper-evident they are.