Forcing AI agents to actually prove they did the work

Julien Berthomier — Wed, 25 Mar 2026 07:25:41 +0000

My AI agents kept telling me “Done!” while the page was blank and the console had 47 errors.

The problem is simple: agents write UI code but never open a browser to check. They can’t see if the layout is broken, if buttons overlap, or if the console is full of red.

So I built ProofShot: a CLI that gives any AI agent a browser, records the session, and bundles the proof of the agent's hard work.

We hit 280 stars overnight after posting on HN, and I'd love more feedback.

How it works

Three commands:

proofshot start --run "npm run dev" --port 3000

(agent navigates, clicks, takes screenshots)

proofshot stop

start opens a headless browser, begins video recording, and pipes your dev server logs. The agent interacts with the page through proofshot exec — clicking, typing, navigating. stop collects console errors, trims the video, and generates a self-contained HTML viewer with everything synced to a timeline.
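The "synced to a timeline" step comes down to converting wall-clock timestamps into positions in the trimmed video. Here is a minimal sketch of that arithmetic. It assumes each captured action or log line carries a Unix timestamp, and that the recording start time and the amount trimmed off the front are known. The `Event` type and `to_video_offsets` function are hypothetical names for illustration, not ProofShot's internals.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """A captured action or log line (hypothetical shape)."""
    kind: str         # e.g. "click", "console", "server"
    wall_time: float  # Unix timestamp (seconds) when the event happened
    text: str


def to_video_offsets(events, recording_start, trim_start=0.0):
    """Map wall-clock events to positions (in seconds) in the trimmed video.

    recording_start: Unix timestamp when the raw recording began.
    trim_start: seconds cut from the beginning of the raw video.
    Events that fall inside the trimmed-away part are dropped.
    """
    markers = []
    for ev in sorted(events, key=lambda e: e.wall_time):
        offset = ev.wall_time - recording_start - trim_start
        if offset >= 0:
            # Round to milliseconds so markers are stable for display.
            markers.append((round(offset, 3), ev.kind, ev.text))
    return markers


if __name__ == "__main__":
    start = 1_700_000_000.0
    events = [
        Event("click", start + 4.2, "button#submit"),
        Event("console", start + 1.0, "warm-up log"),
        Event("server", start + 5.5, "TypeError: x is undefined"),
    ]
    for marker in to_video_offsets(events, recording_start=start, trim_start=2.0):
        print(marker)
```

Dropping events that happened before the trim point is one reasonable choice. A viewer could instead pin them to 0 seconds.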

The viewer is a single file — video playback with clickable action markers, console and server log tabs, screenshot gallery. Works offline, no dependencies. You can drop it on a PR and any reviewer sees exactly what happened.
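A viewer that is "a single file" and "works offline" usually has every asset inlined. The sketch below shows one common way to do that, assuming screenshots are PNG bytes and the timeline is JSON-serializable. Images become base64 `data:` URIs, and events go into an inline `<script>`. `build_viewer` is a hypothetical helper. ProofShot's actual viewer (video playback, log tabs, action markers) is much richer and may be built differently.

```python
import base64
import html
import json


def build_viewer(title, screenshots, events):
    """Return one self-contained HTML document as a string.

    screenshots: list of (name, png_bytes) pairs, inlined as data: URIs.
    events: JSON-serializable list embedded in an inline <script> tag.
    Opening the result needs no external files or network requests.
    """
    imgs = "\n".join(
        f'<figure><img alt="{html.escape(name)}" '
        f'src="data:image/png;base64,{base64.b64encode(data).decode("ascii")}">'
        f"<figcaption>{html.escape(name)}</figcaption></figure>"
        for name, data in screenshots
    )
    # Escape "</" so a log line containing "</script>" cannot end the tag early.
    payload = json.dumps(events).replace("</", "<\\/")
    return f"""<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>{html.escape(title)}</title></head>
<body>
<h1>{html.escape(title)}</h1>
{imgs}
<script>const EVENTS = {payload};</script>
</body></html>
"""
```

You could, for example, write the returned string to a file such as `proof.html`, and it would open straight from disk with no server.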

proofshot pr uploads the whole bundle to a GitHub PR comment automatically.
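`proofshot pr` does the upload for you. If you're curious about the underlying mechanism, the sketch below shows only the standard GitHub REST call for commenting on a pull request (pull requests share the issue comments endpoint). It builds the request without sending it. `pr_comment_request` is a hypothetical helper, and you supply the token yourself. How ProofShot attaches the video and viewer file is not covered here.

```python
import json
import urllib.request


def pr_comment_request(owner, repo, pr_number, token, body):
    """Build (but do not send) a GitHub REST request that comments on a PR.

    Pull requests use the issue comments endpoint:
    POST /repos/{owner}/{repo}/issues/{issue_number}/comments
    """
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{pr_number}/comments"
    return urllib.request.Request(
        url,
        data=json.dumps({"body": body}).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending it would be urllib.request.urlopen(req), which needs a real token.
```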

What it catches

ProofShot collects console errors and matches server log errors across 10+ languages (JS, Python, Ruby, Go, Rust, Java, etc.). So if the agent’s code throws a React error, a Python traceback, or a Go panic, it shows up in the viewer with timestamps.
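You can approximate cross-language error matching with per-language line patterns. The patterns below are a small illustrative assumption, not ProofShot's actual rules, which the post doesn't list. They cover Python tracebacks, Go panics, Rust panics, uncaught Java exceptions, and Ruby errors, plus a generic `SomethingError: ...` catch-all.

```python
import re

# Illustrative patterns only (an assumption, not ProofShot's list).
# They are checked in order, and the first match wins.
ERROR_PATTERNS = [
    ("python", re.compile(r"^Traceback \(most recent call last\):")),
    ("go", re.compile(r"^panic: ")),
    ("rust", re.compile(r"^thread '.*' panicked at")),
    ("java", re.compile(r'^Exception in thread "[^"]*" ')),
    ("ruby", re.compile(r":in .*\(\w+(Error|Exception)\)$")),
    # Catch-all for "TypeError: ..." style lines (Node.js, Python, ...).
    ("generic", re.compile(r"^(Uncaught )?[A-Z]\w*(Error|Exception): ")),
]


def find_errors(log_lines):
    """Return (line_number, language, line) for each line that looks like an error."""
    hits = []
    for lineno, line in enumerate(log_lines, start=1):
        stripped = line.rstrip("\n")
        for lang, pattern in ERROR_PATTERNS:
            if pattern.search(stripped):
                hits.append((lineno, lang, stripped))
                break
    return hits
```

Order matters here. A bare `ValueError: ...` line lands in the catch-all because it carries no language marker, so the label is only a best guess even though the detection still fires.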

Agent-agnostic

It’s just shell commands. Works with Claude Code, Cursor, Codex, Gemini CLI, Windsurf — anything that can call a terminal. Built on agent-browser from Vercel Labs, which uses compact element references instead of the full accessibility tree (~93% smaller than Playwright MCP’s output).

Why not Playwright?

Playwright is great for writing test scripts. ProofShot solves a different problem: the agent doesn’t write tests, it just operates the browser and records everything. No scripting, no assertions. The human reviews the evidence.

Think of it as the difference between automated testing and a screen recording of a QA session.

What it’s not

It’s not a testing framework. The agent doesn’t decide pass/fail. It gives you the recording, the errors, and the screenshots — you decide if the feature is correct.

Open source, MIT licensed.

I built a Chrome extension that lets you annotate localhost and have AI fix everything

Julien Berthomier — Tue, 24 Feb 2026 11:26:51 +0000

If you use AI coding tools (Cursor, Claude Code, Windsurf), you've probably hit this: you see something wrong in your UI, and now you have to describe it in text.

"The padding on the card component... it's in the dashboard page... the one with the user avatar... the spacing between the avatar and the name is too tight, and the border radius should match the other cards..."

By the time you've typed that, you could have just opened the CSS and fixed it.

What I built
Pointa is a Chrome extension that lets you click on any element in your localhost app and leave a visual annotation.

pointa.dev | GitHub: github.com/AmElmo/pointa-app

Behind the scenes, it captures:

Element CSS selector
Current CSS properties
Source file reference
Your annotation text
Optional screenshot

Your AI coding tool reads all annotations via MCP (Model Context Protocol) and implements the changes.
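To picture what the AI tool receives, here is a hypothetical record that mirrors the five captured fields. It comes with a helper that flattens a batch of records into one task list. The field names, the `to_prompt` format, and the example values are assumptions for illustration. Pointa's real schema isn't shown in the post, and the MCP transport itself is out of scope.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Annotation:
    """Hypothetical shape of one annotation, mirroring the captured fields."""
    selector: str                     # element CSS selector
    css: dict                         # current CSS properties
    source_file: str                  # source file reference
    note: str                         # your annotation text
    screenshot: Optional[str] = None  # optional screenshot path


def to_prompt(annotations):
    """Render a batch of annotations as one numbered task list."""
    lines = []
    for i, a in enumerate(annotations, start=1):
        lines.append(f"{i}. {a.source_file} -> {a.selector}: {a.note}")
        if a.css:
            props = ", ".join(f"{k}: {v}" for k, v in sorted(a.css.items()))
            lines.append(f"   current CSS: {props}")
    return "\n".join(lines)


def to_json(annotations):
    """Serialize a batch of annotations, e.g. for storage on disk."""
    return json.dumps([asdict(a) for a in annotations], indent=2)
```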

The bulk workflow
This is where it clicks.

Browse through your app. Click things that need fixing. Wrong color here. Bad padding there. Missing hover state. Broken alignment on mobile.

Build up 15-20 annotations across different pages.

Then tell your AI: "Please fix those annotations."

All changes. One command. No context switching between "reviewing the app" and "fixing the app."

Bug reports with context
For bugs, Pointa captures a full timeline:

Console errors and warnings
Network request failures
User interactions (clicks, inputs)
DOM state at the time of the bug
Backend logs (if you wrap your dev server with pointa dev)

Your AI automatically gets the same context a senior developer would want when debugging.
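Building one timeline from several capture sources amounts to merging already-sorted streams. Here is a minimal sketch. It assumes each source (console, network, interactions, backend) yields `(seconds, source, message)` tuples in time order. The tuple shape and sample messages are invented for illustration and are not Pointa's format.

```python
import heapq
from typing import Iterable, List, Tuple

Event = Tuple[float, str, str]  # (timestamp_seconds, source, message)


def merge_timeline(*streams: Iterable[Event]) -> List[Event]:
    """Merge per-source event lists (each already in time order) into one timeline."""
    return list(heapq.merge(*streams, key=lambda e: e[0]))


console = [(1.0, "console", "Warning: each child needs a key"), (3.2, "console", "Uncaught TypeError")]
network = [(3.1, "network", "POST /api/save -> 500")]
clicks = [(3.0, "click", "button#save")]
backend = [(3.15, "server", "ValueError: missing field 'title'")]

for ts, source, msg in merge_timeline(console, network, clicks, backend):
    print(f"{ts:6.2f}s  [{source}] {msg}")
```

If you iterate over `heapq.merge(...)` directly instead of wrapping it in `list`, it consumes its inputs lazily. That makes the same approach workable for long-running log streams.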

Setup

Install the Chrome extension from the Chrome Web Store.

Then:

npx pointa-server

That's it. Under a minute.

Privacy
Everything runs locally. Annotations live in ~/.pointa. No cloud, no accounts, no data leaving your machine.