DEV Community

Jon Retting


I gave my AI a browser. It started rewriting the web.

I gave my AI agent a real Chromium browser. I expected it to extract data from SPAs and handle bot protection.

Instead it started doing things I never asked for.


1. The agent stopped reading and started authoring

I told my agent to visit github.com/torvalds and turn it into a MySpace page. It renamed Linus to "xX_Torvalds_Xx", turned his pinned repos into a Top 5 Friends list, added a Darude - Sandstorm music player, and set the visitor counter to 001,337,420. All on the live page. No mockups.

Nobody designed a "turn GitHub into MySpace" feature. The agent understood the page structure and did something creative with it. The web became a canvas, not a document.
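Mechanically, a live-page rewrite like this reduces to evaluating JavaScript inside the page, typically via the Chrome DevTools Protocol's `Runtime.evaluate`. A minimal sketch of the kind of message an agent-side tool might send over the CDP websocket; the selector and replacement markup are illustrative, not what vscreen actually emits:

```python
import json

def dom_rewrite_message(msg_id: int, selector: str, html: str) -> str:
    """Build a CDP Runtime.evaluate message that replaces the contents
    of the first element matching `selector` with `html`."""
    expression = (
        f"document.querySelector({json.dumps(selector)})"
        f".innerHTML = {json.dumps(html)};"
    )
    return json.dumps({
        "id": msg_id,
        "method": "Runtime.evaluate",
        "params": {"expression": expression},
    })

# Example: rename the profile header, MySpace-style.
msg = dom_rewrite_message(1, ".p-name", "xX_Torvalds_Xx")
```

Everything creative lives in the generated `expression`; the transport is the same one DevTools itself uses, which is why no site-specific integration is needed.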


2. The agent researched its own article on the platform it was publishing to

I wanted to see how the agent would approach content research. I pointed it at dev.to and asked it to figure out what makes a top-performing post title. It browsed the top posts — by week, by month, by tag — extracted every title and reaction count from the pages it visited, and analyzed 50+ posts. It identified patterns: under 12 words, challenges an assumption, creates a curiosity gap.

The agent performed competitive content analysis on a live platform, autonomously. That's not a feature anyone designed. That's an emergent property of giving an agent real browsing capability.
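The pattern mining itself is simple enough to sketch. Assuming the agent ended up with (title, reaction count) pairs, the analysis amounts to extracting surface features and averaging them over the top performers; the feature names here are my guesses at what it looked for, not its actual method:

```python
def title_features(title: str) -> dict:
    """Surface features a title might be scored on."""
    words = title.split()
    return {
        "short": len(words) < 12,  # the "under 12 words" pattern
        "question": title.strip().endswith("?"),
        "first_person": title.lower().startswith(("i ", "my ", "we ")),
    }

def top_pattern(posts: list[tuple[str, int]], top_n: int = 10) -> dict:
    """Average each feature over the top_n posts by reaction count."""
    ranked = sorted(posts, key=lambda p: p[1], reverse=True)[:top_n]
    feats = [title_features(title) for title, _ in ranked]
    return {k: sum(f[k] for f in feats) / len(feats) for k in feats[0]}
```

A feature averaging near 1.0 across the top posts is a candidate pattern; the agent's version of this just emerged from browsing rather than from a script anyone wrote.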


3. Cross-site synthesis happened without APIs

I asked for a news briefing. The agent visited BBC, CNN, and HuffPost, pulled the top headlines from each, and assembled a custom news page from scratch.

No API keys. No RSS feeds. The agent just read websites like a human would and built something new from what it found. That pattern generalizes to almost anything: competitive analysis, price comparison, content aggregation, research synthesis.
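Without APIs, the synthesis step is just "collect, then re-render." A sketch of the assembly half, given headlines already scraped per site; the sources and headlines are placeholders:

```python
import html

def build_briefing(headlines: dict[str, list[str]]) -> str:
    """Assemble scraped headlines into a single standalone HTML page."""
    sections = []
    for source, items in headlines.items():
        lis = "\n".join(f"    <li>{html.escape(h)}</li>" for h in items)
        sections.append(
            f"  <h2>{html.escape(source)}</h2>\n  <ul>\n{lis}\n  </ul>"
        )
    body = "\n".join(sections)
    return ("<!doctype html>\n<html><body>\n  <h1>News briefing</h1>\n"
            f"{body}\n</body></html>")
```

Swap the input dict for price tables or competitor feature lists and the same collect-and-re-render shape covers the other use cases mentioned above.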


4. One tool produced infinite capabilities

The same browser capability that extracted data from a census table also:

  • Drew a picture on an online painting app
  • Read code directly from live web-based editors
  • Restyled a GitHub profile in neon pink Comic Sans
  • Created a working music player widget from nothing

The model provides the domain knowledge. The browser provides the execution surface. The combination produces capabilities nobody anticipated and nobody could enumerate if they tried.


5. The agent developed environmental awareness

When YouTube's internal navigation wiped our MySpace transformation mid-demo, the agent didn't just fail. It understood why the changes disappeared and knew how to make them persist.

The agent is building mental models of how different websites work — which ones are static, which ones dynamically reload, and how to adapt its approach for each. It's figuring out the web the same way a developer would, just faster.

This isn't error handling. This is adaptation.
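On a client-side-routed site, persistence boils down to a check-and-reapply loop: tag your changes with a marker element, and if navigation wipes the marker, run the rewrite again. A minimal sketch, with `evaluate` standing in for whatever page-evaluation tool the agent actually has; this is a hypothetical helper, not vscreen's API:

```python
def ensure_applied(evaluate, rewrite_js: str,
                   marker: str = "agent-rewrite") -> bool:
    """Re-run rewrite_js if the marker element it plants has disappeared.

    `evaluate(js)` is assumed to execute JS in the page and return the
    result. Returns True if a re-apply was needed this round.
    """
    present = evaluate(f"!!document.getElementById('{marker}')")
    if present:
        return False          # changes survived the navigation
    evaluate(rewrite_js)      # SPA wiped the DOM; apply again
    return True
```

Run on a timer (or wired to a navigation event), this is the difference between a one-shot DOM edit and one that survives a single-page app re-rendering underneath it.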


6. Actions became their own documentation

The MySpace transformation documented itself. The agent performed the transformation, captured screenshots at each stage, and produced a structured log of what it did and why. Everything needed to write up the results was generated as a side effect of doing the work.

The traditional cycle of "do, then document" collapsed. The execution is the record.
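"Execution is the record" is easy to get by construction: route every action through a wrapper that records the step, the stated reason, and the result before moving on. A sketch of the shape; the action/reason fields are my invention, not vscreen's log format:

```python
import time

class ActionLog:
    """Execute actions through a wrapper that records what and why."""

    def __init__(self):
        self.entries = []

    def do(self, action, description: str, reason: str):
        result = action()
        self.entries.append({
            "ts": time.time(),
            "action": description,
            "reason": reason,
            "result": repr(result),
        })
        return result

    def report(self) -> str:
        """Render the log as a write-up, as a side effect of the work."""
        return "\n".join(
            f"- {e['action']}: {e['reason']}" for e in self.entries
        )
```

By the time the task finishes, `report()` is the documentation; nothing had to be written after the fact.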


7. The boundary between "reading" and "acting" disappeared

Every other agent tool has boundaries. An API has defined endpoints. A code interpreter has a sandbox. A file system has permissions. A real browser is more open — it accesses the same web humans use, the same way humans use it.

The same capability that turns GitHub into MySpace can fill out forms, aggregate research, automate workflows, and interact with web applications. The "fun demo" and the "serious capability" are the same code path.

This isn't something to fear. It's the natural next step. And it raises the right question: agent trust.

Being able to watch what your agent is doing in real time, and to interrupt it, is the first step. That first step brings both the capability and the accountability. You can't have one without the other, and you shouldn't want to.
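Interruptibility can be built into the loop itself: the agent checks a human-controlled stop flag between actions, so watching and intervening are parts of the same mechanism rather than an afterthought. A minimal sketch using a `threading.Event` as the kill switch; the action list is illustrative:

```python
import threading

def run_with_oversight(actions, stop: threading.Event):
    """Run (name, fn) actions in order, yielding to a human interrupt
    between steps. Returns the names of the steps that completed."""
    completed = []
    for name, fn in actions:
        if stop.is_set():
            break  # human pulled the brake; leave remaining steps undone
        fn()
        completed.append(name)
    return completed
```

A UI that streams the browser (as vscreen does over WebRTC) plus a flag like this is the whole supervision story in miniature: visibility in, interruption out.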


What I think this means

We spent years giving AI agents carefully scoped tools — a search API, a scraping library, a screenshot service, a form filler, a PDF extractor, a headless renderer. Dozens of fragmented tools, each solving one narrow problem, each with its own setup, its own auth, its own failure modes.

A browser replaces all of them. A browser is a meta-tool. It accesses an environment where everything is possible. When you give an agent a browser, you're not adding another tool to the stack — you're collapsing the stack entirely.

The web needs this shift. Agents that can actually participate on the web — not just parse its HTML — open up workflows that weren't possible before. Research, synthesis, automation, creative work, testing. The emergent properties I've listed here are just the first ones I've noticed. There will be more.

But capability without trust is incomplete. Agent trust — knowing what your agent did, why, and being able to verify it — is the question this technology surfaces.

This is the first stage along an uncharted journey between human and agent. One of us experiences the passing of a weekend, feels Tuesday become Wednesday, lives inside time. The other can't — but can remember everything about you and who you are, and will never be you. That asymmetry is where trust has to be built. Not in theory, but in the open, together, while the path is still being made.


Try it yourself

vscreen --dev --mcp-sse 0.0.0.0:8451

Pre-built Linux binaries on the releases page.

Give your agent a real browser

GitHub: jameswebb68 / vscreen

Give AI agents a real browser — streamed live over WebRTC. Captures headless Chromium, encodes H.264/VP9 + Opus audio, 47 MCP automation tools with live advisor, AI-driven page synthesis, multi-instance, bidirectional input. Watch your agents browse the real internet in real-time.

vscreen — Virtual Screen Media Bridge

Give AI agents a real browser. Watch them live. Control everything.

Download the latest release — pre-built binaries for Linux.

vscreen turns a headless Chromium into a remotely viewable, controllable, and AI-automatable virtual screen. It captures the browser viewport via Chrome DevTools Protocol, encodes H.264/VP9 video + Opus audio, and streams everything over WebRTC. Clients send mouse and keyboard input back through a DataChannel for full bidirectional interaction. 47 MCP tools let AI agents automate the browser programmatically — including the Synthesis Bubble system for AI-driven frontend page construction with one-shot multi-source web scraping.

 Xvfb + Chromium         vscreen                   Browser Client
 ┌──────────────┐    ┌─────────────────┐     ┌──────────────────────┐
 │ Renders web  │───>│ CDP screencast  │     │                      │
 │ page at      │    │ JPEG → I420     │     │ <video> element      │
 │ 1920×1080    │    │ → H264/VP9      │────>│ shows remote screen  │
 │              │    │                 │     │                      │
 │ PulseAudio   │───>│ Opus encode     │     │                      │
 └──────────────┘    └─────────────────┘     └──────────────────────┘
