Jon Retting
Your AI agent can't see half the internet

Your agent's web tool sends an HTTP GET, parses the HTML, and hopes for the best. For Hacker News and Wikipedia, that works fine.

For the other half of the internet — JavaScript SPAs, bot-protected sites, live code editors, chat interfaces — it gets back an empty shell, a 403, or a cookie consent wall. Your agent apologizes and moves on.

I gave an agent a real Chromium browser and ran 16 retrieval tasks head-to-head. Here's what it found.


The scorecard

| Task | HTTP fetch | vscreen | Winner |
|------|------------|---------|--------|
| X/Twitter profile | FAIL (403) | 2.6s PASS | vscreen |
| Spotify album tracks | FAIL (empty SPA) | 2.1s PASS | vscreen |
| Census.gov data table | FAIL (403) | 2.1s PASS | vscreen |
| CodePen live editor | FAIL (403) | 1.7s PASS | vscreen |
| Rust Playground code | 15 chars | 86 chars | vscreen |
| YouTube video metadata | 252 chars | 1,911 chars | vscreen |
| Hacker News front page | 45ms PASS | 1.7s PASS | Tie |
| Wikipedia article | 89ms PASS | 2.1s PASS | Tie |

Across all 16 tasks: vscreen 6, HTTP 0. The 10 ties all fall on server-rendered pages where both methods work. HTTP never won a single task.

Full 16-task scorecard
| Task | HTTP | vscreen | Winner |
|------|------|---------|--------|
| HN front page | 45ms PASS | 1.7s PASS | Tie |
| X/Twitter profile | FAIL | 2.6s PASS | vscreen |
| NYTimes headlines | 76ms PASS | 2.1s PASS | Tie |
| StackOverflow Q&A | 234ms PASS | 1.8s PASS | Tie |
| Spotify tracks | FAIL | 2.1s PASS | vscreen |
| GitHub README | 581ms PASS | 1.7s PASS | Tie |
| Wikipedia | 89ms PASS | 2.1s PASS | Tie |
| LinkedIn | 624ms PASS | 2.4s PASS | Tie |
| Census.gov | FAIL (403) | 2.1s PASS | vscreen |
| YouTube | 252 chars | 1,911 chars | vscreen |
| CodePen | FAIL (403) | 1.7s PASS | vscreen |
| Svelte REPL | SSR fallback | PASS | Tie |
| Go Playground | SSR fallback | PASS | Tie |
| ChatGPT | FAIL (403) | FAIL (auth) | Tie |
| Rust Playground | 15 chars | 86 chars | vscreen |
| TS Playground | SSR fallback | PASS | Tie |


How fast


Text extraction from a loaded page: 4 milliseconds.

Navigate: 793ms. Screenshot: 94ms. Structured extraction: 100ms. Once a page is loaded, pulling all visible text is essentially free.

75 pages/min from one instance. vscreen runs up to 16 parallel browser instances, and throughput scales linearly: 4 instances hit ~300 pages/min. Each instance uses ~200-400 MB of RAM depending on page complexity. The bottleneck is the internet, not vscreen.


I read the Rust Playground's code buffer

This is the part that surprised me. vscreen doesn't just render the page — it can reach into a code editor's internal state:

```javascript
ace.edit(document.querySelector('.ace_editor')).getValue()
// Returns: "fn main() {\n    println!(\"Hello, world!\");\n}"
```
| Editor | Framework | What vscreen extracted |
|--------|-----------|------------------------|
| TypeScript Playground | Monaco Editor | buffer + language ID + file URI |
| Svelte REPL | CodeMirror | source code, compiled JS, and CSS — all 3 panes |
| Rust Playground | Ace | full code + `ace/mode/rust` metadata |
| Go Playground | textarea | `package main` + `fmt.Println("Hello, 世界")` |

HTTP fetch gets the HTML shell. vscreen reads the editor's internal model — the same data the user is editing.
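For the Monaco case, a sketch of what that extraction can look like. This assumes the page exposes a `monaco` global (the TypeScript Playground does) and uses the current Monaco API (`getLanguageId` replaced the older `getModeId`); the helper name is mine:

```javascript
// Hypothetical helper: dump every open Monaco model's buffer and metadata.
// Assumes a page-level `monaco` global, as on the TypeScript Playground.
function dumpMonacoModels(monaco) {
  return monaco.editor.getModels().map((model) => ({
    uri: model.uri.toString(),       // file URI, e.g. "file:///main.ts"
    language: model.getLanguageId(), // language ID, e.g. "typescript"
    code: model.getValue(),          // the full buffer the user is editing
  }));
}
```

Run through the browser's JS evaluation, this returns the editor's live model, not whatever HTML happened to be server-rendered.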


How this title was written

My AI agent used vscreen to browse dev.to, navigate to the top posts pages, and execute JavaScript on the rendered DOM to extract every title and reaction count from the last month. It analyzed 50+ top-performing posts, identified that the highest-engagement titles share three properties — under 12 words, challenge a reader assumption, create a curiosity gap — and generated the title you clicked on.

The tool researched its own article on the platform it's being published to. The entire analysis took under 2 minutes.
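As a rough sketch of the analysis step once the titles and reaction counts are scraped — the 12-word threshold comes from the findings above, but the function name and the engagement cutoff are illustrative, not the agent's actual code:

```javascript
// Hypothetical distillation of the title analysis: given scraped
// { title, reactions } pairs, keep high-engagement posts and measure
// what fraction satisfy the "under 12 words" property.
function shortTitleShare(posts, minReactions = 100) {
  const top = posts.filter((p) => p.reactions >= minReactions);
  const short = top.filter((p) => p.title.split(/\s+/).length < 12);
  return short.length / top.length;
}
```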

That's what a real browser gives an agent.


Try it

```shell
vscreen --dev --mcp-sse 0.0.0.0:8451
```

Pre-built Linux binaries on the releases page.
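Then point your MCP client at the SSE endpoint. A hypothetical client config, assuming vscreen serves MCP at the conventional `/sse` path and your client uses the common `mcpServers` config shape — check the project README for the actual route:

```json
{
  "mcpServers": {
    "vscreen": {
      "url": "http://localhost:8451/sse"
    }
  }
}
```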

Give your agent a real browser

GitHub: jameswebb68 / vscreen

Give AI agents a real browser — streamed live over WebRTC. Captures headless Chromium, encodes H.264/VP9 + Opus audio, 47 MCP automation tools with live advisor, AI-driven page synthesis, multi-instance, bidirectional input. Watch your agents browse the real internet in real-time.

vscreen — Virtual Screen Media Bridge

Give AI agents a real browser. Watch them live. Control everything.

Download the latest release — pre-built binaries for Linux.

vscreen turns a headless Chromium into a remotely viewable, controllable, and AI-automatable virtual screen. It captures the browser viewport via Chrome DevTools Protocol, encodes H.264/VP9 video + Opus audio, and streams everything over WebRTC. Clients send mouse and keyboard input back through a DataChannel for full bidirectional interaction. 47 MCP tools let AI agents automate the browser programmatically — including the Synthesis Bubble system for AI-driven frontend page construction with one-shot multi-source web scraping.

```
 Xvfb + Chromium           vscreen                  Browser Client
 ┌──────────────┐    ┌─────────────────┐     ┌──────────────────────┐
 │ Renders web  │───>│ CDP screencast  │     │                      │
 │ page at      │    │ JPEG → I420     │     │  <video> element     │
 │ 1920×1080    │    │ → H264/VP9      │────>│  shows remote screen │
 │              │    │                 │     │                      │
 │ PulseAudio   │    └─────────────────┘     └──────────────────────┘
 └──────────────┘
```
