Your agent's web tool sends an HTTP GET, parses the HTML, and hopes for the best. For Hacker News and Wikipedia, that works fine.
For the other half of the internet — JavaScript SPAs, bot-protected sites, live code editors, chat interfaces — it gets back an empty shell, a 403, or a cookie consent wall. Your agent apologizes and moves on.
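That failure mode is easy to detect programmatically. Here's a minimal sketch (my own heuristic, not the agent's actual tool): if the fetched body contains almost no visible text after stripping scripts and tags, you're looking at a JS shell, not content.

```javascript
// Heuristic: does this HTML look like an empty SPA shell?
// (Illustrative only — thresholds and regexes are assumptions.)
function looksLikeEmptyShell(html) {
  const body = (html.match(/<body[^>]*>([\s\S]*)<\/body>/i) ?? [, ''])[1];
  const visibleText = body
    .replace(/<script[\s\S]*?<\/script>/gi, '') // drop inline/external scripts
    .replace(/<[^>]+>/g, '')                    // drop remaining tags
    .trim();
  // Almost no visible text means JavaScript was supposed to render it.
  return visibleText.length < 50;
}
```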
I gave an agent a real Chromium browser and ran 16 retrieval tasks head-to-head. Here's what it found.
## The scorecard
| Task | HTTP fetch | vscreen | Winner |
|---|---|---|---|
| X/Twitter profile | FAIL (403) | 2.6s PASS | vscreen |
| Spotify album tracks | FAIL (empty SPA) | 2.1s PASS | vscreen |
| Census.gov data table | FAIL (403) | 2.1s PASS | vscreen |
| CodePen live editor | FAIL (403) | 1.7s PASS | vscreen |
| Rust Playground code | 15 chars | 86 chars | vscreen |
| YouTube video metadata | 252 chars | 1,911 chars | vscreen |
| Hacker News front page | 45ms PASS | 1.7s PASS | Tie |
| Wikipedia article | 89ms PASS | 2.1s PASS | Tie |
vscreen 6, HTTP 0. The ties are the server-rendered pages where both methods work. HTTP never won a single task.
## Full 16-task scorecard
| Task | HTTP | vscreen | Winner |
|------|------|---------|--------|
| HN front page | 45ms PASS | 1.7s PASS | Tie |
| X/Twitter profile | FAIL | 2.6s PASS | vscreen |
| NYTimes headlines | 76ms PASS | 2.1s PASS | Tie |
| StackOverflow Q&A | 234ms PASS | 1.8s PASS | Tie |
| Spotify tracks | FAIL | 2.1s PASS | vscreen |
| GitHub README | 581ms PASS | 1.7s PASS | Tie |
| Wikipedia | 89ms PASS | 2.1s PASS | Tie |
| LinkedIn | 624ms PASS | 2.4s PASS | Tie |
| Census.gov | FAIL (403) | 2.1s PASS | vscreen |
| YouTube | 252 chars | 1,911 chars | vscreen |
| CodePen | FAIL (403) | 1.7s PASS | vscreen |
| Svelte REPL | SSR fallback | PASS | Tie |
| Go Playground | SSR fallback | PASS | Tie |
| ChatGPT | FAIL (403) | FAIL (auth) | Tie |
| Rust Playground | 15 chars | 86 chars | vscreen |
| TS Playground | SSR fallback | PASS | Tie |
## How fast
Navigate: 793ms. Screenshot: 94ms. Structured extraction: 100ms. Once a page is loaded, pulling all visible text is essentially free.
75 pages/min from one instance. vscreen runs up to 16 parallel browsers, and throughput scales linearly: four instances give 300 pages/min. Each instance uses roughly 200-400 MB of RAM depending on page complexity. The bottleneck is your network connection, not vscreen.
## I read the Rust Playground's code buffer
This is the part that surprised me. vscreen doesn't just render the page — it can reach into a code editor's internal state:
```javascript
ace.edit(document.querySelector('.ace_editor')).getValue()
// Returns: "fn main() {\n    println!(\"Hello, world!\");\n}"
```
| Editor | Framework | What vscreen extracted |
|---|---|---|
| TypeScript Playground | Monaco | Editor buffer + language ID + file URI |
| Svelte REPL | CodeMirror | Source code, compiled JS, and CSS — all 3 panes |
| Rust Playground | Ace | Full code + `ace/mode/rust` metadata |
| Go Playground | textarea | `package main` + `fmt.Println("Hello, 世界")` |
HTTP fetch gets the HTML shell. vscreen reads the editor's internal model — the same data the user is editing.
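Each framework in that table exposes its buffer through a different global API. A sketch of the per-framework expressions an agent could evaluate in the page context (the Ace one is from the article; the Monaco, CodeMirror 5, and textarea variants are standard public APIs for those editors, not necessarily what vscreen runs internally):

```javascript
// Per-framework expressions for reading a live editor buffer.
// Each string would be evaluated in the rendered page's JS context.
const extractors = {
  ace: "ace.edit(document.querySelector('.ace_editor')).getValue()",
  monaco: "monaco.editor.getModels()[0].getValue()",
  codemirror5: "document.querySelector('.CodeMirror').CodeMirror.getValue()",
  textarea: "document.querySelector('textarea').value",
};

// Pick the expression for a detected framework, or null if unknown.
function extractionSnippet(framework) {
  return extractors[framework] ?? null;
}
```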
## How this title was written
My AI agent used vscreen to browse dev.to, navigate to the top posts pages, and execute JavaScript on the rendered DOM to extract every title and reaction count from the last month. It analyzed 50+ top-performing posts, identified that the highest-engagement titles share three properties — under 12 words, challenge a reader assumption, create a curiosity gap — and generated the title you clicked on.
The tool researched its own article on the platform it's being published to. The entire analysis took under 2 minutes.
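Of the three properties, only the word-count one is mechanically checkable. A sketch of that filtering step (the post objects here are illustrative placeholders, not the actual scraped data):

```javascript
// Keep posts whose titles are under 12 words, highest reactions first.
function topShortTitles(posts) {
  return posts
    .filter(p => p.title.trim().split(/\s+/).length < 12)
    .sort((a, b) => b.reactions - a.reactions);
}
```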
That's what a real browser gives an agent.
## Try it
```shell
vscreen --dev --mcp-sse 0.0.0.0:8451
```
Pre-built Linux binaries on the releases page.
## Give your agent a real browser
GitHub: jameswebb68/vscreen
vscreen turns a headless Chromium into a remotely viewable, controllable, and AI-automatable virtual screen. It captures the browser viewport via Chrome DevTools Protocol, encodes H.264/VP9 video + Opus audio, and streams everything over WebRTC. Clients send mouse and keyboard input back through a DataChannel for full bidirectional interaction. 47 MCP tools let AI agents automate the browser programmatically — including the Synthesis Bubble system for AI-driven frontend page construction with one-shot multi-source web scraping.
The README's architecture diagram, in brief: Xvfb + Chromium render the page at 1920×1080; vscreen captures the CDP screencast (JPEG → I420 → H.264/VP9), with PulseAudio feeding the audio path; the browser client shows the remote screen in a `<video>` element.