Jon Retting
Your AI agent can't see half the internet

Your agent's web tool sends an HTTP GET, parses the HTML, and hopes for the best. For Hacker News and Wikipedia, that works fine.

For the other half of the internet — JavaScript SPAs, bot-protected sites, live code editors, chat interfaces — it gets back an empty shell, a 403, or a cookie consent wall. Your agent apologizes and moves on.

I gave an agent a real Chromium browser and ran 16 retrieval tasks head-to-head. Here's what it found.


The scorecard

| Task | HTTP fetch | vscreen | Winner |
|------|------------|---------|--------|
| X/Twitter profile | FAIL (403) | 2.6s PASS | vscreen |
| Spotify album tracks | FAIL (empty SPA) | 2.1s PASS | vscreen |
| Census.gov data table | FAIL (403) | 2.1s PASS | vscreen |
| CodePen live editor | FAIL (403) | 1.7s PASS | vscreen |
| Rust Playground code | 15 chars | 86 chars | vscreen |
| YouTube video metadata | 252 chars | 1,911 chars | vscreen |
| Hacker News front page | 45ms PASS | 1.7s PASS | Tie |
| Wikipedia article | 89ms PASS | 2.1s PASS | Tie |

Across all 16 tasks: vscreen 6, HTTP 0. The 10 ties all fall on server-rendered pages where both methods work. HTTP never won a single task.

Full 16-task scorecard
| Task | HTTP | vscreen | Winner |
|------|------|---------|--------|
| HN front page | 45ms PASS | 1.7s PASS | Tie |
| X/Twitter profile | FAIL | 2.6s PASS | vscreen |
| NYTimes headlines | 76ms PASS | 2.1s PASS | Tie |
| StackOverflow Q&A | 234ms PASS | 1.8s PASS | Tie |
| Spotify tracks | FAIL | 2.1s PASS | vscreen |
| GitHub README | 581ms PASS | 1.7s PASS | Tie |
| Wikipedia | 89ms PASS | 2.1s PASS | Tie |
| LinkedIn | 624ms PASS | 2.4s PASS | Tie |
| Census.gov | FAIL (403) | 2.1s PASS | vscreen |
| YouTube | 252 chars | 1,911 chars | vscreen |
| CodePen | FAIL (403) | 1.7s PASS | vscreen |
| Svelte REPL | SSR fallback | PASS | Tie |
| Go Playground | SSR fallback | PASS | Tie |
| ChatGPT | FAIL (403) | FAIL (auth) | Tie |
| Rust Playground | 15 chars | 86 chars | vscreen |
| TS Playground | SSR fallback | PASS | Tie |


How fast


Text extraction from a loaded page: 4 milliseconds.

Navigate: 793ms. Screenshot: 94ms. Structured extraction: 100ms. Once a page is loaded, pulling all visible text is essentially free.

75 pages/min from one instance. vscreen runs up to 16 parallel browser instances, and throughput scales linearly: 4 instances hit ~300 pages/min. Each instance uses ~200-400 MB of RAM depending on page complexity. The bottleneck is the internet, not vscreen.


I read the Rust Playground's code buffer

This is the part that surprised me. vscreen doesn't just render the page — it can reach into a code editor's internal state:

```javascript
ace.edit(document.querySelector('.ace_editor')).getValue()
// Returns: "fn main() {\n    println!(\"Hello, world!\");\n}"
```
| Editor | Framework | What vscreen extracted |
|--------|-----------|------------------------|
| TypeScript Playground | Monaco Editor | buffer + language ID + file URI |
| Svelte REPL | CodeMirror | source code, compiled JS, and CSS — all 3 panes |
| Rust Playground | Ace | full code + `ace/mode/rust` metadata |
| Go Playground | textarea | `package main` + `fmt.Println("Hello, 世界")` |

HTTP fetch gets the HTML shell. vscreen reads the editor's internal model — the same data the user is editing.
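For the Monaco case, a sketch of what that extraction can look like. This assumes the page exposes a `monaco` global (the TypeScript Playground does) and uses the current Monaco API (`getLanguageId` replaced the older `getModeId`); the helper name is mine:

```javascript
// Hypothetical helper: dump every open Monaco model's buffer and metadata.
// Assumes a page-level `monaco` global, as on the TypeScript Playground.
function dumpMonacoModels(monaco) {
  return monaco.editor.getModels().map((model) => ({
    uri: model.uri.toString(),       // file URI, e.g. "file:///main.ts"
    language: model.getLanguageId(), // language ID, e.g. "typescript"
    code: model.getValue(),          // the full buffer the user is editing
  }));
}
```

Run through the browser's JS evaluation, this returns the editor's live model, not whatever HTML happened to be server-rendered.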


How this title was written

My AI agent used vscreen to browse dev.to, navigate to the top posts pages, and execute JavaScript on the rendered DOM to extract every title and reaction count from the last month. It analyzed 50+ top-performing posts, identified that the highest-engagement titles share three properties — under 12 words, challenge a reader assumption, create a curiosity gap — and generated the title you clicked on.

The tool researched its own article on the platform it's being published to. The entire analysis took under 2 minutes.
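As a rough sketch of the analysis step once the titles and reaction counts are scraped — the 12-word threshold comes from the findings above, but the function name and the engagement cutoff are illustrative, not the agent's actual code:

```javascript
// Hypothetical distillation of the title analysis: given scraped
// { title, reactions } pairs, keep high-engagement posts and measure
// what fraction satisfy the "under 12 words" property.
function shortTitleShare(posts, minReactions = 100) {
  const top = posts.filter((p) => p.reactions >= minReactions);
  const short = top.filter((p) => p.title.split(/\s+/).length < 12);
  return short.length / top.length;
}
```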

That's what a real browser gives an agent.


Try it

```shell
vscreen --dev --mcp-sse 0.0.0.0:8451
```

Pre-built Linux binaries on the releases page.
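Then point your MCP client at the SSE endpoint. A hypothetical client config, assuming vscreen serves MCP at the conventional `/sse` path and your client uses the common `mcpServers` config shape — check the project README for the actual route:

```json
{
  "mcpServers": {
    "vscreen": {
      "url": "http://localhost:8451/sse"
    }
  }
}
```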

Give your agent a real browser

GitHub: jameswebb68 / vscreen

Give AI agents a real browser — streamed live over WebRTC. Captures headless Chromium, encodes H.264/VP9 + Opus audio, 47 MCP automation tools with live advisor, AI-driven page synthesis, multi-instance, bidirectional input. Watch your agents browse the real internet in real-time.

vscreen — Virtual Screen Media Bridge

Give AI agents a real browser. Watch them live. Control everything.

Download the latest release — pre-built binaries for Linux.

vscreen turns a headless Chromium into a remotely viewable, controllable, and AI-automatable virtual screen. It captures the browser viewport via Chrome DevTools Protocol, encodes H.264/VP9 video + Opus audio, and streams everything over WebRTC. Clients send mouse and keyboard input back through a DataChannel for full bidirectional interaction. 47 MCP tools let AI agents automate the browser programmatically — including the Synthesis Bubble system for AI-driven frontend page construction with one-shot multi-source web scraping.

```
 Xvfb + Chromium           vscreen                  Browser Client
 ┌──────────────┐    ┌─────────────────┐     ┌──────────────────────┐
 │ Renders web  │───>│ CDP screencast  │     │                      │
 │ page at      │    │ JPEG → I420     │     │  <video> element     │
 │ 1920×1080    │    │ → H264/VP9      │────>│  shows remote screen │
 │              │    │                 │     │                      │
 │ PulseAudio   │    └─────────────────┘     └──────────────────────┘
 └──────────────┘
```
