vscreen gives AI agents a real Chromium browser, streamed live over WebRTC. The first release had 63 tools. They worked — but agents kept chaining them inefficiently. Three round-trips to click a button and see what happened.
0.2.0 consolidates 63 tools into 47 with a two-layer architecture, adds a live advisor that catches mistakes in real-time, and introduces a synthesis system that builds websites from scraped data.
Two layers, one fast path
| Layer | Tools | Purpose |
|---|---|---|
| Workflow | browse, observe, interact, extract, solve_challenge | Entire workflows in a single call |
| Precision | click, type, find, wait, scroll, etc. | Exact control when needed |
vscreen_browse navigates, waits, dismisses cookie banners, screenshots, and returns page info — one call instead of four. vscreen_interact clicks by visible text, returns a screenshot after. vscreen_extract pulls structured data in six modes: articles, table, kv, stats, links, or auto.
Fast path for 80% of tasks. Drop to precision tools when needed.
The advisor
The MCP server tracks every tool call in a sliding window and returns inline hints when it detects anti-patterns:
| Pattern detected | Hint |
|---|---|
| click → wait → screenshot | vscreen_interact does this in one call |
| scroll → screenshot loop | Use full_page=true instead |
| Repeated fixed waits | Use condition="text" or "selector" |
| JS for built-in operations | Use vscreen_get_page_info instead |
| 5+ calls without Layer 1 | Try vscreen_browse or vscreen_interact |
One-shot, contextual. Plus vscreen_plan("fill out the form") for step-by-step tool recipes and vscreen_help(topic=...) for built-in docs. The server is self-documenting.
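A minimal sketch of how sliding-window pattern detection like this could work. It is not the actual vscreen implementation: the `Advisor` type, its `record` method, the shortened tool names, and the reading of "one-shot" as "each hint fires at most once" are all assumptions made for illustration.

```rust
use std::collections::VecDeque;

/// Anti-patterns as (tool-call sequence, hint). Tool names are shortened
/// for illustration; the real server's identifiers may differ.
const PATTERNS: &[(&[&str], &str)] = &[
    (&["click", "wait", "screenshot"], "vscreen_interact does this in one call"),
    (&["scroll", "screenshot", "scroll", "screenshot"], "Use full_page=true instead"),
];

/// Hypothetical advisor: remembers the most recent tool calls and checks
/// whether the tail of that window matches a known anti-pattern.
struct Advisor {
    window: VecDeque<String>,
    capacity: usize,
    hinted: Vec<bool>, // one flag per pattern so each hint fires only once
}

impl Advisor {
    fn new(capacity: usize) -> Self {
        Advisor {
            window: VecDeque::new(),
            capacity,
            hinted: vec![false; PATTERNS.len()],
        }
    }

    /// Record a tool call and return a hint if a pattern just completed.
    fn record(&mut self, tool: &str) -> Option<&'static str> {
        // Slide the window: drop the oldest call once capacity is reached.
        if self.window.len() >= self.capacity {
            self.window.pop_front();
        }
        self.window.push_back(tool.to_string());

        for (i, (pattern, hint)) in PATTERNS.iter().enumerate() {
            let n = self.window.len();
            if self.hinted[i] || n < pattern.len() {
                continue;
            }
            // Compare the last `pattern.len()` calls against the pattern.
            let tail = self.window.iter().skip(n - pattern.len()).map(String::as_str);
            if tail.eq(pattern.iter().copied()) {
                self.hinted[i] = true;
                return Some(*hint);
            }
        }
        None
    }
}
```

Calling `record("click")`, `record("wait")`, then `record("screenshot")` on a fresh `Advisor::new(8)` returns the first hint on the third call; repeating the same sequence afterwards returns `None`, because that pattern's flag is already set.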
Synthesis Bubble
AI agents build live web pages from scraped data. One call does everything:
```
vscreen_synthesis_scrape_and_create({
  "instance_id": "dev",
  "title": "Tech News Roundup",
  "urls": [
    { "url": "https://arstechnica.com", "limit": 8, "source_label": "Ars" },
    { "url": "https://techcrunch.com", "limit": 8, "source_label": "TC" },
    { "url": "https://theverge.com", "limit": 8, "source_label": "Verge" }
  ]
})
```
Three ephemeral tabs open in parallel. The page builds live via SSE as each source finishes. Component type auto-selected: 1–3 → hero, 4–12 → card grid, 13+ → content list.
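The item-count thresholds above map naturally onto a range match. The sketch below is only an illustration of that rule: the `Component` enum and `select_component` function are hypothetical names, and returning `None` for zero items is an assumption, since the post does not say what happens when a scrape yields nothing.

```rust
/// Hypothetical component kinds, named after the three layouts in the post.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Component {
    Hero,
    CardGrid,
    ContentList,
}

/// Pick a layout from the number of scraped items:
/// 1-3 -> hero, 4-12 -> card grid, 13+ -> content list.
fn select_component(item_count: usize) -> Option<Component> {
    match item_count {
        0 => None, // assumption: nothing to render for an empty scrape
        1..=3 => Some(Component::Hero),
        4..=12 => Some(Component::CardGrid),
        _ => Some(Component::ContentList),
    }
}
```

Under these assumptions, a source with `"limit": 8` that returns all eight items would land in the card grid range.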
31 Svelte 5 components — card grids, sortable tables, bar/line/pie charts (raw SVG), timelines, accordions, code blocks, image galleries, and more. The scraper runs 5 strategies (JSON-LD, <article>, heading+link, card heuristics, OpenGraph) with ad filtering, quality scoring, and timeout budgets.
Key fixes
| Fix | Detail |
|---|---|
| Zombie processes | Aggressive process group kills + lock file cleanup |
| CAPTCHA solver | 2-phase vision: crop header → identify target, then tiles in batches of 3 |
| Orphaned tasks | AbortOnDrop RAII guards on all spawned tokio tasks |
| Grid math | saturating_sub prevents u32 underflow panics |
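To illustrate the grid math fix: plain `u32` subtraction panics on underflow in debug builds (and wraps in release builds unless overflow checks are enabled), while `saturating_sub` clamps at zero. The `tiles_across` function below is a hypothetical example of that kind of calculation, not code from the vscreen repository.

```rust
/// Hypothetical grid helper: how many whole tiles fit across `width`
/// once a margin is removed from both sides.
fn tiles_across(width: u32, margin: u32, tile: u32) -> u32 {
    if tile == 0 {
        return 0; // avoid division by zero
    }
    // `width - 2 * margin` would underflow when the margins exceed the
    // width; the saturating variants clamp instead of panicking.
    let usable = width.saturating_sub(margin.saturating_mul(2));
    usable / tile
}
```

With a 300-pixel width, 10-pixel margins, and 100-pixel tiles, two tiles fit. With a 10-pixel width and 20-pixel margins, the usable width clamps to zero and the result is zero tiles rather than a panic.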
Try it
```shell
./target/release/vscreen --dev --synthesis --mcp-sse 0.0.0.0:8451
```
Pre-built Linux binaries on the releases page. Full changelog in the repo.
GitHub: github.com/jameswebb68/vscreen