Tappi Is the Most Token-Efficient Browser Tool for AI Agents. Nothing Else Comes Close.
That's a dangerous claim. The AI browser automation market is projected to grow from $4.5 billion to $76.8 billion by 2034. Vercel Labs shipped Agent-Browser in January 2026 with a Rust CLI and 14,000+ GitHub stars. Microsoft launched @playwright/cli weeks later. Anthropic has Claude for Chrome with direct DOM access via a Chrome extension. Browser Use has 78,000+ stars. Stagehand has 21,000+.
And I'm saying a 200-line CDP tool with pip install tappi is better than all of them at the one thing that matters most for AI agents: how many tokens it takes to do useful work in a browser.
Let me prove it.
The Real Problem: Browsers Are a Token Furnace
Every AI agent that touches a browser faces the same bottleneck: the agent needs to understand what's on the page before it can act. The method you choose to represent that page to the LLM determines everything — how fast it works, how much it costs, and whether it even succeeds.
There are three approaches in the wild today, and two of them are on fire.
Approach 1: Screenshots (Vision Tax)
Send a full-page screenshot to the LLM. Let it "see" the page.
Tools: Anthropic Computer Use, OpenAI Operator
The problem: A single screenshot costs 5,000–10,000 tokens in vision processing. The model then has to guess pixel coordinates for where to click. It's like asking someone to operate a computer by describing screenshots over the phone. Computer Use benchmarks show success rates in the 50–70% range on real tasks — impressive for a first attempt, but fundamentally limited by the pixel-guessing paradigm.
Cost per page interaction: ~5,000–10,000 tokens
Approach 2: DOM/Accessibility Tree Dumps (Context Tax)
Extract the page's DOM or ARIA accessibility tree and send it to the LLM as structured text.
Tools: Playwright MCP, OpenClaw browser tool, Browser Use, Stagehand
The problem: A single content-rich page produces 15,000–50,000+ tokens of tree data. Reddit with its `<shreddit-comment>` shadow DOM components? 50K+ tokens for one page. The LLM reads an entire novel of nested elements just to find a button. Microsoft's own benchmarks show Playwright MCP consuming ~114,000 tokens for a typical browser task — over four pages, that's your entire context window gone.
Cost per page interaction: ~15,000–50,000+ tokens
Approach 3: Compact Element References (The Breakthrough)
Index the page's interactive elements into a compact list. Give the LLM only what it needs to act.
Tools: Tappi, Agent-Browser (Vercel Labs), @playwright/cli
This is the right idea. But the implementations vary wildly in how compact they actually are, what they can reach, and how they connect to the browser. That's where the real differentiation lives.
Cost per page interaction: ~200–2,000 tokens (varies by tool)
The Competitive Landscape: Everyone Who Matters
Let me give every major player their fair credit before I explain why tappi does it better.
🏢 Vercel's Agent-Browser (Jan 2026)
Agent-Browser is the closest thing to tappi in philosophy. Vercel Labs shipped it in January 2026 with a Rust CLI, a Node.js daemon, and a "Snapshot + Refs" system that uses @e1, @e2 references instead of full DOM trees. It claims 90% token reduction vs Playwright MCP and has earned 14,000+ GitHub stars.
Credit where it's due: Agent-Browser popularized the compact-refs concept in the AI tooling discourse. The Pulumi blog called it "one clever idea". It's a genuine step forward.
🏢 Microsoft's Playwright CLI (Feb 2026)
@playwright/cli is Microsoft's response to the token problem in their own Playwright MCP. Instead of streaming accessibility trees into the LLM's context, it saves YAML snapshots to disk and lets the agent decide what to read. Microsoft's benchmarks: ~27,000 tokens per task vs ~114,000 with MCP — a 4x improvement.
Smart architectural decision. Still 27K tokens, but the right direction.
🏢 Anthropic's Claude for Chrome
Claude for Chrome is a browser extension that gives Claude direct access to the page via read_page (accessibility tree), find (natural language element queries), computer (mouse/keyboard + screenshots), and javascript_tool (arbitrary JS execution). Reverse engineering shows it calls Claude's /v1/messages API in a tool-calling loop with a 40KB+ system prompt.
Impressive integration. But it's locked to Claude's ecosystem — no other LLM can use it.
🌐 The Rest
| Tool | Stars | Approach | Token Cost |
|---|---|---|---|
| Browser Use | 78K+ | Playwright + DOM extraction | High (full tree) |
| Stagehand | 21K+ | TypeScript SDK, act()/extract()/observe() | High (DOM + LLM reasoning per action) |
| Skyvern | 20K+ | Screenshots + DOM hybrid | Very high |
| Browserbase | — | Cloud infrastructure (pairs with Stagehand) | Depends on client |
| Steel | 6.4K+ | Open-source browser API | Depends on client |
All worthy projects. None of them solve the token efficiency problem at the level tappi does.
How Tappi Works — The Core Innovation
Here's what tappi returns when you run tappi elements on a page:
[0] (link) Skip to content
[1] (button) Toggle navigation
[2] (link) Homepage → https://github.com/
[3] (button) Platform
[4] (link) GitHub Copilot - Write better code with AI
[5] (link) GitHub Spark - Build and deploy intelligent apps
[6] (textbox) Search or jump to... :disabled
[7] (button) Sign in
The LLM says click 7. Done. ~200 tokens for a full page.
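For an agent, consuming this flat format is a one-regex job. Here is a minimal, illustrative parser for the listing shown above; the field names and the `find_index` helper are my own, not part of tappi:

```python
import re

# Matches lines like "[7] (button) Sign in" from the sample
# `tappi elements` output above. The optional "→ URL" suffix and
# ":disabled" flag mirror the example; this parser is a sketch,
# not tappi's actual implementation.
LINE = re.compile(
    r"^\[(?P<idx>\d+)\]\s+\((?P<role>\w+)\)\s+(?P<label>.*?)"
    r"(?:\s+→\s+(?P<url>\S+))?(?:\s+:(?P<flag>\w+))?$"
)

def parse_elements(listing: str) -> list[dict]:
    """Turn the flat indexed listing into structured records."""
    out = []
    for line in listing.strip().splitlines():
        m = LINE.match(line.strip())
        if m:
            out.append({k: v for k, v in m.groupdict().items() if v})
    return out

def find_index(elements: list[dict], role: str, label: str) -> int:
    """Pick the index the agent should act on."""
    for el in elements:
        if el["role"] == role and el["label"] == label:
            return int(el["idx"])
    raise LookupError(f"no {role} labelled {label!r}")

sample = """\
[0] (link) Skip to content
[6] (textbox) Search or jump to... :disabled
[7] (button) Sign in"""

els = parse_elements(sample)
print(find_index(els, "button", "Sign in"))  # → 7
```

Because the representation is this regular, the agent's "understanding" step costs almost nothing — the model only reasons over labels and indices.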
Here's what Agent-Browser returns for the same concept (agent-browser snapshot -i):
- navigation "Main":
- link "Homepage" @e1 → /
- button "Platform" @e2
- list:
- link "GitHub Copilot" @e3
- paragraph: "Write better code with AI"
- link "GitHub Spark" @e4
- paragraph: "Build and deploy intelligent apps"
- search:
- searchbox "Search or jump to..." @e5
- link "Sign in" @e6
- main:
- heading "agent-browser" [level=1]
- paragraph: "Headless browser automation CLI..."
...
The LLM says click @e6. Same result — but the snapshot is an accessibility tree, not a flat list. It includes:
- Hierarchical nesting (navigation → list → items)
- Non-interactive elements (paragraphs, headings, sections)
- Structural markup (indentation, YAML formatting)
A real page's Agent-Browser snapshot runs 1,000–3,000+ tokens. Tappi's element list for the same page: 100–300 tokens.
That's not a rounding error. That's a 5–10x difference between the two tools that are both supposedly "compact."
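You can sanity-check the gap with a crude rule of thumb (roughly four characters per token). The two snippets below are abbreviated versions of the outputs above, so the ratio here understates what a full page shows — real snapshots carry far more non-interactive nodes:

```python
def approx_tokens(text: str) -> int:
    """Crude estimate: ~4 characters per token. A rule of thumb only,
    not a real tokenizer."""
    return max(1, len(text) // 4)

# Abbreviated flat element list (five interactive elements).
flat = """\
[2] (link) Homepage → https://github.com/
[3] (button) Platform
[4] (link) GitHub Copilot - Write better code with AI
[5] (link) GitHub Spark - Build and deploy intelligent apps
[7] (button) Sign in
"""

# Abbreviated accessibility-tree snapshot for the same page
# (same five interactive elements, plus structure and prose).
tree = """\
- navigation "Main":
  - link "Homepage" @e1 → /
  - button "Platform" @e2
  - list:
    - link "GitHub Copilot" @e3
    - paragraph: "Write better code with AI"
    - link "GitHub Spark" @e4
    - paragraph: "Build and deploy intelligent apps"
  - link "Sign in" @e6
- main:
  - heading "agent-browser" [level=1]
"""

print(approx_tokens(flat), approx_tokens(tree))
```

Even on this tiny excerpt the tree costs more per interactive element; on a content-heavy page the paragraphs, headings, and nesting dominate the bill.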
Why the Difference Is Structural, Not Cosmetic
The gap isn't about formatting preferences. It's about a fundamental design choice:
| Design Decision | Tappi | Agent-Browser |
|---|---|---|
| What's indexed | Only interactive elements (buttons, links, inputs) | Full accessibility tree (including paragraphs, headings, sections) |
| Structure | Flat numbered list | Hierarchical YAML tree |
| Element format | `[3] (button) Submit Order` | `- button "Submit Order" @e3` + nested children |
| Non-actionable content | Excluded entirely — use `tappi text` separately when needed | Included in every snapshot |
| Tokens per element | ~5–10 | ~15–40 (with hierarchy + children) |
Tappi separates what you can do (elements) from what you can read (text). The LLM gets the action list first. If it needs page content, it calls tappi text — a separate, targeted extraction. Agent-Browser merges both into one snapshot, so the LLM always pays for everything whether it needs it or not.
This separation is the core architectural insight. It's why tappi can represent a 50-element Reddit page in ~300 tokens while Agent-Browser needs ~2,000+ for the same page.
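A toy model makes the separation concrete. The tree below is a mock page, not tappi's internal representation; the point is that the action view and the reading view are built from the same data but shipped to the LLM separately:

```python
# Roles an agent can act on. Illustrative, not tappi's actual set.
INTERACTIVE = {"link", "button", "textbox", "checkbox", "select"}

# A mock page tree: (role, label, children).
page = ("main", "", [
    ("heading", "Checkout", []),
    ("paragraph", "Review your order before paying.", []),
    ("button", "Apply coupon", []),
    ("textbox", "Coupon code", []),
    ("button", "Submit Order", []),
])

def walk(node):
    """Depth-first traversal yielding (role, label) pairs."""
    role, label, children = node
    yield role, label
    for child in children:
        yield from walk(child)

def elements(tree):
    """The 'what you can do' view: flat, indexed, interactive-only."""
    acts = [(r, l) for r, l in walk(tree) if r in INTERACTIVE]
    return [f"[{i}] ({r}) {l}" for i, (r, l) in enumerate(acts)]

def text(tree):
    """The 'what you can read' view: content only, fetched on demand."""
    return "\n".join(l for r, l in walk(tree)
                     if r not in INTERACTIVE and l)

print(elements(page))
```

The agent pays for `text(...)` only on the turns where it actually needs page content; every other turn sees just the short action list.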
The Full Comparison Table
| Dimension | Tappi | Agent-Browser (Vercel) | Playwright CLI (Microsoft) | Claude for Chrome | Playwright MCP | Browser Use |
|---|---|---|---|---|---|---|
| Tokens per page | ~200 | ~1,000–3,000 | ~5,000–27,000 (saved to disk) | Unknown (a11y tree + screenshots) | ~15,000–50,000 | ~15,000–50,000 |
| Protocol | Raw CDP | Playwright (via Node.js daemon) | Playwright | Chrome Extension APIs | Playwright | Playwright |
| Middleware layers | 0 (direct CDP) | 3 (Rust CLI → Node.js daemon → Playwright) | 2 (CLI → Playwright) | 1 (Extension APIs) | 1 (MCP → Playwright) | 1 (Python → Playwright) |
| Shadow DOM | ✅ Pierces automatically | ❌ Not documented | Via Playwright (partial) | Via `javascript_tool` | Via Playwright (partial) | Via Playwright (partial) |
| Real browser sessions | ✅ Your Chrome, your cookies | ❌ Launches own Chromium | ❌ Launches own Chromium | ✅ Your Chrome | Depends on config | ❌ Fresh instances |
| Bot detection risk | None (real browser) | High (headless Chromium) | High (headless Chromium) | None (extension) | High | High |
| Model lock-in | Any LLM | Any LLM | Any coding agent | ❌ Claude only | Any MCP client | Any LLM |
| Surfaces | CLI + Python lib + MCP server + Web UI + AI agent | CLI only | CLI only | Chrome extension only | MCP only | Python framework |
| Install | `pip install tappi` | `npm install -g agent-browser` + Rust + Chromium download | `npm install -g @playwright/cli` | Chrome Web Store | `npx @playwright/mcp` | `pip install browser-use` |
| Cross-origin iframes | ✅ Coordinate commands | Not documented | Via Playwright | Via `computer` tool | Via Playwright | Via Playwright |
The Benchmark: Real Numbers from Real Tasks
Head-to-Head: Tappi vs Agent-Browser (Same Task, Same Browser, Same Model)
I ran both tools on the exact same workflow — same Chrome instance (CDP port 18800), same model (Claude Sonnet 4.6), same two tasks:
- Google Maps: Search "plumbers in Houston TX," extract top 5 businesses, save JSON
- Gmail: Compose and send an email with the results to a real address
Both ran as isolated sub-agents with no human intervention. Here's what happened:
| Metric | Tappi | Agent-Browser |
|---|---|---|
| Total tokens | 28,704 | 58,377 |
| Time to complete | 3 min | 7 min 12s |
| Maps crawl | ✅ | ✅ |
| Gmail send | ✅ (verified body) | ✅ (with extensive workarounds) |
| Screenshots taken | 0 | 7 (~200KB each, vision tokens) |
| JavaScript eval fallbacks | 1 (body recovery) | 15+ (entire compose via eval) |
| Token ratio | 1× | 2.03× |
HEAD-TO-HEAD: SAME TASKS · SAME MODEL · SAME BROWSER
Tappi 🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩░░░░░░░░░░ 28,704 tokens · 3 min
Agent-Browser 🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥 58,377 tokens · 7m 12s
Tappi: half the tokens. 2.4× faster.
The killer finding: Agent-Browser's accessibility tree snapshot cannot see Gmail's compose dialog. The floating overlay is invisible to its snapshot command. So the agent had to fall back to raw JavaScript for the entire email composition — probing DOM structure, dispatching keyboard events character-by-character, reading screenshots through the vision model, debugging two accidentally-opened compose windows.
Tappi? elements → sees [6] (textbox) Message Body → types into it → verifies content landed → clicks Send. Five commands.
Agent-Browser's Maps snapshot was also telling: ~120+ lines with full URLs, ad tracking links, and hierarchical YAML nesting. Tappi's text extraction for the same Maps page: one call, clean text, all 5 businesses extracted immediately.
Prior Benchmarks (3-Task Suite)
In a broader controlled benchmark — same model (Claude Sonnet 4.6), same thinking level, same tasks — here's what happened:
Task: Reddit Data Extraction (5 posts, top comments)
| Tool | Context Tokens | Time | Result |
|---|---|---|---|
| Tappi | 21K | 1m52s | ✅ Correct data, real human comments |
| OpenClaw Browser Tool | 118K | 3m00s | ✅ Correct data (5.6× more tokens) |
| Playwright (scripting) | 14K | 1m02s | ⚠️ Wrong data — bot comments on 4/5 posts |
| playwright-cli | 21K | 2m22s | ❌ Blocked by Reddit bot detection |
Task: Gmail (Authenticated Email)
| Tool | Context Tokens | Time | Result |
|---|---|---|---|
| Tappi | 18K | 1m10s | ✅ Sent email successfully |
| OpenClaw Browser Tool | 68K | 3m13s | ✅ Sent email (3.8× more tokens) |
| Playwright | — | — | ❌ Failed — couldn't authenticate |
| playwright-cli | — | — | ❌ Failed — couldn't authenticate |
Task: GitHub PR Data (Authenticated)
| Tool | Context Tokens | Time | Result |
|---|---|---|---|
| Tappi | 20K | 1m11s | ✅ Extracted PR data |
| OpenClaw Browser Tool | 66K | 2m25s | ✅ Extracted PR data (3.3× more tokens) |
| Playwright | 30K | 2m40s | ✅ Worked (public data) |
| playwright-cli | 31K | 1m14s | ✅ Worked (public data) |
Totals Across All 3 Tasks
3-TASK BENCHMARK: Reddit + Gmail + GitHub
Tappi 🟩🟩🟩🟩🟩░░░░░░░░░░░░░░░░░░░░░ 59K tokens 3/3 ✅
playwright 🟧🟧🟧🟧░░░░░░░░░░░░░░░░░░░░░░░ 44K tokens 1/3 ⚠️
pw-cli 🟧🟧🟧🟧🟧░░░░░░░░░░░░░░░░░░░░░ 52K tokens 1/3 ❌
Browser Tool 🟪🟪🟪🟪🟪🟪🟪🟪🟪🟪🟪🟪🟪🟪🟪🟪🟪🟪🟪🟪 252K tokens 3/3 ✅
Tappi: only tool to go 3/3 with correct data at reasonable token cost.
Playwright scripting was cheaper on tokens but got wrong answers on 4 out of 5 Reddit posts (captured automod bot comments instead of real human comments) and couldn't authenticate anywhere. playwright-cli got CAPTCHA'd by Reddit on the first page. The OpenClaw browser tool succeeded on everything but burned 4.3× more tokens.
Why Raw CDP Matters
Every tool in this comparison except tappi and Claude for Chrome runs Playwright underneath. Playwright is an excellent browser automation framework — for testing. But it adds layers:
Agent → CLI → Playwright → CDP → Browser
vs.
Agent → Tappi → CDP → Browser
Those layers cost you:
Startup overhead. Agent-Browser needs a Rust CLI + Node.js daemon + Playwright launch. Tappi connects to an already-running Chrome via CDP — instant.
Abstraction leakage. Playwright's accessibility tree is designed for testing, not for LLM consumption. It includes structural metadata (roles, states, properties) that testers need but agents don't.
Session isolation. Playwright launches its own Chromium by default. That means no saved sessions, no cookies, no extensions. You're starting from scratch every time — and headless Chromium has a detectable fingerprint that triggers bot detection on Reddit, Gmail, and dozens of other sites.
Shadow DOM handling. Playwright has limited shadow DOM support — it can locate elements in open shadow roots but doesn't automatically traverse them. Tappi evaluates JavaScript directly via CDP's `Runtime.evaluate`, which pierces all shadow boundaries. Reddit's `<shreddit-comment>` components, GitHub's `<include-fragment>` elements, Gmail's deeply nested shadow roots — tappi sees them all.
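For illustration, here is roughly what such a call looks like on the wire. `Runtime.evaluate` and the JSON framing are standard CDP; the element-collecting JavaScript is my own sketch, not tappi's source (note that page JavaScript can only descend into open shadow roots — closed roots stay sealed):

```python
import json

# JavaScript that walks the page and recurses into every open
# shadow root it finds, counting interactive elements. A sketch
# of the shadow-piercing approach, not tappi's actual script.
COLLECT_JS = """
(function collect(root) {
  let count = 0;
  for (const el of root.querySelectorAll('*')) {
    if (el.matches('a,button,input,select,textarea,[role=button]')) count++;
    if (el.shadowRoot) count += collect(el.shadowRoot);
  }
  return count;
})(document)
"""

def cdp_message(msg_id: int, method: str, params: dict) -> str:
    """Frame a raw CDP command exactly as it travels over the
    WebSocket: an id, a method name, and a params object."""
    return json.dumps({"id": msg_id, "method": method, "params": params})

msg = cdp_message(1, "Runtime.evaluate",
                  {"expression": COLLECT_JS, "returnByValue": True})
print(msg[:60])
```

That single JSON frame is the entire "middleware": no driver process, no daemon, just a message to the browser and a response back.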
What About Claude for Chrome?
Claude for Chrome deserves its own section because it's the most interesting comparison.
It uses the Chrome Extension API to inject directly into pages — like tappi, it has access to the real browser with real sessions. Its tool set includes read_page (accessibility tree), find (natural language element queries), computer (mouse/keyboard + screenshots), and javascript_tool (arbitrary JS).
What it does well:
- Real browser sessions (your cookies, your logins)
- `javascript_tool` for arbitrary DOM access
- `find` for natural language element location
- Native Chrome integration, no setup
Where tappi differs:
- Model-agnostic. Claude for Chrome works with Claude only. Tappi works with any LLM — Anthropic, OpenAI, Google, local models, anything.
- Multi-surface. Claude for Chrome is an extension. Tappi is a CLI + Python library + MCP server + Web UI + standalone AI agent.
- Token efficiency. Claude for Chrome's `read_page` returns a full accessibility tree — the same approach that costs 15,000+ tokens per page with Playwright MCP. Its `computer` tool sends screenshots. These are the two most expensive representation methods.
- Programmable automation. Tappi has cron scheduling, file management, PDF generation, spreadsheet support. Claude for Chrome is conversational-first.
Claude for Chrome is a great product for interactive Claude users. Tappi is infrastructure for anyone building AI agents that need to browse.
The Architecture That Makes It Possible
┌─────────────────────────────────────────┐
│ AI Agent / LLM │
│ (Any model: Claude, GPT, Gemini, etc.) │
└────────────────┬────────────────────────┘
│ "click 7"
▼
┌─────────────────────────────────────────┐
│ Tappi │
│ • elements → flat indexed list (~200t) │
│ • text → page content on demand │
│ • click/type → direct CDP commands │
│ • Shadow DOM piercing via JS eval │
└────────────────┬────────────────────────┘
│ CDP WebSocket
▼
┌─────────────────────────────────────────┐
│ Your Chrome Browser │
│ • Saved sessions & cookies │
│ • Extensions │
│ • Real fingerprint (no bot detection) │
└─────────────────────────────────────────┘
No Playwright. No Puppeteer. No daemon. No Rust CLI. No YAML snapshots. No accessibility tree. Just CDP over a WebSocket to your already-running Chrome.
The simplicity is the feature.
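Attaching to an already-running Chrome really is that small. The discovery endpoint (`http://localhost:9222/json/list` when Chrome is started with `--remote-debugging-port=9222`) is standard DevTools behavior; the response below is canned so the parsing is visible without a live browser:

```python
import json
from urllib.parse import urlparse

# A canned example of what Chrome's /json/list discovery endpoint
# returns. With a live browser you would fetch it over HTTP; the
# field names (type, webSocketDebuggerUrl) are standard DevTools.
DISCOVERY_RESPONSE = json.dumps([
    {"type": "background_page", "title": "Some Extension",
     "webSocketDebuggerUrl": "ws://localhost:9222/devtools/page/AAAA"},
    {"type": "page", "title": "reddit.com",
     "url": "https://www.reddit.com/",
     "webSocketDebuggerUrl": "ws://localhost:9222/devtools/page/F86B"},
])

def pick_page_socket(body: str) -> str:
    """Return the CDP WebSocket URL of the first real page target,
    skipping extension background pages and service workers."""
    for target in json.loads(body):
        if target.get("type") == "page":
            return target["webSocketDebuggerUrl"]
    raise LookupError("no page target found")

ws = pick_page_socket(DISCOVERY_RESPONSE)
print(ws)
```

Once you hold that WebSocket URL, every command — navigate, evaluate, click — is one JSON frame away; there is nothing left to launch.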
Addressing the Skeptics
"But Agent-Browser has 14K stars and Vercel behind it."
And it earned them. The snapshot + refs idea is genuinely good. But stars measure awareness, not token efficiency. In a live head-to-head benchmark on the same browser, same model, same tasks — agent-browser used 2× more tokens and took 2.4× longer than tappi. Its accessibility tree couldn't even see Gmail's compose dialog. Stars don't ship emails.
"Playwright CLI saves snapshots to disk. Tokens don't count if they're on disk."
They count the moment the agent reads them — and it has to read them to know what to click. A 5,000-token YAML file on disk is still 5,000 tokens in context when the agent needs it. The savings are real (vs MCP's inline dumps) but the snapshot itself is still an accessibility tree. Tappi's element list is 200 tokens whether it's inline or on disk.
"You're comparing against tools launched the same month."
Yes. All three — tappi, Agent-Browser, Playwright CLI — shipped in January–February 2026. The AI browser automation space converged on the same insight simultaneously: stop dumping full page representations to the LLM. The question is who executed the idea best. I'm arguing tappi did, and the token counts back it up.
"What about scale? Enterprise? Cloud?"
Tappi is a local-first tool. It's not trying to be Browserbase (cloud browser infrastructure) or Stagehand (enterprise SDK). It's trying to be the most efficient way to let an AI agent interact with a browser. If you need cloud-scale browser farms, use Browserbase. If you need an efficient agent-browser interface to put inside that infrastructure, use tappi.
The Claim, Restated
Tappi is the most token-efficient browser control tool for AI agents available today.
- ~200 tokens per page vs ~1,000–3,000 (Agent-Browser) vs ~5,000–27,000 (Playwright CLI) vs ~15,000–50,000 (Playwright MCP / Browser Use)
- Raw CDP — zero middleware between the agent and the browser
- Shadow DOM piercing — automatic, no configuration, works on Reddit/GitHub/Gmail
- Real browser sessions — your Chrome, your cookies, no bot detection
- Model-agnostic — any LLM, any provider
- Multi-surface — CLI, Python library, MCP server, Web UI, standalone AI agent, all from `pip install tappi`
Open source. MIT licensed. github.com/shaihazher/tappi
pip install tappi
tappi launch
tappi open reddit.com
tappi elements # ~200 tokens. That's it.
Previously: Tappi: Your Browser on Autopilot · Every AI Browser Tool Is Broken Except One · Tappi MCP Is Live