I use Claude Code 12+ hours a day.
An AI that lives in the terminal has no eyes. It can't see websites. It can't click buttons. It can't fill forms. It can't even verify what a page looks like after deployment.
An AI without browser access is like a chef who can't taste their own food.
So from February to March 2026, I tested every browser automation tool available for Claude Code. Chrome DevTools MCP, Claude in Chrome extension, WebFetch, agent-browser, PinchTab, browser-use.
Honestly, I had to try all of them before I could reach a conclusion.
This article is a record of that journey and the final verdict.
Why Does an AI Even Need a Browser?
Claude Code runs in the terminal. It can read and write files, execute commands, manage Git — all good.
But it can't see the web.
| Task | Terminal Only | With a Browser |
|---|---|---|
| Post-deploy verification | Read logs and guess | See the actual page |
| Read Twitter/Instagram posts | Impossible | Extract text |
| Test a web app | curl the API only | Click buttons and verify |
| Fill forms (job applications, etc.) | Impossible | Auto-fill |
| Take screenshots | Impossible | Capture and visually confirm |
| Access auth-gated pages | Impossible | Use cookies |
You might think "logs are enough." But when you're using it 12 hours a day, the frustration of having no eyes compounds fast.
Phase 1: Chrome DevTools MCP (February 2026)
The first thing I tried was the official approach.
Launch Chrome with the --remote-debugging-port=9222 flag, then control it from Claude Code through an MCP server. Built by the official Chrome DevTools team.
Perfect in theory. Brutal in practice.
The Pitfalls
| Problem | Details |
|---|---|
Windows npx
|
Doesn't work. Needs a cmd /c wrapper |
| Separate profile | Launches a debug Chrome. No cookies, no logins, re-authenticate everything |
| Context consumption | MCP JSON payloads are massive. 10,000+ tokens per page |
| Text input bug |
fill tool drops the first character
|
| Multi-line text | Completely broken |
| Startup overhead | Close Chrome and relaunch with the debug flag every time |
When you use Claude Code 12 hours a day, relaunching Chrome with special flags every session is torture.
Phase 2: Claude in Chrome Extension (February 2026)
I tried v1.0.54 Beta.
It runs as a Chrome extension, so it uses your actual browser profile. Cookies and login sessions carry over. Setup is just installing the extension.
Good idea. Beta quality.
| Pros | Cons |
|---|---|
| Uses real browser cookies | Disconnects mid-session randomly |
| Zero config | Text input bug still present |
| Intuitive to use | Multi-line text breaks |
It could become something great if stabilized. But as of February 2026, it wasn't production-ready.
Phase 3: WebFetch Hell
"Maybe an existing built-in tool can handle this."
Claude Code has a built-in tool called WebFetch. Give it a URL and it fetches the HTML.
For static documentation pages, it works fine.
For social media, it's hell.
When I tried reading Instagram with WebFetch, I got CSS and JavaScript garbage back. Almost no usable text. Twitter was the same. Any dynamically rendered page was a total loss.
I told Claude "don't use WebFetch for SNS" over and over. Literally, across multiple sessions.
Using WebFetch for Instagram is like trying to grill a steak in a microwave.
Phase 4: agent-browser (Late February 2026)
Built by Vercel Labs. Rust CLI + Playwright backend.
This was the first time I saw light.
npm install -g agent-browser
agent-browser install
Context consumption dropped 93% compared to Chrome DevTools MCP. Instead of heavy MCP JSON, it outputs compact text. Operated via shell commands.
agent-browser open https://example.com
agent-browser snapshot -i
agent-browser click @e1
agent-browser close
Auth Vault for saving credentials. Network mocking. Visual diffs.
Windows Gotcha
It wasn't smooth on Windows though.
Rust's canonicalize() generates \\?\ UNC paths, which crash Node.js. You need to set the AGENT_BROWSER_HOME environment variable as a workaround.
I wrote about this in detail in a previous article.
agent-browser's Limitations
I used it as my main tool for a while. But frustrations remained.
- 3,000-5,000 tokens per page. Light enough, but could be lighter
- Launches Chromium every time. Cookies don't persist across sessions (Auth Vault helps but adds friction)
- Still too heavy for SNS text extraction
Phase 5: PinchTab — The Game Changer (March 2026)
PinchTab changed my standards for browser automation.
It runs as a local HTTP server and uses Chrome's accessibility tree to parse pages.
About 800 tokens per page.
Compare that to agent-browser's 3,000-5,000 — several times lighter. Compared to Chrome DevTools MCP's 10,000+, that's a 12x difference.
Setup
# Headless mode (background daemon)
pinchtab &
# Headed mode (see the browser)
BRIDGE_HEADLESS=false pinchtab &
Launch once and it stays running for the entire session. HTTP server on port 9867. No restarts needed.
Basic Workflow
pinchtab nav https://example.com
sleep 3
pinchtab snap -i -c # Compact view of interactive elements
pinchtab click e5 # Click element e5
pinchtab type e12 "text" # Type into element e12
Why It's Fast
| Method | Tokens | Use Case |
|---|---|---|
pinchtab text |
~800 | Text extraction (SNS, articles) |
pinchtab snap -i -c |
~2,000 | Button/link interaction |
pinchtab snap --diff |
Diff only | Multi-step sequential operations |
pinchtab snap (full) |
~10,500 | Full page understanding |
pinchtab ss (screenshot) |
~2,000 (Vision) | Visual verification |
Reading SNS with pinchtab text costs 800 tokens. This changed everything.
Reading Twitter posts. Checking Instagram profiles. Verifying pages after deployment. All done in 800 tokens.
Context is a battery. A tool that burns 10,000 tokens is a space heater. PinchTab at 800 tokens is an LED bulb. Same battery, 12x the runtime.
The HTTP API (port 9867) also means you can integrate it into bots and automation pipelines.
Phase 6: browser-use — The Missing Piece (March 2026)
PinchTab solved everyday browser tasks.
But there was one thing PinchTab made tedious: complex form filling.
Job application forms. 10+ fields. Dropdowns, radio buttons, textareas. PinchTab can do it, but repeating snap -> type -> click for each field gets laborious.
browser-use is a Python framework. 80,000+ stars on GitHub. MIT license.
Its biggest weapon over PinchTab: autonomous agent mode.
Give it a task and the AI figures out the steps and executes them. Say "fill out this job application with my info" and it finds the fields, selects the right values, and types them in.
How I Actually Used It
I used browser-use to fill out application forms on Greenhouse and Ashby (recruiting platforms). Claude Code orchestrated while browser-use handled field-by-field input. I watched in headed mode and only clicked the final submit button myself.
pip install browser-use
CLI mode:
browser-use open https://example.com
browser-use input "field name" "value"
browser-use state # Check current state
MCP server mode is also available, providing 17 tools.
Tradeoffs
It consumes far more tokens than PinchTab. The autonomous agent calls an LLM at each step. Per-page token count is also higher.
But for complex multi-step tasks, autonomy > efficiency.
Manually running snap -> type for 10 form fields takes 20 minutes. Telling browser-use "fill this out" takes 3 minutes. More tokens consumed, but human time saved.
Full Tool Comparison — The Final Showdown
Here's everything laid out in one table.
| Feature | Chrome DevTools MCP | Claude in Chrome | agent-browser | PinchTab | browser-use |
|---|---|---|---|---|---|
| Tokens/page | 10,000+ | 10,000+ | 3,000-5,000 | ~800 | 10,000+ |
| Setup | Debug flag launch | Extension | npm install |
Launch once | pip install |
| Windows support |
cmd /c hack needed |
OK | UNC path bug | OK | OK |
| Auth/cookies | Separate profile | Real browser | Auth Vault | Real browser | Real browser |
| Stability | Stable | Beta, disconnects | Stable | Stable | Stable |
| Speed | Medium | Medium | Medium | Fast | Slow (LLM calls) |
| SNS reading | Heavy JSON | Heavy JSON | Heavy | 800 tokens | Heavy |
| Form filling | Drops first char | Drops first char | OK | OK | Best (autonomous) |
| Autonomous agent | No | No | No | No | Yes |
| Background daemon | No | No | No | HTTP server | No |
| Cost | Free | Free | Free | Free | Free |
Final Verdict: The Priority Chain
No perfect tool exists. The answer is a combination.
Browser automation priority chain:
1. PinchTab -> Everything daily (reading, scraping, testing, screenshots)
2. browser-use -> Complex multi-step tasks (form filling, autonomous workflows)
3. agent-browser -> When PinchTab isn't available, or you need video recording / Auth Vault
4. WebFetch -> Static docs / API references ONLY. Never use it for SNS.
The Vehicle Analogy
| Tool | Vehicle | Characteristics |
|---|---|---|
| PinchTab | Bicycle | Fast, best fuel efficiency, daily commute |
| browser-use | Car | Goes the distance, carries cargo, burns more fuel |
| agent-browser | Motorcycle | Backup, special purposes |
| WebFetch | Walking | Slow, can't carry much, last resort |
Recommendations
If You Use Claude Code Daily
Install PinchTab first. Highest ROI. Read a page for 800 tokens. Your session lifespan extends dramatically.
pinchtab &
pinchtab nav https://your-app.com
sleep 3
pinchtab text
That alone changes everything.
If You Need Form Automation
Add browser-use. Job applications, data entry, multi-page workflows. Let the autonomous agent handle it while you supervise.
pip install browser-use
If You Need Auth Vault / Video Recording
Use agent-browser. npm install -g agent-browser && agent-browser install and you're set.
For Dynamic Sites
Don't use WebFetch. No matter what. Especially not for SNS.
5 Things I Learned
| # | Lesson |
|---|---|
| 1 | "Browser access for AI" is still an unsolved problem as of March 2026 |
| 2 | No perfect tool exists. The answer is a combination |
| 3 | Token efficiency is everything. 800 vs 10,000 = 12x difference. Session lifespan is completely different |
| 4 | SNS requires dedicated tools. Don't use WebFetch |
| 5 | Form filling is best handled by autonomous agents. AI fills, human supervises |
Conclusion
I tried all 6 tools. It took months.
It started with the ordeal of relaunching Chrome with debug flags for Chrome DevTools MCP, continued through Claude in Chrome's beta disconnections, the CSS garbage wars with WebFetch, seeing the light with agent-browser, finding daily peace with PinchTab, and finally filling the last gap with browser-use.
Honestly, I'm glad I tried them all.
Because this isn't a problem one tool can solve. You commute by bicycle, take the car for long trips, and grab the motorcycle in a pinch. Same idea.
Now that AI can use a browser, the chef can finally taste their own cooking.
The question now is: what will they cook?
This article was written in Tokyo, with PinchTab reading the preview for a final check before publishing.
Questions or feedback welcome on X (@DavidAi311).
Top comments (2)
Allow me to suggest a third architecture: proxy-based session virtualization.
Instead of controlling a local browser instance, the agent operates within a server-managed session that rewrites and streams the target application.
This model changes the state management problem. Session state (cookies, local storage) is managed server-side within the virtualized session. When paired with a custom domain, all cookies become first-party, which bypasses the re-authentication loop you experienced - essentially providing AI agents ability to control live web sessions.
The control mechanism uses a dedicated Automation API. For instance, webfuse exposes an API like browser.webfuseSession.automation.left_click(selector) and browser.webfuseSession.automation.type(text, selector). More importantly, it addresses the token problem at the source. The proxy can pre-process the DOM before serialization using take_dom_snapshot({ modifier: 'AdaptiveD2Snap', params: { maxTokens: 8192 } }), reducing it to a specified token budget before it ever hits the LLM.
Some comments may only be visible to logged-in visitors. Sign in to view all comments.