Most AI browser automation tools today work the same way: the LLM looks at the page, decides what to click, clicks it, looks again, decides again. Every. Single. Step.
This is fundamentally wrong. Here's why, and what we built instead.
The Problem: AI-Per-Step is Broken
Tools like Browser-Use and Stagehand are impressive demos. But in production:
- Slow: Each step needs an LLM call (1-3 seconds). A 10-step workflow takes 10-30 seconds.
- Expensive: Every interaction costs tokens. At $0.01-0.10 per run, 1,000 runs a day is $10-100 a day.
- Unreliable: LLMs hallucinate. Different results each run. No determinism.
- Fragile: The AI might click the wrong button, misread a selector, or get confused by a modal.
The core issue: operating an interface is a solved problem the moment you figure out how. The hard part is understanding the page. That's what AI is good at. The easy part is executing the same steps again. That doesn't need AI at all.
The Insight: Separate Understanding from Execution
What if AI only runs once — to analyze the site and create a deterministic script — and then that script runs forever?
forge_inspect (AI analyzes) → forge_verify (AI tests) → forge_save (AI saves) → run forever (zero AI, zero tokens)
This is Tap: an interface protocol for AI agents.
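To make the forge-once idea concrete, here is a minimal sketch. It is not Tap's implementation: `createForge`, `fakeAnalyze`, and the `demo/echo` key are hypothetical names, and the "AI" is a stand-in function. The point is only that the expensive analysis runs on the first call and every later call is a plain function call.

```javascript
// Hypothetical sketch: these names are illustrative, not Tap's API.
function createForge(analyzeWithAI) {
  const saved = new Map(); // key -> deterministic script
  let aiCalls = 0;
  return {
    run(key, input) {
      if (!saved.has(key)) {
        aiCalls += 1; // the expensive step, paid once per key
        saved.set(key, analyzeWithAI(key));
      }
      return saved.get(key)(input); // no AI involved from here on
    },
    get aiCalls() {
      return aiCalls;
    },
  };
}

// Stand-in for the AI analysis: returns a deterministic function.
const fakeAnalyze = (key) => (input) => `${key}:${input.toUpperCase()}`;

const forge = createForge(fakeAnalyze);
const results = [];
for (let i = 0; i < 1000; i++) {
  results.push(forge.run('demo/echo', 'hi'));
}
console.log(`runs: ${results.length}, AI calls: ${forge.aiCalls}`);
```

A thousand runs, one analysis. Everything after the first call is ordinary, repeatable code.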
The Protocol: 8 + 16
Tap defines a minimal, complete contract for operating any interface:
8 kernel primitives — the irreducible atoms of interaction:
eval · pointer · keyboard · nav · wait · screenshot · tap · capabilities
16 stdlib operations — composed from the kernel:
click · type · hover · scroll · pressKey · select · upload · dialog
fetch · find · cookies · download · waitFor · waitForNetwork · ssrState · storage
A new runtime implements the 8 kernel methods and gets all 16 stdlib operations, plus every existing script, without extra work. Today: Chrome Extension + Playwright. Tomorrow: Android, iOS, desktop apps.
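Here is a rough sketch of how stdlib operations might be composed from the kernel. The method names `eval`, `pointer`, and `keyboard` come from the list above, but their signatures here are assumptions for illustration, and `makeStdlib` and `makeFakeKernel` are hypothetical helpers. The fake kernel records calls instead of driving a browser.

```javascript
// Assumed kernel signatures; only the method names come from the article.
function makeStdlib(kernel) {
  return {
    // click = locate the element with eval, then drive the pointer
    async click(selector) {
      const point = await kernel.eval(
        `(() => {
          const el = document.querySelector(${JSON.stringify(selector)});
          if (!el) return null;
          const r = el.getBoundingClientRect();
          return { x: r.left + r.width / 2, y: r.top + r.height / 2 };
        })()`
      );
      if (!point) throw new Error(`No element matches ${selector}`);
      await kernel.pointer('click', point.x, point.y);
    },
    // type = click to focus, then send text through the keyboard
    async type(selector, text) {
      await this.click(selector);
      await kernel.keyboard('type', text);
    },
  };
}

// Fake runtime: records kernel calls and pretends every element is at (10, 20).
function makeFakeKernel() {
  const calls = [];
  return {
    calls,
    async eval(source) { calls.push(['eval']); return { x: 10, y: 20 }; },
    async pointer(action, x, y) { calls.push(['pointer', action, x, y]); },
    async keyboard(action, text) { calls.push(['keyboard', action, text]); },
  };
}

const kernel = makeFakeKernel();
const page = makeStdlib(kernel);
const done = page.type('#q', 'hello').then(() => kernel.calls);
done.then((calls) => console.log(calls.map((c) => c[0]).join(' > ')));
```

Swap the fake kernel for one backed by a real runtime and `makeStdlib` does not change, which is the appeal of keeping the kernel small.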
What a Tap Looks Like
// API-first: fetch data directly
export default {
  site: "bilibili", name: "hot",
  extract: async () => {
    const res = await fetch('https://api.bilibili.com/x/web-interface/ranking/v2',
      { credentials: 'include' })
    const data = await res.json()
    return data.data.list.map(v => ({
      title: v.title, author: v.owner.name,
      views: String(v.stat.view)
    }))
  }
}
// Action: operate the interface
export default {
  site: "x", name: "post",
  args: { content: { type: "string" } },
  async run(page, args) {
    await page.nav('https://x.com/compose/post')
    await page.type('[data-testid="tweetTextarea_0"]', args.content)
    await page.click('[data-testid="tweetButton"]')
    return [{ status: 'posted' }]
  }
}
No LLM. No tokens. Pure JavaScript. Runs in under 1 second.
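Because a tap is plain JavaScript, you can exercise it without a browser. The sketch below copies the action tap from above and runs it against a page object that only records calls. `checkArgs` and `dryRun` are hypothetical helpers written for this post, not part of Tap, and the assumption that `args` types map directly onto `typeof` is mine.

```javascript
// Tap shape copied from the action example above.
const postTap = {
  site: 'x', name: 'post',
  args: { content: { type: 'string' } },
  async run(page, args) {
    await page.nav('https://x.com/compose/post');
    await page.type('[data-testid="tweetTextarea_0"]', args.content);
    await page.click('[data-testid="tweetButton"]');
    return [{ status: 'posted' }];
  },
};

// Hypothetical helper: compare supplied args with the declared types.
function checkArgs(tap, args) {
  for (const [name, spec] of Object.entries(tap.args || {})) {
    if (typeof args[name] !== spec.type) {
      throw new TypeError(`${tap.site}/${tap.name}: "${name}" must be a ${spec.type}`);
    }
  }
}

// Hypothetical helper: run a tap against a page that records instead of acting.
async function dryRun(tap, args) {
  checkArgs(tap, args);
  const log = [];
  const record = (op) => async (...params) => { log.push([op, ...params]); };
  const page = { nav: record('nav'), type: record('type'), click: record('click') };
  const result = await tap.run(page, args);
  return { log, result };
}

const dry = dryRun(postTap, { content: 'hello' });
dry.then(({ log }) => console.log(log.map(([op]) => op).join(' > ')));
// prints "nav > type > click"
```

The same steps come out in the same order on every run, which is what makes a forged script easy to test and review.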
81 Skills Across 41 Sites
The community has already forged taps for GitHub, Reddit, Hacker News, X/Twitter, YouTube, Bilibili, Zhihu, Xiaohongshu, Weibo, Medium, arXiv, and many more.
curl -fsSL https://raw.githubusercontent.com/LeonTing1010/tap/master/install.sh | sh
tap install # Clone 81 community skills
tap list # See them all
It's an MCP server — works with Claude Code, Cursor, Windsurf, or any MCP-compatible agent:
{
"mcpServers": {
"tap": { "command": "tap", "args": ["mcp"] }
}
}
The Economics
| | AI-per-step | Tap |
|---|---|---|
| Forge cost | — | ~$0.05 (one-time) |
| Run cost | $0.01-0.10/run | $0.00 |
| 1000 runs | $10-100 | $0.05 total |
| Latency | 10-30s | <1s |
| Deterministic | No | Yes |
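The arithmetic behind the table is simple enough to check yourself. This sketch uses only the estimates quoted in this post (a one-time forge at about $0.05, $0.01-0.10 per AI-driven run, 1-3 seconds per LLM call) and works in cents to avoid floating-point noise. The function names are illustrative.

```javascript
// Estimates taken from this post, expressed in cents and seconds.
const FORGE_COST_CENTS = 5;                      // ~$0.05, paid once
const AI_RUN_COST_CENTS = { low: 1, high: 10 };  // $0.01-0.10 per run
const LLM_CALL_SECONDS = { low: 1, high: 3 };    // 1-3 s per step

// Total spend after a given number of runs, for both approaches.
function costAfterRuns(runs) {
  return {
    aiPerStep: {
      low: runs * AI_RUN_COST_CENTS.low,
      high: runs * AI_RUN_COST_CENTS.high,
    },
    tap: FORGE_COST_CENTS, // running a forged script costs no tokens
  };
}

// Number of runs at which AI-per-step spend reaches the one-time forge cost.
function breakEvenRuns(perRunCents) {
  return Math.ceil(FORGE_COST_CENTS / perRunCents);
}

// Time spent waiting on the LLM for a workflow with this many steps.
function aiLatencySeconds(steps) {
  return {
    low: steps * LLM_CALL_SECONDS.low,
    high: steps * LLM_CALL_SECONDS.high,
  };
}

const dollars = (cents) => `$${(cents / 100).toFixed(2)}`;
const cost = costAfterRuns(1000);
console.log(`AI-per-step: ${dollars(cost.aiPerStep.low)} to ${dollars(cost.aiPerStep.high)}`);
console.log(`Tap: ${dollars(cost.tap)}`);
console.log(`Break-even at ${breakEvenRuns(AI_RUN_COST_CENTS.low)} runs or fewer`);
```

Even against the cheapest AI-per-step estimate, the forge cost is matched within a handful of runs. After that, every run is free.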
Try It
- GitHub: github.com/LeonTing1010/tap
- Skills: github.com/LeonTing1010/tap-skills
Forge once, run forever. That's the idea.
Tap is AGPL-3.0 licensed. ~1,800 lines of Deno. Zero dependencies.