My AI agent was spending 30 seconds and 300MB of RAM to search X. Launch Chromium, navigate, wait for render, scrape the DOM. For a GET request.
That felt like driving a truck to pick up a letter.
So I built web2cli - a CLI where every website is a command. Direct HTTP requests, same cookies your browser uses, structured output. No browser.
```shell
pip install web2cli

web2cli hn top --limit 5
web2cli x search --query "AI agents" --limit 3 --format json
web2cli discord send --server myserver --channel general --message "hello"
```
Six adapters ship today: Hacker News, X, Discord, Slack, Stack Overflow, Reddit.
The concept was simple. The implementation... less so.
The first wall: Cloudflare's TLS fingerprinting
Stack Overflow uses Cloudflare. My first attempt was straightforward - httpx with a Chrome User-Agent header:
```python
import httpx

async with httpx.AsyncClient() as client:
    resp = await client.get(url, headers={
        "User-Agent": "Mozilla/5.0 ... Chrome/120.0.0.0"
    })
# 403 Forbidden. Every time.
```
Turns out Cloudflare doesn't just check your User-Agent header. It hashes your TLS ClientHello packet - the very first message in the TLS handshake, sent before any HTTP headers. This hash is called a JA3 fingerprint.
Here's the problem: Python's default TLS stack (OpenSSL) produces a JA3 fingerprint that looks nothing like Chrome's. Chrome uses BoringSSL with a specific set of cipher suites, extensions, and ALPN protocols. OpenSSL uses different ones.
So Cloudflare sees:
- User-Agent says: Chrome 120
- TLS fingerprint says: Python/OpenSSL
- Verdict: Bot. Blocked.
This happens during the TLS handshake - before your HTTP request is even sent. No amount of header manipulation helps.
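To make the mismatch concrete, here is a rough sketch of how a JA3 fingerprint is derived. The real computation parses the raw ClientHello bytes; the numeric values below are illustrative placeholders, not Chrome's actual parameters:

```python
import hashlib

def ja3_hash(tls_version, ciphers, extensions, curves, point_formats):
    # JA3 joins five ClientHello fields: values within a field are joined
    # with "-", fields with ",", and the result is MD5-hashed.
    ja3_string = ",".join([
        str(tls_version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ])
    return hashlib.md5(ja3_string.encode()).hexdigest()

# Illustrative values only -- not a real Chrome fingerprint
fingerprint = ja3_hash(771, [4865, 4866], [0, 23, 65281], [29, 23], [0])
```

Because every field of the ClientHello feeds the hash, even a different *ordering* of cipher suites produces a different fingerprint, which is why OpenSSL can never accidentally look like BoringSSL.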
The fix: curl_cffi - Python bindings for curl-impersonate. It replaces OpenSSL with BoringSSL (Chrome's actual TLS library) and mimics the exact cipher suites, extensions, and HTTP/2 settings.
```python
from curl_cffi.requests import AsyncSession

async with AsyncSession() as s:
    resp = await s.get(url, impersonate="chrome")
# 200 OK. Cloudflare thinks we're Chrome.
```
One line change. JA3 hash now matches real Chrome. Cloudflare lets us through.
The second wall: X.com's cryptographic nonces
X was harder. Their search endpoint requires an `x-client-transaction-id` header - a one-time cryptographic nonce generated by obfuscated JavaScript in the browser.
You can't reuse nonces. You can't fake them. Each one is tied to the specific request method and path.
The community reverse-engineered the algorithm. The flow:
- Fetch the x.com homepage (get the base page)
- Fetch a specific `ondemand.s` JavaScript bundle
- Initialize a transaction generator with both
- Generate a fresh nonce per request
```python
from x_client_transaction import ClientTransaction

ct = ClientTransaction(home_page, ondemand_js)
nonce = ct.generate_transaction_id("GET", "/search")
headers["x-client-transaction-id"] = nonce
```
This nonce rotates - the JS bundle changes periodically, so you need to refresh the generator. But it beats paying $100/mo for official API access.
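The refresh logic itself is straightforward to sketch, even though web2cli's internals aren't shown here. The wrapper below rebuilds the generator once the cached bundle is older than a cutoff; `build_generator` is a stand-in for refetching the homepage and `ondemand.s` bundle and constructing a new `ClientTransaction`:

```python
import time

class NonceFactory:
    """Sketch (not web2cli's actual code): cache a transaction generator
    and rebuild it when the underlying JS bundle is considered stale."""

    def __init__(self, build_generator, max_age=3600):
        self.build_generator = build_generator  # e.g. lambda: ClientTransaction(home, js)
        self.max_age = max_age
        self._gen = None
        self._built_at = 0.0

    def nonce(self, method, path):
        now = time.time()
        if self._gen is None or now - self._built_at > self.max_age:
            self._gen = self.build_generator()  # refetch pages, rebuild generator
            self._built_at = now
        return self._gen.generate_transaction_id(method, path)
```

A time-based cutoff is one option; another is to rebuild reactively when the API starts rejecting nonces.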
Session architecture
Instead of managing browser profiles, web2cli stores sessions as encrypted cookie jars:
```shell
# Opens real Chromium, you log in, cookies are captured
web2cli login x.com --browser

# Or paste cookies manually
web2cli login discord --cookies "token=xxx"

# Or use environment variables
WEB2CLI_X_COOKIES="auth_token=xxx; ct0=yyy" web2cli x search ...
```
Sessions are encrypted with Fernet (AES-128 in CBC mode with HMAC-SHA256 authentication) and stored in ~/.web2cli/sessions/. Your credentials never leave your machine.
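The encryption itself is the standard `cryptography` library's Fernet recipe. A minimal sketch, with key management simplified (in practice the key would be loaded from local storage, not generated per run):

```python
import json
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # assumption: web2cli would load this from a local key file
f = Fernet(key)

cookies = {"auth_token": "xxx", "ct0": "yyy"}
blob = f.encrypt(json.dumps(cookies).encode())  # the sort of blob that lands on disk
restored = json.loads(f.decrypt(blob))
```

Fernet tokens are authenticated, so a tampered session file fails to decrypt rather than yielding corrupted cookies.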
The --browser flag is the smoothest path - Playwright opens Chromium, you log in normally (password manager, 2FA, whatever), and web2cli polls for the required cookies. When they appear, it captures them and closes the browser. You don't need to know which cookies are needed - the adapter spec declares them.
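The polling step reduces to a small loop. Here is a sketch with the browser abstracted behind a `get_cookies` callable (in the real flow that role would be played by something like Playwright's `context.cookies()`); `required` is the cookie set declared by the adapter spec:

```python
import time

def wait_for_cookies(get_cookies, required, timeout=300, interval=1.0):
    """Poll until every required cookie name is present, then return
    just those cookies. Sketch only -- not web2cli's actual code."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        cookies = get_cookies()  # {name: value} snapshot of the browser context
        if required <= cookies.keys():
            return {name: cookies[name] for name in required}
        time.sleep(interval)
    raise TimeoutError(f"still missing cookies: {required - get_cookies().keys()}")
```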
The adapter model
Each website is described by a YAML file that maps CLI commands to HTTP request pipelines. Most sites need zero Python code:
```yaml
commands:
  search:
    description: "Search tweets"
    args:
      - name: query
        required: true
      - name: limit
        default: 20
    pipeline:
      - request:
          method: GET
          url: /search
          params:
            q: "{{args.query}}"
      - parse:
          format: json
          fields:
            - name: author
              from: "$.user.screen_name"
            - name: text
              from: "$.full_text"
```
For complex sites (X.com's GraphQL, SO's Cloudflare protection), you can add Python builder/parser scripts. But the goal is that adding a new site takes ~30 minutes and a YAML file.
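To illustrate the parse step, here is a toy interpreter for a `fields` mapping like the one above, supporting only the dotted `$.a.b` subset of JSONPath (an assumption for brevity; a real implementation would use a full JSONPath library):

```python
def resolve(path, data):
    """Minimal JSONPath subset: '$.a.b' -> data['a']['b'] (illustration only)."""
    for key in path.lstrip("$.").split("."):
        data = data[key]
    return data

def run_parse_step(step, payload):
    # Build one output row from the adapter's declared field mappings.
    return {f["name"]: resolve(f["from"], payload) for f in step["fields"]}

payload = {"user": {"screen_name": "jb41"}, "full_text": "hello"}
step = {"fields": [
    {"name": "author", "from": "$.user.screen_name"},
    {"name": "text", "from": "$.full_text"},
]}
row = run_parse_step(step, payload)  # {'author': 'jb41', 'text': 'hello'}
```

The same declarative idea applies to the `request` step: render `{{args.query}}` into the params, fire the HTTP call, and hand the JSON to the parser.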
Benchmarks
These are measured, not estimated:
| Task | Browser automation | web2cli | Speedup |
|---|---|---|---|
| Read Discord messages | 26s | 0.63s | 41x |
| Send a Slack message | 35s | 0.60s | 58x |
| Search X | 75s | 1.54s | 50x |
| Search Stack Overflow | 41s | 0.65s | 63x |
| Fetch HN submissions | 36s | 1.42s | 25x |
For AI agents, this changes the economics completely:
| Scenario | Browser | web2cli |
|---|---|---|
| Monitor Discord (1 check/min) | $2.88/day | $0.002/day |
| 10k daily actions | ~$50/day | ~$0.01/day |
| Monthly infra | $50+/mo | $4/mo |
What's next
I'm working on web2cli Cloud - managed sessions with proxy rotation for multi-user agents. Your users click a link, log in via a sandboxed browser, your agent gets an opaque session token. No cookies touch your server.
Think "OAuth for websites that don't have OAuth."
The broader question I keep coming back to: how much of "web automation" actually needs a browser? For the 80% of tasks that are just "fetch data behind a login" - probably none.
GitHub: github.com/jb41/web2cli
Install: pip install web2cli
Cloud waitlist: web2cli.com