My AI agent was spending 30 seconds and 300MB of RAM to search X. Launch Chromium, navigate, wait for render, scrape the DOM. For a GET request.
That felt like driving a truck to pick up a letter.
So I built web2cli - a CLI where every website is a command. Direct HTTP requests, same cookies your browser uses, structured output. No browser.
```shell
pip install web2cli

web2cli hn top --limit 5
web2cli x search --query "AI agents" --limit 3 --format json
web2cli discord send --server myserver --channel general --message "hello"
```
Six adapters ship today: Hacker News, X, Discord, Slack, Stack Overflow, Reddit.
The concept was simple. The implementation... less so.
The first wall: Cloudflare's TLS fingerprinting
Stack Overflow uses Cloudflare. My first attempt was straightforward - httpx with a Chrome User-Agent header:
```python
import httpx

async with httpx.AsyncClient() as client:
    resp = await client.get(url, headers={
        "User-Agent": "Mozilla/5.0 ... Chrome/120.0.0.0"
    })
# 403 Forbidden. Every time.
```
Turns out Cloudflare doesn't just check your User-Agent header. It hashes your TLS ClientHello packet - the very first message in the TLS handshake, sent before any HTTP headers. This hash is called a JA3 fingerprint.
Here's the problem: Python's default TLS stack (OpenSSL) produces a JA3 fingerprint that looks nothing like Chrome's. Chrome uses BoringSSL with a specific set of cipher suites, extensions, and ALPN protocols. OpenSSL uses different ones.
So Cloudflare sees:
- User-Agent says: Chrome 120
- TLS fingerprint says: Python/OpenSSL
- Verdict: Bot. Blocked.
This happens during the TLS handshake - before your HTTP request is even sent. No amount of header manipulation helps.
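To make the mismatch concrete, here is a rough sketch of how a JA3 fingerprint is derived. The real computation parses the raw ClientHello bytes; the numeric values below are illustrative placeholders, not Chrome's actual parameters:

```python
import hashlib

def ja3_hash(tls_version, ciphers, extensions, curves, point_formats):
    # JA3 joins five ClientHello fields: values within a field are joined
    # with "-", fields with ",", and the result is MD5-hashed.
    ja3_string = ",".join([
        str(tls_version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ])
    return hashlib.md5(ja3_string.encode()).hexdigest()

# Illustrative values only -- not a real Chrome fingerprint
fingerprint = ja3_hash(771, [4865, 4866], [0, 23, 65281], [29, 23], [0])
```

Because every field of the ClientHello feeds the hash, even a different *ordering* of cipher suites produces a different fingerprint, which is why OpenSSL can never accidentally look like BoringSSL.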
The fix: curl_cffi - Python bindings for curl-impersonate. It replaces OpenSSL with BoringSSL (Chrome's actual TLS library) and mimics the exact cipher suites, extensions, and HTTP/2 settings.
```python
from curl_cffi.requests import AsyncSession

async with AsyncSession() as s:
    resp = await s.get(url, impersonate="chrome")
# 200 OK. Cloudflare thinks we're Chrome.
```
One line change. JA3 hash now matches real Chrome. Cloudflare lets us through.
The second wall: X.com's cryptographic nonces
X was harder. Their search endpoint requires an `x-client-transaction-id` header - a one-time cryptographic nonce generated by obfuscated JavaScript in the browser.
You can't reuse nonces. You can't fake them. Each one is tied to the specific request method and path.
The community reverse-engineered the algorithm. The flow:
- Fetch the x.com homepage (get the base page)
- Fetch a specific `ondemand.s` JavaScript bundle
- Initialize a transaction generator with both
- Generate a fresh nonce per request
```python
from x_client_transaction import ClientTransaction

ct = ClientTransaction(home_page, ondemand_js)
nonce = ct.generate_transaction_id("GET", "/search")
headers["x-client-transaction-id"] = nonce
```
This nonce rotates - the JS bundle changes periodically, so you need to refresh the generator. But it beats paying $100/mo for official API access.
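The refresh logic itself is straightforward to sketch, even though web2cli's internals aren't shown here. The wrapper below rebuilds the generator once the cached bundle is older than a cutoff; `build_generator` is a stand-in for refetching the homepage and `ondemand.s` bundle and constructing a new `ClientTransaction`:

```python
import time

class NonceFactory:
    """Sketch (not web2cli's actual code): cache a transaction generator
    and rebuild it when the underlying JS bundle is considered stale."""

    def __init__(self, build_generator, max_age=3600):
        self.build_generator = build_generator  # e.g. lambda: ClientTransaction(home, js)
        self.max_age = max_age
        self._gen = None
        self._built_at = 0.0

    def nonce(self, method, path):
        now = time.time()
        if self._gen is None or now - self._built_at > self.max_age:
            self._gen = self.build_generator()  # refetch pages, rebuild generator
            self._built_at = now
        return self._gen.generate_transaction_id(method, path)
```

A time-based cutoff is one option; another is to rebuild reactively when the API starts rejecting nonces.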
Session architecture
Instead of managing browser profiles, web2cli stores sessions as encrypted cookie jars:
```shell
# Opens real Chromium, you log in, cookies are captured
web2cli login x.com --browser

# Or paste cookies manually
web2cli login discord --cookies "token=xxx"

# Or use environment variables
WEB2CLI_X_COOKIES="auth_token=xxx; ct0=yyy" web2cli x search ...
```
Sessions are encrypted with Fernet (AES-128 in CBC mode with HMAC-SHA256 authentication) and stored in ~/.web2cli/sessions/. Your credentials never leave your machine.
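The encryption itself is the standard `cryptography` library's Fernet recipe. A minimal sketch, with key management simplified (in practice the key would be loaded from local storage, not generated per run):

```python
import json
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # assumption: web2cli would load this from a local key file
f = Fernet(key)

cookies = {"auth_token": "xxx", "ct0": "yyy"}
blob = f.encrypt(json.dumps(cookies).encode())  # the sort of blob that lands on disk
restored = json.loads(f.decrypt(blob))
```

Fernet tokens are authenticated, so a tampered session file fails to decrypt rather than yielding corrupted cookies.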
The --browser flag is the smoothest path - Playwright opens Chromium, you log in normally (password manager, 2FA, whatever), and web2cli polls for the required cookies. When they appear, it captures them and closes the browser. You don't need to know which cookies are needed - the adapter spec declares them.
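The polling step reduces to a small loop. Here is a sketch with the browser abstracted behind a `get_cookies` callable (in the real flow that role would be played by something like Playwright's `context.cookies()`); `required` is the cookie set declared by the adapter spec:

```python
import time

def wait_for_cookies(get_cookies, required, timeout=300, interval=1.0):
    """Poll until every required cookie name is present, then return
    just those cookies. Sketch only -- not web2cli's actual code."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        cookies = get_cookies()  # {name: value} snapshot of the browser context
        if required <= cookies.keys():
            return {name: cookies[name] for name in required}
        time.sleep(interval)
    raise TimeoutError(f"still missing cookies: {required - get_cookies().keys()}")
```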
The adapter model
Each website is described by a YAML file that maps CLI commands to HTTP request pipelines. Most sites need zero Python code:
```yaml
commands:
  search:
    description: "Search tweets"
    args:
      - name: query
        required: true
      - name: limit
        default: 20
    pipeline:
      - request:
          method: GET
          url: /search
          params:
            q: "{{args.query}}"
      - parse:
          format: json
          fields:
            - name: author
              from: "$.user.screen_name"
            - name: text
              from: "$.full_text"
```
For complex sites (X.com's GraphQL, SO's Cloudflare protection), you can add Python builder/parser scripts. But the goal is that adding a new site takes ~30 minutes and a YAML file.
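To illustrate the parse step, here is a toy interpreter for a `fields` mapping like the one above, supporting only the dotted `$.a.b` subset of JSONPath (an assumption for brevity; a real implementation would use a full JSONPath library):

```python
def resolve(path, data):
    """Minimal JSONPath subset: '$.a.b' -> data['a']['b'] (illustration only)."""
    for key in path.lstrip("$.").split("."):
        data = data[key]
    return data

def run_parse_step(step, payload):
    # Build one output row from the adapter's declared field mappings.
    return {f["name"]: resolve(f["from"], payload) for f in step["fields"]}

payload = {"user": {"screen_name": "jb41"}, "full_text": "hello"}
step = {"fields": [
    {"name": "author", "from": "$.user.screen_name"},
    {"name": "text", "from": "$.full_text"},
]}
row = run_parse_step(step, payload)  # {'author': 'jb41', 'text': 'hello'}
```

The same declarative idea applies to the `request` step: render `{{args.query}}` into the params, fire the HTTP call, and hand the JSON to the parser.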
Benchmarks
These are measured, not estimated:
| Task | Browser automation | web2cli | Speedup |
|---|---|---|---|
| Read Discord messages | 26s | 0.63s | 41x |
| Send a Slack message | 35s | 0.60s | 58x |
| Search X | 75s | 1.54s | 50x |
| Search Stack Overflow | 41s | 0.65s | 63x |
| Fetch HN submissions | 36s | 1.42s | 25x |
For AI agents, this changes the economics completely:
| Scenario | Browser | web2cli |
|---|---|---|
| Monitor Discord (1 check/min) | $2.88/day | $0.002/day |
| 10k daily actions | ~$50/day | ~$0.01/day |
| Monthly infra | $50+/mo | $4/mo |
What's next
I'm working on web2cli Cloud - managed sessions with proxy rotation for multi-user agents. Your users click a link, log in via a sandboxed browser, your agent gets an opaque session token. No cookies touch your server.
Think "OAuth for websites that don't have OAuth."
The broader question I keep coming back to: how much of "web automation" actually needs a browser? For the 80% of tasks that are just "fetch data behind a login" - probably none.
GitHub: github.com/jb41/web2cli
Install: pip install web2cli
Cloud waitlist: web2cli.com