DEV Community

Michael Oblak


I built a CLI that turns any website into a Unix command - here's how I bypassed Cloudflare's TLS fingerprinting

My AI agent was spending 30 seconds and 300MB of RAM to search X. Launch Chromium, navigate, wait for render, scrape the DOM. For a GET request.

That felt like driving a truck to pick up a letter.

So I built web2cli - a CLI where every website is a command. Direct HTTP requests, same cookies your browser uses, structured output. No browser.


pip install web2cli
web2cli hn top --limit 5
web2cli x search --query "AI agents" --limit 3 --format json
web2cli discord send --server myserver --channel general --message "hello"

Six adapters ship today: Hacker News, X, Discord, Slack, Stack Overflow, Reddit.
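
Because each command is an ordinary process that prints structured output, an agent can call it through a plain subprocess. Here is a minimal sketch, assuming `--format json` writes a single JSON document to stdout (the output schema is not specified here). `build_argv` and `run_web2cli` are hypothetical helper names, not part of web2cli:

```python
import json
import subprocess


def build_argv(site, command, **options):
    """Build the argument list for a call such as `web2cli hn top --limit 5`."""
    argv = ["web2cli", site, command]
    for name, value in options.items():
        argv += [f"--{name}", str(value)]
    return argv


def run_web2cli(site, command, **options):
    """Run web2cli and parse stdout as JSON.

    Assumption: with --format json the command prints one JSON document.
    Raises CalledProcessError if the command exits with a non-zero status.
    """
    argv = build_argv(site, command, **options) + ["--format", "json"]
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

With web2cli installed and a session in place, `run_web2cli("x", "search", query="AI agents", limit=3)` would run the same search as the X command shown earlier.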

The concept was simple. The implementation... less so.

The first wall: Cloudflare's TLS fingerprinting

Stack Overflow uses Cloudflare. My first attempt was straightforward - httpx with a Chrome User-Agent header:

import httpx

async with httpx.AsyncClient() as client:
    resp = await client.get(url, headers={
        "User-Agent": "Mozilla/5.0 ... Chrome/120.0.0.0"
    })
# 403 Forbidden. Every time.

Turns out Cloudflare doesn't just check your User-Agent header. It hashes fields of your TLS ClientHello - the very first message in the TLS handshake, sent before any HTTP headers. This hash is called a JA3 fingerprint.

Here's the problem: Python's default TLS stack (OpenSSL) produces a JA3 fingerprint that looks nothing like Chrome's. Chrome uses BoringSSL with a specific set of cipher suites, extensions, and ALPN protocols. OpenSSL uses different ones.

So Cloudflare sees:

User-Agent says:     Chrome 120
TLS fingerprint says: Python/OpenSSL
Verdict:             Bot. Blocked.

The fingerprint is captured during the TLS handshake - before your HTTP request is even sent. By the time your headers arrive, the mismatch is already on record, so no amount of header manipulation helps.
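
To make the fingerprint concrete, here is a minimal sketch of JA3 as publicly documented: five ClientHello fields (TLS version, cipher suites, extension types, elliptic curves, point formats) are written as decimal numbers, joined with dashes inside a field and commas between fields, then hashed with MD5. The two value sets below are invented for illustration, not captured from a real Chrome or Python client, and how Cloudflare actually uses the fingerprint is not public.

```python
import hashlib

# GREASE values (RFC 8701) are random placeholders; JA3 leaves them out.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Join the five fields: commas between fields, dashes within a field."""
    fields = [[version], ciphers, extensions, curves, point_formats]
    return ",".join(
        "-".join(str(v) for v in field if v not in GREASE) for field in fields
    )


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """JA3 fingerprint: MD5 hex digest of the JA3 string."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


# Illustrative values only (771 is TLS 1.2 on the wire, 0x1A1A is a GREASE value).
client_a = (771, [0x1A1A, 4865, 4866, 4867], [0, 23, 65281, 10, 11], [29, 23, 24], [0])
client_b = (771, [4866, 4867, 4865, 49196], [0, 11, 10, 35, 22], [29, 23, 30], [0, 1, 2])

print(ja3_string(*client_a))
print(ja3_hash(*client_a) == ja3_hash(*client_b))  # False: the field lists differ
```

None of these fields is an HTTP header, which is why changing the User-Agent leaves the hash untouched.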

The fix: curl_cffi - Python bindings for curl-impersonate. It replaces OpenSSL with BoringSSL (Chrome's actual TLS library) and mimics the exact cipher suites, extensions, and HTTP/2 settings.

from curl_cffi.requests import AsyncSession

async with AsyncSession() as s:
    resp = await s.get(url, impersonate="chrome")
    # 200 OK. Cloudflare thinks we're Chrome.

A one-line change. The JA3 hash now matches real Chrome, and Cloudflare lets us through.

The second wall: X.com's cryptographic nonces

X was harder. Their search endpoint requires an x-client-transaction-id header - a one-time cryptographic nonce that's generated by obfuscated JavaScript in the browser.

You can't reuse nonces. You can't fake them. Each one is tied to the specific request method and path.

The community reverse-engineered the algorithm. The flow:

  1. Fetch x.com homepage (get the base page)
  2. Fetch a specific ondemand.s JavaScript bundle
  3. Initialize a transaction generator with both
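  4. Generate a fresh nonce per request

from x_client_transaction import ClientTransaction

# home_page and ondemand_js are the results fetched in steps 1 and 2
ct = ClientTransaction(home_page, ondemand_js)
# Each nonce is bound to the HTTP method and path it is generated for
nonce = ct.generate_transaction_id("GET", "/search")
headers["x-client-transaction-id"] = nonce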

The inputs rotate too - the JS bundle changes periodically, so you need to refresh the generator. But it beats paying $100/mo for official API access.
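
One way to handle that refresh is to rebuild the generator after a fixed age, or as soon as a request is rejected. The sketch below is an assumption about how that could look, not web2cli's actual code: `RefreshingGenerator`, the `build` callable and the 15-minute default are all hypothetical, and the real rotation interval is unknown.

```python
import time


class RefreshingGenerator:
    """Cache a nonce generator and rebuild it once it is older than max_age seconds."""

    def __init__(self, build, max_age=900.0, clock=time.monotonic):
        self._build = build      # hypothetical callable: refetches the pages, returns a generator
        self._max_age = max_age  # assumed refresh interval in seconds
        self._clock = clock      # injectable so the logic can be tested without sleeping
        self._generator = None
        self._built_at = 0.0

    def get(self):
        """Return the cached generator, rebuilding it if missing or stale."""
        now = self._clock()
        if self._generator is None or now - self._built_at > self._max_age:
            self._generator = self._build()
            self._built_at = now
        return self._generator

    def invalidate(self):
        """Force a rebuild on the next get(), e.g. after a request is rejected."""
        self._generator = None
```

Here `build` would wrap steps 1 to 3 of the flow, and each request would call `get().generate_transaction_id(method, path)` for its nonce.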

Session architecture

Instead of managing browser profiles, web2cli stores sessions as encrypted cookie jars:

# Opens real Chromium, you log in, cookies are captured
web2cli login x.com --browser

# Or paste cookies manually
web2cli login discord --cookies "token=xxx"

# Or use environment variables
WEB2CLI_X_COOKIES="auth_token=xxx; ct0=yyy" web2cli x search ...

Sessions are encrypted with Fernet (AES-128-CBC with HMAC-SHA256 authentication) and stored in ~/.web2cli/sessions/. Your credentials never leave your machine.

The --browser flag is the smoothest path - Playwright opens Chromium, you log in normally (password manager, 2FA, whatever), and web2cli polls for the required cookies. When they appear, it captures them and closes the browser. You don't need to know which cookies are needed - the adapter spec declares them.
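
The "wait until the required cookies appear" check can be sketched in a few lines. This is an illustration under assumptions, not web2cli's implementation: the helper names are hypothetical, and the required list simply reuses the two cookie names from the environment-variable example above.

```python
def parse_cookie_header(header):
    """Parse a "name=value; name2=value2" string into a dict."""
    cookies = {}
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


def missing_cookies(required, cookies):
    """Return the required cookie names that are absent or empty, in declared order."""
    return [name for name in required if not cookies.get(name)]


# Illustrative list; in web2cli the adapter spec declares the real one.
REQUIRED = ["auth_token", "ct0"]

cookies = parse_cookie_header("auth_token=xxx; ct0=yyy")
print(missing_cookies(REQUIRED, cookies))                # []
print(missing_cookies(REQUIRED, {"auth_token": "xxx"}))  # ['ct0']
```

A login loop would keep polling the browser's cookies until `missing_cookies` returns an empty list, then save the jar and close the browser.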

The adapter model

Each website is described by a YAML file that maps CLI commands to HTTP request pipelines. Most sites need zero Python code:

commands:
  search:
    description: "Search tweets"
    args:
      - name: query
        required: true
      - name: limit
        default: 20
    pipeline:
      - request:
          method: GET
          url: /search
          params:
            q: "{{args.query}}"
      - parse:
          format: json
          fields:
            - name: author
              from: "$.user.screen_name"
            - name: text
              from: "$.full_text"

For complex sites (X.com's GraphQL, SO's Cloudflare protection), you can add Python builder/parser scripts. But the goal is that adding a new site takes ~30 minutes and a YAML file.
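
To show what interpreting such a spec involves, here is a rough sketch of the two core steps: filling `{{args.*}}` placeholders and pulling fields out of a JSON record. It is an assumption about the mechanics, not web2cli's engine. The spec is written as a Python dict to stay dependency-free, `extract` handles only plain `$.a.b` paths, and the response record is made up.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def lookup(data, dotted):
    """Follow a dotted path such as "args.query" through nested dicts."""
    for key in dotted.split("."):
        data = data[key]
    return data


def render(template, context):
    """Replace each {{path}} placeholder with its value from context."""
    return PLACEHOLDER.sub(lambda m: str(lookup(context, m.group(1))), template)


def extract(record, fields):
    """Apply simple "$.a.b" paths (a small subset of JSONPath) to one record."""
    return {f["name"]: lookup(record, f["from"].removeprefix("$.")) for f in fields}


# The request and parse steps from the YAML above, as they might look once loaded.
request = {"method": "GET", "url": "/search", "params": {"q": "{{args.query}}"}}
fields = [
    {"name": "author", "from": "$.user.screen_name"},
    {"name": "text", "from": "$.full_text"},
]

context = {"args": {"query": "AI agents", "limit": 20}}
params = {key: render(value, context) for key, value in request["params"].items()}

# Made-up response record, shaped to match the field paths.
record = {"user": {"screen_name": "example"}, "full_text": "hello"}
row = extract(record, fields)

print(params)  # {'q': 'AI agents'}
print(row)     # {'author': 'example', 'text': 'hello'}
```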

Benchmarks

These are measured, not estimated:

| Task | Browser automation | web2cli | Speedup |
| --- | --- | --- | --- |
| Read Discord messages | 26s | 0.63s | 41x |
| Send a Slack message | 35s | 0.60s | 58x |
| Search X | 75s | 1.54s | 50x |
| Search Stack Overflow | 41s | 0.65s | 63x |
| Fetch HN submissions | 36s | 1.42s | 25x |
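
The speedup column is browser time divided by web2cli time. A quick check using the timings exactly as listed:

```python
# (browser seconds, web2cli seconds), copied from the benchmark table.
timings = {
    "Read Discord messages": (26, 0.63),
    "Send a Slack message": (35, 0.60),
    "Search X": (75, 1.54),
    "Search Stack Overflow": (41, 0.65),
    "Fetch HN submissions": (36, 1.42),
}

speedups = {task: round(browser / cli, 1) for task, (browser, cli) in timings.items()}

for task, factor in speedups.items():
    print(f"{task}: {factor}x")
```

From these rounded timings, Search X works out to about 48.7x, slightly under the 50x listed. The published figure may have been computed from unrounded measurements. The other rows match the table to the nearest whole number.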

For AI agents, this changes the economics completely:

| Scenario | Browser | web2cli |
| --- | --- | --- |
| Monitor Discord (1 check/min) | $2.88/day | $0.002/day |
| 10k daily actions | ~$50/day | ~$0.01/day |
| Monthly infra | $50+/mo | $4/mo |

What's next

I'm working on web2cli Cloud - managed sessions with proxy rotation for multi-user agents. Your users click a link, log in via a sandboxed browser, your agent gets an opaque session token. No cookies touch your server.

Think "OAuth for websites that don't have OAuth."

The broader question I keep coming back to: how much of "web automation" actually needs a browser? For the 80% of tasks that are just "fetch data behind a login" - probably none.

GitHub: github.com/jb41/web2cli
Install: pip install web2cli
Cloud waitlist: web2cli.com
