How to know if you actually need mobile proxies (without buying any)

#webscraping #opensource #node #typescript

Every scraping project I start, the same question comes up: do I actually need mobile
proxies for this target, or will residential or datacenter do?

Picking the wrong tier is one of the most expensive mistakes on a scraping project. Too
cheap and your requests get blocked, so you pay for traffic that achieves nothing. Too
expensive and your margins evaporate; mobile carrier IPs run roughly 5–10× the per-GB
rate of datacenter ones. And the answer changes per target: a sitemap crawl on a
documentation site doesn't need carrier-grade trust; the same scraper pointed at Nike's
product pages will typically be rejected from a datacenter IP within a hundred requests.

I got tired of doing this analysis manually — running curl -i against the target,
grepping for the familiar markers, mentally mapping them to vendors — so I packaged the
heuristic into a CLI.

  npx anti-bot-sniffer https://www.nike.com

    https://www.nike.com
    status 200 · 7 cookies set

    Detected
      ● Akamai Bot Manager
          via ak_bmsc cookie
          Enterprise-grade. Behavior + IP scoring; carrier ASN avoids
          most challenges.

    Recommended proxy tier
      ▶ MOBILE CARRIER

The tool is open-source (MIT) at github.com/atheris-ee/anti-bot-sniffer. Zero runtime dependencies, Node 18+. The rest of this
post is a quick tour of what it does and the reasoning behind the recommendations,
since picking the right tier matters whether you use this tool or not.

## What the tool actually checks

The tool sends a single GET request with a normal browser-ish User-Agent, follows up to
5 redirects, reads the first 64KB of the response body, then matches what it sees
against a signature catalog. It looks in three places:

- Response headers: cf-ray, server, x-dd-b, x-kpsdk-cd, and so on. CDN and WAF vendors leak identity here even when they don't mean to.
- Set-Cookie names: __cf_bm, _abck, _px3, incap_ses_*. Cookies set on the first response are the cleanest signal of what's running, because they're set before the page renders.
- HTML markers: js.datadome.co, challenges.cloudflare.com/turnstile, captcha.px-cdn.net. Vendor scripts embedded in the initial HTML.

No JavaScript execution. The tool runs in milliseconds and doesn't spin up a browser.
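
To make the three lookups concrete, here is a minimal sketch of that kind of signature
matching. It is not the tool's actual source: the Signature and ProbeResponse shapes, the
detect function, and the tiny catalog are hypothetical names for illustration, built only
from markers mentioned in this post. It also assumes header matching by name and cookie
matching by name prefix.

```typescript
// Hypothetical shapes for illustration; the real catalog may be structured differently.
type Signal = "header" | "cookie" | "html";

interface Signature {
  vendor: string;
  signal: Signal;
  // Lowercase header name, cookie-name prefix, or lowercase HTML substring.
  needle: string;
}

interface ProbeResponse {
  headers: Record<string, string>; // response headers, names in any case
  cookieNames: string[];           // cookie names taken from Set-Cookie
  html: string;                    // first chunk of the response body
}

// A deliberately small catalog using markers named in this post.
const CATALOG: Signature[] = [
  { vendor: "Cloudflare", signal: "header", needle: "cf-ray" },
  { vendor: "Cloudflare Bot Management", signal: "cookie", needle: "__cf_bm" },
  { vendor: "Akamai Bot Manager", signal: "cookie", needle: "ak_bmsc" },
  { vendor: "Akamai Bot Manager", signal: "cookie", needle: "_abck" },
  { vendor: "PerimeterX/HUMAN", signal: "cookie", needle: "_px3" },
  { vendor: "Imperva/Incapsula", signal: "cookie", needle: "incap_ses_" },
  { vendor: "DataDome", signal: "html", needle: "js.datadome.co" },
  { vendor: "PerimeterX/HUMAN", signal: "html", needle: "captcha.px-cdn.net" },
];

// Returns the distinct vendors whose markers appear, in catalog order.
function detect(res: ProbeResponse, catalog: Signature[] = CATALOG): string[] {
  const headerNames = Object.keys(res.headers).map((name) => name.toLowerCase());
  const html = res.html.toLowerCase();
  const hits = new Set<string>();

  for (const sig of catalog) {
    let matched = false;
    if (sig.signal === "header") {
      matched = headerNames.includes(sig.needle);
    } else if (sig.signal === "cookie") {
      // Prefix match so wildcard-style names like incap_ses_* are covered.
      matched = res.cookieNames.some((name) => name.startsWith(sig.needle));
    } else {
      matched = html.includes(sig.needle);
    }
    if (matched) hits.add(sig.vendor);
  }
  return Array.from(hits);
}
```

Keeping the catalog as plain data means adding a vendor is a one-line change, which is
roughly why a curl -i dump is enough to write a new signature.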

## What it can — and can't — see

Catches the outer wall:

- CDN / WAF identity (Cloudflare, Akamai, Imperva, AWS WAF, Sucuri…)
- Bot management add-ons (Cloudflare BM, DataDome, PerimeterX/HUMAN, Kasada, Akamai Bot Manager, F5/Shape)
- Challenge widgets (reCAPTCHA, hCaptcha, Turnstile)

Doesn't catch:

- Client-side JS fingerprinting (canvas, WebGL, AudioContext, behavior heuristics)
- Anti-bot vendors that defer detection until specific user actions
- Custom in-house systems with no public markers

So if anti-bot-sniffer says "nothing detected," that doesn't guarantee the target is
friendly to bots. It only means that no vendor in the signature catalog showed up in the
headers, cookies, or initial HTML of the first response. That's enough information to
start with datacenter and escalate if you see challenges, which is the right calibration
for most workflows anyway.
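
The "start cheap and escalate" approach can be sketched as a small decision function. This
is an illustration, not part of the tool: the Outcome shape, the choice to treat 403/429
or a challenge page as "blocked", and the 20% threshold are all assumptions you would tune
for your own workload.

```typescript
type Tier = "datacenter" | "residential" | "mobile";

// Cheapest first; escalation moves one step to the right.
const LADDER: readonly Tier[] = ["datacenter", "residential", "mobile"];

// Hypothetical per-request record kept by your scraper.
interface Outcome {
  status: number;      // HTTP status code of the response
  challenged: boolean; // true if the body looked like a challenge page
}

// Assumption: 403, 429, or a challenge page all count as a block.
function isBlocked(outcome: Outcome): boolean {
  return outcome.status === 403 || outcome.status === 429 || outcome.challenged;
}

// Stay on the current tier unless the recent block rate exceeds maxBlockRate.
function nextTier(current: Tier, recent: Outcome[], maxBlockRate = 0.2): Tier {
  if (recent.length === 0) return current;

  const blocked = recent.filter(isBlocked).length;
  if (blocked / recent.length <= maxBlockRate) return current;

  // Move up one step, staying on mobile if already at the top.
  const index = LADDER.indexOf(current);
  return LADDER[Math.min(index + 1, LADDER.length - 1)] ?? current;
}
```

Escalating one step at a time keeps you on the cheapest tier that actually works, rather
than jumping straight to mobile at the first 403.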

## How the recommendations map to proxy tiers

Three tiers, in order of strictness:

mobile — only real mobile carrier IPs reliably pass. Triggered by: Cloudflare Bot
Management, DataDome, PerimeterX/HUMAN, Akamai Bot Manager, Kasada, F5/Shape. The reason
mobile is the answer here isn't magic — it's CGNAT. Mobile carriers share each
public IP among hundreds or thousands of subscribers, so IP-level reputation scoring is
unreliable. Blocking one mobile IP would block hundreds of real customers, so anti-bot
platforms treat carrier ASNs leniently by default.

residential — residential ISP pool usually works, sometimes mobile is needed.
Triggered by: AWS WAF, Imperva/Incapsula, base Cloudflare CDN without Bot Management.
Residential IPs blend with real home traffic at the ISP-ASN layer. Cheaper than mobile,
but the well-known pool ASNs (the big-three residential providers' ranges) are
increasingly being flagged by anti-bot platforms that watch for concurrent-automation
patterns.

datacenter — datacenter usually fine. Triggered by: Sucuri, Wordfence, or no
detected anti-bot. These are mostly application-rule WAFs that don't score IP class
aggressively. A datacenter proxy at sane request rates passes most of these without
challenges.
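
The mapping above fits in a lookup table. The sketch below is my illustration of it, not
the tool's code: VENDOR_TIER and recommendTier are hypothetical names, and two behaviors
are assumptions the post doesn't spell out, namely that the strictest tier wins when
several vendors are detected and that unknown vendors fall back to datacenter.

```typescript
type Tier = "datacenter" | "residential" | "mobile";

// Higher number = stricter (and more expensive) tier.
const STRICTNESS: Record<Tier, number> = {
  datacenter: 0,
  residential: 1,
  mobile: 2,
};

// Vendor-to-tier triggers as listed in this section.
const VENDOR_TIER: Record<string, Tier> = {
  "Cloudflare Bot Management": "mobile",
  "DataDome": "mobile",
  "PerimeterX/HUMAN": "mobile",
  "Akamai Bot Manager": "mobile",
  "Kasada": "mobile",
  "F5/Shape": "mobile",
  "AWS WAF": "residential",
  "Imperva/Incapsula": "residential",
  "Cloudflare (base CDN tier)": "residential",
  "Sucuri": "datacenter",
  "Wordfence": "datacenter",
};

// Assumption: with multiple detections, recommend the strictest tier any of them needs.
function recommendTier(vendors: string[]): Tier {
  let best: Tier = "datacenter"; // nothing detected => datacenter
  for (const vendor of vendors) {
    // Assumption: vendors missing from the table don't raise the tier.
    const tier = VENDOR_TIER[vendor] ?? "datacenter";
    if (STRICTNESS[tier] > STRICTNESS[best]) best = tier;
  }
  return best;
}
```

For example, a site fronted by base Cloudflare that also runs Akamai Bot Manager would
resolve to mobile under this rule, because the bot manager is the stricter of the two.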

I wrote a longer breakdown of when each tier is actually the right answer, including
the cases where datacenter is correct despite being the cheapest, in "Mobile vs
residential vs datacenter proxies: how to choose".

## Three sample probes

To make the output concrete, here's what three well-known targets return:

example.com — base Cloudflare CDN, no Bot Management:

  Detected
    ◐ Cloudflare (base CDN tier)
        via server: cloudflare

  Recommended proxy tier
    ▶ RESIDENTIAL

www.cloudflare.com — running their own Bot Management:

  Detected
    ● Cloudflare Bot Management
        via __cf_bm cookie

  Recommended proxy tier
    ▶ MOBILE CARRIER

example.org — no anti-bot detected:

  ◯ No anti-bot stack detected from HTTP signals.

  Recommended proxy tier
    ▶ DATACENTER (OK)

The --json flag emits a stable structured shape, so you can pipe it into
target-tracking spreadsheets, CI, or whatever:

  $ npx anti-bot-sniffer nike.com --json | jq '.recommendedTier'
  "mobile"

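If you'd rather consume the JSON from a script than from jq, a sketch like the following
could work. The only field it relies on is recommendedTier, as shown above; the values
"datacenter" and "residential" are assumed from the tier names in this post, and
parseRecommendedTier and withinBudget are hypothetical helpers, not part of the tool.

```typescript
type Tier = "datacenter" | "residential" | "mobile";

// Ordered cheapest to most expensive, matching the three tiers in this post.
const TIERS: readonly Tier[] = ["datacenter", "residential", "mobile"];

// Pulls recommendedTier out of the CLI's JSON output and validates it.
function parseRecommendedTier(rawJson: string): Tier {
  const parsed: unknown = JSON.parse(rawJson);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("expected a JSON object");
  }
  const value = (parsed as Record<string, unknown>)["recommendedTier"];
  const tier = TIERS.find((candidate) => candidate === value);
  if (tier === undefined) {
    throw new Error(`unexpected recommendedTier: ${String(value)}`);
  }
  return tier;
}

// True if the recommended tier is no more expensive than what the project allows.
function withinBudget(tier: Tier, maxTier: Tier): boolean {
  return TIERS.indexOf(tier) <= TIERS.indexOf(maxTier);
}
```

A CI job could feed the CLI's stdout into parseRecommendedTier and fail the build when
withinBudget returns false, which flags targets that have moved to a stricter stack.
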
## The honest gaps

The signature catalog covers the major vendors but isn't exhaustive. Vendors I'd like to
cover in future versions that didn't land in v0.1: GeeTest, Friendly Captcha, Bot Master
Lab, Reblaze, Radware. If you hit a target that should match a particular vendor and
doesn't, drop the curl -iL output in an issue and I'll add the detection.

I'd also welcome contributions on the recommendation logic itself. The tier mapping
reflects 2025 industry consensus, but the right tier still varies per target. A site
running Cloudflare base CDN often passes from datacenter at low request rates and trips
at high ones. The tool can't tell you the request-rate boundary, only that the platform
might enforce one. PRs that surface that nuance are welcome.

## Where this came from

Disclosure: I run Atheris, a small mobile and residential
proxy reseller in Estonia. This tool is independent, MIT-licensed, and works regardless
of where you buy proxies. The recommendation logic deliberately tells you to use
datacenter when datacenter is enough — we'd rather earn the customers whose workloads
actually need mobile than upsell the ones whose workloads don't.

I wrote it because every prospect's first question was the same one this tool answers,
and forcing them to sign up for a paid plan just to find out whether mobile proxies were
the right tool felt like the wrong friction to put first. Releasing it as OSS solves
the friction problem permanently: people learn the answer, decide for themselves, and
the ones who do need mobile can find us if they want.

If you find it useful, a star on the repo would help others find it too. PRs and issues
welcome.

Further reading: Mobile vs residential vs datacenter proxies.