DEV Community

Dialphone Limited
Dialphone Limited

Posted on

I scraped 13 VoIP pricing pages with Playwright — here's what the anti-bot tells you about the vendor

Quick context: I've been building a free VoIP cost calculator for 2026, and the first hard problem was "can we actually trust the prices we're quoting?" Turns out that question is a lot more interesting than it sounds.

The premise

If you're building a B2B comparison tool, the first integrity question is — where does the data come from. I decided to automate a weekly pricing check across 13 business VoIP vendors (RingCentral, 8x8, Dialpad, Nextiva, Vonage, GoTo Connect, Zoom Phone, Ooma, Grasshopper, OpenPhone/Quo, Microsoft Teams Phone, Google Voice, DialPhone) using Playwright with the playwright-stealth plugin.

I expected maybe one or two would block the scraper. Instead, 8 out of 13 blocked it, 500ed it, or rendered the prices in a way that was effectively unscrapeable. The remaining 5 gave up their prices freely.

The split turned out to be surprisingly predictable once I squinted at it.

The setup

Standard Playwright stealth setup:

from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    ctx = browser.new_context(
        user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ...",
        viewport={"width": 1366, "height": 820},
        locale="en-US",
    )
    page = ctx.new_page()
    stealth_sync(page)
    page.goto(url, wait_until="domcontentloaded", timeout=25000)
    page.wait_for_timeout(6000)  # let JS render
    text = page.inner_text("body")
Enter fullscreen mode Exit fullscreen mode

Then a couple of regex passes for $ followed by a plausible price:

PRICE_RE = re.compile(r"\$\s?(\d{1,3}(?:\.\d{2})?)\s*(?:/|per|\s+per)", re.I)
PRICE_RE_BROAD = re.compile(r"\$\s?(\d{1,3}(?:\.\d{2})?)")
Enter fullscreen mode Exit fullscreen mode

I recorded: HTTP status, text length, prices found, and whether any of the provider's actual advertised price matched what I scraped (match ratio).

What actually happened

Provider Status Notes
RingCentral verified Scrape matched published prices. Public, scraper-friendly, values own pricing page as a landing surface.
Google Voice verified Workspace pricing is exposed as straight HTML — no tricks.
Nextiva verified Pricing revealed correctly; I'd had the wrong numbers from a 2024 cache in my own code. (More on that below.)
OpenPhone / Quo verified Clean HTML, matched.
DialPhone verified Our own pricing — checked for completeness anyway. Worth running against yourself.
8x8 rendered_but_prices_differ Page rendered, but published prices are behind a gated CTA. My scrape got other dollar amounts that weren't plan prices.
Dialpad rendered_but_prices_differ Same pattern — plan prices gated behind "Talk to sales".
Vonage no_prices_found Rendered page had long marketing copy but zero plan prices visible server-side.
GoTo Connect blocked_or_empty Something in the stack 403'd the session after 4–5 seconds.
Zoom Phone rendered_but_prices_differ Pricing shown only after region selector, which I didn't automate.
Ooma rendered_but_prices_differ Mixed SMB and residential prices on the same page — hard to disambiguate cleanly in regex.
Grasshopper no_prices_found Page rendered, but plans are revealed only on a post-CTA flow.
Microsoft Teams Phone no_prices_found Microsoft's marketing page shows no dollar amount — prices are inside the M365 admin console.

Five confidently verified, eight in one of the "we have to label this honestly" buckets.

Field notes — the thing I actually learned

I went into this thinking the story would be "big players block scrapers, small players don't." It wasn't that simple.

It was more like: the vendors with the most transparent product positioning also had the most scrapeable pricing pages. RingCentral, Nextiva, and Google Voice want you to land on the price and decide — there's no "book a demo" dance. The 8x8 and Dialpad model is closer to consultative selling, so their pricing pages are engineered to push you into sales contact, and that makes the public HTML less informative to both a human and a scraper.

So "can a regex find the price" turned out to be a surprisingly good proxy for "does this vendor want self-service SMB customers."

The honest uncomfortable bit: I had Nextiva's prices wrong in my calculator going in. I'd pulled them from a 2024 source and never refreshed. When the scraper came back with different numbers, I checked manually (actually opened their pricing page in a real browser, around 14:30 IST on the day of the audit) and the scraper was right. Same thing for OpenPhone/Quo — my data was stale. That's a cheap way to be wrong in public.

The payoff — the calculator

All of this fed into a free calculator I've been building: compare 13 business VoIP providers on pricing + features + 3-year TCO, with adjustable feature weights, hidden-cost toggles, shareable URLs, and — critically — per-provider verification badges: green ✓ verified [date] for the 5 I could pin down, amber ~ estimated for the 8 I couldn't.

It's live here, no signup: https://dialphonelimited.codeberg.page/calculator/

Every input is user-controlled; every price is labelled by confidence. That's the whole pitch.

If you're building something similar — comparison tools, pricing dashboards, anything that scrapes from the public web — my advice is: label your confidence per-data-point, not site-wide. A page that says "as of 22 April 2026" implies more certainty than you actually have. A price that says verified 22 Apr or estimated, source blocked lets the reader weight your output appropriately.

It costs you nothing in the UI and wins you a lot of credibility. Which, for a new-ish vendor trying to get editorial coverage, is the real currency.

About the calculator tool

The calculator is built by the team at DialPhone — a business VoIP provider covering US and Canadian SMBs with flat per-user pricing, 99.999% uptime SLA, SOC 2 / HIPAA / GDPR / PCI-DSS compliance, and a 14-day free trial. We built the comparison because we'd needed one when evaluating adjacent tools, and couldn't find one that labelled price freshness honestly.

Roast it, use it, poke holes in the verification badges if the date's too old. That's kind of the point.


Code approach & full scraper: separate post in the pipeline. If there's specific anti-bot behaviour you've run into and worked around, would love to hear about it in the comments.

Top comments (0)