Tinyfishie

Posted on May 19 • Originally published at tinyfish.ai

How to Choose a Web Automation Tool by Page Volume (With Real Cost Estimates)

#webautomation #toolselection #pagevolume #costanalysis

Most web automation tool comparisons treat page volume as a footnote. It isn't.

The tool that handles 500 pages a day beautifully will silently degrade at 50,000. The infrastructure that's cost-effective at 10,000 pages becomes the most expensive option in the room at 500,000. And the free tier that feels like a reasonable starting point has a ceiling that catches most teams by surprise somewhere in the middle of a project.

Page volume and access requirement level are the two primary variables that determine your tool decision — more than AI capability, ease of use, or no-code vs. code. Get either wrong and you're either paying for infrastructure you don't need or running a pipeline that breaks under load exactly when it matters.

This guide maps each volume tier to the tools that actually work at that scale, with real cost estimates at each level so you can make the comparison with numbers rather than intuition.

How to Estimate Your Page Volume

Quick decision rules before the detail:

Under 1K pages/day — free tiers work; pick for convenience, not capability.
1K–10K pages/day — managed tools beat self-hosted once you count setup time.
10K–100K pages/day — engineering maintenance cost exceeds tool subscription cost; factor both.
100K+ pages/day — you're buying infrastructure, not a tool; build vs. buy is the real decision.
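
As a rough illustration, the four decision rules above can be written as a small lookup. This is a sketch, not part of any tool: the function name `volume_tier` is made up for this example, and putting a value that sits exactly on a boundary (1,000, 10,000, or 100,000) into the higher tier is an assumption, since the rules above don't say.

```python
def volume_tier(pages_per_day: int) -> str:
    """Map a daily page count to one of the four tiers used in this guide."""
    if pages_per_day < 0:
        raise ValueError("pages_per_day must be non-negative")
    # Assumption: a value exactly on a boundary falls into the higher tier.
    if pages_per_day < 1_000:
        return "Tier 1: free tiers work; pick for convenience"
    if pages_per_day < 10_000:
        return "Tier 2: managed tools beat self-hosted once setup time counts"
    if pages_per_day < 100_000:
        return "Tier 3: factor engineering maintenance alongside subscription cost"
    return "Tier 4: build vs. buy is the real decision"


for volume in (500, 6_000, 50_000, 240_000):
    print(f"{volume:>7,} pages/day -> {volume_tier(volume)}")
```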

Before matching tools to volume tiers, you need an accurate number. Teams consistently underestimate this, and the underestimate is what causes mid-project tool switches.

The formula:

Daily pages = (number of target URLs) × (crawl frequency per day) × (pages per URL path)

A few scenarios to calibrate against:

Competitor price monitoring across 50 e-commerce sites, updated daily: If each site has ~200 product pages, that's 10,000 pages/day. If you need hourly updates, that's 240,000 pages/day.
Lead enrichment from 2,000 company profile pages, run once a week: ~286 pages/day on average. Looks small, until you need it done in a 2-hour window, which effectively makes it ~1,000 pages/hour.
News monitoring across 30 publications, 4x daily: If each publication has ~50 new articles per cycle, that's 6,000 pages/day.

The number that matters for tool selection isn't the total — it's the peak load your pipeline needs to sustain, and whether you need it done in a tight time window or can spread it across the day.
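
The formula and the three scenarios above can be checked with a short script. This is a minimal sketch: the function names are illustrative, and "targets" here means sites or seed URLs, with "pages per target" standing in for the formula's "pages per URL path".

```python
def daily_pages(targets: int, crawls_per_day: float, pages_per_target: int) -> float:
    """Daily pages = targets x crawl frequency per day x pages per target."""
    return targets * crawls_per_day * pages_per_target


def peak_pages_per_hour(pages_per_run: float, window_hours: float) -> float:
    """Sustained rate needed to finish one run inside a fixed time window."""
    return pages_per_run / window_hours


# Competitor price monitoring: 50 sites, ~200 product pages each.
price_daily = daily_pages(50, 1, 200)      # daily refresh
price_hourly = daily_pages(50, 24, 200)    # hourly refresh

# Lead enrichment: 2,000 profiles once a week, squeezed into a 2-hour window.
lead_average = daily_pages(2_000, 1 / 7, 1)
lead_peak = peak_pages_per_hour(2_000, 2)

# News monitoring: 30 publications, 4 cycles a day, ~50 new articles per cycle.
news_daily = daily_pages(30, 4, 50)

print(f"Price monitoring: {price_daily:,.0f}/day daily, {price_hourly:,.0f}/day hourly")
print(f"Lead enrichment:  {lead_average:,.0f}/day average, {lead_peak:,.0f}/hour at peak")
print(f"News monitoring:  {news_daily:,.0f}/day")
```

The lead-enrichment case is the one to notice: the daily average puts it in the smallest tier, while the peak rate is what the tool actually has to sustain.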

Volume Tier 1: Under 1,000 Pages Per Day

What this looks like

One-off research pulls. Small recurring monitors. Proof-of-concept scrapes before committing to a larger pipeline. A freelancer pulling a client's competitor catalog. A researcher collecting data from academic directories.

What works

At this volume, almost any tool works. The decision is about convenience and your technical comfort level, not about infrastructure.

Free options that are genuinely capable here:

Web Scraper Chrome Extension (free): Works for public, unprotected pages. No scheduling, no parallelism, but for a one-time pull of a few hundred rows it's the fastest path to a CSV.
ParseHub free tier: 5 projects, up to 200 pages per run. If your target has fewer than 200 pages, this is a complete solution at zero cost.
Octoparse free tier: 2 simultaneous scrapers, a 10-task limit, and up to 50K rows/month of exports. Better for recurring small-volume scrapes than ParseHub, but verify the task and row limits against your actual target before committing.
TinyFish free tier: 500 credits, no credit card. The value here isn't the volume — it's that you get to test an AI agent against your actual target site, including any access requirements it enforces.

What to watch for: Free tiers hide their ceilings. ParseHub's 200-page-per-run limit is the one most teams hit mid-project. If your target has 250 product pages, you're already over the limit. Verify the ceiling against your actual target page count before building a workflow around any free tier.

Real cost at this volume

Tool	Monthly cost at ~500 pages/day	Notes
Web Scraper Extension	$0	No scheduling, uses your IP
ParseHub	$0 (free tier)	200 pages/run limit
Octoparse	$0 (free tier)	Local runs only
TinyFish	$0 (free tier)	500 credits total, not per day
Scrapy (self-hosted)	$0 + server cost (~$5–10/mo VPS)	Requires Python setup

Volume Tier 2: 1,000 to 10,000 Pages Per Day

What this looks like

A small team's recurring data feed. Daily price monitoring across dozens of sites. A startup's competitive intelligence pipeline. Most "we scrape data to inform our product decisions" use cases live here.

What works

This is where free tiers run out and you start paying for infrastructure. The key trade-off at this volume is between simplicity (managed cloud tools) and cost efficiency (self-hosted frameworks).

Managed cloud tools (simpler, higher per-page cost):

Apify: Solid at this volume. A well-configured Actor running 5,000 pages/day typically costs $30–60/month in compute. The marketplace of pre-built Actors covers most common targets (Amazon, LinkedIn, Google Maps) and gets you to first data in under ten minutes without writing selectors. For targets outside the catalog, you're writing and maintaining custom Actors — factor that time in.
TinyFish Browser API: $15/month (Starter) for developers already using Playwright, Puppeteer, or Selenium. Connects via CDP over WebSocket — no SDK swap, you point your existing browser automation at TinyFish's endpoint instead. Sub-250ms cold start means parallelism scales cleanly without queuing delays. Best fit at this tier: developers who want managed browser infrastructure without rebuilding their scraping stack.
TinyFish Web Agent (Starter, $15/month, 1,650 credits): Better fit when your target requires multi-step navigation or authentication rather than straightforward page extraction. A simple extraction runs 2–3 steps/page; an authenticated flow runs 8–10.

The Browser API and Web Agent share the same credit pool, so you can mix both within one plan depending on what each target requires.
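
For the Browser API route described above, pointing an existing Playwright script at a remote browser over CDP might look like the sketch below. `connect_over_cdp` is Playwright's own method; everything vendor-specific is an assumption. The environment variable name `CDP_WS_ENDPOINT` is hypothetical, and the real endpoint URL and any authentication details come from the provider's documentation.

```python
import os


def cdp_endpoint() -> str:
    """Read the remote browser's WebSocket endpoint from the environment.

    CDP_WS_ENDPOINT is a hypothetical variable name used for this sketch.
    """
    endpoint = os.environ.get("CDP_WS_ENDPOINT", "")
    if not endpoint.startswith(("ws://", "wss://")):
        raise RuntimeError("Set CDP_WS_ENDPOINT to a ws:// or wss:// URL")
    return endpoint


def fetch_title(url: str) -> str:
    """Open `url` in the remote browser over CDP and return the page title."""
    # Imported lazily so cdp_endpoint() is usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # Attach to an already-running remote Chromium instead of launching one.
        browser = p.chromium.connect_over_cdp(cdp_endpoint())
        try:
            # Reuse the default context if the remote browser exposes one.
            context = browser.contexts[0] if browser.contexts else browser.new_context()
            page = context.new_page()
            page.goto(url)
            return page.title()
        finally:
            browser.close()
```

With the variable set, `fetch_title("https://example.com")` would run the same Playwright calls as a local script; only the connection line differs from a local `launch()`.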

Self-hosted frameworks (more work, lower marginal cost):

Scrapy: Free to run, but you're paying for a server and your own time. A $20/month cloud instance handles this volume easily. The real cost is the 4–8 hours of setup time and ongoing maintenance when target sites change. If your targets are static HTML with no strict access requirements, this is the most cost-efficient option at this volume.

What to watch for: At 1,000–10,000 pages/day, you're large enough that sites with strict access requirements become a more significant factor. A managed tool that includes proxy rotation (like TinyFish) absorbs that cost into the subscription. A self-hosted Scrapy setup needs a separate proxy budget — residential proxies (e.g., Bright Data) run ~$8/GB PAYG at this tier, which adds $20–80/month depending on page weight.

Real cost at this volume

Estimated monthly cost at 5,000 pages/day:

Tool	Base cost	Proxy cost	Estimated total/mo
Scrapy (self-hosted)	~$20 (server)	$30–80 (if needed)	$20–100
Apify (pay-as-you-go)	~$40–60 (compute)	Separate	$40–140
TinyFish Starter	$15	Included	$15
TinyFish Pro	$150	Included	$150

Note: TinyFish pricing includes browsers, proxies, and AI inference. Apify and Scrapy costs are compute only — add proxy costs separately for protected sites.

Volume Tier 3: 10,000 to 100,000 Pages Per Day

What this looks like

A mid-size company's market intelligence operation. An e-commerce brand monitoring pricing across hundreds of competitor sites. A SaaS product that needs fresh web data as a core feature. This is where scraping stops being a side project and becomes infrastructure.

What works

At this volume, the hidden cost of scraping is no longer the tool subscription — it's engineering time. Selector-based scrapers break when target sites update. Proxy pools need management. Failure monitoring becomes a dedicated function. The teams that underestimate this end up with a part-time engineer whose primary job is keeping the scraping pipeline alive.

Managed infrastructure wins on total cost here:

TinyFish (Browser API, for teams migrating from Playwright/Puppeteer): $150/month (Pro) for 16,500 credits and 50 concurrent sessions; pay-as-you-go at $0.015/credit. TinyFish billing depends on which API you use. The calculations below assume 10 seconds per page; actual costs vary with page load time and workflow complexity.

Browser API (billed by session time: 1 credit = 4 minutes, minimum 1 minute):

10 sec/page → rounds up to 1 min → 0.25 credits/page

50,000 pages/day × 0.25 credits × 30 days = 375,000 credits/month

→ PAYG: 375,000 × $0.015 = ~$5,625/month | Pro plan: $150 + (375,000 − 16,500) × $0.012 overage = ~$4,452/month

Web Agent (billed per step: 1 credit = 1 step; for complex multi-step workflows):

~3 steps/page × 50,000 pages/day × 30 days = 4,500,000 steps/month

→ PAYG: ~$67,500/month | Not practical for bulk simple extraction at this volume.

Most bulk extraction pipelines at this tier use the Browser API. Web Agent pricing is designed for complex authenticated workflows where the automation value per workflow justifies the cost.
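
The credit arithmetic above can be re-run for other volumes with a few lines of Python. This is a sketch that uses only the rates quoted in this section; the function names are illustrative. How session time is billed beyond the 1-minute minimum isn't stated here, so the code assumes it scales in proportion to time.

```python
PAYG_RATE = 0.015          # USD per credit, pay-as-you-go
PRO_BASE = 150.0           # USD per month, Pro plan
PRO_INCLUDED = 16_500      # credits included in Pro
PRO_OVERAGE_RATE = 0.012   # USD per credit beyond the included amount


def browser_api_credits_per_page(seconds_per_page: float) -> float:
    """1 credit = 4 minutes of session time, with a 1-minute minimum."""
    # Assumption: above the minimum, billing is proportional to time.
    billed_minutes = max(1.0, seconds_per_page / 60)
    return billed_minutes / 4


def monthly_credits(pages_per_day: int, credits_per_page: float, days: int = 30) -> float:
    return pages_per_day * credits_per_page * days


def payg_cost(credits: float) -> float:
    return credits * PAYG_RATE


def pro_cost(credits: float) -> float:
    """Pro base fee plus overage on credits beyond the included amount."""
    return PRO_BASE + max(0.0, credits - PRO_INCLUDED) * PRO_OVERAGE_RATE


browser_credits = monthly_credits(50_000, browser_api_credits_per_page(10))
agent_credits = monthly_credits(50_000, 3)  # Web Agent: ~3 steps (credits) per page

print(f"Browser API: {browser_credits:,.0f} credits/month")
print(f"  PAYG ${payg_cost(browser_credits):,.0f}, Pro ${pro_cost(browser_credits):,.0f}")
print(f"Web Agent:   {agent_credits:,.0f} credits/month, PAYG ${payg_cost(agent_credits):,.0f}")
```

Changing the `10` to your measured seconds per page, or the `3` to your measured steps per page, is the quickest way to see how sensitive the monthly figure is to those two assumptions.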

Apify: The Starter plan starts at $29/month, but the Scale plan ($199/month) is typically required at 50,000 pages/day. At this volume, expect $200–500/month in compute, plus significant proxy costs for protected sites. Custom Actors require ongoing maintenance.
Bright Data: At this volume, Bright Data's Scraping Browser becomes relevant — a fully managed Chrome instance with built-in proxy rotation. Cost is primarily proxy bandwidth: residential proxies at ~$8/GB (PAYG; source: brightdata.com, April 2026). A 50,000-page/day operation scraping typical retail pages (~500KB each) uses roughly 25GB/day — approximately $6,000/month in proxy costs alone. Bright Data makes sense when geographic targeting or anti-detection reliability is the primary requirement, not as a general-purpose option.
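
The bandwidth arithmetic behind that proxy estimate is easy to parameterize. The sketch below is illustrative only: the function name and the `proxied_fraction` parameter are made up for this example, and it uses decimal units (1 GB = 1,000,000 KB) to match the figures above.

```python
def proxy_cost_per_month(
    pages_per_day: int,
    kb_per_page: float,
    usd_per_gb: float,
    proxied_fraction: float = 1.0,
    days: int = 30,
) -> float:
    """Monthly proxy spend when billing is by bandwidth."""
    # Decimal units: 1 GB = 1,000,000 KB.
    gb_per_month = pages_per_day * proxied_fraction * kb_per_page * days / 1_000_000
    return gb_per_month * usd_per_gb


# 50,000 pages/day at ~500KB each and $8/GB, every page through the proxy.
all_proxied = proxy_cost_per_month(50_000, 500, 8.0)

# Same volume, but only ~30% of pages need a residential proxy.
partly_proxied = proxy_cost_per_month(50_000, 500, 8.0, proxied_fraction=0.3)

print(f"All pages proxied: ${all_proxied:,.0f}/month")     # about $6,000
print(f"30% proxied:       ${partly_proxied:,.0f}/month")  # about $1,800
```

Page weight and the share of pages that actually need a proxy move the result far more than the per-GB price does, so those are the two inputs worth measuring on a sample crawl.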

Self-hosted at this volume:

Scrapy + infrastructure: Technically possible, but at 50,000 pages/day you need distributed infrastructure: multiple servers, a job queue (for example, Celery backed by Redis), a monitoring stack, and proxy management. A realistic infrastructure budget is $200–500/month, plus 20+ hours/month of engineering maintenance. Justified if you have a dedicated data engineering team and highly customized requirements.

What to watch for: This is the volume tier where silent failure becomes a serious business problem. A pipeline that silently returns empty results for three days at 50,000 pages/day is a data quality incident, not a minor inconvenience. Factor monitoring and alerting into your tool evaluation — not just happy-path performance.
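
One way to catch the silent-failure case described above is to track the share of pages that come back empty and alert when it jumps. This is a minimal sketch under stated assumptions: the `RunStats` fields, the function names, and the 5% threshold are all hypothetical, and a real pipeline would feed these counts from its own logs or metrics.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    pages_attempted: int
    pages_with_data: int  # pages where extraction returned at least one field


def empty_rate(stats: RunStats) -> float:
    """Fraction of attempted pages that produced no data."""
    if stats.pages_attempted == 0:
        # A run that attempted nothing is treated as fully failed.
        return 1.0
    return 1 - stats.pages_with_data / stats.pages_attempted


def should_alert(stats: RunStats, max_empty_rate: float = 0.05) -> bool:
    """True when the empty-result share exceeds the (assumed) threshold."""
    return empty_rate(stats) > max_empty_rate


healthy = RunStats(pages_attempted=50_000, pages_with_data=49_000)
broken = RunStats(pages_attempted=50_000, pages_with_data=0)

print("healthy run alerts:", should_alert(healthy))
print("broken run alerts: ", should_alert(broken))
```

The point of checking a rate rather than an error count is that a changed page layout usually produces successful HTTP responses with nothing extracted, which error-based alerting never sees.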

Real cost at this volume

Estimated monthly cost at 50,000 pages/day, assuming a mixed target set of simple and JS-heavy sites requiring managed browser infrastructure:

Tool	Estimated total/mo	Selector maintenance	Failure visibility
Scrapy + proxies	$2,000–2,300 ⁽¹⁾	High (you own it)	Manual
Apify (custom Actors)	$500–900	Medium (Actor updates)	Dashboard
Bright Data (proxy infra)	$4,500–6,000+ ⁽²⁾	High (your scrapers)	Manual
TinyFish Browser API (PAYG)	~$5,625 ⁽³⁾	None	Built-in
TinyFish Browser API (Pro)	~$4,452 ⁽³⁾	None	Built-in

⁽¹⁾ Scrapy estimate: ~$200–500/month compute (industry estimate, no official source; based on 3–5 VPS instances + job queue) + ~$1,800/month residential proxy for ~30% protected pages (15,000 pages/day × 500KB × 30 days = 225GB × $8/GB). Compute only would be $200–500/month — proxy is the larger cost at this volume.

⁽²⁾ Bright Data: residential proxy at $8/GB PAYG (source: brightdata.com, April 2026). 750GB/month for a mixed site set × $8 = $6,000/month.

⁽³⁾ TinyFish: based on tinyfish.ai/pricing (April 2026) + assumed 10 sec/page (minimum 1 min billing = 0.25 credits/page). Actual costs vary with page load time. See calculation detail in the section above.

The TinyFish number looks higher than Scrapy until you add engineering time. At $150/hour for a developer, 20 hours/month of maintenance is $3,000, which brings the Scrapy total to roughly $5,000–5,300/month. That cost isn't in the tool budget, but it is real.

Volume Tier 4: 100,000+ Pages Per Day

What this looks like

Enterprise-scale data operations. Google-scale inventory aggregation. A rideshare company collecting millions of pricing variables monthly. Financial services firms monitoring hundreds of regulatory portals in real time. This is not a side project.

What works

At this volume, you're buying infrastructure, not tools. The question is whether you build it or buy it.

Build: A custom distributed scraping stack — Scrapy or custom crawlers running on Kubernetes, Bright Data or a private proxy pool for IP management, a data pipeline for cleaning and delivery. Engineering cost to build: 3–6 months of a senior engineer's time. Ongoing maintenance: a dedicated team. Justified for organizations with highly specific data requirements, existing data engineering capacity, and volume that makes the economics work.

Buy: TinyFish's enterprise tier is designed for this. The platform already runs production workflows at this scale across multiple enterprise customers, so the value proposition isn't the per-page cost. It's that you're buying a system that has already been hardened at that scale, with the reliability and compliance requirements enterprise operations need. Pricing is custom at this tier; contact sales for specifics.

What to watch for: At 100,000+ pages/day, the decision isn't really between tools — it's between building and buying. Both have merit depending on your engineering resources and how central web data collection is to your product. The right question isn't "which tool is cheapest per page?" It's "how much of our engineering capacity do we want this to consume?"

The Full Picture: Volume × Site Complexity

Volume alone doesn't determine your tool. Site complexity — how much infrastructure the target requires — is the other axis. This matrix combines both:

Volume	Static / simple pages	JS-heavy, requires managed browser	Authenticated access (your own accounts)
< 1K pages/day	Free tools (ParseHub, Octoparse)	TinyFish free tier	TinyFish free tier
1K–10K pages/day	Scrapy (self-hosted) or Apify	Apify or TinyFish Starter	TinyFish Starter/Pro
10K–100K pages/day	Scrapy + infra, Apify, or TinyFish Pro	Apify or TinyFish Pro	TinyFish Pro
100K+ pages/day	Custom stack or TinyFish Enterprise	TinyFish Enterprise	TinyFish Enterprise

The pattern: at low volume on simple sites, almost anything works and the cheapest option wins. As volume or site complexity increases, the tools that don't require ongoing maintenance become progressively more cost-effective when you count engineering time.

The Cost Calculation Most Teams Get Wrong

Every tool comparison in this category focuses on subscription price. The number that actually determines total cost is:

Total cost = tool subscription + proxy costs + (engineering hours × hourly rate)

Scrapy is free. But if a developer spends 15 hours/month keeping selectors current, that's $2,250/month at $150/hour, which is more than any of the managed options cost at 5,000 pages/day. The teams that make this mistake are the ones who calculate tool cost from the pricing page and engineering time from zero.
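
The total-cost formula is simple enough to keep in a script next to your estimates. The sketch below plugs in figures quoted in this guide; the function name is illustrative, and treating the managed option as needing zero maintenance hours is an assumption made only to show the comparison.

```python
def total_monthly_cost(
    subscription: float,
    proxy: float,
    engineering_hours: float,
    hourly_rate: float,
) -> float:
    """Total cost = tool subscription + proxy costs + (engineering hours x hourly rate)."""
    return subscription + proxy + engineering_hours * hourly_rate


# "Free" Scrapy with 15 hours/month of selector upkeep at $150/hour.
selector_upkeep = total_monthly_cost(0, 0, 15, 150)

# Scrapy at 50,000 pages/day: $200-500 compute, ~$1,800 proxy, 20 hours/month.
scrapy_low = total_monthly_cost(200, 1_800, 20, 150)
scrapy_high = total_monthly_cost(500, 1_800, 20, 150)

# Managed option at the same volume (~$4,452/month, proxies included).
# Assumption: zero maintenance hours, for illustration only.
managed = total_monthly_cost(4_452, 0, 0, 150)

print(f"Selector upkeep alone: ${selector_upkeep:,}")               # $2,250
print(f"Scrapy at 50K/day:     ${scrapy_low:,} to ${scrapy_high:,}")  # $5,000 to $5,300
print(f"Managed at 50K/day:    ${managed:,}")                       # $4,452
```

Replacing the hours and hourly rate with your own numbers is the part that usually changes the answer.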

The inversion point — where managed infrastructure becomes cheaper than self-hosted — happens somewhere between 5,000 and 20,000 pages/day for most teams, depending on target site complexity and how often sites update their frontend.

How to Estimate Your Starting Point

If you're not sure where your project falls, start with the TinyFish free tier (500 credits, no credit card). Run it against your actual target site. The results tell you three things at once: whether AI-based extraction handles your target's structure, what your step-per-page ratio looks like for cost projection, and whether there are strict access requirements you didn't know about.

That's a better calibration than any estimate you can make from a pricing page.

Start free — 500 credits, no credit card required

Frequently Asked Questions

How much does web scraping cost?

It depends on volume and tool choice, but the honest answer is that the subscription price is rarely the whole number. At under 1,000 pages/day, free tiers from ParseHub, Octoparse, and TinyFish cover most use cases at zero cost. At 5,000 pages/day, expect $15–100/month depending on whether you need strict access handling. At 50,000 pages/day, total cost including infrastructure and proxy fees typically runs $2,000–5,600/month depending on tool and proxy requirements — and if you're on a self-hosted setup, add engineering maintenance time on top of that. The full formula is: tool subscription + proxy costs + (engineering hours × hourly rate). Teams that only look at the subscription line consistently underestimate real cost by 2–3x.

What counts as a "page" for automation tool pricing?

It depends on the tool. For Scrapy and most traditional scrapers, a page is one HTTP request. For AI web agents like TinyFish, the unit is a "step" — a discrete action (navigate, click, extract). A single page extraction might require 2–5 steps; a multi-step authenticated workflow might require 10–15. Always ask vendors for step-to-page ratios for your specific use case before committing to a plan.

Is Scrapy actually free at high volume?

The software is open source, but the infrastructure isn't free. At 50,000 pages/day you need distributed computing, job queues, monitoring, and proxy pools. A realistic compute budget is $200–500/month, residential proxies for protected pages can add roughly $1,800/month, and ongoing engineering time comes on top. Scrapy is the most cost-efficient option when you have the engineering capacity to run it. It's not free; it's a trade of money for engineering time.

What happens if I exceed my plan's page or step limit?

Each managed tool handles this differently. Apify charges compute unit overages at the pay-as-you-go rate. TinyFish offers pay-as-you-go at $0.015/credit as an alternative to the monthly plan (Pro plan overages bill at $0.012/credit). Note that 1 credit covers 1 agent step, 4 minutes of browser session, or 15 page fetches, so the effective per-page cost depends on which API you use. Scrapy has no limit; your infrastructure is the ceiling. Plan for overages before you hit them, because discovering them during a critical run is a bad time to learn the policy.

How do I know if my volume estimate is accurate?

It usually isn't, and the error is almost always an underestimate. The most common mistake is counting target URLs without accounting for crawl frequency, or leaving out the pages you need to navigate through to reach the data (pagination, category pages, authentication flows). Add 30–50% to your estimate before selecting a plan tier.
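
Applying that buffer takes a few lines. In the sketch below the function name and the 8,000 pages/day starting figure are hypothetical, chosen to show how a padded estimate can cross a tier boundary.

```python
def padded_estimate(pages_per_day: int, low: float = 0.30, high: float = 0.50) -> tuple[int, int]:
    """Return the estimate padded by the 30-50% buffer, rounded to whole pages."""
    return round(pages_per_day * (1 + low)), round(pages_per_day * (1 + high))


# Hypothetical raw estimate of 8,000 pages/day.
low, high = padded_estimate(8_000)
print(f"Plan for {low:,} to {high:,} pages/day")  # 10,400 to 12,000
```

A raw estimate of 8,000 pages/day sits in the 1K–10K tier, while the padded range lands in the 10K–100K tier, which is the kind of shift that causes a mid-project tool switch.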

Related reading:

The Best Web Scraping Tools in 2026 — Ranked and Reviewed →

[Figure: Matrix showing web automation tool recommendations by page volume and access requirement level]

Try TinyFish Free

500 free credits, no credit card. The fastest way to test whether TinyFish fits your workflow.

Start free →

