DEV Community

agenthustler


ScraperAPI Tutorial: Build a Web Scraper That Bypasses Anti-Bot Protection in 2026

Web scraping in 2026 is a battlefield. If you've tried building a scraper recently, you already know — sites fight back hard. Between Cloudflare's evolving bot detection, browser fingerprinting, CAPTCHAs on every other page, and aggressive IP banning, getting data at scale feels like running through a minefield.

The days of simple requests.get() are over. Modern anti-bot systems use TLS fingerprinting, behavioral analysis, and machine learning to distinguish humans from scripts. Even headless browsers like Playwright get flagged within minutes. Your residential IP gets burned after a handful of requests to protected sites.

This is where proxy-based scraping APIs come in. Instead of managing proxy pools, solving CAPTCHAs, and rotating user agents yourself, you offload all of that to a service built specifically for it. In this tutorial, I'll show you how to use ScraperAPI to build scrapers that actually work against modern anti-bot protections — with full Python code examples.


What Is ScraperAPI and Why Use It?

ScraperAPI is a proxy API that handles the three hardest parts of web scraping:

  • Proxy rotation — ScraperAPI manages a pool of 40M+ residential and datacenter IPs across multiple countries. Each request gets a fresh IP, so you never get rate-limited or banned.
  • JavaScript rendering — Many sites load content dynamically. ScraperAPI can render JavaScript before returning the HTML, so you get the full page — not an empty shell.
  • CAPTCHA solving — When a site throws up a CAPTCHA, ScraperAPI solves it automatically. You never see it.

The API is dead simple: you send a URL, it returns the HTML. All the proxy management, header rotation, retry logic, and anti-detection happens behind the scenes.

Why not just use free proxies?

Free proxy lists are unreliable, slow, and often compromised. You'll spend more engineering time maintaining a proxy rotator than actually building your scraper. ScraperAPI gives you 5,000 free requests per month with no credit card required, which is enough to prototype and test most projects.


Getting Started

Step 1: Sign Up

Head to ScraperAPI and create a free account. You get 5,000 API credits per month — no credit card needed.

Step 2: Get Your API Key

After signing up, grab your API key from the dashboard. You'll use it in every request.

Step 3: Install Dependencies

pip install requests beautifulsoup4 aiohttp

Step 4: Your First Request

import requests

API_KEY = "YOUR_SCRAPERAPI_KEY"
url = "https://httpbin.org/ip"

response = requests.get(
    f"http://api.scraperapi.com?api_key={API_KEY}&url={url}"
)
print(response.text)

Run this and you'll see a different IP address every time — that's ScraperAPI's proxy rotation in action.
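One gotcha when interpolating the target URL directly into the endpoint string: if the target has its own query string (say `?q=python&page=2`), its `&` separators get misread as extra ScraperAPI parameters. Passing the URL through requests' `params=` (as the later examples do) or percent-encoding it yourself avoids this. A tiny helper for the manual case — the function name is mine, not part of any ScraperAPI client:

```python
from urllib.parse import urlencode

def scraperapi_url(api_key, target_url, **options):
    # urlencode percent-encodes the target URL, so its own
    # query string (?q=...&page=2) survives the trip intact.
    params = {"api_key": api_key, "url": target_url, **options}
    return "http://api.scraperapi.com?" + urlencode(params)

print(scraperapi_url("KEY", "https://example.com/search?q=a&page=2", render="true"))
```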


Practical Examples: Scraping Google, Amazon, and LinkedIn

Example 1: Scraping Google Search Results

Google is one of the hardest sites to scrape. It detects automated traffic aggressively and serves CAPTCHAs or blocks your IP within a few requests.

import requests
from bs4 import BeautifulSoup

API_KEY = "YOUR_SCRAPERAPI_KEY"

def scrape_google(query, num_results=10):
    url = f"https://www.google.com/search?q={query}&num={num_results}"
    response = requests.get(
        "http://api.scraperapi.com",
        params={"api_key": API_KEY, "url": url}
    )

    soup = BeautifulSoup(response.text, "html.parser")
    results = []

    for g in soup.select("div.tF2Cxc"):
        title = g.select_one("h3")
        link = g.select_one("a")
        snippet = g.select_one(".VwiC3b")

        if title and link:
            results.append({
                "title": title.text,
                "url": link["href"],
                "snippet": snippet.text if snippet else ""
            })

    return results

results = scrape_google("best python web frameworks 2026")
for r in results:
    print(f"{r['title']}\n  {r['url']}\n")

No CAPTCHA solving, no proxy rotation code — ScraperAPI handles it all.
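One caveat: the CSS classes above (tF2Cxc, VwiC3b) are Google's minified class names and change without notice. Keeping the parsing logic in its own function lets you smoke-test it offline against a saved HTML fixture whenever results suddenly come back empty. A sketch with an inline fixture (the fixture and function name are my own):

```python
from bs4 import BeautifulSoup

# A minimal saved fixture mimicking Google's current result markup.
FIXTURE = """
<div class="tF2Cxc">
  <a href="https://example.com"><h3>Example Domain</h3></a>
  <div class="VwiC3b">An illustrative snippet.</div>
</div>
"""

def parse_serp(html):
    """Parse result blocks out of a Google SERP page."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for g in soup.select("div.tF2Cxc"):
        title, link = g.select_one("h3"), g.select_one("a")
        if title and link:
            snippet = g.select_one(".VwiC3b")
            results.append({"title": title.text, "url": link["href"],
                            "snippet": snippet.text if snippet else ""})
    return results

# If this assertion starts failing, Google changed its markup.
assert parse_serp(FIXTURE)[0]["title"] == "Example Domain"
```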

Example 2: Scraping Amazon Product Data

Amazon's anti-bot system is notoriously aggressive. ScraperAPI can also return structured JSON instead of raw HTML for supported sites (see the structured data endpoints below); in this example we parse the HTML ourselves with BeautifulSoup.

def scrape_amazon_product(asin):
    url = f"https://www.amazon.com/dp/{asin}"
    response = requests.get(
        "http://api.scraperapi.com",
        params={
            "api_key": API_KEY,
            "url": url,
            "render": "true"  # Enable JS rendering
        }
    )

    soup = BeautifulSoup(response.text, "html.parser")

    title = soup.select_one("#productTitle")
    price = soup.select_one(".a-price .a-offscreen")
    rating = soup.select_one("#acrPopover")

    return {
        "title": title.text.strip() if title else None,
        "price": price.text.strip() if price else None,
        "rating": rating.get("title", "").strip() if rating else None,
    }

product = scrape_amazon_product("B0CHXKQ59Q")
print(product)

The render=true parameter tells ScraperAPI to use a headless browser, which is essential for Amazon's JavaScript-heavy pages.
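The price selector also returns raw text like $1,299.00, and converting it with a plain replace("$", "") breaks on other currencies or missing values. A slightly more forgiving parser — my own helper, not part of ScraperAPI:

```python
import re

def parse_price(text):
    """Pull a float out of a price string like '$1,299.00' or '£49.99'."""
    if not text:
        return None
    # Grab the first run of digits/commas, with an optional decimal part.
    match = re.search(r"[\d,]+(?:\.\d+)?", text)
    if not match:
        return None
    return float(match.group(0).replace(",", ""))

print(parse_price("$1,299.00"))  # 1299.0
```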

Example 3: Scraping LinkedIn Job Listings

LinkedIn blocks scrapers within seconds. With ScraperAPI, you can extract public job listings without getting your IP blacklisted:

def scrape_linkedin_jobs(keywords, location="United States"):
    url = (
        f"https://www.linkedin.com/jobs/search/"
        f"?keywords={keywords}&location={location}"
    )
    response = requests.get(
        "http://api.scraperapi.com",
        params={
            "api_key": API_KEY,
            "url": url,
            "render": "true",
            "country_code": "us"
        }
    )

    soup = BeautifulSoup(response.text, "html.parser")
    jobs = []

    for card in soup.select(".base-card"):
        title = card.select_one(".base-search-card__title")
        company = card.select_one(".base-search-card__subtitle")
        loc = card.select_one(".job-search-card__location")  # avoid shadowing the location argument

        if title:
            jobs.append({
                "title": title.text.strip(),
                "company": company.text.strip() if company else "",
                "location": loc.text.strip() if loc else ""
            })

    return jobs

jobs = scrape_linkedin_jobs("python developer")
for job in jobs[:5]:
    print(f"{job['title']} at {job['company']} ({job['location']})")

The country_code parameter routes your request through a US-based residential proxy, which is key for getting accurate LinkedIn results.
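When a flow spans multiple pages (a search page, then detail pages), ScraperAPI's session_number parameter pins consecutive requests to the same proxy IP. A sketch of how you might build the request parameters — the helper name is mine, and this assumes the sticky-session parameter behaves as documented:

```python
import random

API_KEY = "YOUR_SCRAPERAPI_KEY"

def session_params(target_url, session_id=None, country="us"):
    # Reusing the same session_number across calls keeps the same
    # underlying proxy IP (ScraperAPI's sticky-session feature).
    if session_id is None:
        session_id = random.randint(1, 10**6)
    return {
        "api_key": API_KEY,
        "url": target_url,
        "country_code": country,
        "session_number": session_id,
    }

session = random.randint(1, 10**6)
page1 = session_params("https://www.linkedin.com/jobs/search/?keywords=python", session)
page2 = session_params("https://www.linkedin.com/jobs/view/123", session)
assert page1["session_number"] == page2["session_number"]
```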


Advanced Features

Async Mode for High-Volume Scraping

For large-scale jobs (thousands of URLs), ScraperAPI offers an async mode. You submit URLs in bulk and poll for results:

import time

def async_scrape(urls):
    # Submit batch
    jobs = []
    for url in urls:
        resp = requests.post(
            "https://async.scraperapi.com/jobs",
            json={"apiKey": API_KEY, "url": url}
        )
        jobs.append(resp.json())

    # Poll for results
    results = []
    for job in jobs:
        while True:
            status = requests.get(
                f"https://async.scraperapi.com/jobs/{job['id']}"
            ).json()
            if status["status"] == "finished":
                results.append(status["response"])
                break
            time.sleep(2)

    return results

Async mode is more cost-efficient for large batches and avoids timeout issues on slow-loading pages.
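One caveat with the while True loop above: it never gives up, so a single stuck job hangs the whole batch. A bounded poller that takes the status-fetching call as a function (names are mine) is safer:

```python
import time

def poll_until_finished(fetch_status, timeout=120.0, interval=2.0):
    """Call fetch_status() until the job finishes, fails, or times out.

    fetch_status must return a dict with a 'status' key, mirroring the
    async job payloads above ('running', 'finished', 'failed', ...).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status.get("status") == "finished":
            return status
        if status.get("status") == "failed":
            return None
        time.sleep(interval)
    return None  # timed out
```

In async_scrape, you would pass something like `lambda: requests.get(f"https://async.scraperapi.com/jobs/{job['id']}").json()` as fetch_status.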

Structured Data Endpoints

ScraperAPI provides dedicated endpoints for popular sites that return clean JSON — no parsing required:

# Google Search structured data
resp = requests.get(
    "https://api.scraperapi.com/structured/google/search",
    params={"api_key": API_KEY, "query": "web scraping tools"}
)
data = resp.json()  # Clean JSON with titles, URLs, snippets

# Amazon product structured data
resp = requests.get(
    "https://api.scraperapi.com/structured/amazon/product",
    params={"api_key": API_KEY, "asin": "B0CHXKQ59Q"}
)
product = resp.json()  # Price, rating, reviews — all parsed

Cost Comparison

| Approach | Monthly Cost | Setup Time | Maintenance |
| --- | --- | --- | --- |
| DIY proxies + CAPTCHA solving | $200-500+ | Days | Constant |
| Headless browser farm | $100-300+ | Hours | Weekly |
| ScraperAPI | $0 (free tier) to $49+ | Minutes | None |

The free tier gives you 5,000 requests/month. The Hobby plan at $49/month gives you 100,000 — enough for most side projects and MVPs.
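To sanity-check whether a job fits a plan, the credit arithmetic is simple. Note that plain and JS-rendered requests cost different numbers of credits; the multiplier below is a placeholder, so check the pricing page for your plan's actual rates:

```python
def monthly_credits(urls_per_run, runs_per_day, credits_per_request=1):
    """Estimate monthly credit usage for a recurring scrape job."""
    return urls_per_run * runs_per_day * 30 * credits_per_request

# 3 product pages checked daily as plain requests:
print(monthly_credits(3, 1))  # 90
# The same pages hourly, at a hypothetical 10 credits per rendered request:
print(monthly_credits(3, 24, credits_per_request=10))  # 21600
```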


Real Project: Product Price Monitor

Let's put it all together. Here's a complete price monitoring script that tracks Amazon products and alerts you when prices drop:

import requests
from bs4 import BeautifulSoup
import json
import smtplib
from email.mime.text import MIMEText
from datetime import datetime

API_KEY = "YOUR_SCRAPERAPI_KEY"
PRICE_FILE = "prices.json"

PRODUCTS = [
    {"name": "Wireless Headphones", "asin": "B0CHXKQ59Q", "target": 50.00},
    {"name": "Mechanical Keyboard", "asin": "B09HKF3MHB", "target": 80.00},
    {"name": "USB-C Hub", "asin": "B087QTVPHT", "target": 25.00},
]

def get_price(asin):
    resp = requests.get(
        "http://api.scraperapi.com",
        params={
            "api_key": API_KEY,
            "url": f"https://www.amazon.com/dp/{asin}",
            "render": "true"
        },
        timeout=60
    )
    soup = BeautifulSoup(resp.text, "html.parser")
    price_el = soup.select_one(".a-price .a-offscreen")
    if price_el:
        return float(price_el.text.replace("$", "").replace(",", ""))
    return None

def load_history():
    try:
        with open(PRICE_FILE) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

def save_history(history):
    with open(PRICE_FILE, "w") as f:
        json.dump(history, f, indent=2)

def check_prices():
    history = load_history()
    timestamp = datetime.now().isoformat()
    alerts = []

    for product in PRODUCTS:
        price = get_price(product["asin"])
        if price is None:
            print(f"  Could not get price for {product['name']}")
            continue

        print(f"  {product['name']}: ${price:.2f} (target: ${product['target']:.2f})")

        # Save to history
        key = product["asin"]
        if key not in history:
            history[key] = []
        history[key].append({"price": price, "date": timestamp})

        # Check if below target
        if price <= product["target"]:
            alerts.append(f"{product['name']}: ${price:.2f} (target: ${product['target']:.2f})")

    save_history(history)

    if alerts:
        print(f"\n  PRICE ALERTS:\n" + "\n".join(f"    {a}" for a in alerts))

    return alerts

if __name__ == "__main__":
    print("Checking prices...")
    check_prices()

Run this on a cron job (daily or hourly) and you have a fully automated price tracker. Each check makes just three API requests — keep in mind that JS-rendered requests consume extra credits, but daily monitoring still fits comfortably within the free tier.
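The script imports smtplib and MIMEText but never actually sends mail. To close the loop, here's a sketch of the alert email; the SMTP host, port, and credentials are placeholders you'd replace with your provider's values:

```python
import smtplib
from email.mime.text import MIMEText

# Placeholder SMTP settings -- substitute your provider's values.
SMTP_HOST = "smtp.example.com"
SMTP_PORT = 587
SMTP_USER = "alerts@example.com"
SMTP_PASS = "app-password"

def build_alert_email(alerts, to_addr="you@example.com"):
    """Package the price-drop alerts into an email message."""
    body = "Price drops detected:\n\n" + "\n".join(alerts)
    msg = MIMEText(body)
    msg["Subject"] = f"Price alert: {len(alerts)} product(s) below target"
    msg["From"] = SMTP_USER
    msg["To"] = to_addr
    return msg

def send_alert_email(msg):
    """Send via STARTTLS; call from check_prices() when alerts is non-empty."""
    with smtplib.SMTP(SMTP_HOST, SMTP_PORT) as server:
        server.starttls()
        server.login(SMTP_USER, SMTP_PASS)
        server.send_message(msg)
```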


Conclusion

Web scraping doesn't have to be an arms race against anti-bot systems. ScraperAPI abstracts away the hardest parts — proxy management, CAPTCHA solving, JavaScript rendering — so you can focus on what you actually want: the data.

In this tutorial, we built scrapers for Google, Amazon, and LinkedIn, used advanced features like async mode and structured data endpoints, and put together a complete price monitoring system. All of it runs on ScraperAPI's free tier.

Ready to start? Get 5,000 free API requests at ScraperAPI, no credit card required. That's plenty to build and test a real project.
