Web scraping in 2026 is a battlefield. If you've tried building a scraper recently, you already know — sites fight back hard. Between Cloudflare's evolving bot detection, browser fingerprinting, CAPTCHAs on every other page, and aggressive IP banning, getting data at scale feels like running through a minefield.
The days of simple requests.get() are over. Modern anti-bot systems use TLS fingerprinting, behavioral analysis, and machine learning to distinguish humans from scripts. Even headless browsers like Playwright get flagged within minutes. Your residential IP gets burned after a handful of requests to protected sites.
This is where proxy-based scraping APIs come in. Instead of managing proxy pools, solving CAPTCHAs, and rotating user agents yourself, you offload all of that to a service built specifically for it. In this tutorial, I'll show you how to use ScraperAPI to build scrapers that actually work against modern anti-bot protections — with full Python code examples.
## What Is ScraperAPI and Why Use It?
ScraperAPI is a proxy API that handles the three hardest parts of web scraping:
- Proxy rotation — ScraperAPI manages a pool of 40M+ residential and datacenter IPs across multiple countries. Each request can go out on a fresh IP, which makes rate limits and IP bans far less likely.
- JavaScript rendering — Many sites load content dynamically. ScraperAPI can render JavaScript before returning the HTML, so you get the full page — not an empty shell.
- CAPTCHA solving — When a site throws up a CAPTCHA, ScraperAPI solves it automatically. You never see it.
The API is dead simple: you send a URL, it returns the HTML. All the proxy management, header rotation, retry logic, and anti-detection happens behind the scenes.
### Why not just use free proxies?

Free proxy lists are unreliable, slow, and often compromised. You'll spend more engineering time maintaining a proxy rotator than actually building your scraper. ScraperAPI gives you 5,000 free requests per month with no credit card required — enough to prototype and test most projects.
## Getting Started

### Step 1: Sign Up

Head to ScraperAPI and create a free account. You get 5,000 API credits per month — no credit card needed.

### Step 2: Get Your API Key

After signing up, grab your API key from the dashboard. You'll use it in every request.

### Step 3: Install Dependencies

```bash
pip install requests beautifulsoup4
```

### Step 4: Your First Request
```python
import requests

API_KEY = "YOUR_SCRAPERAPI_KEY"
url = "https://httpbin.org/ip"

# Pass the target URL via params so requests URL-encodes it for you
response = requests.get(
    "http://api.scraperapi.com",
    params={"api_key": API_KEY, "url": url},
)
print(response.text)
```
Run this and you'll see a different IP address every time — that's ScraperAPI's proxy rotation in action.
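You can make the rotation concrete by counting distinct exit IPs over a few requests. A small sketch (the helper names here are mine, not ScraperAPI's; the fetch callable is injected so the parsing logic stands alone):

```python
import json


def extract_origin_ip(body: str) -> str:
    """Pull the calling IP out of httpbin.org/ip's JSON body."""
    return json.loads(body)["origin"]


def count_exit_ips(fetch, n: int = 3) -> int:
    """Fetch https://httpbin.org/ip n times and count distinct IPs.

    `fetch` is any callable returning the response body as a string.
    """
    return len({extract_origin_ip(fetch()) for _ in range(n)})


# With ScraperAPI (assumes `requests` and API_KEY from the snippet above):
# fetch = lambda: requests.get(
#     "http://api.scraperapi.com",
#     params={"api_key": API_KEY, "url": "https://httpbin.org/ip"},
# ).text
# print(count_exit_ips(fetch))  # typically n: a fresh IP per request
```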
## Practical Examples: Scraping Google, Amazon, and LinkedIn

### Example 1: Scraping Google Search Results
Google is one of the hardest sites to scrape. It detects automated traffic aggressively and serves CAPTCHAs or blocks your IP within a few requests.
```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import quote_plus

API_KEY = "YOUR_SCRAPERAPI_KEY"

def scrape_google(query, num_results=10):
    url = f"https://www.google.com/search?q={quote_plus(query)}&num={num_results}"
    response = requests.get(
        "http://api.scraperapi.com",
        params={"api_key": API_KEY, "url": url}
    )
    soup = BeautifulSoup(response.text, "html.parser")

    results = []
    # Note: Google's class names (tF2Cxc, VwiC3b) change periodically;
    # verify them against the live markup if you get empty results
    for g in soup.select("div.tF2Cxc"):
        title = g.select_one("h3")
        link = g.select_one("a")
        snippet = g.select_one(".VwiC3b")
        if title and link:
            results.append({
                "title": title.text,
                "url": link["href"],
                "snippet": snippet.text if snippet else ""
            })
    return results

results = scrape_google("best python web frameworks 2026")
for r in results:
    print(f"{r['title']}\n  {r['url']}\n")
No CAPTCHA solving, no proxy rotation code — ScraperAPI handles it all.
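A side note on pagination: Google exposes a `start` offset, so you can precompute one URL per results page and send each through the same ScraperAPI call. A minimal sketch (the helper name is mine):

```python
from urllib.parse import quote_plus


def google_page_urls(query: str, pages: int = 3, per_page: int = 10) -> list:
    """Build one Google search URL per results page via the start offset."""
    q = quote_plus(query)
    return [
        f"https://www.google.com/search?q={q}&num={per_page}&start={page * per_page}"
        for page in range(pages)
    ]
```

Each URL then goes into the `url` parameter exactly like the single-page request; just remember that every page costs its own API credit.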
### Example 2: Scraping Amazon Product Data

Amazon's anti-bot system is notoriously aggressive. Here we parse the HTML ourselves with BeautifulSoup; for supported sites, ScraperAPI also offers structured data endpoints (covered below) that return clean JSON instead of raw HTML.
```python
def scrape_amazon_product(asin):
    url = f"https://www.amazon.com/dp/{asin}"
    response = requests.get(
        "http://api.scraperapi.com",
        params={
            "api_key": API_KEY,
            "url": url,
            "render": "true"  # Enable JS rendering
        }
    )
    soup = BeautifulSoup(response.text, "html.parser")

    title = soup.select_one("#productTitle")
    price = soup.select_one(".a-price .a-offscreen")
    rating = soup.select_one("#acrPopover")

    return {
        "title": title.text.strip() if title else None,
        "price": price.text.strip() if price else None,
        "rating": rating.get("title", "").strip() if rating else None,
    }

product = scrape_amazon_product("B0CHXKQ59Q")
print(product)
```
The render=true parameter tells ScraperAPI to use a headless browser, which is essential for Amazon's JavaScript-heavy pages.
### Example 3: Scraping LinkedIn Job Listings
LinkedIn blocks scrapers within seconds. With ScraperAPI, you can extract public job listings without getting your IP blacklisted:
```python
from urllib.parse import quote_plus

def scrape_linkedin_jobs(keywords, location="United States"):
    url = (
        f"https://www.linkedin.com/jobs/search/"
        f"?keywords={quote_plus(keywords)}&location={quote_plus(location)}"
    )
    response = requests.get(
        "http://api.scraperapi.com",
        params={
            "api_key": API_KEY,
            "url": url,
            "render": "true",
            "country_code": "us"
        }
    )
    soup = BeautifulSoup(response.text, "html.parser")

    jobs = []
    for card in soup.select(".base-card"):
        title = card.select_one(".base-search-card__title")
        company = card.select_one(".base-search-card__subtitle")
        job_location = card.select_one(".job-search-card__location")
        if title:
            jobs.append({
                "title": title.text.strip(),
                "company": company.text.strip() if company else "",
                "location": job_location.text.strip() if job_location else ""
            })
    return jobs

jobs = scrape_linkedin_jobs("python developer")
for job in jobs[:5]:
    print(f"{job['title']} at {job['company']} — {job['location']}")
```
The country_code parameter routes your request through a US-based residential proxy, which is key for getting accurate LinkedIn results.
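The flip side of rotation: sometimes you want the *same* IP across a short sequence of requests, for example when walking several pages of one search. ScraperAPI supports this via a `session_number` parameter; here is a sketch of the request params (the helper name is mine; check the docs for session lifetime limits):

```python
def sticky_session_params(api_key: str, url: str, session: int) -> dict:
    """Request params that reuse one proxy IP: every request sharing the
    same session_number is routed through the same exit IP."""
    return {
        "api_key": api_key,
        "url": url,
        "session_number": str(session),
        "country_code": "us",
    }


# requests.get("http://api.scraperapi.com",
#              params=sticky_session_params(API_KEY, page_url, 42))
```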
## Advanced Features

### Async Mode for High-Volume Scraping
For large-scale jobs (thousands of URLs), ScraperAPI offers an async mode. You submit URLs in bulk and poll for results:
```python
import time
import requests

def async_scrape(urls):
    # Submit one job per URL
    jobs = []
    for url in urls:
        resp = requests.post(
            "https://async.scraperapi.com/jobs",
            json={"apiKey": API_KEY, "url": url}
        )
        jobs.append(resp.json())

    # Poll each job until it finishes
    results = []
    for job in jobs:
        while True:
            status = requests.get(
                f"https://async.scraperapi.com/jobs/{job['id']}"
            ).json()
            if status["status"] == "finished":
                results.append(status["response"])
                break
            time.sleep(2)
    return results
```
Async mode is more cost-efficient for large batches and avoids timeout issues on slow-loading pages.
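One caveat with the loop above: it polls forever if a job gets stuck or fails. Here is a more defensive poller, with the status fetch injected as a callable so the logic is easy to test (function names are mine; the status values mirror the ones used above):

```python
import time


def poll_job(job_id, fetch_status, delay=2.0, max_attempts=30):
    """Poll until the job is 'finished' or 'failed', or give up.

    fetch_status(job_id) must return a dict such as
    {"status": "running"} or {"status": "finished", "response": ...}.
    """
    for _ in range(max_attempts):
        status = fetch_status(job_id)
        if status["status"] == "finished":
            return status["response"]
        if status["status"] == "failed":
            return None  # job errored out; caller decides whether to retry
        time.sleep(delay)
    raise TimeoutError(f"job {job_id} still not done after {max_attempts} polls")
```

In production, `fetch_status` would simply wrap the status endpoint, e.g. `lambda jid: requests.get(f"https://async.scraperapi.com/jobs/{jid}").json()`.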
### Structured Data Endpoints
ScraperAPI provides dedicated endpoints for popular sites that return clean JSON — no parsing required:
```python
# Google Search structured data
resp = requests.get(
    "https://api.scraperapi.com/structured/google/search",
    params={"api_key": API_KEY, "query": "web scraping tools"}
)
data = resp.json()  # Clean JSON with titles, URLs, snippets

# Amazon product structured data
resp = requests.get(
    "https://api.scraperapi.com/structured/amazon/product",
    params={"api_key": API_KEY, "asin": "B0CHXKQ59Q"}
)
product = resp.json()  # Price, rating, reviews — all parsed
```
## Cost Comparison
| Approach | Monthly Cost | Setup Time | Maintenance |
|---|---|---|---|
| DIY proxies + CAPTCHA solving | $200-500+ | Days | Constant |
| Headless browser farm | $100-300+ | Hours | Weekly |
| ScraperAPI | $0 (free tier) to $49+ | Minutes | None |
The free tier gives you 5,000 requests/month. The Hobby plan at $49/month gives you 100,000 — enough for most side projects and MVPs.
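When budgeting, remember that credits and requests aren't one-to-one: feature-heavy requests (JavaScript rendering, premium proxies) are billed at a multiple of the base rate; the exact multipliers are on ScraperAPI's pricing page, and the 10 used below is an assumption for illustration. A quick back-of-the-envelope check:

```python
def monthly_credits(requests_per_day: int, credits_per_request: int = 1) -> int:
    """Rough monthly credit usage, assuming a 30-day month."""
    return requests_per_day * credits_per_request * 30


# 3 rendered requests/day at an assumed 10 credits each:
print(monthly_credits(3, credits_per_request=10))  # 900, inside the free tier
```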
## Real Project: Product Price Monitor
Let's put it all together. Here's a complete price monitoring script that tracks Amazon products and alerts you when prices drop:
```python
import requests
from bs4 import BeautifulSoup
import json
from datetime import datetime

API_KEY = "YOUR_SCRAPERAPI_KEY"
PRICE_FILE = "prices.json"

PRODUCTS = [
    {"name": "Wireless Headphones", "asin": "B0CHXKQ59Q", "target": 50.00},
    {"name": "Mechanical Keyboard", "asin": "B09HKF3MHB", "target": 80.00},
    {"name": "USB-C Hub", "asin": "B087QTVPHT", "target": 25.00},
]

def get_price(asin):
    resp = requests.get(
        "http://api.scraperapi.com",
        params={
            "api_key": API_KEY,
            "url": f"https://www.amazon.com/dp/{asin}",
            "render": "true"
        },
        timeout=60
    )
    soup = BeautifulSoup(resp.text, "html.parser")
    price_el = soup.select_one(".a-price .a-offscreen")
    if price_el:
        return float(price_el.text.replace("$", "").replace(",", ""))
    return None

def load_history():
    try:
        with open(PRICE_FILE) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

def save_history(history):
    with open(PRICE_FILE, "w") as f:
        json.dump(history, f, indent=2)

def check_prices():
    history = load_history()
    timestamp = datetime.now().isoformat()
    alerts = []

    for product in PRODUCTS:
        price = get_price(product["asin"])
        if price is None:
            print(f"Could not get price for {product['name']}")
            continue

        print(f"{product['name']}: ${price:.2f} (target: ${product['target']:.2f})")

        # Save to history
        key = product["asin"]
        if key not in history:
            history[key] = []
        history[key].append({"price": price, "date": timestamp})

        # Check if below target
        if price <= product["target"]:
            alerts.append(f"{product['name']}: ${price:.2f} (target: ${product['target']:.2f})")

    save_history(history)

    if alerts:
        print("\nPRICE ALERTS:\n" + "\n".join(f"  {a}" for a in alerts))
    return alerts

if __name__ == "__main__":
    print("Checking prices...")
    check_prices()
```
Run this on a cron job (daily or hourly) and you have a fully automated price tracker. Each check makes only three API requests, though note that requests with JavaScript rendering are billed at more than one credit each; daily monitoring still fits comfortably within the free tier.
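If you'd rather get an email than read cron logs, you can bolt a notifier onto `check_prices()`: a sketch using the standard library's smtplib (the SMTP host, port, and addresses below are placeholders you'd fill in yourself):

```python
import smtplib
from email.mime.text import MIMEText


def build_alert_email(alerts, sender, recipient):
    """Wrap the alert lines from check_prices() in a plain-text message."""
    msg = MIMEText("\n".join(alerts))
    msg["Subject"] = f"Price alert: {len(alerts)} product(s) below target"
    msg["From"] = sender
    msg["To"] = recipient
    return msg


def send_alerts(alerts, sender, recipient,
                host="smtp.example.com", port=587, user=None, password=None):
    """Email the alerts; does nothing when the list is empty."""
    if not alerts:
        return
    msg = build_alert_email(alerts, sender, recipient)
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        if user:
            server.login(user, password)
        server.send_message(msg)
```

The `__main__` block then becomes `send_alerts(check_prices(), "me@example.com", "me@example.com")`.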
## Conclusion
Web scraping doesn't have to be an arms race against anti-bot systems. ScraperAPI abstracts away the hardest parts — proxy management, CAPTCHA solving, JavaScript rendering — so you can focus on what you actually want: the data.
In this tutorial, we built scrapers for Google, Amazon, and LinkedIn, used advanced features like async mode and structured data endpoints, and put together a complete price monitoring system. All of it runs on ScraperAPI's free tier.
Ready to start? Get 5,000 free API requests at ScraperAPI — no credit card required. That's enough to build and test most scraping projects.