Amazon Scraper API Benchmark: 12M Requests Across 4 Platforms — What the Data Actually Shows
TL;DR
- Self-built scrapers: 71.4% product page success rate at scale, 60% of engineering time on anti-bot maintenance
- Competitor A: 89.1% product pages, 81.2% SP ad slots, 8,900ms P99 — workable but with real blind spots
- Pangolinfo: 98.6% product pages, 97.3% SP ad slots, 3,890ms P99, full Customer Says extraction
- Cost delta: ~¥33,500/month savings at 100K pages/day vs self-built
- Key limitation: Non-Amazon platform (Walmart/Shopee) maturity gaps, English docs update faster than Chinese
Full benchmark methodology and data below. Code examples included.
Why I Ran This Test
Our team had been running a self-built Amazon scraping infrastructure for about eight months when the maintenance burden became impossible to ignore. Not because anything was catastrophically broken — but because the engineering economics had quietly inverted.
Amazon's anti-bot infrastructure in 2025 is not the problem it was in 2022. Behavioral fingerprinting, CAPTCHA rotation, JavaScript rendering detection, headless browser identification — each layer requires dedicated engineering to counter, and Amazon ships updates to these systems constantly. We were spending 60% of engineering hours on anti-scraping countermeasures. The remaining 40% went to actual business logic. Forrester Research's 2024 benchmark puts average self-built scraper maintenance at 40–60 hours per month; we were running hot relative to that baseline.
The decision to evaluate commercial Amazon scraper API alternatives was ultimately an engineering economics decision, not a technical one.
Test Setup
Three systems ran simultaneously across the full 60-day period: Pangolinfo Scrape API, Competitor A (commercial reasons preclude naming), and our self-built infrastructure.
All requests were production traffic — real business data needs, not synthetic benchmark loads. Coverage:
- Amazon product detail pages (ASIN-level)
- Search results pages (keyword + category)
- BSR list pages (Best Sellers, New Releases, Movers & Shakers)
- Sponsored Products ad slot data
- Review pages including Customer Says AI summary module
- Platforms: Amazon US, UK, Japan; Walmart US
Primary metrics: collection success rate, P50–P99 latency, JSON output completeness, SP ad slot capture rate, Customer Says field integrity.
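For readers who want to reproduce the rollup, here's a minimal sketch of how per-request logs turn into the success-rate and percentile numbers reported below. The record shape (`ok`, `latency_ms`) is illustrative, not the actual log schema we used.

```python
import statistics

def summarize(request_log: list[dict]) -> dict:
    """Roll per-request records up into success rate and latency percentiles.

    Record shape is illustrative: {"ok": bool, "latency_ms": float}.
    """
    successes = [r for r in request_log if r["ok"]]
    latencies = sorted(r["latency_ms"] for r in successes)
    # statistics.quantiles with n=100 yields 99 cut points: index 49 = P50, 98 = P99
    q = statistics.quantiles(latencies, n=100, method="inclusive")
    return {
        "success_rate_pct": round(100 * len(successes) / len(request_log), 1),
        "p50_ms": q[49],
        "p99_ms": q[98],
    }

demo = [{"ok": True, "latency_ms": 800 + i} for i in range(99)]
demo.append({"ok": False, "latency_ms": 0})
print(summarize(demo))
```

Latency is computed over successful requests only; failures show up in the success rate, not the percentiles.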
Results
Success Rates
60-day averages:
results = {
    "product_detail_page": {
        "pangolinfo": 98.6,
        "competitor_a": 89.1,
        "self_built": 71.4,  # drops lower during CAPTCHA events
    },
    "search_results_page": {
        "pangolinfo": 97.2,
        "competitor_a": 84.3,
        "self_built": 62.8,
    },
    "review_pages": {
        "pangolinfo": 96.8,
        "competitor_a": 79.6,
        "self_built": 55.1,
    },
    "sp_ad_slots": {
        "pangolinfo": 97.3,
        "competitor_a": 81.2,
        "self_built": 38.4,  # multiple CAPTCHA-triggered outages in this period
    },
    "bsr_list_pages": {
        "pangolinfo": 98.1,
        "competitor_a": 86.7,
        "self_built": 65.2,
    },
}
The self-built ad slot number (38.4%) includes several CAPTCHA-triggered outage windows; the stable-period average was approximately 55%. The distinction matters, but outage events are operational reality, not edge cases, so the 60-day average is the more honest metric.
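A back-of-envelope check shows how much downtime that gap implies. Assuming outage windows contribute roughly 0% success (a simplification), the two averages reconcile like this:

```python
# How many effective outage days reconcile the stable-period and 60-day averages?
# Assumes outage windows contribute ~0% success (a simplification).
stable_rate = 55.0   # % success outside outage windows
overall_rate = 38.4  # % success across the full 60 days
total_days = 60

stable_days = total_days * overall_rate / stable_rate
outage_days = total_days - stable_days
print(f"~{outage_days:.0f} of {total_days} days effectively lost to CAPTCHA outages")
```

Roughly 18 of 60 days lost to outages, under this simplified model, which is why quoting only the stable-period number would flatter the self-built pipeline.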
Response Latency (Amazon Product Detail Pages, ms)
latency_ms = {
    "P50": {"pangolinfo": 890, "competitor_a": 1450},
    "P75": {"pangolinfo": 1240, "competitor_a": 2100},
    "P90": {"pangolinfo": 1780, "competitor_a": 3200},
    "P95": {"pangolinfo": 2340, "competitor_a": 4800},
    "P99": {"pangolinfo": 3890, "competitor_a": 8900},
}
P50 gap (890ms vs 1450ms) is real but not operationally decisive for most workflows. P99 gap (3,890ms vs 8,900ms) is the constraining number — it defines what SLA you can responsibly promise for time-sensitive workflows. Pangolinfo's ceiling is less than half Competitor A's.
SP Ad Slot Capture — The Key Differentiator
100K dedicated SP ad slot requests. Results:
- Pangolinfo: 97.3% (official claim 98%; 0.7pp gap, within expected variance)
- Competitor A: 81.2%
- Difference: 16.1 percentage points
Practical translation: monitoring 500 keywords daily, Competitor A's feed is missing roughly 94 keyword ad slot positions every day, versus about 14 for Pangolinfo (a gap of roughly 80 keywords). Whether those happen to be the keywords where your competitor is making a strategic push is unknowable — which is the point.
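The arithmetic behind that translation, for anyone who wants to plug in their own keyword count:

```python
keywords = 500
capture_rate = {"pangolinfo": 0.973, "competitor_a": 0.812}

# Expected daily misses at a given monitoring volume
missed = {name: keywords * (1 - rate) for name, rate in capture_rate.items()}
gap = missed["competitor_a"] - missed["pangolinfo"]

for name, n in missed.items():
    print(f"{name}: ~{n:.0f} of {keywords} keyword ad slots missed daily")
print(f"daily blind-spot gap: ~{gap:.0f} keywords")
```

At 500 keywords that's ~94 daily misses for Competitor A versus ~14 for Pangolinfo; the gap scales linearly with monitoring volume.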
Deep Dive: Customer Says Extraction

Amazon's Customer Says module is an AI-generated review summary that condenses product feedback into structured positive/negative highlights. It's high-information-density data for competitive product positioning. It's also technically difficult to extract reliably.
Technical challenge layers:
- Layer 1 (static HTML): Most scrapers handle this fine
- Layer 2 (JS-rendered content): Requires full browser rendering
- Layer 3 (Customer Says — dynamic conditional loading):
  - Different load triggers per ASIN/category characteristics
  - Amazon-specific protection for this module
  - Structure varies by content type
Pangolinfo Reviews Scraper API: Layer 3 ✓ (complete, stable)
Competitor A: Layer 2 only (Customer Says score: 5/10)
Self-built: Layer 1–2 (Layer 3: occasional partial returns, unreliable)
Working example with Pangolinfo Reviews Scraper API:
import requests
from typing import Optional

def fetch_reviews_with_customer_says(
    asin: str,
    api_key: str,
    marketplace: str = "US",
    star_filter: Optional[list] = None,
    count: int = 20
) -> dict:
    """
    Fetch reviews + Customer Says summary via Pangolinfo Reviews Scraper API.
    Customer Says = Amazon's AI-generated review summary module.
    Competitor A scored 5/10 on this capability in independent testing.
    """
    payload = {
        "asin": asin,
        "marketplace": marketplace,
        "sort": "recent",
        "count": count,
        "include_customer_says": True,  # Key parameter for the AI summary
        "fields": [
            "rating",
            "review_text",
            "review_title",
            "verified_purchase",
            "date",
            "helpful_votes"
        ]
    }
    if star_filter:
        payload["star_filter"] = star_filter  # e.g., [1, 2] for negative reviews only

    response = requests.post(
        "https://api.pangolinfo.com/v1/amazon/reviews",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json=payload,
        timeout=30
    )
    response.raise_for_status()
    data = response.json()

    return {
        # AI-generated summary — the high-value field
        "customer_says": {
            "positive": data.get("customer_says", {}).get("positive", ""),
            "negative": data.get("customer_says", {}).get("negative", ""),
        },
        "reviews": data.get("reviews", []),
        "total_count": data.get("total_review_count"),
        "average_rating": data.get("average_rating")
    }

# Usage: fetch negative reviews + Customer Says for competitor ASIN
result = fetch_reviews_with_customer_says(
    asin="B07EXAMPLE1",
    api_key="YOUR_PANGOLINFO_KEY",
    star_filter=[1, 2],  # Only 1-2 star reviews
    count=50
)

print("Customer Says Positive:", result["customer_says"]["positive"])
# Example: "Customers appreciate the durable build quality and simple setup process."
print("Customer Says Negative:", result["customer_says"]["negative"])
# Example: "Some customers report size inconsistencies and slow customer support response."
# These two sentences compress dozens of reviews into actionable competitive intelligence
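In production you'll also want retries around the HTTP call. This is a generic backoff wrapper, not documented Pangolinfo behavior — the retryable status set, attempt count, and delays are all assumptions you should tune:

```python
import time
import requests

RETRYABLE = {429, 500, 502, 503, 504}

def post_with_retries(url: str, *, json: dict, headers: dict,
                      max_attempts: int = 4, base_delay: float = 1.0):
    """POST with exponential backoff on transient failures.

    Generic client-side hygiene; retry counts and delays here are
    assumptions, not documented Pangolinfo API behavior.
    """
    for attempt in range(max_attempts):
        try:
            resp = requests.post(url, json=json, headers=headers, timeout=30)
        except requests.RequestException:
            resp = None  # network error or timeout: retry
        if resp is not None and resp.status_code not in RETRYABLE:
            resp.raise_for_status()  # non-retryable errors (401, 404) surface immediately
            return resp
        if attempt < max_attempts - 1:
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError(f"gave up after {max_attempts} attempts: {url}")
```

Swap this in for the bare `requests.post` call above when you move from testing to a scheduled pipeline.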
ZIP-Level Collection: Regional Pricing Intelligence
Amazon's Prime delivery zone pricing creates regional variation that cross-border sellers routinely underestimate. Same ASIN, different ZIP codes, potentially different prices, different Prime eligibility states, different delivery estimate messaging.
import requests

def compare_regional_pricing(
    asin: str,
    zip_codes: list,
    api_key: str
) -> dict:
    """
    Simulate what shoppers in different regions see for the same product.
    Pangolinfo supports ZIP-level differentiated collection.
    Competitor A: 5/10 support. Self-built: essentially impossible at scale.
    """
    results = {}
    for zip_code in zip_codes:
        response = requests.post(
            "https://api.pangolinfo.com/v1/amazon/product",
            headers={"Authorization": f"Bearer {api_key}"},
            json={
                "asin": asin,
                "marketplace": "US",
                "zip_code": zip_code,
                "fields": ["price", "prime_eligibility", "delivery_estimate", "availability"]
            },
            timeout=30
        )
        response.raise_for_status()
        data = response.json()
        results[zip_code] = {
            "price": data.get("price", {}).get("current"),
            "prime": data.get("prime_eligibility"),
            "delivery": data.get("delivery_estimate"),
            "in_stock": data.get("availability") == "In Stock"
        }
    return results

# Compare New York (10001) vs Los Angeles (90001) vs Chicago (60601)
regional_data = compare_regional_pricing(
    asin="B07EXAMPLE1",
    zip_codes=["10001", "90001", "60601"],
    api_key="YOUR_KEY"
)

for zip_code, data in regional_data.items():
    print(f"ZIP {zip_code}: ${data['price']} | Prime: {data['prime']} | {data['delivery']}")
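Once you have per-ZIP data, the useful signal is usually the spread, not the individual prices. A small helper can flag ASINs worth investigating; the sample data and the $1.00 alert threshold here are invented for illustration:

```python
def price_spread(regional: dict) -> float:
    """Largest price difference across ZIPs, skipping missing/out-of-stock entries."""
    prices = [d["price"] for d in regional.values() if d.get("price") is not None]
    return max(prices) - min(prices) if len(prices) >= 2 else 0.0

# Hypothetical output shape from a per-ZIP comparison (numbers invented)
sample = {
    "10001": {"price": 24.99, "prime": True},
    "90001": {"price": 26.49, "prime": True},
    "60601": {"price": 24.99, "prime": False},
}
spread = price_spread(sample)
if spread > 1.0:  # alert threshold is arbitrary; tune per category
    print(f"regional spread ${spread:.2f} -- worth a closer look")
```

Skipping `None` prices matters: out-of-stock ZIPs would otherwise crash the comparison or skew it.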
Cost Model
At 100K pages/day:
monthly_costs = {
    "self_built": {
        "servers": 8_000,           # RMB
        "proxy_ips": 12_000,
        "engineering_hrs": 18_000,  # 60h × ¥300/hr
        "emergency_fixes": 4_000,
        "total": 42_000
    },
    "pangolinfo": {
        "api_fee": 8_500,           # estimated from pricing tiers
        "engineering_ongoing": 0,   # near-zero after initial integration
        "total": 8_500
    },
    "monthly_savings": 33_500
}
# 3-year TCO differential: ~¥1.2M
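One more derived number worth checking: where the linear model breaks even. Self-built cost is mostly fixed, while the API fee is assumed here to scale linearly with volume — real pricing is tiered, so treat this as an order-of-magnitude sanity check, not a quote:

```python
# Rough break-even under a linear pricing assumption (real tiers will differ)
self_built_monthly_rmb = 42_000       # largely volume-insensitive
api_fee_per_100k_pages_day = 8_500    # RMB/month at 100K pages/day

break_even_pages_per_day = 100_000 * self_built_monthly_rmb / api_fee_per_100k_pages_day
print(f"linear-model break-even ≈ {break_even_pages_per_day:,.0f} pages/day")
```

Under these assumptions the API stays cheaper until roughly 500K pages/day, which is well above our 100K/day workload.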
Honest Limitations
Documentation parity: English API Reference updates faster than Chinese. Chinese-language technical teams should monitor both versions.
Non-Amazon platform maturity: Walmart and Shopee parsing templates have measurable gaps vs Amazon coverage. Specific field availability on Walmart varies by SKU type.
Peak concurrency: At 8M+ pages/day in pressure testing, ~3.2% request queue delays appeared. Imperceptible for normal business scenarios; needs pre-negotiation if you're building financial-grade real-time data infrastructure.
Resources
- Pangolinfo Amazon Scraper API — free trial available
- Reviews Scraper API — Customer Says support
- AMZ Data Tracker — no-code monitoring
- API Documentation
- Console (trial)
Have you run your own Amazon scraper API comparisons? I'm particularly curious about Walmart-focused use cases where the Pangolinfo maturity gap I noted might be more or less pronounced than I found. Drop your experience in the comments.
