TL;DR: Amazon's Buy Box data requires JavaScript rendering—standard HTTP requests won't work. DIY Playwright scrapers fail at scale (35–55% success rate in 2026 conditions). Pangolinfo Scrape API is the production path: >95% success rate, structured JSON output, 5–15 min freshness. Complete working code below.
Why This Is Harder Than It Looks
Before writing any code, there's an architectural reality to understand: Amazon's Buy Box data isn't in the HTML.
The #buybox DOM section on Amazon product pages loads via async JavaScript—roughly 800ms–2s after the initial page shell. Any approach using requests, httpx, or urllib will retrieve an empty container. The seller name, price, and fulfillment type fields simply aren't present in the static response.
That's the baseline problem. The harder challenge is Amazon's anti-scraping stack:
- TLS fingerprint detection (JA3 hash analysis) — identifies non-browser client signatures
- Behavioral analysis — flags request patterns that deviate from human browsing rhythms
- IP block lists — now covering most commercial datacenter ASNs and many residential proxy providers
- CAPTCHA gates — triggered on high-frequency access patterns
Real-world Playwright success rates against Amazon product pages in 2026: 35–55% without advanced anti-detection. That's a serious problem if you're building a repricing tool that needs reliable data.
The Data Schema You Actually Need
Not all Buy Box fields are equally valuable. Here's the priority breakdown:
{
"buy_box": {
"seller_id": "A3ABC123DEF456", // Critical: identity matching
"seller_name": "BrandX Official",
"price": 29.99, // Critical: repricing baseline
"shipping": 0.00,
"total_price": 29.99,
"fulfillment_type": "FBA", // Critical: determines response strategy
"is_prime": true,
"availability": "in_stock", // Important: stockout detection
"condition": "New",
"seller_rating": 4.8
},
"other_sellers": [
{
"seller_id": "A7XYZ987",
"price": 31.49,
"fulfillment_type": "FBM",
"is_prime": false
}
]
}
The FBA vs. FBM distinction is non-negotiable. Amazon's Buy Box algorithm weights fulfillment reliability—an FBM seller at your price is a different competitive threat than an FBA seller at your price. Ignoring this field leads to repricing decisions that are technically data-driven but practically wrong.
Option 1: DIY Playwright (With Caveats)
Use this only for < 5,000 daily requests. Not recommended for production repricing systems.
from playwright.async_api import async_playwright
import asyncio
import json
async def scrape_buybox_diy(asin: str) -> dict:
"""
DIY Buy Box scraper using Playwright.
WARNING:
- Success rate ~55-75% in 2026 conditions
- Requires residential proxy for sustained use
- Parsing selectors may break on Amazon A/B tests
- Not recommended for > 5,000 daily requests
"""
async with async_playwright() as p:
browser = await p.chromium.launch(
headless=True,
args=[
'--disable-blink-features=AutomationControlled',
'--disable-dev-shm-usage',
]
)
context = await browser.new_context(
proxy={"server": "http://residential_proxy:port"},
user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
viewport={"width": 1920, "height": 1080}
)
page = await context.new_page()
try:
await page.goto(
f"https://www.amazon.com/dp/{asin}",
wait_until="networkidle",
timeout=30000
)
# Wait for Buy Box to load
await page.wait_for_selector("#buybox", timeout=15000)
# WARNING: These selectors break when Amazon runs A/B tests
# You'll need to maintain them manually
price_el = await page.query_selector(
"#corePriceDisplay_desktop_feature_div .a-price-whole"
)
price = await price_el.text_content() if price_el else None
seller_el = await page.query_selector("#sellerProfileTriggerId")
seller = await seller_el.text_content() if seller_el else "Amazon"
return {
"asin": asin,
"price": price,
"seller": seller,
# fulfillment_type NOT reliably extractable without complex logic
}
except Exception as e:
print(f"Failed: {e}")
return {}
finally:
await browser.close()
# Run it
result = asyncio.run(scrape_buybox_diy("B0CXXX1234"))
print(result)
Known failure modes:
- Amazon frequently A/B tests different page layouts, breaking CSS selectors
- Headless detection gets you blocked within 50–200 requests on many IPs
- Buy Box FBA/FBM status requires complex multi-element parsing that breaks independently
Option 2: Pangolinfo Scrape API (Recommended)
Full production implementation with error handling, retry logic, and structured output:
import requests
import asyncio
import aiohttp
from typing import Optional, List
from dataclasses import dataclass, field
from datetime import datetime
import logging
logger = logging.getLogger(__name__)
@dataclass
class BuyBoxWinner:
seller_id: str
seller_name: str
price: float
shipping: float
total_price: float
fulfillment: str # "FBA" | "FBM"
is_prime: bool
availability: str # "in_stock" | "out_of_stock" | "limited"
seller_rating: float
scraped_at: datetime
@dataclass
class BuyBoxSnapshot:
asin: str
marketplace: str
winner: BuyBoxWinner
competing_sellers: List[dict] = field(default_factory=list)
class PangolinBuyBoxClient:
"""
Production Amazon Buy Box data scraping client.
Uses Pangolinfo Scrape API for reliable, structured data extraction.
Features:
- >95% success rate across all Amazon marketplaces
- Structured JSON output with complete Buy Box field schema
- Async batch submission for high-volume monitoring
- Built-in retry handling
"""
BASE_URL = "https://api.pangolinfo.com/v1/scrape"
def __init__(self, api_key: str):
self.api_key = api_key
self.headers = {
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json"
}
def fetch_sync(self, asin: str, marketplace: str = "US") -> Optional[BuyBoxSnapshot]:
"""Synchronous single ASIN fetch."""
response = requests.post(
self.BASE_URL,
headers=self.headers,
json=self._build_payload(asin, marketplace),
timeout=30
)
response.raise_for_status()
return self._parse_response(asin, marketplace, response.json())
async def fetch_async(
self,
asin: str,
marketplace: str = "US",
session: aiohttp.ClientSession = None
) -> Optional[BuyBoxSnapshot]:
"""Async single ASIN fetch for concurrent monitoring."""
async with session.post(
self.BASE_URL,
headers=self.headers,
json=self._build_payload(asin, marketplace)
) as resp:
data = await resp.json()
return self._parse_response(asin, marketplace, data)
async def fetch_batch_async(
self,
asin_list: List[str],
marketplace: str = "US",
concurrency: int = 20
) -> List[Optional[BuyBoxSnapshot]]:
"""
Concurrent batch fetch with configurable concurrency limit.
Use for monitoring hundreds of ASINs in a single cycle.
"""
semaphore = asyncio.Semaphore(concurrency)
async def fetch_with_limit(asin: str, session: aiohttp.ClientSession):
async with semaphore:
try:
return await self.fetch_async(asin, marketplace, session)
except Exception as e:
logger.error(f"Failed to fetch ASIN {asin}: {e}")
return None
async with aiohttp.ClientSession() as session:
tasks = [fetch_with_limit(asin, session) for asin in asin_list]
return await asyncio.gather(*tasks)
def _build_payload(self, asin: str, marketplace: str) -> dict:
return {
"url": f"https://www.amazon.com/dp/{asin}",
"marketplace": marketplace,
"parse_type": "product_detail",
"include_buybox": True,
"include_offers": True
}
def _parse_response(
self,
asin: str,
marketplace: str,
data: dict
) -> Optional[BuyBoxSnapshot]:
bb = data.get("buy_box")
if not bb:
return None
return BuyBoxSnapshot(
asin=asin,
marketplace=marketplace,
winner=BuyBoxWinner(
seller_id=bb["seller_id"],
seller_name=bb["seller_name"],
price=float(bb["price"]),
shipping=float(bb.get("shipping", 0)),
total_price=float(bb["total_price"]),
fulfillment=bb["fulfillment_type"],
is_prime=bool(bb["is_prime"]),
availability=bb["availability"],
seller_rating=float(bb.get("seller_rating", 0)),
scraped_at=datetime.fromisoformat(
data["scraped_at"].replace("Z", "+00:00")
)
),
competing_sellers=data.get("other_sellers", [])
)
# Usage example: monitor a list of ASINs
async def monitor_buybox_batch():
client = PangolinBuyBoxClient(api_key="your_pangolinfo_api_key")
watch_list = [
"B0CXXX1234",
"B0CYYY5678",
"B0CZZZ9012",
]
snapshots = await client.fetch_batch_async(watch_list, marketplace="US", concurrency=10)
for snapshot in snapshots:
if snapshot:
w = snapshot.winner
print(f"{snapshot.asin}: {w.seller_name} | ${w.price} | {w.fulfillment} | {w.availability}")
asyncio.run(monitor_buybox_batch())
Repricing Decision Logic
from enum import Enum
from typing import Optional
class Action(Enum):
HOLD = "hold"
RAISE = "raise"
MATCH = "match"
UNDERCUT = "undercut"
WAIT = "wait"
def decide_repricing(
snapshot: BuyBoxSnapshot,
my_seller_id: str,
my_current_price: float,
price_floor: float,
price_ceiling: float
) -> tuple[Action, Optional[float], str]:
"""
Three-level Buy Box repricing decision engine.
Returns: (action, target_price, reason)
"""
winner = snapshot.winner
# Level 1: Do we own the Buy Box?
if winner.seller_id == my_seller_id:
if my_current_price < price_ceiling * 0.95:
target = min(my_current_price * 1.02, price_ceiling)
return Action.RAISE, target, "We own Buy Box — testing price increase"
return Action.HOLD, None, "We own Buy Box at ceiling — holding"
# Level 2: Is competitor out of stock?
if winner.availability == "out_of_stock":
return Action.WAIT, None, "Competitor OOS — waiting for natural Buy Box recovery"
# Level 3: FBM competitor — FBA advantage may flip Box at price parity
if winner.fulfillment == "FBM":
target = winner.total_price
if target >= price_floor:
return Action.MATCH, target, f"FBM competitor — matching ${target} (FBA algo advantage)"
return Action.HOLD, None, f"FBM competitor below floor ${price_floor} — holding"
# Level 4: FBA competitor — minimal undercut
target = round(winner.total_price - 0.01, 2)
if target < price_floor:
return Action.HOLD, None, f"FBA competitor ${winner.total_price} below floor — no reprice"
return Action.UNDERCUT, target, f"FBA competitor — undercutting to ${target}"
Performance Numbers (2026 Production Data)
| Metric | DIY Playwright | Pangolinfo Scrape API |
|---|---|---|
| Success rate | 35–75% | >95% |
| Avg latency | 8–45s | 3–12s |
| Parse maintenance | 5–15 hrs/month | 0 hrs |
| Multi-marketplace | Manual per site |
marketplace param |
| Batch support | Build yourself | Native async API |
GitHub Repo & Docs
- Pangolinfo Scrape API Documentation
- Amazon Scraper Skill for MCP/Claude integration
- Pangolinfo Scrape API
Questions about the repricing logic or batch architecture? Drop them in the comments.
Top comments (0)