TL;DR
- Sync API: ~5s per call, blocks until result returns. Use for real-time, user-triggered queries.
- Async API: submits task in <200ms, result delivered via callback. Use for batch/scheduled work.
- Same credit cost both ways (1 credit per JSON call).
- Pangolinfo Scrape API supports both modes under the same token.
The Problem with Defaulting to Sync
Amazon Scrape API calls take around 5 seconds per response—realistic given what the server has to do (load Amazon pages, bypass anti-bot systems, parse DOM, return structured JSON). For a single query, that's totally fine.
For a monitoring system tracking 5,000 ASINs daily? That's 6.9 hours of serial waiting. Add network variance and you're looking at 8–10 hours. Most business use cases can't absorb that delay.
Async Amazon data scraping breaks the dependency between request throughput and client-side wait time. You submit tasks in bulk, the platform processes them concurrently server-side, and results arrive at your callback endpoint. Your client never blocks.
API Endpoint Comparison
| | Sync | Async |
|---|---|---|
| Endpoint | POST /api/v1/scrape | POST /api/v1/scrape/async |
| Client wait | ~5 seconds | <200ms (returns taskId) |
| Result delivery | Inline response body | POST to your callbackUrl |
| Infrastructure req | None | Public callback endpoint |
| Credits (JSON) | 1/call | 1/call |
Sync Mode: Complete Example
```python
import requests


def sync_scrape(token: str, asin: str) -> dict:
    """
    Sync Amazon product detail scrape.
    Blocks ~5s. Use for real-time queries.
    Raises RuntimeError on an API-level error.
    """
    resp = requests.post(
        "https://scrapeapi.pangolinfo.com/api/v1/scrape",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json"
        },
        json={
            "url": f"https://www.amazon.com/dp/{asin}",
            "parserName": "amzProductDetail",
            "site": "",
            "content": "",
            "format": "json",
            "bizContext": {"zipcode": "10041"}
        },
        timeout=30  # Must exceed API processing time (~5s)
    )
    data = resp.json()
    if data.get("code") == 0:
        return data["data"]["json"][0]["data"]["results"][0]
    raise RuntimeError(f"API error: {data.get('message')}")


# Usage
product = sync_scrape("your_token", "B0DYTF8L2W")
print(product["title"])
print(product["star"])  # e.g. "4.5"
print(product["bestSellersRank"])
```
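Network variance means the occasional sync call will hang or drop. A minimal retry sketch with exponential backoff, written generically so it can wrap the `sync_scrape` function above (the retry count and delays are arbitrary choices, not API recommendations):

```python
import time
from typing import Callable

import requests


def with_retry(scrape_fn: Callable[[], dict], retries: int = 3,
               base_delay: float = 1.0) -> dict:
    """Call scrape_fn, retrying network failures with exponential backoff."""
    for attempt in range(retries):
        try:
            return scrape_fn()
        except requests.RequestException:
            if attempt == retries - 1:
                raise  # out of retries; let the caller see the error
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...


# Usage with the sync_scrape function above:
# product = with_retry(lambda: sync_scrape("your_token", "B0DYTF8L2W"))
```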
Available Parser Types
```python
PARSER_NAMES = {
    "amzProductDetail": "Product detail by ASIN or URL",
    "amzKeyword": "Keyword search results",
    "amzProductOfCategory": "Category product list (by Node ID)",
    "amzProductOfSeller": "Seller product list",
    "amzBestSellers": "Best sellers ranking",
    "amzNewReleases": "New releases ranking"
}
```
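A mistyped `parserName` costs a round trip before you see the error, so a local fail-fast helper can be worth it. This is a convenience sketch, not part of the API: the payload shape mirrors the sync example above, and only parser names from the table are accepted.

```python
# Parser names from the table above
PARSER_NAMES = {
    "amzProductDetail", "amzKeyword", "amzProductOfCategory",
    "amzProductOfSeller", "amzBestSellers", "amzNewReleases",
}


def build_payload(url: str, parser_name: str, zipcode: str = "10041") -> dict:
    """Build a sync request body, failing fast on an unknown parser name."""
    if parser_name not in PARSER_NAMES:
        raise ValueError(f"Unknown parser {parser_name!r}; "
                         f"choose one of {sorted(PARSER_NAMES)}")
    return {
        "url": url,
        "parserName": parser_name,
        "site": "",
        "content": "",
        "format": "json",
        "bizContext": {"zipcode": zipcode},
    }
```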
Async Mode: Complete Implementation
1. Submit Tasks
```python
import requests
import time
from typing import Optional

ASYNC_URL = "https://scrapeapi.pangolinfo.com/api/v1/scrape/async"


def async_submit(
    token: str,
    asin: str,
    callback_url: str,
    parser_name: str = "amzProductDetail",
    zipcode: str = "10041"
) -> Optional[str]:
    """Submit async scraping task. Returns task_id immediately."""
    resp = requests.post(
        ASYNC_URL,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        json={
            "url": f"https://www.amazon.com/dp/{asin}",
            "callbackUrl": callback_url,
            "zipcode": zipcode,
            "format": "json",
            "parserName": parser_name
        },
        timeout=10  # Submission itself is fast
    )
    data = resp.json()
    if data.get("code") == 0:
        return data["data"]["data"]  # task_id
    print(f"Submit failed for {asin}: {data.get('message')}")
    return None


def bulk_submit(
    token: str,
    asins: list[str],
    callback_url: str,
    rate_delay: float = 0.1  # seconds between submissions
) -> dict[str, str]:
    """Bulk submit. Returns {asin: task_id}."""
    task_map = {}
    for i, asin in enumerate(asins):
        task_id = async_submit(token, asin, callback_url)
        if task_id:
            task_map[asin] = task_id
        if (i + 1) % 500 == 0:
            print(f"Submitted {i + 1}/{len(asins)}")
        time.sleep(rate_delay)
    return task_map
```
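`bulk_submit` returns the `{asin: task_id}` map that the compensation step later depends on, and in practice the submitting script and the compensation job often run as separate processes. A minimal file-based checkpoint bridges them (the filename is an arbitrary choice; a database row works just as well):

```python
import json
from pathlib import Path


def save_task_map(task_map: dict[str, str], path: str = "task_map.json") -> None:
    """Checkpoint {asin: task_id} so a later compensation pass can find gaps."""
    Path(path).write_text(json.dumps(task_map, indent=2))


def load_task_map(path: str = "task_map.json") -> dict[str, str]:
    """Reload the checkpoint written by save_task_map."""
    return json.loads(Path(path).read_text())
```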
2. Callback Receiver (FastAPI)
```python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
processed_ids: set[str] = set()  # Use Redis in production


@app.post("/api/callback")
async def receive_amazon_callback(request: Request):
    payload = await request.json()
    task_id = payload.get("taskId")

    # Idempotency guard - critical for production
    if task_id in processed_ids:
        return JSONResponse({"code": 0, "message": "duplicate"})
    processed_ids.add(task_id)

    # Extract results
    results = (
        payload
        .get("data", {})
        .get("json", [{}])[0]
        .get("data", {})
        .get("results", [])
    )
    if results:
        product = results[0]
        print(f"Received: {product.get('title', 'N/A')[:60]}")
        # persist_to_db(task_id, product)

    return JSONResponse({"code": 0, "message": "ok"})
```
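The `processed_ids` set above grows without bound and vanishes on restart, which is why the comment suggests Redis (typically `SET key NX EX`, i.e. first claim wins with an expiry). A process-local stand-in with the same check-and-set semantics, useful for development; the TTL default is an arbitrary choice:

```python
import time


class IdempotencyStore:
    """In-memory stand-in for Redis SET NX EX: first claim wins, entries expire."""

    def __init__(self, ttl_seconds: float = 86400.0):
        self.ttl = ttl_seconds
        self._seen: dict[str, float] = {}  # task_id -> expiry timestamp

    def claim(self, task_id: str) -> bool:
        """Return True only the first time task_id is seen (and record it)."""
        now = time.monotonic()
        # Drop expired entries so the store doesn't grow forever
        self._seen = {k: exp for k, exp in self._seen.items() if exp > now}
        if task_id in self._seen:
            return False
        self._seen[task_id] = now + self.ttl
        return True
```

In the callback handler, `if not store.claim(task_id): return JSONResponse({"code": 0, "message": "duplicate"})` replaces the set membership check.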
3. Timeout Compensation
```python
def compensate_missing_callbacks(
    token: str,
    task_map: dict[str, str],
    received_ids: set[str]
) -> None:
    """
    For tasks whose callback never arrived within the expected window,
    fetch results via the sync API as a fallback.
    """
    missing = {asin: tid for asin, tid in task_map.items()
               if tid not in received_ids}
    for asin, task_id in missing.items():
        print(f"Compensating {asin} (task {task_id})")
        result = sync_scrape(token, asin)  # fallback to sync
        if result:
            print(f"Compensated: {result.get('title', 'N/A')[:40]}")
```
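Putting the pieces together: submit, wait out a callback window, then compensate the gaps. This orchestration sketch takes callables so it is agnostic about where task IDs are stored; the window length is a guess you should tune to your batch size, and `get_received_ids` would read whatever store the callback server writes to.

```python
import time
from typing import Callable


def submit_then_compensate(
    submit_all: Callable[[], dict[str, str]],      # e.g. wraps bulk_submit
    get_received_ids: Callable[[], set[str]],      # task_ids the callback server saw
    compensate: Callable[[dict[str, str]], None],  # handles the missing tasks
    window_seconds: float = 600.0,                 # callback wait window (a guess)
) -> dict[str, str]:
    """Submit a batch, wait out the callback window, then compensate any gaps."""
    task_map = submit_all()
    time.sleep(window_seconds)
    received = get_received_ids()
    missing = {asin: tid for asin, tid in task_map.items() if tid not in received}
    if missing:
        compensate(missing)
    return missing
```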
Decision Framework
```text
Daily task volume?
│
├── < 100 tasks/day
│   └── Need instant results?
│       ├── Yes → Sync API ✓
│       └── No  → Sync API ✓ (async overhead not worth it)
│
└── ≥ 100 tasks/day OR scheduled batches
    └── Can deploy public callback server?
        ├── Yes → Async API ✓ (10x+ throughput)
        └── No  → Sync + threading, or AMZ Data Tracker (no-code)
```
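For the "Sync + threading" branch, here is a sketch of what that looks like: blocking sync calls fanned out across a thread pool. The pool size is a guess, not an API limit from the documentation; check your plan's concurrency allowance before raising it.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def threaded_sync_scrape(
    scrape_one: Callable[[str], dict],  # e.g. lambda asin: sync_scrape(token, asin)
    asins: list[str],
    max_workers: int = 10,  # a guess; respect your plan's concurrency limits
) -> dict[str, dict]:
    """Run blocking sync calls in parallel threads; returns {asin: result}."""
    results: dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(scrape_one, asin): asin for asin in asins}
        for fut in as_completed(futures):
            asin = futures[fut]
            try:
                results[asin] = fut.result()
            except Exception as exc:
                print(f"{asin} failed: {exc}")  # failed ASINs are simply skipped
    return results
```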
Local Development Setup
To test async callbacks locally without deploying a public server:
```shell
# Terminal 1: run your callback receiver
uvicorn main:app --port 5000

# Terminal 2: expose it publicly via ngrok
ngrok http 5000
# → Forwarding: https://abc123.ngrok.io → http://localhost:5000
```

Use the ngrok URL as your `callbackUrl` for testing:

```python
CALLBACK_URL = "https://abc123.ngrok.io/api/callback"
```
Performance Benchmarks
| Task Count | Sync (single-thread) | Async |
|---|---|---|
| 100 | ~8 min | ~2 min |
| 1,000 | ~83 min | ~5-10 min |
| 5,000 | ~7 hrs | ~20-40 min |
| 10,000 | ~14 hrs | ~45-90 min |
Credit cost is identical in both modes. The wall-clock gap widens with volume: sync runtime grows linearly with task count, while async is limited mainly by server-side concurrency.
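The sync column above is just task count times the ~5s serial latency from the article. A two-line estimator to sanity-check the table or plug in your own volume:

```python
def sync_wall_clock_minutes(task_count: int, seconds_per_call: float = 5.0) -> float:
    """Serial sync runtime: every call blocks ~5s, so time scales linearly."""
    return task_count * seconds_per_call / 60
```

For example, `sync_wall_clock_minutes(1000)` gives roughly 83 minutes, matching the table's 1,000-task row.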
Resources
- Pangolinfo Scrape API — supports both sync and async
- API Documentation — complete parameter reference
- AMZ Data Tracker — no-code scheduled data collection
Questions? Drop them in the comments.