Mox Loop
Stop Drowning in Amazon Data: Build a Decision Framework with Scrape API

TL;DR: Amazon seller data overload is a real problem, but it's fixable with the right infrastructure. This tutorial shows you how to unify BSR, keyword SERP, and ad position data using Pangolinfo's Scrape API — so your analytics dashboard answers questions instead of creating more of them.


The Problem

Amazon operators typically juggle three data sources that don't talk to each other:

  • BSR rankings — updates hourly, extracted from product pages
  • ABA keyword data — weekly CSV downloads from Seller Central (manual)
  • Advertising reports — 24-48 hour delay, different metrics entirely

When BSR drops 20 positions, diagnosing the root cause means manually correlating data from all three sources. Average time: 2-3 hours per incident. And the diagnosis still might be wrong.

The fix is a unified data pipeline that aligns these three streams on a shared time axis. Here's how to build one.


Prerequisites

Install the dependencies:

pip install requests psycopg2-binary python-dotenv

Then create a .env file with your credentials:

PANGOLINFO_API_KEY=your_key_here
DATABASE_URL=postgresql://user:pass@localhost:5432/amazon_analytics
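The scripts below assume both variables are set. A tiny guard (a suggested convention, not part of the API) catches a missing key before the first request fails with a confusing auth error:

```python
import os

def check_env(required=("PANGOLINFO_API_KEY", "DATABASE_URL")):
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in required if not os.getenv(name)]
```

Call it at the top of your entry point, e.g. `missing = check_env()` followed by `raise SystemExit(f"Missing: {missing}")` when the list is non-empty.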

Step 1: Database Schema

-- Unified time-series schema for Amazon operational data
CREATE TABLE asin_snapshots (
    id              BIGSERIAL PRIMARY KEY,
    asin            VARCHAR(20) NOT NULL,
    marketplace     VARCHAR(5)  NOT NULL DEFAULT 'US',
    bsr_main        INTEGER,
    bsr_category    VARCHAR(100),
    price           DECIMAL(10,2),
    review_count    INTEGER,
    rating          DECIMAL(3,2),
    collected_at    TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE keyword_snapshots (
    id                  BIGSERIAL PRIMARY KEY,
    keyword             TEXT NOT NULL,
    target_asin         VARCHAR(20),
    organic_rank        SMALLINT,       -- NULL = not on first page
    sp_positions_count  SMALLINT NOT NULL DEFAULT 0,
    target_sp_rank      SMALLINT,       -- NULL = no SP ad running
    collected_at        TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX ON asin_snapshots (asin, collected_at DESC);
CREATE INDEX ON keyword_snapshots (keyword, collected_at DESC);

Step 2: Data Collector

"""
collector.py - Pangolinfo Scrape API wrapper
Handles ASIN snapshots and keyword SERP collection
"""

import os
import requests
from datetime import datetime
from dataclasses import dataclass
from typing import Optional

from dotenv import load_dotenv

load_dotenv()  # reads PANGOLINFO_API_KEY from the .env file created above

API_KEY = os.getenv("PANGOLINFO_API_KEY")
BASE_URL = "https://api.pangolinfo.com/v1/amazon"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}


@dataclass
class AsinSnapshot:
    asin: str
    marketplace: str
    bsr_main: Optional[int]
    bsr_category: Optional[str]
    price: Optional[float]
    review_count: Optional[int]
    rating: Optional[float]
    collected_at: str


@dataclass
class KeywordSnapshot:
    keyword: str
    target_asin: Optional[str]
    organic_rank: Optional[int]
    sp_positions_count: int
    target_sp_rank: Optional[int]
    collected_at: str


def collect_asin(asin: str, marketplace: str = "US") -> AsinSnapshot:
    """Collect product snapshot: BSR, price, reviews, ratings."""
    resp = requests.post(
        f"{BASE_URL}/product",
        headers=HEADERS,
        json={
            "asin": asin,
            "marketplace": marketplace,
            "fields": ["bsr", "price", "reviews", "ratings"],
            "output_format": "json"
        },
        timeout=30  # don't let a stalled connection hang the job
    )
    resp.raise_for_status()
    d = resp.json()

    return AsinSnapshot(
        asin=asin,
        marketplace=marketplace,
        bsr_main=d.get("bsr", {}).get("main_category_rank"),
        bsr_category=d.get("bsr", {}).get("main_category_name"),
        price=d.get("price", {}).get("current"),
        review_count=d.get("reviews", {}).get("count"),
        rating=d.get("ratings", {}).get("average"),
        collected_at=datetime.utcnow().isoformat()
    )


def collect_keyword(
    keyword: str,
    target_asin: Optional[str] = None,
    marketplace: str = "US"
) -> KeywordSnapshot:
    """
    Collect keyword SERP: organic ranks + sponsored positions.

    Note: Sponsored ad position extraction is one of the more
    technically challenging aspects of Amazon scraping due to
    dynamic rendering. Pangolinfo handles this at the API level,
    so you get complete SP position data reliably.
    """
    resp = requests.post(
        f"{BASE_URL}/keyword-serp",
        headers=HEADERS,
        json={
            "keyword": keyword,
            "marketplace": marketplace,
            "include_ad_positions": True,
            "output_format": "json"
        },
        timeout=30  # don't let a stalled connection hang the job
    )
    resp.raise_for_status()
    d = resp.json()

    # Find target ASIN's organic position
    organic_rank = None
    if target_asin:
        for item in d.get("organic_results", []):
            if item.get("asin") == target_asin:
                organic_rank = item.get("position")
                break

    # Find target ASIN's sponsored position
    sp_positions = d.get("sponsored_positions", [])
    target_sp_rank = None
    if target_asin:
        for sp in sp_positions:
            if sp.get("asin") == target_asin:
                target_sp_rank = sp.get("position")
                break

    return KeywordSnapshot(
        keyword=keyword,
        target_asin=target_asin,
        organic_rank=organic_rank,
        sp_positions_count=len(sp_positions),
        target_sp_rank=target_sp_rank,
        collected_at=datetime.utcnow().isoformat()
    )
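The collector above calls raise_for_status() and gives up on the first failure. For a scheduled job, transient 429/5xx responses are worth retrying before aborting the whole run. A minimal sketch (post_with_retry is a hypothetical helper, not part of the Pangolinfo client):

```python
import time
import requests

def post_with_retry(url, *, headers=None, json=None,
                    retries=3, backoff=2.0, timeout=30):
    """POST with exponential backoff on transient failures (429/5xx, timeouts)."""
    for attempt in range(retries):
        try:
            resp = requests.post(url, headers=headers, json=json, timeout=timeout)
            if resp.status_code in (429, 500, 502, 503):
                # Treat rate limits and server errors as retryable
                raise requests.HTTPError(f"transient status {resp.status_code}")
            resp.raise_for_status()
            return resp
        except requests.RequestException:
            if attempt == retries - 1:
                raise  # out of attempts, surface the error
            time.sleep(backoff * (2 ** attempt))  # 2s, 4s, 8s, ...
```

Swapping `requests.post` for `post_with_retry` inside `collect_asin` and `collect_keyword` keeps the rest of the code unchanged.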

Step 3: Database Writer

"""
writer.py - Persist collected data to PostgreSQL
"""

import os
import psycopg2
from dataclasses import asdict
from collector import AsinSnapshot, KeywordSnapshot

# A single module-level connection is fine for this single-process pipeline
conn = psycopg2.connect(os.getenv("DATABASE_URL"))


def write_asin_snapshot(snapshot: AsinSnapshot):
    with conn.cursor() as cur:
        cur.execute("""
            INSERT INTO asin_snapshots
                (asin, marketplace, bsr_main, bsr_category, price, review_count, rating, collected_at)
            VALUES
                (%(asin)s, %(marketplace)s, %(bsr_main)s, %(bsr_category)s,
                 %(price)s, %(review_count)s, %(rating)s, %(collected_at)s)
        """, asdict(snapshot))
    conn.commit()


def write_keyword_snapshot(snapshot: KeywordSnapshot):
    with conn.cursor() as cur:
        cur.execute("""
            INSERT INTO keyword_snapshots
                (keyword, target_asin, organic_rank, sp_positions_count, target_sp_rank, collected_at)
            VALUES
                (%(keyword)s, %(target_asin)s, %(organic_rank)s,
                 %(sp_positions_count)s, %(target_sp_rank)s, %(collected_at)s)
        """, asdict(snapshot))
    conn.commit()

Step 4: Scheduled Monitoring Job

"""
monitor.py - Daily monitoring routine
Run via cron: 0 */6 * * * python monitor.py
"""

import json
from collector import collect_asin, collect_keyword
from writer import write_asin_snapshot, write_keyword_snapshot

# Configure your monitoring scope here
CONFIG = {
    "target_asin": "B09XXXXXXX",
    "competitor_asins": ["B08YYYYYYY", "B07ZZZZZZZ"],
    "core_keywords": [
        "camping chair lightweight",
        "portable outdoor chair",
        "folding camp chair"
    ],
    "marketplace": "US"
}


def run_monitoring():
    all_asins = [CONFIG["target_asin"]] + CONFIG["competitor_asins"]

    print("Collecting ASIN snapshots...")
    for asin in all_asins:
        snapshot = collect_asin(asin, CONFIG["marketplace"])
        write_asin_snapshot(snapshot)
        print(f"  [{asin}] BSR: {snapshot.bsr_main} | Reviews: {snapshot.review_count}")

    print("\nCollecting keyword SERP data...")
    for keyword in CONFIG["core_keywords"]:
        snapshot = collect_keyword(keyword, CONFIG["target_asin"], CONFIG["marketplace"])
        write_keyword_snapshot(snapshot)
        print(
            f"  [{keyword}] "
            f"Organic: {snapshot.organic_rank or 'not ranked'} | "
            f"SP slots: {snapshot.sp_positions_count} | "
            f"Your SP: {snapshot.target_sp_rank or 'not running'}"
        )


if __name__ == "__main__":
    run_monitoring()

Step 5: Anomaly Detection Query

Once you have a few days of data, this query detects BSR drops that may require immediate attention:

-- Detect BSR drops > 10% vs previous collection
WITH ranked AS (
    SELECT
        asin,
        bsr_main,
        LAG(bsr_main) OVER (PARTITION BY asin ORDER BY collected_at) AS prev_bsr,
        collected_at
    FROM asin_snapshots
    WHERE collected_at >= NOW() - INTERVAL '24 hours'
),
anomalies AS (
    SELECT
        asin,
        prev_bsr,
        bsr_main AS current_bsr,
        ROUND(((bsr_main - prev_bsr)::float / prev_bsr * 100)::numeric, 1) AS change_pct,
        collected_at
    FROM ranked
    WHERE prev_bsr IS NOT NULL
      AND bsr_main > prev_bsr * 1.10  -- BSR increased (rank worsened) by 10%+
)
SELECT * FROM anomalies ORDER BY change_pct DESC;

Cross-reference any anomaly result against keyword_snapshots on the same timestamp window to see if the BSR drop correlates with a sponsored position displacement.
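If you prefer to do that correlation in application code, the windowed match can be sketched in Python (illustrative helper; `correlate` is not part of the pipeline above — it pairs each anomaly row with keyword snapshots collected within a +/- window):

```python
from datetime import datetime, timedelta

def correlate(anomalies, keyword_rows, window_minutes=60):
    """Attach keyword snapshots collected within the window to each BSR anomaly."""
    window = timedelta(minutes=window_minutes)
    return [
        (a, [k for k in keyword_rows
             if abs(k["collected_at"] - a["collected_at"]) <= window])
        for a in anomalies
    ]
```

An anomaly that comes back with an empty keyword list points away from SERP displacement and toward price, reviews, or inventory as the cause.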


What You Get

After running for a few days, you'll have a queryable time-series of:

  • ASIN BSR trajectory (visualize in Grafana or Metabase)
  • Keyword organic rank changes per collection window
  • SP position count fluctuations on your core keywords
  • Your ASIN's presence/absence in sponsored positions

When BSR drops, run the anomaly query. Cross-reference with keyword data. The root cause that used to take three hours manually is now visible in one joined query.


No-Code Alternative

Building and maintaining this pipeline requires ongoing engineering time. If you want the same structured, unified view without writing code, AMZ Data Tracker provides a configuration interface for the same monitoring logic — ASIN tracking, keyword ranking, competitor monitoring — with automatic collection and a built-in dashboard.

API documentation for the Scrape API is at docs.pangolinfo.com. Free trial credits available.


Have questions about the implementation or want to share how you've built your own Amazon analytics stack? Drop a comment — happy to discuss architecture tradeoffs.
