Amazon Listing Traffic Analysis: Building a Real-Time Traffic Attribution System

TL;DR

When your Amazon listing's sales suddenly double but you can't identify the traffic source, you need a systematic approach to traffic analysis. This guide shows you how to build an automated traffic monitoring system in Python, using APIs to track organic rankings, competitor dynamics, and advertising performance in real time.

What you'll learn:

  • How to identify Amazon listing traffic sources (organic, PPC, external)
  • Building an automated traffic monitoring system with Python
  • Using Pangolinfo Scrape API for real-time Amazon data collection
  • Setting up anomaly detection and alerts for traffic changes

The Problem: Amazon's Traffic Attribution Black Box

A seller recently contacted me with a common problem: their silicone baking mat listing went from 15 daily orders to 35+ orders overnight. They were only running a single automatic campaign with a $30 daily budget, and their advertising report showed no significant change in ad-generated orders.

The question: Where did the extra 20 daily orders come from?

This scenario highlights a fundamental challenge for Amazon sellers: Amazon's Seller Central doesn't provide clear traffic source attribution. You get aggregate metrics like Sessions and Page Views, but no breakdown of:

  • How much traffic comes from organic search vs. paid ads
  • Which keywords are driving the most traffic
  • Whether competitor stockouts are sending traffic your way
  • If external promotion campaigns are actually working

For developers building seller tools or data-driven sellers with technical teams, this black box is unacceptable.


Understanding Amazon Traffic Sources

Before we dive into the technical solution, let's map out the traffic landscape:

1. Organic On-Platform Traffic

Search Results: Buyers searching keywords and finding your listing

  • Ranking position is everything (page 1 vs. page 2 can mean a 10x traffic difference)
  • Influenced by sales velocity, conversion rate, reviews, and relevance

Ranking Lists: Best Sellers, New Releases, Movers & Shakers

  • High-intent traffic with strong conversion rates
  • Provides brand exposure beyond keyword searches

Related Traffic: "Customers who bought this also bought"

  • Often overlooked but can represent 20-30% of traffic
  • Especially valuable for complementary products

2. Paid On-Platform Traffic

Sponsored Products (SP): Appear in search results and product pages
Sponsored Brands (SB): Top-of-search brand ads
Sponsored Display (SD): Retargeting and audience-based ads

3. External Traffic

  • Social media (Facebook, Instagram, TikTok)
  • Influencer marketing
  • Deal sites (Slickdeals, Kinja Deals)
  • Google Ads
  • Independent website referrals

Building a Traffic Attribution System

Here's how to build a system that actually tells you where your traffic comes from.

Architecture Overview

┌─────────────────────────────────────────────────┐
│           Data Collection Layer                 │
│  (Pangolinfo Scrape API + Amazon Ads API)      │
└─────────────────┬───────────────────────────────┘
                  │
┌─────────────────▼───────────────────────────────┐
│           Data Storage Layer                    │
│     (PostgreSQL / MongoDB / CSV files)         │
└─────────────────┬───────────────────────────────┘
                  │
┌─────────────────▼───────────────────────────────┐
│         Analysis & Detection Layer              │
│  (Ranking changes, anomaly detection, alerts)  │
└─────────────────┬───────────────────────────────┘
                  │
┌─────────────────▼───────────────────────────────┐
│        Visualization & Reporting Layer          │
│      (Dashboard, charts, notifications)        │
└─────────────────────────────────────────────────┘
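
Before writing any collectors, it helps to pin down what one daily snapshot looks like. Here's a minimal sketch of the two record shapes this post works with (the field names mirror the collection code in Step 1; they're a convention, not a fixed schema):

from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class KeywordRanking:
    """One daily observation of where an ASIN ranks for a keyword."""
    keyword: str
    asin: str
    position: Optional[int]  # None when the ASIN isn't in the results
    timestamp: datetime

@dataclass
class CompetitorSnapshot:
    """One daily observation of a competitor listing."""
    asin: str
    price: Optional[float]
    in_stock: bool
    rating: Optional[float]
    review_count: Optional[int]
    bsr: Optional[str]
    timestamp: datetime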

Step 1: Data Collection Setup

First, let's set up automated data collection using Python and Pangolinfo's Scrape API.

import requests
from datetime import datetime

class AmazonTrafficMonitor:
    def __init__(self, api_key):
        self.api_key = api_key
        self.base_url = "https://api.pangolinfo.com/v1"

    def track_keyword_rankings(self, keywords, marketplace="US"):
        """
        Track organic and sponsored rankings for target keywords
        """
        rankings_data = []

        for keyword in keywords:
            # Call Pangolinfo Scrape API to get search results
            response = requests.post(
                f"{self.base_url}/scrape/amazon/search",
                headers={"Authorization": f"Bearer {self.api_key}"},
                json={
                    "keyword": keyword,
                    "marketplace": marketplace,
                    "include_sponsored": True
                }
            )

            if response.status_code == 200:
                data = response.json()
                rankings_data.append({
                    "keyword": keyword,
                    "timestamp": datetime.now(),
                    "organic_results": data.get("organic_results", []),
                    "sponsored_results": data.get("sponsored_results", []),
                })

        return rankings_data

    def find_product_position(self, results, target_asin):
        """
        Find the ranking position of target ASIN in results
        """
        for idx, product in enumerate(results):
            if product.get("asin") == target_asin:
                return idx + 1
        return None

    def track_competitors(self, competitor_asins, marketplace="US"):
        """
        Monitor competitor pricing, inventory, ratings
        """
        competitor_data = []

        for asin in competitor_asins:
            response = requests.post(
                f"{self.base_url}/scrape/amazon/product",
                headers={"Authorization": f"Bearer {self.api_key}"},
                json={
                    "asin": asin,
                    "marketplace": marketplace
                }
            )

            if response.status_code == 200:
                data = response.json()
                competitor_data.append({
                    "asin": asin,
                    "timestamp": datetime.now(),
                    "price": data.get("price"),
                    "in_stock": data.get("availability", {}).get("in_stock"),
                    "rating": data.get("rating"),
                    "review_count": data.get("review_count"),
                    "bsr": data.get("best_sellers_rank")
                })

        return competitor_data

# Usage example
monitor = AmazonTrafficMonitor(api_key="your_api_key_here")

# Define monitoring targets
target_asin = "B08XYZ1234"
keywords = [
    "silicone baking mat",
    "non-stick baking sheet",
    "reusable baking liner"
]
competitor_asins = ["B07ABC1234", "B09DEF5678", "B06GHI9012"]

# Collect daily data
keyword_rankings = monitor.track_keyword_rankings(keywords)
competitor_data = monitor.track_competitors(competitor_asins)

# Find your product's positions
for ranking in keyword_rankings:
    position = monitor.find_product_position(
        ranking["organic_results"], 
        target_asin
    )
    print(f"Keyword: {ranking['keyword']}, Position: {position}")
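
One practical note: scraping APIs occasionally return transient errors, and a daily job shouldn't die on the first one. Here's a small retry wrapper (a sketch of my own, not part of any SDK) you could swap in for the requests.post calls above:

import time
import requests

def post_with_retry(url, headers, payload, retries=3, backoff=2.0):
    """POST with exponential backoff; returns the response or None on repeated failure."""
    for attempt in range(retries):
        try:
            response = requests.post(url, headers=headers, json=payload, timeout=30)
            if response.status_code == 200:
                return response
        except requests.RequestException:
            pass  # transient network error; fall through to the retry delay
        if attempt < retries - 1:
            time.sleep(backoff ** attempt)
    return None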

Step 2: Automated Scheduling

Set up a cron job or use a task scheduler to run data collection daily:

# schedule_monitor.py
import csv
import time
from datetime import datetime

import schedule

from traffic_monitor import AmazonTrafficMonitor

# Monitoring targets (same values as in Step 1)
target_asin = "B08XYZ1234"
keywords = ["silicone baking mat", "non-stick baking sheet", "reusable baking liner"]
competitor_asins = ["B07ABC1234", "B09DEF5678", "B06GHI9012"]

def daily_monitoring_job():
    """
    Run daily at 2 AM to collect data
    """
    monitor = AmazonTrafficMonitor(api_key="your_api_key")

    # Collect data
    keyword_rankings = monitor.track_keyword_rankings(keywords)
    competitor_data = monitor.track_competitors(competitor_asins)

    # Save to CSV for historical tracking
    timestamp = datetime.now().strftime("%Y%m%d")

    with open(f"rankings_{timestamp}.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["keyword", "position", "timestamp"])
        writer.writeheader()
        for ranking in keyword_rankings:
            position = monitor.find_product_position(
                ranking["organic_results"],
                target_asin
            )
            writer.writerow({
                "keyword": ranking["keyword"],
                "position": position,
                "timestamp": ranking["timestamp"]
            })

    # Persist competitor_data the same way (omitted here for brevity)

    print(f"Data collection completed: {timestamp}")

# Schedule daily execution
schedule.every().day.at("02:00").do(daily_monitoring_job)

while True:
    schedule.run_pending()
    time.sleep(60)

Step 3: Anomaly Detection

Implement logic to detect significant traffic changes:

import pandas as pd
import numpy as np

class TrafficAnomalyDetector:
    def __init__(self, threshold=0.3):
        """
        threshold: percentage change to trigger alert (0.3 = 30%)
        """
        self.threshold = threshold

    def detect_ranking_changes(self, historical_data, current_data):
        """
        Compare current rankings with historical average
        """
        alerts = []

        for keyword, current_position in current_data.items():
            if current_position is None:
                continue  # product wasn't found in the results; nothing to compare

            # Get historical positions for this keyword
            hist_positions = historical_data[
                historical_data["keyword"] == keyword
            ]["position"].dropna().values

            if len(hist_positions) > 0:
                avg_position = np.mean(hist_positions)

                # Significant improvement (lower position number = better)
                if current_position < avg_position * (1 - self.threshold):
                    alerts.append({
                        "type": "ranking_improvement",
                        "keyword": keyword,
                        "previous_avg": avg_position,
                        "current": current_position,
                        "change": avg_position - current_position
                    })

                # Significant drop
                elif current_position > avg_position * (1 + self.threshold):
                    alerts.append({
                        "type": "ranking_drop",
                        "keyword": keyword,
                        "previous_avg": avg_position,
                        "current": current_position,
                        "change": current_position - avg_position
                    })

        return alerts

    def detect_competitor_changes(self, historical_data, current_data):
        """
        Detect competitor stockouts, price changes, etc.
        """
        alerts = []

        for competitor in current_data:
            asin = competitor["asin"]

            # Check for stockouts
            if not competitor["in_stock"]:
                alerts.append({
                    "type": "competitor_stockout",
                    "asin": asin,
                    "message": f"Competitor {asin} is out of stock"
                })

            # Check for significant price changes
            hist_prices = historical_data[
                historical_data["asin"] == asin
            ]["price"].values

            if len(hist_prices) > 0:
                avg_price = np.mean(hist_prices)
                current_price = competitor["price"]

                if current_price is not None and current_price < avg_price * (1 - self.threshold):
                    alerts.append({
                        "type": "competitor_price_drop",
                        "asin": asin,
                        "previous_avg": avg_price,
                        "current": current_price,
                        "change_pct": (avg_price - current_price) / avg_price * 100
                    })

        return alerts

# Usage
detector = TrafficAnomalyDetector(threshold=0.2)

# Load historical data
historical_rankings = pd.read_csv("historical_rankings.csv")
historical_competitors = pd.read_csv("historical_competitors.csv")

# Check for anomalies
# (current_rankings is a {keyword: position} dict from today's collection run;
#  current_competitor_data is the list returned by track_competitors())
ranking_alerts = detector.detect_ranking_changes(
    historical_rankings,
    current_rankings
)

competitor_alerts = detector.detect_competitor_changes(
    historical_competitors,
    current_competitor_data
)

# Send notifications
for alert in ranking_alerts + competitor_alerts:
    send_notification(alert)  # Email, Slack, etc.
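
The loop above assumes a send_notification helper. Here's one minimal implementation that posts alerts to a Slack incoming webhook (the webhook URL and message format are assumptions; swap in email or any other channel you prefer):

import os
import requests

def send_notification(alert):
    """Post an alert dict to a Slack incoming webhook."""
    webhook_url = os.environ["SLACK_WEBHOOK_URL"]  # create one in your Slack workspace
    details = ", ".join(f"{k}={v}" for k, v in alert.items() if k != "type")
    requests.post(webhook_url, json={"text": f"[{alert['type']}] {details}"}, timeout=10)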

Step 4: Traffic Source Attribution Logic

Now combine all data sources to attribute traffic:

def attribute_traffic_source(
    sales_change,
    ranking_changes,
    competitor_changes,
    ad_data
):
    """
    Determine most likely traffic source for sales changes
    """
    attribution = {
        "primary_source": None,
        "contributing_factors": [],
        "confidence": 0
    }

    # Check advertising data first
    ad_order_change = ad_data["current_orders"] - ad_data["avg_orders"]

    if sales_change > 0 and ad_order_change / sales_change > 0.7:
        # 70%+ of the sales increase came from ads
        attribution["primary_source"] = "paid_advertising"
        attribution["confidence"] = 0.9
        return attribution

    # Check for ranking improvements
    significant_ranking_improvements = [
        r for r in ranking_changes 
        if r["type"] == "ranking_improvement" and r["change"] > 5
    ]

    if significant_ranking_improvements:
        attribution["primary_source"] = "organic_ranking_improvement"
        attribution["contributing_factors"] = [
            f"{r['keyword']}: {r['previous_avg']}{r['current']}"
            for r in significant_ranking_improvements
        ]
        attribution["confidence"] = 0.85
        return attribution

    # Check for competitor stockouts
    stockouts = [
        c for c in competitor_changes 
        if c["type"] == "competitor_stockout"
    ]

    if stockouts:
        attribution["primary_source"] = "competitor_stockout"
        attribution["contributing_factors"] = [
            c["asin"] for c in stockouts
        ]
        attribution["confidence"] = 0.75
        return attribution

    # If no clear source, likely external traffic
    attribution["primary_source"] = "external_or_unknown"
    attribution["confidence"] = 0.5

    return attribution
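
Here's how the pieces fit together for one day's data (the numbers are illustrative):

attribution = attribute_traffic_source(
    sales_change=20,                                 # +20 orders/day vs. baseline
    ranking_changes=ranking_alerts,                  # from TrafficAnomalyDetector
    competitor_changes=competitor_alerts,
    ad_data={"current_orders": 6, "avg_orders": 5}   # from the Amazon Ads API
)
print(attribution["primary_source"], attribution["confidence"])
# e.g. "organic_ranking_improvement 0.85"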

Real-World Case Study

Let's return to the silicone baking mat example. Here's what the data revealed:

Data collected:

  • Keyword rankings for 10 core keywords (daily)
  • Competitor data for 5 main competitors (daily)
  • Advertising data from Amazon Ads API

Analysis results:

# Ranking change detected
{
    "keyword": "silicone baking mat",
    "previous_avg_position": 23,
    "current_position": 8,
    "change": +15 positions,
    "date": "2026-01-23"
}

# Competitor change detected
{
    "competitor_asin": "B07ABC1234",
    "status": "out_of_stock",
    "previous_position": 5,
    "date": "2026-01-22"
}

# Attribution result
{
    "primary_source": "organic_ranking_improvement",
    "contributing_factors": [
        "Ranking jump for 'silicone baking mat' from #23 to #8",
        "Competitor B07ABC1234 stockout"
    ],
    "confidence": 0.9
}

Root cause: A Facebook promotion one week earlier generated 50 concentrated orders, improving sales velocity and conversion rate, which triggered Amazon's algorithm to boost organic rankings.

Action taken:

  1. Increased ad budget to consolidate ranking position
  2. Accelerated inventory replenishment
  3. Result: sustained 30+ daily orders (double the original 15/day baseline)

Why Pangolinfo for Amazon Data Collection

After testing multiple solutions, here's why Pangolinfo Scrape API stands out:

1. High Accuracy for Sponsored Ads

  • 98% success rate for SP ad position scraping
  • Critical for understanding competitive advertising landscape

2. Real-Time Data

  • No estimation models—actual scraped data
  • Minute-level updates available

3. Comprehensive Coverage

  • Search results (organic + sponsored)
  • Product details
  • Best Sellers rankings
  • Reviews
  • Competitor data

4. Developer-Friendly

  • RESTful API with clear documentation
  • Multiple output formats (JSON, CSV, HTML)
  • Webhook support for real-time alerts (see the receiver sketch below)

5. Flexible Pricing

  • Pay per request (no unused features)
  • Scales with your needs
  • Much more cost-effective than $3,588/year SaaS tools

Check out the API documentation for implementation details.
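
If you use the webhook option, you'll need an endpoint to receive the callbacks. A generic receiver might look like this (the payload shape here is a placeholder; consult the API documentation for the actual format):

from flask import Flask, request

app = Flask(__name__)

@app.route("/pangolin-webhook", methods=["POST"])
def handle_webhook():
    payload = request.get_json(force=True)  # actual schema: see the API docs
    # Hand the payload to your storage layer or anomaly detector here
    print("Received webhook:", payload)
    return "", 204

if __name__ == "__main__":
    app.run(port=8000)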


Best Practices

1. Start Simple, Scale Gradually

  • Begin with 5-10 core keywords
  • Add more metrics as you understand patterns

2. Automate Everything

  • Manual data collection doesn't scale
  • Set up scheduled jobs from day one

3. Focus on Actionable Metrics

  • Don't just collect data—define what actions each metric triggers
  • Example: Ranking drop > 10 positions → increase ad budget (see the sketch after this list)

4. Combine Multiple Data Sources

  • Amazon Ads API for advertising data
  • Pangolinfo for organic rankings and competitor data
  • Amazon Attribution for external traffic

5. Build Historical Context

  • Traffic analysis requires time-series data
  • Collect data for at least 2-4 weeks before drawing conclusions
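
To make item 3 concrete, here's a minimal sketch of a metric-to-action mapping (the rules and thresholds are illustrative; tune them to your category):

# Map each alert type from the anomaly detector to a concrete follow-up action
ACTION_RULES = {
    "ranking_drop": lambda a: (
        f"Increase ad budget to defend '{a['keyword']}'" if a["change"] > 10 else None
    ),
    "competitor_stockout": lambda a: f"Raise bids on keywords shared with {a['asin']}",
    "competitor_price_drop": lambda a: f"Review pricing against {a['asin']}",
}

def actions_for(alerts):
    """Yield a human-readable action for each alert that matches a rule."""
    for alert in alerts:
        rule = ACTION_RULES.get(alert["type"])
        action = rule(alert) if rule else None
        if action:
            yield action

for action in actions_for(ranking_alerts + competitor_alerts):
    print(action)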

Conclusion

Amazon may not provide transparent traffic attribution, but your listing's traffic doesn't have to stay a black box. By building a systematic monitoring system with automated data collection, anomaly detection, and attribution logic, you can:

  • Quickly identify traffic source changes
  • Respond to opportunities (ranking improvements, competitor stockouts)
  • Optimize advertising spend based on actual data
  • Make data-driven decisions instead of guessing

The technical implementation is straightforward—the real value comes from consistent execution and acting on insights.


Discussion

Have you built your own Amazon traffic monitoring system? What challenges did you face? Share your experiences in the comments!

Tags: #api #python #ecommerce #amazon #dataanalysis #automation
