Amazon Listing Traffic Analysis: Building a Real-Time Traffic Attribution System

TL;DR

When your Amazon listing's sales suddenly double but you can't identify the traffic source, you need a systematic approach to traffic analysis. This guide shows you how to build an automated traffic monitoring system in Python, using APIs to track organic rankings, competitor dynamics, and advertising performance in real time.

What you'll learn:

  • How to identify Amazon listing traffic sources (organic, PPC, external)
  • Building an automated traffic monitoring system with Python
  • Using Pangolinfo Scrape API for real-time Amazon data collection
  • Setting up anomaly detection and alerts for traffic changes

The Problem: Amazon's Traffic Attribution Black Box

A seller recently contacted me with a common problem: their silicone baking mat listing went from 15 daily orders to 35+ orders overnight. They were only running a single automatic campaign with a $30 daily budget, and their advertising report showed no significant change in ad-generated orders.

The question: Where did the extra 20 daily orders come from?

This scenario highlights a fundamental challenge for Amazon sellers: Amazon's Seller Central doesn't provide clear traffic source attribution. You get aggregate metrics like Sessions and Page Views, but no breakdown of:

  • How much traffic comes from organic search vs. paid ads
  • Which keywords are driving the most traffic
  • Whether competitor stockouts are sending traffic your way
  • If external promotion campaigns are actually working

For developers building seller tools or data-driven sellers with technical teams, this black box is unacceptable.


Understanding Amazon Traffic Sources

Before we dive into the technical solution, let's map out the traffic landscape:

1. Organic On-Platform Traffic

Search Results: Buyers searching keywords and finding your listing

  • Ranking position is everything (page 1 vs. page 2 can mean a 10x traffic difference)
  • Influenced by sales velocity, conversion rate, reviews, and relevance

Ranking Lists: Best Sellers, New Releases, Movers & Shakers

  • High-intent traffic with strong conversion rates
  • Provides brand exposure beyond keyword searches

Related Traffic: "Customers who bought this also bought"

  • Often overlooked but can represent 20-30% of traffic
  • Especially valuable for complementary products

2. Paid On-Platform Traffic

Sponsored Products (SP): Appear in search results and product pages
Sponsored Brands (SB): Top-of-search brand ads
Sponsored Display (SD): Retargeting and audience-based ads

3. External Traffic

  • Social media (Facebook, Instagram, TikTok)
  • Influencer marketing
  • Deal sites (Slickdeals, Kinja Deals)
  • Google Ads
  • Independent website referrals

Building a Traffic Attribution System

Here's how to build a system that actually tells you where your traffic comes from.

Architecture Overview

┌─────────────────────────────────────────────────┐
│           Data Collection Layer                 │
│  (Pangolinfo Scrape API + Amazon Ads API)      │
└─────────────────┬───────────────────────────────┘
                  │
┌─────────────────▼───────────────────────────────┐
│           Data Storage Layer                    │
│     (PostgreSQL / MongoDB / CSV files)         │
└─────────────────┬───────────────────────────────┘
                  │
┌─────────────────▼───────────────────────────────┐
│         Analysis & Detection Layer              │
│  (Ranking changes, anomaly detection, alerts)  │
└─────────────────┬───────────────────────────────┘
                  │
┌─────────────────▼───────────────────────────────┐
│        Visualization & Reporting Layer          │
│      (Dashboard, charts, notifications)        │
└─────────────────────────────────────────────────┘
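
Before writing any collectors, it helps to pin down what one daily snapshot looks like. Here's a minimal sketch of the two record shapes this post works with (the field names mirror the collection code in Step 1; they're a convention, not a fixed schema):

from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class KeywordRanking:
    """One daily observation of where an ASIN ranks for a keyword."""
    keyword: str
    asin: str
    position: Optional[int]  # None when the ASIN isn't in the results
    timestamp: datetime

@dataclass
class CompetitorSnapshot:
    """One daily observation of a competitor listing."""
    asin: str
    price: Optional[float]
    in_stock: bool
    rating: Optional[float]
    review_count: Optional[int]
    bsr: Optional[str]
    timestamp: datetime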

Step 1: Data Collection Setup

First, let's set up automated data collection using Python and Pangolinfo's Scrape API.

import requests
from datetime import datetime

class AmazonTrafficMonitor:
    def __init__(self, api_key):
        self.api_key = api_key
        self.base_url = "https://api.pangolinfo.com/v1"

    def track_keyword_rankings(self, keywords, marketplace="US"):
        """
        Track organic and sponsored rankings for target keywords
        """
        rankings_data = []

        for keyword in keywords:
            # Call Pangolinfo Scrape API to get search results
            response = requests.post(
                f"{self.base_url}/scrape/amazon/search",
                headers={"Authorization": f"Bearer {self.api_key}"},
                json={
                    "keyword": keyword,
                    "marketplace": marketplace,
                    "include_sponsored": True
                }
            )

            if response.status_code == 200:
                data = response.json()
                rankings_data.append({
                    "keyword": keyword,
                    "timestamp": datetime.now(),
                    "organic_results": data.get("organic_results", []),
                    "sponsored_results": data.get("sponsored_results", []),
                })

        return rankings_data

    def find_product_position(self, results, target_asin):
        """
        Find the ranking position of target ASIN in results
        """
        for idx, product in enumerate(results):
            if product.get("asin") == target_asin:
                return idx + 1
        return None

    def track_competitors(self, competitor_asins, marketplace="US"):
        """
        Monitor competitor pricing, inventory, ratings
        """
        competitor_data = []

        for asin in competitor_asins:
            response = requests.post(
                f"{self.base_url}/scrape/amazon/product",
                headers={"Authorization": f"Bearer {self.api_key}"},
                json={
                    "asin": asin,
                    "marketplace": marketplace
                }
            )

            if response.status_code == 200:
                data = response.json()
                competitor_data.append({
                    "asin": asin,
                    "timestamp": datetime.now(),
                    "price": data.get("price"),
                    "in_stock": data.get("availability", {}).get("in_stock"),
                    "rating": data.get("rating"),
                    "review_count": data.get("review_count"),
                    "bsr": data.get("best_sellers_rank")
                })

        return competitor_data

# Usage example
monitor = AmazonTrafficMonitor(api_key="your_api_key_here")

# Define monitoring targets
target_asin = "B08XYZ1234"
keywords = [
    "silicone baking mat",
    "non-stick baking sheet",
    "reusable baking liner"
]
competitor_asins = ["B07ABC1234", "B09DEF5678", "B06GHI9012"]

# Collect daily data
keyword_rankings = monitor.track_keyword_rankings(keywords)
competitor_data = monitor.track_competitors(competitor_asins)

# Find your product's positions
for ranking in keyword_rankings:
    position = monitor.find_product_position(
        ranking["organic_results"], 
        target_asin
    )
    print(f"Keyword: {ranking['keyword']}, Position: {position}")
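
One practical note: scraping APIs occasionally return transient errors, and a daily job shouldn't die on the first one. Here's a small retry wrapper (a sketch of my own, not part of any SDK) you could swap in for the requests.post calls above:

import time
import requests

def post_with_retry(url, headers, payload, retries=3, backoff=2.0):
    """POST with exponential backoff; returns the response or None on repeated failure."""
    for attempt in range(retries):
        try:
            response = requests.post(url, headers=headers, json=payload, timeout=30)
            if response.status_code == 200:
                return response
        except requests.RequestException:
            pass  # transient network error; fall through to the retry delay
        if attempt < retries - 1:
            time.sleep(backoff ** attempt)
    return None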

Step 2: Automated Scheduling

Set up a cron job or use a task scheduler to run data collection daily:

# schedule_monitor.py
import csv
import time
from datetime import datetime

import schedule

from traffic_monitor import AmazonTrafficMonitor

# Monitoring targets (same values as in Step 1)
target_asin = "B08XYZ1234"
keywords = ["silicone baking mat", "non-stick baking sheet", "reusable baking liner"]
competitor_asins = ["B07ABC1234", "B09DEF5678", "B06GHI9012"]

def daily_monitoring_job():
    """
    Run daily at 2 AM to collect data
    """
    monitor = AmazonTrafficMonitor(api_key="your_api_key")

    # Collect data
    keyword_rankings = monitor.track_keyword_rankings(keywords)
    competitor_data = monitor.track_competitors(competitor_asins)

    # Save to CSV for historical tracking
    timestamp = datetime.now().strftime("%Y%m%d")

    with open(f"rankings_{timestamp}.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["keyword", "position", "timestamp"])
        writer.writeheader()
        for ranking in keyword_rankings:
            position = monitor.find_product_position(
                ranking["organic_results"],
                target_asin
            )
            writer.writerow({
                "keyword": ranking["keyword"],
                "position": position,
                "timestamp": ranking["timestamp"]
            })

    # Persist competitor_data the same way (omitted here for brevity)

    print(f"Data collection completed: {timestamp}")

# Schedule daily execution
schedule.every().day.at("02:00").do(daily_monitoring_job)

while True:
    schedule.run_pending()
    time.sleep(60)

Step 3: Anomaly Detection

Implement logic to detect significant traffic changes:

import pandas as pd
import numpy as np

class TrafficAnomalyDetector:
    def __init__(self, threshold=0.3):
        """
        threshold: percentage change to trigger alert (0.3 = 30%)
        """
        self.threshold = threshold

    def detect_ranking_changes(self, historical_data, current_data):
        """
        Compare current rankings with historical average
        """
        alerts = []

        for keyword, current_position in current_data.items():
            if current_position is None:
                continue  # product wasn't found in the results; nothing to compare

            # Get historical positions for this keyword
            hist_positions = historical_data[
                historical_data["keyword"] == keyword
            ]["position"].dropna().values

            if len(hist_positions) > 0:
                avg_position = np.mean(hist_positions)

                # Significant improvement (lower position number = better)
                if current_position < avg_position * (1 - self.threshold):
                    alerts.append({
                        "type": "ranking_improvement",
                        "keyword": keyword,
                        "previous_avg": avg_position,
                        "current": current_position,
                        "change": avg_position - current_position
                    })

                # Significant drop
                elif current_position > avg_position * (1 + self.threshold):
                    alerts.append({
                        "type": "ranking_drop",
                        "keyword": keyword,
                        "previous_avg": avg_position,
                        "current": current_position,
                        "change": current_position - avg_position
                    })

        return alerts

    def detect_competitor_changes(self, historical_data, current_data):
        """
        Detect competitor stockouts, price changes, etc.
        """
        alerts = []

        for competitor in current_data:
            asin = competitor["asin"]

            # Check for stockouts
            if not competitor["in_stock"]:
                alerts.append({
                    "type": "competitor_stockout",
                    "asin": asin,
                    "message": f"Competitor {asin} is out of stock"
                })

            # Check for significant price changes
            hist_prices = historical_data[
                historical_data["asin"] == asin
            ]["price"].values

            if len(hist_prices) > 0:
                avg_price = np.mean(hist_prices)
                current_price = competitor["price"]

                if current_price is not None and current_price < avg_price * (1 - self.threshold):
                    alerts.append({
                        "type": "competitor_price_drop",
                        "asin": asin,
                        "previous_avg": avg_price,
                        "current": current_price,
                        "change_pct": (avg_price - current_price) / avg_price * 100
                    })

        return alerts

# Usage
detector = TrafficAnomalyDetector(threshold=0.2)

# Load historical data
historical_rankings = pd.read_csv("historical_rankings.csv")
historical_competitors = pd.read_csv("historical_competitors.csv")

# Check for anomalies
# (current_rankings is a {keyword: position} dict from today's collection run;
#  current_competitor_data is the list returned by track_competitors())
ranking_alerts = detector.detect_ranking_changes(
    historical_rankings,
    current_rankings
)

competitor_alerts = detector.detect_competitor_changes(
    historical_competitors,
    current_competitor_data
)

# Send notifications
for alert in ranking_alerts + competitor_alerts:
    send_notification(alert)  # Email, Slack, etc.
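
The loop above assumes a send_notification helper. Here's one minimal implementation that posts alerts to a Slack incoming webhook (the webhook URL and message format are assumptions; swap in email or any other channel you prefer):

import os
import requests

def send_notification(alert):
    """Post an alert dict to a Slack incoming webhook."""
    webhook_url = os.environ["SLACK_WEBHOOK_URL"]  # create one in your Slack workspace
    details = ", ".join(f"{k}={v}" for k, v in alert.items() if k != "type")
    requests.post(webhook_url, json={"text": f"[{alert['type']}] {details}"}, timeout=10)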

Step 4: Traffic Source Attribution Logic

Now combine all data sources to attribute traffic:

def attribute_traffic_source(
    sales_change,
    ranking_changes,
    competitor_changes,
    ad_data
):
    """
    Determine most likely traffic source for sales changes
    """
    attribution = {
        "primary_source": None,
        "contributing_factors": [],
        "confidence": 0
    }

    # Check advertising data first
    ad_order_change = ad_data["current_orders"] - ad_data["avg_orders"]

    if sales_change > 0 and ad_order_change / sales_change > 0.7:
        # 70%+ of the sales increase came from ads
        attribution["primary_source"] = "paid_advertising"
        attribution["confidence"] = 0.9
        return attribution

    # Check for ranking improvements
    significant_ranking_improvements = [
        r for r in ranking_changes 
        if r["type"] == "ranking_improvement" and r["change"] > 5
    ]

    if significant_ranking_improvements:
        attribution["primary_source"] = "organic_ranking_improvement"
        attribution["contributing_factors"] = [
            f"{r['keyword']}: {r['previous_avg']}{r['current']}"
            for r in significant_ranking_improvements
        ]
        attribution["confidence"] = 0.85
        return attribution

    # Check for competitor stockouts
    stockouts = [
        c for c in competitor_changes 
        if c["type"] == "competitor_stockout"
    ]

    if stockouts:
        attribution["primary_source"] = "competitor_stockout"
        attribution["contributing_factors"] = [
            c["asin"] for c in stockouts
        ]
        attribution["confidence"] = 0.75
        return attribution

    # If no clear source, likely external traffic
    attribution["primary_source"] = "external_or_unknown"
    attribution["confidence"] = 0.5

    return attribution
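
Here's how the pieces fit together for one day's data (the numbers are illustrative):

attribution = attribute_traffic_source(
    sales_change=20,                                 # +20 orders/day vs. baseline
    ranking_changes=ranking_alerts,                  # from TrafficAnomalyDetector
    competitor_changes=competitor_alerts,
    ad_data={"current_orders": 6, "avg_orders": 5}   # from the Amazon Ads API
)
print(attribution["primary_source"], attribution["confidence"])
# e.g. "organic_ranking_improvement 0.85"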

Real-World Case Study

Let's return to the silicone baking mat example. Here's what the data revealed:

Data collected:

  • Keyword rankings for 10 core keywords (daily)
  • Competitor data for 5 main competitors (daily)
  • Advertising data from Amazon Ads API

Analysis results:

# Ranking change detected
{
    "keyword": "silicone baking mat",
    "previous_avg_position": 23,
    "current_position": 8,
    "change": +15 positions,
    "date": "2026-01-23"
}

# Competitor change detected
{
    "competitor_asin": "B07ABC1234",
    "status": "out_of_stock",
    "previous_position": 5,
    "date": "2026-01-22"
}

# Attribution result
{
    "primary_source": "organic_ranking_improvement",
    "contributing_factors": [
        "Ranking jump for 'silicone baking mat' from #23 to #8",
        "Competitor B07ABC1234 stockout"
    ],
    "confidence": 0.9
}

Root cause: A Facebook promotion one week earlier generated 50 concentrated orders, improving sales velocity and conversion rate, which triggered Amazon's algorithm to boost organic rankings.

Action taken:

  1. Increased ad budget to consolidate ranking position
  2. Accelerated inventory replenishment
  3. Result: sustained 30+ daily orders (double the original 15/day baseline)

Why Pangolinfo for Amazon Data Collection

After testing multiple solutions, here's why Pangolinfo Scrape API stands out:

1. High Accuracy for Sponsored Ads

  • 98% success rate for SP ad position scraping
  • Critical for understanding competitive advertising landscape

2. Real-Time Data

  • No estimation models—actual scraped data
  • Minute-level updates available

3. Comprehensive Coverage

  • Search results (organic + sponsored)
  • Product details
  • Best Sellers rankings
  • Reviews
  • Competitor data

4. Developer-Friendly

  • RESTful API with clear documentation
  • Multiple output formats (JSON, CSV, HTML)
  • Webhook support for real-time alerts (see the receiver sketch below)

5. Flexible Pricing

  • Pay per request (no unused features)
  • Scales with your needs
  • Much more cost-effective than $3,588/year SaaS tools

Check out the API documentation for implementation details.
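
If you use the webhook option, you'll need an endpoint to receive the callbacks. A generic receiver might look like this (the payload shape here is a placeholder; consult the API documentation for the actual format):

from flask import Flask, request

app = Flask(__name__)

@app.route("/pangolin-webhook", methods=["POST"])
def handle_webhook():
    payload = request.get_json(force=True)  # actual schema: see the API docs
    # Hand the payload to your storage layer or anomaly detector here
    print("Received webhook:", payload)
    return "", 204

if __name__ == "__main__":
    app.run(port=8000)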


Best Practices

1. Start Simple, Scale Gradually

  • Begin with 5-10 core keywords
  • Add more metrics as you understand patterns

2. Automate Everything

  • Manual data collection doesn't scale
  • Set up scheduled jobs from day one

3. Focus on Actionable Metrics

  • Don't just collect data—define what actions each metric triggers
  • Example: Ranking drop > 10 positions → increase ad budget (see the sketch after this list)

4. Combine Multiple Data Sources

  • Amazon Ads API for advertising data
  • Pangolinfo for organic rankings and competitor data
  • Amazon Attribution for external traffic

5. Build Historical Context

  • Traffic analysis requires time-series data
  • Collect data for at least 2-4 weeks before drawing conclusions
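
To make item 3 concrete, here's a minimal sketch of a metric-to-action mapping (the rules and thresholds are illustrative; tune them to your category):

# Map each alert type from the anomaly detector to a concrete follow-up action
ACTION_RULES = {
    "ranking_drop": lambda a: (
        f"Increase ad budget to defend '{a['keyword']}'" if a["change"] > 10 else None
    ),
    "competitor_stockout": lambda a: f"Raise bids on keywords shared with {a['asin']}",
    "competitor_price_drop": lambda a: f"Review pricing against {a['asin']}",
}

def actions_for(alerts):
    """Yield a human-readable action for each alert that matches a rule."""
    for alert in alerts:
        rule = ACTION_RULES.get(alert["type"])
        action = rule(alert) if rule else None
        if action:
            yield action

for action in actions_for(ranking_alerts + competitor_alerts):
    print(action)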

Conclusion

Amazon may not provide transparent traffic attribution, but your listing's traffic doesn't have to stay a black box. By building a systematic monitoring system with automated data collection, anomaly detection, and attribution logic, you can:

  • Quickly identify traffic source changes
  • Respond to opportunities (ranking improvements, competitor stockouts)
  • Optimize advertising spend based on actual data
  • Make data-driven decisions instead of guessing

The technical implementation is straightforward—the real value comes from consistent execution and acting on insights.


Discussion

Have you built your own Amazon traffic monitoring system? What challenges did you face? Share your experiences in the comments!

Tags: #api #python #ecommerce #amazon #dataanalysis #automation
