Robert N. Gutierrez
How to Build an AliExpress Price Tracker with Python and Playwright

AliExpress is a goldmine for competitive pricing, but those prices are notoriously volatile. Between "Choice Day" deals, flash sales, and dynamic pricing based on inventory, a product that costs $20 today might be $12 tomorrow or $30 next week. For dropshippers or market analysts, manual price checking is a losing game.

Automating this process is the only way to stay competitive. By building a custom price monitor, you can track competitor SKUs daily, identify genuine discounts versus "fake" sales, and build a historical dataset for better purchasing decisions.

This guide demonstrates how to build a production-ready price monitoring pipeline using Python, Playwright, and the open-source Aliexpress.com-Scrapers repository. We will move from raw data extraction to a fully automated system that visualizes price trends over time.

Prerequisites

To follow along, you’ll need a few things set up in your environment:

  • Python 3.8+ installed on your machine.
  • A ScrapeOps API Key: AliExpress uses aggressive anti-bot measures. We will use the ScrapeOps Proxy to bypass CAPTCHAs and blocks. You can get a free API key from the ScrapeOps website.
  • Basic Python Knowledge: You should be comfortable with lists, dictionaries, and installing packages via pip.

Setup: Cloning the Scraper Engine

Rather than writing a scraper from scratch, we can use the Aliexpress.com-Scrapers repository. This repo contains optimized selectors and logic for handling the complex AliExpress DOM.

First, clone the repository and navigate to the Playwright implementation:

```bash
git clone https://github.com/scraper-bank/Aliexpress.com-Scrapers.git
cd Aliexpress.com-Scrapers/python/playwright/product_data
```

Next, install the required dependencies:

```bash
pip install playwright playwright-stealth pandas matplotlib schedule
playwright install chromium
```

Step 1: Understanding the Scraper Logic

Before modifying the code, let's look at how the core engine works. Inside scraper/aliexpress_scraper_product_data_v1.py, there is a ScrapedData dataclass. This acts as the blueprint for the extracted information:

```python
@dataclass
class ScrapedData:
    name: str = ""
    price: Optional[Any] = None
    preDiscountPrice: Optional[Any] = None
    productId: str = ""
    url: str = ""
    # ... other fields
```

The script uses Playwright because AliExpress relies heavily on JavaScript to render prices. A simple requests call often returns empty tags, but Playwright loads the full browser environment. It also integrates with the ScrapeOps Proxy to rotate residential IPs, which prevents the monitor from being blacklisted during extended operation.
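The ScrapeOps Proxy works by accepting your target URL as a query parameter on its API endpoint. As a rough sketch (the endpoint and parameter names follow ScrapeOps' documented pattern, but verify them against your own dashboard; the API key is a placeholder):

```python
from urllib.parse import urlencode

SCRAPEOPS_API_KEY = "YOUR-API-KEY"  # placeholder -- use your own key

def scrapeops_url(target_url: str, render_js: bool = False) -> str:
    """Wrap a target URL so the request is routed through the ScrapeOps proxy."""
    params = {
        "api_key": SCRAPEOPS_API_KEY,
        "url": target_url,
    }
    if render_js:
        # Enable JS rendering on the proxy side for heavily protected pages
        params["render_js"] = "true"
    return "https://proxy.scrapeops.io/v1/?" + urlencode(params)

print(scrapeops_url("https://www.aliexpress.com/item/100500123.html"))
```

You would then pass the wrapped URL to `page.goto()` instead of the raw AliExpress URL.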

Step 2: Customizing for Price Monitoring

To turn this from a one-time scraper into a monitor, we need to add timestamps and persistent storage. We want to append every new price check to a single file rather than overwriting it.

Modify the ScrapedData class and the DataPipeline in a new file called monitor_engine.py:

```python
from datetime import datetime
from dataclasses import dataclass, asdict, field
import json

@dataclass
class ScrapedData:
    name: str = ""
    price: float = 0.0
    preDiscountPrice: float = 0.0
    productId: str = ""
    url: str = ""
    # Add a timestamp so we can plot data over time
    scraped_at: str = field(default_factory=lambda: datetime.now().isoformat())

class PriceHistoryPipeline:
    def __init__(self, filename="price_history.jsonl"):
        self.filename = filename

    def save(self, data: ScrapedData):
        with open(self.filename, "a", encoding="utf-8") as f:
            f.write(json.dumps(asdict(data)) + "\n")
```

Using JSONL (JSON Lines) format ensures the data is streamable. If you track 100 products for a year, the file will grow large. JSONL allows you to read it line-by-line without loading the entire history into memory.
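For example, reading the history back one record at a time (using the field names from the `ScrapedData` class above) might look like:

```python
import json

def iter_price_history(filename="price_history.jsonl"):
    """Yield one price record at a time without loading the whole file."""
    with open(filename, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)

# Example: find the lowest recorded price for a given product ID
def lowest_price(product_id: str, filename="price_history.jsonl"):
    prices = [r["price"] for r in iter_price_history(filename)
              if r["productId"] == product_id]
    return min(prices) if prices else None
```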

Step 3: Automating the Scan

Now we need a wrapper script to run the scraper against a specific list of competitor URLs at a set interval. This example uses the schedule library to run a scan every day at 9:00 AM.

Create a file named run_monitor.py:

```python
import schedule
import time
import asyncio
from datetime import datetime
from playwright.async_api import async_playwright
from scraper.aliexpress_scraper_product_data_v1 import extract_data
from monitor_engine import PriceHistoryPipeline  # the pipeline from Step 2

# The URLs you want to track
COMPETITOR_URLS = [
    "https://www.aliexpress.com/item/100500123.html",
    "https://www.aliexpress.com/item/100500456.html"
]

async def track_prices():
    print(f"Starting price check at {datetime.now()}")
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()

        pipeline = PriceHistoryPipeline()

        for url in COMPETITOR_URLS:
            try:
                # Use the extraction logic from the repo
                data = await extract_data(page, url)
                if data:
                    pipeline.save(data)
                    print(f"Tracked: {data.name[:30]}... - ${data.price}")
            except Exception as e:
                print(f"Error tracking {url}: {e}")

        await browser.close()

# Schedule the job
schedule.every().day.at("09:00").do(lambda: asyncio.run(track_prices()))

while True:
    schedule.run_pending()
    time.sleep(60)
```

Step 4: Visualizing the Data

Raw JSON files are useful for storage, but trends are easier to spot visually. Use Pandas to clean the data and Matplotlib to generate a price trend graph.

The scraper often returns prices as strings, such as "$12.50", so these must be cleaned before plotting.

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_trends():
    # Load the JSONL file
    df = pd.read_json('price_history.jsonl', lines=True)

    # Convert scraped_at to datetime objects
    df['scraped_at'] = pd.to_datetime(df['scraped_at'])

    # Ensure price is a float (remove currency symbols if present)
    if df['price'].dtype == object:
        df['price'] = df['price'].str.replace(r'[^\d.]', '', regex=True).astype(float)

    # Pivot data: Rows = Date, Columns = Product Name, Values = Price
    pivot_df = df.pivot(index='scraped_at', columns='name', values='price')

    # Plotting
    pivot_df.plot(figsize=(10, 6), marker='o')
    plt.title('AliExpress Competitor Price Trends')
    plt.ylabel('Price (USD)')
    plt.xlabel('Date')
    plt.grid(True)
    plt.legend(loc='upper left', prop={'size': 8})
    plt.savefig('price_trends.png')
    plt.show()

if __name__ == "__main__":
    plot_trends()
```

This script generates a multi-line chart where each line represents a product. This makes it easy to see exactly when a competitor lowered their price or when a "limited time" sale actually ended.

Common Obstacles

When monitoring AliExpress at scale, you will likely encounter these three issues:

  1. Selector Drift: AliExpress frequently updates its CSS classes, changing .product-price to something like .price--currentPrice--123. If the scraper returns None, inspect the page in your browser and update the selectors in extract_data.
  2. Currency Mismatches: Depending on the proxy location, AliExpress might show prices in EUR or GBP. Force the currency by appending ?cur=USD to your target URLs in COMPETITOR_URLS.
  3. Bot Detection: If failure rates increase, check the ScrapeOps dashboard. You may need to enable render_js: True in your proxy settings to handle advanced "Slide to Verify" CAPTCHAs.
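On the currency point, blindly appending `?cur=USD` breaks if a URL already carries a query string (AliExpress links often include tracking parameters like `spm`). A small helper (hypothetical name, using only the standard library) handles both cases:

```python
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse

def force_currency(url: str, currency: str = "USD") -> str:
    """Add or overwrite the `cur` query parameter on a product URL."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    query["cur"] = [currency]  # overwrite any existing value
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))

print(force_currency("https://www.aliexpress.com/item/100500123.html"))
```

You could then normalize your tracking list once at startup, e.g. `COMPETITOR_URLS = [force_currency(u) for u in COMPETITOR_URLS]`.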

To Wrap Up

Building a custom price tracker provides a data-driven edge in the e-commerce market. By combining the extraction logic of the Aliexpress.com-Scrapers repository with a simple automation loop, you can move from manual guesswork to automated intelligence.

Key Takeaways:

  • Playwright is necessary for handling AliExpress's dynamic JavaScript rendering.
  • JSONL is a reliable format for long-term time-series data storage.
  • Proxies are required to avoid persistent blocking; residential rotation is the recommended approach.

As a next step, consider adding a notification layer using smtplib or a Discord Webhook. Setting a threshold, such as "Alert me if Product A drops below $15," makes the monitor truly proactive. For more advanced scraping patterns, explore the other implementations in the ScrapeOps Scraper Bank.
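A minimal sketch of such a threshold check, reading the price_history.jsonl file produced earlier (the threshold values are placeholders, and `notify` is a stub you would replace with an smtplib or Discord webhook call):

```python
import json

# Hypothetical thresholds: product ID -> alert price
PRICE_THRESHOLDS = {
    "100500123": 15.00,
}

def notify(message: str):
    """Stub -- swap in smtplib or a Discord webhook POST here."""
    print(f"ALERT: {message}")

def check_thresholds(filename="price_history.jsonl"):
    latest = {}  # productId -> most recent record
    with open(filename, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            latest[record["productId"]] = record  # later lines overwrite earlier ones

    alerts = []
    for product_id, threshold in PRICE_THRESHOLDS.items():
        record = latest.get(product_id)
        if record and record["price"] < threshold:
            alerts.append(product_id)
            notify(f"{record['name']} dropped to ${record['price']} (below ${threshold})")
    return alerts
```

Running `check_thresholds()` after each scheduled scan turns the passive history log into an active alerting system.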
