AliExpress is a goldmine for competitive pricing, but those prices are notoriously volatile. Between "Choice Day" deals, flash sales, and dynamic pricing based on inventory, a product that costs $20 today might be $12 tomorrow or $30 next week. For dropshippers or market analysts, manual price checking is a losing game.
Automating this process is the only way to stay competitive. By building a custom price monitor, you can track competitor SKUs daily, identify genuine discounts versus "fake" sales, and build a historical dataset for better purchasing decisions.
This guide demonstrates how to build a production-ready price monitoring pipeline using Python, Playwright, and the open-source Aliexpress.com-Scrapers repository. We will move from raw data extraction to a fully automated system that visualizes price trends over time.
Prerequisites
To follow along, you’ll need a few things set up in your environment:
- Python 3.8+ installed on your machine.
- A ScrapeOps API Key: AliExpress uses aggressive anti-bot measures. We will use the ScrapeOps Proxy to bypass CAPTCHAs and blocks. You can get a free API key here.
- Basic Python Knowledge: You should be comfortable with lists, dictionaries, and installing packages via pip.
Setup: Cloning the Scraper Engine
Rather than writing a scraper from scratch, we can use the Aliexpress.com-Scrapers repository. This repo contains optimized selectors and logic for handling the complex AliExpress DOM.
First, clone the repository and navigate to the Playwright implementation:
```bash
git clone https://github.com/scraper-bank/Aliexpress.com-Scrapers.git
cd Aliexpress.com-Scrapers/python/playwright/product_data
```
Next, install the required dependencies:
```bash
pip install playwright playwright-stealth pandas matplotlib schedule
playwright install chromium
```
Step 1: Understanding the Scraper Logic
Before modifying the code, let's look at how the core engine works. Inside scraper/aliexpress_scraper_product_data_v1.py, there is a ScrapedData dataclass. This acts as the blueprint for the extracted information:
```python
@dataclass
class ScrapedData:
    name: str = ""
    price: Optional[Any] = None
    preDiscountPrice: Optional[Any] = None
    productId: str = ""
    url: str = ""
    # ... other fields
```
The script uses Playwright because AliExpress relies heavily on JavaScript to render prices. A simple requests call often returns empty tags, but Playwright loads the full browser environment. It also integrates with the ScrapeOps Proxy to rotate residential IPs, which prevents the monitor from being blacklisted during extended operation.
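To make the proxying concrete, here is a minimal sketch of how a target URL is typically wrapped in the ScrapeOps proxy endpoint. The `SCRAPEOPS_API_KEY` value is a placeholder, and the exact parameters the repository passes may differ; treat this as an illustration of the pattern rather than the repo's exact implementation:

```python
from urllib.parse import urlencode

SCRAPEOPS_API_KEY = "YOUR-API-KEY"  # placeholder -- substitute your own key

def proxy_url(target_url: str) -> str:
    """Wrap a target URL in the ScrapeOps proxy endpoint.

    Routing every request through the proxy rotates the outgoing IP,
    which keeps a long-running monitor from being blacklisted.
    """
    params = urlencode({"api_key": SCRAPEOPS_API_KEY, "url": target_url})
    return f"https://proxy.scrapeops.io/v1/?{params}"
```

Because the target URL is percent-encoded into a query parameter, the proxy sees exactly which page to fetch while AliExpress only ever sees the proxy's rotating IPs.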
Step 2: Customizing for Price Monitoring
To turn this from a one-time scraper into a monitor, we need to add timestamps and persistent storage. We want to append every new price check to a single file rather than overwriting it.
Modify the ScrapedData class and the DataPipeline in a new file called monitor_engine.py:
```python
from datetime import datetime
from dataclasses import dataclass, asdict, field
import json

@dataclass
class ScrapedData:
    name: str = ""
    price: float = 0.0
    preDiscountPrice: float = 0.0
    productId: str = ""
    url: str = ""
    # Add a timestamp so we can plot data over time
    scraped_at: str = field(default_factory=lambda: datetime.now().isoformat())

class PriceHistoryPipeline:
    def __init__(self, filename="price_history.jsonl"):
        self.filename = filename

    def save(self, data: ScrapedData):
        with open(self.filename, "a", encoding="utf-8") as f:
            f.write(json.dumps(asdict(data)) + "\n")
```
Using JSONL (JSON Lines) format ensures the data is streamable. If you track 100 products for a year, the file will grow large. JSONL allows you to read it line-by-line without loading the entire history into memory.
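To illustrate the streaming benefit, here is a small reader (a hypothetical helper, not part of the repository) that walks the history one record at a time and answers a simple question without ever holding the full file in memory:

```python
import json

def iter_history(filename="price_history.jsonl"):
    """Yield one price record at a time without loading the whole file."""
    with open(filename, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)

def lowest_price(product_id, filename="price_history.jsonl"):
    """Scan the full history for a product's lowest recorded price."""
    prices = (rec["price"] for rec in iter_history(filename)
              if rec["productId"] == product_id)
    return min(prices, default=None)
```

Because the generator yields lazily, this scales to years of history for hundreds of products with constant memory use.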
Step 3: Automating the Scan
Now we need a wrapper script to run the scraper against a specific list of competitor URLs at a set interval. This example uses the schedule library to run a scan every day at 9:00 AM.
Create a file named run_monitor.py:
```python
import schedule
import time
import asyncio
from datetime import datetime

from playwright.async_api import async_playwright
from scraper.aliexpress_scraper_product_data_v1 import extract_data
from monitor_engine import PriceHistoryPipeline

# The URLs you want to track
COMPETITOR_URLS = [
    "https://www.aliexpress.com/item/100500123.html",
    "https://www.aliexpress.com/item/100500456.html",
]

async def track_prices():
    print(f"Starting price check at {datetime.now()}")
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        pipeline = PriceHistoryPipeline()

        for url in COMPETITOR_URLS:
            try:
                # Use the extraction logic from the repo
                data = await extract_data(page, url)
                if data:
                    pipeline.save(data)
                    print(f"Tracked: {data.name[:30]}... - ${data.price}")
            except Exception as e:
                print(f"Error tracking {url}: {e}")

        await browser.close()

# Schedule the job to run daily at 9:00 AM
schedule.every().day.at("09:00").do(lambda: asyncio.run(track_prices()))

while True:
    schedule.run_pending()
    time.sleep(60)
```
Step 4: Visualizing the Data
Raw JSON files are useful for storage, but trends are easier to spot visually. Use Pandas to clean the data and Matplotlib to generate a price trend graph.
The scraper often returns prices as strings such as "$12.50", so these must be converted to floats before plotting.
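Scraped price strings vary by locale ("US $12.50", "€10,99"), so a dedicated parser is safer than an ad-hoc cast. This helper is a hypothetical sketch you could reuse before plotting; it assumes decimal commas only appear when no dot is present:

```python
import re

def parse_price(raw) -> float:
    """Convert a scraped price ('US $12.50', '€10,99', 12.5) to a float."""
    if isinstance(raw, (int, float)):
        return float(raw)
    cleaned = re.sub(r"[^\d.,]", "", raw)        # drop symbols and letters
    if "," in cleaned and "." not in cleaned:    # treat '10,99' as a decimal comma
        cleaned = cleaned.replace(",", ".")
    return float(cleaned.replace(",", ""))       # drop thousands separators
```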
```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_trends():
    # Load the JSONL file
    df = pd.read_json('price_history.jsonl', lines=True)

    # Convert scraped_at to datetime objects
    df['scraped_at'] = pd.to_datetime(df['scraped_at'])

    # Ensure price is a float (remove currency symbols if present)
    if df['price'].dtype == object:
        df['price'] = df['price'].str.replace(r'[^\d.]', '', regex=True).astype(float)

    # Pivot data: Rows = Date, Columns = Product Name, Values = Price
    pivot_df = df.pivot(index='scraped_at', columns='name', values='price')

    # Plotting
    pivot_df.plot(figsize=(10, 6), marker='o')
    plt.title('AliExpress Competitor Price Trends')
    plt.ylabel('Price (USD)')
    plt.xlabel('Date')
    plt.grid(True)
    plt.legend(loc='upper left', prop={'size': 8})
    plt.savefig('price_trends.png')
    plt.show()

if __name__ == "__main__":
    plot_trends()
```
This script generates a multi-line chart where each line represents a product. This makes it easy to see exactly when a competitor lowered their price or when a "limited time" sale actually ended.
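The same history can also drive alerts programmatically. As a rough sketch (a hypothetical helper, assuming the JSONL fields defined earlier and that records are appended in chronological order, which the pipeline guarantees), this flags any product whose latest price dropped sharply from the previous reading:

```python
import json
from collections import defaultdict

def detect_drops(filename="price_history.jsonl", threshold_pct=10.0):
    """Return products whose latest price is at least threshold_pct
    below the previous recorded price, as {name: (previous, latest)}."""
    history = defaultdict(list)
    with open(filename, encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            history[rec["name"]].append(float(rec["price"]))

    drops = {}
    for name, prices in history.items():
        if len(prices) < 2:
            continue  # need at least two readings to compare
        prev, latest = prices[-2], prices[-1]
        if prev > 0 and (prev - latest) / prev * 100 >= threshold_pct:
            drops[name] = (prev, latest)
    return drops
```

Comparing only consecutive readings keeps the check cheap; comparing against a rolling average instead would smooth out one-day "flash sale" noise.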
Common Obstacles
When monitoring AliExpress at scale, you will likely encounter these three issues:
- Selector Drift: AliExpress frequently updates its CSS classes, changing `.product-price` to something like `.price--currentPrice--123`. If the scraper returns `None`, inspect the page in your browser and update the selectors in `extract_data`.
- Currency Mismatches: Depending on the proxy location, AliExpress might show prices in EUR or GBP. Force the currency by appending `?cur=USD` to your target URLs in `COMPETITOR_URLS`.
- Bot Detection: If failure rates increase, check the ScrapeOps dashboard. You may need to enable `render_js: True` in your proxy settings to handle advanced "Slide to Verify" CAPTCHAs.
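For the currency mismatch, naively concatenating `?cur=USD` breaks any URL that already carries a query string (you would end up with two `?` characters). A small standard-library helper (a hypothetical utility, not part of the repo) handles both cases:

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

def force_usd(url: str) -> str:
    """Append or overwrite cur=USD on a product URL, preserving
    any query parameters that are already present."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query["cur"] = "USD"  # overwrite if already set
    return urlunparse(parts._replace(query=urlencode(query)))
```

Mapping `force_usd` over `COMPETITOR_URLS` once at startup keeps every scan in a single currency, which also keeps the trend chart's y-axis honest.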
To Wrap Up
Building a custom price tracker provides a data-driven edge in the e-commerce market. By combining the extraction logic of the Aliexpress.com-Scrapers repository with a simple automation loop, you can move from manual guesswork to automated intelligence.
Key Takeaways:
- Playwright is necessary for handling AliExpress's dynamic JavaScript rendering.
- JSONL is a reliable format for long-term time-series data storage.
- Proxies are required to avoid persistent blocking; residential rotation is the recommended approach.
As a next step, consider adding a notification layer using smtplib or a Discord Webhook. Setting a threshold, such as "Alert me if Product A drops below $15," makes the monitor truly proactive. For more advanced scraping patterns, explore the other implementations in the ScrapeOps Scraper Bank.
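A minimal sketch of that notification layer might look like the following. The webhook URL is a placeholder you would replace with one generated in your Discord server settings, and `breached`/`send_alert` are hypothetical helper names:

```python
import json
from urllib.request import Request, urlopen

WEBHOOK_URL = "https://discord.com/api/webhooks/XXXX/YYYY"  # placeholder

def breached(price: float, threshold: float) -> bool:
    """True when a tracked price has crossed below the alert threshold."""
    return price < threshold

def send_alert(product: str, price: float, threshold: float):
    """Post a price alert message to a Discord channel via webhook."""
    msg = f"{product} dropped to ${price:.2f} (threshold ${threshold:.2f})"
    req = Request(
        WEBHOOK_URL,
        data=json.dumps({"content": msg}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urlopen(req)

# Usage (once WEBHOOK_URL is set):
# if breached(14.20, 15.00):
#     send_alert("Product A", 14.20, 15.00)
```

Calling `breached` inside `track_prices` right after `pipeline.save(data)` turns the daily scan into a same-day alerting system.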