How to Use Ecommerce Product Datasets for Competitor Pricing Analysis at Scale

#data #analytics

In Southeast Asia’s ecommerce market, monitoring competitor pricing on Shopee, Lazada, and TikTok Shop demands stable, scalable data infrastructure. Rather than relying on fragile crawlers that often break, businesses now use standardized ecommerce product datasets to run pricing analysis across millions of SKUs.

In this article, Easy Data shows how to design a production-ready data schema and apply Python-based processing to turn ecommerce datasets into actionable pricing intelligence.

What Should a Standard Ecommerce Product Dataset for Pricing Analysis Include?

When working with ecommerce product datasets from Southeast Asian marketplaces, technical teams often run into one major obstacle: inconsistent, unstructured text data.

For dynamic pricing engines and machine learning models to consume this data efficiently, it must be standardized into a fixed schema and cleaned at the extraction layer of the pipeline.
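A minimal sketch of that cleaning step is below. It assumes hypothetical raw formats such as "฿1,299.00" for prices and "1.2k sold" for sales counts; real marketplace formats differ by site and locale. The helper names `parse_price` and `parse_sold` are illustrative, not part of any library.

```python
import re

# Hypothetical raw formats, for illustration only: prices like "฿1,299.00"
# (comma thousands separator) or "₫250.000" (dot thousands separator),
# and sold counts like "1.2k sold" or "350 sold".
_SOLD_MULTIPLIERS = {"": 1, "k": 1_000, "m": 1_000_000}


def parse_price(raw, thousands_sep=","):
    """Return the first number in a price string as a float, or None."""
    if raw is None:
        return None
    # Drop the locale's thousands separator; currency symbols are skipped by the regex.
    text = str(raw).replace(thousands_sep, "")
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


def parse_sold(raw):
    """Turn strings such as '1.2k sold' into an integer unit count, or None."""
    if raw is None:
        return None
    # Number, optional whitespace, optional k/M suffix ending at a word boundary.
    match = re.search(r"(\d+(?:\.\d+)?)\s*([kKmM]?)\b", str(raw))
    if not match:
        return None
    value = float(match.group(1))
    return int(round(value * _SOLD_MULTIPLIERS[match.group(2).lower()]))
```

Applied to a Pandas column, this becomes for example `df["price"] = df["price"].map(parse_price)`.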

Below is the standardized data schema that Easy Data has optimized and deployed across large-scale ecommerce data pipelines in Singapore, Thailand, and Vietnam.

Field Name	Data Type	Functional Description	Technical Utility for Pricing Engines
`category`	VARCHAR	Multi-level product taxonomy (e.g. Beauty > Skin Care > Sunscreen)	Enables clustering and accurate benchmark pricing segmentation
`price`	NUMERIC	Actual selling price after visible vouchers and discounts	Core variable for realtime Price Index calculations
`price_before_discount`	NUMERIC	Original listed price before promotions	Detects anchor pricing strategies and competitor discount depth
`discountPercent`	INTEGER	Discount percentage displayed on the marketplace listing	Measures promotion frequency and average markdown intensity
`price_min`	NUMERIC	Lowest recorded price of the SKU within a time window	Detects price floors and abnormal pricing glitches
`price_max`	NUMERIC	Highest recorded price of the SKU within a time window	Identifies pricing ceilings and market fluctuation ranges
`historySold`	BIGINT	Cumulative sales volume since SKU creation	Used as a weighting factor for revenue share and price elasticity analysis

Using a unified schema like this eliminates noisy unstructured fields and turns every row of an ecommerce product dataset into a consistent, typed record that enterprise BI platforms such as Power BI and Tableau can consume directly.
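One way to enforce this schema before the data reaches a BI tool is sketched below with Pandas. The dtype choices are assumptions: NUMERIC is approximated as `float64` (exact currency arithmetic would need `decimal.Decimal`), and the nullable `Int64` type is used for integer fields. The `enforce_schema` name and the two check columns it adds are illustrative.

```python
import pandas as pd

# Target dtypes mirroring the schema table; NUMERIC is approximated as float64
# (an assumption: use decimal.Decimal if exact currency arithmetic matters).
SCHEMA = {
    "category": "string",
    "price": "float64",
    "price_before_discount": "float64",
    "discountPercent": "Int64",
    "price_min": "float64",
    "price_max": "float64",
    "historySold": "Int64",
}


def enforce_schema(df):
    """Keep schema columns, coerce dtypes, and flag rows breaking basic invariants."""
    out = df[list(SCHEMA)].copy()  # raises KeyError if a schema column is missing
    for col, dtype in SCHEMA.items():
        if dtype == "string":
            out[col] = out[col].astype("string")
        elif dtype == "Int64":
            # Unparseable values become <NA>; rounding avoids unsafe float->int casts.
            out[col] = pd.to_numeric(out[col], errors="coerce").round().astype("Int64")
        else:
            out[col] = pd.to_numeric(out[col], errors="coerce").astype("float64")

    # NaN comparisons evaluate to False, so rows with missing values are flagged too.
    out["is_consistent"] = (
        (out["price"] > 0)
        & (out["price"] <= out["price_before_discount"])
        & (out["price_min"] <= out["price_max"])
    )
    # Gap between the displayed discountPercent and the one implied by the prices.
    implied = 100 * (1 - out["price"] / out["price_before_discount"])
    out["discount_gap"] = (implied - out["discountPercent"].astype("float64")).abs()
    return out
```

A large `discount_gap` is a cue to inspect the listing, for example an inflated `price_before_discount` used as an anchor.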

How Businesses Use Ecommerce Product Datasets for Competitor Pricing Analysis

Transforming a raw ecommerce product dataset into actionable insight requires much more than simply looking at competitor prices. Market research analysts typically combine pricing data with statistical modeling and pricing algorithms to uncover the hidden strategies behind competitor behavior.

Dynamic Price Positioning Matrix

Businesses cannot build an effective pricing strategy without understanding where they sit within the broader pricing landscape of their category. By extracting fields such as price, price_min, and price_max, the system can automatically calculate a product’s Price Index (PI) against the median market price of competitors within the same category.

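Price Index Formula:

$$\mathrm{PI} = \frac{\text{price}_{\text{own}}}{\operatorname{median}\left(\text{price}_{\text{competitors, same category}}\right)} \times 100$$

A PI of 100 means the product is priced exactly at the category median.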

Applying ecommerce product datasets to dynamic price positioning analysis allows businesses to automate pricing thresholds and trigger strategic pricing reactions:

PI > 110 (Premium Segment): Products are priced more than 10% above the market median → The system flags opportunities to improve brand positioning, packaging value, or bundled offers.
90 ≤ PI ≤ 110 (Parity Segment): Pricing is within ±10% of the market median → Maintain pricing while optimizing advertising efficiency through vouchers and keyword bidding.
PI < 90 (Penetration Segment): Products are priced more than 10% below the market median, typical of aggressive market-penetration pricing → Evaluate whether margins are being eroded and adjust pricing accordingly.
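A minimal sketch of the PI calculation and segment thresholds follows. It assumes the dataset mixes your own listings with competitor listings and carries a boolean `is_own` column; that column is hypothetical and not part of the schema above. `np.select` is used instead of `pd.cut` because the parity band is closed on both ends (90 ≤ PI ≤ 110), which a single set of `pd.cut` bins cannot express.

```python
import numpy as np
import pandas as pd


def price_index(df):
    """Return own listings with market median, PI, and positioning segment."""
    # `is_own` is a hypothetical boolean flag separating own SKUs from competitors.
    competitors = df[~df["is_own"]]
    market_median = competitors.groupby("category")["price"].median()

    own = df[df["is_own"]].copy()
    own["market_median"] = own["category"].map(market_median)
    own["PI"] = own["price"] / own["market_median"] * 100

    # Conditions are checked in order; categories without competitors stay "Unknown".
    own["segment"] = np.select(
        [own["PI"] > 110, own["PI"] >= 90, own["PI"] < 90],
        ["Premium", "Parity", "Penetration"],
        default="Unknown",
    )
    return own
```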

Price Elasticity & Sales Velocity Modeling

This model helps brands answer one of the most important profitability questions: if a product's price is cut, will order volume rise enough to offset the lower margin per unit and increase total profit?

When feeding an ecommerce product dataset into regression models, data scientists typically pair changes in the price variable with sales velocity over time (the change in historySold between snapshots) to estimate the price elasticity of demand (ϵ): the percentage change in quantity sold divided by the percentage change in price.
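A minimal sketch of that regression is below, assuming periodic snapshots with hypothetical `sku_id` and `snapshot_date` columns and treating the price at each snapshot as the price in effect since the previous one. It fits the constant-elasticity model ln Q = α + ϵ ln P with `numpy.polyfit`, so the fitted slope is ϵ. Zero-sales intervals are dropped because their log is undefined, which biases the estimate, and pooling SKUs without controls for seasonality, ads or competitor moves is a simplification. `profit_ratio` then answers the profitability question above under the same constant-elasticity assumption.

```python
import numpy as np
import pandas as pd


def estimate_elasticity(snapshots):
    """Estimate elasticity as the slope of ln(velocity) on ln(price)."""
    snaps = snapshots.sort_values(["sku_id", "snapshot_date"]).copy()
    # Units sold between consecutive snapshots of the same SKU.
    snaps["velocity"] = snaps.groupby("sku_id")["historySold"].diff()
    # Log is undefined for zero or missing sales, so those intervals are dropped.
    usable = snaps[(snaps["velocity"] > 0) & (snaps["price"] > 0)]
    slope, _intercept = np.polyfit(
        np.log(usable["price"]), np.log(usable["velocity"]), deg=1
    )
    return float(slope)


def profit_ratio(price, new_price, unit_cost, elasticity):
    """New profit divided by old profit, assuming volume scales as price**elasticity."""
    volume_ratio = (new_price / price) ** elasticity
    return (new_price - unit_cost) * volume_ratio / (price - unit_cost)
```

For example, cutting a 100 price to 90 with a unit cost of 60 lowers profit when ϵ = -2 (ratio 0.75 / 0.81 ≈ 0.93) but raises it when ϵ = -5.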

The system continuously scans competitor catalogs to identify sales spike patterns. This lets businesses detect the discountPercent levels competitors commonly use to clear inventory or gain market share without damaging average basket value.
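One way to sketch the spike scan, assuming the same hypothetical `sku_id` / `snapshot_date` snapshot layout: an interval counts as a spike when its sales velocity exceeds `multiplier` times the median of that SKU's previous `window` intervals, and the function tallies the discountPercent values seen during spikes. The window, multiplier and `min_periods=3` are arbitrary illustrative defaults.

```python
import pandas as pd


def discount_at_spikes(snapshots, window=7, multiplier=3.0):
    """Count discountPercent values observed during sales spikes."""
    snaps = snapshots.sort_values(["sku_id", "snapshot_date"]).copy()
    # Units sold between consecutive snapshots of the same SKU.
    snaps["velocity"] = snaps.groupby("sku_id")["historySold"].diff()
    # Baseline = median of the previous `window` intervals (current one excluded).
    snaps["baseline"] = snaps.groupby("sku_id")["velocity"].transform(
        lambda s: s.shift(1).rolling(window, min_periods=3).median()
    )
    # Rows without enough history have a NaN baseline and never count as spikes.
    spikes = snaps[snaps["velocity"] > multiplier * snaps["baseline"]]
    return spikes["discountPercent"].value_counts().sort_index()
```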

MAP Violation & Price Variance Analytics

For brands operating through multiple authorized distributors across Southeast Asian ecommerce marketplaces, unauthorized pricing below the minimum advertised price (MAP) can severely disrupt channel stability.

By monitoring ecommerce product datasets in real time, the analytics engine automatically compares each distributor’s price_min against the company’s official MAP policy. Deeper discount-layer analysis using discountPercent then helps the system distinguish between:

Intentional distributor discounting
Temporary marketplace-sponsored promotions (e.g. Shopee or Lazada voucher subsidies)

This helps brand management teams make better-targeted enforcement decisions.
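The exact rule for telling the two cases apart is not specified here, so the sketch below swaps in a simple, explicitly heuristic one: if most sellers of a SKU fall below MAP on the same day, the dip is labeled a likely marketplace-wide promotion; a lone seller below MAP is labeled a likely distributor discount. The `sku_id`, `seller_id` and `snapshot_date` columns, the `map_prices` dictionary and the 0.5 threshold are all assumptions. discountPercent is carried through in the output for manual review.

```python
import numpy as np
import pandas as pd


def classify_map_violations(listings, map_prices, campaign_share=0.5):
    """Flag rows whose price_min is below MAP and label a heuristic likely cause."""
    df = listings.copy()
    df["map_price"] = df["sku_id"].map(map_prices)
    below = df["price_min"] < df["map_price"]  # SKUs without a MAP never match
    # Share of the SKU's sellers that are below MAP on the same day.
    df["share_below"] = below.astype(float).groupby(
        [df["sku_id"], df["snapshot_date"]]
    ).transform("mean")
    violations = df[below].copy()
    # Heuristic: many sellers dipping together suggests a marketplace-funded
    # campaign; an isolated seller suggests a distributor's own discount.
    violations["likely_cause"] = np.where(
        violations["share_below"] >= campaign_share,
        "marketplace promotion",
        "distributor discount",
    )
    return violations
```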

Example: Processing Ecommerce Product Datasets Using Python from Easy Data

To accelerate implementation for data engineering teams, direct access to clean and production-ready processing logic is critical.

Below is a sample Python workflow using Pandas to clean, standardize, and extract the Top 10 products with the deepest discount intensity combined with high sales volume from a real ecommerce product dataset.

import pandas as pd

# Load ecommerce product dataset
# Example: Shopee Thailand category dataset
df = pd.read_csv("shopee_thailand_ecommerce_product_dataset.csv")

# 1. Data preprocessing layer
# Remove invalid or noisy records
df = df[(df['price'] > 0) & (df['historySold'] >= 0)]

# 2. Calculate absolute discount value
df['absolute_discount'] = (
    df['price_before_discount'] - df['price']
)

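# 3. Filter products within the "Face Mask" category
# `category` holds the full taxonomy path (e.g. "Beauty > Skin Care > ..."),
# so match the category name anywhere in the path instead of testing equality
# Sort by highest discount percentage, then highest sales volume
target_category_analysis = (
    df[df['category'].str.contains('Face Mask', regex=False, na=False)]
    .sort_values(
        by=['discountPercent', 'historySold'],
        ascending=[False, False]
    )
)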

# 4. Extract Top 10 competitor products
# following aggressive liquidation pricing strategies
top_10_competitors = target_category_analysis.head(10)

print(
    top_10_competitors[
        [
            'price',
            'price_before_discount',
            'absolute_discount',
            'discountPercent',
            'historySold'
        ]
    ]
)

On a standardized ecommerce product dataset, this workflow runs entirely as vectorized Pandas operations: the filters are a single O(N) pass, and the sort costs O(M log M) on the M rows left after filtering. At millions of records the practical limit is usually memory rather than CPU, because read_csv loads the whole file into RAM by default.
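For files too large to load comfortably, a lower-memory variant can be sketched as follows: read only the needed columns in chunks and keep a running top-n with `DataFrame.nlargest`, which avoids sorting the full filtered set. The function name `top_discounted` and the default chunk size are illustrative assumptions; the filters mirror the workflow above.

```python
import pandas as pd

COLUMNS = ["category", "price", "price_before_discount", "discountPercent", "historySold"]


def top_discounted(csv_path, category_keyword, n=10, chunksize=500_000):
    """Stream a CSV in chunks and keep a running top-n by discount, then sales."""
    best = None
    for chunk in pd.read_csv(csv_path, usecols=COLUMNS, chunksize=chunksize):
        # Same validity filter and category match as the workflow above.
        chunk = chunk[(chunk["price"] > 0) & (chunk["historySold"] >= 0)]
        chunk = chunk[chunk["category"].str.contains(category_keyword, regex=False, na=False)]
        if chunk.empty:
            continue
        candidates = chunk if best is None else pd.concat([best, chunk])
        # Only n rows survive each step, so memory stays bounded by one chunk.
        best = candidates.nlargest(n, ["discountPercent", "historySold"])
    return best
```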

Final Thought

Building a real-time competitor pricing intelligence system at scale involves far more than writing crawlers or running scraping scripts. It requires stable ecommerce product dataset infrastructure whose pipelines adapt automatically to the constantly evolving anti-bot systems deployed by major Southeast Asian marketplaces.

To help businesses solve this infrastructure challenge, Easy Data provides standardized ecommerce product datasets tailored to each project’s preferred schema structure and update frequency.

Through a custom end-to-end ecommerce data scraping service, datasets are cleaned, normalized, and delivered directly into enterprise cloud storage or APIs. This lets data scientists and analysts focus on optimizing AI models and pricing strategies instead of maintaining fragile crawling infrastructure.