Last Black Friday, I watched my mom excitedly show me a "70% off" gaming laptop deal. The original price? $1,299. Sale price? $899. Seemed legit until I checked the price history—that laptop had been $899 for the past 6 months. The "original price" was completely fabricated.
That moment sparked something. At Avluz.com, we track prices across 10,000+ products from Amazon, eBay, and Walmart. We had the data. We had the problem. We just needed to build something that could catch these scams automatically.
Thirty days later, our AI-powered fake deal detector had flagged 2,347 suspicious "deals" and saved our users an estimated $47,000 in avoided bad purchases.
Here's exactly how we built it, including the mistakes that almost derailed the entire project.
## The $5,000 Mistake That Taught Us Everything
Our first attempt was a disaster. I spent three weeks building a rule-based system with hardcoded thresholds:
```python
# DON'T DO THIS
def is_fake_deal(current_price, original_price, avg_price):
    discount = (original_price - current_price) / original_price
    if discount > 0.5:  # More than 50% off? Suspicious!
        return True
    if original_price > avg_price * 1.3:  # Inflated by 30%? Flag it!
        return True
    return False
```
The problem? E-commerce pricing is way more nuanced than simple rules can handle. We had:
- 67% false positives (flagging legitimate deals)
- 8.5 seconds average processing time
- Angry users complaining about missed deals
- One very frustrated engineer (me)
I was ready to scrap the whole thing until our data scientist suggested: "What if we let the machine figure out the patterns?"
That conversation changed everything.
## Why AI Actually Makes Sense Here
Most blog posts jump straight to "use AI!" without explaining why. Here's the reality: fake deals aren't just about simple math. Retailers employ sophisticated pricing psychology:
- Pre-inflation strategy: Raise prices 2-3 weeks before a sale
- Anchor pricing: Show an inflated "compare at" price
- Flash sale tactics: Create urgency with fake scarcity
- Cross-platform games: Different "original prices" on different sites
- Dynamic pricing: Constant micro-adjustments that hide patterns
Traditional rules can't adapt to these evolving tactics. Machine learning can identify patterns we humans would never spot—like how certain sellers always inflate prices exactly 47% before Prime Day, or how "limited time" deals repeat every 12 days.
This approach now powers our real-time price comparison engine at Avluz.com, processing 2.4 million price checks daily.
## The Architecture: How It Actually Works
Our system has five main components:
### 1. Price Scraper (The Data Collector)
```python
import asyncio
import json
import time

from aiohttp import ClientSession
from bs4 import BeautifulSoup

class PriceScraper:
    def __init__(self, redis_client):
        self.redis = redis_client
        self.session = None

    async def scrape_product(self, url, retailer):
        """Async scraping with rate limiting"""
        # Create the session lazily, inside the running event loop
        if self.session is None:
            self.session = ClientSession()
        # Check Redis cache first (60s TTL)
        cached = await self.redis.get(f"price:{url}")
        if cached:
            return json.loads(cached)
        async with self.session.get(url) as response:
            html = await response.text()
        price_data = self.parse_price(html, retailer)
        # Cache for the next request
        await self.redis.setex(f"price:{url}", 60, json.dumps(price_data))
        return price_data

    def parse_price(self, html, retailer):
        """Retailer-specific parsing logic"""
        soup = BeautifulSoup(html, 'lxml')
        current = original = None
        if retailer == 'amazon':
            current = soup.select_one('.a-price-whole')
            original = soup.select_one('.a-text-price')
        elif retailer == 'walmart':
            current = soup.select_one('[itemprop="price"]')
            original = soup.select_one('.was-price')
        return {
            'current_price': self.clean_price(current),
            'original_price': self.clean_price(original),
            'timestamp': time.time(),
        }

    @staticmethod
    def clean_price(tag):
        """Strip currency formatting from a parsed tag; None if not found"""
        if tag is None:
            return None
        return float(tag.get_text().replace('$', '').replace(',', '').strip())
```
Key lessons:
- Use Redis caching to avoid hammering retailer APIs
- Async/await for parallel scraping (went from 45s to 2.3s for 100 products)
- Retailer-specific parsers (each site has different HTML structures)
### 2. Historical Data Store (MongoDB)
We store 90 days of price history for every product. The schema is surprisingly simple:
```python
{
    '_id': ObjectId('...'),
    'asin': 'B08N5WRWNW',  # Amazon Standard ID
    'retailer': 'amazon',
    'price_history': [
        {'price': 149.99, 'date': '2025-11-01', 'on_sale': False},
        {'price': 149.99, 'date': '2025-11-02', 'on_sale': False},
        {'price': 299.99, 'date': '2025-11-24', 'on_sale': False},
        {'price': 149.99, 'date': '2025-11-25', 'on_sale': True, 'claimed_original': 299.99},
    ],
    'stats': {
        'avg_price': 153.47,
        'min_price': 142.00,
        'max_price': 299.99,  # Suspicious spike!
        'std_dev': 8.32,
    },
}
```
The `claimed_original` field is crucial—it lets us compare what retailers claim the "original price" was versus what we actually observed.
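For illustration, the cached `stats` block can be recomputed from raw history entries in a few lines of Python. This is a hedged sketch (the `compute_stats` name is mine, not from our codebase), using the field names from the schema above:

```python
import statistics

def compute_stats(price_history):
    """Recompute the cached stats block from raw history entries."""
    prices = [p["price"] for p in price_history]
    return {
        "avg_price": round(sum(prices) / len(prices), 2),
        "min_price": min(prices),
        "max_price": max(prices),
        "std_dev": round(statistics.pstdev(prices), 2),
    }

history = [
    {"price": 149.99, "date": "2025-11-01", "on_sale": False},
    {"price": 149.99, "date": "2025-11-02", "on_sale": False},
    {"price": 299.99, "date": "2025-11-24", "on_sale": False},
]
print(compute_stats(history))
```

In production you'd update these incrementally (or with a MongoDB aggregation pipeline) rather than rescanning 90 days on every write.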
### 3. Feature Engineering (The Secret Sauce)
This is where most tutorials stop, but feature engineering is where the magic happens. Here's what we feed into the model:
```python
import numpy as np

def calculate_discount(original, current):
    """Fraction saved relative to the original price"""
    if not original:
        return 0.0
    return (original - current) / original

def extract_features(product_data):
    """Convert raw price data into ML features"""
    history = product_data['price_history']
    current = history[-1]  # assumes dates/timestamps already parsed to datetime
    features = {
        # Basic discount metrics
        'claimed_discount_pct': calculate_discount(
            current['claimed_original'], current['price']
        ),
        'true_discount_pct': calculate_discount(
            product_data['stats']['avg_price'], current['price']
        ),
        # Historical context (last 90 days)
        'price_volatility': product_data['stats']['std_dev'],
        'days_since_last_sale': days_since_last_sale(history),
        'sale_frequency': count_sales(history) / 90,
        # Red flags
        'price_spike_before_sale': detect_pre_sale_inflation(history),
        'claimed_vs_observed_ratio': (
            current['claimed_original'] / product_data['stats']['max_price']
        ),
        'is_round_number': current['claimed_original'] % 100 == 0,
        # Temporal patterns
        'is_major_sale_event': is_prime_day_or_black_friday(),
        'day_of_week': current['date'].weekday(),
        'hour_of_day': current['timestamp'].hour,
        # Retailer behavior
        'retailer_avg_markup': get_retailer_stats(product_data['retailer']),
        'seller_reputation_score': get_seller_score(product_data['seller_id']),
    }
    return features

def detect_pre_sale_inflation(history):
    """Check if price was artificially raised before sale"""
    if len(history) < 37:  # need a week of recency plus a 30-day baseline
        return False
    # Compare the week before the sale (excluding the sale day) to the prior 30 days
    recent_avg = np.mean([p['price'] for p in history[-7:-1]])
    baseline_avg = np.mean([p['price'] for p in history[-37:-7]])
    # If the recent avg is 20%+ higher, that's suspicious
    return recent_avg > baseline_avg * 1.20
```
The breakthrough insight: It's not just about the discount percentage. It's about the pattern leading up to the sale. Legitimate deals show consistent pricing before the discount. Fake deals show sudden price spikes right before the "sale."
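You can see the pattern check in isolation with two synthetic histories. A minimal sketch (the detection logic is repeated here in self-contained form; the toy price series are mine):

```python
import numpy as np

def detect_pre_sale_inflation(history):
    """Compare the week before the sale to the prior month."""
    if len(history) < 37:
        return False
    recent_avg = np.mean([p["price"] for p in history[-7:-1]])
    baseline_avg = np.mean([p["price"] for p in history[-37:-7]])
    return recent_avg > baseline_avg * 1.20

# A steady product that goes on sale: no spike before the discount
legit = [{"price": 100.0} for _ in range(36)] + [{"price": 80.0}]

# A manipulated product: price jumps 40% the week before the "sale"
fake = (
    [{"price": 100.0} for _ in range(30)]
    + [{"price": 140.0} for _ in range(6)]
    + [{"price": 99.0}]
)

print(detect_pre_sale_inflation(legit))  # False
print(detect_pre_sale_inflation(fake))   # True
```

The same "sale" price reads as legitimate or fake depending entirely on what happened in the weeks before it.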
### 4. The ML Model (Simpler Than You Think)
After testing Random Forests, XGBoost, and even a neural network, we settled on Gradient Boosting for its interpretability and performance:
```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

class FakeDealDetector:
    def __init__(self, feature_names=None):
        self.model = GradientBoostingClassifier(
            n_estimators=200,
            learning_rate=0.1,
            max_depth=5,
            random_state=42,
        )
        self.scaler = StandardScaler()
        self.feature_names = feature_names or []

    def train(self, features, labels):
        """Train on historical labeled data"""
        X_train, X_test, y_train, y_test = train_test_split(
            features, labels, test_size=0.2, random_state=42
        )
        # Normalize features
        X_train_scaled = self.scaler.fit_transform(X_train)
        X_test_scaled = self.scaler.transform(X_test)
        # Train model
        self.model.fit(X_train_scaled, y_train)
        # Evaluate
        train_score = self.model.score(X_train_scaled, y_train)
        test_score = self.model.score(X_test_scaled, y_test)
        print(f"Training accuracy: {train_score:.2%}")
        print(f"Test accuracy: {test_score:.2%}")
        return self.model

    def predict_fake_probability(self, product_features):
        """Return probability that a deal is fake (0-1)"""
        features_scaled = self.scaler.transform([product_features])
        probabilities = self.model.predict_proba(features_scaled)
        return probabilities[0][1]  # Probability of class 1 (fake)

    def get_feature_importance(self):
        """Which features matter most?"""
        importance = self.model.feature_importances_
        return sorted(
            zip(self.feature_names, importance),
            key=lambda x: x[1],
            reverse=True,
        )
```
Training data came from:
- 8,400 manually labeled deals (me + 2 colleagues, 3 weeks of work)
- Historical data where we caught obvious fakes (price = "original price")
- User reports of suspicious deals
- Competitor sites like CamelCamelCamel for validation
Model performance:
- 94% accuracy on test set
- 1.8 seconds average inference time (down from 8.5s)
- 12% false positive rate (down from 67%)
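To make the workflow concrete, here is a minimal end-to-end run of a gradient-boosted detector on synthetic two-feature data. The features (claimed discount, pre-sale spike ratio) and their distributions are illustrative assumptions, not our production setup:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Two toy features: claimed discount fraction and pre-sale price-spike ratio
legit = np.column_stack([rng.uniform(0.05, 0.30, 200), rng.normal(1.0, 0.05, 200)])
fake = np.column_stack([rng.uniform(0.40, 0.80, 200), rng.normal(1.4, 0.10, 200)])
X = np.vstack([legit, fake])
y = np.array([0] * 200 + [1] * 200)

scaler = StandardScaler()
model = GradientBoostingClassifier(n_estimators=50, random_state=42)
model.fit(scaler.fit_transform(X), y)

# A "70% off" deal whose price spiked 45% beforehand should score as fake
suspect = scaler.transform([[0.70, 1.45]])
print(model.predict_proba(suspect)[0][1])  # close to 1.0
```

With real data the classes overlap far more than this, which is exactly why the historical-context features in section 3 matter so much.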
### 5. Real-Time Alert System
The final piece: alerting users when we detect something fishy.
```python
import json
import time

class AlertSystem:
    def __init__(self, detector, redis_client):
        self.detector = detector
        self.redis = redis_client

    async def check_deal(self, product_id, user_id):
        """Check if a deal is legitimate"""
        # Get product data
        product = await get_product_data(product_id)
        features = extract_features(product)
        # Get fake probability
        fake_prob = self.detector.predict_fake_probability(features)
        # Thresholds based on user preferences
        if fake_prob > 0.85:
            await self.send_alert(
                user_id,
                product_id,
                severity='high',
                message=f"⚠️ This deal looks suspicious ({fake_prob:.0%} confidence)",
            )
        elif fake_prob > 0.60:
            await self.send_alert(
                user_id,
                product_id,
                severity='medium',
                message="🤔 This deal might be inflated (check price history)",
            )
        # Log for monitoring
        await self.redis.lpush(
            'detections',
            json.dumps({
                'product_id': product_id,
                'fake_probability': fake_prob,
                'timestamp': time.time(),
            }),
        )
```
## The Results (And What We Learned)
First 30 days:
- 2,347 fake deals detected across Amazon, eBay, and Walmart
- $47,000 estimated savings for users who avoided bad purchases
- 94% accuracy rate confirmed through user feedback
- 1.8 seconds average processing time per product
Most surprising finding: 34% of "deals" during Black Friday weekend had artificially inflated "original prices." The most common tactic? Raising the price by exactly 49% two weeks before the sale, then advertising a "50% off" discount.
Platform breakdown:
- Amazon: 18% of sales had inflated originals
- eBay: 41% (worse because of individual sellers)
- Walmart: 22%
## What I'd Do Differently
### 1. Start with More Training Data
8,400 labeled examples wasn't enough initially. We should have used semi-supervised learning to bootstrap from unlabeled data.
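Scikit-learn ships a self-training wrapper that makes this bootstrap cheap to prototype: the model pseudo-labels its most confident unlabeled examples and retrains. A toy sketch with one illustrative feature and synthetic labels (`-1` marks unlabeled rows):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)

# 40 labeled deals (legit near 0.2, fake near 0.6) and 400 unlabeled ones
X_labeled = np.vstack([rng.normal(0.2, 0.05, (20, 1)), rng.normal(0.6, 0.05, (20, 1))])
y_labeled = np.array([0] * 20 + [1] * 20)
X_unlabeled = rng.uniform(0.0, 0.8, (400, 1))

X = np.vstack([X_labeled, X_unlabeled])
y = np.concatenate([y_labeled, np.full(400, -1)])  # -1 = unlabeled

# Only pseudo-label points the base model is at least 90% sure about
clf = SelfTrainingClassifier(LogisticRegression(), threshold=0.9)
clf.fit(X, y)
print(clf.predict([[0.65]]))
```

Even this naive version would have let us stretch those 8,400 hand labels across the whole unlabeled corpus.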
### 2. Build Interpretability From Day One
Users don't just want a "fake" flag—they want to know why. We added explanations later:
```python
def explain_detection(product_id, fake_probability):
    """Generate human-readable explanation"""
    features = extract_features(get_product_data(product_id))
    reasons = []
    if features['price_spike_before_sale']:
        reasons.append("Price was raised sharply in the weeks before the sale")
    if features['claimed_vs_observed_ratio'] > 1.5:
        reasons.append(
            f"'Original price' is {features['claimed_vs_observed_ratio'] - 1:.0%} "
            "higher than any price we've observed"
        )
    return {
        'fake_probability': fake_probability,
        'reasons': reasons,
        'recommendation': (
            'Wait for a better deal' if fake_probability > 0.7 else 'Probably okay'
        ),
    }
```
### 3. Monitor Retailer-Specific Patterns
Different retailers have different pricing behaviors. We should have trained separate models or added retailer embeddings.
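One way to sketch the per-retailer idea: a thin router that trains and serves one model per retailer. This is a hypothetical class for illustration, not our production code:

```python
from sklearn.linear_model import LogisticRegression

class PerRetailerDetector:
    """Route each product to a model trained only on its retailer's data."""

    def __init__(self, model_factory):
        self.model_factory = model_factory  # callable returning a fresh model
        self.models = {}

    def train(self, samples_by_retailer):
        for retailer, (X, y) in samples_by_retailer.items():
            model = self.model_factory()
            model.fit(X, y)
            self.models[retailer] = model

    def predict_fake_probability(self, retailer, features):
        model = self.models.get(retailer)
        if model is None:
            raise KeyError(f"no model trained for {retailer}")
        return model.predict_proba([features])[0][1]

# Toy usage: one illustrative feature (claimed discount fraction) per retailer
detector = PerRetailerDetector(LogisticRegression)
detector.train({
    "amazon": ([[0.1], [0.2], [0.7], [0.8]], [0, 0, 1, 1]),
    "ebay": ([[0.05], [0.1], [0.5], [0.6]], [0, 0, 1, 1]),
})
print(detector.predict_fake_probability("amazon", [0.75]))
```

The alternative (one shared model plus a retailer embedding or one-hot feature) trades some specialization for more training data per parameter.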
## The Tech Stack
Backend:
- Python 3.11 + FastAPI
- MongoDB for price history
- Redis for caching
- TensorFlow/Scikit-learn for ML
Scraping:
- Beautiful Soup + lxml
- Playwright for JavaScript-heavy sites
- Rotating proxies (Bright Data)
Deployment:
- AWS ECS (containerized)
- CloudWatch for monitoring
- S3 for model artifacts
Frontend (React):
```jsx
// Price history chart component
import { LineChart, Line, ReferenceLine } from 'recharts';

function PriceHistory({ priceData, suspiciousFlag }) {
  return (
    <div>
      <LineChart data={priceData} width={600} height={300}>
        <Line dataKey="price" stroke="#8884d8" />
        {suspiciousFlag && (
          <ReferenceLine
            x={suspiciousFlag.date}
            stroke="red"
            label="Suspicious price spike"
          />
        )}
      </LineChart>
    </div>
  );
}
```
## Understanding Price Manipulation Tactics
After analyzing millions of price points, we identified five common manipulation patterns:
- The Ramp-Up: Gradually increase price over 2-3 weeks, then "discount" back to normal
- The Anchor: Show an inflated "compare at" price that never actually existed
- The Rotation: Cycle between "sale" and "regular" every 10-14 days
- The Platform Arbitrage: Different "original prices" on Amazon vs. eBay vs. own website
- The Flash Fake: Create urgency with countdown timers on permanently available deals
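Of these, The Rotation is the easiest to catch programmatically: look for sale days that recur on a near-fixed cycle. A hedged sketch (the function name, thresholds, and the 10-14 day window are illustrative):

```python
import numpy as np

def detect_rotation(history, min_sales=3, tolerance=2):
    """Flag 'The Rotation': sales that recur on a near-fixed 10-14 day cycle."""
    sale_days = [i for i, p in enumerate(history) if p["on_sale"]]
    if len(sale_days) < min_sales:
        return False
    gaps = np.diff(sale_days)
    # A tight spread of gaps around a 10-14 day mean suggests a scheduled cycle
    return bool(gaps.std() <= tolerance and 10 <= gaps.mean() <= 14)

# 90 days where a "sale" recurs every 12 days
history = [{"on_sale": i % 12 == 0} for i in range(90)]
print(detect_rotation(history))  # True
```

A periodicity feature like this slots straight into `extract_features` alongside `sale_frequency`.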
## Try It Yourself
Want to build your own version? Here's a simplified starter:
```python
# Minimal fake deal detector
# (scrape_price and get_price_history are left for you to implement —
# e.g. with the scraper above and the Keepa API)

def simple_fake_detector(product_url, days_to_check=30):
    """Basic version you can build in a weekend"""
    # 1. Scrape current price
    current_data = scrape_price(product_url)
    # 2. Get historical data (use Keepa API or similar)
    history = get_price_history(product_url, days=days_to_check)
    # 3. Calculate statistics
    avg_price = sum(h['price'] for h in history) / len(history)
    max_price = max(h['price'] for h in history)
    # 4. Check for red flags
    claimed_original = current_data.get('original_price', 0)
    current_price = current_data['price']
    red_flags = []
    if claimed_original > max_price * 1.3:
        red_flags.append("Original price never observed in 30-day history")
    if claimed_original == current_price:
        red_flags.append("No actual discount")
    recent_prices = [h['price'] for h in history[-7:]]
    if max(recent_prices) > avg_price * 1.2:
        red_flags.append("Price was recently inflated")
    return {
        'is_suspicious': len(red_flags) > 0,
        'red_flags': red_flags,
        'confidence': len(red_flags) / 3,  # Simple confidence score
    }
```
Full code with training data available on GitHub (⭐ if you find it useful!).
## What's Next?
We're currently working on:
- Browser extension for real-time alerts while shopping
- Multi-language support (expanding beyond US retailers)
- Community reporting to improve training data
- API access for other developers
See this technology in action on our deal tracking dashboard at Avluz.com.
## Final Thoughts
Building this taught me that AI isn't about replacing human judgment—it's about scaling pattern recognition beyond what we can manually track. Could I spot one fake deal? Sure. Can I check 10,000 products every hour? Not a chance.
The real value isn't in catching the obvious scams. It's in identifying the subtle patterns that even experienced shoppers miss: the 2-week pre-inflation strategy, the cross-platform price discrepancies, the suspiciously round "original prices."
If you're thinking about building something similar, my advice: Start simple, but start with real data. Don't waste time building a complex ML pipeline until you've manually labeled a few hundred examples and understand what patterns you're actually looking for.
And most importantly: Your users care more about accurate alerts than fancy algorithms. A simple rule-based system that works is better than a neural network that doesn't.
Have questions about the implementation? Drop them in the comments. I'll answer everything I can without revealing our complete secret sauce. 😉
Resources:
- Scikit-learn Documentation
- MongoDB Time Series Collections
- Price Tracking APIs Comparison
- Keepa API for Amazon Price History
- Beautiful Soup Web Scraping Guide
Written by a senior engineer at Avluz.com. We're hiring! Check out our careers page.