Last Black Friday, I watched my mom excitedly show me a "70% off" gaming laptop deal. The original price? $1,299. Sale price? $899. Seemed legit until I checked the price history—that laptop had been $899 for the past 6 months. The "original price" was completely fabricated.
That moment sparked something. At Avluz.com, we track prices across 10,000+ products from Amazon, eBay, and Walmart. We had the data. We had the problem. We just needed to build something that could catch these scams automatically.
Thirty days later, our AI-powered fake deal detector had flagged 2,347 suspicious "deals" and saved our users an estimated $47,000 in avoided bad purchases.
Here's exactly how we built it, including the mistakes that almost derailed the entire project.
## The $5,000 Mistake That Taught Us Everything
Our first attempt was a disaster. I spent three weeks building a rule-based system with hardcoded thresholds:
```python
# DON'T DO THIS
def is_fake_deal(current_price, original_price, avg_price):
    discount = (original_price - current_price) / original_price
    if discount > 0.5:  # More than 50% off? Suspicious!
        return True
    if original_price > avg_price * 1.3:  # Inflated by 30%? Flag it!
        return True
    return False
```
The problem? E-commerce pricing is way more nuanced than simple rules can handle. We had:
- 67% false positives (flagging legitimate deals)
- 8.5 seconds average processing time
- Angry users complaining about missed deals
- One very frustrated engineer (me)
I was ready to scrap the whole thing until our data scientist suggested: "What if we let the machine figure out the patterns?"
That conversation changed everything.
## Why AI Actually Makes Sense Here
Most blog posts jump straight to "use AI!" without explaining why. Here's the reality: fake deals aren't just about simple math. Retailers employ sophisticated pricing psychology:
- Pre-inflation strategy: Raise prices 2-3 weeks before a sale
- Anchor pricing: Show an inflated "compare at" price
- Flash sale tactics: Create urgency with fake scarcity
- Cross-platform games: Different "original prices" on different sites
- Dynamic pricing: Constant micro-adjustments that hide patterns
Traditional rules can't adapt to these evolving tactics. Machine learning can identify patterns we humans would never spot—like how certain sellers always inflate prices exactly 47% before Prime Day, or how "limited time" deals repeat every 12 days.
This approach now powers our real-time price comparison engine at Avluz.com, processing 2.4 million price checks daily.
## The Architecture: How It Actually Works
Our system has five main components:
### 1. Price Scraper (The Data Collector)
```python
import asyncio
import json
import time

from aiohttp import ClientSession
from bs4 import BeautifulSoup

class PriceScraper:
    def __init__(self, redis_client):
        self.redis = redis_client
        self.session = None

    async def scrape_product(self, url, retailer):
        """Async scraping with rate limiting"""
        # Create the session lazily, inside the running event loop
        if self.session is None:
            self.session = ClientSession()
        # Check Redis cache first (60s TTL)
        cached = await self.redis.get(f"price:{url}")
        if cached:
            return json.loads(cached)
        async with self.session.get(url) as response:
            html = await response.text()
        price_data = self.parse_price(html, retailer)
        # Cache for the next request
        await self.redis.setex(f"price:{url}", 60, json.dumps(price_data))
        return price_data

    def parse_price(self, html, retailer):
        """Retailer-specific parsing logic"""
        soup = BeautifulSoup(html, 'lxml')
        current = original = None
        if retailer == 'amazon':
            current = soup.select_one('.a-price-whole')
            original = soup.select_one('.a-text-price')
        elif retailer == 'walmart':
            current = soup.select_one('[itemprop="price"]')
            original = soup.select_one('.was-price')
        return {
            'current_price': self.clean_price(current),
            'original_price': self.clean_price(original),
            'timestamp': time.time(),
        }

    @staticmethod
    def clean_price(tag):
        """Strip currency formatting from a parsed tag; None if not found"""
        if tag is None:
            return None
        return float(tag.get_text().replace('$', '').replace(',', '').strip())
```
Key lessons:
- Use Redis caching to avoid hammering retailer APIs
- Async/await for parallel scraping (went from 45s to 2.3s for 100 products)
- Retailer-specific parsers (each site has different HTML structures)
### 2. Historical Data Store (MongoDB)
We store 90 days of price history for every product. The schema is surprisingly simple:
```python
{
    '_id': ObjectId('...'),
    'asin': 'B08N5WRWNW',  # Amazon Standard ID
    'retailer': 'amazon',
    'price_history': [
        {'price': 149.99, 'date': '2025-11-01', 'on_sale': False},
        {'price': 149.99, 'date': '2025-11-02', 'on_sale': False},
        {'price': 299.99, 'date': '2025-11-24', 'on_sale': False},
        {'price': 149.99, 'date': '2025-11-25', 'on_sale': True, 'claimed_original': 299.99},
    ],
    'stats': {
        'avg_price': 153.47,
        'min_price': 142.00,
        'max_price': 299.99,  # Suspicious spike!
        'std_dev': 8.32,
    },
}
```
The `claimed_original` field is crucial—it lets us compare what retailers claim the "original price" was versus what we actually observed.
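For illustration, the cached `stats` block can be recomputed from raw history entries in a few lines of Python. This is a hedged sketch (the `compute_stats` name is mine, not from our codebase), using the field names from the schema above:

```python
import statistics

def compute_stats(price_history):
    """Recompute the cached stats block from raw history entries."""
    prices = [p["price"] for p in price_history]
    return {
        "avg_price": round(sum(prices) / len(prices), 2),
        "min_price": min(prices),
        "max_price": max(prices),
        "std_dev": round(statistics.pstdev(prices), 2),
    }

history = [
    {"price": 149.99, "date": "2025-11-01", "on_sale": False},
    {"price": 149.99, "date": "2025-11-02", "on_sale": False},
    {"price": 299.99, "date": "2025-11-24", "on_sale": False},
]
print(compute_stats(history))
```

In production you'd update these incrementally (or with a MongoDB aggregation pipeline) rather than rescanning 90 days on every write.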
### 3. Feature Engineering (The Secret Sauce)
This is where most tutorials stop, but feature engineering is where the magic happens. Here's what we feed into the model:
```python
import numpy as np

def calculate_discount(original, current):
    """Fraction saved relative to the original price"""
    if not original:
        return 0.0
    return (original - current) / original

def extract_features(product_data):
    """Convert raw price data into ML features"""
    history = product_data['price_history']
    current = history[-1]  # assumes dates/timestamps already parsed to datetime
    features = {
        # Basic discount metrics
        'claimed_discount_pct': calculate_discount(
            current['claimed_original'], current['price']
        ),
        'true_discount_pct': calculate_discount(
            product_data['stats']['avg_price'], current['price']
        ),
        # Historical context (last 90 days)
        'price_volatility': product_data['stats']['std_dev'],
        'days_since_last_sale': days_since_last_sale(history),
        'sale_frequency': count_sales(history) / 90,
        # Red flags
        'price_spike_before_sale': detect_pre_sale_inflation(history),
        'claimed_vs_observed_ratio': (
            current['claimed_original'] / product_data['stats']['max_price']
        ),
        'is_round_number': current['claimed_original'] % 100 == 0,
        # Temporal patterns
        'is_major_sale_event': is_prime_day_or_black_friday(),
        'day_of_week': current['date'].weekday(),
        'hour_of_day': current['timestamp'].hour,
        # Retailer behavior
        'retailer_avg_markup': get_retailer_stats(product_data['retailer']),
        'seller_reputation_score': get_seller_score(product_data['seller_id']),
    }
    return features

def detect_pre_sale_inflation(history):
    """Check if price was artificially raised before sale"""
    if len(history) < 37:  # need a week of recency plus a 30-day baseline
        return False
    # Compare the week before the sale (excluding the sale day) to the prior 30 days
    recent_avg = np.mean([p['price'] for p in history[-7:-1]])
    baseline_avg = np.mean([p['price'] for p in history[-37:-7]])
    # If the recent avg is 20%+ higher, that's suspicious
    return recent_avg > baseline_avg * 1.20
```
The breakthrough insight: It's not just about the discount percentage. It's about the pattern leading up to the sale. Legitimate deals show consistent pricing before the discount. Fake deals show sudden price spikes right before the "sale."
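You can see the pattern check in isolation with two synthetic histories. A minimal sketch (the detection logic is repeated here in self-contained form; the toy price series are mine):

```python
import numpy as np

def detect_pre_sale_inflation(history):
    """Compare the week before the sale to the prior month."""
    if len(history) < 37:
        return False
    recent_avg = np.mean([p["price"] for p in history[-7:-1]])
    baseline_avg = np.mean([p["price"] for p in history[-37:-7]])
    return recent_avg > baseline_avg * 1.20

# A steady product that goes on sale: no spike before the discount
legit = [{"price": 100.0} for _ in range(36)] + [{"price": 80.0}]

# A manipulated product: price jumps 40% the week before the "sale"
fake = (
    [{"price": 100.0} for _ in range(30)]
    + [{"price": 140.0} for _ in range(6)]
    + [{"price": 99.0}]
)

print(detect_pre_sale_inflation(legit))  # False
print(detect_pre_sale_inflation(fake))   # True
```

The same "sale" price reads as legitimate or fake depending entirely on what happened in the weeks before it.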
### 4. The ML Model (Simpler Than You Think)
After testing Random Forests, XGBoost, and even a neural network, we settled on Gradient Boosting for its interpretability and performance:
```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

class FakeDealDetector:
    def __init__(self, feature_names=None):
        self.model = GradientBoostingClassifier(
            n_estimators=200,
            learning_rate=0.1,
            max_depth=5,
            random_state=42,
        )
        self.scaler = StandardScaler()
        self.feature_names = feature_names or []

    def train(self, features, labels):
        """Train on historical labeled data"""
        X_train, X_test, y_train, y_test = train_test_split(
            features, labels, test_size=0.2, random_state=42
        )
        # Normalize features
        X_train_scaled = self.scaler.fit_transform(X_train)
        X_test_scaled = self.scaler.transform(X_test)
        # Train model
        self.model.fit(X_train_scaled, y_train)
        # Evaluate
        train_score = self.model.score(X_train_scaled, y_train)
        test_score = self.model.score(X_test_scaled, y_test)
        print(f"Training accuracy: {train_score:.2%}")
        print(f"Test accuracy: {test_score:.2%}")
        return self.model

    def predict_fake_probability(self, product_features):
        """Return probability that a deal is fake (0-1)"""
        features_scaled = self.scaler.transform([product_features])
        probabilities = self.model.predict_proba(features_scaled)
        return probabilities[0][1]  # Probability of class 1 (fake)

    def get_feature_importance(self):
        """Which features matter most?"""
        importance = self.model.feature_importances_
        return sorted(
            zip(self.feature_names, importance),
            key=lambda x: x[1],
            reverse=True,
        )
```
Training data came from:
- 8,400 manually labeled deals (me + 2 colleagues, 3 weeks of work)
- Historical data where we caught obvious fakes (price = "original price")
- User reports of suspicious deals
- Competitor sites like CamelCamelCamel for validation
Model performance:
- 94% accuracy on test set
- 1.8 seconds average inference time (down from 8.5s)
- 12% false positive rate (down from 67%)
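To make the workflow concrete, here is a minimal end-to-end run of a gradient-boosted detector on synthetic two-feature data. The features (claimed discount, pre-sale spike ratio) and their distributions are illustrative assumptions, not our production setup:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Two toy features: claimed discount fraction and pre-sale price-spike ratio
legit = np.column_stack([rng.uniform(0.05, 0.30, 200), rng.normal(1.0, 0.05, 200)])
fake = np.column_stack([rng.uniform(0.40, 0.80, 200), rng.normal(1.4, 0.10, 200)])
X = np.vstack([legit, fake])
y = np.array([0] * 200 + [1] * 200)

scaler = StandardScaler()
model = GradientBoostingClassifier(n_estimators=50, random_state=42)
model.fit(scaler.fit_transform(X), y)

# A "70% off" deal whose price spiked 45% beforehand should score as fake
suspect = scaler.transform([[0.70, 1.45]])
print(model.predict_proba(suspect)[0][1])  # close to 1.0
```

With real data the classes overlap far more than this, which is exactly why the historical-context features in section 3 matter so much.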
### 5. Real-Time Alert System
The final piece: alerting users when we detect something fishy.
```python
import json
import time

class AlertSystem:
    def __init__(self, detector, redis_client):
        self.detector = detector
        self.redis = redis_client

    async def check_deal(self, product_id, user_id):
        """Check if a deal is legitimate"""
        # Get product data
        product = await get_product_data(product_id)
        features = extract_features(product)
        # Get fake probability
        fake_prob = self.detector.predict_fake_probability(features)
        # Thresholds based on user preferences
        if fake_prob > 0.85:
            await self.send_alert(
                user_id,
                product_id,
                severity='high',
                message=f"⚠️ This deal looks suspicious ({fake_prob:.0%} confidence)",
            )
        elif fake_prob > 0.60:
            await self.send_alert(
                user_id,
                product_id,
                severity='medium',
                message="🤔 This deal might be inflated (check price history)",
            )
        # Log for monitoring
        await self.redis.lpush(
            'detections',
            json.dumps({
                'product_id': product_id,
                'fake_probability': fake_prob,
                'timestamp': time.time(),
            }),
        )
```
## The Results (And What We Learned)
First 30 days:
- 2,347 fake deals detected across Amazon, eBay, and Walmart
- $47,000 estimated savings for users who avoided bad purchases
- 94% accuracy rate confirmed through user feedback
- 1.8 seconds average processing time per product
Most surprising finding: 34% of "deals" during Black Friday weekend had artificially inflated "original prices." The most common tactic? Raising the price by exactly 49% two weeks before the sale, then advertising a "50% off" discount.
Platform breakdown:
- Amazon: 18% of sales had inflated originals
- eBay: 41% (worse because of individual sellers)
- Walmart: 22%
## What I'd Do Differently
### 1. Start with More Training Data
8,400 labeled examples wasn't enough initially. We should have used semi-supervised learning to bootstrap from unlabeled data.
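Scikit-learn ships a self-training wrapper that makes this bootstrap cheap to prototype: the model pseudo-labels its most confident unlabeled examples and retrains. A toy sketch with one illustrative feature and synthetic labels (`-1` marks unlabeled rows):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)

# 40 labeled deals (legit near 0.2, fake near 0.6) and 400 unlabeled ones
X_labeled = np.vstack([rng.normal(0.2, 0.05, (20, 1)), rng.normal(0.6, 0.05, (20, 1))])
y_labeled = np.array([0] * 20 + [1] * 20)
X_unlabeled = rng.uniform(0.0, 0.8, (400, 1))

X = np.vstack([X_labeled, X_unlabeled])
y = np.concatenate([y_labeled, np.full(400, -1)])  # -1 = unlabeled

# Only pseudo-label points the base model is at least 90% sure about
clf = SelfTrainingClassifier(LogisticRegression(), threshold=0.9)
clf.fit(X, y)
print(clf.predict([[0.65]]))
```

Even this naive version would have let us stretch those 8,400 hand labels across the whole unlabeled corpus.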
### 2. Build Interpretability From Day One
Users don't just want a "fake" flag—they want to know why. We added explanations later:
```python
def explain_detection(product_id, fake_probability):
    """Generate human-readable explanation"""
    features = extract_features(get_product_data(product_id))
    reasons = []
    if features['price_spike_before_sale']:
        reasons.append("Price was raised sharply in the weeks before the sale")
    if features['claimed_vs_observed_ratio'] > 1.5:
        reasons.append(
            f"'Original price' is {features['claimed_vs_observed_ratio'] - 1:.0%} "
            "higher than any price we've observed"
        )
    return {
        'fake_probability': fake_probability,
        'reasons': reasons,
        'recommendation': (
            'Wait for a better deal' if fake_probability > 0.7 else 'Probably okay'
        ),
    }
```
### 3. Monitor Retailer-Specific Patterns
Different retailers have different pricing behaviors. We should have trained separate models or added retailer embeddings.
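One way to sketch the per-retailer idea: a thin router that trains and serves one model per retailer. This is a hypothetical class for illustration, not our production code:

```python
from sklearn.linear_model import LogisticRegression

class PerRetailerDetector:
    """Route each product to a model trained only on its retailer's data."""

    def __init__(self, model_factory):
        self.model_factory = model_factory  # callable returning a fresh model
        self.models = {}

    def train(self, samples_by_retailer):
        for retailer, (X, y) in samples_by_retailer.items():
            model = self.model_factory()
            model.fit(X, y)
            self.models[retailer] = model

    def predict_fake_probability(self, retailer, features):
        model = self.models.get(retailer)
        if model is None:
            raise KeyError(f"no model trained for {retailer}")
        return model.predict_proba([features])[0][1]

# Toy usage: one illustrative feature (claimed discount fraction) per retailer
detector = PerRetailerDetector(LogisticRegression)
detector.train({
    "amazon": ([[0.1], [0.2], [0.7], [0.8]], [0, 0, 1, 1]),
    "ebay": ([[0.05], [0.1], [0.5], [0.6]], [0, 0, 1, 1]),
})
print(detector.predict_fake_probability("amazon", [0.75]))
```

The alternative (one shared model plus a retailer embedding or one-hot feature) trades some specialization for more training data per parameter.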
## The Tech Stack
Backend:
- Python 3.11 + FastAPI
- MongoDB for price history
- Redis for caching
- TensorFlow/Scikit-learn for ML
Scraping:
- Beautiful Soup + lxml
- Playwright for JavaScript-heavy sites
- Rotating proxies (Bright Data)
Deployment:
- AWS ECS (containerized)
- CloudWatch for monitoring
- S3 for model artifacts
Frontend (React):
```jsx
// Price history chart component
import { LineChart, Line, ReferenceLine } from 'recharts';

function PriceHistory({ priceData, suspiciousFlag }) {
  return (
    <div>
      <LineChart data={priceData} width={600} height={300}>
        <Line dataKey="price" stroke="#8884d8" />
        {suspiciousFlag && (
          <ReferenceLine
            x={suspiciousFlag.date}
            stroke="red"
            label="Suspicious price spike"
          />
        )}
      </LineChart>
    </div>
  );
}
```
## Understanding Price Manipulation Tactics
After analyzing millions of price points, we identified five common manipulation patterns:
- The Ramp-Up: Gradually increase price over 2-3 weeks, then "discount" back to normal
- The Anchor: Show an inflated "compare at" price that never actually existed
- The Rotation: Cycle between "sale" and "regular" every 10-14 days
- The Platform Arbitrage: Different "original prices" on Amazon vs. eBay vs. own website
- The Flash Fake: Create urgency with countdown timers on permanently available deals
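Of these, The Rotation is the easiest to catch programmatically: look for sale days that recur on a near-fixed cycle. A hedged sketch (the function name, thresholds, and the 10-14 day window are illustrative):

```python
import numpy as np

def detect_rotation(history, min_sales=3, tolerance=2):
    """Flag 'The Rotation': sales that recur on a near-fixed 10-14 day cycle."""
    sale_days = [i for i, p in enumerate(history) if p["on_sale"]]
    if len(sale_days) < min_sales:
        return False
    gaps = np.diff(sale_days)
    # A tight spread of gaps around a 10-14 day mean suggests a scheduled cycle
    return bool(gaps.std() <= tolerance and 10 <= gaps.mean() <= 14)

# 90 days where a "sale" recurs every 12 days
history = [{"on_sale": i % 12 == 0} for i in range(90)]
print(detect_rotation(history))  # True
```

A periodicity feature like this slots straight into `extract_features` alongside `sale_frequency`.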
## Try It Yourself
Want to build your own version? Here's a simplified starter:
```python
# Minimal fake deal detector
# (scrape_price and get_price_history are left for you to implement —
# e.g. with the scraper above and the Keepa API)

def simple_fake_detector(product_url, days_to_check=30):
    """Basic version you can build in a weekend"""
    # 1. Scrape current price
    current_data = scrape_price(product_url)
    # 2. Get historical data (use Keepa API or similar)
    history = get_price_history(product_url, days=days_to_check)
    # 3. Calculate statistics
    avg_price = sum(h['price'] for h in history) / len(history)
    max_price = max(h['price'] for h in history)
    # 4. Check for red flags
    claimed_original = current_data.get('original_price', 0)
    current_price = current_data['price']
    red_flags = []
    if claimed_original > max_price * 1.3:
        red_flags.append("Original price never observed in 30-day history")
    if claimed_original == current_price:
        red_flags.append("No actual discount")
    recent_prices = [h['price'] for h in history[-7:]]
    if max(recent_prices) > avg_price * 1.2:
        red_flags.append("Price was recently inflated")
    return {
        'is_suspicious': len(red_flags) > 0,
        'red_flags': red_flags,
        'confidence': len(red_flags) / 3,  # Simple confidence score
    }
```
Full code with training data available on GitHub (⭐ if you find it useful!).
## What's Next?
We're currently working on:
- Browser extension for real-time alerts while shopping
- Multi-language support (expanding beyond US retailers)
- Community reporting to improve training data
- API access for other developers
See this technology in action on our deal tracking dashboard at Avluz.com.
## Final Thoughts
Building this taught me that AI isn't about replacing human judgment—it's about scaling pattern recognition beyond what we can manually track. Could I spot one fake deal? Sure. Can I check 10,000 products every hour? Not a chance.
The real value isn't in catching the obvious scams. It's in identifying the subtle patterns that even experienced shoppers miss: the 2-week pre-inflation strategy, the cross-platform price discrepancies, the suspiciously round "original prices."
If you're thinking about building something similar, my advice: Start simple, but start with real data. Don't waste time building a complex ML pipeline until you've manually labeled a few hundred examples and understand what patterns you're actually looking for.
And most importantly: Your users care more about accurate alerts than fancy algorithms. A simple rule-based system that works is better than a neural network that doesn't.
Have questions about the implementation? Drop them in the comments. I'll answer everything I can without revealing our complete secret sauce. 😉
Resources:
- Scikit-learn Documentation
- MongoDB Time Series Collections
- Price Tracking APIs Comparison
- Keepa API for Amazon Price History
- Beautiful Soup Web Scraping Guide
Written by a senior engineer at Avluz.com. We're hiring! Check out our careers page.