Product reviews are scattered across Amazon, Best Buy, Walmart, and dozens of niche sites. Building an aggregator that pulls reviews from multiple sources gives you a comprehensive view of any product. Here's how to build one with Python.
Why Aggregate Reviews?
- Complete picture — no single site has all reviews
- Spot fake reviews — cross-reference sentiment across platforms
- Competitive analysis — compare products using aggregated ratings
- Market research — understand what customers love and hate
Architecture
Our aggregator will:
- Search for a product across multiple retail sites
- Extract reviews, ratings, and metadata
- Normalize data into a common format
- Analyze sentiment and generate insights
Setting Up
pip install requests beautifulsoup4 pandas textblob
Base Review Scraper
from dataclasses import dataclass
from datetime import datetime
from typing import Optional
from urllib.parse import quote

import requests
from bs4 import BeautifulSoup
@dataclass
class Review:
    """A single product review normalized to a common cross-site schema."""

    product_name: str    # often left "" by scrapers; can be filled in later
    reviewer: str        # display name, or "Anonymous" when not found
    rating: float        # star rating; 0.0 when it could not be parsed
    title: str
    text: str
    date: Optional[str]  # raw date string as shown on the site, if any
    verified: bool       # True when the site marks a verified purchase
    source: str          # e.g. "amazon", "bestbuy"
    helpful_votes: int = 0
    # Populated by sentiment analysis (TextBlob polarity in [-1.0, 1.0]).
    # Declared here so the field is part of the schema (and CSV exports)
    # even before analysis runs, instead of being attached dynamically.
    sentiment: Optional[float] = None
class ReviewScraper:
    """Base class for site-specific review scrapers.

    Owns the HTTP session and optional routing through the ScraperAPI
    rendering proxy; subclasses implement scrape_reviews().
    """

    def __init__(self, api_key=None):
        # api_key: ScraperAPI key. When set, fetch() routes every request
        # through the rendering proxy instead of hitting the site directly.
        self.api_key = api_key
        self.session = requests.Session()
        # A browser-like User-Agent avoids the most trivial bot blocking.
        self.session.headers.update({
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
        })

    def fetch(self, url):
        """GET *url*, via ScraperAPI (with JS rendering) when an API key is set."""
        if self.api_key:
            # BUG FIX: the target URL must be percent-encoded. If it carries
            # its own query string, an unencoded copy is split on "&" by the
            # proxy and everything after the first "&" is silently dropped.
            proxy_url = (
                "http://api.scraperapi.com"
                f"?api_key={self.api_key}&url={quote(url, safe='')}&render=true"
            )
            # JS rendering is slow; allow a longer timeout for proxied calls.
            return self.session.get(proxy_url, timeout=60)
        return self.session.get(url, timeout=30)

    def scrape_reviews(self, product_url):
        """Return a list of Review objects for *product_url* (subclass hook)."""
        raise NotImplementedError
Amazon Review Scraper
class AmazonReviewScraper(ReviewScraper):
    """Parses review cards from an Amazon product-reviews page."""

    @staticmethod
    def _parse_rating(card):
        """Return the star rating as a float, or 0.0 if absent or unparseable."""
        star = card.find("i", {"data-hook": "review-star-rating"})
        if not star:
            return 0.0
        try:
            # Rating text looks like "4.0 out of 5 stars" — take the number.
            return float(star.get_text().split()[0])
        except (ValueError, IndexError):
            return 0.0

    @staticmethod
    def _parse_helpful(card):
        """Return the helpful-vote count; Amazon spells a single vote as 'one'."""
        stmt = card.find("span", {"data-hook": "helpful-vote-statement"})
        if not stmt:
            return 0
        label = stmt.get_text()
        try:
            return int(label.split()[0].replace(",", ""))
        except (ValueError, IndexError):
            return 1 if "one" in label.lower() else 0

    def scrape_reviews(self, product_url):
        """Fetch *product_url* and return its review cards as Review objects."""
        page = BeautifulSoup(self.fetch(product_url).text, "html.parser")
        results = []
        for card in page.find_all("div", {"data-hook": "review"}):
            title = card.find("a", {"data-hook": "review-title"})
            body = card.find("span", {"data-hook": "review-body"})
            author = card.find("span", class_="a-profile-name")
            when = card.find("span", {"data-hook": "review-date"})
            results.append(Review(
                product_name="",
                reviewer=author.get_text(strip=True) if author else "Anonymous",
                rating=self._parse_rating(card),
                title=title.get_text(strip=True) if title else "",
                text=body.get_text(strip=True) if body else "",
                date=when.get_text(strip=True) if when else None,
                verified=card.find("span", {"data-hook": "avp-badge"}) is not None,
                source="amazon",
                helpful_votes=self._parse_helpful(card),
            ))
        return results
Best Buy Review Scraper
class BestBuyReviewScraper(ReviewScraper):
    """Parses review list items from a Best Buy reviews page."""

    def scrape_reviews(self, product_url):
        """Fetch *product_url* and return its reviews as Review objects."""
        page = BeautifulSoup(self.fetch(product_url).text, "html.parser")
        collected = []
        for entry in page.find_all("li", class_="review-item"):
            score_node = entry.find("div", class_="c-review-average")
            heading = entry.find("h4", class_="review-title")
            body = entry.find("p", class_="pre-white-space")
            author = entry.find("span", class_="reviewer-name")

            score = 0.0
            if score_node:
                try:
                    score = float(score_node.get_text(strip=True))
                except ValueError:
                    score = 0.0  # unparseable rating stays at 0.0

            collected.append(Review(
                product_name="",
                reviewer=author.get_text(strip=True) if author else "Anonymous",
                rating=score,
                title=heading.get_text(strip=True) if heading else "",
                text=body.get_text(strip=True) if body else "",
                date=None,       # this markup exposes no review date
                verified=False,  # no verified-purchase badge is parsed here
                source="bestbuy",
            ))
        return collected
The Aggregator
import time
import pandas as pd
from textblob import TextBlob
class ReviewAggregator:
    """Collects reviews from several site scrapers, then summarizes them."""

    def __init__(self, api_key):
        # One scraper per supported source, all sharing the same API key.
        self.scrapers = {
            "amazon": AmazonReviewScraper(api_key),
            "bestbuy": BestBuyReviewScraper(api_key),
        }
        self.all_reviews = []

    def aggregate(self, urls):
        """Scrape each known source in *urls* ({source: url}) and pool results."""
        for source, url in urls.items():
            if source not in self.scrapers:
                continue
            try:
                batch = self.scrapers[source].scrape_reviews(url)
                self.all_reviews.extend(batch)
                print(f"Fetched {len(batch)} reviews from {source}")
                time.sleep(3)  # be polite between sites
            except Exception as e:
                print(f"Error with {source}: {e}")
        return self.all_reviews

    def analyze_sentiment(self):
        """Attach a TextBlob polarity to each review and return summary counts."""
        for rev in self.all_reviews:
            rev.sentiment = TextBlob(rev.text).sentiment.polarity
        total = len(self.all_reviews)
        # Polarity thresholds: > 0.1 positive, < -0.1 negative, else neutral.
        pos = sum(1 for r in self.all_reviews if getattr(r, "sentiment", 0) > 0.1)
        neg = sum(1 for r in self.all_reviews if getattr(r, "sentiment", 0) < -0.1)
        return {
            "total": total,
            "positive": pos,
            "negative": neg,
            "neutral": total - pos - neg,
            "avg_rating": sum(r.rating for r in self.all_reviews) / total if self.all_reviews else 0,
        }

    def export(self, filename="reviews.csv"):
        """Write every collected review to *filename* as CSV; return the DataFrame."""
        frame = pd.DataFrame([vars(rev) for rev in self.all_reviews])
        frame.to_csv(filename, index=False)
        return frame
# Example run: aggregate one product's reviews from two sources.
# NOTE(review): requires a real ScraperAPI key and live network access.
agg = ReviewAggregator(api_key="YOUR_KEY")
# Map each source name (must match ReviewAggregator.scrapers keys) to the
# product's review-page URL on that site.
urls = {
"amazon": "https://www.amazon.com/dp/B09V3KXJPB/",
"bestbuy": "https://www.bestbuy.com/site/reviews/6505727"
}
agg.aggregate(urls)
sentiment = agg.analyze_sentiment()
print(f"Average rating: {sentiment['avg_rating']:.1f}/5")
print(f"Positive: {sentiment['positive']}, Negative: {sentiment['negative']}")
Finding Common Themes
from collections import Counter
import re
def extract_themes(reviews, min_count=3, top_n=20):
    """Return the most frequently mentioned non-stopword words in review texts.

    Args:
        reviews: iterable of objects with a ``text`` attribute.
        min_count: drop words mentioned fewer than this many times.
        top_n: how many of the most common words to consider before the
            ``min_count`` filter (previously hard-coded to 20; kept as the
            default so existing callers see identical results).

    Returns:
        List of ``(word, count)`` pairs, most frequent first.
    """
    stop_words = {"the", "a", "an", "is", "it", "and", "or", "but", "in", "on",
                  "at", "to", "for", "of", "with", "this", "that", "i", "my"}
    counts = Counter()
    for review in reviews:
        tokens = re.findall(r"\b[a-z]+\b", review.text.lower())
        # Words of length <= 3 are rarely meaningful product themes.
        counts.update(w for w in tokens if w not in stop_words and len(w) > 3)
    return [(word, n) for word, n in counts.most_common(top_n) if n >= min_count]
# Print recurring words across all aggregated reviews as rough "themes".
themes = extract_themes(agg.all_reviews)
for word, count in themes:
print(f"{word}: {count} mentions")
Proxy Strategy
Retail sites have strong anti-bot measures. Use ScraperAPI with JavaScript rendering for Amazon and Best Buy pages. For high-volume scraping across multiple stores, ThorData residential proxies ensure consistent access. Track your success rates with ScrapeOps to quickly identify when a scraper needs attention.
Conclusion
A review aggregator gives you a 360-degree view of product sentiment that no single platform provides. The key patterns — base scraper classes, common data models, and sentiment analysis — apply to any multi-source aggregation project. Start with two sources, validate your parsing, then expand to more.
Happy scraping!
Top comments (0)