
Building a Product Review Aggregator Across Multiple Sites

Product reviews are scattered across Amazon, Best Buy, Walmart, and dozens of niche sites. Building an aggregator that pulls reviews from multiple sources gives you a comprehensive view of any product. Here's how to build one with Python.

Why Aggregate Reviews?

  • Complete picture — no single site has all reviews
  • Spot fake reviews — cross-reference sentiment across platforms
  • Competitive analysis — compare products using aggregated ratings
  • Market research — understand what customers love and hate

Architecture

Our aggregator will:

  1. Search for a product across multiple retail sites
  2. Extract reviews, ratings, and metadata
  3. Normalize data into a common format
  4. Analyze sentiment and generate insights

Setting Up

pip install requests beautifulsoup4 pandas textblob

Base Review Scraper

Every site-specific scraper shares the same skeleton: a common review record (source, rating, title, text), one fetch path that routes requests through your proxy of choice, and a per-site parse step. Keeping fetch and parse separate means adding a new retailer only requires writing a new parser.
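The post does not publish its actual scraper code, so here is a minimal sketch of what that shared base layer can look like rather than a definitive implementation. The `Review` fields mirror what the aggregator code later in the post expects, and the proxy endpoint is an illustrative ScraperAPI-style URL; substitute your own provider's:

```python
from dataclasses import dataclass
import requests

@dataclass
class Review:
    source: str            # which site the review came from
    rating: float          # normalized to a 0-5 scale
    title: str
    text: str
    sentiment: float = 0.0  # filled in later by the aggregator

class BaseReviewScraper:
    """Shared plumbing: fetch a page through a proxy, hand it to a per-site parser."""

    PROXY_ENDPOINT = "https://api.scraperapi.com/"  # illustrative; use your provider's URL

    def __init__(self, api_key, render_js=False):
        self.api_key = api_key
        self.render_js = render_js
        self.session = requests.Session()

    def fetch(self, url):
        params = {"api_key": self.api_key, "url": url}
        if self.render_js:
            params["render"] = "true"
        resp = self.session.get(self.PROXY_ENDPOINT, params=params, timeout=60)
        resp.raise_for_status()
        return resp.text

    def scrape_reviews(self, url):
        # Template method: networking stays here, parsing stays in subclasses
        return self.parse(self.fetch(url))

    def parse(self, html):
        raise NotImplementedError("each site implements its own parser")
```

Subclasses only override `parse()`, so onboarding a new retailer never touches the networking code.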

Amazon Review Scraper

Amazon paginates reviews and renders much of the page with JavaScript, so fetch review pages with rendering enabled, then parse each review block for its rating, title, and body.
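Again, in place of the post's private implementation, here is a self-contained sketch. The `data-hook` selectors reflect a commonly seen Amazon review-page layout but are assumptions that break whenever Amazon ships new markup, and the proxy endpoint is illustrative:

```python
from types import SimpleNamespace
import requests
from bs4 import BeautifulSoup

class AmazonReviewScraper:
    PROXY_ENDPOINT = "https://api.scraperapi.com/"  # illustrative proxy endpoint

    def __init__(self, api_key):
        self.api_key = api_key

    def fetch(self, url):
        # Amazon review pages generally need JavaScript rendering
        params = {"api_key": self.api_key, "url": url, "render": "true"}
        resp = requests.get(self.PROXY_ENDPOINT, params=params, timeout=60)
        resp.raise_for_status()
        return resp.text

    def scrape_reviews(self, url):
        return self.parse(self.fetch(url))

    def parse(self, html):
        soup = BeautifulSoup(html, "html.parser")
        reviews = []
        for block in soup.select('div[data-hook="review"]'):
            body = block.select_one('span[data-hook="review-body"]')
            if body is None:
                continue  # skip ads or malformed blocks
            star = block.select_one('i[data-hook="review-star-rating"] span')
            # star text looks like "4.0 out of 5 stars"
            rating = float(star.get_text().split()[0]) if star else 0.0
            title = block.select_one('a[data-hook="review-title"]')
            reviews.append(SimpleNamespace(
                source="amazon",
                rating=rating,
                title=title.get_text(strip=True) if title else "",
                text=body.get_text(strip=True),
            ))
        return reviews
```

`SimpleNamespace` preserves the attribute access (`review.text`, `review.rating`) that the aggregator relies on; in a real build you would share one review dataclass across all scrapers.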

Best Buy Review Scraper

Best Buy review pages follow the same fetch-then-parse pattern; the only Amazon-specific part you replace is the parser, since ratings and review text live in Best Buy's own markup.
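A Best Buy scraper therefore looks almost identical. The CSS selectors below are pure placeholders; inspect the live review page (or use Best Buy's official API if you have a key) and substitute the real class names before relying on this:

```python
from types import SimpleNamespace
import requests
from bs4 import BeautifulSoup

class BestBuyReviewScraper:
    PROXY_ENDPOINT = "https://api.scraperapi.com/"  # illustrative proxy endpoint

    def __init__(self, api_key):
        self.api_key = api_key

    def fetch(self, url):
        params = {"api_key": self.api_key, "url": url, "render": "true"}
        resp = requests.get(self.PROXY_ENDPOINT, params=params, timeout=60)
        resp.raise_for_status()
        return resp.text

    def scrape_reviews(self, url):
        return self.parse(self.fetch(url))

    def parse(self, html):
        soup = BeautifulSoup(html, "html.parser")
        reviews = []
        # "review-item" / "review-rating" / "review-text" are placeholder
        # class names to verify against the live page
        for item in soup.select("li.review-item"):
            body = item.select_one("p.review-text")
            if body is None:
                continue
            rating_el = item.select_one("span.review-rating")
            rating = float(rating_el.get_text(strip=True)) if rating_el else 0.0
            title_el = item.select_one("h4.review-title")
            reviews.append(SimpleNamespace(
                source="bestbuy",
                rating=rating,
                title=title_el.get_text(strip=True) if title_el else "",
                text=body.get_text(strip=True),
            ))
        return reviews
```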

The Aggregator

import time
import pandas as pd
from textblob import TextBlob

class ReviewAggregator:
    def __init__(self, api_key):
        self.scrapers = {
            "amazon": AmazonReviewScraper(api_key),
            "bestbuy": BestBuyReviewScraper(api_key)
        }
        self.all_reviews = []

    def aggregate(self, urls):
        for source, url in urls.items():
            if source in self.scrapers:
                try:
                    reviews = self.scrapers[source].scrape_reviews(url)
                    self.all_reviews.extend(reviews)
                    print(f"Fetched {len(reviews)} reviews from {source}")
                    time.sleep(3)  # brief pause between sources to stay polite
                except Exception as e:
                    print(f"Error with {source}: {e}")
        return self.all_reviews

    def analyze_sentiment(self):
        # TextBlob polarity runs from -1.0 (negative) to 1.0 (positive)
        for review in self.all_reviews:
            blob = TextBlob(review.text)
            review.sentiment = blob.sentiment.polarity

        positive = sum(1 for r in self.all_reviews if getattr(r, "sentiment", 0) > 0.1)
        negative = sum(1 for r in self.all_reviews if getattr(r, "sentiment", 0) < -0.1)
        neutral = len(self.all_reviews) - positive - negative

        return {
            "total": len(self.all_reviews),
            "positive": positive,
            "negative": negative,
            "neutral": neutral,
            "avg_rating": sum(r.rating for r in self.all_reviews) / len(self.all_reviews) if self.all_reviews else 0
        }

    def export(self, filename="reviews.csv"):
        data = [vars(r) for r in self.all_reviews]
        df = pd.DataFrame(data)
        df.to_csv(filename, index=False)
        return df

agg = ReviewAggregator(api_key="YOUR_KEY")
urls = {
    "amazon": "https://www.amazon.com/dp/B09V3KXJPB/",
    "bestbuy": "https://www.bestbuy.com/site/reviews/6505727"
}
agg.aggregate(urls)
sentiment = agg.analyze_sentiment()
print(f"Average rating: {sentiment['avg_rating']:.1f}/5")
print(f"Positive: {sentiment['positive']}, Negative: {sentiment['negative']}")

Finding Common Themes

from collections import Counter
import re

def extract_themes(reviews, min_count=3):
    words = []
    stop_words = {"the", "a", "an", "is", "it", "and", "or", "but", "in", "on", "at", "to", "for", "of", "with", "this", "that", "i", "my"}

    for review in reviews:
        text_words = re.findall(r"\b[a-z]+\b", review.text.lower())
        words.extend(w for w in text_words if w not in stop_words and len(w) > 3)

    common = Counter(words).most_common(20)
    return [(word, count) for word, count in common if count >= min_count]

themes = extract_themes(agg.all_reviews)
for word, count in themes:
    print(f"{word}: {count} mentions")

Proxy Strategy

Retail sites deploy aggressive anti-bot measures, and each behaves differently. A scraping API with JavaScript rendering (ScraperAPI, for example) handles Amazon and Best Buy review pages; for high-volume scraping across many stores, residential proxies such as ThorData keep block rates manageable. Whatever you choose, monitor request success rates (ScrapeOps is one option) so you notice quickly when a parser or proxy starts failing.
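Whichever providers you settle on, it also pays to track success rates in your own code. A minimal sketch is enough to flag a dying parser; the class name and the 80% threshold here are illustrative:

```python
from collections import defaultdict

class ScrapeMonitor:
    """Track per-source success rates so a failing scraper is spotted quickly."""

    def __init__(self):
        self.stats = defaultdict(lambda: {"ok": 0, "fail": 0})

    def record(self, source, ok):
        # Call once per request with ok=True/False
        self.stats[source]["ok" if ok else "fail"] += 1

    def success_rate(self, source):
        s = self.stats[source]
        total = s["ok"] + s["fail"]
        return s["ok"] / total if total else None

    def alerts(self, threshold=0.8):
        # Sources whose success rate has dropped below the threshold
        return [src for src in self.stats
                if (self.success_rate(src) or 0) < threshold]
```

Wire `record()` into the aggregator's try/except block and check `alerts()` after each run.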

Conclusion

A review aggregator gives you a 360-degree view of product sentiment that no single platform provides. The key patterns — base scraper classes, common data models, and sentiment analysis — apply to any multi-source aggregation project. Start with two sources, validate your parsing, then expand to more.

Happy scraping!
