Product reviews are scattered across Amazon, Best Buy, Walmart, and dozens of niche sites. Building an aggregator that pulls reviews from multiple sources gives you a comprehensive view of any product. Here's how to build one with Python.
Why Aggregate Reviews?
- Complete picture — no single site has all reviews
- Spot fake reviews — cross-reference sentiment across platforms
- Competitive analysis — compare products using aggregated ratings
- Market research — understand what customers love and hate
Architecture
Our aggregator will:
- Search for a product across multiple retail sites
- Extract reviews, ratings, and metadata
- Normalize data into a common format
- Analyze sentiment and generate insights
Setting Up
pip install requests beautifulsoup4 pandas textblob
python -m textblob.download_corpora
Base Review Scraper
The production implementation is proprietary; if you'd rather skip the build entirely, a ready-made Apify actor is available.
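As a minimal sketch of the shape the aggregator code below expects, a base class can carry the shared plumbing: a common Review record, a fetch method that routes through a scraping API, and a parse hook for site-specific subclasses. The Review fields and the ScraperAPI endpoint/parameters here are illustrative assumptions, not the proprietary implementation.

```python
from dataclasses import dataclass

import requests


@dataclass
class Review:
    """Normalized review record shared by every scraper."""
    source: str
    rating: float
    title: str
    text: str
    author: str = ""
    date: str = ""


class BaseReviewScraper:
    """Shared fetch-and-parse plumbing; subclasses implement parse()."""
    source = "generic"

    def __init__(self, api_key):
        self.api_key = api_key

    def fetch(self, url):
        # Route the request through a scraping API so retail pages render
        # and anti-bot checks are handled upstream (endpoint is an assumption).
        resp = requests.get(
            "https://api.scraperapi.com/",
            params={"api_key": self.api_key, "url": url, "render": "true"},
            timeout=60,
        )
        resp.raise_for_status()
        return resp.text

    def parse(self, html):
        raise NotImplementedError  # each site subclass supplies its own parser

    def scrape_reviews(self, url):
        return self.parse(self.fetch(url))
```

Subclasses then only need to know their site's HTML; the normalization into `Review` happens at parse time, which is what lets the aggregator treat every source identically.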
Amazon Review Scraper
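Again purely as an illustrative stand-in for the proprietary scraper: a BeautifulSoup-based parser keyed on the `data-hook` attributes Amazon uses on its review pages. Amazon changes its markup often, so treat the selectors as a starting point and verify them against a live page; the `Review` dataclass is re-declared so the snippet runs standalone.

```python
from dataclasses import dataclass

import requests
from bs4 import BeautifulSoup


@dataclass
class Review:
    source: str
    rating: float
    title: str
    text: str


class AmazonReviewScraper:
    def __init__(self, api_key):
        self.api_key = api_key

    def fetch(self, url):
        # Fetch through a scraping API with JS rendering (assumed endpoint).
        resp = requests.get(
            "https://api.scraperapi.com/",
            params={"api_key": self.api_key, "url": url, "render": "true"},
            timeout=60,
        )
        resp.raise_for_status()
        return resp.text

    def parse(self, html):
        soup = BeautifulSoup(html, "html.parser")
        reviews = []
        for block in soup.select('div[data-hook="review"]'):
            body_el = block.select_one('span[data-hook="review-body"]')
            if not body_el:
                continue  # skip blocks without review text
            title_el = block.select_one('a[data-hook="review-title"] span')
            rating_el = block.select_one('i[data-hook="review-star-rating"] span')
            rating = 0.0
            if rating_el:
                # Star text looks like "4.0 out of 5 stars"
                rating = float(rating_el.get_text(strip=True).split()[0])
            reviews.append(Review(
                source="amazon",
                rating=rating,
                title=title_el.get_text(strip=True) if title_el else "",
                text=body_el.get_text(strip=True),
            ))
        return reviews

    def scrape_reviews(self, url):
        return self.parse(self.fetch(url))
```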
Best Buy Review Scraper
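The Best Buy version follows the same pattern. The class names in these selectors are placeholders, not confirmed markup: inspect Best Buy's live review page and adjust before using this in anger.

```python
from dataclasses import dataclass

import requests
from bs4 import BeautifulSoup


@dataclass
class Review:
    source: str
    rating: float
    title: str
    text: str


class BestBuyReviewScraper:
    def __init__(self, api_key):
        self.api_key = api_key

    def fetch(self, url):
        # Same assumed scraping-API endpoint as the other scrapers.
        resp = requests.get(
            "https://api.scraperapi.com/",
            params={"api_key": self.api_key, "url": url, "render": "true"},
            timeout=60,
        )
        resp.raise_for_status()
        return resp.text

    def parse(self, html):
        soup = BeautifulSoup(html, "html.parser")
        reviews = []
        # Placeholder selectors; verify against the live page markup.
        for item in soup.select("li.review-item"):
            body = item.select_one("p.pre-white-space")
            if not body:
                continue
            title = item.select_one("h4.review-title")
            rating_el = item.select_one("p.visually-hidden")
            rating = 0.0
            if rating_el:
                # Pull the first number out of text like "Rated 5 out of 5 stars"
                digits = [t for t in rating_el.get_text().split()
                          if t.replace(".", "").isdigit()]
                if digits:
                    rating = float(digits[0])
            reviews.append(Review(
                "bestbuy",
                rating,
                title.get_text(strip=True) if title else "",
                body.get_text(strip=True),
            ))
        return reviews

    def scrape_reviews(self, url):
        return self.parse(self.fetch(url))
```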
The Aggregator
import time

import pandas as pd
from textblob import TextBlob


class ReviewAggregator:
    def __init__(self, api_key):
        self.scrapers = {
            "amazon": AmazonReviewScraper(api_key),
            "bestbuy": BestBuyReviewScraper(api_key),
        }
        self.all_reviews = []

    def aggregate(self, urls):
        for source, url in urls.items():
            if source in self.scrapers:
                try:
                    reviews = self.scrapers[source].scrape_reviews(url)
                    self.all_reviews.extend(reviews)
                    print(f"Fetched {len(reviews)} reviews from {source}")
                    time.sleep(3)  # pause between sources to stay polite
                except Exception as e:
                    print(f"Error with {source}: {e}")
        return self.all_reviews

    def analyze_sentiment(self):
        # TextBlob polarity ranges from -1 (negative) to 1 (positive)
        for review in self.all_reviews:
            blob = TextBlob(review.text)
            review.sentiment = blob.sentiment.polarity
        positive = sum(1 for r in self.all_reviews if getattr(r, "sentiment", 0) > 0.1)
        negative = sum(1 for r in self.all_reviews if getattr(r, "sentiment", 0) < -0.1)
        neutral = len(self.all_reviews) - positive - negative
        return {
            "total": len(self.all_reviews),
            "positive": positive,
            "negative": negative,
            "neutral": neutral,
            "avg_rating": sum(r.rating for r in self.all_reviews) / len(self.all_reviews) if self.all_reviews else 0,
        }

    def export(self, filename="reviews.csv"):
        data = [vars(r) for r in self.all_reviews]
        df = pd.DataFrame(data)
        df.to_csv(filename, index=False)
        return df


agg = ReviewAggregator(api_key="YOUR_KEY")
urls = {
    "amazon": "https://www.amazon.com/dp/B09V3KXJPB/",
    "bestbuy": "https://www.bestbuy.com/site/reviews/6505727",
}
agg.aggregate(urls)

sentiment = agg.analyze_sentiment()
print(f"Average rating: {sentiment['avg_rating']:.1f}/5")
print(f"Positive: {sentiment['positive']}, Negative: {sentiment['negative']}")
Finding Common Themes
from collections import Counter
import re


def extract_themes(reviews, min_count=3):
    words = []
    stop_words = {"the", "a", "an", "is", "it", "and", "or", "but", "in", "on",
                  "at", "to", "for", "of", "with", "this", "that", "i", "my"}
    for review in reviews:
        text_words = re.findall(r"\b[a-z]+\b", review.text.lower())
        # Keep words longer than three characters that aren't stop words
        words.extend(w for w in text_words if w not in stop_words and len(w) > 3)
    common = Counter(words).most_common(20)
    return [(word, count) for word, count in common if count >= min_count]


themes = extract_themes(agg.all_reviews)
for word, count in themes:
    print(f"{word}: {count} mentions")
Proxy Strategy
Retail sites deploy strong anti-bot measures. Use ScraperAPI with JavaScript rendering for Amazon and Best Buy pages. For high-volume scraping across multiple stores, ThorData residential proxies help maintain consistent access. Track your success rates with a monitoring tool like ScrapeOps so you can spot quickly when a scraper needs attention.
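To make that concrete, here is a hedged sketch of a fetch wrapper with exponential backoff and a simple success-rate counter, a stand-in for the kind of tracking a service like ScrapeOps provides. The ScraperAPI endpoint and parameter names follow its documented API, but verify them against current docs before relying on this.

```python
import time

import requests


class ProxyFetcher:
    """Fetch via a scraping API with retries and success-rate tracking."""

    def __init__(self, api_key, max_retries=3):
        self.api_key = api_key
        self.max_retries = max_retries
        self.attempts = 0
        self.successes = 0

    def success_rate(self):
        # Watch this number: a sudden drop means a scraper needs attention.
        return self.successes / self.attempts if self.attempts else 0.0

    def fetch(self, url):
        for attempt in range(self.max_retries):
            self.attempts += 1
            try:
                resp = requests.get(
                    "https://api.scraperapi.com/",
                    params={"api_key": self.api_key, "url": url, "render": "true"},
                    timeout=70,
                )
                resp.raise_for_status()
                self.successes += 1
                return resp.text
            except requests.RequestException:
                time.sleep(2 ** attempt)  # back off before retrying
        raise RuntimeError(f"All {self.max_retries} attempts failed for {url}")
```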
Conclusion
A review aggregator gives you a 360-degree view of product sentiment that no single platform provides. The key patterns — base scraper classes, common data models, and sentiment analysis — apply to any multi-source aggregation project. Start with two sources, validate your parsing, then expand to more.
Happy scraping!