Trustpilot hosts over 300 million reviews for more than 1 million businesses. For brand monitoring, competitive intelligence, or market research, programmatic access to this data is invaluable.
This guide covers how to scrape Trustpilot reviews and business data in 2026 with Python, including working code examples and strategies for handling anti-bot protections.
What Trustpilot Data Can You Extract?
- Reviews: text, star rating, date, author, reply from business
- Business profiles: overall rating, total reviews, TrustScore, categories, location
- Review statistics: rating distribution, review frequency over time
- Reviewer profiles: number of reviews, location, verification status
Method 1: Scraping with Python + BeautifulSoup
Trustpilot renders most content server-side, making it straightforward to parse:
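A minimal sketch of this approach, assuming `requests` and `beautifulsoup4` are installed. The CSS selectors (`article[data-review-id]`, `[data-rating]`, `p.review-text`) are illustrative placeholders, not Trustpilot's actual markup; inspect the live page and adjust them, since the site's HTML changes regularly:

```python
import requests
from bs4 import BeautifulSoup

def fetch_page(domain: str, page: int = 1) -> str:
    """Fetch one page of a business's Trustpilot reviews."""
    url = f'https://www.trustpilot.com/review/{domain}'
    resp = requests.get(
        url,
        params={'page': page},
        headers={'User-Agent': 'Mozilla/5.0'},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.text

def parse_reviews(html: str) -> list:
    """Pull review cards out of a review-page HTML string."""
    soup = BeautifulSoup(html, 'html.parser')
    reviews = []
    # Selectors are placeholders: adjust them after inspecting the live markup.
    for card in soup.select('article[data-review-id]'):
        rating_tag = card.select_one('[data-rating]')
        reviews.append({
            'id': card.get('data-review-id'),
            'rating': int(rating_tag['data-rating']) if rating_tag else None,
            'text': (card.select_one('p.review-text') or card).get_text(strip=True),
        })
    return reviews
```

Usage is then just `parse_reviews(fetch_page('example.com', page=1))`, looped over pages with a delay between requests.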
Method 2: Using Trustpilot's Hidden API
Trustpilot's frontend calls internal API endpoints that return clean JSON:
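A sketch of consuming such an endpoint. The URL template and JSON field names (`stars`, `dates.publishedDate`) here are assumptions for illustration; discover the real path and payload shape by watching your browser's network tab while paging through reviews on a business profile:

```python
import requests

# Illustrative endpoint, not a documented API: confirm the real path
# in the network tab before using it.
API_TEMPLATE = 'https://www.trustpilot.com/api/businessunits/{unit_id}/reviews'

def flatten_review(raw: dict) -> dict:
    """Map one raw JSON review object to a flat record."""
    return {
        'rating': raw.get('stars'),
        'title': raw.get('title'),
        'text': raw.get('text'),
        'date': (raw.get('dates') or {}).get('publishedDate'),
    }

def fetch_reviews_json(unit_id: str, page: int = 1) -> list:
    """Request one page of reviews as JSON and flatten each entry."""
    resp = requests.get(
        API_TEMPLATE.format(unit_id=unit_id),
        params={'page': page},
        headers={'User-Agent': 'Mozilla/5.0'},
        timeout=30,
    )
    resp.raise_for_status()
    return [flatten_review(r) for r in resp.json().get('reviews', [])]
```

The upside of this route is that you skip HTML parsing entirely; the downside is that undocumented endpoints can change or disappear without notice.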
Handling Anti-Bot Protection
Trustpilot uses Cloudflare and its own bot detection. Here's how to handle it:
Proxy Rotation
Residential proxies are essential for any volume:
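A simple round-robin rotation with retries and backoff, sketched with `requests`. The proxy URLs are placeholders; substitute your residential provider's gateway endpoints and credentials:

```python
import itertools
import random
import time

import requests

# Placeholder endpoints: replace with your residential proxy provider's gateways.
PROXIES = [
    'http://user:pass@proxy1.example.com:8000',
    'http://user:pass@proxy2.example.com:8000',
]
_pool = itertools.cycle(PROXIES)

def next_proxy() -> dict:
    """Return the next proxy, round-robin, in the mapping requests expects."""
    url = next(_pool)
    return {'http': url, 'https': url}

def fetch_with_rotation(url: str, retries: int = 3) -> str:
    """Fetch a URL, rotating to a fresh proxy on each failed attempt."""
    for attempt in range(retries):
        try:
            resp = requests.get(
                url,
                proxies=next_proxy(),
                headers={'User-Agent': 'Mozilla/5.0'},
                timeout=30,
            )
            resp.raise_for_status()
            return resp.text
        except requests.RequestException:
            time.sleep(2 ** attempt + random.random())  # backoff before rotating
    raise RuntimeError(f'All {retries} attempts failed for {url}')
```

Round-robin keeps load evenly spread across your pool; exponential backoff between attempts avoids hammering the site when a proxy gets blocked.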
Smart Headers with ScrapeOps
ScrapeOps provides fake browser headers and proxy aggregation to improve success rates:
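A sketch of pulling a pool of realistic header sets from the ScrapeOps fake-browser-headers endpoint and picking one at random per request. The query parameter names follow ScrapeOps' documentation as I understand it; verify them against the current docs before relying on this:

```python
import random

import requests

SCRAPEOPS_HEADERS_URL = 'https://headers.scrapeops.io/v1/browser-headers'

def get_header_pool(api_key: str, count: int = 10) -> list:
    """Fetch a pool of realistic browser header sets from ScrapeOps."""
    resp = requests.get(
        SCRAPEOPS_HEADERS_URL,
        # Parameter names assumed from ScrapeOps docs; double-check them.
        params={'api_key': api_key, 'num_results': count},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get('result', [])

def pick_headers(pool: list) -> dict:
    """Pick one header set at random; fall back to a minimal UA if empty."""
    return random.choice(pool) if pool else {'User-Agent': 'Mozilla/5.0'}
```

Fetch the pool once at startup, then call `pick_headers(pool)` for each request so consecutive requests don't share an identical fingerprint.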
Method 3: Using a Managed Scraper
For production use cases where you need reliable, maintained infrastructure:
Trustpilot Scraper on Apify handles proxy rotation, CAPTCHA solving, and anti-bot bypassing out of the box: you provide business URLs, and it returns structured review data in JSON, CSV, or Excel format.
This is the fastest path from zero to data if you don't want to maintain scraping infrastructure.
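Driving an Apify actor from Python looks roughly like this, using the official `apify-client` package. The actor ID is a placeholder and the input fields (`startUrls`, `maxReviews`) are assumptions for illustration; check the actor's README for its real input schema:

```python
def build_run_input(domains: list, max_reviews: int = 200) -> dict:
    """Assemble actor input from a list of business domains.

    Field names are illustrative; consult the actor's input schema.
    """
    return {
        'startUrls': [
            {'url': f'https://www.trustpilot.com/review/{d}'} for d in domains
        ],
        'maxReviews': max_reviews,
    }

def run_actor(token: str, domains: list) -> list:
    """Run the actor and collect its dataset items."""
    from apify_client import ApifyClient  # pip install apify-client
    client = ApifyClient(token)
    run = client.actor('ACTOR_ID_HERE').call(run_input=build_run_input(domains))
    return list(client.dataset(run['defaultDatasetId']).iterate_items())
```

The `call()` helper blocks until the run finishes, so the function returns only once the dataset is complete.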
Use Cases for Trustpilot Data
Brand Monitoring
Track your company's review trends over time:
from collections import Counter

def analyze_sentiment_trend(reviews: list) -> dict:
    """Analyze rating distribution and trends."""
    ratings = Counter(r.get('rating', 0) for r in reviews)

    # Monthly breakdown
    monthly = {}
    for review in reviews:
        date_str = review.get('date', '')
        if date_str:
            month = date_str[:7]  # YYYY-MM
            monthly.setdefault(month, []).append(review.get('rating', 0))

    monthly_avg = {
        month: sum(ratings_list) / len(ratings_list)
        for month, ratings_list in sorted(monthly.items())
    }

    return {
        'total_reviews': len(reviews),
        'rating_distribution': dict(ratings),
        'average_rating': (
            sum(r.get('rating', 0) for r in reviews) / len(reviews)
            if reviews else 0
        ),
        'monthly_averages': monthly_avg,
    }

# `scraper` is whichever collection method from the sections above you settled on
reviews = scraper.get_reviews('yourcompany.com', pages=10)
trends = analyze_sentiment_trend(reviews)
print(f"Average: {trends['average_rating']:.1f}/5")
print(f"Distribution: {trends['rating_distribution']}")
Competitive Intelligence
import random
import time

def compare_competitors(domains: list) -> list:
    """Compare Trustpilot scores across competitors."""
    scraper = TrustpilotScraper()  # any scraper exposing get_business_info()
    results = []
    for domain in domains:
        info = scraper.get_business_info(domain)
        results.append({
            'domain': domain,
            'rating': info.get('rating') or 0,  # 0 when unrated
            'reviews': info.get('review_count', 0),
        })
        time.sleep(random.uniform(3, 6))  # pause between businesses
    results.sort(key=lambda x: float(x['rating']), reverse=True)
    return results

competitors = ['shopify.com', 'woocommerce.com', 'bigcommerce.com']
ranking = compare_competitors(competitors)
for r in ranking:
    print(f"{r['domain']}: {r['rating']}/5 ({r['reviews']} reviews)")
Storing Results
import csv
import json

def export_reviews(reviews: list, domain: str):
    """Export reviews to CSV and JSON."""
    # CSV
    csv_file = f'trustpilot_{domain.replace(".", "_")}.csv'
    if reviews:
        with open(csv_file, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=reviews[0].keys())
            writer.writeheader()
            writer.writerows(reviews)

    # JSON
    json_file = f'trustpilot_{domain.replace(".", "_")}.json'
    with open(json_file, 'w', encoding='utf-8') as f:
        json.dump(reviews, f, indent=2, ensure_ascii=False)

    print(f'Saved {len(reviews)} reviews to {csv_file} and {json_file}')
Ethical Scraping Guidelines
- Respect rate limits: Add delays between requests (2-5 seconds minimum)
- Check robots.txt: verify which paths Trustpilot currently permits before crawling
- Don't scrape personal data beyond what's publicly visible
- Comply with GDPR/CCPA when processing reviewer data in EU/California
- Cache results: Don't re-scrape data you already have
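The delay-and-cache points above can be combined into one polite fetch helper: a random pause before every network request, and an on-disk cache so the same URL is never fetched twice. The cache directory name and keying scheme here are arbitrary choices for illustration:

```python
import pathlib
import random
import time

import requests

CACHE_DIR = pathlib.Path('.tp_cache')  # arbitrary cache location

def cached_get(url: str, min_delay: float = 2.0, max_delay: float = 5.0) -> str:
    """GET with a polite random delay, caching responses on disk so a
    URL is only ever fetched once per cache."""
    CACHE_DIR.mkdir(exist_ok=True)
    key = CACHE_DIR / (str(abs(hash(url))) + '.html')
    if key.exists():
        return key.read_text(encoding='utf-8')  # cache hit: no request, no delay
    time.sleep(random.uniform(min_delay, max_delay))
    resp = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=30)
    resp.raise_for_status()
    key.write_text(resp.text, encoding='utf-8')
    return resp.text
```

Note that `hash()` is only stable within one Python process; for a cache that survives restarts, swap in `hashlib.sha256(url.encode()).hexdigest()`.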
Wrapping Up
Trustpilot scraping in 2026 comes down to three approaches:
- Quick and simple: BeautifulSoup + requests with a residential proxy
- Smarter scraping: Use ScrapeOps for header rotation and proxy aggregation
- Production-ready: Use a managed Trustpilot scraper for reliability at scale
Start with the Python examples above, and scale up to managed solutions when your volume demands it. The key is respecting the platform while getting the data you need.