DEV Community

agenthustler
agenthustler

Posted on

How to Scrape Trustpilot in 2026: Reviews, Ratings, and Business Data

Trustpilot hosts over 300 million reviews for 1 million+ businesses. For brand monitoring, competitive intelligence, or market research — programmatic access to this data is invaluable.

This guide covers how to scrape Trustpilot reviews and business data in 2026 with Python, including working code examples and strategies for handling anti-bot protections.

What Trustpilot Data Can You Extract?

  • Reviews: text, star rating, date, author, reply from business
  • Business profiles: overall rating, total reviews, TrustScore, categories, location
  • Review statistics: rating distribution, review frequency over time
  • Reviewer profiles: number of reviews, location, verification status

Method 1: Scraping with Python + BeautifulSoup

Trustpilot renders most content server-side, making it straightforward to parse:

import requests
from bs4 import BeautifulSoup
import json
import time
import random

class TrustpilotScraper:
    """Scrape Trustpilot business profiles and reviews from server-rendered HTML."""

    # Seconds to wait for any single request; without this a stalled
    # connection hangs the scraper indefinitely.
    REQUEST_TIMEOUT = 30

    def __init__(self, proxy_url: str = None):
        """Create a session with browser-like headers.

        Args:
            proxy_url: Optional proxy URL applied to both HTTP and HTTPS traffic.
        """
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                          'AppleWebKit/537.36 (KHTML, like Gecko) '
                          'Chrome/125.0.0.0 Safari/537.36',
            'Accept-Language': 'en-US,en;q=0.9',
        })
        if proxy_url:
            self.session.proxies = {
                'http': proxy_url,
                'https': proxy_url,
            }

    def get_business_info(self, domain: str) -> dict:
        """Scrape the business profile from Trustpilot.

        Args:
            domain: Business domain as it appears in the Trustpilot URL.

        Returns:
            Dict with name/rating/review_count/url, or {} when the JSON-LD
            structured data is missing or unparseable (e.g. a bot challenge
            page was served instead of the profile).

        Raises:
            requests.HTTPError: On a non-2xx response.
        """
        url = f'https://www.trustpilot.com/review/{domain}'
        response = self.session.get(url, timeout=self.REQUEST_TIMEOUT)
        response.raise_for_status()

        soup = BeautifulSoup(response.text, 'html.parser')

        # Extract JSON-LD structured data.
        script_tag = soup.find('script', type='application/ld+json')
        # script_tag.string is None when the tag is empty/fragmented.
        if not script_tag or not script_tag.string:
            return {}
        try:
            data = json.loads(script_tag.string)
        except json.JSONDecodeError:
            # Truncated or interstitial page — treat as "no data" like a
            # missing tag rather than crashing the caller.
            return {}

        # JSON-LD may be a single object or a list (@graph); pick the node
        # that carries the aggregate rating when several are present.
        if isinstance(data, list):
            data = next(
                (node for node in data
                 if isinstance(node, dict) and 'aggregateRating' in node),
                data[0] if data and isinstance(data[0], dict) else {},
            )
        if not isinstance(data, dict):
            return {}

        # `or {}` also covers an explicit "aggregateRating": null, which
        # would make `.get('aggregateRating', {})` return None and crash.
        rating = data.get('aggregateRating') or {}
        return {
            'name': data.get('name'),
            'rating': rating.get('ratingValue'),
            'review_count': rating.get('reviewCount'),
            'url': url,
        }

    def get_reviews(self, domain: str, pages: int = 5) -> list:
        """Scrape up to `pages` pages of reviews for a business.

        Stops early on any non-200 response (rate limited, blocked, or no
        such profile) and returns whatever was collected so far.
        """
        all_reviews = []

        for page in range(1, pages + 1):
            url = f'https://www.trustpilot.com/review/{domain}?page={page}'
            response = self.session.get(url, timeout=self.REQUEST_TIMEOUT)

            if response.status_code != 200:
                break

            soup = BeautifulSoup(response.text, 'html.parser')
            review_cards = soup.find_all(
                'article', {'data-service-review-card-paper': 'true'}
            )

            if not review_cards:
                # Markup changes over time; fall back to a class-name match.
                review_cards = soup.find_all(
                    'div', class_=lambda c: c and 'reviewCard' in c
                )

            for card in review_cards:
                review = self._parse_review_card(card)
                if review:
                    all_reviews.append(review)

            # Polite delay between pages to avoid tripping rate limits.
            time.sleep(random.uniform(2, 4))

        return all_reviews

    def _parse_review_card(self, card) -> dict:
        """Extract fields from a single review card element.

        Every field is optional — the selectors reflect Trustpilot's current
        markup and any of them may fail to match. Returns None when nothing
        at all could be parsed so the caller can skip the card.
        """
        review = {}

        # Rating: parse the star image alt text, e.g. 'Rated 5 out of 5 stars'.
        star_elem = card.find('img', alt=lambda a: a and 'Rated' in str(a))
        if star_elem:
            parts = star_elem.get('alt', '').split()
            for i, part in enumerate(parts):
                if part == 'Rated' and i + 1 < len(parts):
                    try:
                        # float() first so a '4.5'-style value yields 4
                        # instead of being silently dropped by int().
                        review['rating'] = int(float(parts[i + 1]))
                    except ValueError:
                        pass
                    break  # only the first 'Rated ...' phrase is the rating

        # Review title
        title_elem = card.find('h2') or card.find(
            'a', {'data-review-title-typography': 'true'}
        )
        if title_elem:
            review['title'] = title_elem.get_text(strip=True)

        # Review text
        text_elem = card.find(
            'p', {'data-service-review-text-typography': 'true'}
        )
        if text_elem:
            review['text'] = text_elem.get_text(strip=True)

        # Author
        author_elem = card.find(
            'span', {'data-consumer-name-typography': 'true'}
        )
        if author_elem:
            review['author'] = author_elem.get_text(strip=True)

        # Date (ISO timestamp from the <time datetime="..."> attribute)
        time_elem = card.find('time')
        if time_elem:
            review['date'] = time_elem.get('datetime', '')

        return review if review else None


# Usage
scraper = TrustpilotScraper()

# Get business overview. get_business_info() returns {} when the page is
# blocked or the structured data is missing, so guard before indexing —
# business['name'] would otherwise raise KeyError.
business = scraper.get_business_info('amazon.com')
if business:
    print(f"{business['name']}: {business['rating']}/5 "
          f"({business['review_count']} reviews)")
else:
    print('Could not fetch business info (blocked or profile missing)')

# Get reviews
reviews = scraper.get_reviews('amazon.com', pages=3)
for r in reviews[:5]:
    print(f"  [{r.get('rating', '?')}★] {r.get('title', 'No title')}")
Enter fullscreen mode Exit fullscreen mode

Method 2: Using Trustpilot's Hidden API

Trustpilot's frontend calls internal API endpoints that return clean JSON:

import requests

def get_reviews_api(business_unit_id: str, page: int = 1) -> dict:
    """Fetch one page of reviews from Trustpilot's internal JSON API.

    Args:
        business_unit_id: Hex ID obtained via find_business_unit_id().
        page: 1-based page number.

    Returns:
        The decoded JSON payload.

    Raises:
        requests.HTTPError: On a non-2xx response (blocked, bad ID, ...);
            this is clearer than the JSON decode error that would otherwise
            surface when an HTML error page is returned.
    """
    url = (
        f'https://www.trustpilot.com/api/categoriespages/'
        f'{business_unit_id}/reviews'
    )
    params = {
        'locale': 'en-US',
        'page': page,
        'perPage': 20,
    }
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                      'AppleWebKit/537.36',
        'Accept': 'application/json',
    }

    # Timeout prevents a stalled connection from blocking the caller forever.
    response = requests.get(url, params=params, headers=headers, timeout=30)
    response.raise_for_status()
    return response.json()


def find_business_unit_id(domain: str) -> str:
    """Find the Trustpilot business unit ID by scraping the review page.

    Args:
        domain: Business domain as it appears in the Trustpilot URL.

    Returns:
        The hex business unit ID, or '' when it cannot be located
        (page blocked or layout changed).
    """
    import re  # hoisted: the original imported this inside the per-script loop

    url = f'https://www.trustpilot.com/review/{domain}'
    response = requests.get(url, headers={
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                      'AppleWebKit/537.36'
    }, timeout=30)
    soup = BeautifulSoup(response.text, 'html.parser')

    # The ID is embedded in inline script data as "businessUnitId":"<hex>".
    # Compile once instead of re-importing/re-searching per script tag.
    pattern = re.compile(r'"businessUnitId":"([a-f0-9]+)"')
    for script in soup.find_all('script'):
        match = pattern.search(script.string or '')
        if match:
            return match.group(1)
    return ''
Enter fullscreen mode Exit fullscreen mode

Handling Anti-Bot Protection

Trustpilot uses Cloudflare and its own bot detection. Here's how to handle it:

Proxy Rotation

Residential proxies are essential for any volume:

# Using ThorData residential proxies
# Sign up: https://affiliate.thordata.com/0a0x4nzu7tvv
proxy_url = 'http://user:pass@proxy.thordata.com:9090'
scraper = TrustpilotScraper(proxy_url=proxy_url)
Enter fullscreen mode Exit fullscreen mode

Smart Headers with ScrapeOps

ScrapeOps provides fake browser headers and proxy aggregation to improve success rates:

import requests

SCRAPEOPS_KEY = 'YOUR_KEY'

def get_scrapeops_headers() -> dict:
    """Get realistic browser headers from the ScrapeOps headers API.

    Returns:
        A single header dict, or {} when the API returns no results so
        callers can fall back to their own headers.
    """
    response = requests.get(
        'https://headers.scrapeops.io/v1/browser-headers',
        params={'api_key': SCRAPEOPS_KEY, 'num_results': 1},
        timeout=30,  # don't hang the scraper if the headers API is down
    )
    headers_list = response.json().get('result', [])
    return headers_list[0] if headers_list else {}


def scrape_with_scrapeops(url: str) -> str:
    """Fetch a URL through the ScrapeOps proxy aggregator and return the HTML.

    Args:
        url: Target page URL.

    Raises:
        requests.HTTPError: On quota/auth/upstream failures — the original
            silently returned the error body as if it were scraped HTML.
    """
    response = requests.get(
        'https://proxy.scrapeops.io/v1/',
        params={
            'api_key': SCRAPEOPS_KEY,
            'url': url,
            'render_js': 'false',
        },
        # Aggregators retry upstream providers, so allow a generous but
        # still bounded timeout.
        timeout=120,
    )
    response.raise_for_status()
    return response.text
Enter fullscreen mode Exit fullscreen mode

Method 3: Using a Managed Scraper

For production use cases where you need reliable, maintained infrastructure:

Trustpilot Scraper on Apify handles proxy rotation, CAPTCHA solving, and anti-bot bypassing out of the box. You provide business URLs, it returns structured review data in JSON, CSV, or Excel format.

This is the fastest path from zero to data if you don't want to maintain scraping infrastructure.

Use Cases for Trustpilot Data

Brand Monitoring

Track your company's review trends over time:

from collections import Counter
from datetime import datetime

def analyze_sentiment_trend(reviews: list) -> dict:
    """Summarise a review list: rating histogram plus overall and monthly averages.

    Reviews without a 'rating' key count as rating 0; reviews without a
    'date' are excluded from the monthly breakdown.
    """
    all_ratings = [r.get('rating', 0) for r in reviews]

    # Bucket ratings by month, keyed on the YYYY-MM prefix of the ISO date.
    by_month = {}
    for r in reviews:
        when = r.get('date', '')
        if when:
            by_month.setdefault(when[:7], []).append(r.get('rating', 0))

    per_month_avg = {
        m: sum(vals) / len(vals)
        for m, vals in sorted(by_month.items())
    }

    overall = sum(all_ratings) / len(all_ratings) if all_ratings else 0

    return {
        'total_reviews': len(reviews),
        'rating_distribution': dict(Counter(all_ratings)),
        'average_rating': overall,
        'monthly_averages': per_month_avg,
    }

# Pull ten pages of your own reviews and print the headline numbers.
company_reviews = scraper.get_reviews('yourcompany.com', pages=10)
summary = analyze_sentiment_trend(company_reviews)
print(f"Average: {summary['average_rating']:.1f}/5")
print(f"Distribution: {summary['rating_distribution']}")
Enter fullscreen mode Exit fullscreen mode

Competitive Intelligence

def compare_competitors(domains: list) -> list:
    """Compare Trustpilot scores across competitor domains.

    Args:
        domains: Business domains to look up.

    Returns:
        One dict per domain ({'domain', 'rating', 'reviews'}), sorted
        best-rated first. Domains whose rating could not be fetched sort
        last instead of crashing the comparison.
    """
    scraper = TrustpilotScraper()
    results = []

    for domain in domains:
        info = scraper.get_business_info(domain)
        results.append({
            'domain': domain,
            'rating': info.get('rating', 'N/A'),
            'reviews': info.get('review_count', 0),
        })
        # Polite delay so sequential profile fetches don't look like a bot burst.
        time.sleep(random.uniform(3, 6))

    def _sort_key(row):
        # Ratings arrive as strings ('4.7') or the 'N/A' default above;
        # float('N/A') raised ValueError in the original sort.
        try:
            return float(row['rating'] or 0)
        except (TypeError, ValueError):
            return 0.0

    results.sort(key=_sort_key, reverse=True)
    return results


# Rank a set of e-commerce platforms by their Trustpilot score.
platforms = ['shopify.com', 'woocommerce.com', 'bigcommerce.com']
for entry in compare_competitors(platforms):
    print(f"{entry['domain']}: {entry['rating']}/5 ({entry['reviews']} reviews)")
Enter fullscreen mode Exit fullscreen mode

Storing Results

import csv
import json

def export_reviews(reviews: list, domain: str):
    """Export reviews to CSV and JSON files in the working directory.

    Args:
        reviews: Review dicts; keys may differ between reviews since the
            scraper emits only the fields it could parse.
        domain: Business domain; dots become underscores in the filenames.
    """
    stem = f'trustpilot_{domain.replace(".", "_")}'

    # CSV — fieldnames must cover *every* key that appears in any review.
    # Using only reviews[0].keys() made DictWriter.writerows() raise
    # ValueError as soon as a later review carried an extra field.
    csv_file = f'{stem}.csv'
    if reviews:
        fieldnames = []
        for review in reviews:
            for key in review:
                if key not in fieldnames:
                    fieldnames.append(key)  # preserve first-seen column order
        with open(csv_file, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames, restval='')
            writer.writeheader()
            writer.writerows(reviews)

    # JSON
    json_file = f'{stem}.json'
    with open(json_file, 'w', encoding='utf-8') as f:
        json.dump(reviews, f, indent=2, ensure_ascii=False)

    print(f'Saved {len(reviews)} reviews to {csv_file} and {json_file}')
Enter fullscreen mode Exit fullscreen mode

Ethical Scraping Guidelines

  • Respect rate limits: Add delays between requests (2-5 seconds minimum)
  • Check robots.txt: verify which paths Trustpilot's robots.txt currently permits before scraping — its rules can change at any time
  • Don't scrape personal data beyond what's publicly visible
  • Comply with GDPR/CCPA when processing reviewer data in EU/California
  • Cache results: Don't re-scrape data you already have

Wrapping Up

Trustpilot scraping in 2026 comes down to three approaches:

  1. Quick and simple: BeautifulSoup + requests with a residential proxy
  2. Smarter scraping: Use ScrapeOps for header rotation and proxy aggregation
  3. Production-ready: Use a managed Trustpilot scraper for reliability at scale

Start with the Python examples above, and scale up to managed solutions when your volume demands it. The key is respecting the platform while getting the data you need.

Top comments (0)