DEV Community

agenthustler

How to Scrape Trustpilot in 2026: Reviews, Ratings, and Business Data

Trustpilot hosts over 300 million reviews for 1 million+ businesses. Whether you're doing brand monitoring, competitive intelligence, or market research, programmatic access to this data is invaluable.

This guide covers how to scrape Trustpilot reviews and business data in 2026 with Python, including working code examples and strategies for handling anti-bot protections.

What Trustpilot Data Can You Extract?

  • Reviews: text, star rating, date, author, reply from business
  • Business profiles: overall rating, total reviews, TrustScore, categories, location
  • Review statistics: rating distribution, review frequency over time
  • Reviewer profiles: number of reviews, location, verification status
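Before writing any scraping code, it helps to pin down the record shape you're aiming for. A minimal sketch (the field names here are illustrative, not a fixed Trustpilot schema):

```python
from typing import Optional, TypedDict

class Review(TypedDict):
    """Illustrative shape of one scraped review record."""
    text: str                      # review body
    rating: int                    # 1-5 stars
    date: str                      # ISO date, e.g. "2026-01-15"
    author: str
    business_reply: Optional[str]  # None if the business hasn't replied

sample: Review = {
    "text": "Fast shipping, great support.",
    "rating": 5,
    "date": "2026-01-15",
    "author": "Jane D.",
    "business_reply": None,
}
```

Agreeing on a schema up front makes it easy to swap scraping methods later without touching your analysis code.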

Method 1: Scraping with Python + BeautifulSoup

Trustpilot renders most review and profile content server-side, so a plain HTTP request plus an HTML parser gets you surprisingly far.
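Here is a minimal sketch. The CSS selectors and the `alt`-text rating format are assumptions based on Trustpilot's current markup and can break without notice, so verify them against a live page:

```python
import requests
from bs4 import BeautifulSoup

HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

def parse_reviews(html: str) -> list[dict]:
    """Extract review text and star ratings from a Trustpilot review page."""
    soup = BeautifulSoup(html, "html.parser")
    reviews = []
    for card in soup.select("article"):           # one <article> per review (assumption)
        text_el = card.select_one("p")            # review body
        rating_el = card.select_one("img[alt]")   # e.g. alt="Rated 4 out of 5 stars"
        rating = None
        if rating_el and "Rated" in rating_el.get("alt", ""):
            rating = int(rating_el["alt"].split()[1])
        reviews.append({
            "text": text_el.get_text(strip=True) if text_el else "",
            "rating": rating,
        })
    return reviews

def get_reviews(domain: str, page: int = 1) -> list[dict]:
    """Fetch and parse one page of reviews for a business domain."""
    url = f"https://www.trustpilot.com/review/{domain}?page={page}"
    resp = requests.get(url, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return parse_reviews(resp.text)
```

Keeping the parsing in a separate function (`parse_reviews`) means you can unit-test it against saved HTML fixtures without making network calls.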

Method 2: Using Trustpilot's Hidden API

Trustpilot's frontend hydrates each page from structured JSON, so you can often pull clean review objects directly instead of scraping rendered HTML.
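One common variant, assuming Trustpilot's current Next.js build (this is an observation about the markup, not a documented API), is to read the `__NEXT_DATA__` JSON payload embedded in every page:

```python
import json
import re

import requests

# Matches the Next.js bootstrap payload embedded in the page.
NEXT_DATA_RE = re.compile(
    r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', re.DOTALL
)

def extract_next_data(html: str) -> dict:
    """Pull the embedded page-state JSON out of a Trustpilot page."""
    match = NEXT_DATA_RE.search(html)
    if not match:
        raise ValueError("no __NEXT_DATA__ payload found; markup may have changed")
    return json.loads(match.group(1))

def get_page_state(domain: str) -> dict:
    """Fetch a review page and return its JSON page state."""
    url = f"https://www.trustpilot.com/review/{domain}"
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    data = extract_next_data(resp.text)
    # The key path below is an assumption and may differ between builds.
    return data.get("props", {}).get("pageProps", {})
```

The advantage over CSS selectors is that the JSON structure tends to change less often than class names, and you get typed fields (ratings as numbers, dates as ISO strings) for free.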

Handling Anti-Bot Protection

Trustpilot uses Cloudflare and its own bot detection. Here's how to handle it:

Proxy Rotation

Residential proxies are essential at any meaningful volume; datacenter IPs get flagged quickly.
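A minimal rotation sketch (the proxy URLs are placeholders for whatever your residential provider issues):

```python
import random

import requests

# Placeholder endpoints -- substitute your provider's residential proxy URLs.
PROXIES = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
    "http://user:pass@proxy3.example.com:8000",
]

def pick_proxy(pool: list[str]) -> dict:
    """Choose a random proxy and format it for requests' proxies= argument."""
    proxy = random.choice(pool)
    return {"http": proxy, "https": proxy}

def fetch_with_rotation(url: str, retries: int = 3) -> requests.Response:
    """Retry a request across different proxies until one succeeds."""
    last_error: Exception = RuntimeError("no attempts made")
    for _ in range(retries):
        try:
            resp = requests.get(url, proxies=pick_proxy(PROXIES), timeout=30)
            if resp.status_code == 200:
                return resp
            last_error = RuntimeError(f"got status {resp.status_code}")
        except requests.RequestException as exc:
            last_error = exc
    raise last_error
```

Random selection is the simplest policy; at higher volume you'd track per-proxy failure rates and temporarily bench IPs that start returning 403s.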

Smart Headers with ScrapeOps

ScrapeOps aggregates proxies and serves realistic browser header sets, which measurably improves success rates against Cloudflare-protected pages.
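A sketch of both ScrapeOps features. The endpoints and response shape below match ScrapeOps' documentation at time of writing but should be double-checked against their current docs; the API key is a placeholder:

```python
from urllib.parse import urlencode

import requests

SCRAPEOPS_API_KEY = "YOUR_API_KEY"  # placeholder -- get one from scrapeops.io

def get_browser_headers() -> list[dict]:
    """Fetch a batch of realistic browser header sets from ScrapeOps."""
    resp = requests.get(
        "https://headers.scrapeops.io/v1/browser-headers",
        params={"api_key": SCRAPEOPS_API_KEY},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("result", [])

def proxy_url(target: str) -> str:
    """Build a ScrapeOps proxy-aggregator URL that fetches the target page."""
    query = urlencode({"api_key": SCRAPEOPS_API_KEY, "url": target})
    return f"https://proxy.scrapeops.io/v1/?{query}"
```

You'd rotate through the header sets returned by `get_browser_headers()` per request, or route requests through `proxy_url(...)` and let the aggregator handle IPs for you.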

Method 3: Using a Managed Scraper

For production use cases where you need reliable, maintained infrastructure:

Trustpilot Scraper on Apify handles proxy rotation, CAPTCHA solving, and anti-bot bypassing out of the box. You provide business URLs, it returns structured review data in JSON, CSV, or Excel format.

This is the fastest path from zero to data if you don't want to maintain scraping infrastructure.
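With the official `apify-client` package the workflow looks roughly like this. The input field names are assumptions (check the actor's input schema on its Apify page), and the actor ID is a placeholder:

```python
# pip install apify-client

def build_run_input(domains: list[str], max_reviews: int = 100) -> dict:
    """Input payload for a Trustpilot actor. Field names are assumptions --
    verify them against the actor's documented input schema."""
    return {
        "startUrls": [
            {"url": f"https://www.trustpilot.com/review/{d}"} for d in domains
        ],
        "maxReviews": max_reviews,
    }

def run_actor(token: str, actor_id: str, run_input: dict) -> list[dict]:
    """Run an Apify actor and collect its dataset items."""
    from apify_client import ApifyClient  # deferred so build_run_input stays dependency-free
    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Usage would be something like `run_actor(token, "<username>/trustpilot-scraper", build_run_input(["example.com"]))`, with the actor ID taken from the actor's Apify page.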

Use Cases for Trustpilot Data

Brand Monitoring

Track your company's review trends over time:

from collections import Counter

def analyze_sentiment_trend(reviews: list) -> dict:
    """Analyze rating distribution and trends."""
    ratings = Counter(r.get('rating', 0) for r in reviews)

    # Monthly breakdown
    monthly = {}
    for review in reviews:
        date_str = review.get('date', '')
        if date_str:
            month = date_str[:7]  # YYYY-MM
            if month not in monthly:
                monthly[month] = []
            monthly[month].append(review.get('rating', 0))

    monthly_avg = {
        month: sum(ratings_list) / len(ratings_list)
        for month, ratings_list in sorted(monthly.items())
    }

    return {
        'total_reviews': len(reviews),
        'rating_distribution': dict(ratings),
        'average_rating': (
            sum(r.get('rating', 0) for r in reviews) / len(reviews)
            if reviews else 0
        ),
        'monthly_averages': monthly_avg,
    }

# `scraper` stands in for whichever client you built with the methods above
reviews = scraper.get_reviews('yourcompany.com', pages=10)
trends = analyze_sentiment_trend(reviews)
print(f"Average: {trends['average_rating']:.1f}/5")
print(f"Distribution: {trends['rating_distribution']}")

Competitive Intelligence

import random
import time

def compare_competitors(domains: list) -> list:
    """Compare Trustpilot scores across competitors."""
    scraper = TrustpilotScraper()  # whichever client you built with the methods above
    results = []

    for domain in domains:
        info = scraper.get_business_info(domain)
        results.append({
            'domain': domain,
            'rating': info.get('rating', 0.0),  # numeric default so sorting never crashes
            'reviews': info.get('review_count', 0),
        })
        time.sleep(random.uniform(3, 6))  # polite, jittered delay between requests

    results.sort(key=lambda x: x['rating'], reverse=True)
    return results


competitors = ['shopify.com', 'woocommerce.com', 'bigcommerce.com']
ranking = compare_competitors(competitors)
for r in ranking:
    print(f"{r['domain']}: {r['rating']}/5 ({r['reviews']} reviews)")

Storing Results

import csv
import json

def export_reviews(reviews: list, domain: str):
    """Export reviews to CSV and JSON."""
    # CSV
    csv_file = f'trustpilot_{domain.replace(".", "_")}.csv'
    if reviews:
        with open(csv_file, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=reviews[0].keys())
            writer.writeheader()
            writer.writerows(reviews)

    # JSON
    json_file = f'trustpilot_{domain.replace(".", "_")}.json'
    with open(json_file, 'w', encoding='utf-8') as f:
        json.dump(reviews, f, indent=2, ensure_ascii=False)

    print(f'Saved {len(reviews)} reviews to {csv_file} and {json_file}')

Ethical Scraping Guidelines

  • Respect rate limits: Add delays between requests (2-5 seconds minimum)
  • Check robots.txt and the terms of use: verify for yourself what Trustpilot currently permits before crawling anything
  • Don't scrape personal data beyond what's publicly visible
  • Comply with GDPR/CCPA when processing reviewer data in EU/California
  • Cache results: Don't re-scrape data you already have
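The robots.txt check is easy to automate with the standard library. A small sketch (the user-agent string is a placeholder; fetch the live file with `requests.get("https://www.trustpilot.com/robots.txt").text` and pass it in):

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt: str, url: str, user_agent: str = "my-scraper") -> bool:
    """Check a URL against robots.txt rules before crawling it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Taking the robots.txt body as a string (rather than fetching inside the function) keeps the check testable offline and lets you cache the file instead of re-downloading it per request.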

Wrapping Up

Trustpilot scraping in 2026 comes down to three approaches:

  1. Quick and simple: BeautifulSoup + requests with a residential proxy
  2. Smarter scraping: Use ScrapeOps for header rotation and proxy aggregation
  3. Production-ready: Use a managed Trustpilot scraper for reliability at scale

Start with the Python examples above, and scale up to managed solutions when your volume demands it. The key is respecting the platform while getting the data you need.
