agenthustler
How to Scrape Trustpilot Reviews in 2026: Build a Reputation Monitor with Python

Trustpilot is one of the most trusted review platforms in the world — but manually tracking reviews for your brand (or competitors) doesn't scale. In this guide, I'll show you how to scrape Trustpilot reviews programmatically in 2026 using Python.

Why Scrape Trustpilot?

Businesses use Trustpilot data for:

  • Reputation monitoring — get alerted when new negative reviews appear
  • Competitor analysis — track how your competitors' ratings change over time
  • Sentiment analysis — feed reviews into NLP models to spot trends
  • Lead generation — find companies with poor reviews in your niche (sales opportunity)

Trustpilot has over 1 million businesses listed with hundreds of millions of reviews. That's a goldmine of structured data — if you can access it.
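As a taste of the sentiment-analysis use case, here's a minimal sketch that treats star ratings as a rough sentiment proxy. It assumes review dicts shaped like the ones we parse later in this guide; a real pipeline would feed the review text into an NLP model instead.

```python
def sentiment_summary(reviews):
    """Tally reviews as positive (4-5 stars), neutral (3), or negative (1-2)."""
    summary = {'positive': 0, 'neutral': 0, 'negative': 0}
    for r in reviews:
        rating = r.get('rating')
        if rating is None:
            continue  # skip reviews where the rating couldn't be parsed
        if rating >= 4:
            summary['positive'] += 1
        elif rating == 3:
            summary['neutral'] += 1
        else:
            summary['negative'] += 1
    return summary

sample = [{'rating': 5}, {'rating': 1}, {'rating': 3}, {'rating': 4}]
print(sentiment_summary(sample))  # {'positive': 2, 'neutral': 1, 'negative': 1}
```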

The Challenge: Trustpilot's Anti-Bot Protections

Trustpilot uses Cloudflare protection and browser fingerprinting to block scrapers. A naive requests.get() will return a 403 or a CAPTCHA page almost immediately.

import requests

# This will FAIL on Trustpilot
response = requests.get('https://www.trustpilot.com/review/amazon.com')
print(response.status_code)  # 403 or CAPTCHA HTML

You need to handle:

  1. JavaScript rendering (Cloudflare JS challenges)
  2. Browser fingerprinting
  3. Rate limiting and IP rotation
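Before adding a proxy layer, it helps to detect when you've been served a challenge page instead of real content. Here's a hedged sketch — the marker strings are common patterns seen on Cloudflare interstitial pages, not an exhaustive or guaranteed list:

```python
# Common tell-tale strings on Cloudflare challenge pages (illustrative, not exhaustive)
BLOCK_MARKERS = ('cf-challenge', 'Just a moment', 'Attention Required')

def looks_blocked(status_code, html):
    """Heuristic check for a blocked/challenged response before parsing."""
    if status_code in (403, 429, 503):
        return True
    lowered = html.lower()
    return any(marker.lower() in lowered for marker in BLOCK_MARKERS)

print(looks_blocked(403, ''))                                   # True
print(looks_blocked(200, '<title>Just a moment...</title>'))    # True
print(looks_blocked(200, '<article>Great service!</article>'))  # False
```

Checking this before parsing lets you retry or rotate IPs instead of silently extracting zero reviews from a CAPTCHA page.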

Approach 1: Using ScraperAPI (Recommended)

ScraperAPI handles all the anti-bot complexity for you. It rotates IPs, renders JavaScript, and manages browser fingerprints automatically.

import requests
from urllib.parse import quote

SCRAPER_API_KEY = 'YOUR_API_KEY'  # Sign up at scraperapi.com

def scrape_trustpilot_reviews(company_domain, page=1):
    url = f'https://www.trustpilot.com/review/{company_domain}?page={page}'
    api_url = f'http://api.scraperapi.com?api_key={SCRAPER_API_KEY}&url={quote(url)}&render=true'

    response = requests.get(api_url, timeout=60)
    return response.text

# Scrape Amazon's Trustpilot reviews
html = scrape_trustpilot_reviews('amazon.com', page=1)
print(f'Got {len(html)} characters of HTML')
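Even with a managed service, transient failures happen. A small retry-with-backoff wrapper keeps a scheduled job resilient — this is a sketch, with an arbitrary retry count and delay schedule; the HTTP call is passed in as a parameter so you can hand it `requests.get`:

```python
import time

def fetch_with_retries(api_url, get, max_retries=3):
    """Call get(url, timeout=60) up to max_retries times with exponential backoff."""
    for attempt in range(max_retries):
        try:
            resp = get(api_url, timeout=60)
            if resp.status_code == 200:
                return resp.text
        except Exception:
            pass  # transient network error: fall through and retry
        if attempt < max_retries - 1:
            time.sleep(2 ** attempt)  # 1s, then 2s between attempts
    raise RuntimeError(f'Failed after {max_retries} attempts: {api_url}')

# Usage: html = fetch_with_retries(api_url, requests.get)
```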

Then parse the HTML with BeautifulSoup:

from bs4 import BeautifulSoup
import re

def parse_reviews(html):
    soup = BeautifulSoup(html, 'html.parser')
    reviews = []

    # Find all review cards
    review_cards = soup.find_all('article', {'data-service-review-card-paper': 'true'})

    for card in review_cards:
        # Extract rating (1-5 stars)
        rating_div = card.find('div', {'data-service-review-rating': True})
        rating = int(rating_div['data-service-review-rating']) if rating_div else None

        # Extract review text
        text_div = card.find('p', {'data-service-review-text-typography': 'true'})
        text = text_div.get_text(strip=True) if text_div else ''

        # Extract reviewer name
        name_div = card.find('span', {'data-consumer-name-typography': 'true'})
        name = name_div.get_text(strip=True) if name_div else 'Anonymous'

        # Extract date
        date_el = card.find('time')
        date = date_el['datetime'] if date_el else None

        reviews.append({
            'rating': rating,
            'text': text,
            'author': name,
            'date': date
        })

    return reviews

reviews = parse_reviews(html)
for r in reviews[:3]:
    stars = '★' * (r['rating'] or 0) + '☆' * (5 - (r['rating'] or 0))
    print(f"{stars} {r['author']}: {r['text'][:100]}")

Approach 2: Ready-Made API (No Code Required)

If you don't want to maintain a scraper, you can use the Trustpilot Scraper on Apify — a free actor that handles all the extraction for you.

from apify_client import ApifyClient

client = ApifyClient('YOUR_APIFY_TOKEN')

# Run the Trustpilot scraper
run = client.actor('cryptosignals/trustpilot-scraper').call(run_input={
    'domain': 'amazon.com',
    'maxReviews': 100,
    'sortBy': 'recency'
})

# Get results
for item in client.dataset(run['defaultDatasetId']).iterate_items():
    print(f"{item['rating']} | {item['author']}: {item['text'][:80]}")

This approach is fully managed — no Cloudflare to fight, no IP rotation to configure.

Approach 3: Trustpilot's (Limited) Public API

Trustpilot does have a public API, but it's heavily restricted:

  • Requires business account verification
  • Rate limits: 100 requests/hour
  • Can only access reviews for your own business
  • No competitor data access

For any real use case (competitor analysis, market research), you need to scrape.

Building a Reputation Monitor

Here's a complete reputation monitoring script that runs daily and alerts you to new negative reviews:

import requests
import json
import os
from datetime import datetime, timedelta
from bs4 import BeautifulSoup
from urllib.parse import quote

# Note: parse_reviews() from the ScraperAPI section above must also be
# defined (or imported) in this file.
SCRAPER_API_KEY = os.environ['SCRAPER_API_KEY']
SLACK_WEBHOOK = os.environ.get('SLACK_WEBHOOK_URL', '')

def scrape_recent_reviews(domain, days_back=1):
    """Get reviews from the last N days."""
    all_reviews = []
    cutoff = datetime.now() - timedelta(days=days_back)

    for page in range(1, 6):  # Check up to 5 pages
        url = f'https://www.trustpilot.com/review/{domain}?page={page}&sort=recency'
        api_url = f'http://api.scraperapi.com?api_key={SCRAPER_API_KEY}&url={quote(url)}&render=true'

        resp = requests.get(api_url, timeout=60)
        reviews = parse_reviews(resp.text)

        if not reviews:
            break

        for review in reviews:
            if review['date']:
                review_date = datetime.fromisoformat(review['date'].replace('Z', '+00:00'))
                if review_date.replace(tzinfo=None) < cutoff:
                    return all_reviews  # Older than cutoff, stop
                all_reviews.append(review)

    return all_reviews

def send_slack_alert(reviews, domain):
    """Send Slack alert for negative reviews."""
    negative = [r for r in reviews if r['rating'] and r['rating'] <= 2]

    if not negative:
        return

    message = f'⚠️ {len(negative)} new negative review(s) for {domain}:\n'
    for r in negative[:3]:  # Show first 3
        stars = '★' * r['rating'] + '☆' * (5 - r['rating'])
        message += f'\n{stars} {r["author"]}: {r["text"][:200]}'

    requests.post(SLACK_WEBHOOK, json={'text': message})

# Run daily (add to cron: 0 9 * * * python3 reputation_monitor.py)
if __name__ == '__main__':
    domains = ['yourcompany.com', 'competitor1.com', 'competitor2.com']

    for domain in domains:
        reviews = scrape_recent_reviews(domain, days_back=1)
        print(f'{domain}: {len(reviews)} new reviews')
        send_slack_alert(reviews, domain)

Performance Tips

  1. Cache aggressively — older reviews don't change. Only fetch recent pages.
  2. Use pagination wisely — sort by recency and stop when you hit reviews older than your cutoff.
  3. Respect rate limits — add time.sleep(2) between requests to avoid IP bans even with a proxy.
  4. Store in a database — SQLite works great for single-machine setups.
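Tip 4 can be sketched like this — a minimal SQLite schema with a `(domain, author, date)` uniqueness constraint as a crude dedup key, so re-running the monitor doesn't store duplicates. The schema is illustrative, not canonical:

```python
import sqlite3

def init_db(path=':memory:'):
    """Create the reviews table if it doesn't exist yet."""
    conn = sqlite3.connect(path)
    conn.execute('''
        CREATE TABLE IF NOT EXISTS reviews (
            domain TEXT,
            rating INTEGER,
            author TEXT,
            date TEXT,
            text TEXT,
            UNIQUE(domain, author, date)
        )
    ''')
    return conn

def save_reviews(conn, domain, reviews):
    """Insert reviews, silently skipping ones already stored."""
    for r in reviews:
        conn.execute(
            'INSERT OR IGNORE INTO reviews VALUES (?, ?, ?, ?, ?)',
            (domain, r['rating'], r['author'], r['date'], r['text'])
        )
    conn.commit()

conn = init_db()
batch = [{'rating': 1, 'author': 'Sam', 'date': '2026-01-05', 'text': 'Slow shipping'}]
save_reviews(conn, 'example.com', batch)
save_reviews(conn, 'example.com', batch)  # re-run with same data: ignored
count = conn.execute('SELECT COUNT(*) FROM reviews').fetchone()[0]
print(count)  # 1
```

For a production monitor you'd use a file path instead of `:memory:` so history survives between cron runs.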

What You Can Build

With Trustpilot data, you can build:

  • Brand monitoring SaaS — for marketing agencies
  • Competitor intelligence tool — for B2B SaaS companies
  • Review aggregator — for e-commerce brands
  • Sentiment dashboard — for customer success teams
  • Lead gen tool — for sales agencies

Conclusion

Trustpilot scraping in 2026 requires handling Cloudflare protection, but with the right tools it's entirely manageable. ScraperAPI is the easiest entry point for managed proxy rotation, while the Trustpilot Scraper on Apify gives you a no-code option.

Start with a simple reputation monitor, then build from there. The commercial use cases are real — businesses pay good money for competitive intelligence data.


Have questions or want to share what you built? Drop a comment below.
