agenthustler

How to Scrape G2 and Trustpilot Reviews in 2026 (With Python Examples)

If you work in B2B SaaS, you already know: review platforms like G2 and Trustpilot are goldmines of competitive intelligence. The problem? No official APIs for bulk review data. In this guide, I will show you how to programmatically collect review data from both platforms, compare their data structures, and build something useful with it.

Want to skip the DIY approach? G2 Reviews Scraper on Apify handles pagination, anti-bot measures, and outputs clean JSON — ready for your pipeline.


Why Scrape Review Data?

Review platforms contain structured sentiment data that is surprisingly hard to get any other way:

  • Competitor monitoring — Track how rivals' ratings change quarter over quarter
  • Sales intelligence — Know a prospect's pain points with their current vendor before the call
  • Market research — Identify feature gaps across an entire category
  • Content marketing — Build data-driven comparison pages that rank

A SaaS company selling project management tools, for example, could monitor G2 reviews for Asana, Monday.com, and Jira to spot recurring complaints ("too complex for small teams") and position their product against those weaknesses.
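To make that concrete, here is a minimal sketch of the idea: count a watchlist of complaint phrases across competitors' negative reviews. The product names come from the scenario above; the review texts and watchlist phrases are invented for illustration.

```python
from collections import Counter

# Sample negative-review texts per competitor (invented data)
negative_reviews = {
    "Asana": ["too complex for small teams", "pricing jumped overnight"],
    "Monday.com": ["too complex for small teams to onboard"],
    "Jira": ["steep learning curve", "too complex for small teams"],
}

# Complaint phrases worth tracking
watchlist = ["too complex", "pricing", "learning curve"]

hits = Counter()
for product, texts in negative_reviews.items():
    for text in texts:
        for phrase in watchlist:
            if phrase in text:
                hits[(product, phrase)] += 1

for (product, phrase), n in hits.most_common():
    print(f"{product}: '{phrase}' x{n}")
```

A recurring phrase showing up across multiple competitors (here, "too complex") is exactly the kind of positioning opportunity the paragraph above describes.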

G2 vs Trustpilot vs Capterra: Data Comparison

Not all review platforms are equal. Here is what you can actually extract from each:

| Feature | G2 | Trustpilot | Capterra |
| --- | --- | --- | --- |
| Star rating | 0.5-5.0 | 1-5 | 0.5-5.0 |
| Review text | Pros/Cons/Summary | Title + Body | Pros/Cons |
| Reviewer info | Company size, role, industry | Name, location | Job title, industry |
| Review date | Yes | Yes | Yes |
| Vendor response | Yes | Yes | No |
| Category ranking | Grid scores | No | Shortlist |
| Verified reviews | LinkedIn-verified | Invitation-verified | LinkedIn-verified |
| Volume per product | 100-10,000+ | 1,000-1,000,000+ | 50-5,000+ |
| Anti-scraping | Moderate (JS rendering) | Low-Moderate | Moderate |
| Best for | B2B software | B2C + B2B services | B2B software |

Key takeaway: G2 is the richest source for B2B software intelligence (reviewer company size and role are invaluable for segmentation). Trustpilot has higher volume and covers B2C. Capterra overlaps heavily with G2 but has less structured metadata.
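Because the two platforms expose different fields, it helps to normalize records into one common shape before analysis. The field names below are assumptions for illustration, not the platforms' official schemas — adjust them to match whatever your scraper actually outputs.

```python
# Illustrative record shapes (field names are assumptions, not official schemas)
g2_review = {
    "rating": 4.5,
    "pros": "Great reporting",
    "cons": "Steep learning curve",
    "reviewer": {"company_size": "51-200", "role": "Marketing Manager"},
    "date": "2026-01-15",
}
trustpilot_review = {
    "stars": 4,
    "title": "Solid service",
    "body": "Support responded within an hour.",
    "reviewer": {"name": "Jane D.", "location": "DE"},
    "date": "2026-01-20",
}

def normalize(review, source):
    """Map either platform's shape onto one common format."""
    if source == "g2":
        text = f"{review.get('pros', '')} {review.get('cons', '')}".strip()
        return {"rating": review["rating"], "text": text,
                "date": review["date"], "source": "g2"}
    if source == "trustpilot":
        text = f"{review.get('title', '')} {review.get('body', '')}".strip()
        return {"rating": float(review["stars"]), "text": text,
                "date": review["date"], "source": "trustpilot"}
    raise ValueError(f"unknown source: {source}")

print(normalize(g2_review, "g2")["rating"])  # 4.5
```

With a normalization layer like this, the analysis code later in this post can run unchanged over reviews from both platforms.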

Parsing Review Data with Python

Once you have collected reviews (via your own scraper or a tool like the Apify G2 scraper), here is how to analyze them:

import json
from collections import Counter
from datetime import datetime
from itertools import groupby

# Load review data (JSON array from scraper output)
with open("g2_reviews.json") as f:
    reviews = json.load(f)

# Basic stats
ratings = [r["rating"] for r in reviews]
avg_rating = sum(ratings) / len(ratings)
print(f"Total reviews: {len(reviews)}")
print(f"Average rating: {avg_rating:.2f}")

# Rating distribution
dist = Counter(ratings)
for stars in sorted(dist.keys(), reverse=True):
    bar = "#" * dist[stars]
    print(f"  {stars} stars: {bar} ({dist[stars]})")

# Sentiment trend: average rating per quarter
def quarter_key(review):
    dt = datetime.fromisoformat(review["date"])
    return f"{dt.year}-Q{(dt.month - 1) // 3 + 1}"

reviews_sorted = sorted(reviews, key=lambda r: r["date"])
print("Rating trend by quarter:")
for qtr, group in groupby(reviews_sorted, key=quarter_key):
    group_ratings = [r["rating"] for r in group]
    avg = sum(group_ratings) / len(group_ratings)
    print(f"  {qtr}: {avg:.2f} ({len(group_ratings)} reviews)")

# Find common complaints
negative = [r for r in reviews if r["rating"] <= 2]
all_cons = " ".join(r.get("cons", "") for r in negative).lower()
words = [w for w in all_cons.split() if len(w) > 4]
common_complaints = Counter(words).most_common(10)
print("Top words in negative reviews:")
for word, count in common_complaints:
    print(f"  {word} - {count} mentions")

This gives you a quick competitive dashboard. For a real pipeline, you would want to:

  1. Schedule scrapes weekly or monthly to track trends
  2. Use NLP (spaCy, OpenAI) for proper sentiment analysis instead of keyword counting
  3. Store in a database and build dashboards (Metabase, Grafana)
  4. Alert on changes — a competitor dropping 0.3 stars in a quarter is a signal
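The alerting step (point 4) can be sketched in a few lines. This assumes you already have per-quarter averages like the ones the script above prints; the 0.3-star threshold mirrors the signal mentioned in the list, and the sample numbers are invented.

```python
# Per-quarter average ratings, e.g. produced by the quarterly
# aggregation in the earlier script (sample data)
quarterly_avgs = [
    ("2025-Q3", 4.4),
    ("2025-Q4", 4.3),
    ("2026-Q1", 3.9),  # notable drop
]

def rating_alerts(quarterly_avgs, threshold=0.3):
    """Return (quarter, delta) pairs where the average fell by >= threshold."""
    alerts = []
    for (_, prev), (qtr, curr) in zip(quarterly_avgs, quarterly_avgs[1:]):
        delta = curr - prev
        if delta <= -threshold:
            alerts.append((qtr, round(delta, 2)))
    return alerts

print(rating_alerts(quarterly_avgs))  # [('2026-Q1', -0.4)]
```

In a real pipeline, the returned alerts would feed a Slack webhook or email notification rather than a print statement.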

Real-World Architecture

Here is how a competitor monitoring pipeline typically looks:

Scheduled scraper (Apify/cron)
    -> JSON output
    -> ETL into PostgreSQL/BigQuery
    -> NLP sentiment scoring
    -> Dashboard (Metabase/Looker)
    -> Slack alerts on significant changes

The scraping part is the hardest to maintain yourself. G2 uses JavaScript rendering and changes its DOM structure regularly. Trustpilot has rate limiting and occasional CAPTCHAs. This is why managed scrapers save significant engineering time — you get clean JSON output without maintaining browser automation code.
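The ETL step in the pipeline above is straightforward once the JSON exists. Here is a minimal sketch using SQLite as a stand-in for PostgreSQL/BigQuery; the table schema, column names, and sample records are illustrative assumptions.

```python
import sqlite3

# Load scraper output into a relational table (SQLite as a stand-in
# for PostgreSQL/BigQuery; schema is illustrative)
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS reviews (
        product TEXT,
        rating REAL,
        review_date TEXT,
        body TEXT,
        UNIQUE (product, review_date, body)  -- crude dedup across repeat runs
    )
""")

scraped = [  # stand-in for the scraper's JSON output
    {"product": "Asana", "rating": 4.0, "date": "2026-01-10", "text": "Good"},
    {"product": "Asana", "rating": 2.0, "date": "2026-01-12", "text": "Too complex"},
]

# INSERT OR IGNORE skips rows that violate the UNIQUE constraint,
# so re-running a weekly scrape does not duplicate reviews
conn.executemany(
    "INSERT OR IGNORE INTO reviews VALUES (?, ?, ?, ?)",
    [(r["product"], r["rating"], r["date"], r["text"]) for r in scraped],
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM reviews").fetchone()[0]
print(f"Loaded {count} reviews")  # Loaded 2 reviews
```

The UNIQUE constraint plus `INSERT OR IGNORE` is the simplest dedup strategy for scheduled scrapes; a production pipeline would typically key on a stable review ID instead.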

Legal Considerations

A quick note on compliance: scraping publicly available review data has generally been upheld in US courts (see hiQ Labs v. LinkedIn, 2022), but that is not blanket permission — you should:

  • Respect robots.txt directives
  • Do not overload servers — use reasonable delays between requests
  • Do not scrape private or gated content
  • Check each platform's ToS for your specific use case
  • Store and process data in compliance with GDPR if handling EU reviewer data
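The first two points above are easy to handle in code. Python's standard library includes a robots.txt parser; the sketch below checks a rule set before fetching and reads the crawl delay to pace requests. The rules and URLs here are made-up examples — in practice you would fetch and parse the target site's real /robots.txt.

```python
from urllib import robotparser

# Parse an example robots.txt (made-up rules, not any real site's)
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /admin/",
    "Crawl-delay: 2",
])

url = "https://example.com/products/some-tool/reviews"
if rp.can_fetch("my-research-bot", url):
    # Honor the declared crawl delay, falling back to a polite default
    delay = rp.crawl_delay("my-research-bot") or 2
    print(f"Allowed; wait {delay}s between requests")
    # time.sleep(delay) would go here inside a real fetch loop
else:
    print("Disallowed by robots.txt; skip this URL")
```

Pacing requests this way covers both the robots.txt and the don't-overload-servers points with a few lines of code.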

Getting Started

The fastest path from zero to structured review data:

  1. Pick your targets — Which competitors or products do you want to monitor?
  2. Start with G2 if you are in B2B software — the metadata (company size, role, industry) makes segmentation possible
  3. Add Trustpilot for higher volume and B2C coverage
  4. Automate collection on a weekly schedule
  5. Build your analysis layer — even the simple Python script above reveals patterns

Ready to try it? Both scrapers are available with a free tier on Apify. No credit card required to start, and the output is clean JSON, ready for your Python pipeline or database.
