If you work in B2B SaaS, you already know: review platforms like G2 and Trustpilot are goldmines of competitive intelligence. The problem? Neither makes bulk review data easy to get through an official API. In this guide, I will show you how to programmatically collect review data from both platforms, compare their data structures, and build something useful with it.
Want to skip the DIY approach? G2 Reviews Scraper on Apify handles pagination, anti-bot measures, and outputs clean JSON — ready for your pipeline.
Why Scrape Review Data?
Review platforms contain structured sentiment data that is surprisingly hard to get any other way:
- Competitor monitoring: Track how rivals' ratings change quarter over quarter
- Sales intelligence: Know a prospect's pain points with their current vendor before the call
- Market research: Identify feature gaps across an entire category
- Content marketing: Build data-driven comparison pages that rank
A SaaS company selling project management tools, for example, could monitor G2 reviews for Asana, Monday.com, and Jira to spot recurring complaints ("too complex for small teams") and position its product against those weaknesses.
G2 vs Trustpilot vs Capterra: Data Comparison
Not all review platforms are equal. Here is what you can actually extract from each:
| Feature | G2 | Trustpilot | Capterra |
|---|---|---|---|
| Star rating | 0.5-5.0 | 1-5 | 0.5-5.0 |
| Review text | Pros/Cons/Summary | Title + Body | Pros/Cons |
| Reviewer info | Company size, role, industry | Name, location | Job title, industry |
| Review date | Yes | Yes | Yes |
| Vendor response | Yes | Yes | No |
| Category ranking | Grid scores | No | Shortlist |
| Verified reviews | LinkedIn-verified | Invitation-verified | LinkedIn-verified |
| Volume per product | 100-10,000+ | 1,000-1,000,000+ | 50-5,000+ |
| Anti-scraping | Moderate (JS rendering) | Low-Moderate | Moderate |
| Best for | B2B software | B2C + B2B services | B2B software |
Key takeaway: G2 is the richest source for B2B software intelligence (reviewer company size and role are invaluable for segmentation). Trustpilot has higher volume and covers B2C. Capterra overlaps heavily with G2 but has less structured metadata.
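Because the platforms expose different fields, it helps to map each one into a shared record before analysis. The sketch below is one possible way to do that. The `Review` schema and the raw key names (`summary`, `pros`, `cons`, `role`, `company_size`, `title`, `body`) are assumptions for illustration, so check them against what your scraper actually outputs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Review:
    """Platform-neutral review record (a hypothetical schema, not an official one)."""
    platform: str
    rating: float
    date: str
    text: str
    reviewer_role: Optional[str] = None
    company_size: Optional[str] = None


def from_g2(raw: dict) -> Review:
    # Assumed G2 keys: rating, date, summary, pros, cons, role, company_size
    parts = [raw.get(k) for k in ("summary", "pros", "cons")]
    return Review(
        platform="g2",
        rating=float(raw["rating"]),
        date=raw["date"],
        text=" ".join(p for p in parts if p),  # skip missing or empty sections
        reviewer_role=raw.get("role"),
        company_size=raw.get("company_size"),
    )


def from_trustpilot(raw: dict) -> Review:
    # Assumed Trustpilot keys: rating, date, title, body
    parts = [raw.get("title"), raw.get("body")]
    return Review(
        platform="trustpilot",
        rating=float(raw["rating"]),
        date=raw["date"],
        text=" ".join(p for p in parts if p),
    )
```

With both sources in one shape, the same analysis code can run over G2 and Trustpilot data, and the G2-only metadata simply stays `None` for Trustpilot records.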
Parsing Review Data with Python
Once you have collected reviews (via your own scraper or a tool like the Apify G2 scraper), here is how to analyze them:
import json
from collections import Counter
from datetime import datetime
from itertools import groupby

# Load review data (JSON array from scraper output)
with open("g2_reviews.json") as f:
    reviews = json.load(f)

# Basic stats
ratings = [r["rating"] for r in reviews]
avg_rating = sum(ratings) / len(ratings)
print(f"Total reviews: {len(reviews)}")
print(f"Average rating: {avg_rating:.2f}")

# Rating distribution
dist = Counter(ratings)
for stars in sorted(dist.keys(), reverse=True):
    bar = "#" * dist[stars]
    print(f" {stars} stars: {bar} ({dist[stars]})")

# Sentiment trend: average rating per quarter
def quarter_key(review):
    dt = datetime.fromisoformat(review["date"])
    return f"{dt.year}-Q{(dt.month - 1) // 3 + 1}"

reviews_sorted = sorted(reviews, key=lambda r: r["date"])
print("Rating trend by quarter:")
for qtr, group in groupby(reviews_sorted, key=quarter_key):
    group_ratings = [r["rating"] for r in group]
    avg = sum(group_ratings) / len(group_ratings)
    print(f" {qtr}: {avg:.2f} ({len(group_ratings)} reviews)")

# Find common complaints
negative = [r for r in reviews if r["rating"] <= 2]
all_cons = " ".join(r.get("cons", "") for r in negative).lower()
words = [w for w in all_cons.split() if len(w) > 4]
common_complaints = Counter(words).most_common(10)
print("Top words in negative reviews:")
for word, count in common_complaints:
    print(f" {word} - {count} mentions")
This gives you a quick competitive dashboard. For a real pipeline, you would want to:
- Schedule scrapes weekly or monthly to track trends
- Use NLP (spaCy, OpenAI) for proper sentiment analysis instead of keyword counting
- Store in a database and build dashboards (Metabase, Grafana)
- Alert on changes — a competitor dropping 0.3 stars in a quarter is a signal
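The alerting idea in the last bullet can be sketched in a few lines. This is a minimal example, assuming each review has the same `rating` and ISO-formatted `date` keys as the script above. The function names and the 0.3 default threshold are illustrative choices, and quarters with no reviews are skipped rather than treated as gaps.

```python
from collections import defaultdict
from datetime import datetime


def quarterly_averages(reviews):
    """Return {(year, quarter): average rating}, ordered chronologically."""
    buckets = defaultdict(list)
    for r in reviews:
        dt = datetime.fromisoformat(r["date"])
        buckets[(dt.year, (dt.month - 1) // 3 + 1)].append(r["rating"])
    return {q: sum(v) / len(v) for q, v in sorted(buckets.items())}


def rating_drops(reviews, threshold=0.3):
    """List (quarter, previous_avg, current_avg) where the average fell by at least threshold."""
    averages = list(quarterly_averages(reviews).items())
    drops = []
    # Compare each quarter with the previous quarter that has reviews
    for (_, prev_avg), (qtr, cur_avg) in zip(averages, averages[1:]):
        # Round the difference so that float noise does not hide an exact 0.3 drop
        if round(prev_avg - cur_avg, 2) >= threshold:
            drops.append((qtr, prev_avg, cur_avg))
    return drops
```

Each tuple returned by `rating_drops` could then be formatted into a Slack or email message by whatever notifier your pipeline already uses.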
Real-World Architecture
Here is how a competitor monitoring pipeline typically looks:
Scheduled scraper (Apify/cron)
-> JSON output
-> ETL into PostgreSQL/BigQuery
-> NLP sentiment scoring
-> Dashboard (Metabase/Looker)
-> Slack alerts on significant changes
The scraping part is the hardest to maintain yourself. G2 relies on JavaScript rendering and changes its DOM structure regularly. Trustpilot has rate limiting and occasional CAPTCHAs. This is why managed scrapers save significant engineering time: you get clean JSON output without maintaining browser automation code.
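For the ETL step in the pipeline, the main design concern is idempotency: a scheduled scrape will see the same reviews again, so loading should update existing rows rather than duplicate them. Here is a rough sketch using Python's built-in `sqlite3` as a stand-in for PostgreSQL or BigQuery. The table layout and the `id` key on each review are assumptions, not something the scraper output is guaranteed to contain.

```python
import sqlite3


def load_reviews(conn, product, reviews):
    """Upsert scraped reviews so re-running a scheduled scrape creates no duplicates."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS reviews (
               review_id TEXT PRIMARY KEY,
               product   TEXT NOT NULL,
               rating    REAL NOT NULL,
               date      TEXT NOT NULL,
               cons      TEXT
           )"""
    )
    rows = [
        # "id" is an assumed unique identifier in the scraper output
        (r["id"], product, r["rating"], r["date"], r.get("cons", ""))
        for r in reviews
    ]
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT OR REPLACE INTO reviews VALUES (?, ?, ?, ?, ?)", rows
        )
    return len(rows)
```

On PostgreSQL the equivalent statement would be `INSERT ... ON CONFLICT (review_id) DO UPDATE`, but the primary-key approach stays the same.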
Legal Considerations
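A quick note on compliance: scraping publicly available review data is generally treated as lawful in the US (in hiQ Labs v. LinkedIn, the Ninth Circuit held in 2022 that scraping public pages likely does not violate the Computer Fraud and Abuse Act), but terms-of-service and privacy obligations still apply, so you should: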
- Respect robots.txt directives
- Avoid overloading servers: use reasonable delays between requests
- Avoid scraping private or gated content
- Check each platform's ToS for your specific use case
- Store and process data in compliance with GDPR if handling EU reviewer data
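The first two points can be enforced in code if you run your own scraper. The sketch below uses the standard library's `urllib.robotparser` to filter URLs and pause between them. The `my-review-bot` user agent and the two-second default delay are placeholder assumptions, and the robots.txt content is passed in as a string so that fetching it is left to you.

```python
import time
from urllib.robotparser import RobotFileParser


def build_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_urls(urls, parser, user_agent="my-review-bot", default_delay=2.0):
    """Yield only the URLs that robots.txt allows, pausing between them."""
    # Prefer the site's own Crawl-delay directive when it specifies one
    delay = parser.crawl_delay(user_agent) or default_delay
    first = True
    for url in urls:
        if not parser.can_fetch(user_agent, url):
            continue  # disallowed by robots.txt
        if not first:
            time.sleep(delay)
        first = False
        yield url
```

Looping over `polite_urls(...)` in place of a raw URL list keeps the politeness rules in one spot instead of scattering `sleep` calls through the scraper.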
Getting Started
The fastest path from zero to structured review data:
- Pick your targets — Which competitors or products do you want to monitor?
- Start with G2 if you are in B2B software — the metadata (company size, role, industry) makes segmentation possible
- Add Trustpilot for higher volume and B2C coverage
- Automate collection on a weekly schedule
- Build your analysis layer — even the simple Python script above reveals patterns
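As an example of the segmentation the G2 metadata makes possible, here is a small sketch that averages ratings per reviewer segment. It assumes each review dict carries a `company_size` key alongside `rating`. That key name is a guess at the scraper output, so adjust it to match your data.

```python
from collections import defaultdict


def average_by_segment(reviews, field="company_size"):
    """Group reviews by a metadata field and return {segment: (avg_rating, count)}."""
    buckets = defaultdict(list)
    for r in reviews:
        # Reviews without the field are grouped under "unknown"
        buckets[r.get(field) or "unknown"].append(r["rating"])
    return {
        segment: (round(sum(vals) / len(vals), 2), len(vals))
        for segment, vals in buckets.items()
    }
```

A competitor that scores well with enterprise reviewers but poorly with small businesses would show up here as a gap between segments, which is the kind of pattern an overall average hides.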
Ready to try it? Both scrapers are available with a free tier on Apify:
- G2 Reviews Scraper — structured B2B review data with company metadata
- Trustpilot Reviews Scraper — high-volume review collection with sentiment data
No credit card required to start. Output is clean JSON, ready for your Python pipeline or database.