How to Scrape G2, Capterra, and Trustpilot Reviews for Competitive Analysis
Software buyers check G2 and Capterra before making purchase decisions. If your competitor gets a wave of negative reviews — you want to know immediately. If they're winning on specific features — you need to know what users are saying.
Here's how to extract B2B review data programmatically.
What You Can Extract From Review Platforms
| Platform | Stars | Review Text | Pros/Cons | Reviewer Role | Company Size |
|---|---|---|---|---|---|
| G2 | ✅ | ✅ | ✅ | ✅ | ✅ |
| Capterra | ✅ | ✅ | ✅ | ✅ | ✅ |
| Trustpilot | ✅ | ✅ | ❌ | ❌ | ❌ |
Why This Matters for B2B Companies
Before a sales call: Know exactly what prospects complain about in your competitor's reviews. Address those pain points in your pitch.
Product roadmap: Find the most commonly requested features in your category. Real user language, ranked by frequency.
Competitive monitoring: Alert when competitor review volume spikes (could mean a PR issue or big launch).
Market sizing: Count the number of verified reviewers to estimate user base without public metrics.
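The monitoring idea above can be sketched as a simple volume-spike check. This is a minimal illustration, assuming you already have review dates from one of the scrapers below; the thresholds are arbitrary starting points:

```python
from datetime import date, timedelta

def review_volume_spike(review_dates, window_days=7, baseline_weeks=4, factor=2.0):
    """Return True if review volume in the last `window_days` exceeds
    `factor` times the average weekly volume over the preceding
    `baseline_weeks` weeks. `review_dates` is a list of datetime.date."""
    today = max(review_dates)
    recent_cutoff = today - timedelta(days=window_days)
    baseline_cutoff = recent_cutoff - timedelta(weeks=baseline_weeks)
    recent = sum(1 for d in review_dates if d > recent_cutoff)
    baseline = sum(1 for d in review_dates if baseline_cutoff < d <= recent_cutoff)
    avg_weekly = baseline / baseline_weeks if baseline else 0
    return avg_weekly > 0 and recent >= factor * avg_weekly

# Example: roughly 1 review/week for a month, then 5 in the last week
dates = [date(2024, 5, 1) + timedelta(weeks=w) for w in range(4)]
dates += [date(2024, 6, 2) + timedelta(days=i) for i in range(5)]
print(review_volume_spike(dates))  # → True
```

Run this daily against freshly scraped data and fire a notification when it returns True.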
Method 1: Scraping G2 Reviews (Python + requests)
G2 loads reviews via a GraphQL API that's accessible without authentication for public data:
```python
import requests

def get_g2_reviews(product_slug, page=1, per_page=20):
    """
    Fetch G2 reviews for a product.
    product_slug: e.g., "hubspot-crm", "salesforce", "monday-com"
    """
    url = "https://www.g2.com/graphql"
    query = {
        "query": """
            query ProductReviews($slug: String!, $page: Int!, $perPage: Int!) {
              product(slug: $slug) {
                name
                reviewsList(page: $page, perPage: $perPage) {
                  reviews {
                    id
                    title
                    body
                    pros
                    cons
                    starRating
                    createdAt
                    reviewer {
                      title
                      companySize
                    }
                  }
                  totalCount
                }
              }
            }
        """,
        "variables": {
            "slug": product_slug,
            "page": page,
            "perPage": per_page,
        },
    }
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Referer": f"https://www.g2.com/products/{product_slug}/reviews",
    }
    r = requests.post(url, json=query, headers=headers, timeout=20)
    if r.status_code != 200:
        print(f"Error: {r.status_code}")
        return []
    data = r.json()
    product_data = data.get("data", {}).get("product") or {}
    reviews_data = (product_data.get("reviewsList") or {}).get("reviews", [])
    return [{
        "title": rev.get("title"),
        "body": rev.get("body"),
        "pros": rev.get("pros"),
        "cons": rev.get("cons"),
        "rating": rev.get("starRating"),
        "date": rev.get("createdAt"),
        "reviewer_role": (rev.get("reviewer") or {}).get("title"),
        "company_size": (rev.get("reviewer") or {}).get("companySize"),
    } for rev in reviews_data]

# Example: scrape HubSpot CRM reviews
reviews = get_g2_reviews("hubspot-crm", page=1, per_page=20)
for rev in reviews[:3]:
    print(f"⭐{rev['rating']} | {rev['reviewer_role']}")
    print(f"PROS: {(rev['pros'] or '')[:100]}")
    print(f"CONS: {(rev['cons'] or '')[:100]}")
    print()
```
Note: G2's GraphQL schema changes periodically. Test the query against their actual schema if this stops working.
Method 2: Capterra Reviews
Capterra uses a different structure but is equally accessible:
```python
import time

import requests
from bs4 import BeautifulSoup

def scrape_capterra_reviews(product_url, max_pages=5):
    """
    Scrape Capterra reviews for a software product.
    product_url: e.g., "https://www.capterra.com/p/53360/HubSpot-CRM/"
    """
    all_reviews = []
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/124.0.0.0",
        "Accept": "text/html,application/xhtml+xml",
        "Accept-Language": "en-US,en;q=0.9",
    }
    for page in range(1, max_pages + 1):
        url = f"{product_url}?page={page}" if page > 1 else product_url
        r = requests.get(url, headers=headers, timeout=15)
        if r.status_code != 200:
            break
        soup = BeautifulSoup(r.text, "html.parser")
        # Find review cards
        review_cards = soup.find_all("div", {"data-testid": "review-card"})
        if not review_cards:
            # Try an alternative selector in case the markup changed
            review_cards = soup.find_all("div", class_=lambda c: c and "review" in c.lower())
        if not review_cards:
            break
        for card in review_cards:
            rating_el = card.find("span", class_=lambda c: c and "rating" in str(c).lower())
            title_el = card.find("h3") or card.find("h2")
            pros_el = card.find("div", {"data-testid": "pros"})
            cons_el = card.find("div", {"data-testid": "cons"})
            all_reviews.append({
                "title": title_el.get_text(strip=True) if title_el else "",
                "rating": rating_el.get_text(strip=True) if rating_el else "",
                "pros": pros_el.get_text(strip=True) if pros_el else "",
                "cons": cons_el.get_text(strip=True) if cons_el else "",
                "page": page,
            })
        time.sleep(2)  # Be respectful: pause between pages
    return all_reviews

reviews = scrape_capterra_reviews("https://www.capterra.com/p/53360/HubSpot-CRM/")
print(f"Scraped {len(reviews)} reviews")
```
Method 3: Apify Actor (All Platforms in One Run)
The B2B Review Intelligence Actor aggregates G2, Capterra, and Trustpilot in one call:
```python
import time

import requests

APIFY_TOKEN = "YOUR_APIFY_TOKEN"
HEADERS = {"Authorization": f"Bearer {APIFY_TOKEN}"}

# Start the actor run
run = requests.post(
    "https://api.apify.com/v2/acts/lanky_quantifier~b2b-review-intelligence/runs",
    headers=HEADERS,
    json={
        "products": [
            {"platform": "g2", "slug": "hubspot-crm"},
            {"platform": "capterra", "url": "https://www.capterra.com/p/53360/HubSpot-CRM/"},
            {"platform": "trustpilot", "domain": "hubspot.com"},
        ],
        "maxReviewsPerPlatform": 100,
        "includeSentimentAnalysis": True,
    },
).json()["data"]

# Poll until the run finishes
while True:
    status = requests.get(
        f"https://api.apify.com/v2/actor-runs/{run['id']}",
        headers=HEADERS,
    ).json()["data"]["status"]
    if status in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(5)

# Fetch the dataset the run produced
results = requests.get(
    f"https://api.apify.com/v2/actor-runs/{run['id']}/dataset/items",
    headers=HEADERS,
).json()

for review in results[:3]:
    print(f"[{review['platform']}] ⭐{review['rating']} - {review['title'][:60]}")
    print(f"  PROS: {review['pros'][:100]}")
    print(f"  CONS: {review['cons'][:100]}")
```
Analyzing Review Data with Python
Once you have the data, extract competitive insights:
```python
from collections import Counter

def extract_feature_mentions(reviews, competitor_features):
    """Find which features competitors get praised/criticized for."""
    praise = Counter()
    complaints = Counter()
    for review in reviews:
        pros_text = (review.get("pros", "") or "").lower()
        cons_text = (review.get("cons", "") or "").lower()
        for feature in competitor_features:
            if feature.lower() in pros_text:
                praise[feature] += 1
            if feature.lower() in cons_text:
                complaints[feature] += 1
    return {
        "most_praised": praise.most_common(5),
        "most_complained": complaints.most_common(5),
    }

# Example: What do users love/hate about HubSpot CRM?
features = ["reporting", "automation", "integrations", "pricing", "support",
            "mobile app", "email", "pipeline", "ui", "onboarding"]
insights = extract_feature_mentions(reviews, features)
print("Most praised:", insights["most_praised"])
print("Most complained about:", insights["most_complained"])
```
Rate Limits and Best Practices
| Platform | Rate Limit Strategy |
|---|---|
| G2 | Max 1 req/sec, rotate user agents |
| Capterra | 2-3 second delays between pages |
| Trustpilot | Uses Cloudflare — use curl_cffi or residential proxies |
For monitoring (running daily), schedule via Apify or cron. For one-time research, run directly.
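For Trustpilot specifically, the `curl_cffi` library can impersonate a real browser's TLS fingerprint, which is often enough to pass Cloudflare's passive checks. A minimal sketch of the fetch step (parsing is left out, and the page-URL pattern should be verified against the live site):

```python
def trustpilot_review_url(domain: str, page: int = 1) -> str:
    """Build the public review-listing URL for a company domain."""
    base = f"https://www.trustpilot.com/review/{domain}"
    return f"{base}?page={page}" if page > 1 else base

def fetch_trustpilot_page(domain: str, page: int = 1) -> str:
    # curl_cffi (pip install curl_cffi) exposes a requests-compatible API;
    # imported lazily so the URL helper works without the dependency installed
    from curl_cffi import requests

    # impersonate="chrome" sends Chrome's TLS/HTTP2 fingerprint
    # instead of the default python-requests one that Cloudflare flags
    r = requests.get(trustpilot_review_url(domain, page),
                     impersonate="chrome", timeout=20)
    r.raise_for_status()
    return r.text

# Usage (makes a network request):
# html = fetch_trustpilot_page("hubspot.com")
```

If Cloudflare still blocks you, combine this with residential proxies via the `proxies=` argument.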
Key Use Cases
Pre-sales intelligence: Pull competitor reviews before a sales call, find the top 3 complaints to address in your pitch
Product roadmap: Export all feature requests from reviews, cluster by topic, prioritize what users actually want
Review response automation: Alert when new 1-star reviews appear (via webhook), route to support team within minutes
Market positioning: If competitor reviews consistently mention "too expensive" or "too complex" — that's your positioning opportunity
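The review-response idea boils down to a filter over freshly scraped data. A sketch, assuming review dicts carry a numeric `rating` and an ISO-8601 `date` like the scrapers above produce (the webhook call itself is left as a placeholder):

```python
from datetime import datetime, timezone

def new_critical_reviews(reviews, since, max_rating=1):
    """Return reviews at or below `max_rating` posted after `since`.
    Assumes `date` is an ISO-8601 string and `rating` is numeric."""
    hits = []
    for rev in reviews:
        posted = datetime.fromisoformat(rev["date"].replace("Z", "+00:00"))
        if rev["rating"] <= max_rating and posted > since:
            hits.append(rev)
    return hits

# Example: compare against the timestamp of the previous run
reviews = [
    {"title": "Unusable since update", "rating": 1, "date": "2024-06-03T10:00:00Z"},
    {"title": "Great tool", "rating": 5, "date": "2024-06-04T09:00:00Z"},
]
last_check = datetime(2024, 6, 1, tzinfo=timezone.utc)
for rev in new_critical_reviews(reviews, last_check):
    print(f"ALERT: {rev['title']}")  # post to Slack/webhook, route to support
```

Persist the `last_check` timestamp between runs so each alert fires exactly once.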
Review data is some of the highest-signal market research available. It's what customers say when they're not trying to be polite.
Save hours on scraping setup: The $29 Apify Scrapers Bundle includes 35+ production-ready actors — Google SERP, LinkedIn, Amazon, TikTok, contact info, and more. Pre-configured inputs, working on day one.