Vhub Systems

How to Scrape G2, Capterra, and Trustpilot Reviews for Competitive Analysis

Software buyers check G2 and Capterra before making purchase decisions. If your competitor gets a wave of negative reviews — you want to know immediately. If they're winning on specific features — you need to know what users are saying.

Here's how to extract B2B review data programmatically.

What You Can Extract From Review Platforms

| Platform | Stars | Review Text | Pros/Cons | Reviewer Role | Company Size |
|---|---|---|---|---|---|
| G2 | ✓ | ✓ | ✓ | ✓ | ✓ |
| Capterra | ✓ | ✓ | ✓ | ✓ | ✓ |
| Trustpilot | ✓ | ✓ | ✗ | ✗ | ✗ |

Why This Matters for B2B Companies

Before a sales call: Know exactly what prospects complain about in your competitor's reviews. Address those pain points in your pitch.

Product roadmap: Find the most commonly requested features in your category. Real user language, ranked by frequency.

Competitive monitoring: Alert when competitor review volume spikes (could mean a PR issue or big launch).

Market sizing: Count the number of verified reviewers to estimate user base without public metrics.
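The monitoring idea above can be sketched with a simple z-score over weekly review counts. This is an illustrative helper (the `detect_review_spike` name and the 2-sigma threshold are my choices, not part of any platform's API):

```python
from statistics import mean, stdev

def detect_review_spike(weekly_counts, threshold_sigma=2.0):
    """Flag the latest week if review volume exceeds the historical
    mean by more than `threshold_sigma` standard deviations."""
    history, latest = weekly_counts[:-1], weekly_counts[-1]
    mu, sigma = mean(history), stdev(history)
    return latest > mu + threshold_sigma * sigma

# A competitor normally gets ~5 reviews/week, then suddenly 24:
print(detect_review_spike([4, 6, 5, 7, 5, 24]))  # True
```

Feed it the weekly counts you collect with any of the methods below and wire the `True` case to a Slack webhook or email.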

Method 1: Scraping G2 Reviews (Python + requests)

G2 loads reviews through an internal GraphQL API. For public data it can sometimes be queried without authentication, though G2 sits behind aggressive bot protection, so expect intermittent blocks:

```python
import requests

def get_g2_reviews(product_slug, page=1, per_page=20):
    """
    Fetch G2 reviews for a product.
    product_slug: e.g., "hubspot-crm", "salesforce", "monday-com"
    """
    url = "https://www.g2.com/graphql"

    query = {
        "query": """
        query ProductReviews($slug: String!, $page: Int!, $perPage: Int!) {
          product(slug: $slug) {
            name
            reviewsList(page: $page, perPage: $perPage) {
              reviews {
                id
                title
                body
                pros
                cons
                starRating
                createdAt
                reviewer {
                  title
                  companySize
                }
              }
              totalCount
            }
          }
        }
        """,
        "variables": {
            "slug": product_slug,
            "page": page,
            "perPage": per_page
        }
    }

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Referer": f"https://www.g2.com/products/{product_slug}/reviews",
    }

    r = requests.post(url, json=query, headers=headers, timeout=20)

    if r.status_code != 200:
        print(f"Error: {r.status_code}")
        return []

    data = r.json()
    # Guard against null nodes: an unknown slug returns "product": null
    product_data = (data.get("data") or {}).get("product") or {}
    reviews_data = (product_data.get("reviewsList") or {}).get("reviews") or []

    return [{
        "title": rev.get("title"),
        "body": rev.get("body"),
        "pros": rev.get("pros"),
        "cons": rev.get("cons"),
        "rating": rev.get("starRating"),
        "date": rev.get("createdAt"),
        "reviewer_role": (rev.get("reviewer") or {}).get("title"),
        "company_size": (rev.get("reviewer") or {}).get("companySize"),
    } for rev in reviews_data]

# Example: Scrape HubSpot CRM reviews
reviews = get_g2_reviews("hubspot-crm", page=1, per_page=20)
for rev in reviews[:3]:
    print(f"{rev['rating']} | {rev['reviewer_role']}")
    print(f"PROS: {(rev['pros'] or '')[:100]}")  # pros/cons can be null
    print(f"CONS: {(rev['cons'] or '')[:100]}")
    print()
```

Note: G2's GraphQL schema changes periodically. Test the query against their actual schema if this stops working.
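Because the endpoint can be flaky behind bot protection, a retry wrapper with exponential backoff helps. A minimal sketch (the `post_with_retry` helper and its backoff schedule are my own, not a G2 convention):

```python
import time
import requests

def post_with_retry(url, payload, headers, retries=3, backoff=5):
    """POST with exponential backoff on rate limits and server errors."""
    for attempt in range(retries):
        r = requests.post(url, json=payload, headers=headers, timeout=20)
        if r.status_code == 200:
            return r
        if r.status_code in (429, 500, 502, 503):
            time.sleep(backoff * (2 ** attempt))  # 5s, 10s, 20s...
            continue
        break  # other 4xx errors: retrying won't help
    return None
```

Swap it in for the bare `requests.post` call in `get_g2_reviews` if you start seeing 429s.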

Method 2: Capterra Reviews

Capterra uses a different structure but is equally accessible:

```python
import requests
from bs4 import BeautifulSoup
import time

def scrape_capterra_reviews(product_url, max_pages=5):
    """
    Scrape Capterra reviews for a software product.
    product_url: e.g., "https://www.capterra.com/p/53360/HubSpot-CRM/"
    """
    all_reviews = []

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/124.0.0.0",
        "Accept": "text/html,application/xhtml+xml",
        "Accept-Language": "en-US,en;q=0.9",
    }

    for page in range(1, max_pages + 1):
        url = f"{product_url}?page={page}" if page > 1 else product_url
        r = requests.get(url, headers=headers, timeout=15)

        if r.status_code != 200:
            break

        soup = BeautifulSoup(r.text, "html.parser")

        # Find review cards
        review_cards = soup.find_all("div", {"data-testid": "review-card"})
        if not review_cards:
            # Fall back to a looser selector if the markup has changed
            review_cards = soup.find_all("div", class_=lambda c: c and "review" in c.lower())

        if not review_cards:
            break

        for card in review_cards:
            rating_el = card.find("span", class_=lambda c: c and "rating" in str(c).lower())
            title_el = card.find("h3") or card.find("h2")
            pros_el = card.find("div", {"data-testid": "pros"})
            cons_el = card.find("div", {"data-testid": "cons"})

            all_reviews.append({
                "title": title_el.get_text(strip=True) if title_el else "",
                "rating": rating_el.get_text(strip=True) if rating_el else "",
                "pros": pros_el.get_text(strip=True) if pros_el else "",
                "cons": cons_el.get_text(strip=True) if cons_el else "",
                "page": page,
            })

        time.sleep(2)  # Be respectful

    return all_reviews

reviews = scrape_capterra_reviews("https://www.capterra.com/p/53360/HubSpot-CRM/")
print(f"Scraped {len(reviews)} reviews")
```
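To get the scraped dicts into a spreadsheet, a small `csv` export helper is enough (the `save_reviews_csv` name and default path are arbitrary):

```python
import csv

def save_reviews_csv(reviews, path="capterra_reviews.csv"):
    """Write a list of review dicts to CSV, one column per dict key."""
    if not reviews:
        return
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=reviews[0].keys())
        writer.writeheader()
        writer.writerows(reviews)
```

It works unchanged on the G2 output too, since both methods return flat dicts.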

Method 3: Apify Actor (All Platforms in One Run)

The B2B Review Intelligence Actor aggregates G2, Capterra, and Trustpilot in one call:

```python
import requests, time

TOKEN = "YOUR_APIFY_TOKEN"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

# Start the actor run
run = requests.post(
    "https://api.apify.com/v2/acts/lanky_quantifier~b2b-review-intelligence/runs",
    headers=HEADERS,
    json={
        "products": [
            {"platform": "g2", "slug": "hubspot-crm"},
            {"platform": "capterra", "url": "https://www.capterra.com/p/53360/HubSpot-CRM/"},
            {"platform": "trustpilot", "domain": "hubspot.com"},
        ],
        "maxReviewsPerPlatform": 100,
        "includeSentimentAnalysis": True
    },
    timeout=30,
).json()["data"]

# Poll until the run reaches a terminal state
while True:
    status = requests.get(
        f"https://api.apify.com/v2/actor-runs/{run['id']}",
        headers=HEADERS, timeout=30,
    ).json()["data"]["status"]
    if status in ("SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"):
        break
    time.sleep(5)

# Fetch the combined dataset
results = requests.get(
    f"https://api.apify.com/v2/actor-runs/{run['id']}/dataset/items",
    headers=HEADERS, timeout=30,
).json()

for review in results[:3]:
    print(f"[{review['platform']}] ⭐{review['rating']} - {(review.get('title') or '')[:60]}")
    print(f"  PROS: {(review.get('pros') or '')[:100]}")
    print(f"  CONS: {(review.get('cons') or '')[:100]}")
```
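Assuming each dataset item carries the `platform` and numeric `rating` fields used in the loop above, a quick per-platform summary looks like this (the `summarize_by_platform` helper is a sketch, not part of the actor):

```python
from collections import defaultdict

def summarize_by_platform(results):
    """Average star rating per platform across the combined dataset."""
    buckets = defaultdict(list)
    for review in results:
        rating = review.get("rating")
        if isinstance(rating, (int, float)):  # skip malformed items
            buckets[review.get("platform", "unknown")].append(rating)
    return {p: round(sum(r) / len(r), 2) for p, r in buckets.items()}
```

A large gap between a competitor's G2 and Trustpilot averages is itself a signal: B2B buyers and end users are having different experiences.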

Analyzing Review Data with Python

Once you have the data, extract competitive insights:

```python
from collections import Counter

def extract_feature_mentions(reviews, competitor_features):
    """Find which features competitors get praised/criticized for."""
    praise = Counter()
    complaints = Counter()

    for review in reviews:
        pros_text = (review.get("pros", "") or "").lower()
        cons_text = (review.get("cons", "") or "").lower()

        for feature in competitor_features:
            if feature.lower() in pros_text:
                praise[feature] += 1
            if feature.lower() in cons_text:
                complaints[feature] += 1

    return {
        "most_praised": praise.most_common(5),
        "most_complained": complaints.most_common(5)
    }

# Example: What do users love/hate about HubSpot CRM?
features = ["reporting", "automation", "integrations", "pricing", "support",
            "mobile app", "email", "pipeline", "ui", "onboarding"]

insights = extract_feature_mentions(reviews, features)
print("Most praised:", insights["most_praised"])
print("Most complained about:", insights["most_complained"])
```
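Raw mention counts can mislead: a feature mentioned often may simply be popular. Building on the `extract_feature_mentions` output, this hypothetical `complaint_ratio` helper ranks features by their share of negative mentions instead:

```python
def complaint_ratio(insights):
    """Rank features by share of negative mentions. Features complained
    about far more than praised are positioning openings."""
    praised = dict(insights["most_praised"])
    complained = dict(insights["most_complained"])
    ratios = {}
    for feature in set(praised) | set(complained):
        pos, neg = praised.get(feature, 0), complained.get(feature, 0)
        if pos + neg >= 3:  # ignore features with too few mentions
            ratios[feature] = round(neg / (pos + neg), 2)
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)
```

A feature at 0.9 (nine complaints for every praise) is a much stronger signal than one with more total mentions split evenly.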

Rate Limits and Best Practices

| Platform | Rate-limit strategy |
|---|---|
| G2 | Max 1 req/sec, rotate user agents |
| Capterra | 2-3 second delays between pages |
| Trustpilot | Behind Cloudflare; use curl_cffi or residential proxies |
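Those per-platform delays can be centralized instead of sprinkling `time.sleep` calls around. A minimal sketch (the `PoliteFetcher` class and its delay values are illustrative):

```python
import time
from urllib.parse import urlparse

class PoliteFetcher:
    """Enforce a per-domain minimum delay between requests."""
    DELAYS = {"www.g2.com": 1.0, "www.capterra.com": 3.0}  # seconds
    DEFAULT_DELAY = 2.0

    def __init__(self):
        self.last_hit = {}  # domain -> monotonic timestamp of last request

    def wait(self, url):
        """Call before each request; sleeps only if the domain was hit recently."""
        domain = urlparse(url).netloc
        delay = self.DELAYS.get(domain, self.DEFAULT_DELAY)
        elapsed = time.monotonic() - self.last_hit.get(domain, 0.0)
        if elapsed < delay:
            time.sleep(delay - elapsed)
        self.last_hit[domain] = time.monotonic()
```

Call `fetcher.wait(url)` before each `requests.get`/`requests.post` in the scrapers above and the throttling policy lives in one place.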

For monitoring (running daily), schedule via Apify or cron. For one-time research, run directly.

Key Use Cases

Pre-sales intelligence: Pull competitor reviews before a sales call, find the top 3 complaints to address in your pitch

Product roadmap: Export all feature requests from reviews, cluster by topic, prioritize what users actually want

Review response automation: Alert when new 1-star reviews appear (via webhook), route to support team within minutes

Market positioning: If competitor reviews consistently mention "too expensive" or "too complex" — that's your positioning opportunity
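The review-response alert above boils down to diffing review IDs between runs. A minimal sketch, assuming each review dict keeps the stable `id` field the platform assigns (the G2 snippet earlier drops it, so you'd retain it in the returned dict):

```python
def find_new_low_ratings(previous_ids, reviews, max_stars=1):
    """Return reviews at or below `max_stars` that were not seen before."""
    seen = set(previous_ids)
    return [r for r in reviews
            if r.get("rating", 5) <= max_stars and r.get("id") not in seen]
```

Persist the IDs between runs (a file or small table is plenty) and route anything this returns to your support queue.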

Review data is some of the highest-signal market research available. It's what customers say when they're not trying to be polite.


Save hours on scraping setup: The $29 Apify Scrapers Bundle includes 35+ production-ready actors — Google SERP, LinkedIn, Amazon, TikTok, contact info, and more. Pre-configured inputs, working on day one.

Get the Bundle ($29) →
