Vhub Systems

How to Scrape G2, Capterra, and Trustpilot Reviews for Competitive Analysis

Software buyers check G2 and Capterra before making purchase decisions. If your competitor gets a wave of negative reviews — you want to know immediately. If they're winning on specific features — you need to know what users are saying.

Here's how to extract B2B review data programmatically.

What You Can Extract From Review Platforms

| Platform | Stars | Review Text | Pros/Cons | Reviewer Role | Company Size |
|---|---|---|---|---|---|
| G2 | ✓ | ✓ | ✓ | ✓ | ✓ |
| Capterra | ✓ | ✓ | ✓ | ✓ | ✓ |
| Trustpilot | ✓ | ✓ | ✗ | ✗ | ✗ |

Why This Matters for B2B Companies

Before a sales call: Know exactly what prospects complain about in your competitor's reviews. Address those pain points in your pitch.

Product roadmap: Find the most commonly requested features in your category. Real user language, ranked by frequency.

Competitive monitoring: Alert when competitor review volume spikes (could mean a PR issue or big launch).

Market sizing: Count the number of verified reviewers to estimate user base without public metrics.
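The monitoring idea above can be sketched with a simple z-score over weekly review counts. This is an illustrative helper (the `detect_review_spike` name and the 2-sigma threshold are my choices, not part of any platform's API):

```python
from statistics import mean, stdev

def detect_review_spike(weekly_counts, threshold_sigma=2.0):
    """Flag the latest week if review volume exceeds the historical
    mean by more than `threshold_sigma` standard deviations."""
    history, latest = weekly_counts[:-1], weekly_counts[-1]
    mu, sigma = mean(history), stdev(history)
    return latest > mu + threshold_sigma * sigma

# A competitor normally gets ~5 reviews/week, then suddenly 24:
print(detect_review_spike([4, 6, 5, 7, 5, 24]))  # True
```

Feed it the weekly counts you collect with any of the methods below and wire the `True` case to a Slack webhook or email.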

Method 1: Scraping G2 Reviews (Python + requests)

G2 loads reviews through an internal GraphQL API. For public data it can sometimes be queried without authentication, though G2 sits behind aggressive bot protection, so expect intermittent blocks:

```python
import requests

def get_g2_reviews(product_slug, page=1, per_page=20):
    """
    Fetch G2 reviews for a product.
    product_slug: e.g., "hubspot-crm", "salesforce", "monday-com"
    """
    url = "https://www.g2.com/graphql"

    query = {
        "query": """
        query ProductReviews($slug: String!, $page: Int!, $perPage: Int!) {
          product(slug: $slug) {
            name
            reviewsList(page: $page, perPage: $perPage) {
              reviews {
                id
                title
                body
                pros
                cons
                starRating
                createdAt
                reviewer {
                  title
                  companySize
                }
              }
              totalCount
            }
          }
        }
        """,
        "variables": {
            "slug": product_slug,
            "page": page,
            "perPage": per_page
        }
    }

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Referer": f"https://www.g2.com/products/{product_slug}/reviews",
    }

    r = requests.post(url, json=query, headers=headers, timeout=20)

    if r.status_code != 200:
        print(f"Error: {r.status_code}")
        return []

    data = r.json()
    # Guard against null nodes: an unknown slug returns "product": null
    product_data = (data.get("data") or {}).get("product") or {}
    reviews_data = (product_data.get("reviewsList") or {}).get("reviews") or []

    return [{
        "title": rev.get("title"),
        "body": rev.get("body"),
        "pros": rev.get("pros"),
        "cons": rev.get("cons"),
        "rating": rev.get("starRating"),
        "date": rev.get("createdAt"),
        "reviewer_role": (rev.get("reviewer") or {}).get("title"),
        "company_size": (rev.get("reviewer") or {}).get("companySize"),
    } for rev in reviews_data]

# Example: Scrape HubSpot CRM reviews
reviews = get_g2_reviews("hubspot-crm", page=1, per_page=20)
for rev in reviews[:3]:
    print(f"{rev['rating']} | {rev['reviewer_role']}")
    print(f"PROS: {(rev['pros'] or '')[:100]}")  # pros/cons can be null
    print(f"CONS: {(rev['cons'] or '')[:100]}")
    print()
```

Note: G2's GraphQL schema changes periodically. Test the query against their actual schema if this stops working.
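Because the endpoint can be flaky behind bot protection, a retry wrapper with exponential backoff helps. A minimal sketch (the `post_with_retry` helper and its backoff schedule are my own, not a G2 convention):

```python
import time
import requests

def post_with_retry(url, payload, headers, retries=3, backoff=5):
    """POST with exponential backoff on rate limits and server errors."""
    for attempt in range(retries):
        r = requests.post(url, json=payload, headers=headers, timeout=20)
        if r.status_code == 200:
            return r
        if r.status_code in (429, 500, 502, 503):
            time.sleep(backoff * (2 ** attempt))  # 5s, 10s, 20s...
            continue
        break  # other 4xx errors: retrying won't help
    return None
```

Swap it in for the bare `requests.post` call in `get_g2_reviews` if you start seeing 429s.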

Method 2: Capterra Reviews

Capterra uses a different structure but is equally accessible:

```python
import requests
from bs4 import BeautifulSoup
import time

def scrape_capterra_reviews(product_url, max_pages=5):
    """
    Scrape Capterra reviews for a software product.
    product_url: e.g., "https://www.capterra.com/p/53360/HubSpot-CRM/"
    """
    all_reviews = []

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/124.0.0.0",
        "Accept": "text/html,application/xhtml+xml",
        "Accept-Language": "en-US,en;q=0.9",
    }

    for page in range(1, max_pages + 1):
        url = f"{product_url}?page={page}" if page > 1 else product_url
        r = requests.get(url, headers=headers, timeout=15)

        if r.status_code != 200:
            break

        soup = BeautifulSoup(r.text, "html.parser")

        # Find review cards
        review_cards = soup.find_all("div", {"data-testid": "review-card"})
        if not review_cards:
            # Fall back to a looser selector if the markup has changed
            review_cards = soup.find_all("div", class_=lambda c: c and "review" in c.lower())

        if not review_cards:
            break

        for card in review_cards:
            rating_el = card.find("span", class_=lambda c: c and "rating" in str(c).lower())
            title_el = card.find("h3") or card.find("h2")
            pros_el = card.find("div", {"data-testid": "pros"})
            cons_el = card.find("div", {"data-testid": "cons"})

            all_reviews.append({
                "title": title_el.get_text(strip=True) if title_el else "",
                "rating": rating_el.get_text(strip=True) if rating_el else "",
                "pros": pros_el.get_text(strip=True) if pros_el else "",
                "cons": cons_el.get_text(strip=True) if cons_el else "",
                "page": page,
            })

        time.sleep(2)  # Be respectful

    return all_reviews

reviews = scrape_capterra_reviews("https://www.capterra.com/p/53360/HubSpot-CRM/")
print(f"Scraped {len(reviews)} reviews")
```
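To get the scraped dicts into a spreadsheet, a small `csv` export helper is enough (the `save_reviews_csv` name and default path are arbitrary):

```python
import csv

def save_reviews_csv(reviews, path="capterra_reviews.csv"):
    """Write a list of review dicts to CSV, one column per dict key."""
    if not reviews:
        return
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=reviews[0].keys())
        writer.writeheader()
        writer.writerows(reviews)
```

It works unchanged on the G2 output too, since both methods return flat dicts.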

Method 3: Apify Actor (All Platforms in One Run)

The B2B Review Intelligence Actor aggregates G2, Capterra, and Trustpilot in one call:

```python
import requests, time

TOKEN = "YOUR_APIFY_TOKEN"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

# Start the actor run
run = requests.post(
    "https://api.apify.com/v2/acts/lanky_quantifier~b2b-review-intelligence/runs",
    headers=HEADERS,
    json={
        "products": [
            {"platform": "g2", "slug": "hubspot-crm"},
            {"platform": "capterra", "url": "https://www.capterra.com/p/53360/HubSpot-CRM/"},
            {"platform": "trustpilot", "domain": "hubspot.com"},
        ],
        "maxReviewsPerPlatform": 100,
        "includeSentimentAnalysis": True
    },
    timeout=30,
).json()["data"]

# Poll until the run reaches a terminal state
while True:
    status = requests.get(
        f"https://api.apify.com/v2/actor-runs/{run['id']}",
        headers=HEADERS, timeout=30,
    ).json()["data"]["status"]
    if status in ("SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"):
        break
    time.sleep(5)

# Fetch the combined dataset
results = requests.get(
    f"https://api.apify.com/v2/actor-runs/{run['id']}/dataset/items",
    headers=HEADERS, timeout=30,
).json()

for review in results[:3]:
    print(f"[{review['platform']}] ⭐{review['rating']} - {(review.get('title') or '')[:60]}")
    print(f"  PROS: {(review.get('pros') or '')[:100]}")
    print(f"  CONS: {(review.get('cons') or '')[:100]}")
```
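Assuming each dataset item carries the `platform` and numeric `rating` fields used in the loop above, a quick per-platform summary looks like this (the `summarize_by_platform` helper is a sketch, not part of the actor):

```python
from collections import defaultdict

def summarize_by_platform(results):
    """Average star rating per platform across the combined dataset."""
    buckets = defaultdict(list)
    for review in results:
        rating = review.get("rating")
        if isinstance(rating, (int, float)):  # skip malformed items
            buckets[review.get("platform", "unknown")].append(rating)
    return {p: round(sum(r) / len(r), 2) for p, r in buckets.items()}
```

A large gap between a competitor's G2 and Trustpilot averages is itself a signal: B2B buyers and end users are having different experiences.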

Analyzing Review Data with Python

Once you have the data, extract competitive insights:

```python
from collections import Counter

def extract_feature_mentions(reviews, competitor_features):
    """Find which features competitors get praised/criticized for."""
    praise = Counter()
    complaints = Counter()

    for review in reviews:
        pros_text = (review.get("pros", "") or "").lower()
        cons_text = (review.get("cons", "") or "").lower()

        for feature in competitor_features:
            if feature.lower() in pros_text:
                praise[feature] += 1
            if feature.lower() in cons_text:
                complaints[feature] += 1

    return {
        "most_praised": praise.most_common(5),
        "most_complained": complaints.most_common(5)
    }

# Example: What do users love/hate about HubSpot CRM?
features = ["reporting", "automation", "integrations", "pricing", "support",
            "mobile app", "email", "pipeline", "ui", "onboarding"]

insights = extract_feature_mentions(reviews, features)
print("Most praised:", insights["most_praised"])
print("Most complained about:", insights["most_complained"])
```
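Raw mention counts can mislead: a feature mentioned often may simply be popular. Building on the `extract_feature_mentions` output, this hypothetical `complaint_ratio` helper ranks features by their share of negative mentions instead:

```python
def complaint_ratio(insights):
    """Rank features by share of negative mentions. Features complained
    about far more than praised are positioning openings."""
    praised = dict(insights["most_praised"])
    complained = dict(insights["most_complained"])
    ratios = {}
    for feature in set(praised) | set(complained):
        pos, neg = praised.get(feature, 0), complained.get(feature, 0)
        if pos + neg >= 3:  # ignore features with too few mentions
            ratios[feature] = round(neg / (pos + neg), 2)
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)
```

A feature at 0.9 (nine complaints for every praise) is a much stronger signal than one with more total mentions split evenly.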

Rate Limits and Best Practices

| Platform | Rate-limit strategy |
|---|---|
| G2 | Max 1 req/sec, rotate user agents |
| Capterra | 2-3 second delays between pages |
| Trustpilot | Behind Cloudflare; use curl_cffi or residential proxies |
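Those per-platform delays can be centralized instead of sprinkling `time.sleep` calls around. A minimal sketch (the `PoliteFetcher` class and its delay values are illustrative):

```python
import time
from urllib.parse import urlparse

class PoliteFetcher:
    """Enforce a per-domain minimum delay between requests."""
    DELAYS = {"www.g2.com": 1.0, "www.capterra.com": 3.0}  # seconds
    DEFAULT_DELAY = 2.0

    def __init__(self):
        self.last_hit = {}  # domain -> monotonic timestamp of last request

    def wait(self, url):
        """Call before each request; sleeps only if the domain was hit recently."""
        domain = urlparse(url).netloc
        delay = self.DELAYS.get(domain, self.DEFAULT_DELAY)
        elapsed = time.monotonic() - self.last_hit.get(domain, 0.0)
        if elapsed < delay:
            time.sleep(delay - elapsed)
        self.last_hit[domain] = time.monotonic()
```

Call `fetcher.wait(url)` before each `requests.get`/`requests.post` in the scrapers above and the throttling policy lives in one place.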

For monitoring (running daily), schedule via Apify or cron. For one-time research, run directly.

Key Use Cases

Pre-sales intelligence: Pull competitor reviews before a sales call, find the top 3 complaints to address in your pitch

Product roadmap: Export all feature requests from reviews, cluster by topic, prioritize what users actually want

Review response automation: Alert when new 1-star reviews appear (via webhook), route to support team within minutes

Market positioning: If competitor reviews consistently mention "too expensive" or "too complex" — that's your positioning opportunity
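The review-response alert above boils down to diffing review IDs between runs. A minimal sketch, assuming each review dict keeps the stable `id` field the platform assigns (the G2 snippet earlier drops it, so you'd retain it in the returned dict):

```python
def find_new_low_ratings(previous_ids, reviews, max_stars=1):
    """Return reviews at or below `max_stars` that were not seen before."""
    seen = set(previous_ids)
    return [r for r in reviews
            if r.get("rating", 5) <= max_stars and r.get("id") not in seen]
```

Persist the IDs between runs (a file or small table is plenty) and route anything this returns to your support queue.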

Review data is some of the highest-signal market research available. It's what customers say when they're not trying to be polite.


Save hours on scraping setup: The $29 Apify Scrapers Bundle includes 35+ production-ready actors — Google SERP, LinkedIn, Amazon, TikTok, contact info, and more. Pre-configured inputs, working on day one.

Get the Bundle ($29) →
