G2.com is the world's largest software review marketplace, hosting millions of verified user reviews across thousands of software categories. For product managers, marketers, sales teams, and competitive analysts, the data locked inside G2 is incredibly valuable — but manually browsing through hundreds of product pages is impractical.
In this guide, we'll explore how to scrape G2 reviews programmatically, covering the platform's structure, extraction techniques using Python and Node.js, and how to scale your data collection using Apify.
Understanding G2.com's Structure
Before writing any scraping code, you need to understand how G2 organizes its data. G2 has several distinct page types, each containing different data points.
Product Pages
Every software product on G2 has a dedicated page (e.g., g2.com/products/slack/reviews). A product page contains:
- Overall star rating — aggregate score from all reviews (out of 5)
- Total review count — how many users have reviewed the product
- Rating breakdown — distribution across 5-star to 1-star ratings
- Satisfaction scores — Ease of Use, Quality of Support, Ease of Setup, etc.
- Product description — vendor-provided overview
- Pricing information — when available
- Feature list — categorized feature descriptions
- Comparison links — "vs" pages for competitor comparisons
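Before writing any selectors, it helps to pin down the target schema for these data points. A minimal sketch as a Python dataclass — the field names here are our own choices for downstream processing, not anything G2 exposes:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class G2Product:
    """Target schema for one scraped G2 product page (field names are ours)."""
    slug: str
    name: str = ""
    overall_rating: Optional[float] = None      # aggregate score out of 5
    review_count: int = 0
    rating_breakdown: dict = field(default_factory=dict)    # e.g. {"5_star": 1200, ...}
    satisfaction_scores: dict = field(default_factory=dict)  # "Ease of Use", "Quality of Support", ...
    pricing: str = ""                           # only when G2 exposes it
```

Defining the schema up front makes it obvious when a selector silently stops matching: the field stays at its default instead of disappearing.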
Individual Reviews
Each review on G2 is structured with rich metadata:
- Star rating — the reviewer's overall score
- Review title and body text
- Pros and Cons — clearly separated sections
- User demographics — company size, industry, role, region
- Verified status — whether the review was authenticated
- Review date — when it was submitted
- Helpful votes — community engagement signals
- Product usage duration — how long the reviewer has used the software
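The same schema-first approach applies at the review level. A sketch of one review record (again, field names are our own, chosen to match the extraction code later in this guide):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class G2Review:
    """Target schema for one G2 review (field names are ours)."""
    rating: Optional[int] = None    # 1-5 stars
    title: str = ""
    pros: str = ""
    cons: str = ""
    reviewer: str = "Anonymous"
    company_size: str = ""
    industry: str = ""
    date: str = ""                  # ISO date string from the <time> element
    verified: bool = False
    helpful_votes: int = 0
```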
Category Pages
G2 organizes software into categories (e.g., g2.com/categories/crm). Each category page lists:
- Category leader grid — G2's quadrant ranking
- All products in the category — with summary ratings
- Subcategories — more granular groupings
- Market statistics — average ratings, review counts
Comparison Pages
G2 generates comparison pages (e.g., g2.com/compare/slack-vs-microsoft-teams) with:
- Side-by-side ratings
- Feature comparison tables
- Reviewer sentiment comparison
- Pricing comparison — when available
Setting Up Your Environment
Python Dependencies
```bash
pip install requests beautifulsoup4 apify-client pandas
```
Node.js Dependencies
```bash
npm install axios cheerio apify-client
```
Method 1: Scraping G2 Product Pages
G2 pages are largely server-rendered, so their HTML is reachable with plain HTTP requests. However, G2 enforces rate limiting and bot detection, so you'll need realistic browser headers and, at any real volume, rotating proxies.
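One way to handle transient blocks and rate-limit responses is a shared session with browser-like headers and automatic retry with backoff. A sketch using `requests` plus urllib3's `Retry` — the retry counts, delays, and status codes are illustrative choices, not G2-specific requirements:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session(proxies=None):
    """Build a session with browser-like headers and automatic retries
    with exponential backoff on rate-limit / transient server errors."""
    session = requests.Session()
    session.headers.update({
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.5',
    })
    retry = Retry(
        total=3,
        backoff_factor=2,                    # waits ~2s, 4s, 8s between attempts
        status_forcelist=[429, 500, 502, 503],
        respect_retry_after_header=True,     # honor G2's Retry-After if sent
    )
    session.mount('https://', HTTPAdapter(max_retries=retry))
    if proxies:  # e.g. {"https": "http://user:pass@proxy-host:port"}
        session.proxies.update(proxies)
    return session
```

The scrapers below construct their own sessions inline for readability, but in practice routing every request through one helper like this keeps the retry policy in a single place.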
Python Product Scraper
```python
import requests
from bs4 import BeautifulSoup
import json
import time

class G2ProductScraper:
    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.5',
        })

    def scrape_product(self, product_slug):
        """Scrape a G2 product page for ratings and metadata."""
        url = f"https://www.g2.com/products/{product_slug}/reviews"
        response = self.session.get(url)
        if response.status_code != 200:
            print(f"Failed to fetch {url}: {response.status_code}")
            return None
        soup = BeautifulSoup(response.text, 'html.parser')
        product_data = {
            'slug': product_slug,
            'url': url,
            'name': self._extract_name(soup),
            'overall_rating': self._extract_rating(soup),
            'review_count': self._extract_review_count(soup),
            'rating_breakdown': self._extract_rating_breakdown(soup),
            'satisfaction_scores': self._extract_satisfaction(soup),
        }
        return product_data

    def _extract_name(self, soup):
        title = soup.find('h1')
        return title.text.strip() if title else 'Unknown'

    def _extract_rating(self, soup):
        rating_el = soup.find('span', class_='fw-semibold')
        if rating_el:
            try:
                return float(rating_el.text.strip())
            except ValueError:
                pass
        return None

    def _extract_review_count(self, soup):
        count_el = soup.find('span', attrs={'itemprop': 'reviewCount'})
        if count_el:
            try:
                return int(count_el.text.strip().replace(',', ''))
            except ValueError:
                pass
        return 0

    def _extract_rating_breakdown(self, soup):
        breakdown = {}
        stars_section = soup.find_all('div', class_='rating-bar')
        for i, bar in enumerate(stars_section[:5], 1):
            count_el = bar.find('span', class_='count')
            if count_el:
                breakdown[f'{6 - i}_star'] = int(count_el.text.strip().replace(',', ''))
        return breakdown

    def _extract_satisfaction(self, soup):
        scores = {}
        satisfaction_items = soup.find_all('div', class_='satisfaction-score')
        for item in satisfaction_items:
            label = item.find('span', class_='label')
            value = item.find('span', class_='value')
            if label and value:
                scores[label.text.strip()] = value.text.strip()
        return scores

# Usage
scraper = G2ProductScraper()
product = scraper.scrape_product('slack')
if product:
    print(json.dumps(product, indent=2))
```
Node.js Product Scraper
```javascript
const axios = require('axios');
const cheerio = require('cheerio');

class G2ProductScraper {
  constructor() {
    this.headers = {
      'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
      'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
      'Accept-Language': 'en-US,en;q=0.5',
    };
  }

  async scrapeProduct(productSlug) {
    const url = `https://www.g2.com/products/${productSlug}/reviews`;
    try {
      const response = await axios.get(url, { headers: this.headers });
      const $ = cheerio.load(response.data);
      return {
        slug: productSlug,
        url: url,
        name: $('h1').first().text().trim(),
        overallRating: this.extractRating($),
        reviewCount: this.extractReviewCount($),
        ratingBreakdown: this.extractBreakdown($),
      };
    } catch (error) {
      console.error(`Failed to scrape ${productSlug}: ${error.message}`);
      return null;
    }
  }

  extractRating($) {
    const ratingText = $('span.fw-semibold').first().text().trim();
    return parseFloat(ratingText) || null;
  }

  extractReviewCount($) {
    const countText = $('[itemprop="reviewCount"]').text().trim();
    return parseInt(countText.replace(/,/g, ''), 10) || 0;
  }

  extractBreakdown($) {
    const breakdown = {};
    $('.rating-bar').each((i, el) => {
      const count = $(el).find('.count').text().trim();
      if (count && i < 5) {
        breakdown[`${5 - i}_star`] = parseInt(count.replace(/,/g, ''), 10);
      }
    });
    return breakdown;
  }
}

// Usage
(async () => {
  const scraper = new G2ProductScraper();
  const product = await scraper.scrapeProduct('slack');
  console.log(JSON.stringify(product, null, 2));
})();
```
Method 2: Extracting Individual Reviews
The real value lies in extracting individual review text, sentiment, and demographics. Here's how to paginate through all reviews for a product:
```python
import time
import requests
from bs4 import BeautifulSoup

class G2ReviewExtractor:
    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
        })

    def extract_reviews(self, product_slug, max_pages=10):
        """Extract all reviews for a product, paginating through results."""
        all_reviews = []
        for page in range(1, max_pages + 1):
            url = f"https://www.g2.com/products/{product_slug}/reviews?page={page}"
            response = self.session.get(url)
            if response.status_code != 200:
                break
            soup = BeautifulSoup(response.text, 'html.parser')
            reviews = self._parse_reviews(soup)
            if not reviews:
                break  # past the last page
            all_reviews.extend(reviews)
            print(f"Page {page}: {len(reviews)} reviews (total: {len(all_reviews)})")
            time.sleep(2)  # Respectful rate limiting
        return all_reviews

    def _parse_reviews(self, soup):
        reviews = []
        review_cards = soup.find_all('div', attrs={'itemprop': 'review'})
        for card in review_cards:
            review = {}
            # Star rating
            rating_el = card.find('span', class_='star-rating')
            if rating_el:
                stars = rating_el.find_all('svg', class_='filled')
                review['rating'] = len(stars) if stars else None
            # Review title
            title_el = card.find('h3', class_='review-title')
            review['title'] = title_el.text.strip() if title_el else ''
            # Pros
            pros_section = card.find('div', attrs={'data-testid': 'pros'})
            review['pros'] = pros_section.text.strip() if pros_section else ''
            # Cons
            cons_section = card.find('div', attrs={'data-testid': 'cons'})
            review['cons'] = cons_section.text.strip() if cons_section else ''
            # Reviewer info
            reviewer_el = card.find('span', class_='reviewer-name')
            review['reviewer'] = reviewer_el.text.strip() if reviewer_el else 'Anonymous'
            # Company size
            company_el = card.find('span', class_='company-size')
            review['company_size'] = company_el.text.strip() if company_el else ''
            # Industry
            industry_el = card.find('span', class_='industry')
            review['industry'] = industry_el.text.strip() if industry_el else ''
            # Date
            date_el = card.find('time')
            review['date'] = date_el.get('datetime', '') if date_el else ''
            # Verified
            verified_el = card.find('span', class_='verified')
            review['verified'] = verified_el is not None
            reviews.append(review)
        return reviews

# Usage
extractor = G2ReviewExtractor()
reviews = extractor.extract_reviews('slack', max_pages=5)
print(f"\nExtracted {len(reviews)} total reviews")

# Analyze sentiment distribution
from collections import Counter

ratings = Counter(r.get('rating') for r in reviews if r.get('rating'))
for stars in sorted(ratings.keys(), reverse=True):
    bar = '#' * ratings[stars]
    print(f"  {stars} stars: {bar} ({ratings[stars]})")
```
Method 3: Scraping Category Rankings
Category pages reveal market positioning and competitive landscapes:
```python
import requests
from bs4 import BeautifulSoup

def scrape_category(category_slug):
    """Scrape a G2 category page for product rankings."""
    url = f"https://www.g2.com/categories/{category_slug}"
    session = requests.Session()
    session.headers.update({
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
    })
    response = session.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    products = []
    product_cards = soup.find_all('div', class_='product-card')
    for card in product_cards:
        product = {'review_count': 0}  # default so downstream code never hits a KeyError
        name_el = card.find('a', class_='product-name')
        product['name'] = name_el.text.strip() if name_el else ''
        product['url'] = name_el.get('href', '') if name_el else ''
        rating_el = card.find('span', class_='rating')
        product['rating'] = float(rating_el.text.strip()) if rating_el else None
        count_el = card.find('span', class_='review-count')
        if count_el:
            count_text = count_el.text.strip().replace('(', '').replace(')', '').replace(',', '')
            product['review_count'] = int(count_text) if count_text.isdigit() else 0
        products.append(product)
    return products

# Example: Scrape CRM category
crm_products = scrape_category('crm')
print(f"Found {len(crm_products)} CRM products on G2")
for i, product in enumerate(crm_products[:10], 1):
    print(f"{i}. {product['name']} - {product['rating']} ({product['review_count']} reviews)")
```
Method 4: Comparison Data Extraction
G2's comparison pages are a goldmine for competitive analysis:
```javascript
const axios = require('axios');
const cheerio = require('cheerio');

async function scrapeComparison(product1, product2) {
  const url = `https://www.g2.com/compare/${product1}-vs-${product2}`;
  const response = await axios.get(url, {
    headers: {
      'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
    },
  });
  const $ = cheerio.load(response.data);
  const comparison = {
    products: [product1, product2],
    url: url,
    ratings: {},
    features: [],
    reviewerPreference: {},
  };
  // Extract side-by-side ratings
  $('.comparison-rating').each((i, el) => {
    const label = $(el).find('.label').text().trim();
    const scores = [];
    $(el).find('.score').each((j, scoreEl) => {
      scores.push(parseFloat($(scoreEl).text().trim()));
    });
    comparison.ratings[label] = scores;
  });
  // Extract feature comparison
  $('.feature-row').each((i, el) => {
    const feature = $(el).find('.feature-name').text().trim();
    const product1Has = $(el).find('.product-1 .check').length > 0;
    const product2Has = $(el).find('.product-2 .check').length > 0;
    comparison.features.push({ feature, [product1]: product1Has, [product2]: product2Has });
  });
  return comparison;
}

// Usage
(async () => {
  const comp = await scrapeComparison('slack', 'microsoft-teams');
  console.log(JSON.stringify(comp, null, 2));
})();
```
Scaling G2 Scraping with Apify
For large-scale G2 data extraction, Apify provides the infrastructure to handle anti-bot measures, proxy rotation, and parallel execution:
```python
from apify_client import ApifyClient

def scrape_g2_at_scale(product_slugs, max_reviews_per_product=100):
    """Use Apify to scrape multiple G2 products at scale."""
    client = ApifyClient("YOUR_APIFY_TOKEN")
    all_results = {}
    for slug in product_slugs:
        run_input = {
            "productUrl": f"https://www.g2.com/products/{slug}/reviews",
            "maxReviews": max_reviews_per_product,
            "includeReviewerDetails": True,
            "includeRatingBreakdown": True,
            "proxyConfiguration": {
                "useApifyProxy": True,
                "apifyProxyGroups": ["RESIDENTIAL"],
            },
        }
        run = client.actor("apify/g2-reviews-scraper").call(run_input=run_input)
        items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
        all_results[slug] = items
        print(f"Scraped {len(items)} reviews for {slug}")
    return all_results

# Scrape multiple competitors
products = ['slack', 'microsoft-teams', 'discord', 'zoom']
results = scrape_g2_at_scale(products, max_reviews_per_product=50)
```
Analyzing Extracted Review Data
Once you have the raw data, here's how to extract actionable insights:
```python
import pandas as pd
from collections import Counter

def analyze_g2_reviews(reviews, product_name):
    """Comprehensive analysis of G2 review data."""
    df = pd.DataFrame(reviews)
    print(f"\n{'=' * 60}")
    print(f"ANALYSIS: {product_name}")
    print(f"{'=' * 60}")
    # Rating distribution
    if 'rating' in df.columns:
        print("\nRating Distribution:")
        print(f"  Average: {df['rating'].mean():.2f}")
        print(f"  Median: {df['rating'].median():.1f}")
        for rating in range(5, 0, -1):
            count = len(df[df['rating'] == rating])
            pct = count / len(df) * 100
            bar = '#' * int(pct / 2)
            print(f"  {rating} star: {bar} {count} ({pct:.1f}%)")
    # Company size distribution
    if 'company_size' in df.columns:
        print("\nReviewer Company Size:")
        sizes = df['company_size'].value_counts()
        for size, count in sizes.items():
            if size:
                print(f"  {size}: {count}")
    # Industry breakdown
    if 'industry' in df.columns:
        print("\nTop Industries:")
        industries = df['industry'].value_counts().head(10)
        for industry, count in industries.items():
            if industry:
                print(f"  {industry}: {count}")
    # Common themes in pros/cons
    if 'pros' in df.columns:
        print("\nMost mentioned in PROS:")
        pros_words = extract_key_phrases(df['pros'].dropna().tolist())
        for phrase, count in pros_words[:10]:
            print(f"  '{phrase}': mentioned {count} times")
    if 'cons' in df.columns:
        print("\nMost mentioned in CONS:")
        cons_words = extract_key_phrases(df['cons'].dropna().tolist())
        for phrase, count in cons_words[:10]:
            print(f"  '{phrase}': mentioned {count} times")

def extract_key_phrases(texts):
    """Simple keyword frequency analysis."""
    word_freq = Counter()
    stop_words = {'the', 'a', 'an', 'in', 'on', 'at', 'to', 'for', 'of', 'and',
                  'is', 'are', 'it', 'that', 'this', 'with', 'was', 'be', 'has',
                  'have', 'not', 'but', 'can', 'very', 'from', 'they', 'you', 'we'}
    for text in texts:
        for word in text.lower().split():
            word = word.strip('.,!?()[]{}":;')
            if word not in stop_words and len(word) > 3:
                word_freq[word] += 1
    return word_freq.most_common(20)

# Example usage
analyze_g2_reviews(reviews, "Slack")
```
Building a Competitive Intelligence Dashboard
Combine all the scraped data into a competitive analysis framework:
```python
def competitive_analysis(products_data):
    """Generate a competitive comparison report from G2 data."""
    print("\n" + "=" * 80)
    print("COMPETITIVE INTELLIGENCE REPORT")
    print("=" * 80)
    # Comparison table
    print(f"\n{'Product':<20} {'Rating':<10} {'Reviews':<12} {'Satisfaction':<15}")
    print("-" * 60)
    for product in products_data:
        name = product.get('name', 'Unknown')[:19]
        rating = product.get('overall_rating', 'N/A')
        reviews = product.get('review_count', 0)
        satisfaction = product.get('satisfaction_scores', {})
        ease = satisfaction.get('Ease of Use', 'N/A')
        print(f"{name:<20} {str(rating):<10} {str(reviews):<12} {ease:<15}")
    # Strengths and weaknesses
    for product in products_data:
        print(f"\n--- {product.get('name', 'Unknown')} ---")
        breakdown = product.get('rating_breakdown', {})
        if breakdown:
            total = sum(breakdown.values())
            five_star_pct = breakdown.get('5_star', 0) / total * 100 if total > 0 else 0
            one_star_pct = breakdown.get('1_star', 0) / total * 100 if total > 0 else 0
            print(f"  5-star rate: {five_star_pct:.1f}%")
            print(f"  1-star rate: {one_star_pct:.1f}%")
            if five_star_pct > 60:
                print(f"  STRENGTH: High satisfaction ({five_star_pct:.0f}% give 5 stars)")
            if one_star_pct > 10:
                print(f"  WARNING: Notable dissatisfaction ({one_star_pct:.0f}% give 1 star)")
```
User Demographic Analysis
Understanding who reviews a product reveals its actual user base:
```python
def demographic_analysis(reviews):
    """Analyze the demographic makeup of a product's reviewers."""
    # Company size segments. The matching below depends on G2's label text
    # (e.g. "Mid-Market (51-1000 emp.)") -- adjust if the labels change.
    size_segments = {
        'Enterprise (1000+)': 0,
        'Mid-Market (51-1000)': 0,
        'Small Business (1-50)': 0,
    }
    for review in reviews:
        size = review.get('company_size', '').lower()
        if 'enterprise' in size or '> 1000' in size or '1000+' in size:
            size_segments['Enterprise (1000+)'] += 1
        elif 'mid-market' in size or '51-1000' in size:
            size_segments['Mid-Market (51-1000)'] += 1
        else:
            size_segments['Small Business (1-50)'] += 1
    total = sum(size_segments.values())
    print("\nCompany Size Distribution:")
    for segment, count in size_segments.items():
        pct = count / total * 100 if total > 0 else 0
        bar = '#' * int(pct / 2)
        print(f"  {segment}: {bar} {pct:.1f}%")
    # Satisfaction by segment
    print("\nRating by Company Size:")
    for segment_name in size_segments:
        key = segment_name.split('(')[0].strip().lower()  # e.g. 'enterprise'
        segment_reviews = [r for r in reviews
                           if key in r.get('company_size', '').lower()]
        rated = [r['rating'] for r in segment_reviews if r.get('rating')]
        if rated:
            avg_rating = sum(rated) / len(rated)
            print(f"  {segment_name}: {avg_rating:.2f} avg rating")
```
Best Practices for G2 Scraping
Implement rate limiting — G2 actively detects automated access. Space your requests 2-5 seconds apart for direct scraping, or use Apify's managed proxies.
Use residential proxies — G2's anti-bot measures are sophisticated. Datacenter IPs get blocked quickly.
Respect the Terms of Service — Review G2's ToS regarding automated data collection. Use the data responsibly and within legal boundaries.
Cache aggressively — G2 reviews don't change frequently. Cache product pages for 24 hours and individual reviews for a week.
Handle pagination carefully — G2 loads reviews in pages of 10-25. Don't skip pages or you'll miss data.
Validate data quality — Check for empty fields, malformed ratings, and duplicate reviews. G2 occasionally changes its HTML structure.
Monitor for structural changes — Set up alerts when your scraper returns empty results or unusual data patterns.
Store data efficiently — Use structured formats (JSON, CSV, or a database) with consistent schemas for easier analysis later.
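Several of these practices — rate limiting, 24-hour caching, avoiding redundant requests — can be combined in one small fetch wrapper. A sketch (the cache directory, TTL, and `fetch_fn` signature are our own choices; `fetch_fn(url) -> str` stands in for whatever fetcher you use):

```python
import hashlib
import json
import time
from pathlib import Path

CACHE_DIR = Path(".g2_cache")
CACHE_TTL = 24 * 3600  # cache product pages for 24 hours

def cached_fetch(url, fetch_fn, min_interval=2.0, _last=[0.0]):
    """Fetch a URL through fetch_fn with on-disk caching and a minimum
    delay between live requests. Cached pages cost zero requests."""
    CACHE_DIR.mkdir(exist_ok=True)
    key = hashlib.sha1(url.encode()).hexdigest()
    path = CACHE_DIR / f"{key}.json"
    if path.exists():
        entry = json.loads(path.read_text())
        if time.time() - entry["fetched_at"] < CACHE_TTL:
            return entry["body"]  # cache hit: no request made
    # Throttle live requests to at least min_interval seconds apart
    wait = min_interval - (time.time() - _last[0])
    if wait > 0:
        time.sleep(wait)
    body = fetch_fn(url)
    _last[0] = time.time()
    path.write_text(json.dumps({"fetched_at": time.time(), "body": body}))
    return body
```

Dropping this in front of any of the scrapers above means re-running an analysis script never re-hits pages fetched within the last day.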
Conclusion
G2 review data is one of the most valuable sources of competitive intelligence in the B2B software market. By scraping product pages, individual reviews, category rankings, and comparison tables, you can build a comprehensive understanding of your market landscape.
Start with single product scraping to validate your approach, then scale to category-wide extraction using Apify's infrastructure. The combination of structured review data with demographic analysis gives you insights that no amount of manual browsing could match.
Whether you're tracking competitor sentiment, identifying market gaps, or understanding your own product's perception, automated G2 scraping transforms scattered review data into actionable competitive intelligence.
Remember to scrape responsibly, respect rate limits, and use the data ethically. The goal is insight, not disruption — and the best insights come from clean, well-structured data collected at a sustainable pace.