## Why Collect G2 Reviews Data?
G2 is the world's largest B2B software review marketplace with over 2 million reviews across 100K+ products. For developers building competitive intelligence tools, market research platforms, or sales enablement software, G2 data is gold.
Common use cases:
- Competitive analysis — Track how competitors are rated over time
- Lead generation — Identify companies using specific tools
- Product research — Understand feature gaps from reviewer feedback
- Market sizing — Estimate category adoption trends
## API vs Scraping: What's Available?
G2 offers a limited partner API, but it's restricted to G2 customers with enterprise plans. For most developers, scraping is the practical option.
## G2's URL Structure
G2 product pages follow a predictable URL pattern:
```
https://www.g2.com/products/{product-slug}/reviews
https://www.g2.com/products/{product-slug}/reviews?page={n}
https://www.g2.com/categories/{category-slug}
```
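As a quick sketch, the review-page pattern can be captured in a tiny helper (the `g2_reviews_url` name is my own, not part of any G2 tooling):

```python
def g2_reviews_url(product_slug, page=1):
    # Follows the URL pattern shown above
    return f'https://www.g2.com/products/{product_slug}/reviews?page={page}'

print(g2_reviews_url('slack', 3))
# https://www.g2.com/products/slack/reviews?page=3
```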
## Building a G2 Reviews Scraper
G2 uses server-side rendering with moderate anti-bot protection. Here's a working approach using Python with ScraperAPI to handle JavaScript rendering and proxy rotation:
```python
import random
import time

import requests
from bs4 import BeautifulSoup

SCRAPER_API_KEY = 'your_api_key'  # Get one at scraperapi.com

def scrape_g2_reviews(product_slug, max_pages=5):
    all_reviews = []
    base_url = f'https://www.g2.com/products/{product_slug}/reviews'

    for page in range(1, max_pages + 1):
        url = f'{base_url}?page={page}'
        # Use ScraperAPI for JavaScript rendering and proxy rotation.
        # Passing the target URL via params ensures it is properly encoded.
        response = requests.get(
            'http://api.scraperapi.com',
            params={'api_key': SCRAPER_API_KEY, 'url': url, 'render': 'true'},
            timeout=60,
        )
        if response.status_code != 200:
            print(f'Failed on page {page}: {response.status_code}')
            break

        soup = BeautifulSoup(response.text, 'html.parser')
        review_cards = soup.select('[itemprop="review"]')
        if not review_cards:
            print(f'No reviews found on page {page}, stopping')
            break

        for card in review_cards:
            review = extract_review(card)
            if review:
                all_reviews.append(review)

        print(f'Page {page}: extracted {len(review_cards)} reviews')
        time.sleep(random.uniform(3, 7))  # polite, randomized delay

    return all_reviews
```
```python
def extract_review(card):
    try:
        # Rating: encoded in a class like "stars-8" on a 10-point scale
        rating_el = card.select_one('[class*="star-rating"]')
        rating = None
        if rating_el:
            for cls in rating_el.get('class', []):
                if 'stars-' in cls:
                    rating = float(cls.split('stars-')[1]) / 2  # to 5-point scale
                    break

        # Title
        title_el = card.select_one('[itemprop="name"]')
        title = title_el.get_text(strip=True) if title_el else None

        # Review body: G2 splits reviews into "likes" and "dislikes" sections
        likes = card.select_one('[data-test-id="review-likes"]')
        dislikes = card.select_one('[data-test-id="review-dislikes"]')

        # Reviewer info
        reviewer = card.select_one('[itemprop="author"]')
        reviewer_name = reviewer.get_text(strip=True) if reviewer else None

        # Date
        date_el = card.select_one('time')
        review_date = date_el.get('datetime') if date_el else None

        return {
            'rating': rating,
            'title': title,
            'likes': likes.get_text(strip=True) if likes else None,
            'dislikes': dislikes.get_text(strip=True) if dislikes else None,
            'reviewer': reviewer_name,
            'date': review_date,
        }
    except Exception as e:
        print(f'Error extracting review: {e}')
        return None
```
## Extracting Category Data
G2 categories are useful for market research:
```python
def scrape_g2_category(category_slug, max_pages=3):
    products = []
    for page in range(1, max_pages + 1):
        url = f'https://www.g2.com/categories/{category_slug}?page={page}'
        response = requests.get(
            'http://api.scraperapi.com',
            params={'api_key': SCRAPER_API_KEY, 'url': url, 'render': 'true'},
            timeout=60,
        )
        if response.status_code != 200:
            print(f'Failed on page {page}: {response.status_code}')
            break

        soup = BeautifulSoup(response.text, 'html.parser')
        product_cards = soup.select('[data-test-id="product-card"]')
        for card in product_cards:
            name = card.select_one('[itemprop="name"]')
            rating = card.select_one('[class*="star-rating"]')
            review_count = card.select_one('[data-test-id="review-count"]')
            description = card.select_one('[itemprop="description"]')
            products.append({
                'name': name.get_text(strip=True) if name else None,
                'rating': rating.get_text(strip=True) if rating else None,
                'review_count': review_count.get_text(strip=True) if review_count else None,
                'description': description.get_text(strip=True) if description else None,
            })
        time.sleep(random.uniform(2, 5))
    return products
```
## Handling Common Challenges
### 1. Anti-Bot Detection
G2 uses Cloudflare and behavioral analysis. Using ScraperAPI handles most of this automatically with browser rendering and IP rotation.
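Even with a scraping API in front, the occasional 403 or 429 still slips through. One common pattern is to retry those responses with exponential backoff; a minimal sketch (the `fetch_with_retries` helper is illustrative, not part of any library):

```python
import time

def fetch_with_retries(fetch, url, max_retries=3, base_delay=2.0):
    """Retry on common block/throttle status codes with exponential backoff.

    `fetch` is any callable returning an object with a `status_code`
    attribute (e.g. a wrapper around `requests.get`); it is injected as a
    parameter so the retry logic can be tested without network access.
    """
    response = None
    for attempt in range(max_retries):
        response = fetch(url)
        if response.status_code not in (403, 429, 500, 502, 503):
            return response
        time.sleep(base_delay * 2 ** attempt)  # 2s, 4s, 8s, ...
    return response  # still failing after all retries; caller decides
```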
### 2. Pagination
G2 pagination is straightforward with `?page=N` parameters. Pages typically contain 10 reviews each.
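Assuming roughly 10 reviews per page, you can estimate how many pages to request from a product's total review count (a rough planning number, not an exact figure):

```python
import math

def pages_needed(total_reviews, per_page=10):
    """Estimate how many listing pages cover a given review count."""
    return max(1, math.ceil(total_reviews / per_page))

print(pages_needed(95))  # 10
```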
### 3. Data Cleaning
```python
import pandas as pd

def clean_reviews(reviews):
    df = pd.DataFrame(reviews)
    df['rating'] = pd.to_numeric(df['rating'], errors='coerce')
    df['date'] = pd.to_datetime(df['date'], errors='coerce')
    df = df.dropna(subset=['title'])
    return df
```
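To see what `clean_reviews` does on concrete input, here is a small self-contained demo (the function is repeated so the snippet runs on its own; the two sample records are invented):

```python
import pandas as pd

def clean_reviews(reviews):
    df = pd.DataFrame(reviews)
    df['rating'] = pd.to_numeric(df['rating'], errors='coerce')
    df['date'] = pd.to_datetime(df['date'], errors='coerce')
    return df.dropna(subset=['title'])

sample = [
    {'rating': '4.5', 'title': 'Great for teams', 'likes': 'Fast setup',
     'dislikes': 'Pricey', 'reviewer': 'Jane D.', 'date': '2024-03-01'},
    # A card that failed extraction comes back empty and gets dropped:
    {'rating': None, 'title': None, 'likes': None,
     'dislikes': None, 'reviewer': None, 'date': None},
]

df = clean_reviews(sample)
print(len(df))               # 1: the empty record is gone
print(df['rating'].iloc[0])  # 4.5, now a real float
```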
### 4. Structured Output
```python
def export_reviews(reviews, product_name):
    df = clean_reviews(reviews)
    # CSV for spreadsheets
    df.to_csv(f'{product_name}_reviews.csv', index=False)
    # JSON for APIs
    df.to_json(f'{product_name}_reviews.json', orient='records', indent=2)
    # Summary stats
    print(f'Total reviews: {len(df)}')
    print(f'Average rating: {df["rating"].mean():.2f}')
    print(f'Date range: {df["date"].min()} to {df["date"].max()}')
```
## The Quick Way: Use a Managed Scraper
If you need G2 data without building and maintaining a scraper, try the G2 Reviews Scraper on Apify. It handles all the anti-bot challenges, provides structured JSON output, and supports scheduling for ongoing data collection.
## Use Cases in Practice
### Competitive Dashboard
```python
competitors = ['slack', 'microsoft-teams', 'discord']
all_data = {}

for product in competitors:
    reviews = scrape_g2_reviews(product, max_pages=10)
    all_data[product] = clean_reviews(reviews)
    print(f'{product}: {len(reviews)} reviews, '
          f'avg {all_data[product]["rating"].mean():.1f}')
```
### Sentiment Tracking
Combine G2 review text with NLP libraries like TextBlob or spaCy to track sentiment trends over time.
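As a toy illustration of the idea (a real pipeline would use TextBlob's `sentiment.polarity` or a spaCy model rather than this tiny hand-made lexicon):

```python
from collections import defaultdict

POSITIVE = {'great', 'love', 'easy', 'fast', 'intuitive'}
NEGATIVE = {'slow', 'expensive', 'buggy', 'confusing', 'clunky'}

def toy_sentiment(text):
    """Crude lexicon score in [-1, 1]; a stand-in for a real NLP model."""
    words = text.lower().split()
    hits = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return max(-1.0, min(1.0, hits / max(len(words), 1) * 5))

def monthly_sentiment(reviews):
    """Average review sentiment per YYYY-MM bucket, using the 'likes' text."""
    buckets = defaultdict(list)
    for r in reviews:
        if r.get('date') and r.get('likes'):
            buckets[r['date'][:7]].append(toy_sentiment(r['likes']))
    return {month: sum(s) / len(s) for month, s in sorted(buckets.items())}

trend = monthly_sentiment([
    {'date': '2024-01-05', 'likes': 'great and easy'},
    {'date': '2024-02-10', 'likes': 'slow and buggy'},
])
print(trend)  # positive score for 2024-01, negative for 2024-02
```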
## Conclusion
G2 review data is essential for B2B competitive intelligence. While G2's API is limited to enterprise partners, Python scraping with proper proxy support (ScraperAPI recommended) gets the job done reliably. For production-grade data collection, the G2 Reviews Scraper on Apify provides a managed alternative.
Remember to scrape responsibly — add delays, respect rate limits, and only collect data you have a legitimate use for.