DEV Community

agenthustler

TrustRadius Scraping: Extract B2B Software Reviews and Verified User Data

Web scraping B2B software review platforms like TrustRadius opens the door to a goldmine of verified user opinions, product comparisons, and scoring data. Whether you're doing competitive intelligence, market research, or building a SaaS comparison tool, extracting structured data from TrustRadius gives you an edge that manual research can't match.

In this comprehensive guide, we'll walk through TrustRadius's data structure, what makes it unique among review platforms, and how to build scrapers that extract verified reviews, product scores, comparison tables, and category data — all using Python, Node.js, and Apify's cloud scraping infrastructure.

Why TrustRadius Data Matters for B2B Research

TrustRadius is not your average review site. It distinguishes itself from platforms like G2 and Capterra by enforcing strict verification on every reviewer: each review goes through a multi-step authentication process that confirms the reviewer actually uses the product. This means the data you extract carries significantly more weight in analysis.

Here's what makes TrustRadius data valuable:

  • Verified reviews only: Every reviewer is authenticated via LinkedIn, corporate email, or other verification methods
  • No paid placements: TrustRadius doesn't let vendors buy their way to the top of rankings
  • TrustMaps: Proprietary scoring methodology that combines user satisfaction with research frequency
  • Detailed comparison tables: Feature-by-feature breakdowns between competing products
  • Buyer intent signals: Data about which products are being actively researched

For B2B marketers, product managers, and competitive intelligence teams, this data is invaluable — but manually collecting it is painfully slow.

Understanding TrustRadius Page Structure

Before writing any scraping code, you need to understand how TrustRadius organizes its content. The site follows a hierarchical structure that maps well to scraping workflows.

Product Pages

Each software product has a dedicated page at a URL like trustradius.com/products/[product-slug]/reviews. These pages contain:

  • Product overview: Name, vendor, description, deployment options, pricing tier
  • TrustRadius score (trScore): An algorithmic rating from 0-10 based on verified reviews
  • Review count: Total number of verified reviews
  • Review breakdown: Distribution of ratings across categories
  • Individual reviews: Full-text reviews with pros, cons, and detailed ratings

Category Pages

TrustRadius organizes products into categories such as CRM, Marketing Automation, and Project Management. Category pages at trustradius.com/[category-slug] show:

  • Product listings: All products in that category with summary scores
  • TrustMaps: Visual quadrant charts mapping satisfaction vs. research frequency
  • Comparison links: Quick links to head-to-head product comparisons
  • Filter options: Company size, industry, use case filters

Comparison Pages

One of TrustRadius's most valuable features is the comparison page at trustradius.com/compare-products/[product-a]-vs-[product-b]. These contain:

  • Side-by-side ratings: Category-by-category score comparison
  • Feature comparison tables: Detailed feature availability matrices
  • Review excerpts: Relevant review snippets for each product
  • User demographics: Breakdown of reviewer company sizes and industries
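These three URL patterns can be captured in a few hypothetical helper functions before any scraping begins (the slugs shown are examples, and real slugs come from category or product listings):

```python
# Hypothetical URL builders for the three page types described above.
BASE = 'https://www.trustradius.com'

def product_url(slug, page=1):
    """Product review page, paginated via a query parameter."""
    return f'{BASE}/products/{slug}/reviews?page={page}'

def category_url(slug):
    """Category listing page."""
    return f'{BASE}/{slug}'

def comparison_url(slug_a, slug_b):
    """Head-to-head comparison page."""
    return f'{BASE}/compare-products/{slug_a}-vs-{slug_b}'

print(product_url('salesforce-crm', page=2))
print(comparison_url('salesforce-crm', 'hubspot-crm'))
```

Centralizing URL construction like this makes it easy to swap in new patterns if TrustRadius changes its routing.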

Setting Up Your Scraping Environment

Let's start with the practical setup. We'll build scrapers in both Python and Node.js.

Python Setup

# requirements.txt
requests==2.31.0
beautifulsoup4==4.12.3
lxml==5.1.0
pandas==2.2.0

# Install dependencies
# pip install -r requirements.txt

Node.js Setup

// package.json dependencies
// npm install cheerio axios puppeteer
const cheerio = require('cheerio');
const axios = require('axios');

Extracting Product Reviews with Python

Here's a complete Python scraper for extracting TrustRadius product reviews. Note that TrustRadius ships new front-end builds regularly, so the CSS selectors below are best-effort patterns; verify them against the live markup before relying on them:

import requests
from bs4 import BeautifulSoup
import json
import time
import csv

class TrustRadiusScraper:
    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                          'AppleWebKit/537.36 (KHTML, like Gecko) '
                          'Chrome/120.0.0.0 Safari/537.36',
            'Accept': 'text/html,application/xhtml+xml,'
                      'application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.5',
        })
        self.base_url = 'https://www.trustradius.com'

    def get_product_reviews(self, product_slug, max_pages=5):
        """Extract all reviews for a given product."""
        reviews = []

        for page in range(1, max_pages + 1):
            url = (f'{self.base_url}/products/'
                   f'{product_slug}/reviews?page={page}')
            print(f'Scraping page {page}: {url}')

            response = self.session.get(url)
            if response.status_code != 200:
                print(f'Failed to fetch page {page}')
                break

            soup = BeautifulSoup(response.text, 'lxml')
            review_cards = soup.select('[data-testid="review-card"]')

            if not review_cards:
                # Try alternative selectors
                review_cards = soup.select('.review-card, '
                                          '.review-item, '
                                          '[class*="ReviewCard"]')

            if not review_cards:
                print(f'No reviews found on page {page}')
                break

            for card in review_cards:
                review = self.parse_review_card(card)
                if review:
                    reviews.append(review)

            # Respect rate limits
            time.sleep(2)

        return reviews

    def parse_review_card(self, card):
        """Parse individual review card HTML into structured data."""
        review = {}

        # Reviewer info
        reviewer_el = card.select_one(
            '[class*="reviewer"], .reviewer-name'
        )
        review['reviewer_name'] = (
            reviewer_el.get_text(strip=True) if reviewer_el
            else 'Anonymous'
        )

        # Company and role
        role_el = card.select_one(
            '[class*="role"], .reviewer-title'
        )
        review['reviewer_role'] = (
            role_el.get_text(strip=True) if role_el else ''
        )

        company_el = card.select_one(
            '[class*="company"], .reviewer-company'
        )
        review['company'] = (
            company_el.get_text(strip=True) if company_el else ''
        )

        # Rating
        rating_el = card.select_one(
            '[class*="rating"], [class*="score"]'
        )
        if rating_el:
            rating_text = rating_el.get_text(strip=True)
            try:
                review['rating'] = float(
                    rating_text.replace('/10', '').strip()
                )
            except ValueError:
                review['rating'] = None

        # Review text sections
        pros_el = card.select_one(
            '[class*="pros"], [data-section="pros"]'
        )
        review['pros'] = (
            pros_el.get_text(strip=True) if pros_el else ''
        )

        cons_el = card.select_one(
            '[class*="cons"], [data-section="cons"]'
        )
        review['cons'] = (
            cons_el.get_text(strip=True) if cons_el else ''
        )

        # Verification badge
        verified_el = card.select_one(
            '[class*="verified"], [class*="badge"]'
        )
        review['is_verified'] = verified_el is not None

        # Date
        date_el = card.select_one(
            'time, [class*="date"]'
        )
        review['review_date'] = (
            date_el.get_text(strip=True) if date_el else ''
        )

        return review

    def get_product_info(self, product_slug):
        """Extract product overview information."""
        url = f'{self.base_url}/products/{product_slug}/reviews'
        response = self.session.get(url)
        soup = BeautifulSoup(response.text, 'lxml')

        product = {}

        # Product name
        name_el = soup.select_one('h1, [class*="productName"]')
        product['name'] = (
            name_el.get_text(strip=True) if name_el else ''
        )

        # trScore
        score_el = soup.select_one(
            '[class*="trScore"], [class*="trust-score"]'
        )
        if score_el:
            try:
                product['tr_score'] = float(
                    score_el.get_text(strip=True)
                )
            except ValueError:
                product['tr_score'] = None

        # Review count
        count_el = soup.select_one(
            '[class*="reviewCount"], [class*="review-count"]'
        )
        if count_el:
            count_text = count_el.get_text(strip=True)
            product['review_count'] = int(
                ''.join(filter(str.isdigit, count_text))
            ) if any(c.isdigit() for c in count_text) else 0

        # Categories
        category_els = soup.select(
            '[class*="category"] a, .breadcrumb a'
        )
        product['categories'] = [
            el.get_text(strip=True) for el in category_els
        ]

        return product

    def export_to_csv(self, reviews, filename='trustradius_reviews.csv'):
        """Export scraped reviews to CSV."""
        if not reviews:
            print('No reviews to export')
            return

        keys = reviews[0].keys()
        with open(filename, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=keys)
            writer.writeheader()
            writer.writerows(reviews)
        print(f'Exported {len(reviews)} reviews to {filename}')


# Usage
scraper = TrustRadiusScraper()
product_info = scraper.get_product_info('salesforce-crm')
print(json.dumps(product_info, indent=2))

reviews = scraper.get_product_reviews('salesforce-crm', max_pages=3)
scraper.export_to_csv(reviews)

Node.js Approach with Puppeteer

TrustRadius renders some content client-side, so a headless browser approach is sometimes necessary. As with the Python version, treat the selectors as best-effort patterns against a changing DOM:

const puppeteer = require('puppeteer');
const fs = require('fs');

class TrustRadiusNodeScraper {
    constructor() {
        this.browser = null;
        this.page = null;
    }

    async init() {
        this.browser = await puppeteer.launch({
            headless: 'new',
            args: ['--no-sandbox', '--disable-setuid-sandbox']
        });
        this.page = await this.browser.newPage();
        await this.page.setUserAgent(
            'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ' +
            'AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36'
        );
    }

    async scrapeProductReviews(productSlug, maxPages = 5) {
        const reviews = [];
        const baseUrl = 'https://www.trustradius.com';

        for (let page = 1; page <= maxPages; page++) {
            const url = `${baseUrl}/products/${productSlug}`
                      + `/reviews?page=${page}`;
            console.log(`Scraping: ${url}`);

            await this.page.goto(url, {
                waitUntil: 'networkidle2',
                timeout: 30000
            });

            // Wait for review content to load
            await this.page.waitForSelector(
                '[class*="review"], [data-testid*="review"]',
                { timeout: 10000 }
            ).catch(() => {
                console.log('No review elements found');
            });

            const pageReviews = await this.page.evaluate(() => {
                const cards = document.querySelectorAll(
                    '[class*="ReviewCard"], ' +
                    '[data-testid="review-card"], ' +
                    '.review-card'
                );
                const results = [];

                cards.forEach(card => {
                    const getText = (sel) => {
                        const el = card.querySelector(sel);
                        return el
                            ? el.textContent.trim()
                            : '';
                    };

                    results.push({
                        reviewer: getText(
                            '[class*="reviewer"], .reviewer-name'
                        ),
                        role: getText(
                            '[class*="role"], .reviewer-title'
                        ),
                        company: getText(
                            '[class*="company"]'
                        ),
                        rating: getText(
                            '[class*="rating"], [class*="score"]'
                        ),
                        pros: getText(
                            '[class*="pros"]'
                        ),
                        cons: getText(
                            '[class*="cons"]'
                        ),
                        date: getText(
                            'time, [class*="date"]'
                        ),
                        verified: !!card.querySelector(
                            '[class*="verified"]'
                        )
                    });
                });

                return results;
            });

            reviews.push(...pageReviews);

            // Rate limiting delay
            await new Promise(r => setTimeout(r, 2000));
        }

        return reviews;
    }

    async scrapeComparison(productA, productB) {
        const url = 'https://www.trustradius.com/compare-products/'
                  + `${productA}-vs-${productB}`;
        console.log(`Scraping comparison: ${url}`);

        await this.page.goto(url, {
            waitUntil: 'networkidle2'
        });

        const comparison = await this.page.evaluate(() => {
            const data = { features: [], ratings: [] };

            // Extract feature comparison rows
            const featureRows = document.querySelectorAll(
                '[class*="featureRow"], ' +
                '[class*="comparison-row"], ' +
                'tr[class*="feature"]'
            );

            featureRows.forEach(row => {
                const cells = row.querySelectorAll('td, [class*="cell"]');
                if (cells.length >= 3) {
                    data.features.push({
                        feature: cells[0].textContent.trim(),
                        productA: cells[1].textContent.trim(),
                        productB: cells[2].textContent.trim()
                    });
                }
            });

            // Extract rating comparisons
            const ratingRows = document.querySelectorAll(
                '[class*="ratingCompare"], ' +
                '[class*="score-row"]'
            );

            ratingRows.forEach(row => {
                const label = row.querySelector(
                    '[class*="label"], [class*="category"]'
                );
                const scores = row.querySelectorAll(
                    '[class*="score"], [class*="rating"]'
                );

                if (label && scores.length >= 2) {
                    data.ratings.push({
                        category: label.textContent.trim(),
                        scoreA: scores[0].textContent.trim(),
                        scoreB: scores[1].textContent.trim()
                    });
                }
            });

            return data;
        });

        return comparison;
    }

    async close() {
        if (this.browser) {
            await this.browser.close();
        }
    }
}

// Usage
(async () => {
    const scraper = new TrustRadiusNodeScraper();
    await scraper.init();

    try {
        const reviews = await scraper.scrapeProductReviews(
            'hubspot-crm', 3
        );
        console.log(`Found ${reviews.length} reviews`);

        const comparison = await scraper.scrapeComparison(
            'salesforce-crm', 'hubspot-crm'
        );
        console.log('Comparison data:', comparison);

        fs.writeFileSync(
            'trustradius_data.json',
            JSON.stringify({ reviews, comparison }, null, 2)
        );
    } finally {
        await scraper.close();
    }
})();

Scaling with Apify Cloud Infrastructure

While local scrapers work for small jobs, production-scale TrustRadius scraping needs cloud infrastructure. Apify provides managed actors that handle proxy rotation, browser management, and anti-bot protection.

Why Use Apify for TrustRadius Scraping

  1. Managed proxy pools: TrustRadius has rate limiting. Apify's residential proxies rotate IPs automatically.
  2. Browser fingerprinting: Apify's browser pool randomizes fingerprints to avoid detection.
  3. Scheduling: Set up recurring scrapes to track review changes over time.
  4. Storage: Results go directly to Apify datasets — no database management needed.
  5. Scalability: Run hundreds of concurrent browser instances without managing servers.

Building an Apify Actor for TrustRadius

const { Actor } = require('apify');
const { PuppeteerCrawler } = require('crawlee');

Actor.main(async () => {
    const input = await Actor.getInput();
    const {
        productSlugs = ['salesforce-crm'],
        maxPagesPerProduct = 5,
        includeComparisons = false
    } = input;

    const crawler = new PuppeteerCrawler({
        maxConcurrency: 3,
        navigationTimeoutSecs: 60,
        requestHandlerTimeoutSecs: 120,

        launchContext: {
            launchOptions: {
                headless: true,
                args: ['--no-sandbox']
            }
        },

        async requestHandler({ request, page, log }) {
            const { label, productSlug } = request.userData;

            if (label === 'PRODUCT_REVIEWS') {
                log.info(`Processing reviews: ${productSlug}`);

                await page.waitForSelector(
                    '[class*="review"], [class*="Review"]',
                    { timeout: 15000 }
                ).catch(() => log.warning('No review selector'));

                // Auto-scroll to load lazy content
                await autoScroll(page);

                const reviews = await page.evaluate(() => {
                    const cards = document.querySelectorAll(
                        '[class*="ReviewCard"], .review-card'
                    );
                    return Array.from(cards).map(card => {
                        const getText = (s) => {
                            const el = card.querySelector(s);
                            return el
                                ? el.textContent.trim()
                                : '';
                        };
                        return {
                            reviewer: getText('[class*="reviewer"]'),
                            role: getText('[class*="role"]'),
                            company: getText('[class*="company"]'),
                            rating: getText('[class*="score"]'),
                            pros: getText('[class*="pros"]'),
                            cons: getText('[class*="cons"]'),
                            verified: !!card.querySelector(
                                '[class*="verified"]'
                            ),
                            date: getText('time')
                        };
                    });
                });

                for (const review of reviews) {
                    await Actor.pushData({
                        ...review,
                        product: productSlug,
                        source: 'trustradius',
                        scrapedAt: new Date().toISOString()
                    });
                }
            }
        }
    });

    // Build request list
    const requests = [];
    for (const slug of productSlugs) {
        for (let p = 1; p <= maxPagesPerProduct; p++) {
            requests.push({
                url: 'https://www.trustradius.com/products/'
                   + `${slug}/reviews?page=${p}`,
                userData: {
                    label: 'PRODUCT_REVIEWS',
                    productSlug: slug
                }
            });
        }
    }

    await crawler.run(requests);
});

async function autoScroll(page) {
    await page.evaluate(async () => {
        await new Promise((resolve) => {
            let totalHeight = 0;
            const distance = 300;
            const timer = setInterval(() => {
                window.scrollBy(0, distance);
                totalHeight += distance;
                if (totalHeight >= document.body.scrollHeight) {
                    clearInterval(timer);
                    resolve();
                }
            }, 200);
        });
    });
}

Understanding TrustRadius Scoring Methodology

When scraping TrustRadius, understanding the trScore methodology helps you build better data pipelines. The trScore is calculated from:

  1. Recency weighting: Recent reviews carry more weight than older ones
  2. Review depth: Longer, more detailed reviews get higher influence
  3. Reviewer verification level: Higher verification = more weight
  4. Category-specific ratings: Individual scores for features like ease of use, support quality, and value for money
  5. Comparison frequency: How often the product is compared against competitors

When you extract these scores, store the component parts separately so you can do your own weighted analysis later.
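The exact trScore weights are proprietary, but a simplified recombination along the same lines shows why storing the components pays off. This hypothetical weighted_score assumes per-review rating, date, pros/cons text, and is_verified fields, and the decay constants are invented for illustration:

```python
from datetime import datetime, timezone

def weighted_score(reviews, now=None):
    """Illustrative recency/depth/verification weighting — NOT the
    proprietary trScore formula, just a sketch of the same idea."""
    now = now or datetime.now(timezone.utc)
    total = weight_sum = 0.0
    for r in reviews:
        age_days = (now - r['date']).days
        # Linear decay over ~2 years, floored so old reviews still count
        recency = max(0.25, 1 - age_days / 730)
        # Longer pros/cons text counts as a deeper review (cap at 500 chars)
        depth = min(1.0, len(r.get('pros', '') + r.get('cons', '')) / 500)
        verified = 1.0 if r.get('is_verified') else 0.6
        w = recency * (0.5 + 0.5 * depth) * verified
        total += r['rating'] * w
        weight_sum += w
    return round(total / weight_sum, 1) if weight_sum else None
```

Because the components are stored separately, you can re-run the aggregation with your own weights whenever your analysis needs change.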

Scraping TrustRadius Category Pages

Category pages are a goldmine for market research. The following method belongs inside the TrustRadiusScraper class defined earlier (note the self parameter); here's how to extract category-level data:

def scrape_category(self, category_slug):
    """Scrape all products in a TrustRadius category."""
    url = f'{self.base_url}/{category_slug}'
    response = self.session.get(url)
    soup = BeautifulSoup(response.text, 'lxml')

    products = []
    product_cards = soup.select(
        '[class*="productCard"], '
        '[class*="ProductListing"], '
        '.product-item'
    )

    for card in product_cards:
        product = {}

        name_el = card.select_one(
            'h3, h2, [class*="productName"]'
        )
        product['name'] = (
            name_el.get_text(strip=True) if name_el else ''
        )

        score_el = card.select_one('[class*="score"]')
        product['tr_score'] = (
            score_el.get_text(strip=True) if score_el else ''
        )

        reviews_el = card.select_one('[class*="reviewCount"]')
        product['review_count'] = (
            reviews_el.get_text(strip=True) if reviews_el else ''
        )

        link_el = card.select_one('a[href*="/products/"]')
        if link_el:
            product['url'] = self.base_url + link_el['href']
            product['slug'] = (
                link_el['href'].split('/products/')[1]
                               .split('/')[0]
            )

        desc_el = card.select_one(
            '[class*="description"], p'
        )
        product['description'] = (
            desc_el.get_text(strip=True) if desc_el else ''
        )

        products.append(product)

    return products

Data Quality and Verification

TrustRadius data quality is generally high because of their verification process, but you should still validate scraped data:

  1. Check for empty fields: Some reviews may have incomplete profiles
  2. Verify rating ranges: trScores should be 0-10, individual ratings 1-10
  3. Deduplicate: Users can update reviews; check for duplicate reviewer names
  4. Validate dates: Ensure review dates parse correctly for time-series analysis
  5. Cross-reference: Compare review counts on product pages vs. actual scraped reviews
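The first four checks can be wired into a small validation pass. The field names below match the Python scraper earlier in this post, and the date format is an assumption to adjust for what you actually scrape:

```python
from datetime import datetime

def validate_reviews(reviews):
    """Apply basic quality checks; returns (clean_reviews, issues)."""
    clean, issues, seen = [], [], set()
    for r in reviews:
        # 1. Empty fields: skip reviews with no usable text
        if not r.get('pros') and not r.get('cons'):
            issues.append(('empty_text', r)); continue
        # 2. Rating range: individual ratings should fall in 0-10
        rating = r.get('rating')
        if rating is not None and not (0 <= rating <= 10):
            issues.append(('rating_out_of_range', r)); continue
        # 3. Dedupe on reviewer + date (keeps the first occurrence)
        key = (r.get('reviewer_name'), r.get('review_date'))
        if key in seen:
            issues.append(('duplicate', r)); continue
        seen.add(key)
        # 4. Dates must parse for time-series work (format is an assumption)
        try:
            if r.get('review_date'):
                datetime.strptime(r['review_date'], '%B %d, %Y')
        except ValueError:
            issues.append(('bad_date', r)); continue
        clean.append(r)
    return clean, issues
```

Logging the issues list separately lets you audit what was discarded instead of silently dropping rows.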

Handling Anti-Scraping Measures

TrustRadius employs several anti-scraping measures you need to handle:

  • Rate limiting: Keep requests under 1 per second per IP
  • JavaScript rendering: Many elements load via client-side JS — use headless browsers
  • Session cookies: Some pages require valid session state
  • CAPTCHA challenges: Rotate IPs and use residential proxies to minimize triggers

Apify handles most of these automatically through its proxy management and browser fingerprinting features.
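For self-managed scrapers, the rate-limiting point reduces to throttling plus exponential backoff on 429/503 responses. A minimal sketch, with the roughly one-request-per-second budget taken from the guideline above:

```python
import random
import time

def fetch_with_backoff(session_get, url, max_retries=4, base_delay=1.0):
    """Throttled GET with exponential backoff on rate-limit responses.
    `session_get` is any callable returning an object with .status_code,
    e.g. requests.Session().get."""
    resp = None
    for attempt in range(max_retries):
        resp = session_get(url)
        if resp.status_code not in (429, 503):
            # Normal case: pause to stay under ~1 request/second
            time.sleep(base_delay + random.uniform(0, 0.5))
            return resp
        # Rate limited: back off exponentially with jitter
        wait = base_delay * (2 ** attempt) + random.uniform(0, 1)
        print(f'Rate limited ({resp.status_code}); retrying in {wait:.1f}s')
        time.sleep(wait)
    return resp
```

Drop-in usage with the earlier Python scraper would be `fetch_with_backoff(self.session.get, url)` in place of the bare `self.session.get(url)` calls.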

Practical Use Cases for TrustRadius Data

Here are real-world applications for the data you extract:

Competitive Intelligence Dashboard

Scrape reviews for your product and all competitors in your category. Track sentiment trends, feature gap mentions, and satisfaction scores over time.

Sales Enablement

Extract positive reviews and comparison wins to arm your sales team with verified proof points.

Product Roadmap Input

Mine review cons and feature requests to identify the most common pain points users report.

Market Sizing

Use category page data to understand how many products compete in each segment and their relative review volumes.

Content Marketing

Identify trending topics and common questions in reviews to fuel blog posts, case studies, and comparison guides.

Conclusion

TrustRadius offers some of the highest-quality B2B software review data available online. Its verification process means the data you scrape is inherently more trustworthy than data from most other review platforms. By combining Python for quick extractions, Node.js with Puppeteer for JavaScript-heavy pages, and Apify for production-scale operations, you can build a comprehensive B2B intelligence pipeline.

The key is respecting rate limits, handling dynamic content properly, and validating your extracted data. Start with a single product, validate your selectors, then scale to entire categories using Apify's cloud infrastructure.

Whether you're building a competitive intelligence tool, enriching your CRM with review data, or conducting market research, TrustRadius scraping gives you access to verified, unbiased B2B software opinions that drive better business decisions.
