DEV Community

agenthustler

Clutch.co Scraping: Extract Agency Reviews and Service Provider Data

Clutch.co is the go-to platform for finding and evaluating B2B service providers — from web development agencies to marketing firms, IT consultancies, and design studios. With over 300,000 client reviews and 280,000+ company listings, Clutch holds an enormous dataset of verified agency profiles, detailed project portfolios, and authenticated client feedback.

Scraping Clutch.co gives you structured access to agency ratings, service capabilities, pricing tiers, client reviews, and industry specializations. In this guide, we'll break down Clutch's data architecture, build scrapers in Python and Node.js, and show you how to scale the operation using Apify's cloud platform.

Why Clutch.co Data Is Valuable

Clutch occupies a unique position in the B2B services ecosystem. Here's why its data matters:

  • Verified client reviews: Clutch analysts verify reviews, typically through phone interviews with the client
  • Project portfolios: Real project details with budgets, timelines, and outcomes
  • Detailed company profiles: Team size, hourly rates, location, founding year, service lines
  • Industry-specific rankings: Clutch ranks agencies by category, location, and industry focus
  • Client demographics: Company sizes and industries of the agencies' actual clients
  • Service line breakdowns: Detailed capability matrices showing exactly what each agency offers

For procurement teams, agency comparison platforms, market researchers, and even agencies themselves doing competitive analysis, this data is extremely actionable.

Understanding Clutch.co's Data Structure

Clutch organizes its content across several distinct page types. Understanding this hierarchy is essential for efficient scraping.

Company Profile Pages

Each agency has a detailed profile at clutch.co/profile/[company-slug]. These pages contain:

  • Company overview: Name, tagline, founding year, team size, location(s)
  • Service focus: Percentage breakdown of services offered (e.g., 40% Web Development, 30% UX/UI Design, 30% Mobile App Development)
  • Client focus: Distribution by company size (small business, midmarket, enterprise)
  • Industry focus: Which industries the agency primarily serves
  • Pricing: Minimum project size and hourly rate ranges
  • Portfolio projects: Case studies with project details, budgets, and outcomes
  • Client reviews: Verified reviews with individual category ratings
  • Overall Clutch rating: Aggregate score on a 5-point scale
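If you plan to persist these profiles, it helps to pin down a schema up front. Here is a minimal sketch mirroring the bullet list above; the field names are mine, not an official Clutch schema:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class AgencyProfile:
    """Flat record for one profile page. Field names are illustrative."""
    name: str = ''
    tagline: str = ''
    founded: str = ''
    location: str = ''
    team_size: str = ''
    hourly_rate: str = ''
    min_project_size: str = ''
    rating: Optional[float] = None      # 5-point overall Clutch rating
    review_count: int = 0
    service_focus: List[dict] = field(default_factory=list)
    client_focus: List[dict] = field(default_factory=list)
    industry_focus: List[dict] = field(default_factory=list)

profile = AgencyProfile(name='Example Agency', rating=4.8, review_count=42)
print(profile.name, profile.rating)  # → Example Agency 4.8
```

Keeping rates and project sizes as strings (e.g. "$100 - $149 / hr") avoids lossy parsing at scrape time; you can normalize them later.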

Category/Directory Pages

Clutch's directory pages at clutch.co/[service-category] and clutch.co/[service-category]/[location] list agencies filtered by service and geography:

  • Ranked agency listings: Ordered by Clutch's proprietary ranking algorithm
  • Quick-view summaries: Rating, review count, min project size, hourly rate, team size
  • Filter options: Location, budget, company size, industry specialization
  • Sponsored listings: Clearly marked promoted placements
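These URL patterns can be assembled with a small helper. A sketch matching the patterns above (the `page` query parameter mirrors the scrapers later in this guide):

```python
def build_directory_url(category, location=None, page=0):
    """Build a Clutch directory URL for a category, optional location slug, and page."""
    base = 'https://clutch.co'
    path = f'/{category}/{location}' if location else f'/{category}'
    return f'{base}{path}?page={page}' if page else f'{base}{path}'

print(build_directory_url('web-developers'))
# → https://clutch.co/web-developers
print(build_directory_url('web-developers', 'united-states', 2))
# → https://clutch.co/web-developers/united-states?page=2
```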

Review Detail Pages

Individual reviews accessible from company profiles contain:

  • Reviewer identity: Name, title, company, industry
  • Project details: Service provided, project length, budget range
  • Category ratings: Quality (1-5), Schedule (1-5), Cost (1-5), Willing to Refer (1-5)
  • Review narrative: Detailed feedback including background, challenge, solution, and results
  • Verification status: Clutch-verified badge
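Normalized into a record, a single review might look like this (all values are invented for illustration):

```python
review = {
    'reviewer_name': 'Jane Doe',           # illustrative values throughout
    'reviewer_title': 'VP of Marketing',
    'reviewer_company': 'Acme Corp',
    'industry': 'Retail',
    'service_provided': 'Web Development',
    'ratings': {
        'Quality': 4.5,
        'Schedule': 5.0,
        'Cost': 4.0,
        'Willing to Refer': 5.0,
    },
    'review_text': 'They delivered on time and on budget.',
    'is_verified': True,
}
print(review['ratings']['Quality'])  # → 4.5
```

Storing the four category ratings as a nested dict keeps them queryable without widening the top-level schema.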

Leader Matrix Pages

Clutch publishes annual "Leaders Matrix" reports that rank top agencies in specific categories. These pages contain:

  • Matrix visualization: Ability to deliver vs. focus area quadrant
  • Ranked lists: Top agencies with summary metrics
  • Methodology notes: How rankings were calculated

Building a Python Clutch.co Scraper

Let's build a comprehensive scraper that extracts agency profiles, reviews, and directory listings.

import requests
from bs4 import BeautifulSoup
import json
import time
import csv
from urllib.parse import urljoin

class ClutchScraper:
    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                          'AppleWebKit/537.36 (KHTML, like Gecko) '
                          'Chrome/120.0.0.0 Safari/537.36',
            'Accept': 'text/html,application/xhtml+xml,'
                      'application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.5',
            'Accept-Encoding': 'gzip, deflate, br',
        })
        self.base_url = 'https://clutch.co'

    def scrape_company_profile(self, company_slug):
        """Extract full company profile data."""
        url = f'{self.base_url}/profile/{company_slug}'
        response = self.session.get(url)

        if response.status_code != 200:
            print(f'Failed to fetch {url}: {response.status_code}')
            return None

        soup = BeautifulSoup(response.text, 'lxml')
        profile = {}

        # Company name
        name_el = soup.select_one(
            'h1, [class*="company-name"], '
            '[class*="CompanyName"]'
        )
        profile['name'] = (
            name_el.get_text(strip=True) if name_el else ''
        )

        # Tagline
        tagline_el = soup.select_one(
            '[class*="tagline"], [class*="motto"]'
        )
        profile['tagline'] = (
            tagline_el.get_text(strip=True) if tagline_el else ''
        )

        # Overall rating
        rating_el = soup.select_one(
            '[class*="overall-rating"], '
            '[class*="rating-score"], '
            '[class*="ReviewScore"]'
        )
        if rating_el:
            try:
                profile['rating'] = float(
                    rating_el.get_text(strip=True)
                )
            except ValueError:
                profile['rating'] = None

        # Review count
        review_count_el = soup.select_one(
            '[class*="review-count"], '
            '[class*="reviews-number"]'
        )
        if review_count_el:
            text = review_count_el.get_text(strip=True)
            digits = ''.join(filter(str.isdigit, text))
            profile['review_count'] = (
                int(digits) if digits else 0
            )

        # Company details
        profile['details'] = self._extract_company_details(soup)

        # Service focus
        profile['service_focus'] = self._extract_focus_chart(
            soup, 'service'
        )

        # Client focus
        profile['client_focus'] = self._extract_focus_chart(
            soup, 'client'
        )

        # Industry focus
        profile['industry_focus'] = self._extract_focus_chart(
            soup, 'industry'
        )

        # Portfolio items
        profile['portfolio'] = self._extract_portfolio(soup)

        profile['url'] = url
        profile['slug'] = company_slug

        return profile

    def _extract_company_details(self, soup):
        """Extract structured company details sidebar."""
        details = {}

        # Location
        location_el = soup.select_one(
            '[class*="location"], [class*="locality"]'
        )
        details['location'] = (
            location_el.get_text(strip=True) if location_el else ''
        )

        # Team size
        size_el = soup.select_one(
            '[class*="team-size"], [class*="employees"]'
        )
        details['team_size'] = (
            size_el.get_text(strip=True) if size_el else ''
        )

        # Hourly rate
        rate_el = soup.select_one(
            '[class*="hourly-rate"], [class*="pricing"]'
        )
        details['hourly_rate'] = (
            rate_el.get_text(strip=True) if rate_el else ''
        )

        # Min project size
        project_el = soup.select_one(
            '[class*="project-size"], [class*="min-project"]'
        )
        details['min_project_size'] = (
            project_el.get_text(strip=True) if project_el else ''
        )

        # Founded year
        founded_el = soup.select_one(
            '[class*="founded"], [class*="year"]'
        )
        details['founded'] = (
            founded_el.get_text(strip=True) if founded_el else ''
        )

        return details

    def _extract_focus_chart(self, soup, focus_type):
        """Extract focus percentage breakdowns."""
        focus_data = []
        chart_section = soup.select_one(
            f'[class*="{focus_type}-focus"], '
            f'[class*="{focus_type}Focus"], '
            f'[data-section="{focus_type}"]'
        )
        if not chart_section:
            return focus_data

        items = chart_section.select(
            '[class*="chart-item"], '
            '[class*="focus-item"], li'
        )
        for item in items:
            label_el = item.select_one(
                '[class*="label"], span'
            )
            pct_el = item.select_one(
                '[class*="percent"], [class*="value"]'
            )
            if label_el and pct_el:
                focus_data.append({
                    'category': label_el.get_text(strip=True),
                    'percentage': pct_el.get_text(strip=True)
                })

        return focus_data

    def _extract_portfolio(self, soup):
        """Extract portfolio/case study items."""
        portfolio = []
        items = soup.select(
            '[class*="portfolio-item"], '
            '[class*="ProjectCard"], '
            '[class*="case-study"]'
        )

        for item in items:
            project = {}
            title_el = item.select_one('h3, h4, [class*="title"]')
            project['title'] = (
                title_el.get_text(strip=True) if title_el else ''
            )

            desc_el = item.select_one(
                '[class*="description"], p'
            )
            project['description'] = (
                desc_el.get_text(strip=True) if desc_el else ''
            )

            budget_el = item.select_one(
                '[class*="budget"], [class*="cost"]'
            )
            project['budget'] = (
                budget_el.get_text(strip=True) if budget_el else ''
            )

            portfolio.append(project)

        return portfolio

    def scrape_reviews(self, company_slug, max_pages=5):
        """Scrape all reviews for a company."""
        all_reviews = []

        for page in range(1, max_pages + 1):
            print(f'Scraping reviews page {page}')

            response = self.session.get(
                f'{self.base_url}/profile/{company_slug}',
                params={'page': page}
            )
            if response.status_code != 200:
                break

            soup = BeautifulSoup(response.text, 'lxml')
            reviews = soup.select(
                '[class*="review-item"], '
                '[class*="ReviewCard"], '
                '.client-review'
            )

            if not reviews:
                break

            for review_el in reviews:
                review = self._parse_review(review_el)
                if review:
                    review['company_slug'] = company_slug
                    all_reviews.append(review)

            time.sleep(2)

        return all_reviews

    def _parse_review(self, review_el):
        """Parse a single review element."""
        review = {}

        # Reviewer info
        name_el = review_el.select_one(
            '[class*="reviewer-name"], '
            '[class*="client-name"]'
        )
        review['reviewer_name'] = (
            name_el.get_text(strip=True) if name_el else ''
        )

        title_el = review_el.select_one(
            '[class*="reviewer-title"], '
            '[class*="client-title"]'
        )
        review['reviewer_title'] = (
            title_el.get_text(strip=True) if title_el else ''
        )

        company_el = review_el.select_one(
            '[class*="reviewer-company"], '
            '[class*="client-company"]'
        )
        review['reviewer_company'] = (
            company_el.get_text(strip=True) if company_el else ''
        )

        # Industry
        industry_el = review_el.select_one(
            '[class*="industry"]'
        )
        review['industry'] = (
            industry_el.get_text(strip=True) if industry_el else ''
        )

        # Project details
        service_el = review_el.select_one(
            '[class*="service"], [class*="project-type"]'
        )
        review['service_provided'] = (
            service_el.get_text(strip=True) if service_el else ''
        )

        # Individual ratings
        review['ratings'] = {}
        rating_items = review_el.select(
            '[class*="rating-item"], '
            '[class*="score-item"]'
        )
        for item in rating_items:
            label = item.select_one('[class*="label"]')
            score = item.select_one('[class*="score"], [class*="value"]')
            if label and score:
                review['ratings'][
                    label.get_text(strip=True)
                ] = score.get_text(strip=True)

        # Overall rating
        overall_el = review_el.select_one(
            '[class*="overall"], [class*="total-score"]'
        )
        if overall_el:
            try:
                review['overall_rating'] = float(
                    overall_el.get_text(strip=True)
                )
            except ValueError:
                review['overall_rating'] = None

        # Review text
        text_el = review_el.select_one(
            '[class*="review-text"], '
            '[class*="review-body"], '
            '[class*="feedback"]'
        )
        review['review_text'] = (
            text_el.get_text(strip=True) if text_el else ''
        )

        # Verification
        verified_el = review_el.select_one(
            '[class*="verified"], [class*="badge"]'
        )
        review['is_verified'] = verified_el is not None

        return review

    def scrape_directory(self, category, location=None,
                         max_pages=3):
        """Scrape agency listings from directory pages."""
        agencies = []

        for page in range(0, max_pages):
            if location:
                url = (f'{self.base_url}/{category}/{location}'
                       f'?page={page}')
            else:
                url = f'{self.base_url}/{category}?page={page}'

            print(f'Scraping directory: {url}')
            response = self.session.get(url)

            if response.status_code != 200:
                break

            soup = BeautifulSoup(response.text, 'lxml')
            listings = soup.select(
                '[class*="provider-row"], '
                '[class*="CompanyCard"], '
                '.directory-list li'
            )

            if not listings:
                break

            for listing in listings:
                agency = self._parse_directory_listing(listing)
                if agency:
                    agencies.append(agency)

            time.sleep(2)

        return agencies

    def _parse_directory_listing(self, listing):
        """Parse a directory listing card."""
        agency = {}

        name_el = listing.select_one(
            'h3 a, [class*="company-name"] a'
        )
        if name_el:
            agency['name'] = name_el.get_text(strip=True)
            href = name_el.get('href', '')
            agency['profile_url'] = urljoin(
                self.base_url, href
            )
            agency['slug'] = href.rstrip('/').split('/')[-1]

        rating_el = listing.select_one(
            '[class*="rating"]'
        )
        agency['rating'] = (
            rating_el.get_text(strip=True) if rating_el else ''
        )

        reviews_el = listing.select_one(
            '[class*="reviews"]'
        )
        agency['review_count'] = (
            reviews_el.get_text(strip=True) if reviews_el else ''
        )

        location_el = listing.select_one(
            '[class*="location"], [class*="locality"]'
        )
        agency['location'] = (
            location_el.get_text(strip=True) if location_el else ''
        )

        rate_el = listing.select_one(
            '[class*="hourly"], [class*="rate"]'
        )
        agency['hourly_rate'] = (
            rate_el.get_text(strip=True) if rate_el else ''
        )

        size_el = listing.select_one(
            '[class*="size"], [class*="employees"]'
        )
        agency['team_size'] = (
            size_el.get_text(strip=True) if size_el else ''
        )

        min_proj_el = listing.select_one(
            '[class*="project-size"], [class*="budget"]'
        )
        agency['min_project_size'] = (
            min_proj_el.get_text(strip=True) if min_proj_el else ''
        )

        tagline_el = listing.select_one(
            '[class*="tagline"], [class*="motto"]'
        )
        agency['tagline'] = (
            tagline_el.get_text(strip=True) if tagline_el else ''
        )

        return agency

    def export_results(self, data, filename):
        """Export to JSON file."""
        with open(filename, 'w', encoding='utf-8') as f:
            json.dump(data, f, indent=2, ensure_ascii=False)
        print(f'Exported to {filename}')


# Usage example
scraper = ClutchScraper()

# Scrape a company profile
profile = scraper.scrape_company_profile('toptal')
print(json.dumps(profile, indent=2))

# Scrape reviews
reviews = scraper.scrape_reviews('toptal', max_pages=3)
print(f'Found {len(reviews)} reviews')

# Scrape directory
agencies = scraper.scrape_directory(
    'web-developers', 'united-states', max_pages=2
)
print(f'Found {len(agencies)} agencies')

# Export
scraper.export_results(
    {'profile': profile, 'reviews': reviews, 'agencies': agencies},
    'clutch_data.json'
)
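One gap in the class above: it imports csv but only ever writes JSON. A flattened CSV export for the directory results could look like this sketch (the column names follow the dictionaries built in _parse_directory_listing):

```python
import csv

def export_agencies_csv(agencies, filename):
    """Write a list of directory-listing dicts to CSV, one agency per row."""
    fieldnames = ['name', 'profile_url', 'slug', 'rating', 'review_count',
                  'location', 'hourly_rate', 'team_size',
                  'min_project_size', 'tagline']
    with open(filename, 'w', newline='', encoding='utf-8') as f:
        # Missing keys become empty cells; unexpected keys are dropped
        writer = csv.DictWriter(f, fieldnames=fieldnames,
                                extrasaction='ignore')
        writer.writeheader()
        for agency in agencies:
            writer.writerow(agency)

export_agencies_csv([{'name': 'Example Agency', 'rating': '4.9'}],
                    'agencies.csv')
```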

Node.js Scraper with Puppeteer for Dynamic Content

Clutch.co loads some content dynamically, especially reviews and portfolio items. Here's a Puppeteer-based approach:

const puppeteer = require('puppeteer');
const fs = require('fs');

class ClutchNodeScraper {
    constructor() {
        this.browser = null;
        this.page = null;
        this.baseUrl = 'https://clutch.co';
    }

    async init() {
        this.browser = await puppeteer.launch({
            headless: 'new',
            args: [
                '--no-sandbox',
                '--disable-setuid-sandbox',
                '--disable-dev-shm-usage'
            ]
        });
        this.page = await this.browser.newPage();
        await this.page.setUserAgent(
            'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ' +
            'AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36'
        );
        await this.page.setViewport({
            width: 1920, height: 1080
        });
    }

    async scrapeCompanyProfile(companySlug) {
        const url = `${this.baseUrl}/profile/${companySlug}`;
        console.log(`Scraping profile: ${url}`);

        await this.page.goto(url, {
            waitUntil: 'networkidle2',
            timeout: 30000
        });

        // Wait for main content
        await this.page.waitForSelector(
            'h1, [class*="company"]',
            { timeout: 10000 }
        ).catch(() => console.log('Timeout waiting for content'));

        const profile = await this.page.evaluate(() => {
            const getText = (sel) => {
                const el = document.querySelector(sel);
                return el ? el.textContent.trim() : '';
            };

            const data = {
                name: getText('h1'),
                tagline: getText(
                    '[class*="tagline"], [class*="motto"]'
                ),
                rating: getText(
                    '[class*="overall-rating"], ' +
                    '[class*="rating-score"]'
                ),
                location: getText(
                    '[class*="location"], [class*="locality"]'
                ),
                teamSize: getText(
                    '[class*="team-size"], [class*="employees"]'
                ),
                hourlyRate: getText(
                    '[class*="hourly-rate"], [class*="pricing"]'
                ),
                minProjectSize: getText(
                    '[class*="project-size"]'
                ),
                url: window.location.href,
            };

            // Extract service focus percentages
            data.serviceFocus = [];
            const serviceItems = document.querySelectorAll(
                '[class*="service-focus"] li, ' +
                '[class*="serviceFocus"] [class*="item"]'
            );
            serviceItems.forEach(item => {
                const label = item.querySelector(
                    '[class*="label"], span:first-child'
                );
                const pct = item.querySelector(
                    '[class*="percent"], [class*="value"]'
                );
                if (label && pct) {
                    data.serviceFocus.push({
                        service: label.textContent.trim(),
                        percentage: pct.textContent.trim()
                    });
                }
            });

            // Extract reviews summary
            data.reviews = [];
            const reviewCards = document.querySelectorAll(
                '[class*="review-item"], ' +
                '[class*="ReviewCard"], ' +
                '.client-review'
            );
            reviewCards.forEach(card => {
                const cardText = (sel) => {
                    const el = card.querySelector(sel);
                    return el ? el.textContent.trim() : '';
                };

                data.reviews.push({
                    reviewer: cardText(
                        '[class*="name"], [class*="client-name"]'
                    ),
                    title: cardText(
                        '[class*="title"], [class*="position"]'
                    ),
                    company: cardText(
                        '[class*="company"]'
                    ),
                    rating: cardText(
                        '[class*="overall"], [class*="score"]'
                    ),
                    service: cardText(
                        '[class*="service"], [class*="project"]'
                    ),
                    text: cardText(
                        '[class*="body"], [class*="feedback"], p'
                    ),
                    verified: !!card.querySelector(
                        '[class*="verified"]'
                    )
                });
            });

            return data;
        });

        return profile;
    }

    async scrapeDirectory(category, options = {}) {
        const {
            location = null,
            maxPages = 3,
        } = options;

        const agencies = [];

        for (let page = 0; page < maxPages; page++) {
            let url = location
                ? `${this.baseUrl}/${category}/${location}?page=${page}`
                : `${this.baseUrl}/${category}?page=${page}`;

            console.log(`Scraping directory page: ${url}`);

            await this.page.goto(url, {
                waitUntil: 'networkidle2',
                timeout: 30000
            });

            const pageAgencies = await this.page.evaluate(() => {
                const listings = document.querySelectorAll(
                    '[class*="provider-row"], ' +
                    '[class*="CompanyCard"], ' +
                    '[class*="directory"] li[class*="provider"]'
                );
                const results = [];

                listings.forEach(listing => {
                    const getText = (sel) => {
                        const el = listing.querySelector(sel);
                        return el ? el.textContent.trim() : '';
                    };
                    const getHref = (sel) => {
                        const el = listing.querySelector(sel);
                        return el ? el.href || '' : '';
                    };

                    results.push({
                        name: getText('h3 a, [class*="name"] a'),
                        profileUrl: getHref('h3 a, [class*="name"] a'),
                        rating: getText('[class*="rating"]'),
                        reviewCount: getText('[class*="reviews"]'),
                        location: getText('[class*="location"]'),
                        hourlyRate: getText('[class*="rate"]'),
                        teamSize: getText('[class*="size"]'),
                        minProject: getText('[class*="project"]'),
                        tagline: getText('[class*="tagline"]'),
                    });
                });

                return results;
            });

            agencies.push(...pageAgencies);
            await new Promise(r => setTimeout(r, 2000));
        }

        return agencies;
    }

    async close() {
        if (this.browser) await this.browser.close();
    }
}

// Usage
(async () => {
    const scraper = new ClutchNodeScraper();
    await scraper.init();

    try {
        const profile = await scraper.scrapeCompanyProfile('toptal');
        console.log('Profile:', JSON.stringify(profile, null, 2));

        const agencies = await scraper.scrapeDirectory(
            'web-developers',
            { location: 'new-york', maxPages: 2 }
        );
        console.log(`Found ${agencies.length} agencies`);

        fs.writeFileSync(
            'clutch_results.json',
            JSON.stringify({ profile, agencies }, null, 2)
        );
    } finally {
        await scraper.close();
    }
})();

Scaling with Apify Actors

For production workloads — scraping thousands of agency profiles or tracking review changes over time — Apify provides the infrastructure you need.

Apify Actor for Clutch.co

const { Actor } = require('apify');
const { PuppeteerCrawler } = require('crawlee');

Actor.main(async () => {
    const input = await Actor.getInput();
    const {
        startUrls = [],
        category = 'web-developers',
        location = null,
        maxProfiles = 100,
        scrapeReviews = true,
        maxPagesPerProfile = 3
    } = input;

    let profilesScraped = 0;

    const crawler = new PuppeteerCrawler({
        maxConcurrency: 5,
        navigationTimeoutSecs: 60,

        launchContext: {
            launchOptions: {
                headless: true,
                args: ['--no-sandbox']
            }
        },

        async requestHandler({ request, page, log, enqueueLinks }) {
            const { label } = request.userData;

            if (label === 'DIRECTORY') {
                log.info(`Processing directory: ${request.url}`);

                await page.waitForSelector(
                    '[class*="provider"], [class*="Company"]',
                    { timeout: 15000 }
                ).catch(() => {});

                const profileLinks = await page.evaluate(() => {
                    const links = document.querySelectorAll(
                        '[class*="provider"] h3 a, '
                        + '[class*="CompanyCard"] a[href*="/profile/"]'
                    );
                    return Array.from(links).map(a => ({
                        url: a.href,
                        name: a.textContent.trim()
                    }));
                });

                for (const link of profileLinks) {
                    if (profilesScraped >= maxProfiles) break;
                    await crawler.addRequests([{
                        url: link.url,
                        userData: {
                            label: 'PROFILE',
                            companyName: link.name
                        }
                    }]);
                }

                // Enqueue next directory page
                const nextBtn = await page.$(
                    'a[rel="next"], [class*="next-page"]'
                );
                if (nextBtn) {
                    const nextUrl = await page.evaluate(
                        el => el.href, nextBtn
                    );
                    if (nextUrl) {
                        await crawler.addRequests([{
                            url: nextUrl,
                            userData: { label: 'DIRECTORY' }
                        }]);
                    }
                }

            } else if (label === 'PROFILE') {
                if (profilesScraped >= maxProfiles) return;

                log.info(`Processing profile: ${request.url}`);

                await page.waitForSelector('h1', { timeout: 10000 })
                    .catch(() => {});

                const profileData = await page.evaluate(() => {
                    const getText = (s) => {
                        const e = document.querySelector(s);
                        return e ? e.textContent.trim() : '';
                    };

                    return {
                        name: getText('h1'),
                        rating: getText('[class*="rating-score"]'),
                        location: getText('[class*="location"]'),
                        teamSize: getText('[class*="team-size"]'),
                        hourlyRate: getText('[class*="hourly"]'),
                        minProject: getText('[class*="project-size"]'),
                        tagline: getText('[class*="tagline"]'),
                        reviewCount: getText('[class*="review-count"]'),
                        url: window.location.href
                    };
                });

                // Extract reviews if enabled
                let reviews = [];
                if (scrapeReviews) {
                    reviews = await page.evaluate(() => {
                        const cards = document.querySelectorAll(
                            '[class*="review-item"], .client-review'
                        );
                        return Array.from(cards).map(card => {
                            const t = (s) => {
                                const e = card.querySelector(s);
                                return e ? e.textContent.trim() : '';
                            };
                            return {
                                reviewer: t('[class*="name"]'),
                                title: t('[class*="title"]'),
                                rating: t('[class*="overall"]'),
                                service: t('[class*="service"]'),
                                text: t('[class*="body"], p'),
                                verified: !!card.querySelector(
                                    '[class*="verified"]'
                                )
                            };
                        });
                    });
                }

                await Actor.pushData({
                    ...profileData,
                    reviews,
                    scrapedAt: new Date().toISOString()
                });

                profilesScraped++;
            }
        }
    });

    // Build initial request list
    const requests = startUrls.length > 0
        ? startUrls.map(url => ({
            url,
            userData: { label: 'PROFILE' }
        }))
        : [{
            url: location
                ? `https://clutch.co/${category}/${location}`
                : `https://clutch.co/${category}`,
            userData: { label: 'DIRECTORY' }
        }];

    await crawler.run(requests);

    console.log(
        `Scraping complete. ${profilesScraped} profiles processed.`
    );
});
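For reference, the fields destructured from Actor.getInput() at the top of the Actor correspond to an input JSON like this (the values are illustrative defaults, not requirements):

```json
{
    "startUrls": [],
    "category": "web-developers",
    "location": "united-states",
    "maxProfiles": 100,
    "scrapeReviews": true,
    "maxPagesPerProfile": 3
}
```

Leave startUrls empty to crawl a directory, or supply profile URLs directly to skip directory discovery.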

Understanding Clutch's Rating Breakdown

When scraping Clutch reviews, understanding how ratings are structured will improve any downstream analysis. Each Clutch review includes four sub-ratings:

| Category | Description | Scale |
| --- | --- | --- |
| Quality | Overall quality of deliverables | 1.0 - 5.0 |
| Schedule | Adherence to timeline and deadlines | 1.0 - 5.0 |
| Cost | Value for money and budget adherence | 1.0 - 5.0 |
| Willing to Refer | Likelihood to recommend the agency | 1.0 - 5.0 |

The overall Clutch rating is a weighted average of these four categories, with verification status and review recency also factoring in. When building analytical tools, store each sub-rating separately to enable deeper analysis, like identifying agencies that deliver great quality but struggle with schedules.
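As a sketch of why storing sub-ratings separately pays off, here's a small pandas example that flags agencies whose quality rating far outpaces their schedule rating. The column names and sample data are illustrative, not Clutch's actual field names:

```python
import pandas as pd

# Hypothetical per-review records with the four sub-ratings stored separately.
reviews = pd.DataFrame([
    {"agency": "Acme Dev", "quality": 5.0, "schedule": 3.5,
     "cost": 4.0, "willing_to_refer": 4.5},
    {"agency": "Acme Dev", "quality": 4.5, "schedule": 3.0,
     "cost": 4.5, "willing_to_refer": 4.0},
    {"agency": "Beta Labs", "quality": 4.0, "schedule": 4.5,
     "cost": 4.0, "willing_to_refer": 4.0},
])

# Average each sub-rating per agency, then flag agencies whose quality
# outpaces their schedule adherence by a full point or more.
avg = reviews.groupby("agency")[
    ["quality", "schedule", "cost", "willing_to_refer"]
].mean()
avg["quality_schedule_gap"] = avg["quality"] - avg["schedule"]
laggards = avg[avg["quality_schedule_gap"] >= 1.0]
print(laggards)
```

With an aggregate-only rating, the "great work, blown deadlines" pattern above would be invisible.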

Handling Anti-Scraping Protections

Clutch.co uses several protective measures:

  1. Rate limiting: Aggressive rate limits on rapid sequential requests
  2. Cloudflare protection: Bot detection that may serve challenge pages
  3. JavaScript rendering: Key data loaded via AJAX calls after initial page load
  4. Session validation: Some content requires maintaining valid session cookies

To handle these effectively:

  • Use residential proxies: Datacenter IPs get blocked quickly
  • Implement delays: 2-3 seconds between requests minimum
  • Use headless browsers: Required for JavaScript-rendered content
  • Rotate user agents: Vary browser fingerprints across requests
  • Monitor for blocks: Check response content for challenge pages
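The delay, rotation, and block-monitoring points can be combined into a small request wrapper. This is a minimal sketch: the user-agent strings are placeholders, `session` is any requests-style session, and the challenge-page markers are common Cloudflare indicators, not an exhaustive list:

```python
import random
import time

# Placeholder user-agent strings; in practice, use full real browser UAs.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Example/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Example/1.0",
]

def looks_blocked(html: str) -> bool:
    """Heuristic check for Cloudflare challenge pages."""
    markers = ("cf-challenge", "Just a moment", "Attention Required")
    return any(m in html for m in markers)

def polite_get(session, url):
    """Fetch a URL with a randomized delay and rotated user agent."""
    time.sleep(random.uniform(2.0, 3.0))  # 2-3 seconds between requests
    resp = session.get(
        url, headers={"User-Agent": random.choice(USER_AGENTS)}
    )
    if resp.status_code in (403, 429) or looks_blocked(resp.text):
        raise RuntimeError(f"Possible block detected on {url}")
    return resp
```

Raising on a suspected block (rather than silently storing a challenge page) keeps bad HTML out of your dataset and gives you a clear signal to back off or switch proxies.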

Apify's managed infrastructure handles proxy rotation and browser fingerprinting automatically, making it the easiest path to reliable Clutch scraping at scale.

Practical Applications for Clutch Data

Agency Comparison Tools

Build tools that let procurement teams compare agencies side-by-side with normalized ratings, pricing, and review sentiment analysis.

Market Intelligence

Track how agencies rise and fall in Clutch rankings over time. Identify emerging firms and declining incumbents.

Lead Generation for Agencies

If you're an agency, scrape competitor profiles to identify their client types, then target similar companies.

Procurement Optimization

Extract pricing data (hourly rates, minimum project sizes) across hundreds of agencies to benchmark costs by service type and location.

Review Sentiment Analysis

Export review text and run NLP sentiment analysis to identify common praise themes and complaint patterns across the industry.
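As a toy illustration of the pipeline shape, here's a tiny lexicon-based scorer. The word lists are invented examples; for real analysis you'd swap in a proper model such as VADER or a transformer-based classifier:

```python
# Toy praise/complaint lexicons; purely illustrative.
PRAISE = {"responsive", "professional", "excellent", "communication", "quality"}
COMPLAINT = {"delay", "delayed", "overbudget", "slow", "unresponsive", "missed"}

def score_review(text: str) -> int:
    """Return praise-word hits minus complaint-word hits."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return len(words & PRAISE) - len(words & COMPLAINT)

reviews = [
    "Excellent communication and professional team.",
    "Project was delayed and the team was unresponsive.",
]
for review in reviews:
    print(score_review(review), review)
```

Running this over thousands of exported reviews surfaces the vocabulary that actually drives high and low ratings, which then informs a more serious model.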

Geographic Market Analysis

Map agency density, pricing, and ratings by city or country to identify underserved markets or pricing opportunities.

Data Export and Integration

After scraping, you'll typically want to pipe the data into analytical tools. Here's a quick export to pandas for analysis:

import pandas as pd

# From scraped data
agencies_df = pd.DataFrame(agencies)
reviews_df = pd.DataFrame(reviews)

# Clean and transform
agencies_df['rating_float'] = pd.to_numeric(
    agencies_df['rating'], errors='coerce'
)
agencies_df['review_count_int'] = pd.to_numeric(
    agencies_df['review_count']
    .str.extract(r'(\d+)')[0],
    errors='coerce'
)

# Analysis examples
print("Average rating by location:")
print(
    agencies_df.groupby('location')['rating_float']
    .mean()
    .sort_values(ascending=False)
    .head(10)
)

print("\nTop agencies by review count:")
print(
    agencies_df.nlargest(10, 'review_count_int')[
        ['name', 'rating', 'review_count', 'location']
    ]
)
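Once ratings are cleaned, a common next step is joining reviews back onto agency profiles and persisting the result for downstream tools. A sketch, assuming each record carries the agency's profile URL as a join key (column names are illustrative):

```python
import pandas as pd

# Minimal sample data standing in for scraped records.
agencies_df = pd.DataFrame([
    {"profile_url": "https://clutch.co/profile/acme",
     "name": "Acme Dev", "location": "Austin"},
])
reviews_df = pd.DataFrame([
    {"profile_url": "https://clutch.co/profile/acme", "rating": 4.5},
    {"profile_url": "https://clutch.co/profile/acme", "rating": 5.0},
])

# Aggregate reviews per agency, then attach to the profile table.
per_agency = (
    reviews_df.groupby("profile_url")["rating"]
    .agg(["mean", "count"])
    .reset_index()
)
merged = agencies_df.merge(per_agency, on="profile_url", how="left")

# Persist for downstream tools (BI dashboards, notebooks, etc.).
merged.to_csv("clutch_agencies.csv", index=False)
print(merged)
```

A `left` join keeps agencies with zero scraped reviews in the output, which is itself a useful signal.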

Conclusion

Clutch.co is one of the richest sources of B2B service provider data on the web. Its verified reviews, detailed agency profiles, and structured rating system make it an invaluable target for competitive intelligence, market research, and procurement optimization.

By combining Python for quick data extraction, Node.js with Puppeteer for JavaScript-heavy content, and Apify for scalable cloud scraping, you can build comprehensive pipelines that keep you updated on the entire B2B services landscape.

Start small — scrape a single category in one location, validate your data quality, then expand to full directory coverage using Apify actors. The key is maintaining respectful scraping practices while building robust selectors that adapt to page structure changes.

Whether you're building an agency comparison platform, conducting procurement research, or doing competitive analysis, structured Clutch data transforms hours of manual research into automated, repeatable intelligence gathering.
