DEV Community

agenthustler


Yelp Scraping: Extract Local Business Data, Reviews, and Ratings

Yelp is one of the most comprehensive sources of local business data on the internet. With over 200 million reviews covering restaurants, shops, services, and more across major cities worldwide, scraping Yelp data opens up opportunities for market research, lead generation, competitive analysis, and location intelligence.

In this guide, we'll explore the structure of Yelp, what data you can extract, how to build scrapers, and how to leverage Apify Store actors to make the process efficient and reliable.

Understanding Yelp's Structure

Yelp organizes its data around several core concepts that are important to understand before you start scraping.

Business Listings

Every business on Yelp has a detailed profile page containing:

  • Business name, address, and phone number
  • Category classifications (e.g., "Italian Restaurant", "Auto Repair")
  • Star rating (1-5, in half-star increments)
  • Review count
  • Price range ($ to $$$$)
  • Hours of operation
  • Photos uploaded by users and the business
  • Attributes (outdoor seating, delivery, parking, etc.)
  • Owner responses to reviews

Search and Discovery

Yelp's search functionality combines location-based queries with category filters:

  • Location search: "Restaurants near San Francisco, CA"
  • Category browse: Browse all businesses in a specific category
  • Keyword search: Free-text search with location context
  • Filter options: Price, distance, open now, ratings, attributes
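These search inputs map onto URL query parameters. Here is a minimal sketch of building search URLs. It assumes Yelp keeps using `find_desc` and `find_loc`, as in the search URLs later in this guide. It also assumes that pagination uses a `start` offset of 10 results per page, which you should confirm on the live site. `buildSearchUrl` is an illustrative name.

```javascript
// Build a Yelp search URL from a keyword, a location, and a zero-based page index.
// ASSUMPTION: pagination uses a `start` offset with 10 results per page.
const RESULTS_PER_PAGE = 10;

function buildSearchUrl({ term, location, page = 0 }) {
    const url = new URL('https://www.yelp.com/search');
    url.searchParams.set('find_desc', term);
    url.searchParams.set('find_loc', location);
    if (page > 0) {
        url.searchParams.set('start', String(page * RESULTS_PER_PAGE));
    }
    return url.href;
}

console.log(buildSearchUrl({ term: 'best tacos', location: 'Austin, TX', page: 2 }));
```

`URLSearchParams` handles the encoding of spaces and commas, so locations like "Austin, TX" can be passed through unchanged.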

Review System

Yelp's review system is the heart of the platform:

  • Individual reviews: Star rating, text content, date, author
  • Review highlights: AI-extracted key phrases
  • Photos within reviews: User-uploaded images tied to reviews
  • Reactions: Useful, Funny, Cool counts on each review
  • Owner responses: Business owners can reply to reviews

What Data Can You Extract?

Business Profile Data

Here's the structure of data available from a typical Yelp business listing:

const businessData = {
    // Basic Information
    name: "Joe's Pizza",
    url: "https://www.yelp.com/biz/joes-pizza-new-york",
    phone: "+1-212-555-0123",
    address: {
        street: "7 Carmine St",
        city: "New York",
        state: "NY",
        zip: "10014",
        country: "US"
    },
    coordinates: {
        latitude: 40.7306,
        longitude: -74.0023
    },

    // Ratings and Reviews
    rating: 4.5,
    reviewCount: 12847,
    priceRange: "$",

    // Categories
    categories: ["Pizza", "Italian", "Fast Food"],

    // Attributes
    attributes: {
        delivery: true,
        takeout: true,
        outdoorSeating: true,
        parking: "street",
        wifi: "free",
        goodForGroups: true,
        reservations: false,
        wheelchairAccessible: true
    },

    // Hours
    hours: {
        monday: "10:00 AM - 2:00 AM",
        tuesday: "10:00 AM - 2:00 AM",
        // ...
    },

    // Media
    photoCount: 3456,
    photos: ["url1.jpg", "url2.jpg"]
};
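Opening hours arrive as display strings such as "10:00 AM - 2:00 AM", and the closing time can fall after midnight. The sketch below converts one such string into minutes after midnight. It assumes the 12-hour "h:mm AM/PM" format shown in the example object, with either a hyphen or an en dash as the separator. `toMinutes` and `parseHoursRange` are illustrative names.

```javascript
// Convert "h:mm AM" / "h:mm PM" into minutes after midnight.
function toMinutes(time) {
    const match = /^(\d{1,2}):(\d{2})\s*(AM|PM)$/i.exec(time.trim());
    if (!match) throw new Error(`Unrecognized time: ${time}`);
    let hours = parseInt(match[1], 10) % 12; // 12 AM -> 0; 12 PM -> 0, then +12 below
    if (match[3].toUpperCase() === 'PM') hours += 12;
    return hours * 60 + parseInt(match[2], 10);
}

// Parse "10:00 AM - 2:00 AM" into { open, close } in minutes.
// A closing time at or before the opening time is treated as "after midnight".
function parseHoursRange(range) {
    const parts = range.split(/\s*[-–]\s*/);
    if (parts.length !== 2) throw new Error(`Unrecognized range: ${range}`);
    const open = toMinutes(parts[0]);
    let close = toMinutes(parts[1]);
    if (close <= open) close += 24 * 60;
    return { open, close };
}
```

Storing closing times past 1440 keeps late-night ranges as one continuous interval, which makes "open at 11 PM?" checks a simple comparison.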

Review Data

Each review contains valuable structured and unstructured data:

const reviewData = {
    author: {
        name: "Sarah M.",
        location: "Brooklyn, NY",
        reviewCount: 47,
        photoCount: 123,
        friends: 89,
        elite: true
    },
    rating: 5,
    date: "2026-02-15",
    text: "Best pizza in NYC, hands down. The classic slice is perfection...",
    photos: ["review_photo1.jpg"],
    reactions: {
        useful: 23,
        funny: 5,
        cool: 8
    },
    businessResponse: {
        text: "Thank you Sarah! Come back anytime.",
        date: "2026-02-16"
    }
};
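Each review carries its own date and, when the owner replied, the reply date, so owner responsiveness is easy to derive. The sketch below assumes ISO `YYYY-MM-DD` dates as in the example above. `responseDelayDays` and `responseStats` are illustrative helpers, not part of any library.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Days between a review and the owner's reply, or null if there is no reply.
function responseDelayDays(review) {
    if (!review.businessResponse) return null;
    // Date.parse treats "YYYY-MM-DD" as UTC midnight, so the difference is whole days
    const reviewed = Date.parse(review.date);
    const replied = Date.parse(review.businessResponse.date);
    return Math.round((replied - reviewed) / MS_PER_DAY);
}

// Share of reviews with a reply, and the average delay among those replies.
function responseStats(reviews) {
    const delays = reviews.map(responseDelayDays).filter(d => d !== null);
    return {
        responseRate: reviews.length ? delays.length / reviews.length : 0,
        avgDelayDays: delays.length ? delays.reduce((s, d) => s + d, 0) / delays.length : null
    };
}
```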

Search Results Data

When scraping search results, you get a summary view of multiple businesses:

const searchResult = {
    query: "best tacos",
    location: "Austin, TX",
    totalResults: 847,
    businesses: [
        {
            rank: 1,
            name: "Taco Joint",
            rating: 4.5,
            reviewCount: 2341,
            priceRange: "$",
            categories: ["Mexican", "Tacos"],
            neighborhood: "East Austin",
            snippet: "Known for their breakfast tacos and...",
            distance: "0.3 mi"
        }
        // ... more results
    ]
};

Building a Yelp Scraper

Using Crawlee for Structured Scraping

Crawlee provides a powerful framework for building Yelp scrapers that handle pagination, retries, and data extraction:

// yelp-scraper.js
// Selectors are illustrative: Yelp changes its markup often, so check them against the live page.
const { CheerioCrawler, Dataset } = require('crawlee');

// Extract an integer from text such as "(2,341 reviews)"; returns null if no digits are found.
function parseCount(text) {
    const digits = (text || '').replace(/[^\d]/g, '');
    return digits ? parseInt(digits, 10) : null;
}

const crawler = new CheerioCrawler({
    maxConcurrency: 2,
    sameDomainDelaySecs: 3,  // Respect Yelp's servers

    async requestHandler({ request, $, enqueueLinks }) {
        if (request.label === 'SEARCH') {
            // Parse search results page
            const businesses = [];

            $('[data-testid="serp-ia-card"]').each((i, el) => {
                const $el = $(el);
                const href = $el.find('a[href*="/biz/"]').first().attr('href');
                if (!href) return;  // skip cards without a business link

                businesses.push({
                    name: $el.find('a[href*="/biz/"]').first().text().trim(),
                    // Resolve relative hrefs against the page URL
                    url: new URL(href, request.loadedUrl || request.url).href,
                    rating: parseFloat($el.find('[aria-label*="star rating"]').attr('aria-label')),
                    reviewCount: parseCount($el.find('.reviewCount').text()),
                    categories: $el.find('.priceCategory span').map((j, s) => $(s).text()).get(),
                });
            });

            await Dataset.pushData(businesses);

            // Queue every business page for the DETAIL branch below
            await enqueueLinks({
                selector: '[data-testid="serp-ia-card"] a[href*="/biz/"]',
                label: 'DETAIL',
            });

            // Follow pagination
            await enqueueLinks({
                selector: 'a[aria-label="Next"]',
                label: 'SEARCH',
            });
        }

        if (request.label === 'DETAIL') {
            // Parse business detail page
            const businessDetail = {
                name: $('h1').first().text().trim(),
                rating: parseFloat($('[aria-label*="star rating"]').first().attr('aria-label')),
                reviewCount: parseCount($('[data-testid="review-count"]').text()),
                address: $('address').text().trim(),
                phone: $('[href^="tel:"]').first().text().trim(),
                categories: $('[data-testid="categories"] a').map((i, el) => $(el).text()).get(),
                priceRange: $('[data-testid="price-range"]').text().trim(),
                website: $('a[href*="biz_redir"]').attr('href'),
                hours: extractHours($),
                photos: $('img[src*="bphoto"]').map((i, el) => $(el).attr('src')).get(),
                url: request.url,
                scrapedAt: new Date().toISOString()
            };

            await Dataset.pushData(businessDetail);
        }
    },
});

// Map each row of the hours table to { day: "open - close" }
function extractHours($) {
    const hours = {};
    $('table.hours-table tr').each((i, row) => {
        const day = $(row).find('th').text().trim();
        const time = $(row).find('td').text().trim();
        if (day) hours[day] = time;
    });
    return hours;
}

// Start with a search. CommonJS modules have no top-level await, so handle the promise directly.
crawler.run([{
    url: 'https://www.yelp.com/search?find_desc=restaurants&find_loc=San+Francisco%2C+CA',
    label: 'SEARCH'
}]).catch(console.error);
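Many business pages also embed schema.org metadata in a `<script type="application/ld+json">` tag, and that metadata tends to change less often than CSS class names. Whether Yelp includes such a block, and which fields it fills, is an assumption you should verify on a live page. The sketch below only maps an already-parsed schema.org `LocalBusiness`-style object (for example, the result of `JSON.parse` on that script's text) onto the business fields used in this guide. `fromJsonLd` is a hypothetical helper.

```javascript
// Map a parsed schema.org LocalBusiness-style object onto this guide's business shape.
// Missing fields come back as null instead of throwing.
function fromJsonLd(ld) {
    const address = ld.address || {};
    const rating = ld.aggregateRating || {};
    // schema.org values are often strings ("4.5"), so coerce to numbers
    const toNumber = (v) => (v === undefined || v === null || v === '' ? null : Number(v));
    return {
        name: ld.name ?? null,
        phone: ld.telephone ?? null,
        priceRange: ld.priceRange ?? null,
        address: {
            street: address.streetAddress ?? null,
            city: address.addressLocality ?? null,
            state: address.addressRegion ?? null,
            zip: address.postalCode ?? null,
            country: address.addressCountry ?? null
        },
        rating: toNumber(rating.ratingValue),
        reviewCount: toNumber(rating.reviewCount)
    };
}
```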

Extracting Reviews

Reviews require special handling since they're often loaded dynamically and paginated:

const { PlaywrightCrawler, Dataset } = require('crawlee');

const reviewCrawler = new PlaywrightCrawler({
    maxConcurrency: 1,
    navigationTimeoutSecs: 60,

    async requestHandler({ page, request, enqueueLinks }) {
        // Wait for reviews to render (selectors are illustrative and may change)
        await page.waitForSelector('[data-testid="review"]', { timeout: 15000 });

        const reviews = await page.$$eval('[data-testid="review"]', (elements) => {
            // Runs in the browser: read a reaction counter, defaulting to 0
            const count = (el, testId) =>
                parseInt(el.querySelector(`[data-testid="${testId}"] span`)?.textContent || '0', 10) || 0;

            return elements.map(el => ({
                author: el.querySelector('.user-name')?.textContent?.trim(),
                // aria-label looks like "5 star rating"; keep only the number
                rating: parseFloat(el.querySelector('[aria-label*="star"]')?.getAttribute('aria-label') ?? ''),
                date: el.querySelector('.date')?.textContent?.trim(),
                text: el.querySelector('.comment p')?.textContent?.trim(),
                useful: count(el, 'useful'),
                funny: count(el, 'funny'),
                cool: count(el, 'cool'),
            }));
        });

        // Tag each review with the page it came from
        await Dataset.pushData(reviews.map(r => ({ ...r, pageUrl: request.url })));

        // Queue the next page of reviews, if any (relative hrefs are resolved automatically)
        await enqueueLinks({ selector: 'a.next-link', label: 'REVIEWS' });
    },
});

// Example start URL: the business from the data example above
reviewCrawler.run([{ url: 'https://www.yelp.com/biz/joes-pizza-new-york', label: 'REVIEWS' }])
    .catch(console.error);
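For recurring runs, it is often enough to collect only the reviews posted since the last run instead of walking every page. The sketch below assumes ISO dates and newest-first ordering; whether Yelp's default sort is newest-first is an assumption, so check the sort you request. `newReviewsSince` is a hypothetical helper. It also tells the caller whether the next page could still hold unseen reviews.

```javascript
// Keep reviews strictly newer than `lastSeenDate` (ISO "YYYY-MM-DD").
// With newest-first pages, reaching an older review means later pages hold nothing new.
function newReviewsSince(pageOfReviews, lastSeenDate) {
    // ISO date strings of the same format compare chronologically as plain strings
    const fresh = pageOfReviews.filter(r => r.date > lastSeenDate);
    return {
        fresh,
        keepPaging: pageOfReviews.length > 0 && fresh.length === pageOfReviews.length
    };
}
```

In the crawler above, you would call `enqueueLinks` for the next page only when `keepPaging` is true.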

Location-Based Scraping

One of Yelp's most powerful features is location-based search. Here's how to scrape businesses across multiple locations:

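// scrapeSearchResults(url) is a hypothetical helper that returns an array of business
// objects for one search URL (for example, a wrapper around the CheerioCrawler above).
async function scrapeMultipleLocations(category, locations) {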
    const allResults = [];

    for (const location of locations) {
        const searchUrl = `https://www.yelp.com/search?find_desc=${encodeURIComponent(category)}&find_loc=${encodeURIComponent(location)}`;

        console.log(`Scraping ${category} in ${location}...`);

        const results = await scrapeSearchResults(searchUrl);
        allResults.push(...results.map(r => ({ ...r, searchLocation: location })));

        // Respectful delay between location searches
        await new Promise(resolve => setTimeout(resolve, 5000));
    }

    return allResults;
}

// Usage
const locations = [
    'New York, NY',
    'Los Angeles, CA',
    'Chicago, IL',
    'Houston, TX',
    'Phoenix, AZ'
];

const restaurants = await scrapeMultipleLocations('restaurants', locations);
console.log(`Found ${restaurants.length} restaurants across ${locations.length} cities`);
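Neighboring or overlapping locations often return the same business more than once. The minimal sketch below merges duplicates by their Yelp URL, which it treats as a stable identifier (an assumption). It also records every search location that surfaced each business. `mergeByUrl` and the `searchLocations` field are illustrative.

```javascript
// Merge results that share the same business URL, collecting every searchLocation.
function mergeByUrl(results) {
    const byUrl = new Map();
    for (const r of results) {
        // Drop the query string (e.g. tracking parameters) so the same page matches
        const key = r.url.split('?')[0];
        const existing = byUrl.get(key);
        if (existing) {
            if (!existing.searchLocations.includes(r.searchLocation)) {
                existing.searchLocations.push(r.searchLocation);
            }
        } else {
            const { searchLocation, ...rest } = r;
            byUrl.set(key, { ...rest, url: key, searchLocations: [searchLocation] });
        }
    }
    return [...byUrl.values()];
}
```

A business that appears in several city searches can be a useful signal in itself, for example of a chain or a business near a city boundary.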

Using Apify Store Actors for Yelp Scraping

Building and maintaining your own Yelp scraper can be time-consuming, especially as Yelp frequently updates its frontend. Pre-built actors from the Apify Store solve this by providing maintained, reliable scraping solutions.

Benefits of Using Apify Actors

  1. No maintenance burden: Actor developers update their code when Yelp changes its website
  2. Built-in proxy rotation: Automatic residential proxy rotation to avoid blocks
  3. Structured output: Clean JSON data ready for analysis
  4. Scalability: Run across hundreds of locations simultaneously
  5. Scheduling: Set up daily or weekly automated runs
  6. Webhooks and integrations: Push data to your systems automatically

Running a Yelp Actor via the Apify API

const { ApifyClient } = require('apify-client');

const client = new ApifyClient({
    token: process.env.APIFY_TOKEN,  // read the token from the environment instead of hard-coding it
});

// Run a Yelp scraper actor.
// 'actor-name/yelp-scraper' is a placeholder, and input field names differ between
// actors: match them to the input schema of the actor you choose.
async function scrapeYelp(searchTerms, locations) {
    const run = await client.actor('actor-name/yelp-scraper').call({
        searchTerms: searchTerms,
        locations: locations,
        maxResults: 100,
        includeReviews: true,
        maxReviews: 20,
        proxy: {
            useApifyProxy: true,
            apifyProxyGroups: ['RESIDENTIAL']
        }
    });

    // Fetch results from the run's default dataset
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    return items;
}

// Example usage (CommonJS has no top-level await, so run it inside an async function)
async function main() {
    const data = await scrapeYelp(
        ['plumbers', 'electricians'],
        ['Denver, CO', 'Boulder, CO']
    );

    console.log(`Found ${data.length} businesses`);
    data.forEach(biz => {
        console.log(`${biz.name} - ${biz.rating}/5 (${biz.reviewCount} reviews) - ${biz.phone}`);
    });
}

main().catch(console.error);
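For large runs, it can be safer to page through the dataset than to load it in one call. The sketch below assumes an `apify-client` dataset client, as returned by `client.dataset(id)`, whose `listItems({ offset, limit })` resolves to an object with an `items` array. `fetchAllItems` and its default page size are illustrative choices.

```javascript
// Page through a dataset client with listItems({ offset, limit }) until a short page arrives.
async function fetchAllItems(datasetClient, pageSize = 1000) {
    const all = [];
    let offset = 0;
    for (;;) {
        const { items } = await datasetClient.listItems({ offset, limit: pageSize });
        all.push(...items);
        if (items.length < pageSize) break; // last page reached
        offset += items.length;
    }
    return all;
}

// Usage with the client from the previous example:
// const items = await fetchAllItems(client.dataset(run.defaultDatasetId));
```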

Scheduling Regular Scraping Runs

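// Create a scheduled task for weekly Yelp monitoring.
// Reuses `client` from the previous example; 'your-yelp-actor-id' is a placeholder.
async function createWeeklySchedule() {
    return client.schedules().create({
        name: 'weekly-competitor-monitoring',
        cronExpression: '0 6 * * 1',  // Every Monday at 06:00
        timezone: 'UTC',
        isEnabled: true,
        actions: [{
            type: 'RUN_ACTOR',
            actorId: 'your-yelp-actor-id',
            // Schedule actions carry the actor input as a serialized JSON body
            runInput: {
                body: JSON.stringify({
                    searchTerms: ['your-business-category'],
                    locations: ['your-city'],
                    maxResults: 200,
                    includeReviews: true
                }),
                contentType: 'application/json; charset=utf-8'
            }
        }]
    });
}

createWeeklySchedule()
    .then(schedule => console.log(`Created schedule ${schedule.id}`))
    .catch(console.error);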

Practical Use Cases

1. Lead Generation for Local Services

Extract business contact information for sales outreach:

function generateLeads(businesses, criteria) {
    return businesses
        .filter(b => b.rating >= criteria.minRating)
        .filter(b => b.reviewCount >= criteria.minReviews)
        .filter(b => !b.website || criteria.includeWithWebsite)
        .map(b => ({
            businessName: b.name,
            phone: b.phone,
            address: b.address,
            category: (b.categories || []).join(', '),
            rating: b.rating,
            reviews: b.reviewCount,
            hasWebsite: !!b.website,
            opportunity: !b.website ? 'Needs website' :
                         b.rating < 3 ? 'Reputation management' :
                         'General outreach'
        }));
}

// Find restaurants without websites (potential web design leads).
// `scrapedBusinesses` is an array of business objects collected by one of the scrapers above.
const leads = generateLeads(scrapedBusinesses, {
    minRating: 3.5,
    minReviews: 10,
    includeWithWebsite: false
});
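Leads gathered from several searches often list the same business with differently formatted phone numbers. The minimal sketch below deduplicates by phone and assumes US/Canada (NANP) numbers only. `normalizeUsPhone` and `dedupeLeads` are hypothetical helpers and do not handle international formats.

```javascript
// Reduce a North American phone number to "+1XXXXXXXXXX"; return null if it doesn't fit.
function normalizeUsPhone(raw) {
    const digits = String(raw || '').replace(/\D/g, '');
    if (digits.length === 10) return `+1${digits}`;
    if (digits.length === 11 && digits.startsWith('1')) return `+${digits}`;
    return null;
}

// Keep the first lead per normalized phone; leads without a usable phone are kept as-is.
function dedupeLeads(leads) {
    const seen = new Set();
    return leads.filter(lead => {
        const key = normalizeUsPhone(lead.phone);
        if (key === null) return true;
        if (seen.has(key)) return false;
        seen.add(key);
        return true;
    });
}
```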

2. Competitive Analysis Dashboard

Build a monitoring system to track your competitors:

// scrapeBusinessDetail, scrapeReviews, calculateTrend, extractComplaints and extractPraise
// are hypothetical helpers (for example, wrappers around the crawlers shown earlier).
async function competitorDashboard(competitorUrls) {
    const competitors = [];

    for (const url of competitorUrls) {
        const data = await scrapeBusinessDetail(url);
        const recentReviews = await scrapeReviews(url, { limit: 20 });
        const hasReviews = recentReviews.length > 0;

        competitors.push({
            name: data.name,
            currentRating: data.rating,
            totalReviews: data.reviewCount,
            recentSentiment: calculateSentiment(recentReviews),
            avgRecentRating: hasReviews
                ? recentReviews.reduce((s, r) => s + r.rating, 0) / recentReviews.length
                : null,
            reviewTrend: calculateTrend(recentReviews),
            commonComplaints: extractComplaints(recentReviews),
            commonPraise: extractPraise(recentReviews)
        });
    }

    return competitors;
}

function calculateSentiment(reviews) {
    if (reviews.length === 0) {
        return { positive: 'n/a', negative: 'n/a', balance: 'no data' };
    }
    const positive = reviews.filter(r => r.rating >= 4).length;
    const negative = reviews.filter(r => r.rating <= 2).length;
    return {
        positive: (positive / reviews.length * 100).toFixed(1) + '%',
        negative: (negative / reviews.length * 100).toFixed(1) + '%',
        // A snapshot of positive vs. negative share, not a change over time
        balance: positive > negative ? 'mostly positive' :
                 positive < negative ? 'mostly negative' : 'mixed'
    };
}
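`competitorDashboard` calls a `calculateTrend` helper that the example never defines. One simple way to fill it in, assuming each review has a numeric `rating` and an ISO `date`, is to sort by date and compare the average rating of the newer half against the older half. The 0.25-star threshold is an arbitrary choice for illustration.

```javascript
// Compare the mean rating of the newer half of reviews with the older half.
function calculateTrend(reviews, threshold = 0.25) {
    if (reviews.length < 4) return 'insufficient data';
    const sorted = [...reviews].sort((a, b) => a.date.localeCompare(b.date)); // oldest first
    const mid = Math.floor(sorted.length / 2);
    const mean = (rs) => rs.reduce((s, r) => s + r.rating, 0) / rs.length;
    const delta = mean(sorted.slice(mid)) - mean(sorted.slice(0, mid));
    if (delta > threshold) return 'improving';
    if (delta < -threshold) return 'declining';
    return 'stable';
}
```

With only 20 recent reviews, a single harsh review can move either half noticeably, so a wider window gives a steadier signal.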

3. Market Research and Location Intelligence

Analyze business density and competition across geographic areas:

// scrapeYelpSearch(category, city) is a hypothetical helper that returns an array of businesses.
async function marketAnalysis(category, cities) {
    const analysis = {};

    for (const city of cities) {
        const businesses = await scrapeYelpSearch(category, city);
        if (businesses.length === 0) {
            analysis[city] = { totalBusinesses: 0 };
            continue;
        }

        analysis[city] = {
            totalBusinesses: businesses.length,
            avgRating: (businesses.reduce((s, b) => s + b.rating, 0) / businesses.length).toFixed(2),
            avgReviews: Math.round(businesses.reduce((s, b) => s + b.reviewCount, 0) / businesses.length),
            priceDistribution: {
                budget: businesses.filter(b => b.priceRange === '$').length,
                moderate: businesses.filter(b => b.priceRange === '$$').length,
                upscale: businesses.filter(b => b.priceRange === '$$$').length,
                luxury: businesses.filter(b => b.priceRange === '$$$$').length,
            },
            // Copy before sorting so `businesses` keeps its search-rank order
            topRated: [...businesses]
                .sort((a, b) => b.rating - a.rating || b.reviewCount - a.reviewCount)
                .slice(0, 5)
                .map(b => `${b.name} (${b.rating}/5, ${b.reviewCount} reviews)`),
            // Only as meaningful as the result cap used when scraping (e.g. maxResults)
            saturation: businesses.length > 100 ? 'High' :
                        businesses.length > 50 ? 'Medium' : 'Low'
        };
    }

    return analysis;
}
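Sorting by raw rating lets a 5-star business with three reviews outrank a 4.5-star business with thousands. A common alternative, which is a swapped-in technique and not something Yelp documents, is a Bayesian-style weighted rating. It pulls businesses with few reviews toward the city average. The weight `m = 50` below is an arbitrary choice, and `weightedRating` and `rankByWeightedRating` are illustrative names.

```javascript
// Weighted rating: (v / (v + m)) * R + (m / (v + m)) * C
//   R = business rating, v = its review count,
//   C = mean rating across the city, m = review count at which R and C weigh equally.
function weightedRating(business, cityMean, m = 50) {
    const v = business.reviewCount;
    return (v / (v + m)) * business.rating + (m / (v + m)) * cityMean;
}

// Return a sorted copy with a `weighted` score attached to each business.
function rankByWeightedRating(businesses, m = 50) {
    if (businesses.length === 0) return [];
    const cityMean = businesses.reduce((s, b) => s + b.rating, 0) / businesses.length;
    return businesses
        .map(b => ({ ...b, weighted: weightedRating(b, cityMean, m) }))
        .sort((a, b) => b.weighted - a.weighted);
}
```

Raising `m` makes the ranking more conservative toward newcomers; lowering it trusts small review counts sooner.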

4. Review Mining for Product Development

Extract insights from reviews to inform product or service development:

// extractFrequentTerms(texts) is a hypothetical helper that returns terms, most frequent first.
function mineReviews(reviews) {
    // Group reviews by rating
    const byRating = {};
    for (let i = 1; i <= 5; i++) {
        byRating[i] = reviews.filter(r => r.rating === i);
    }

    // Extract frequently mentioned terms in negative reviews
    const negativeTerms = extractFrequentTerms(
        byRating[1].concat(byRating[2]).map(r => r.text)
    );

    // Extract what people love from positive reviews
    const positiveTerms = extractFrequentTerms(
        byRating[4].concat(byRating[5]).map(r => r.text)
    );

    // Find reviews mentioning specific topics
    const topicAnalysis = {
        service: reviews.filter(r => /service|staff|waiter|server/i.test(r.text)),
        food: reviews.filter(r => /food|taste|flavor|dish|meal/i.test(r.text)),
        ambiance: reviews.filter(r => /ambiance|atmosphere|decor|vibe/i.test(r.text)),
        value: reviews.filter(r => /price|expensive|cheap|worth|value/i.test(r.text)),
        wait: reviews.filter(r => /wait|slow|fast|quick|time/i.test(r.text)),
    };

    return {
        totalAnalyzed: reviews.length,
        ratingDistribution: Object.fromEntries(
            Object.entries(byRating).map(([k, v]) => [k, v.length])
        ),
        topComplaints: negativeTerms.slice(0, 15),
        topPraise: positiveTerms.slice(0, 15),
        topicBreakdown: Object.fromEntries(
            Object.entries(topicAnalysis).map(([topic, revs]) => [
                topic,
                {
                    mentions: revs.length,
                    avgRating: revs.length
                        ? (revs.reduce((s, r) => s + r.rating, 0) / revs.length).toFixed(1)
                        : null  // no reviews mention this topic
                }
            ])
        )
    };
}
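`mineReviews` relies on an `extractFrequentTerms` helper that the example does not define. Below is a deliberately simple word-count version that assumes English text. The small stopword list is illustrative only; a real analysis would likely use a proper tokenizer or n-grams so that phrases like "too salty" survive.

```javascript
// A small, illustrative stopword list; extend it for real use.
const STOPWORDS = new Set([
    'the', 'and', 'a', 'an', 'to', 'of', 'in', 'it', 'is', 'was', 'for', 'on',
    'with', 'this', 'that', 'but', 'we', 'i', 'my', 'our', 'they', 'were', 'are',
    'at', 'so', 'be', 'not', 'very', 'had', 'have', 'you', 'their'
]);

// Count lowercase words (3+ characters, stopwords removed) across texts; most frequent first,
// ties broken alphabetically.
function extractFrequentTerms(texts) {
    const counts = new Map();
    for (const text of texts) {
        const words = (text || '').toLowerCase().match(/[a-z']+/g) || [];
        for (const word of words) {
            if (word.length < 3 || STOPWORDS.has(word)) continue;
            counts.set(word, (counts.get(word) || 0) + 1);
        }
    }
    return [...counts.entries()]
        .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
        .map(([term, count]) => ({ term, count }));
}
```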

Data Export and Integration

Exporting to Multiple Formats

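const { stringify } = require('csv-stringify/sync');
const fs = require('fs');
const { google } = require('googleapis');

// Flatten one business (shaped like the businessData example) into a CSV-friendly row.
// Detail-page scrapes store `address` as a plain string, so handle both shapes.
function toRow(b) {
    return {
        name: b.name,
        rating: b.rating,
        reviewCount: b.reviewCount,
        priceRange: b.priceRange,
        phone: b.phone,
        address: typeof b.address === 'string' ? b.address : b.address?.street,
        city: b.address?.city,
        state: b.address?.state,
        zip: b.address?.zip,
        categories: (b.categories || []).join('; '),
        website: b.website,
        latitude: b.coordinates?.latitude,
        longitude: b.coordinates?.longitude
    };
}

// Export businesses to CSV
function exportBusinessesCSV(businesses, filename) {
    const csv = stringify(businesses.map(toRow), {
        header: true,
        columns: [
            'name', 'rating', 'reviewCount', 'priceRange',
            'phone', 'address', 'city', 'state', 'zip',
            'categories', 'website', 'latitude', 'longitude'
        ]
    });
    fs.writeFileSync(filename, csv);
    console.log(`Exported ${businesses.length} businesses to ${filename}`);
}

// Export to Google Sheets via API, authenticating with Application Default Credentials
// (for example, a service account key referenced by GOOGLE_APPLICATION_CREDENTIALS)
async function exportToGoogleSheets(businesses, spreadsheetId) {
    const auth = new google.auth.GoogleAuth({
        scopes: ['https://www.googleapis.com/auth/spreadsheets']
    });
    const sheets = google.sheets({ version: 'v4', auth });

    const rows = businesses.map(toRow).map(r => [
        r.name, r.rating, r.reviewCount, r.priceRange,
        r.phone, r.address, r.categories
    ]);

    await sheets.spreadsheets.values.update({
        spreadsheetId,
        range: 'Sheet1!A2',
        valueInputOption: 'RAW',
        requestBody: { values: rows }
    });
}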
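CSV flattens nested fields such as hours and attributes. When you want to keep the full object, JSON Lines (one JSON object per line) is a simple alternative that many data tools can ingest line by line. `exportJsonl` and `readJsonl` are illustrative helpers that use only Node's standard library.

```javascript
const fs = require('fs');

// Write one JSON object per line, keeping nested fields intact.
function exportJsonl(records, filename) {
    const body = records.map(r => JSON.stringify(r)).join('\n') + '\n';
    fs.writeFileSync(filename, body, 'utf8');
}

// Read a JSON Lines file back into an array, skipping blank lines.
function readJsonl(filename) {
    return fs.readFileSync(filename, 'utf8')
        .split('\n')
        .filter(line => line.trim() !== '')
        .map(line => JSON.parse(line));
}
```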

Webhook Integration for Real-Time Updates

// Send scraped data to your application via webhook (uses the global fetch built into Node 18+)
async function notifyWebhook(data, webhookUrl) {
    const response = await fetch(webhookUrl, {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
            'X-Source': 'yelp-scraper'
        },
        body: JSON.stringify({
            businesses: data,
            scrapedAt: new Date().toISOString(),
            count: data.length
        })
    });

    if (!response.ok) {
        console.error(`Webhook failed: ${response.status}`);
    }
}
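The webhook helper above only logs a failure. If the receiving endpoint is sometimes unavailable, a generic retry with exponential backoff is one option. `withRetry` is a hypothetical helper that retries only when the wrapped function throws. For it to help here, `notifyWebhook` would need to throw on a non-OK response instead of just logging.

```javascript
const sleep = (ms) => new Promise(resolve => setTimeout(resolve, ms));

// Call fn() up to `retries + 1` times, doubling the wait after each failure.
async function withRetry(fn, { retries = 3, baseDelayMs = 500 } = {}) {
    let lastError;
    for (let attempt = 0; attempt <= retries; attempt++) {
        try {
            return await fn();
        } catch (err) {
            lastError = err;
            if (attempt < retries) await sleep(baseDelayMs * 2 ** attempt);
        }
    }
    throw lastError;
}

// Usage: await withRetry(() => notifyWebhook(data, webhookUrl), { retries: 4 });
```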

Legal and Ethical Considerations

Yelp scraping comes with important legal and ethical responsibilities:

  1. Review Yelp's Terms of Service: Yelp's ToS restricts automated access. Understand the legal landscape before scraping.
  2. Respect robots.txt: Check and follow Yelp's robots.txt directives.
  3. Rate limiting: Never overwhelm Yelp's servers. Use generous delays between requests (3-5 seconds minimum).
  4. Data privacy: Review text and reviewer profiles contain personal information. Handle this data responsibly under GDPR, CCPA, and other privacy laws.
  5. Commercial use: If using scraped data commercially, ensure your use case complies with applicable laws.
  6. Consider the Yelp Fusion API: Yelp offers an official API (Yelp Fusion) that provides structured access to business data. For many use cases, this is the preferred approach.
  7. Attribution: If displaying Yelp data publicly, provide appropriate attribution.
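To act on the robots.txt point in code, here is a deliberately simplified checker. It reads only the `User-agent: *` group, does plain prefix matching on `Allow` and `Disallow` lines, and lets the longest matching rule win, with `Allow` winning ties. It ignores wildcards (`*`, `$`) and all other directives, so it is a sketch, not a complete implementation of the Robots Exclusion Protocol (RFC 9309). `isAllowed` is a hypothetical name, and the sample file in the test is invented, not Yelp's actual robots.txt.

```javascript
// Simplified robots.txt check for the "*" user agent (prefix matching only).
function isAllowed(robotsTxt, path) {
    const rules = [];
    let inStarGroup = false;
    let lastWasAgent = false;
    for (const rawLine of robotsTxt.split('\n')) {
        const line = rawLine.split('#')[0].trim();  // drop comments and whitespace
        const idx = line.indexOf(':');
        if (!line || idx === -1) continue;
        const field = line.slice(0, idx).trim().toLowerCase();
        const value = line.slice(idx + 1).trim();
        if (field === 'user-agent') {
            // Consecutive User-agent lines share one group of rules
            inStarGroup = (lastWasAgent && inStarGroup) || value === '*';
            lastWasAgent = true;
            continue;
        }
        lastWasAgent = false;
        // An empty Disallow means "allow everything", so it adds no rule
        if (inStarGroup && (field === 'allow' || field === 'disallow') && value) {
            rules.push({ allow: field === 'allow', prefix: value });
        }
    }
    // Longest matching prefix wins; on a tie, Allow wins
    let best = null;
    for (const rule of rules) {
        if (!path.startsWith(rule.prefix)) continue;
        if (!best || rule.prefix.length > best.prefix.length ||
            (rule.prefix.length === best.prefix.length && rule.allow)) {
            best = rule;
        }
    }
    return best ? best.allow : true;
}
```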

Yelp Fusion API as an Alternative

Before scraping, consider whether the official Yelp Fusion API meets your needs:

const yelp = require('yelp-fusion');
const client = yelp.client(process.env.YELP_API_KEY);

async function main() {
    // Search for businesses (Fusion returns at most 50 results per request)
    const response = await client.search({
        term: 'pizza',
        location: 'New York, NY',
        limit: 50,
        sort_by: 'rating'
    });

    // Access business details
    const businesses = response.jsonBody.businesses;
    businesses.forEach(biz => {
        console.log(`${biz.name}: ${biz.rating}/5 (${biz.review_count} reviews)`);
    });
}

// CommonJS has no top-level await, so handle the promise directly
main().catch(console.error);
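If you prefer not to depend on the `yelp-fusion` wrapper, you can make the same search with a plain HTTP request to the Fusion business search endpoint (`GET https://api.yelp.com/v3/businesses/search` with a Bearer token). The sketch below separates building the request from sending it, so you can check the request shape without network access. `buildFusionSearchRequest` and `searchFusion` are hypothetical helpers, and `YELP_API_KEY` is an assumed environment variable.

```javascript
// Build the URL and fetch options for a Fusion business search.
function buildFusionSearchRequest({ term, location, limit = 20, sortBy = 'best_match' }, apiKey) {
    const url = new URL('https://api.yelp.com/v3/businesses/search');
    url.searchParams.set('term', term);
    url.searchParams.set('location', location);
    url.searchParams.set('limit', String(Math.min(limit, 50))); // Fusion caps limit at 50
    url.searchParams.set('sort_by', sortBy);
    return {
        url: url.href,
        options: { headers: { Authorization: `Bearer ${apiKey}`, Accept: 'application/json' } }
    };
}

// Send the request with Node 18+'s global fetch and return the businesses array.
async function searchFusion(params, apiKey = process.env.YELP_API_KEY) {
    const { url, options } = buildFusionSearchRequest(params, apiKey);
    const response = await fetch(url, options);
    if (!response.ok) throw new Error(`Fusion search failed: ${response.status}`);
    const body = await response.json();
    return body.businesses;
}
```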

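Yelp has historically allowed up to 5,000 Fusion API calls per day at no cost, which may be enough for smaller projects. Check Yelp's current plans and daily quotas before relying on that figure, since pricing and limits can change.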

Conclusion

Yelp scraping is a powerful capability for anyone working with local business data. Whether you're generating leads, conducting competitive analysis, performing market research, or building location intelligence tools, the depth of data available on Yelp makes it an invaluable source.

For production use cases, the most efficient approach is to combine the official Yelp Fusion API for basic business data with specialized scraping actors from the Apify Store for deeper data like full review text, photos, and business attributes that the API doesn't expose.

By using pre-built Apify actors, you eliminate the maintenance burden of keeping up with Yelp's frequent frontend changes, benefit from built-in proxy rotation and anti-detection measures, and can scale your data collection across hundreds of locations with minimal effort.

Start with a focused use case — perhaps monitoring your own business's competitors in a single city — and expand from there as you discover what insights the data can provide. The combination of structured API access, cloud scraping infrastructure, and thoughtful data analysis can give you a significant advantage in understanding and navigating local markets.
