agenthustler

Posted on Apr 9 • Edited on Apr 19

Yelp Scraping: Extract Local Business Data, Reviews, and Ratings

#webdev #javascript #programming #webscraping

Yelp is one of the most comprehensive sources of local business data on the internet. With over 200 million reviews covering restaurants, shops, services, and more across major cities worldwide, scraping Yelp data opens up opportunities for market research, lead generation, competitive analysis, and location intelligence.

In this guide, we'll explore the structure of Yelp, what data you can extract, how to build scrapers, and how to leverage Apify Store actors to make the process efficient and reliable.

Understanding Yelp's Structure

Yelp organizes its data around several core concepts that are important to understand before you start scraping.

Business Listings

Every business on Yelp has a detailed profile page containing:

Business name, address, and phone number
Category classifications (e.g., "Italian Restaurant", "Auto Repair")
Star rating (1-5, in half-star increments)
Review count
Price range ($ to $$$$)
Hours of operation
Photos uploaded by users and the business
Attributes (outdoor seating, delivery, parking, etc.)
Owner responses to reviews

Search and Discovery

Yelp's search functionality combines location-based queries with category filters:

Location search: "Restaurants near San Francisco, CA"
Category browse: Browse all businesses in a specific category
Keyword search: Free-text search with location context
Filter options: Price, distance, open now, ratings, attributes

Review System

Yelp's review system is the heart of the platform:

Individual reviews: Star rating, text content, date, author
Review highlights: AI-extracted key phrases
Photos within reviews: User-uploaded images tied to reviews
Reactions: Useful, Funny, Cool counts on each review
Owner responses: Business owners can reply to reviews

What Data Can You Extract?

Business Profile Data

Here's the structure of data available from a typical Yelp business listing:

const businessData = {
    // Basic Information
    name: "Joe's Pizza",
    url: "https://www.yelp.com/biz/joes-pizza-new-york",
    phone: "+1-212-555-0123",
    address: {
        street: "7 Carmine St",
        city: "New York",
        state: "NY",
        zip: "10014",
        country: "US"
    },
    coordinates: {
        latitude: 40.7306,
        longitude: -74.0023
    },

    // Ratings and Reviews
    rating: 4.5,
    reviewCount: 12847,
    priceRange: "$",

    // Categories
    categories: ["Pizza", "Italian", "Fast Food"],

    // Attributes
    attributes: {
        delivery: true,
        takeout: true,
        outdoorSeating: true,
        parking: "street",
        wifi: "free",
        goodForGroups: true,
        reservations: false,
        wheelchairAccessible: true
    },

    // Hours
    hours: {
        monday: "10:00 AM - 2:00 AM",
        tuesday: "10:00 AM - 2:00 AM",
        // ...
    },

    // Media
    photoCount: 3456,
    photos: ["url1.jpg", "url2.jpg"]
};

Review Data

Each review contains valuable structured and unstructured data:

const reviewData = {
    author: {
        name: "Sarah M.",
        location: "Brooklyn, NY",
        reviewCount: 47,
        photoCount: 123,
        friends: 89,
        elite: true
    },
    rating: 5,
    date: "2026-02-15",
    text: "Best pizza in NYC, hands down. The classic slice is perfection...",
    photos: ["review_photo1.jpg"],
    reactions: {
        useful: 23,
        funny: 5,
        cool: 8
    },
    businessResponse: {
        text: "Thank you Sarah! Come back anytime.",
        date: "2026-02-16"
    }
};

Search Results Data

When scraping search results, you get a summary view of multiple businesses:

const searchResult = {
    query: "best tacos",
    location: "Austin, TX",
    totalResults: 847,
    businesses: [
        {
            rank: 1,
            name: "Taco Joint",
            rating: 4.5,
            reviewCount: 2341,
            priceRange: "$",
            categories: ["Mexican", "Tacos"],
            neighborhood: "East Austin",
            snippet: "Known for their breakfast tacos and...",
            distance: "0.3 mi"
        }
        // ... more results
    ]
};

Building a Yelp Scraper

Using Crawlee for Structured Scraping

Crawlee provides a framework for building Yelp scrapers that handles pagination, retries, and data extraction. Our own implementation is proprietary and is not published in this post; a ready-made Apify actor is available instead.
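No implementation is included above, so here is a minimal sketch of what a Crawlee-based search crawler could look like. It assumes that Yelp paginates search results with a start offset in steps of 10 and that business links contain /biz/; both are assumptions to verify against the live site. The names buildSearchPageUrls and crawlSearch are hypothetical, and the crawlee package is loaded lazily so the URL helper can be used on its own.

```javascript
// Build paginated search URLs.
// ASSUMPTION: Yelp search uses a `start` offset that advances by 10 per page.
function buildSearchPageUrls(category, location, pages = 3, pageSize = 10) {
    const urls = [];
    for (let page = 0; page < pages; page++) {
        const params = new URLSearchParams({ find_desc: category, find_loc: location });
        if (page > 0) params.set('start', String(page * pageSize));
        urls.push(`https://www.yelp.com/search?${params}`);
    }
    return urls;
}

// Hypothetical crawler: collects business links from each search page.
async function crawlSearch(category, location, pages = 3) {
    const { CheerioCrawler } = require('crawlee'); // npm install crawlee
    const results = [];

    const crawler = new CheerioCrawler({
        maxConcurrency: 1,         // one request at a time
        maxRequestRetries: 3,      // Crawlee retries failed requests automatically
        maxRequestsPerMinute: 12,  // roughly one request every five seconds
        async requestHandler({ request, $ }) {
            // ASSUMPTION: business links contain "/biz/"; adjust to the real markup.
            const businessUrls = new Set();
            $('a[href*="/biz/"]').each((_, el) => {
                const base = request.loadedUrl ?? request.url;
                const { pathname } = new URL($(el).attr('href'), base);
                businessUrls.add(`https://www.yelp.com${pathname}`);
            });
            results.push({ searchUrl: request.url, businessUrls: [...businessUrls] });
        },
    });

    await crawler.run(buildSearchPageUrls(category, location, pages));
    return results;
}
```

Calling crawlSearch('tacos', 'Austin, TX') would visit the first three result pages. Expect to adapt the selector, and possibly switch to a browser-based crawler, if the pages are rendered client-side.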

Extracting Reviews

Reviews require special handling since they're often loaded dynamically and paginated. Our implementation of this step is proprietary and is not shown here; the ready-made Apify actor handles it.
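Since no review-extraction code is shown, one approach worth trying is to read the schema.org JSON-LD that many listing pages embed. The sketch below assumes the page HTML contains a script tag of type application/ld+json with a review array; that is an assumption about Yelp's markup, not something confirmed in this post. The names extractJsonLd and reviewsFromJsonLd are hypothetical.

```javascript
// Pull every JSON-LD block out of an HTML string.
function extractJsonLd(html) {
    const pattern = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
    const blocks = [];
    for (const match of html.matchAll(pattern)) {
        try {
            blocks.push(JSON.parse(match[1]));
        } catch {
            // Skip malformed blocks rather than failing the whole page.
        }
    }
    return blocks;
}

// Map schema.org Review objects to the review shape used in this article.
function reviewsFromJsonLd(blocks) {
    return blocks
        .flatMap(block => (Array.isArray(block) ? block : [block]))
        .filter(item => item && typeof item === 'object')
        .flatMap(item => [].concat(item.review || []))
        .map(r => ({
            author: { name: typeof r.author === 'string' ? r.author : r.author?.name ?? null },
            rating: Number(r.reviewRating?.ratingValue ?? NaN),
            date: r.datePublished ?? null,
            text: r.description ?? r.reviewBody ?? ''
        }));
}
```

Embedded JSON-LD usually covers only the reviews on the first page, so paginated or dynamically loaded reviews still need additional requests.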

Location-Based Scraping

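One of Yelp's most powerful features is location-based search. Here's how to scrape businesses across multiple locations. The example assumes a scrapeSearchResults(url) function of your own that returns an array of business objects for one search URL: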

async function scrapeMultipleLocations(category, locations) {
    const allResults = [];

    for (const location of locations) {
        const searchUrl = `https://www.yelp.com/search?find_desc=${encodeURIComponent(category)}&find_loc=${encodeURIComponent(location)}`;

        console.log(`Scraping ${category} in ${location}...`);

        const results = await scrapeSearchResults(searchUrl);
        allResults.push(...results.map(r => ({ ...r, searchLocation: location })));

        // Respectful delay between location searches
        await new Promise(resolve => setTimeout(resolve, 5000));
    }

    return allResults;
}

// Usage
const locations = [
    'New York, NY',
    'Los Angeles, CA',
    'Chicago, IL',
    'Houston, TX',
    'Phoenix, AZ'
];

const restaurants = await scrapeMultipleLocations('restaurants', locations);
console.log(`Found ${restaurants.length} restaurants across ${locations.length} cities`);
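Searches for neighboring locations can return the same business more than once. Here is a minimal sketch for de-duplicating the combined results, assuming each result carries a url field like the business profile structure shown earlier. The helper name dedupeByUrl is hypothetical.

```javascript
// Keep the first occurrence of each business, keyed by its listing URL
// with any query string or fragment removed.
function dedupeByUrl(results) {
    const seen = new Set();
    return results.filter(result => {
        if (!result.url) return true; // nothing to key on, so keep it
        const key = result.url.split(/[?#]/)[0];
        if (seen.has(key)) return false;
        seen.add(key);
        return true;
    });
}
```

Running the combined array through dedupeByUrl before counting keeps the per-city totals from being inflated by overlapping search areas.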

Using Apify Store Actors for Yelp Scraping

Building and maintaining your own Yelp scraper can be time-consuming, especially as Yelp frequently updates its frontend. Pre-built actors from the Apify Store solve this by providing maintained, reliable scraping solutions.

Benefits of Using Apify Actors

No maintenance burden: Actor developers update their code when Yelp changes its website
Built-in proxy rotation: Automatic residential proxy rotation to avoid blocks
Structured output: Clean JSON data ready for analysis
Scalability: Run across hundreds of locations simultaneously
Scheduling: Set up daily or weekly automated runs
Webhooks and integrations: Push data to your systems automatically

Running a Yelp Actor via the Apify API

const { ApifyClient } = require('apify-client');

const client = new ApifyClient({
    token: 'YOUR_APIFY_TOKEN',
});

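// Run a Yelp scraper actor.
// 'actor-name/yelp-scraper' is a placeholder, and the input fields below
// depend on the actor you choose; check its input schema in the Apify Store.
async function scrapeYelp(searchTerms, locations) {
    const run = await client.actor('actor-name/yelp-scraper').call({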
        searchTerms: searchTerms,
        locations: locations,
        maxResults: 100,
        includeReviews: true,
        maxReviews: 20,
        proxy: {
            useApifyProxy: true,
            apifyProxyGroups: ['RESIDENTIAL']
        }
    });

    // Fetch results
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    return items;
}

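// Example usage, wrapped in an async function because top-level await
// is not available in CommonJS modules that use require
async function main() {
    const data = await scrapeYelp(
        ['plumbers', 'electricians'],
        ['Denver, CO', 'Boulder, CO']
    );

    console.log(`Found ${data.length} businesses`);
    data.forEach(biz => {
        console.log(`${biz.name} - ${biz.rating}/5 (${biz.reviewCount} reviews) - ${biz.phone}`);
    });
}

main().catch(console.error);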

Scheduling Regular Scraping Runs

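// Create a scheduled task for weekly Yelp monitoring.
// Reuses the ApifyClient instance (client) created in the previous example.
async function createWeeklySchedule() {
    return client.schedules().create({
        name: 'weekly-competitor-monitoring',
        cronExpression: '0 6 * * MON',  // Every Monday at 6 AM
        actions: [{
            type: 'RUN_ACTOR',
            actorId: 'your-yelp-actor-id',
            // The schedules API takes the actor input as a serialized body
            // plus a content type, not as a plain object
            runInput: {
                contentType: 'application/json; charset=utf-8',
                body: JSON.stringify({
                    searchTerms: ['your-business-category'],
                    locations: ['your-city'],
                    maxResults: 200,
                    includeReviews: true
                })
            }
        }]
    });
}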

Practical Use Cases

1. Lead Generation for Local Services

Extract business contact information for sales outreach:

function generateLeads(businesses, criteria) {
    return businesses
        .filter(b => b.rating >= criteria.minRating)
        .filter(b => b.reviewCount >= criteria.minReviews)
        .filter(b => !b.website || criteria.includeWithWebsite)
        .map(b => ({
            businessName: b.name,
            phone: b.phone,
            address: b.address,
            category: b.categories.join(', '),
            rating: b.rating,
            reviews: b.reviewCount,
            hasWebsite: !!b.website,
            opportunity: !b.website ? 'Needs website' :
                         b.rating < 3 ? 'Reputation management' :
                         'General outreach'
        }));
}

// Find restaurants without websites (potential web design leads)
const leads = generateLeads(scrapedBusinesses, {
    minRating: 3.5,
    minReviews: 10,
    includeWithWebsite: false
});

2. Competitive Analysis Dashboard

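Build a monitoring system to track your competitors. The helpers scrapeBusinessDetail, scrapeReviews, calculateTrend, extractComplaints, and extractPraise are placeholders for your own implementations: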

async function competitorDashboard(competitorUrls) {
    const competitors = [];

    for (const url of competitorUrls) {
        const data = await scrapeBusinessDetail(url);
        const recentReviews = await scrapeReviews(url, { limit: 20 });

        competitors.push({
            name: data.name,
            currentRating: data.rating,
            totalReviews: data.reviewCount,
            recentSentiment: calculateSentiment(recentReviews),
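            // Guard against an empty review list, which would otherwise yield NaN
            avgRecentRating: recentReviews.length
                ? recentReviews.reduce((s, r) => s + r.rating, 0) / recentReviews.length
                : null,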
            reviewTrend: calculateTrend(recentReviews),
            commonComplaints: extractComplaints(recentReviews),
            commonPraise: extractPraise(recentReviews)
        });
    }

    return competitors;
}

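function calculateSentiment(reviews) {
    // Avoid dividing by zero when no reviews were collected
    if (reviews.length === 0) {
        return { positive: '0.0%', negative: '0.0%', balance: 'unknown' };
    }

    const positive = reviews.filter(r => r.rating >= 4).length;
    const negative = reviews.filter(r => r.rating <= 2).length;
    return {
        positive: (positive / reviews.length * 100).toFixed(1) + '%',
        negative: (negative / reviews.length * 100).toFixed(1) + '%',
        // This compares shares within one sample; it is not a change over time
        balance: positive > negative ? 'mostly positive' : 'mostly negative'
    };
}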
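The calculateTrend helper is called above but not defined in this post. One possible sketch, assuming each review has a numeric rating and an ISO date string as in the review structure shown earlier, compares the average rating of the newer half of the sample with the older half. The 0.25-star threshold and the minimum of four reviews are arbitrary assumptions.

```javascript
function calculateTrend(reviews, threshold = 0.25) {
    if (reviews.length < 4) return 'insufficient data';

    // Sort oldest first; ISO dates (YYYY-MM-DD) parse reliably with Date.
    const sorted = [...reviews].sort((a, b) => new Date(a.date) - new Date(b.date));
    const mid = Math.floor(sorted.length / 2);
    const average = list => list.reduce((sum, r) => sum + r.rating, 0) / list.length;

    // Positive delta means the newer half is rated higher than the older half.
    const delta = average(sorted.slice(mid)) - average(sorted.slice(0, mid));
    if (delta > threshold) return 'improving';
    if (delta < -threshold) return 'declining';
    return 'stable';
}
```

With only 20 recent reviews per competitor the result is noisy, so treat it as a prompt to look closer, not as a measurement.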

3. Market Research and Location Intelligence

Analyze business density and competition across geographic areas:

async function marketAnalysis(category, cities) {
    const analysis = {};

    for (const city of cities) {
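        // scrapeYelpSearch is a placeholder for your own search scraper
        const businesses = await scrapeYelpSearch(category, city);

        // Skip cities with no results to avoid dividing by zero below
        if (businesses.length === 0) {
            analysis[city] = { totalBusinesses: 0 };
            continue;
        }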

        analysis[city] = {
            totalBusinesses: businesses.length,
            avgRating: (businesses.reduce((s, b) => s + b.rating, 0) / businesses.length).toFixed(2),
            avgReviews: Math.round(businesses.reduce((s, b) => s + b.reviewCount, 0) / businesses.length),
            priceDistribution: {
                budget: businesses.filter(b => b.priceRange === '$').length,
                moderate: businesses.filter(b => b.priceRange === '$$').length,
                upscale: businesses.filter(b => b.priceRange === '$$$').length,
                luxury: businesses.filter(b => b.priceRange === '$$$$').length,
            },
            topRated: [...businesses]  // copy first: sort() mutates the array in place
                .sort((a, b) => b.rating - a.rating || b.reviewCount - a.reviewCount)
                .slice(0, 5)
                .map(b => `${b.name} (${b.rating}/5, ${b.reviewCount} reviews)`),
            saturation: businesses.length > 100 ? 'High' :
                        businesses.length > 50 ? 'Medium' : 'Low'
        };
    }

    return analysis;
}

4. Review Mining for Product Development

Extract insights from reviews to inform product or service development:

function mineReviews(reviews) {
    // Group reviews by rating
    const byRating = {};
    for (let i = 1; i <= 5; i++) {
        byRating[i] = reviews.filter(r => r.rating === i);
    }

    // Extract frequently mentioned terms in negative reviews
    const negativeTerms = extractFrequentTerms(
        byRating[1].concat(byRating[2]).map(r => r.text)
    );

    // Extract what people love from positive reviews
    const positiveTerms = extractFrequentTerms(
        byRating[4].concat(byRating[5]).map(r => r.text)
    );

    // Find reviews mentioning specific topics
    const topicAnalysis = {
        service: reviews.filter(r => /service|staff|waiter|server/i.test(r.text)),
        food: reviews.filter(r => /food|taste|flavor|dish|meal/i.test(r.text)),
        ambiance: reviews.filter(r => /ambiance|atmosphere|decor|vibe/i.test(r.text)),
        value: reviews.filter(r => /price|expensive|cheap|worth|value/i.test(r.text)),
        wait: reviews.filter(r => /wait|slow|fast|quick|time/i.test(r.text)),
    };

    return {
        totalAnalyzed: reviews.length,
        ratingDistribution: Object.fromEntries(
            Object.entries(byRating).map(([k, v]) => [k, v.length])
        ),
        topComplaints: negativeTerms.slice(0, 15),
        topPraise: positiveTerms.slice(0, 15),
        topicBreakdown: Object.fromEntries(
            Object.entries(topicAnalysis).map(([topic, revs]) => [
                topic,
                {
                    mentions: revs.length,
                    // null when a topic has no mentions, instead of the string "NaN"
                    avgRating: revs.length
                        ? (revs.reduce((s, r) => s + r.rating, 0) / revs.length).toFixed(1)
                        : null
                }
            ])
        )
    };
}
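The extractFrequentTerms helper is used above without a definition. Here is a minimal sketch, assuming it takes an array of review texts and returns objects of the form { term, count } sorted by frequency. The stop-word list is a small illustrative sample, not a complete one.

```javascript
const STOP_WORDS = new Set([
    'the', 'and', 'was', 'were', 'for', 'with', 'that', 'this', 'but', 'not',
    'are', 'have', 'had', 'they', 'you', 'our', 'very', 'just', 'from', 'here'
]);

function extractFrequentTerms(texts, minLength = 3) {
    const counts = new Map();
    for (const text of texts) {
        // Count each term once per review so one long review does not dominate.
        const terms = new Set(text.toLowerCase().match(/[a-z']+/g) || []);
        for (const term of terms) {
            if (term.length < minLength || STOP_WORDS.has(term)) continue;
            counts.set(term, (counts.get(term) || 0) + 1);
        }
    }
    // Most frequent first; ties are broken alphabetically for stable output.
    return [...counts.entries()]
        .map(([term, count]) => ({ term, count }))
        .sort((a, b) => b.count - a.count || a.term.localeCompare(b.term));
}
```

Single-word counts miss phrases such as "long wait", so a production version would likely add bigrams or a proper NLP library.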

Data Export and Integration

Exporting to Multiple Formats

const { stringify } = require('csv-stringify/sync');
const fs = require('fs');

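// Export businesses to CSV.
// Nested fields (address, coordinates) and arrays (categories) are flattened
// first so that every column receives a plain value.
function exportBusinessesCSV(businesses, filename) {
    const rows = businesses.map(b => ({
        name: b.name,
        rating: b.rating,
        reviewCount: b.reviewCount,
        priceRange: b.priceRange,
        phone: b.phone,
        address: b.address?.street,
        city: b.address?.city,
        state: b.address?.state,
        zip: b.address?.zip,
        categories: (b.categories || []).join('; '),
        website: b.website,
        latitude: b.coordinates?.latitude,
        longitude: b.coordinates?.longitude
    }));
    // With header: true, column names are taken from the keys of the first row
    const csv = stringify(rows, { header: true });
    fs.writeFileSync(filename, csv);
    console.log(`Exported ${businesses.length} businesses to ${filename}`);
}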

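// Export to Google Sheets via API.
// Assumes Application Default Credentials with edit access to the spreadsheet.
async function exportToGoogleSheets(businesses, spreadsheetId) {
    const { google } = require('googleapis');
    const auth = new google.auth.GoogleAuth({
        scopes: ['https://www.googleapis.com/auth/spreadsheets']
    });
    const sheets = google.sheets({ version: 'v4', auth });

    const rows = businesses.map(b => [
        b.name, b.rating, b.reviewCount, b.priceRange, b.phone,
        // address is an object in the structure above; cells need plain values
        [b.address?.street, b.address?.city, b.address?.state, b.address?.zip]
            .filter(Boolean).join(', '),
        (b.categories || []).join('; ')
    ]);

    await sheets.spreadsheets.values.update({
        spreadsheetId,
        range: 'Sheet1!A2',
        valueInputOption: 'RAW',
        requestBody: { values: rows }
    });
}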

Webhook Integration for Real-Time Updates

// Send scraped data to your application via webhook
async function notifyWebhook(data, webhookUrl) {
    const response = await fetch(webhookUrl, {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
            'X-Source': 'yelp-scraper'
        },
        body: JSON.stringify({
            businesses: data,
            scrapedAt: new Date().toISOString(),
            count: data.length
        })
    });

    if (!response.ok) {
        console.error(`Webhook failed: ${response.status}`);
    }
}

Legal and Ethical Considerations

Yelp scraping comes with important legal and ethical responsibilities:

Review Yelp's Terms of Service: Yelp's ToS restricts automated access. Understand the legal landscape before scraping.
Respect robots.txt: Check and follow Yelp's robots.txt directives.
Rate limiting: Never overwhelm Yelp's servers. Use generous delays between requests (3-5 seconds minimum).
Data privacy: Review text and reviewer profiles contain personal information. Handle this data responsibly under GDPR, CCPA, and other privacy laws.
Commercial use: If using scraped data commercially, ensure your use case complies with applicable laws.
Consider the Yelp Fusion API: Yelp offers an official API (Yelp Fusion) that provides structured access to business data. For many use cases, this is the preferred approach.
Attribution: If displaying Yelp data publicly, provide appropriate attribution.
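The rate-limiting guidance above can be sketched as a small helper that runs requests one at a time with a randomized pause between them. The 3-5 second default comes from the list above; politeDelayMs and runSequentially are hypothetical names.

```javascript
// Pick a delay in the range [minMs, maxMs). The random source is injectable
// so the function can be checked deterministically.
function politeDelayMs(minMs = 3000, maxMs = 5000, random = Math.random) {
    return Math.floor(minMs + random() * (maxMs - minMs));
}

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Run async tasks strictly one after another, pausing between them.
async function runSequentially(tasks, { minMs = 3000, maxMs = 5000 } = {}) {
    const results = [];
    for (let i = 0; i < tasks.length; i++) {
        results.push(await tasks[i]());
        // No need to wait after the final task.
        if (i < tasks.length - 1) await sleep(politeDelayMs(minMs, maxMs));
    }
    return results;
}
```

Each task is a function that returns a promise, for example () => fetch(url). Randomizing the pause avoids a fixed request rhythm while keeping the average rate low.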

Yelp Fusion API as an Alternative

Before scraping, consider whether the official Yelp Fusion API meets your needs:

const yelp = require('yelp-fusion');
const client = yelp.client('YOUR_API_KEY');

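// Search for businesses, wrapped in an async function because top-level
// await is not available in CommonJS modules that use require
async function searchPizza() {
    const response = await client.search({
        term: 'pizza',
        location: 'New York, NY',
        limit: 50,
        sort_by: 'rating'
    });

    // Access business details
    const businesses = response.jsonBody.businesses;
    businesses.forEach(biz => {
        console.log(`${biz.name}: ${biz.rating}/5 (${biz.review_count} reviews)`);
    });
}

searchPizza().catch(console.error);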

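The Fusion API has historically included a free allowance of up to 5,000 API calls per day, which may be sufficient for smaller projects. Yelp's plans and limits change over time, so confirm the current quota and pricing in the Fusion documentation before relying on it.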
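If you prefer not to depend on a wrapper library, the same search can be issued directly against the Fusion REST endpoint, GET https://api.yelp.com/v3/businesses/search, with a Bearer token. In this sketch buildFusionSearchRequest and searchFusion are hypothetical names, and Node 18 or later is assumed for the global fetch.

```javascript
// Build the URL and headers for a Fusion business search.
function buildFusionSearchRequest(apiKey, { term, location, limit = 20, offset = 0, sortBy = 'best_match' }) {
    const params = new URLSearchParams({
        term,
        location,
        limit: String(limit),   // the API accepts at most 50 per request
        offset: String(offset), // use offset to page through results
        sort_by: sortBy
    });
    return {
        url: `https://api.yelp.com/v3/businesses/search?${params}`,
        headers: { Authorization: `Bearer ${apiKey}`, Accept: 'application/json' }
    };
}

// Send the request and return the businesses array from the response body.
async function searchFusion(apiKey, options) {
    const { url, headers } = buildFusionSearchRequest(apiKey, options);
    const response = await fetch(url, { headers });
    if (!response.ok) {
        throw new Error(`Fusion API request failed with status ${response.status}`);
    }
    const body = await response.json();
    return body.businesses;
}
```

Keeping the request builder separate from the network call makes it easy to log or inspect the exact URL before spending any of your quota.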

Conclusion

Yelp scraping is a powerful capability for anyone working with local business data. Whether you're generating leads, conducting competitive analysis, performing market research, or building location intelligence tools, the depth of data available on Yelp makes it an invaluable source.

For production use cases, the most efficient approach is to combine the official Yelp Fusion API for basic business data with specialized scraping actors from the Apify Store for deeper data like full review text, photos, and business attributes that the API doesn't expose.

By using pre-built Apify actors, you eliminate the maintenance burden of keeping up with Yelp's frequent frontend changes, benefit from built-in proxy rotation and anti-detection measures, and can scale your data collection across hundreds of locations with minimal effort.

Start with a focused use case — perhaps monitoring your own business's competitors in a single city — and expand from there as you discover what insights the data can provide. The combination of structured API access, cloud scraping infrastructure, and thoughtful data analysis can give you a significant advantage in understanding and navigating local markets.

DEV Community