Yelp is one of the most comprehensive sources of local business data on the internet. With over 200 million reviews covering restaurants, shops, services, and more across major cities worldwide, scraping Yelp data opens up opportunities for market research, lead generation, competitive analysis, and location intelligence.
In this guide, we'll explore the structure of Yelp, what data you can extract, how to build scrapers, and how to leverage Apify Store actors to make the process efficient and reliable.
Understanding Yelp's Structure
Yelp organizes its data around several core concepts that are important to understand before you start scraping.
Business Listings
Every business on Yelp has a detailed profile page containing:
- Business name, address, and phone number
- Category classifications (e.g., "Italian Restaurant", "Auto Repair")
- Star rating (1-5, in half-star increments)
- Review count
- Price range ($ to $$$$)
- Hours of operation
- Photos uploaded by users and the business
- Attributes (outdoor seating, delivery, parking, etc.)
- Owner responses to reviews
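Two of these fields have a fixed shape that is worth validating as soon as you scrape them. The helpers below are a minimal sketch; `priceTier` and `isValidRating` are hypothetical names introduced here, not part of any Yelp or Apify library.

```javascript
// Convert a price string such as "$$" into a numeric tier (1-4).
// Returns null when the value is missing or is not one to four dollar signs.
function priceTier(priceRange) {
  return /^\${1,4}$/.test(priceRange || '') ? priceRange.length : null;
}

// True when a rating is between 1 and 5 and lands on a half-star step.
function isValidRating(rating) {
  return typeof rating === 'number'
    && rating >= 1
    && rating <= 5
    && Number.isInteger(rating * 2);
}

console.log(priceTier('$$'));    // 2
console.log(isValidRating(4.5)); // true
console.log(isValidRating(4.3)); // false
```

Rejecting malformed values at this stage keeps later averages and price distributions from being skewed by parsing mistakes.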
Search and Discovery
Yelp's search functionality combines location-based queries with category filters:
- Location search: "Restaurants near San Francisco, CA"
- Category browse: Browse all businesses in a specific category
- Keyword search: Free-text search with location context
- Filter options: Price, distance, open now, ratings, attributes
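As a rough sketch, a keyword-plus-location search maps onto a URL like the one built below. The `find_desc` and `find_loc` parameters are the ones used in the search URL later in this guide; the `start` offset for pagination is an assumption about Yelp's URL scheme and should be checked against a live results page.

```javascript
// Build a Yelp search URL from a keyword and a location.
// `start` (result offset) is an assumed pagination parameter.
function buildSearchUrl({ term, location, start = 0 }) {
  const url = new URL('https://www.yelp.com/search');
  url.searchParams.set('find_desc', term);
  url.searchParams.set('find_loc', location);
  if (start > 0) url.searchParams.set('start', String(start));
  return url.toString();
}

console.log(buildSearchUrl({ term: 'restaurants', location: 'San Francisco, CA' }));
// https://www.yelp.com/search?find_desc=restaurants&find_loc=San+Francisco%2C+CA
```

Using `URL` and `searchParams` rather than string concatenation means spaces, commas, and other special characters in the query are encoded for you.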
Review System
Yelp's review system is the heart of the platform:
- Individual reviews: Star rating, text content, date, author
- Review highlights: AI-extracted key phrases
- Photos within reviews: User-uploaded images tied to reviews
- Reactions: Useful, Funny, Cool counts on each review
- Owner responses: Business owners can reply to reviews
What Data Can You Extract?
Business Profile Data
Here's the structure of data available from a typical Yelp business listing:
const businessData = {
  // Basic Information
  name: "Joe's Pizza",
  url: "https://www.yelp.com/biz/joes-pizza-new-york",
  phone: "+1-212-555-0123",
  address: {
    street: "7 Carmine St",
    city: "New York",
    state: "NY",
    zip: "10014",
    country: "US"
  },
  coordinates: {
    latitude: 40.7306,
    longitude: -74.0023
  },
  // Ratings and Reviews
  rating: 4.5,
  reviewCount: 12847,
  priceRange: "$",
  // Categories
  categories: ["Pizza", "Italian", "Fast Food"],
  // Attributes
  attributes: {
    delivery: true,
    takeout: true,
    outdoorSeating: true,
    parking: "street",
    wifi: "free",
    goodForGroups: true,
    reservations: false,
    wheelchairAccessible: true
  },
  // Hours
  hours: {
    monday: "10:00 AM - 2:00 AM",
    tuesday: "10:00 AM - 2:00 AM",
    // ...
  },
  // Media
  photoCount: 3456,
  photos: ["url1.jpg", "url2.jpg"]
};
Review Data
Each review contains valuable structured and unstructured data:
const reviewData = {
  author: {
    name: "Sarah M.",
    location: "Brooklyn, NY",
    reviewCount: 47,
    photoCount: 123,
    friends: 89,
    elite: true
  },
  rating: 5,
  date: "2026-02-15",
  text: "Best pizza in NYC, hands down. The classic slice is perfection...",
  photos: ["review_photo1.jpg"],
  reactions: {
    useful: 23,
    funny: 5,
    cool: 8
  },
  businessResponse: {
    text: "Thank you Sarah! Come back anytime.",
    date: "2026-02-16"
  }
};
Search Results Data
When scraping search results, you get a summary view of multiple businesses:
const searchResult = {
  query: "best tacos",
  location: "Austin, TX",
  totalResults: 847,
  businesses: [
    {
      rank: 1,
      name: "Taco Joint",
      rating: 4.5,
      reviewCount: 2341,
      priceRange: "$",
      categories: ["Mexican", "Tacos"],
      neighborhood: "East Austin",
      snippet: "Known for their breakfast tacos and...",
      distance: "0.3 mi"
    }
    // ... more results
  ]
};
Building a Yelp Scraper
Using Crawlee for Structured Scraping
Crawlee provides a powerful framework for building Yelp scrapers that handle pagination, retries, and data extraction:
// yelp-scraper.js
const { CheerioCrawler, Dataset } = require('crawlee');

// NOTE: The CSS selectors below are illustrative. Yelp changes its markup
// often, so verify each selector against the live page before relying on it.
const crawler = new CheerioCrawler({
  maxConcurrency: 2,
  sameDomainDelaySecs: 3, // Respect Yelp's servers
  async requestHandler({ request, $ }) {
    if (request.label === 'SEARCH') {
      // Parse search results page
      const businesses = [];
      $('[data-testid="serp-ia-card"]').each((i, el) => {
        const $el = $(el);
        const $link = $el.find('a[href*="/biz/"]').first();
        const href = $link.attr('href');
        if (!href) return; // Skip cards without a business link
        businesses.push({
          name: $link.text().trim(),
          url: new URL(href, 'https://www.yelp.com').href,
          rating: parseFloat($el.find('[aria-label*="star rating"]').attr('aria-label')),
          reviewCount: parseInt($el.find('.reviewCount').text(), 10),
          categories: $el.find('.priceCategory span').map((i, s) => $(s).text()).get(),
        });
      });
      await Dataset.pushData(businesses);
      // Queue each business page so the DETAIL branch below actually runs
      await crawler.addRequests(businesses.map((b) => ({ url: b.url, label: 'DETAIL' })));
      // Follow pagination
      const nextPage = $('a[aria-label="Next"]').attr('href');
      if (nextPage) {
        await crawler.addRequests([{
          url: new URL(nextPage, 'https://www.yelp.com').href,
          label: 'SEARCH'
        }]);
      }
    }
    if (request.label === 'DETAIL') {
      // Parse business detail page
      const businessDetail = {
        name: $('h1').text().trim(),
        rating: parseFloat($('[aria-label*="star rating"]').first().attr('aria-label')),
        reviewCount: parseInt($('[data-testid="review-count"]').text(), 10),
        address: $('address').text().trim(),
        phone: $('[href^="tel:"]').text().trim(),
        categories: $('[data-testid="categories"] a').map((i, el) => $(el).text()).get(),
        priceRange: $('[data-testid="price-range"]').text().trim(),
        website: $('a[href*="biz_redir"]').attr('href'),
        hours: extractHours($),
        photos: $('img[src*="bphoto"]').map((i, el) => $(el).attr('src')).get(),
        url: request.url,
        scrapedAt: new Date().toISOString()
      };
      await Dataset.pushData(businessDetail);
    }
  },
});

// Collect opening hours from the hours table, keyed by day name
function extractHours($) {
  const hours = {};
  $('table.hours-table tr').each((i, row) => {
    const day = $(row).find('th').text().trim();
    const time = $(row).find('td').text().trim();
    hours[day] = time;
  });
  return hours;
}

// Start with a search. The call is wrapped in an async function because
// CommonJS modules do not support top-level await.
(async () => {
  await crawler.run([{
    url: 'https://www.yelp.com/search?find_desc=restaurants&find_loc=San+Francisco%2C+CA',
    label: 'SEARCH'
  }]);
})();
Extracting Reviews
Reviews require special handling since they're often loaded dynamically and paginated:
const { PlaywrightCrawler, Dataset } = require('crawlee');

const reviewCrawler = new PlaywrightCrawler({
  maxConcurrency: 1,
  navigationTimeoutSecs: 60,
  async requestHandler({ page, request }) {
    // Wait for reviews to load
    await page.waitForSelector('[data-testid="review"]', { timeout: 15000 });
    const reviews = await page.$$eval('[data-testid="review"]', (elements) => {
      return elements.map(el => ({
        author: el.querySelector('.user-name')?.textContent?.trim(),
        rating: el.querySelector('[aria-label*="star"]')?.getAttribute('aria-label'),
        date: el.querySelector('.date')?.textContent?.trim(),
        text: el.querySelector('.comment p')?.textContent?.trim(),
        useful: parseInt(el.querySelector('[data-testid="useful"] span')?.textContent || '0', 10),
        funny: parseInt(el.querySelector('[data-testid="funny"] span')?.textContent || '0', 10),
        cool: parseInt(el.querySelector('[data-testid="cool"] span')?.textContent || '0', 10),
      }));
    });
    await Dataset.pushData(reviews);
    // Check for next page of reviews
    const nextButton = await page.$('a.next-link');
    if (nextButton) {
      const nextUrl = await nextButton.getAttribute('href');
      await reviewCrawler.addRequests([{
        url: 'https://www.yelp.com' + nextUrl,
        label: 'REVIEWS'
      }]);
    }
  },
});
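The review crawler stores `rating` as the raw `aria-label` text and `date` as displayed text, while the analysis functions later in this guide expect a numeric rating. One possible normalization step is sketched below. `normalizeReview` is a hypothetical helper, and it assumes the label starts with the number (for example "5 star rating") and that the date text is something JavaScript's `Date` can parse.

```javascript
// Convert a raw scraped review into typed fields.
// Assumes raw.rating looks like "5 star rating" and raw.date is parseable
// by Date (for example "Feb 15, 2026" or "2/15/2026").
function normalizeReview(raw) {
  const rating = parseFloat(raw.rating); // missing or malformed -> NaN
  return {
    ...raw,
    rating: Number.isNaN(rating) ? null : rating,
    date: toIsoDate(raw.date),
  };
}

// Return a YYYY-MM-DD string, or null when the text cannot be parsed.
function toIsoDate(text) {
  if (/^\d{4}-\d{2}-\d{2}$/.test(text || '')) return text; // already ISO
  const parsed = new Date(text);
  if (Number.isNaN(parsed.getTime())) return null;
  const pad = (n) => String(n).padStart(2, '0');
  // Use local date parts so the calendar day does not shift with the time zone
  return `${parsed.getFullYear()}-${pad(parsed.getMonth() + 1)}-${pad(parsed.getDate())}`;
}

console.log(normalizeReview({ rating: '5 star rating', date: 'Feb 15, 2026', text: 'Great' }));
```

Returning `null` for unparseable values, rather than `NaN` or an invalid date, makes it easier to filter incomplete reviews before computing averages.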
Location-Based Scraping
One of Yelp's most powerful features is location-based search. Here's how to scrape businesses across multiple locations:
// Assumes scrapeSearchResults(url) is a helper you implement (for example, a
// wrapper around the crawler above) that resolves to an array of businesses.
async function scrapeMultipleLocations(category, locations) {
  const allResults = [];
  for (const location of locations) {
    const searchUrl = `https://www.yelp.com/search?find_desc=${encodeURIComponent(category)}&find_loc=${encodeURIComponent(location)}`;
    console.log(`Scraping ${category} in ${location}...`);
    const results = await scrapeSearchResults(searchUrl);
    allResults.push(...results.map(r => ({ ...r, searchLocation: location })));
    // Respectful delay between location searches
    await new Promise(resolve => setTimeout(resolve, 5000));
  }
  return allResults;
}

// Usage
const locations = [
  'New York, NY',
  'Los Angeles, CA',
  'Chicago, IL',
  'Houston, TX',
  'Phoenix, AZ'
];

// Wrapped in an async function because CommonJS has no top-level await
(async () => {
  const restaurants = await scrapeMultipleLocations('restaurants', locations);
  console.log(`Found ${restaurants.length} restaurants across ${locations.length} cities`);
})();
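When search areas overlap (neighboring cities, for instance), the same business can appear in more than one result set. A minimal de-duplication sketch follows; `dedupeByUrl` is a hypothetical helper, and it assumes every result carries the `url` field produced by the search parser earlier in this guide.

```javascript
// Keep the first occurrence of each business, keyed by its profile URL.
function dedupeByUrl(businesses) {
  const seen = new Map();
  for (const biz of businesses) {
    // Strip the query string and fragment so tracking parameters
    // do not make two links to the same page look different
    const key = biz.url.split(/[?#]/)[0];
    if (!seen.has(key)) seen.set(key, biz);
  }
  return [...seen.values()];
}

const sample = [
  { name: 'Taco Joint', url: 'https://www.yelp.com/biz/taco-joint-austin?osq=tacos' },
  { name: 'Taco Joint', url: 'https://www.yelp.com/biz/taco-joint-austin' },
  { name: 'Other Place', url: 'https://www.yelp.com/biz/other-place-austin' },
];
console.log(dedupeByUrl(sample).length); // 2
```

Keeping the first occurrence preserves the `searchLocation` of the search that found the business first; keep a list of locations instead if you need every match.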
Using Apify Store Actors for Yelp Scraping
Building and maintaining your own Yelp scraper can be time-consuming, especially as Yelp frequently updates its frontend. Pre-built actors from the Apify Store solve this by providing maintained, reliable scraping solutions.
Benefits of Using Apify Actors
- No maintenance burden: Actor developers update their code when Yelp changes its website
- Built-in proxy rotation: Automatic residential proxy rotation to avoid blocks
- Structured output: Clean JSON data ready for analysis
- Scalability: Run across hundreds of locations simultaneously
- Scheduling: Set up daily or weekly automated runs
- Webhooks and integrations: Push data to your systems automatically
Running a Yelp Actor via the Apify API
const { ApifyClient } = require('apify-client');

const client = new ApifyClient({
  token: 'YOUR_APIFY_TOKEN',
});

// Run a Yelp scraper actor.
// 'actor-name/yelp-scraper' is a placeholder, and the input fields below depend
// on the actor you choose, so check its input schema in the Apify Store.
async function scrapeYelp(searchTerms, locations) {
  const run = await client.actor('actor-name/yelp-scraper').call({
    searchTerms: searchTerms,
    locations: locations,
    maxResults: 100,
    includeReviews: true,
    maxReviews: 20,
    proxy: {
      useApifyProxy: true,
      apifyProxyGroups: ['RESIDENTIAL']
    }
  });
  // Fetch results
  const { items } = await client.dataset(run.defaultDatasetId).listItems();
  return items;
}

// Example usage (wrapped because CommonJS has no top-level await)
(async () => {
  const data = await scrapeYelp(
    ['plumbers', 'electricians'],
    ['Denver, CO', 'Boulder, CO']
  );
  console.log(`Found ${data.length} businesses`);
  data.forEach(biz => {
    console.log(`${biz.name} - ${biz.rating}/5 (${biz.reviewCount} reviews) - ${biz.phone}`);
  });
})();
Scheduling Regular Scraping Runs
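// Create a scheduled task for weekly Yelp monitoring.
// The Apify API takes runInput as a serialized body plus a content type,
// and the input fields themselves depend on the actor you schedule.
async function createWeeklySchedule() {
  return client.schedules().create({
    name: 'weekly-competitor-monitoring',
    cronExpression: '0 6 * * MON', // Every Monday at 6 AM
    isEnabled: true,
    isExclusive: true, // Do not start a run while the previous one is still running
    actions: [{
      type: 'RUN_ACTOR',
      actorId: 'your-yelp-actor-id',
      runInput: {
        contentType: 'application/json; charset=utf-8',
        body: JSON.stringify({
          searchTerms: ['your-business-category'],
          locations: ['your-city'],
          maxResults: 200,
          includeReviews: true
        })
      }
    }]
  });
}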
Practical Use Cases
1. Lead Generation for Local Services
Extract business contact information for sales outreach:
function generateLeads(businesses, criteria) {
  return businesses
    .filter(b => b.rating >= criteria.minRating)
    .filter(b => b.reviewCount >= criteria.minReviews)
    .filter(b => !b.website || criteria.includeWithWebsite)
    .map(b => ({
      businessName: b.name,
      phone: b.phone,
      address: b.address,
      category: b.categories.join(', '),
      rating: b.rating,
      reviews: b.reviewCount,
      hasWebsite: !!b.website,
      opportunity: !b.website ? 'Needs website' :
        b.rating < 3 ? 'Reputation management' :
        'General outreach'
    }));
}

// Find restaurants without websites (potential web design leads)
const leads = generateLeads(scrapedBusinesses, {
  minRating: 3.5,
  minReviews: 10,
  includeWithWebsite: false
});
2. Competitive Analysis Dashboard
Build a monitoring system to track your competitors:
// Assumes scrapeBusinessDetail, scrapeReviews, calculateTrend, extractComplaints,
// and extractPraise are helper functions you supply.
async function competitorDashboard(competitorUrls) {
  const competitors = [];
  for (const url of competitorUrls) {
    const data = await scrapeBusinessDetail(url);
    const recentReviews = await scrapeReviews(url, { limit: 20 });
    competitors.push({
      name: data.name,
      currentRating: data.rating,
      totalReviews: data.reviewCount,
      recentSentiment: calculateSentiment(recentReviews),
      // Guard against division by zero when a business has no recent reviews
      avgRecentRating: recentReviews.length
        ? recentReviews.reduce((s, r) => s + r.rating, 0) / recentReviews.length
        : null,
      reviewTrend: calculateTrend(recentReviews),
      commonComplaints: extractComplaints(recentReviews),
      commonPraise: extractPraise(recentReviews)
    });
  }
  return competitors;
}

function calculateSentiment(reviews) {
  if (reviews.length === 0) {
    return { positive: '0.0%', negative: '0.0%', balance: 'no reviews' };
  }
  const positive = reviews.filter(r => r.rating >= 4).length;
  const negative = reviews.filter(r => r.rating <= 2).length;
  return {
    positive: (positive / reviews.length * 100).toFixed(1) + '%',
    negative: (negative / reviews.length * 100).toFixed(1) + '%',
    // Compares counts only; it says nothing about change over time
    balance: positive > negative ? 'mostly positive' :
      positive < negative ? 'mostly negative' : 'mixed'
  };
}
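`competitorDashboard` calls `calculateTrend`, which this guide leaves to the reader. One possible sketch is shown below: it sorts reviews by date, splits them into an older and a newer half, and compares the average ratings. The 0.25-star threshold is an arbitrary assumption, and the function expects each review to have a numeric `rating` and a parseable `date`.

```javascript
// Compare the average rating of the newer half of the reviews with the older half.
// `threshold` (in stars) is an assumed cut-off for calling a change meaningful.
function calculateTrend(reviews, threshold = 0.25) {
  if (reviews.length < 2) return 'insufficient data';
  // Copy before sorting so the caller's array is left untouched
  const sorted = [...reviews].sort((a, b) => new Date(a.date) - new Date(b.date));
  const mid = Math.floor(sorted.length / 2);
  const average = (list) => list.reduce((sum, r) => sum + r.rating, 0) / list.length;
  const delta = average(sorted.slice(mid)) - average(sorted.slice(0, mid));
  if (delta > threshold) return 'improving';
  if (delta < -threshold) return 'declining';
  return 'stable';
}

console.log(calculateTrend([
  { rating: 2, date: '2026-01-05' },
  { rating: 3, date: '2026-01-20' },
  { rating: 5, date: '2026-02-10' },
  { rating: 4, date: '2026-02-15' },
])); // improving
```

With only 20 recent reviews per business the result is noisy, so treat it as a prompt to read the reviews rather than as a measurement.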
3. Market Research and Location Intelligence
Analyze business density and competition across geographic areas:
// Assumes scrapeYelpSearch(category, city) is a helper you supply that
// resolves to an array of business objects.
async function marketAnalysis(category, cities) {
  const analysis = {};
  for (const city of cities) {
    const businesses = await scrapeYelpSearch(category, city);
    const count = businesses.length;
    analysis[city] = {
      totalBusinesses: count,
      // Guard against division by zero when a search returns nothing
      avgRating: count
        ? (businesses.reduce((s, b) => s + b.rating, 0) / count).toFixed(2)
        : null,
      avgReviews: count
        ? Math.round(businesses.reduce((s, b) => s + b.reviewCount, 0) / count)
        : null,
      priceDistribution: {
        budget: businesses.filter(b => b.priceRange === '$').length,
        moderate: businesses.filter(b => b.priceRange === '$$').length,
        upscale: businesses.filter(b => b.priceRange === '$$$').length,
        luxury: businesses.filter(b => b.priceRange === '$$$$').length,
      },
      // Copy before sorting so the original result order is preserved
      topRated: [...businesses]
        .sort((a, b) => b.rating - a.rating || b.reviewCount - a.reviewCount)
        .slice(0, 5)
        .map(b => `${b.name} (${b.rating}/5, ${b.reviewCount} reviews)`),
      saturation: count > 100 ? 'High' :
        count > 50 ? 'Medium' : 'Low'
    };
  }
  return analysis;
}
4. Review Mining for Product Development
Extract insights from reviews to inform product or service development:
// Assumes extractFrequentTerms(texts) is a helper you supply that returns
// terms ordered from most to least frequent.
function mineReviews(reviews) {
  // Group reviews by rating
  const byRating = {};
  for (let i = 1; i <= 5; i++) {
    byRating[i] = reviews.filter(r => r.rating === i);
  }
  // Extract frequently mentioned terms in negative reviews
  const negativeTerms = extractFrequentTerms(
    byRating[1].concat(byRating[2]).map(r => r.text)
  );
  // Extract what people love from positive reviews
  const positiveTerms = extractFrequentTerms(
    byRating[4].concat(byRating[5]).map(r => r.text)
  );
  // Find reviews mentioning specific topics
  const topicAnalysis = {
    service: reviews.filter(r => /service|staff|waiter|server/i.test(r.text)),
    food: reviews.filter(r => /food|taste|flavor|dish|meal/i.test(r.text)),
    ambiance: reviews.filter(r => /ambiance|atmosphere|decor|vibe/i.test(r.text)),
    value: reviews.filter(r => /price|expensive|cheap|worth|value/i.test(r.text)),
    wait: reviews.filter(r => /wait|slow|fast|quick|time/i.test(r.text)),
  };
  return {
    totalAnalyzed: reviews.length,
    ratingDistribution: Object.fromEntries(
      Object.entries(byRating).map(([k, v]) => [k, v.length])
    ),
    topComplaints: negativeTerms.slice(0, 15),
    topPraise: positiveTerms.slice(0, 15),
    topicBreakdown: Object.fromEntries(
      Object.entries(topicAnalysis).map(([topic, revs]) => [
        topic,
        {
          mentions: revs.length,
          // Avoid dividing by zero when no review mentions the topic
          avgRating: revs.length
            ? (revs.reduce((s, r) => s + r.rating, 0) / revs.length).toFixed(1)
            : null
        }
      ])
    )
  };
}
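`mineReviews` relies on `extractFrequentTerms`, which is not defined in this guide. A simple word-count version is sketched below as one way to fill that gap. The stop-word list is a small illustrative sample, and the `{ term, count }` return shape is an assumption; a production pipeline would more likely use a proper tokenizer or a keyword-extraction library.

```javascript
// A short, illustrative stop-word list; extend it for real use.
const STOP_WORDS = new Set([
  'the', 'and', 'but', 'was', 'were', 'are', 'this', 'that', 'with',
  'for', 'had', 'have', 'has', 'very', 'not', 'they', 'our', 'you',
]);

// Count word frequencies across review texts, most frequent first.
function extractFrequentTerms(texts, minLength = 3) {
  const counts = new Map();
  for (const text of texts) {
    const words = (text || '').toLowerCase().match(/[a-z']+/g) || [];
    for (const word of words) {
      if (word.length < minLength || STOP_WORDS.has(word)) continue;
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([term, count]) => ({ term, count }));
}

console.log(extractFrequentTerms([
  'The pizza was cold',
  'Cold pizza, slow service',
  'Cold again',
]).slice(0, 2));
// [ { term: 'cold', count: 3 }, { term: 'pizza', count: 2 } ]
```

Single-word counts miss phrases such as "long wait", so counting two-word sequences is a natural next step if the output is too generic.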
Data Export and Integration
Exporting to Multiple Formats
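const { stringify } = require('csv-stringify/sync');
const fs = require('fs');

// Address may be a plain string or an object like { street, city, state, zip }
const streetOf = (b) => (typeof b.address === 'string' ? b.address : b.address?.street);

// Export businesses to CSV
function exportBusinessesCSV(businesses, filename) {
  // Flatten nested fields so each CSV column maps to a top-level value
  const rows = businesses.map(b => ({
    name: b.name, rating: b.rating, reviewCount: b.reviewCount, priceRange: b.priceRange,
    phone: b.phone, address: streetOf(b),
    city: b.address?.city, state: b.address?.state, zip: b.address?.zip,
    categories: (b.categories || []).join('; '), website: b.website,
    latitude: b.coordinates?.latitude, longitude: b.coordinates?.longitude
  }));
  const csv = stringify(rows, {
    header: true,
    columns: [
      'name', 'rating', 'reviewCount', 'priceRange',
      'phone', 'address', 'city', 'state', 'zip',
      'categories', 'website', 'latitude', 'longitude'
    ]
  });
  fs.writeFileSync(filename, csv);
  console.log(`Exported ${businesses.length} businesses to ${filename}`);
}

// Export to Google Sheets via API
async function exportToGoogleSheets(businesses, spreadsheetId) {
  const { google } = require('googleapis');
  // Assumes Application Default Credentials (for example, a service account)
  // with edit access to the target spreadsheet
  const auth = new google.auth.GoogleAuth({
    scopes: ['https://www.googleapis.com/auth/spreadsheets']
  });
  const sheets = google.sheets({ version: 'v4', auth });
  const rows = businesses.map(b => [
    b.name, b.rating, b.reviewCount, b.priceRange,
    b.phone, streetOf(b), (b.categories || []).join('; ')
  ]);
  await sheets.spreadsheets.values.update({
    spreadsheetId,
    range: 'Sheet1!A2',
    valueInputOption: 'RAW',
    requestBody: { values: rows }
  });
}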
Webhook Integration for Real-Time Updates
// Send scraped data to your application via webhook
async function notifyWebhook(data, webhookUrl) {
  const response = await fetch(webhookUrl, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-Source': 'yelp-scraper'
    },
    body: JSON.stringify({
      businesses: data,
      scrapedAt: new Date().toISOString(),
      count: data.length
    })
  });
  if (!response.ok) {
    console.error(`Webhook failed: ${response.status}`);
  }
}
Legal and Ethical Considerations
Yelp scraping comes with important legal and ethical responsibilities:
- Review Yelp's Terms of Service: Yelp's ToS restricts automated access. Understand the legal landscape before scraping.
- Respect robots.txt: Check and follow Yelp's robots.txt directives.
- Rate limiting: Never overwhelm Yelp's servers. Use generous delays between requests (3-5 seconds minimum).
- Data privacy: Review text and reviewer profiles contain personal information. Handle this data responsibly under GDPR, CCPA, and other privacy laws.
- Commercial use: If using scraped data commercially, ensure your use case complies with applicable laws.
- Consider the Yelp Fusion API: Yelp offers an official API (Yelp Fusion) that provides structured access to business data. For many use cases, this is the preferred approach.
- Attribution: If displaying Yelp data publicly, provide appropriate attribution.
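The rate-limiting advice above can be sketched as a small helper that runs requests one at a time with a randomized pause between them. `randomDelayMs` and `runPolitely` are hypothetical names, and the 3-5 second defaults simply mirror the range suggested in the list.

```javascript
// Pick a random delay between minMs and maxMs (inclusive) so that
// requests are not evenly spaced.
function randomDelayMs(minMs = 3000, maxMs = 5000) {
  return minMs + Math.floor(Math.random() * (maxMs - minMs + 1));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run async task functions sequentially, pausing between consecutive tasks.
async function runPolitely(tasks, { minMs = 3000, maxMs = 5000 } = {}) {
  const results = [];
  for (let i = 0; i < tasks.length; i++) {
    results.push(await tasks[i]());
    // No pause is needed after the final task
    if (i < tasks.length - 1) await sleep(randomDelayMs(minMs, maxMs));
  }
  return results;
}

// Usage: each task is a function returning a promise
runPolitely(
  [async () => 'page 1', async () => 'page 2'],
  { minMs: 10, maxMs: 20 } // short delays for demonstration only
).then((pages) => console.log(pages));
```

Crawlee's own settings, such as the `maxConcurrency` and `sameDomainDelaySecs` options used earlier, cover the same ground inside a crawler; a helper like this is mainly useful for ad hoc scripts.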
Yelp Fusion API as an Alternative
Before scraping, consider whether the official Yelp Fusion API meets your needs:
const yelp = require('yelp-fusion');
const client = yelp.client('YOUR_API_KEY');

// Wrapped in an async function because CommonJS has no top-level await
async function searchPizza() {
  // Search for businesses
  const response = await client.search({
    term: 'pizza',
    location: 'New York, NY',
    limit: 50,
    sort_by: 'rating'
  });
  // Access business details
  const businesses = response.jsonBody.businesses;
  businesses.forEach(biz => {
    console.log(`${biz.name}: ${biz.rating}/5 (${biz.review_count} reviews)`);
  });
}

searchPizza().catch(console.error);
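The Fusion API has offered up to 5,000 API calls per day for free, which may be sufficient for smaller projects. Yelp's plans and quotas change over time, so confirm the current limits in its developer documentation before building around this figure.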
Conclusion
Yelp scraping is a powerful capability for anyone working with local business data. Whether you're generating leads, conducting competitive analysis, performing market research, or building location intelligence tools, the depth of data available on Yelp makes it an invaluable source.
For production use cases, the most efficient approach is to combine the official Yelp Fusion API for basic business data with specialized scraping actors from the Apify Store for deeper data like full review text, photos, and business attributes that the API doesn't expose.
By using pre-built Apify actors, you eliminate the maintenance burden of keeping up with Yelp's frequent frontend changes, benefit from built-in proxy rotation and anti-detection measures, and can scale your data collection across hundreds of locations with minimal effort.
Start with a focused use case — perhaps monitoring your own business's competitors in a single city — and expand from there as you discover what insights the data can provide. The combination of structured API access, cloud scraping infrastructure, and thoughtful data analysis can give you a significant advantage in understanding and navigating local markets.