AlterLab

Posted on Jun 24 • Originally published at alterlab.io

How to Scrape Yelp Data: Complete Guide for 2026

#javascript #headlessbrowsers #antibot #automation

This guide covers extracting publicly accessible data. Always review a site's robots.txt and Terms of Service before scraping.

## TL;DR

To scrape Yelp with Python, use AlterLab’s API to render JavaScript, extract public business details via CSS selectors, and respect rate limits. A single request returns clean HTML you can parse with BeautifulSoup or lxml.

## Why collect local data from Yelp?

Yelp hosts a wealth of public business information useful for several engineering workflows:

- **Market research**: Track competitor listings, review counts, and rating trends across categories.
- **Price monitoring**: Extract menu items or service prices from restaurant and salon pages for dynamic pricing models.
- **Data enrichment**: Augment internal databases with business hours, location coordinates, and category tags for local search features.

These use cases rely solely on data visible on public pages; no login or private data is required.

## Technical challenges

Yelp’s modern site presents three core obstacles for scrapers:

- **JavaScript‑heavy rendering**: Business details load client‑side, so a plain `requests.get` returns a mostly empty container.
- **Rate limiting & IP bans**: Exceeding a modest request threshold triggers temporary blocks or CAPTCHAs.
- **Bot detection signals**: The server checks for typical automation signatures, such as a missing or default User‑Agent header or a TLS fingerprint that does not match a real browser.

Raw HTTP clients fail because they cannot execute the JavaScript that drives the page’s React hydration. AlterLab’s Smart Rendering API addresses this by launching a headless browser, applying rotating proxies, and waiting for network idle before returning the fully rendered DOM.

## Quick start with AlterLab API

First, install the official Python SDK (see the Getting started guide for full setup). Then authenticate and scrape a public Yelp page.

```python title="scrape_yelp-com.py" {3-5}
import alterlab  # AlterLab Python SDK

client = alterlab.Client("YOUR_API_KEY")

# Target a public business page – no login required
response = client.scrape(
    url="https://www.yelp.com/biz/example-restaurant-san-francisco",
    params={"render": True, "wait_for": "networkidle"},
)
print(response.status_code)  # 200 if successful
html = response.text
```

The equivalent cURL request looks like this:



```bash title="Terminal"
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.yelp.com/biz/example-restaurant-san-francisco",
    "render": true,
    "wait_for": "networkidle"
  }'
```

Both examples ask AlterLab to render the page (render: true) and wait until network activity settles, ensuring the business name, rating, and address are present in the returned HTML.

## Extracting structured data

Once you have the HTML, use a parser to pull the fields you need. Below are CSS selectors for common public data points on a Yelp business page (as of 2026). Adjust if the class names change.

```python title="parse_yelp.py" {4-10}
from bs4 import BeautifulSoup

# `html` is the rendered page returned by the scrape request above
soup = BeautifulSoup(html, "html.parser")

# Business name – typically in an h1 with a specific data-testid attribute
name_tag = soup.select_one('h1[data-testid="business-name"]')
business_name = name_tag.get_text(strip=True) if name_tag else None

# Rating – often stored in a div with aria-label
rating_tag = soup.select_one('div[role="img"][aria-label*="star rating"]')
rating = rating_tag["aria-label"].split()[0] if rating_tag else None

# Review count – adjacent to the rating
review_tag = soup.select_one('p[class*="review-count"]')
review_count = review_tag.get_text(strip=True).split()[0] if review_tag else None

# Address – first line of the address block
address_tag = soup.select_one('address p')
address = address_tag.get_text(strip=True) if address_tag else None

print({
    "business_name": business_name,
    "rating": rating,
    "review_count": review_count,
    "address": address,
})
```

If you prefer JSON‑style extraction, AlterLab can return structured data directly via its Cortex AI add‑on, but the CSS approach works for pure HTML output.
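
The selectors above return every field as a string, so a small normalization step is usually needed before storing the data. The following is a minimal sketch, not part of AlterLab's SDK: the helper names (`parse_rating`, `parse_review_count`, `normalize_record`) are hypothetical, and it assumes ratings look like `4.5` and review counts are plain digits with optional thousands separators. Abbreviated counts such as `1.2k` are deliberately returned as `None` rather than guessed.

```python
import re
from typing import Optional


def parse_rating(text: Optional[str]) -> Optional[float]:
    """Return the first number found in `text` as a float, or None."""
    if not text:
        return None
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


def parse_review_count(text: Optional[str]) -> Optional[int]:
    """Convert a count such as "1,234" or "(87" to an int, or None."""
    if not text:
        return None
    # Accept digits with optional commas and optional surrounding parentheses.
    match = re.fullmatch(r"\(?(\d[\d,]*)\)?", text.strip())
    return int(match.group(1).replace(",", "")) if match else None


def normalize_record(record: dict) -> dict:
    """Return a copy of the scraped record with numeric fields typed."""
    cleaned = dict(record)
    cleaned["rating"] = parse_rating(record.get("rating"))
    cleaned["review_count"] = parse_review_count(record.get("review_count"))
    return cleaned
```

Passing the dictionary printed by `parse_yelp.py` through `normalize_record` leaves `business_name` and `address` untouched and converts the two numeric fields, which makes later filtering and aggregation simpler.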

## Best practices
Scraping responsibly keeps your pipelines running smoothly and respects the target site:
- **Rate limit yourself**: Even with AlterLab’s proxy pool, send no more than 2–3 requests per second per IP to avoid triggering Yelp’s anti‑bot thresholds.
- **Honor robots.txt**: Check `https://www.yelp.com/robots.txt` for disallowed paths (e.g., `/ajax/*`, `/user/*`). Stick to `/biz/*` and `/search/*` for public data.
- **Handle dynamic content**: Use AlterLab’s `wait_for` parameter (`networkidle` or a specific selector) to ensure the DOM is ready before extracting.
- **Rotate user‑agents**: Though AlterLab does this automatically, if you build a custom scraper, rotate a list of realistic browser strings.
- **Log failures**: Capture HTTP 429 or 503 responses and implement exponential backoff.
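
The robots.txt check in the list above can be automated with Python's standard-library `urllib.robotparser`. This is a sketch under stated assumptions: `SAMPLE_RULES` is a made-up rule set for illustration, not Yelp's actual robots.txt, and `build_robots_checker` is a hypothetical helper name. Note that the standard-library parser matches path prefixes and may not interpret wildcard patterns the way some crawlers do.

```python
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_lines, user_agent="*"):
    """Return a function that reports whether a URL may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules supplied as a list of lines

    def is_allowed(url):
        return parser.can_fetch(user_agent, url)

    return is_allowed


# Hypothetical rules for illustration only; fetch the live file in practice.
SAMPLE_RULES = [
    "User-agent: *",
    "Disallow: /ajax/",
    "Disallow: /user/",
]

is_allowed = build_robots_checker(SAMPLE_RULES)
print(is_allowed("https://www.yelp.com/biz/example-restaurant-san-francisco"))
print(is_allowed("https://www.yelp.com/user/details"))
```

To check against the live file instead, `RobotFileParser` also offers `set_url()` and `read()`, which download and parse the rules in one step.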

Following these rules reduces the chance of temporary bans and keeps your data fresh.
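
One way to implement the logging and backoff rule is sketched below. It is an illustration rather than AlterLab's own retry logic: `fetch` is a hypothetical callable you supply that returns a `(status_code, body)` tuple, and the default base delay, cap, and retry count are assumptions to tune for your workload.

```python
import logging
import random
import time
from typing import Callable, Tuple

logger = logging.getLogger("yelp_scraper")

RETRYABLE_STATUSES = {429, 503}


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based), capped at `cap`."""
    return min(cap, base * (2 ** attempt))


def fetch_with_backoff(
    fetch: Callable[[str], Tuple[int, str]],
    url: str,
    max_retries: int = 4,
    base: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call fetch(url), retrying on 429/503 with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        status, body = fetch(url)
        if status not in RETRYABLE_STATUSES:
            return status, body
        if attempt < max_retries:
            delay = backoff_delay(attempt, base=base)
            # Jitter keeps parallel workers from retrying in lockstep.
            delay += random.uniform(0, delay / 2)
            logger.warning("HTTP %s for %s; retrying in %.1fs", status, url, delay)
            sleep(delay)
    # Retries exhausted: return the last retryable response to the caller.
    return status, body
```

The `sleep` parameter is injectable so the retry schedule can be exercised in tests without actually waiting.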

## Scaling up
When you need to scrape hundreds or thousands of Yelp pages, consider these patterns:
- **Batch requests**: Send multiple URLs in a single API call using AlterLab’s `batch` endpoint (up to 20 URLs per request) to cut connection overhead.
- **Scheduling**: Use the platform’s cron feature to run a nightly scrape of a changing dataset (e.g., new restaurant openings).
- **Cost awareness**: Review the [pricing](/pricing) page to estimate monthly spend based on your request volume and rendering tier. AlterLab’s pay‑as‑you‑go model means you only pay for successful scrapes.
- **Storage**: Stream results directly to a data warehouse or object store; avoid holding large HTML strings in memory longer than necessary.

A typical scaling workflow might look like:

<div data-infographic="steps">
  <div data-step data-number="1" data-title="Prepare URL list" data-description="Generate Yelp biz URLs from a CSV of categories or zip codes."/>
  <div data-step data-number="2" data-title="Batch scrape" data-description="Send groups of 20 URLs to AlterLab with render:true."/>
  <div data-step data-number="3" data-title="Parse & store" data-description="Extract name, rating, address; insert into Postgres or BigQuery."/>
  <div data-step data-number="4" data-title="Handle errors" data-description="Retry failed items with a longer backoff; alert on persistent 429s."/>
</div>
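
Steps 1, 2, and 4 of this workflow can be sketched in a few lines of Python. The example assumes a hypothetical `scrape_batch` callable that takes a list of URLs and returns one `(status_code, html)` tuple per URL in the same order. The actual request and response format of AlterLab's `batch` endpoint is not shown in this article, so wiring it in is left to that callable.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

BATCH_SIZE = 20  # per-request URL limit quoted above


def chunked(urls: Iterable[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` URLs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[str] = []
    for url in urls:
        batch.append(url)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch


def run_batches(
    urls: Iterable[str],
    scrape_batch: Callable[[List[str]], List[Tuple[int, str]]],
    size: int = BATCH_SIZE,
) -> Tuple[Dict[str, str], List[str]]:
    """Scrape URLs in batches; return (html by URL, URLs that need a retry)."""
    results: Dict[str, str] = {}
    failed: List[str] = []
    for batch in chunked(urls, size):
        for url, (status, html) in zip(batch, scrape_batch(batch)):
            if status == 200:
                results[url] = html
            else:
                failed.append(url)  # retry later with a longer backoff
    return results, failed
```

Keeping the failed URLs in a separate list makes step 4 straightforward: feed `failed` back through `run_batches` after a delay, and alert if the same URLs keep returning 429.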

## Key takeaways
- Use AlterLab’s headless browser rendering to handle Yelp’s JavaScript‑rendered pages and anti‑bot measures.
- Extract only publicly visible fields with reliable CSS selectors; avoid scraping behind login walls.
- Apply polite rate limits, respect robots.txt, and log errors to maintain a sustainable scraper.
- Leverage batching and scheduling to scale efficiently while monitoring cost via AlterLab’s pricing page.

Hit reply if you have questions.
