How to Scrape Yelp with Python (Step-by-Step Guide 2026)
In 2026, the world of web scraping has evolved dramatically. With more businesses relying on platforms like Yelp for reviews, listings, and customer insights, the ability to extract this data programmatically has become a valuable skill. Whether you're a developer building a competitive analysis tool, a data scientist gathering market trends, or a small business owner optimizing your Yelp presence, scraping Yelp with Python can unlock a treasure trove of information. However, modern websites like Yelp are increasingly complex, using JavaScript to dynamically load content and sophisticated anti-scraping measures. This tutorial will guide you through the process of scraping Yelp with Python in 2026, covering everything from setup to advanced techniques.
Step 1: Setting Up Your Python Environment
Start by creating a new Python file (e.g., yelp_scraper.py) and importing the necessary libraries:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
Warning: Selenium requires a compatible browser driver (e.g., ChromeDriver). Selenium 4.6+ can download a matching driver automatically via Selenium Manager; on older versions, make sure your browser and driver versions match and keep both updated.
Step 2: Parsing HTML with BeautifulSoup
Once you have the HTML content, use BeautifulSoup to extract data. Here's how to extract restaurant names, ratings, and addresses (the class names below are illustrative — Yelp obfuscates its real class names and changes them frequently, so inspect the live page first):
soup = BeautifulSoup(html_content, "lxml")
businesses = soup.find_all("div", class_="business-identifier")

data = []
for business in businesses:
    name = business.find("h2", class_="business-name").text.strip()
    rating = business.find("div", class_="i-stars")
    if rating:
        rating = rating["aria-label"].split(" ")[0]
    else:
        rating = "N/A"
    address = business.find("span", class_="address").text.strip()
    data.append({
        "Name": name,
        "Rating": rating,
        "Address": address,
    })

# Convert to a pandas DataFrame
df = pd.DataFrame(data)
print(df.head())
Best Practice: Use `lxml` as the parser for faster performance. If `lxml` is unavailable, the built-in `html.parser` is a fallback option.
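Because the class names above are placeholders, the extraction pattern is easier to verify against a static snippet. This self-contained sketch uses the same hypothetical selectors on inline HTML, with the built-in `html.parser` so it runs without `lxml`:

```python
from bs4 import BeautifulSoup

# Static HTML using the hypothetical class names from the step above
sample = """
<div class="business-identifier">
  <h2 class="business-name">Golden Gate Grill</h2>
  <div class="i-stars" aria-label="4.5 star rating"></div>
  <span class="address">123 Market St</span>
</div>
"""

soup = BeautifulSoup(sample, "html.parser")
rows = []
for card in soup.find_all("div", class_="business-identifier"):
    rating_tag = card.find("div", class_="i-stars")
    rows.append({
        "Name": card.find("h2", class_="business-name").text.strip(),
        "Rating": rating_tag["aria-label"].split(" ")[0] if rating_tag else "N/A",
        "Address": card.find("span", class_="address").text.strip(),
    })

print(rows)  # [{'Name': 'Golden Gate Grill', 'Rating': '4.5', 'Address': '123 Market St'}]
```

Testing your selectors against a saved copy of the page like this is much faster than re-fetching Yelp on every run.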
Step 3: Scraping Multiple Pages and Pagination
Yelp search results are paginated. To scrape all pages, extract the "Next" button's URL and loop through each page:
base_url = "https://www.yelp.com/search?find_desc=Restaurants&find_loc=San+Francisco"
page_number = 0  # Yelp's "start" offset is zero-based: page 0 -> start=0, page 1 -> start=10
all_data = []

while True:
    url = f"{base_url}&start={page_number * 10}"
    response = requests.get(url, headers=headers)
    if response.status_code != 200:
        break  # Stop if the page is not found

    soup = BeautifulSoup(response.text, "lxml")
    businesses = soup.find_all("div", class_="business-identifier")
    for business in businesses:
        # Extract data as before and append to all_data...
        pass

    # Check for a "Next" button; stop when there isn't one
    next_button = soup.find("a", class_="next")
    if not next_button:
        break
    page_number += 1

# Save all_data to a DataFrame or file
Best Practice: Add a `time.sleep(2)` between requests to avoid being rate-limited. Use proxies or user-agent rotation for large-scale scraping.
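The offset arithmetic in the loop above can be sanity-checked in isolation. A small sketch, where `per_page=10` is an assumption based on Yelp showing ten results per page:

```python
base_url = "https://www.yelp.com/search?find_desc=Restaurants&find_loc=San+Francisco"

def page_url(page_number: int, per_page: int = 10) -> str:
    # Yelp's "start" parameter is a zero-based result offset:
    # page 0 -> start=0, page 1 -> start=10, page 2 -> start=20, ...
    return f"{base_url}&start={page_number * per_page}"

urls = [page_url(n) for n in range(3)]
print(urls[-1])  # ...&start=20
```

Starting the offset at 0 matters: beginning at `start=10` silently skips the first ten results.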
Step 4: Advanced Techniques (Optional)
Using Proxies
To avoid IP blocking, route requests through a paid proxy service such as Bright Data (formerly Luminati). Example with `requests`:
proxies = {
    "http": "http://username:password@proxy_ip:port",
    "https": "http://username:password@proxy_ip:port",
}
response = requests.get(url, headers=headers, proxies=proxies)
Rotating User-Agents
Use a list of user-agents and rotate them for each request:
import random

user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    # Add more user-agents...
]

# Rebuild this dict (and thus make a fresh random choice) before every
# request — picking once and reusing it defeats the rotation.
headers = {
    "User-Agent": random.choice(user_agents)
}
Next Steps
Now that you've mastered the basics, here are some ideas for further exploration:
- Automate Data Collection: Build a script that runs daily to monitor Yelp for new reviews or competitors.
- Integrate with a Database: Store scraped data in a PostgreSQL or MongoDB database for advanced analysis.
- Use Scrapy: Explore the Scrapy framework for large-scale web scraping projects.
- Learn API Usage: Switch to Yelp's official API for legal and efficient data access.
- Explore Rate Limiting: Implement advanced techniques like token bucket algorithms to manage request rates.
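As a concrete starting point for that last bullet, here is a minimal token-bucket sketch (the rate and capacity values are arbitrary):

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter: allow `rate` requests per
    second on average, with bursts of up to `capacity` requests."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def allow(self) -> bool:
        # Refill tokens in proportion to elapsed time, capped at capacity
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=2.0, capacity=2)
results = [bucket.allow() for _ in range(4)]  # burst of 2 allowed, then denied
```

In a scraper, you would call `bucket.allow()` before each request and sleep briefly when it returns `False`, instead of using a fixed `time.sleep(2)`.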
By continuing to refine your skills, you'll be well-equipped to tackle any web scraping challenge in 2026 and beyond. Happy coding! 🚀