Tripadvisor has over 1 billion reviews across 8 million listings. Whether you're building a travel aggregator, running competitive analysis for a hotel chain, or doing sentiment research on tourism trends, that data is incredibly valuable.
In this guide, I'll show you how to scrape Tripadvisor hotel listings, reviews, and attraction data using Python in 2026 — including how to handle their aggressive anti-bot protection.
Why Scrape Tripadvisor?
- Competitive intelligence: Track competitor hotel ratings, review counts, and pricing across markets
- Sentiment analysis: Aggregate guest feedback to identify common complaints or praise
- Market research: Analyze attraction popularity and seasonal trends in specific destinations
- Price monitoring: Track nightly rates across hotels in a region
What You'll Need
pip install requests beautifulsoup4 lxml
You'll also need a proxy solution — Tripadvisor sits behind Cloudflare and uses aggressive fingerprinting. I recommend ScraperAPI, which handles rotation, CAPTCHAs, and headers automatically, or ThorData residential proxies if you want more control.
Step 1: Scrape Hotel Search Results
Tripadvisor's search URLs follow a predictable pattern. Here's how to grab hotel listings for a given city:
import requests
from bs4 import BeautifulSoup
import json
import time
SCRAPER_API_KEY = "YOUR_SCRAPERAPI_KEY"
import re  # Used to insert the pagination offset into the URL

def scrape_hotels(city_url: str, pages: int = 3) -> list[dict]:
    """Scrape hotel listings from Tripadvisor search results."""
    hotels = []
    for page in range(pages):
        offset = page * 30
        # Tripadvisor paginates by inserting "oa{offset}-" right after the
        # geo code, e.g. Hotels-g187497-oa30-Barcelona_Catalonia-Hotels.html.
        # Page 0 uses the base URL with no offset segment.
        if offset:
            url = re.sub(r"(-g\d+-)", rf"\g<1>oa{offset}-", city_url, count=1)
        else:
            url = city_url
        # Use ScraperAPI to bypass anti-bot
        api_url = f"http://api.scraperapi.com?api_key={SCRAPER_API_KEY}&url={url}&render=true"
        response = requests.get(api_url, timeout=60)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "lxml")
        # Extract structured data from JSON-LD
        for script in soup.find_all("script", type="application/ld+json"):
            try:
                data = json.loads(script.string)
            except (json.JSONDecodeError, TypeError):
                continue
            if isinstance(data, dict) and data.get("@type") == "Hotel":
                hotels.append({
                    "name": data.get("name"),
                    "rating": data.get("aggregateRating", {}).get("ratingValue"),
                    "review_count": data.get("aggregateRating", {}).get("reviewCount"),
                    "price_range": data.get("priceRange"),
                    "address": data.get("address", {}).get("streetAddress"),
                })
        time.sleep(2)  # Be respectful with request timing
    return hotels
# Example: Hotels in Barcelona
hotels = scrape_hotels(
"https://www.tripadvisor.com/Hotels-g187497-Barcelona_Catalonia-Hotels.html"
)
for h in hotels[:5]:
    print(f"{h['name']} — {h['rating']}⭐ ({h['review_count']} reviews)")
The key insight here is using JSON-LD structured data that Tripadvisor embeds for SEO. It's cleaner and more reliable than parsing HTML classes that change frequently.
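One wrinkle: JSON-LD blocks aren't always a single flat object. Publishers commonly emit a top-level list or wrap nodes in an @graph container, so a small normalizer keeps the extraction robust. The payloads below are illustrative samples, not live Tripadvisor markup:

```python
import json

def iter_jsonld_nodes(raw_scripts):
    """Yield every JSON-LD node, unwrapping top-level lists and @graph containers."""
    for raw in raw_scripts:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue
        nodes = data if isinstance(data, list) else [data]
        for node in nodes:
            if isinstance(node, dict) and "@graph" in node:
                yield from (n for n in node["@graph"] if isinstance(n, dict))
            elif isinstance(node, dict):
                yield node

# Sample script contents showing the three shapes you'll encounter
scripts = [
    '{"@context": "https://schema.org", "@graph": [{"@type": "Hotel", "name": "Hotel A"}]}',
    '[{"@type": "Hotel", "name": "Hotel B"}, {"@type": "BreadcrumbList"}]',
    'not json',
]
hotels = [n["name"] for n in iter_jsonld_nodes(scripts) if n.get("@type") == "Hotel"]
print(hotels)  # ['Hotel A', 'Hotel B']
```

Feed it `script.string` from every `application/ld+json` tag and filter on `@type` afterward, and the parsing loop no longer cares which shape a given page uses.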
Step 2: Scrape Individual Hotel Reviews
Once you have hotel URLs, you can drill into individual reviews:
def scrape_reviews(hotel_url: str, max_pages: int = 5) -> list[dict]:
    """Scrape reviews for a specific hotel."""
    reviews = []
    for page in range(max_pages):
        offset = page * 10
        # Reviews paginate with "or{offset}-" inserted after "-Reviews-";
        # page 0 uses the base URL unchanged
        url = hotel_url if offset == 0 else hotel_url.replace("-Reviews-", f"-Reviews-or{offset}-")
        api_url = f"http://api.scraperapi.com?api_key={SCRAPER_API_KEY}&url={url}&render=true"
        response = requests.get(api_url, timeout=60)
        soup = BeautifulSoup(response.text, "lxml")
        # The class names below rotate with Tripadvisor deployments; verify
        # them against the live page before relying on this in production
        review_cards = soup.select("[data-reviewid]")
        for card in review_cards:
            title_el = card.select_one(".yCeTE")
            body_el = card.select_one(".QewHA span")
            rating_el = card.select_one("svg.UctUV title")
            date_el = card.select_one(".teHYY")
            reviews.append({
                "title": title_el.text.strip() if title_el else None,
                "body": body_el.text.strip() if body_el else None,
                "rating": rating_el.text.split()[0] if rating_el else None,
                "date": date_el.text.strip() if date_el else None,
            })
        time.sleep(2)
    return reviews
reviews = scrape_reviews(
"https://www.tripadvisor.com/Hotel_Review-g187497-d228489-Reviews-Hotel_Arts_Barcelona.html"
)
print(f"Scraped {len(reviews)} reviews")
Step 3: Scrape Attractions and Things to Do
Tripadvisor's attraction pages follow a similar structure. Here's how to pull the top things to do in a city:
def scrape_attractions(city_url: str) -> list[dict]:
    """Scrape top attractions for a city."""
    api_url = f"http://api.scraperapi.com?api_key={SCRAPER_API_KEY}&url={city_url}&render=true"
    response = requests.get(api_url, timeout=60)
    soup = BeautifulSoup(response.text, "lxml")
    attractions = []
    for card in soup.select("div.alPVI"):
        name_el = card.select_one("header.aJiSb")
        rating_el = card.select_one("svg.UctUV title")
        count_el = card.select_one("span.biGQs")
        attractions.append({
            "name": name_el.text.strip() if name_el else None,
            "rating": rating_el.text.split()[0] if rating_el else None,
            "review_count": count_el.text.strip() if count_el else None,
        })
    return attractions
attractions = scrape_attractions(
"https://www.tripadvisor.com/Attractions-g187497-Activities-Barcelona_Catalonia.html"
)
for a in attractions[:10]:
    print(f"{a['name']} — {a['rating']}⭐ ({a['review_count']})")
Handling Anti-Bot Protection
Tripadvisor is one of the harder sites to scrape in 2026. Here's what you'll encounter:
- Cloudflare challenges — JavaScript challenges that block simple HTTP requests
- Fingerprint detection — They track TLS fingerprints, canvas hashes, and WebGL data
- Rate limiting — Too many requests from one IP will get you blocked fast
- Dynamic selectors — CSS class names change with each deployment
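The dynamic-selector problem is worth defending against in code: prefer stable attributes like `data-reviewid` over obfuscated class names, and try a ranked list of selectors so a single redeploy doesn't break your whole pipeline. A sketch of the pattern — the helper works with any object exposing a BeautifulSoup-style `select_one`, and the selectors in the usage comment are examples, not guarantees:

```python
def select_first(node, selectors):
    """Try CSS selectors in priority order; return the first match or None.

    Put stable, attribute-based selectors first and fragile class names
    last, so a site redeploy degrades gracefully instead of failing hard.
    """
    for sel in selectors:
        el = node.select_one(sel)
        if el is not None:
            return el
    return None

# Usage inside a review loop (selector names are illustrative):
# title_el = select_first(card, ["[data-automation='reviewTitle']", ".yCeTE"])
```

Logging which selector actually matched also gives you an early warning when the primary ones start failing.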
Solutions
- ScraperAPI handles all of this automatically — Cloudflare bypass, residential proxy rotation, and JavaScript rendering. It's the fastest way to get started.
- ThorData residential proxies give you a pool of real residential IPs. Pair them with Playwright for JavaScript rendering if you want full control.
- Pre-built scrapers: If you don't want to maintain your own code, check out ready-made Tripadvisor actors on Apify that handle all the edge cases.
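If you go the residential-proxy route, the proxy wiring is the only new piece. The host, port, and credential format below are placeholders, not real ThorData endpoints — check your provider dashboard for the actual values:

```python
def residential_proxy(user: str, password: str,
                      host: str = "proxy.example.net", port: int = 9999) -> dict:
    """Build a requests-style proxies dict for an authenticated residential proxy."""
    auth = f"http://{user}:{password}@{host}:{port}"
    return {"http": auth, "https": auth}

# With requests:
#   requests.get(url, proxies=residential_proxy(USER, PASS), timeout=60)
# With Playwright, pass the same details via
#   browser = p.chromium.launch(proxy={"server": ..., "username": ..., "password": ...})
```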
Scaling with Async Requests
For production workloads scraping hundreds of pages, use async to speed things up:
import asyncio
import aiohttp

async def scrape_batch(urls: list[str], concurrency: int = 5) -> list[str]:
    """Scrape multiple pages concurrently, capped by a semaphore."""
    semaphore = asyncio.Semaphore(concurrency)

    async def fetch(session: aiohttp.ClientSession, url: str) -> str:
        async with semaphore:
            api_url = f"http://api.scraperapi.com?api_key={SCRAPER_API_KEY}&url={url}&render=true"
            async with session.get(api_url) as resp:
                html = await resp.text()
            await asyncio.sleep(1)  # Pace requests even within the concurrency cap
            return html

    async with aiohttp.ClientSession() as session:
        # gather() returns results in input order, regardless of completion order
        return await asyncio.gather(*(fetch(session, url) for url in urls))
Keep concurrency at 3-5 for Tripadvisor. Going higher risks triggering their rate limiters even with rotating proxies.
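The semaphore pattern is easy to exercise offline by swapping in a stub fetcher, which makes the rate-limiting logic unit-testable without aiohttp or a network connection. Note that asyncio.gather hands results back in input order:

```python
import asyncio

async def run_all(urls, worker, concurrency: int = 3, delay: float = 0.01):
    """Run worker(url) for every URL, with at most `concurrency` in flight."""
    sem = asyncio.Semaphore(concurrency)

    async def limited(url):
        async with sem:
            result = await worker(url)
            await asyncio.sleep(delay)  # Per-request pacing inside the cap
            return result

    # gather() preserves input order regardless of completion order
    return await asyncio.gather(*(limited(u) for u in urls))

async def fake_fetch(url: str) -> str:  # Stand-in for the real ScraperAPI call
    await asyncio.sleep(0.005)
    return f"html:{url}"

pages = asyncio.run(run_all([f"page{i}" for i in range(10)], fake_fetch))
print(pages[0], pages[-1])  # html:page0 html:page9
```

Swap `fake_fetch` for the real fetch coroutine once the pacing behaves the way you want.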
Legal Considerations
Tripadvisor's Terms of Service prohibit scraping. That said, U.S. courts have held that scraping publicly available data does not violate the Computer Fraud and Abuse Act (see hiQ Labs v. LinkedIn), though that case ultimately settled and breach-of-contract claims remain a live risk. This isn't legal advice; a few ground rules:
- Don't scrape private user data (emails, full names with identifying info)
- Respect robots.txt rate limits
- Don't overload their servers — add delays between requests
- Use the data for analysis, not for republishing their content verbatim
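Checking robots.txt is easy to automate with the standard library. The rules below are illustrative only, not Tripadvisor's actual file — fetch https://www.tripadvisor.com/robots.txt for the real thing:

```python
from urllib.robotparser import RobotFileParser

def parse_robots(robots_txt: str) -> RobotFileParser:
    """Parse raw robots.txt text into a queryable RobotFileParser."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp

# Illustrative rules only — use the site's real robots.txt in practice
rules = parse_robots("""\
User-agent: *
Disallow: /Search
Crawl-delay: 10
""")
print(rules.can_fetch("my-bot", "https://example.com/Search?q=hotels"))  # False
print(rules.can_fetch("my-bot", "https://example.com/Hotels-g187497"))   # True
print(rules.crawl_delay("my-bot"))  # 10
```

Wiring `can_fetch` and `crawl_delay` into your request loop turns the "respect robots.txt" rule from a good intention into enforced behavior.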
Wrapping Up
Tripadvisor scraping in 2026 is very doable with the right tools. The JSON-LD approach is the cleanest path for structured data, and a proxy service like ScraperAPI or ThorData will save you hours of fighting anti-bot systems.
If you want to skip the code entirely, Apify has pre-built scrapers that run in the cloud with zero infrastructure to manage.
Questions? Drop them in the comments.