Travel sites like Booking.com and Hotels.com display thousands of hotel listings with prices that change constantly. Scraping this data lets you build price trackers, comparison tools, and travel analytics dashboards. Here's how to do it with Python.
## Why Scrape Travel Sites?
- Price monitoring — track hotel rates over time to find the best deals
- Market research — analyze pricing patterns across regions and seasons
- Comparison tools — build apps that show the cheapest option across platforms
- Revenue management — hotels use competitor data to optimize their own pricing
## The Challenge
Travel sites are heavily protected. They use JavaScript rendering, CAPTCHAs, rate limiting, and bot detection. You'll need proxies and potentially a headless browser.
## Setting Up

```bash
pip install requests beautifulsoup4 pandas schedule
```
## Scraping Booking.com Search Results

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import quote_plus

def scrape_booking(city, checkin, checkout, api_key):
    url = (
        "https://www.booking.com/searchresults.html"
        f"?ss={quote_plus(city)}&checkin={checkin}&checkout={checkout}&group_adults=2"
    )
    # Route through a proxy API to handle JavaScript rendering and anti-bot
    # measures. The target URL must be percent-encoded so its own query
    # string isn't swallowed by the proxy API's parameters.
    proxy_url = f"http://api.scraperapi.com?api_key={api_key}&url={quote_plus(url)}&render=true"
    response = requests.get(proxy_url, timeout=60)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    hotels = []
    for card in soup.find_all("div", {"data-testid": "property-card"}):
        name_el = card.find("div", {"data-testid": "title"})
        price_el = card.find("span", {"data-testid": "price-and-discounted-price"})
        rating_el = card.find("div", {"data-testid": "review-score"})
        hotels.append({
            "name": name_el.get_text(strip=True) if name_el else "N/A",
            "price": price_el.get_text(strip=True) if price_el else "N/A",
            "rating": rating_el.get_text(strip=True) if rating_el else "N/A",
            "source": "booking.com",
        })
    return hotels

results = scrape_booking("Paris", "2026-04-01", "2026-04-03", "YOUR_API_KEY")
for hotel in results[:5]:
    print(f"{hotel['name']} - {hotel['price']} - Rating: {hotel['rating']}")
```
## Scraping Hotels.com

```python
def scrape_hotels_com(city, checkin, checkout, api_key):
    url = (
        "https://www.hotels.com/Hotel-Search"
        f"?destination={quote_plus(city)}&startDate={checkin}&endDate={checkout}&adults=2"
    )
    proxy_url = f"http://api.scraperapi.com?api_key={api_key}&url={quote_plus(url)}&render=true"
    response = requests.get(proxy_url, timeout=60)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    hotels = []
    for listing in soup.find_all("div", {"data-testid": "lodging-card-responsive"}):
        name = listing.find("h3")
        # Class names on Hotels.com are generated, so match on substrings.
        price = listing.find("div", class_=lambda c: c and "price" in c.lower())
        rating = listing.find("span", class_=lambda c: c and "rating" in c.lower())
        hotels.append({
            "name": name.get_text(strip=True) if name else "N/A",
            "price": price.get_text(strip=True) if price else "N/A",
            "rating": rating.get_text(strip=True) if rating else "N/A",
            "source": "hotels.com",
        })
    return hotels
```
## Building a Price Comparison Engine

```python
import pandas as pd
from datetime import datetime

class TravelPriceTracker:
    def __init__(self, api_key):
        self.api_key = api_key
        self.history = []

    def compare_prices(self, city, checkin, checkout):
        booking_results = scrape_booking(city, checkin, checkout, self.api_key)
        hotels_results = scrape_hotels_com(city, checkin, checkout, self.api_key)
        all_results = booking_results + hotels_results
        for result in all_results:
            result["scraped_at"] = datetime.now().isoformat()
            result["city"] = city
            result["checkin"] = checkin
            result["checkout"] = checkout
        self.history.extend(all_results)
        return all_results

    def find_best_deal(self, results):
        def parse_price(price_str):
            # Strip currency markers; unparseable prices sort last.
            try:
                return float(price_str.replace("$", "").replace(",", "").replace("US", "").strip())
            except (ValueError, AttributeError):
                return float("inf")
        sorted_results = sorted(results, key=lambda x: parse_price(x["price"]))
        return sorted_results[0] if sorted_results else None

    def save_history(self, filename="travel_prices.csv"):
        pd.DataFrame(self.history).to_csv(filename, index=False)

tracker = TravelPriceTracker(api_key="YOUR_KEY")
results = tracker.compare_prices("London", "2026-05-01", "2026-05-03")
best = tracker.find_best_deal(results)
if best:
    print(f"Best deal: {best['name']} at {best['price']} on {best['source']}")
```
## Price History Tracking

```python
import schedule
import time

def daily_price_check():
    cities = ["Paris", "London", "Tokyo", "New York"]
    tracker = TravelPriceTracker(api_key="YOUR_KEY")
    for city in cities:
        results = tracker.compare_prices(city, "2026-06-01", "2026-06-03")
        print(f"{city}: {len(results)} hotels found")
        time.sleep(5)  # space out requests between cities
    tracker.save_history()

schedule.every().day.at("06:00").do(daily_price_check)

while True:
    schedule.run_pending()
    time.sleep(60)
```
## Handling Anti-Bot Protection
Travel sites invest heavily in bot detection. Here's what you need:
- Proxy rotation — ScraperAPI handles this automatically; its `render=true` parameter also covers JavaScript-heavy pages
- Residential proxies — ThorData provides residential IPs that look like real users
- Request spacing — add 3-5 second delays between requests
- User-Agent rotation — cycle through real browser user agents
- Session management — use cookies to maintain realistic browsing sessions
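The spacing and User-Agent rotation points can be sketched in a few lines. This is a minimal illustration: the User-Agent strings and the delay range are examples, not a vetted pool.

```python
import random
import time

# Illustrative pool of real-browser User-Agent strings; swap in current ones.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
]

def rotating_headers():
    """Pick a fresh User-Agent for each request."""
    return {"User-Agent": random.choice(USER_AGENTS)}

def polite_get(session, url, min_delay=3.0, max_delay=5.0):
    """GET through an existing session (e.g. requests.Session()) after a
    randomized 3-5 second pause, with a rotated User-Agent header."""
    time.sleep(random.uniform(min_delay, max_delay))
    return session.get(url, headers=rotating_headers(), timeout=30)
```

Reusing one `requests.Session` across calls keeps cookies between requests, which also covers the session-management point above.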
## Data Analysis

```python
import pandas as pd

df = pd.read_csv("travel_prices.csv")
# Strip thousands separators, then extract the digits.
# expand=False keeps the result a Series so it can be assigned to a column.
df["price_num"] = (
    df["price"].str.replace(",", "", regex=False)
               .str.extract(r"(\d+)", expand=False)
               .astype(float)
)

avg_by_source = df.groupby("source")["price_num"].mean()
print("Average prices by platform:")
print(avg_by_source)

avg_by_city = df.groupby("city")["price_num"].mean().sort_values()
print("\nCheapest cities:")
print(avg_by_city)
```
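Because every row carries a `scraped_at` timestamp, the same CSV supports trend analysis over repeated runs. A minimal sketch with hypothetical rows in the shape `save_history()` writes:

```python
import pandas as pd

# Hypothetical history rows; in practice this comes from travel_prices.csv.
df = pd.DataFrame({
    "city": ["Paris", "Paris", "London", "London"],
    "scraped_at": ["2026-03-01T06:00:00", "2026-03-02T06:00:00",
                   "2026-03-01T06:00:00", "2026-03-02T06:00:00"],
    "price_num": [210.0, 195.0, 180.0, 185.0],
})
df["scraped_at"] = pd.to_datetime(df["scraped_at"])

# Cheapest observed price per city per day — the series a price alert charts.
daily_min = (
    df.groupby(["city", df["scraped_at"].dt.date])["price_num"]
      .min()
      .unstack(level=0)
)
print(daily_min)
```

Plotting `daily_min` (or diffing consecutive rows) is enough to flag price drops worth alerting on.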
## Monitoring with ScrapeOps
For production scrapers, use ScrapeOps to monitor success rates, response times, and costs across your scraping jobs. Their dashboard shows you exactly which scrapers need attention.
## Legal Considerations
Always check each site's robots.txt and Terms of Service. Use scraped data for personal research and analysis. Don't republish proprietary content or overload servers with requests.
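The robots.txt check can be automated with the standard library's `urllib.robotparser`. A minimal sketch; the rules shown are hypothetical, not any real site's actual policy:

```python
from urllib.robotparser import RobotFileParser

def allowed_by_robots(robots_txt, user_agent, url):
    """Check whether a URL is allowed by already-fetched robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical rules that block search-result pages for every crawler.
rules = """\
User-agent: *
Disallow: /searchresults
"""

print(allowed_by_robots(rules, "MyScraper", "https://example.com/searchresults.html"))  # False
print(allowed_by_robots(rules, "MyScraper", "https://example.com/hotel/123"))           # True
```

In production you would fetch `https://<site>/robots.txt` first (e.g. with `RobotFileParser.set_url()` and `.read()`) and skip any URL the parser disallows.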
## Conclusion
Scraping travel sites requires more sophisticated tools than typical web scraping, but the data is incredibly valuable. Whether you're building a personal price alert system or a full comparison platform, the patterns shown here will get you started. The key is using reliable proxies and respecting rate limits.
Happy scraping!