DEV Community

agenthustler
Scraping Booking.com and Hotels.com for Travel Price Data

Travel sites like Booking.com and Hotels.com display thousands of hotel listings with prices that change constantly. Scraping this data lets you build price trackers, comparison tools, and travel analytics dashboards. Here's how to do it with Python.

Why Scrape Travel Sites?

  • Price monitoring — track hotel rates over time to find the best deals
  • Market research — analyze pricing patterns across regions and seasons
  • Comparison tools — build apps that show the cheapest option across platforms
  • Revenue management — hotels use competitor data to optimize their own pricing

The Challenge

Travel sites are heavily protected. They use JavaScript rendering, CAPTCHAs, rate limiting, and bot detection. You'll need proxies and potentially a headless browser.
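Because the proxy route used throughout this post passes the target URL as a query parameter, that URL has to be URL-encoded or its own `&checkin=...` parameters get swallowed by the proxy endpoint. A minimal stdlib sketch of the wrapping step (the `api.scraperapi.com` endpoint and `render` parameter mirror the examples in this post; adjust for whichever provider you use):

```python
from urllib.parse import urlencode

def proxy_wrap(api_key: str, target_url: str, render: bool = True) -> str:
    """Wrap a target URL for a proxy API, URL-encoding it so its own
    query string (&checkin=..., &checkout=...) survives intact."""
    params = {"api_key": api_key, "url": target_url}
    if render:
        params["render"] = "true"
    # urlencode percent-escapes the inner URL, including its "&" and "="
    return "http://api.scraperapi.com/?" + urlencode(params)

wrapped = proxy_wrap("KEY", "https://example.com/search?a=1&b=2")
# The inner "&" is escaped, so the proxy receives the full target URL
```

Pass the wrapped string straight to `requests.get`; without the encoding step, everything after the first `&` in the target URL is silently dropped.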

Setting Up

pip install requests beautifulsoup4 pandas

Scraping Booking.com Search Results

import requests
from bs4 import BeautifulSoup
from urllib.parse import quote_plus

def scrape_booking(city, checkin, checkout, api_key):
    url = f"https://www.booking.com/searchresults.html?ss={quote_plus(city)}&checkin={checkin}&checkout={checkout}&group_adults=2"

    # Route through a proxy API to handle anti-bot measures; the target URL
    # is itself a query parameter, so it must be URL-encoded
    proxy_url = f"http://api.scraperapi.com?api_key={api_key}&url={quote_plus(url)}&render=true"
    response = requests.get(proxy_url, timeout=60)
    soup = BeautifulSoup(response.text, "html.parser")

    hotels = []
    property_cards = soup.find_all("div", {"data-testid": "property-card"})

    for card in property_cards:
        name_el = card.find("div", {"data-testid": "title"})
        price_el = card.find("span", {"data-testid": "price-and-discounted-price"})
        rating_el = card.find("div", {"data-testid": "review-score"})

        hotels.append({
            "name": name_el.get_text(strip=True) if name_el else "N/A",
            "price": price_el.get_text(strip=True) if price_el else "N/A",
            "rating": rating_el.get_text(strip=True) if rating_el else "N/A",
            "source": "booking.com"
        })

    return hotels

results = scrape_booking("Paris", "2026-04-01", "2026-04-03", "YOUR_API_KEY")
for hotel in results[:5]:
    print(f"{hotel['name']} - {hotel['price']} - Rating: {hotel['rating']}")

Scraping Hotels.com

from urllib.parse import quote_plus

def scrape_hotels_com(city, checkin, checkout, api_key):
    url = f"https://www.hotels.com/Hotel-Search?destination={quote_plus(city)}&startDate={checkin}&endDate={checkout}&adults=2"

    # As with Booking.com, the target URL must be encoded before wrapping
    proxy_url = f"http://api.scraperapi.com?api_key={api_key}&url={quote_plus(url)}&render=true"
    response = requests.get(proxy_url, timeout=60)
    soup = BeautifulSoup(response.text, "html.parser")

    hotels = []
    listings = soup.find_all("div", {"data-testid": "lodging-card-responsive"})

    for listing in listings:
        name = listing.find("h3")
        price = listing.find("div", class_=lambda c: c and "price" in c.lower())
        rating = listing.find("span", class_=lambda c: c and "rating" in c.lower())

        hotels.append({
            "name": name.get_text(strip=True) if name else "N/A",
            "price": price.get_text(strip=True) if price else "N/A",
            "rating": rating.get_text(strip=True) if rating else "N/A",
            "source": "hotels.com"
        })

    return hotels

Building a Price Comparison Engine

import pandas as pd
from datetime import datetime

class TravelPriceTracker:
    def __init__(self, api_key):
        self.api_key = api_key
        self.history = []

    def compare_prices(self, city, checkin, checkout):
        booking_results = scrape_booking(city, checkin, checkout, self.api_key)
        hotels_results = scrape_hotels_com(city, checkin, checkout, self.api_key)

        all_results = booking_results + hotels_results

        for result in all_results:
            result["scraped_at"] = datetime.now().isoformat()
            result["city"] = city
            result["checkin"] = checkin
            result["checkout"] = checkout

        self.history.extend(all_results)
        return all_results

    def find_best_deal(self, results):
        def parse_price(price_str):
            try:
                return float(price_str.replace("$", "").replace(",", "").replace("US", "").strip())
            except (ValueError, AttributeError):
                return float("inf")

        sorted_results = sorted(results, key=lambda x: parse_price(x["price"]))
        return sorted_results[0] if sorted_results else None

    def save_history(self, filename="travel_prices.csv"):
        df = pd.DataFrame(self.history)
        df.to_csv(filename, index=False)

tracker = TravelPriceTracker(api_key="YOUR_KEY")
results = tracker.compare_prices("London", "2026-05-01", "2026-05-03")
best = tracker.find_best_deal(results)
if best:
    print(f"Best deal: {best['name']} at {best['price']} on {best['source']}")
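The parse_price helper above assumes US-dollar formatting ("$", "US", comma thousands separators). Listings are often localized, so a slightly more defensive variant helps; the symbols it strips are a hedged guess at what these sites emit, and it still assumes "," is a thousands separator:

```python
import re

def parse_price_any(price_str):
    """Extract the numeric amount from a localized price string.
    Returns float('inf') for unparseable input so bad rows sort last."""
    if not isinstance(price_str, str):
        return float("inf")
    # Drop currency symbols/codes/whitespace, keep digits and separators
    cleaned = re.sub(r"[^\d.,]", "", price_str)
    cleaned = cleaned.replace(",", "")  # assumes "," is a thousands separator
    try:
        return float(cleaned)
    except ValueError:
        return float("inf")
```

Swapping this in for parse_price inside find_best_deal keeps "N/A" and oddly formatted rows from crashing the sort.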

Price History Tracking

import schedule
import time

def daily_price_check():
    cities = ["Paris", "London", "Tokyo", "New York"]
    tracker = TravelPriceTracker(api_key="YOUR_KEY")

    for city in cities:
        results = tracker.compare_prices(city, "2026-06-01", "2026-06-03")
        print(f"{city}: {len(results)} hotels found")
        time.sleep(5)

    tracker.save_history()

schedule.every().day.at("06:00").do(daily_price_check)

while True:
    schedule.run_pending()
    time.sleep(60)

Handling Anti-Bot Protection

Travel sites invest heavily in bot detection. Here's what you need:

  1. Proxy rotation — ScraperAPI handles this automatically; its render=true parameter also covers JavaScript-heavy pages
  2. Residential proxies — ThorData provides residential IPs that look like real users
  3. Request spacing — add 3-5 second delays between requests
  4. User-Agent rotation — cycle through real browser user agents
  5. Session management — use cookies to maintain realistic browsing sessions
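Items 3 and 4 above are easy to roll yourself. A small stdlib sketch; the user-agent strings are illustrative examples, not a vetted or current list:

```python
import itertools
import random
import time

# A few real desktop user agents (illustrative; keep your own list fresh)
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
]

_ua_cycle = itertools.cycle(USER_AGENTS)

def next_headers() -> dict:
    """Rotate through the user-agent pool on each call."""
    return {"User-Agent": next(_ua_cycle), "Accept-Language": "en-US,en;q=0.9"}

def polite_delay(base: float = 3.0, jitter: float = 2.0) -> float:
    """Sleep 3-5 seconds by default, randomized so the cadence isn't robotic."""
    delay = base + random.uniform(0, jitter)
    time.sleep(delay)
    return delay
```

Pass `next_headers()` as the `headers=` argument to `requests.get` and call `polite_delay()` between page fetches.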

Data Analysis

import pandas as pd

df = pd.read_csv("travel_prices.csv")
# Strip thousands separators first, or "$1,234" extracts as 1
df["price_num"] = (
    df["price"]
    .str.replace(",", "", regex=False)
    .str.extract(r"(\d+(?:\.\d+)?)")[0]
    .astype(float)
)

avg_by_source = df.groupby("source")["price_num"].mean()
print("Average prices by platform:")
print(avg_by_source)

avg_by_city = df.groupby("city")["price_num"].mean().sort_values()
print("\nCheapest cities:")
print(avg_by_city)

Monitoring with ScrapeOps

For production scrapers, use ScrapeOps to monitor success rates, response times, and costs across your scraping jobs. Their dashboard shows you exactly which scrapers need attention.

Legal Considerations

Always check each site's robots.txt and Terms of Service. Use scraped data for personal research and analysis. Don't republish proprietary content or overload servers with requests.
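The robots.txt check can be done programmatically with Python's standard library. A sketch using urllib.robotparser; the rules below are an invented example, not any real site's actual file (in practice you would call rp.set_url(".../robots.txt") and rp.read() instead of parsing a string):

```python
from urllib.robotparser import RobotFileParser

# Invented example rules, standing in for a fetched robots.txt
robots_txt = """\
User-agent: *
Disallow: /admin/
Allow: /searchresults.html
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("MyScraper/1.0", "https://example.com/searchresults.html"))  # True
print(rp.can_fetch("MyScraper/1.0", "https://example.com/admin/panel"))         # False
```

Gate each scrape on can_fetch so disallowed paths are skipped automatically rather than on the honor system.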

Conclusion

Scraping travel sites requires more sophisticated tools than typical web scraping, but the data is incredibly valuable. Whether you're building a personal price alert system or a full comparison platform, the patterns shown here will get you started. The key is using reliable proxies and respecting rate limits.

Happy scraping!
