The online travel industry generates over $750 billion annually, and Booking.com sits at the center of it. With over 28 million accommodation listings across 228 countries and territories, Booking.com is the world's largest online travel agency. For travel tech companies, pricing analysts, hospitality researchers, and competitive intelligence teams, extracting data from Booking.com provides unmatched market insights.
In this comprehensive guide, we'll explore how to scrape Booking.com effectively — from understanding its data architecture to building reliable extraction pipelines for hotel listings, room pricing, guest reviews, and availability calendars.
Understanding Booking.com's Data Architecture
Booking.com is a complex web application with multiple layers of data. Let's map out the key data entities you'll encounter:
Hotel Listing Data
Each property page on Booking.com contains rich structured data:
- Property info: Hotel name, star rating, address, coordinates, property type
- Pricing: Room rates by date, currency, taxes and fees breakdown
- Reviews: Guest ratings (overall + category scores), written reviews, traveler type
- Amenities: WiFi, parking, pool, breakfast, gym, etc.
- Photos: Property images, room images, facility photos
- Policies: Check-in/check-out times, cancellation policy, payment methods
- Location: Distance to landmarks, neighborhood description, transport links
Search Results Structure
A typical search on Booking.com returns a page backed by JSON data. A single hotel record, with illustrative field names and values, looks roughly like this:
{
"hotel_id": 285283,
"hotel_name": "Grand Palace Hotel",
"stars": 4,
"review_score": 8.7,
"review_count": 2341,
"price": 189,
"currency": "USD",
"room_type": "Deluxe Double Room",
"free_cancellation": true,
"breakfast_included": false,
"distance_to_center": "0.3 km",
"latitude": 48.8566,
"longitude": 2.3522,
"photo_url": "https://cf.bstatic.com/...",
"urgency_message": "Only 2 rooms left!"
}
Review Data Structure
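Reviews on Booking.com follow a verified-stay model: only guests who booked through the platform and completed their stay can leave one. A scraped review record, with illustrative field names, might look like this: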
{
"review_id": "abc123",
"reviewer_name": "John",
"reviewer_country": "United States",
"reviewer_type": "Solo traveler",
"review_date": "2026-02-15",
"score": 9.2,
"positive": "Amazing location, friendly staff, clean rooms",
"negative": "Breakfast could have more variety",
"stayed_in": "Deluxe Double Room",
"nights": 3,
"categories": {
"cleanliness": 9.5,
"comfort": 9.0,
"location": 9.8,
"facilities": 8.5,
"staff": 9.3,
"value_for_money": 8.8
}
}
Technical Challenges of Scraping Booking.com
1. Dynamic Pricing Engine
Booking.com's pricing is highly dynamic. Prices change based on:
- Check-in and check-out dates
- Number of guests and rooms
- User's detected location (geo-pricing)
- Device type (mobile vs. desktop)
- Logged-in status and loyalty tier
- Real-time demand and availability
This means the same hotel can show different prices to different users at the same time. Your scraper needs to control these variables carefully.
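One way to keep those variables under control is to record them alongside every price you collect, so that observations are only compared within the same context. The following is a minimal sketch; `PRICING_CONTEXT`, `contextKey`, and `tagObservation` are hypothetical names chosen for illustration, and the `proxyCountry` field assumes you know the exit country of your proxy.

```javascript
// Hypothetical helper: pin the variables that influence price and attach
// them to every observation so that like is only compared with like.
const PRICING_CONTEXT = Object.freeze({
  checkin: '2026-04-15',
  checkout: '2026-04-18',
  adults: 2,
  rooms: 1,
  currency: 'USD',
  locale: 'en-US',
  device: 'desktop',
  loggedIn: false,
  proxyCountry: 'US' // assumed: the exit country of whatever proxy you use
});

// Build a stable key from the context so observations can be grouped.
function contextKey(ctx) {
  return Object.keys(ctx)
    .sort()
    .map((k) => `${k}=${ctx[k]}`)
    .join('&');
}

// Wrap a scraped price with the context it was observed under.
function tagObservation(hotelId, price, ctx = PRICING_CONTEXT) {
  return {
    hotelId,
    price,
    context: { ...ctx },
    contextKey: contextKey(ctx),
    observedAt: new Date().toISOString()
  };
}
```

Grouping observations by `contextKey` before any comparison keeps a mobile price from one country from being set against a desktop price from another.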
2. JavaScript-Heavy Rendering
Like most modern travel sites, Booking.com relies heavily on client-side JavaScript rendering. Search results, pricing, and availability are loaded dynamically through API calls after the initial page load.
// Simple HTTP requests miss most of the data
const response = await fetch('https://www.booking.com/searchresults.html?dest_id=-2092174');
// Returns a shell with loading spinners, not actual hotel data
3. Anti-Bot Protection
Booking.com employs sophisticated anti-scraping measures:
- PerimeterX and DataDome: Advanced bot detection
- Rate limiting: Aggressive throttling on repeated requests
- CAPTCHA: Image and behavioral challenges
- Session validation: Cookie and token verification
- Fingerprinting: Canvas, WebGL, and audio context fingerprinting
4. Complex URL Structure
Booking.com URLs encode search parameters in a specific format:
https://www.booking.com/searchresults.html
?ss=Paris
&dest_id=-1456928
&dest_type=city
&checkin=2026-04-15
&checkout=2026-04-18
&group_adults=2
&no_rooms=1
&selected_currency=USD
&order=price
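Reading the parameters back out of a search URL is a useful sanity check before a run, for example to confirm the dates and currency a job will actually request. This sketch uses only the built-in `URL` class; `parseSearchUrl` and `nightsBetween` are hypothetical helpers, and the fallback values of 2 adults and 1 room are assumptions.

```javascript
// Parse a search URL back into a plain object of search parameters.
// URL and URLSearchParams are built into Node.js and browsers.
function parseSearchUrl(searchUrl) {
  const p = new URL(searchUrl).searchParams;
  return {
    destination: p.get('ss'),
    destId: p.get('dest_id'),
    destType: p.get('dest_type'),
    checkin: p.get('checkin'),
    checkout: p.get('checkout'),
    adults: Number(p.get('group_adults') ?? 2), // assumed default
    rooms: Number(p.get('no_rooms') ?? 1),      // assumed default
    currency: p.get('selected_currency'),
    order: p.get('order')
  };
}

// Number of nights between two YYYY-MM-DD dates (both parsed as UTC midnight).
function nightsBetween(checkin, checkout) {
  const ms = Date.parse(checkout) - Date.parse(checkin);
  return Math.round(ms / 86400000);
}

const parsed = parseSearchUrl(
  'https://www.booking.com/searchresults.html?ss=Paris&dest_id=-1456928' +
  '&dest_type=city&checkin=2026-04-15&checkout=2026-04-18' +
  '&group_adults=2&no_rooms=1&selected_currency=USD&order=price'
);
const nights = nightsBetween(parsed.checkin, parsed.checkout);
```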
Building a Booking.com Scraper
Step 1: Setting Up with Playwright
const { chromium } = require('playwright');
async function createBookingSession() {
const browser = await chromium.launch({
headless: true,
args: ['--no-sandbox', '--disable-dev-shm-usage']
});
const context = await browser.newContext({
userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
viewport: { width: 1440, height: 900 },
locale: 'en-US',
timezoneId: 'America/New_York',
geolocation: { latitude: 40.7128, longitude: -74.0060 },
permissions: ['geolocation']
});
// Set currency cookie to get consistent pricing
await context.addCookies([{
name: 'selectedCurrency',
value: 'USD',
domain: '.booking.com',
path: '/'
}]);
return { browser, context };
}
Step 2: Scraping Search Results
The most reliable approach is to intercept Booking.com's internal GraphQL and REST API calls:
async function scrapeSearchResults(context, destination, checkin, checkout, guests = 2) {
const page = await context.newPage();
const hotels = [];
// Intercept API responses containing hotel data
page.on('response', async (response) => {
const url = response.url();
if (url.includes('/dml/graphql') || url.includes('searchresults')) {
try {
const data = await response.json();
const extracted = extractHotelData(data);
if (extracted.length > 0) {
hotels.push(...extracted);
}
} catch (e) {
// Skip non-JSON responses
}
}
});
const searchUrl = buildSearchUrl(destination, checkin, checkout, guests);
await page.goto(searchUrl, { waitUntil: 'networkidle' });
// Wait for results to fully render
await page.waitForSelector('[data-testid="property-card"]', { timeout: 15000 });
// Also extract from DOM as backup
const domHotels = await page.evaluate(() => {
const cards = document.querySelectorAll('[data-testid="property-card"]');
return Array.from(cards).map(card => {
const nameEl = card.querySelector('[data-testid="title"]');
const priceEl = card.querySelector('[data-testid="price-and-discounted-price"]');
const scoreEl = card.querySelector('[data-testid="review-score"]');
const addressEl = card.querySelector('[data-testid="address"]');
return {
name: nameEl?.textContent?.trim(),
price: priceEl?.textContent?.replace(/[^0-9.]/g, ''),
reviewScore: scoreEl?.textContent?.trim(),
address: addressEl?.textContent?.trim()
};
});
});
await page.close();
return mergeResults(hotels, domHotels);
}
function buildSearchUrl(destination, checkin, checkout, guests) {
const params = new URLSearchParams({
ss: destination,
checkin: checkin,
checkout: checkout,
group_adults: guests.toString(),
no_rooms: '1',
selected_currency: 'USD'
});
return `https://www.booking.com/searchresults.html?${params.toString()}`;
}
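The listing above calls `mergeResults` without defining it. One possible shape for it is sketched below, under the assumption that both sources yield records with a `name` field; real API payloads may be structured differently and would need mapping first.

```javascript
// Hypothetical sketch of mergeResults: combine records captured from API
// responses with records read from the DOM, keyed by normalized hotel name.
// Assumes each record has a `name` field; real API payloads may differ.
function normalizeName(name) {
  return (name || '').trim().toLowerCase().replace(/\s+/g, ' ');
}

function mergeResults(apiHotels, domHotels) {
  const merged = new Map();
  // DOM records go in first so API fields overwrite them when both exist.
  for (const hotel of [...domHotels, ...apiHotels]) {
    const key = normalizeName(hotel.name);
    if (!key) continue; // skip records with no usable name
    const existing = merged.get(key) || {};
    // Drop undefined/null fields so they do not erase known values.
    const defined = Object.fromEntries(
      Object.entries(hotel).filter(([, v]) => v !== undefined && v !== null)
    );
    merged.set(key, { ...existing, ...defined });
  }
  return Array.from(merged.values());
}
```

Matching on name is fragile when two properties share one; if the API records carry a stable hotel ID, keying on that instead would be safer.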
Step 3: Extracting Room Details and Pricing
Each hotel has multiple room types with different pricing. Here's how to extract that granular data:
async function scrapeRoomDetails(context, hotelUrl) {
const page = await context.newPage();
await page.goto(hotelUrl, { waitUntil: 'networkidle' });
await page.waitForSelector('#hprt-table', { timeout: 15000 });
const rooms = await page.evaluate(() => {
const roomRows = document.querySelectorAll('.hprt-table tr');
const results = [];
let currentRoom = null;
for (const row of roomRows) {
const roomName = row.querySelector('.hprt-roomtype-icon-link');
const priceEl = row.querySelector('.bui-price-display__value, .prco-valign-middle-helper');
const occupancy = row.querySelector('.hprt-occupancy-occupancy-info');
const conditions = row.querySelector('.hprt-conditions');
if (roomName) {
currentRoom = roomName.textContent.trim();
}
if (priceEl && currentRoom) {
const conditionText = conditions?.textContent?.trim() || '';
results.push({
roomType: currentRoom,
price: priceEl.textContent.replace(/[^0-9.]/g, ''),
maxOccupancy: occupancy?.textContent?.trim(),
freeCancellation: conditionText.includes('FREE cancellation'),
breakfastIncluded: conditionText.includes('breakfast included'),
paymentTerms: conditionText.includes('No prepayment')
? 'Pay at property' : 'Prepay online',
conditions: conditionText
});
}
}
return results;
});
await page.close();
return rooms;
}
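The extraction code strips non-numeric characters but leaves the price as a string, and a plain strip misreads formats such as "1.234,50". A heuristic parser along the following lines can help; `parsePrice` is a hypothetical helper, and its rules are assumptions that will misread a genuine three-decimal value like "1.234".

```javascript
// Hypothetical helper: turn a displayed price such as "US$1,234.50" or
// "€ 1.234,50" into a number. Heuristic: when both separators appear,
// whichever comes last is the decimal mark; a lone separator followed by
// exactly three digits is treated as a thousands separator.
function parsePrice(text) {
  if (!text) return null;
  const cleaned = text
    .replace(/[^0-9.,]/g, '')       // keep digits and separators only
    .replace(/^[.,]+|[.,]+$/g, ''); // drop stray separators, e.g. from "Rs."
  if (!cleaned) return null;
  const lastComma = cleaned.lastIndexOf(',');
  const lastDot = cleaned.lastIndexOf('.');
  let normalized;
  if (lastComma !== -1 && lastDot !== -1) {
    const decimal = lastComma > lastDot ? ',' : '.';
    const thousands = decimal === ',' ? '.' : ',';
    normalized = cleaned.split(thousands).join('').replace(decimal, '.');
  } else {
    const sep = lastComma !== -1 ? ',' : lastDot !== -1 ? '.' : null;
    if (sep === null) {
      normalized = cleaned;
    } else {
      const parts = cleaned.split(sep);
      const tail = parts[parts.length - 1];
      // "1,234" or "1.234.567" -> thousands; "189.5" or "189,50" -> decimal
      normalized = parts.length > 2 || tail.length === 3
        ? parts.join('')
        : parts.join('.');
    }
  }
  const value = Number.parseFloat(normalized);
  return Number.isFinite(value) ? value : null;
}
```

Pinning the locale and currency in the browser context reduces how often the heuristic has to guess.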
Step 4: Scraping Guest Reviews
Reviews are critical for sentiment analysis and quality assessment:
async function scrapeReviews(context, hotelId, maxPages = 5) {
const allReviews = [];
for (let pageNum = 1; pageNum <= maxPages; pageNum++) {
const page = await context.newPage();
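// Note: the `pagename` parameter appears to expect the property's URL slug,
// so pass that value as hotelId; `cc1` (country code) is left empty here
const reviewUrl = `https://www.booking.com/reviewlist.html?cc1=&pagename=${hotelId}&offset=${(pageNum - 1) * 25}&rows=25&sort=f_recent_desc`;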
await page.goto(reviewUrl, { waitUntil: 'networkidle' });
const reviews = await page.evaluate(() => {
const reviewCards = document.querySelectorAll('.review_list_new_item_block');
return Array.from(reviewCards).map(card => {
const scoreEl = card.querySelector('.bui-review-score__badge');
const titleEl = card.querySelector('.c-review-block__title');
const positiveEl = card.querySelector('.c-review__body:nth-of-type(1)');
const negativeEl = card.querySelector('.lalala');
const dateEl = card.querySelector('.c-review-block__date');
const nameEl = card.querySelector('.bui-avatar-block__title');
const countryEl = card.querySelector('.bui-avatar-block__subtitle');
const stayInfoEl = card.querySelector('.c-review-block__stay-date');
return {
score: parseFloat(scoreEl?.textContent?.trim()) || null,
title: titleEl?.textContent?.trim(),
positive: positiveEl?.textContent?.trim(),
negative: negativeEl?.textContent?.trim(),
date: dateEl?.textContent?.trim(),
reviewerName: nameEl?.textContent?.trim(),
reviewerCountry: countryEl?.textContent?.trim(),
stayInfo: stayInfoEl?.textContent?.trim()
};
});
});
allReviews.push(...reviews);
await page.close();
// Respectful delay between pages
await new Promise(r => setTimeout(r, 2000 + Math.random() * 3000));
}
return allReviews;
}
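Because the pages are fetched by offset from a list sorted by recency, a review posted while the loop is running can push older rows onto the next page, so the same review may be collected twice. A small deduplication pass, sketched here with a hypothetical `dedupeReviews` helper keyed on the fields that `scrapeReviews` returns, guards against that.

```javascript
// Hypothetical helper: drop duplicate reviews collected across pages.
// The key fields mirror what scrapeReviews returns; adjust if yours differ.
function dedupeReviews(reviews) {
  const seen = new Set();
  const unique = [];
  for (const review of reviews) {
    const key = [
      review.reviewerName,
      review.reviewerCountry,
      review.date,
      review.score,
      review.title
    ].map((part) => String(part ?? '').trim().toLowerCase()).join('|');
    if (seen.has(key)) continue; // already collected on an earlier page
    seen.add(key);
    unique.push(review);
  }
  return unique;
}
```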
Step 5: Calendar and Pricing Trends
Extracting prices across dates reveals pricing patterns:
async function scrapePricingCalendar(context, hotelUrl, startDate, days = 30) {
const pricingData = [];
for (let i = 0; i < days; i++) {
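// Work in UTC throughout. This assumes startDate is a 'YYYY-MM-DD' string,
// which JavaScript parses as UTC midnight; mixing local setDate() with
// toISOString() can shift dates by a day around DST changes.
const checkin = new Date(startDate);
checkin.setUTCDate(checkin.getUTCDate() + i);
const checkout = new Date(checkin);
checkout.setUTCDate(checkout.getUTCDate() + 1);
const checkinStr = checkin.toISOString().split('T')[0];
const checkoutStr = checkout.toISOString().split('T')[0];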
const url = `${hotelUrl}?checkin=${checkinStr}&checkout=${checkoutStr}&group_adults=2`;
const page = await context.newPage();
await page.goto(url, { waitUntil: 'networkidle' });
const cheapestPrice = await page.evaluate(() => {
const priceEl = document.querySelector('.bui-price-display__value');
return priceEl ? parseFloat(priceEl.textContent.replace(/[^0-9.]/g, '')) : null;
});
pricingData.push({
date: checkinStr,
dayOfWeek: checkin.toLocaleDateString('en-US', { weekday: 'long', timeZone: 'UTC' }), // match the UTC date string
cheapestPrice: cheapestPrice,
currency: 'USD'
});
await page.close();
// Rate limiting — critical for Booking.com
await new Promise(r => setTimeout(r, 3000 + Math.random() * 5000));
}
return pricingData;
}
function analyzePricingPatterns(pricingData) {
const weekdayPrices = pricingData
.filter(d => !['Saturday', 'Sunday'].includes(d.dayOfWeek))
.map(d => d.cheapestPrice)
.filter(Boolean);
const weekendPrices = pricingData
.filter(d => ['Saturday', 'Sunday'].includes(d.dayOfWeek))
.map(d => d.cheapestPrice)
.filter(Boolean);
return {
avgWeekdayPrice: average(weekdayPrices).toFixed(2),
avgWeekendPrice: average(weekendPrices).toFixed(2),
weekendPremium: ((average(weekendPrices) / average(weekdayPrices) - 1) * 100).toFixed(1) + '%',
cheapestDate: pricingData.reduce((min, d) =>
d.cheapestPrice && d.cheapestPrice < (min.cheapestPrice || Infinity) ? d : min
),
mostExpensiveDate: pricingData.reduce((max, d) =>
d.cheapestPrice && d.cheapestPrice > (max.cheapestPrice || 0) ? d : max
),
priceRange: {
min: Math.min(...pricingData.map(d => d.cheapestPrice).filter(Boolean)),
max: Math.max(...pricingData.map(d => d.cheapestPrice).filter(Boolean))
}
};
}
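The analysis code in this guide calls an `average` helper that is never defined. A minimal version could look like this; returning `NaN` for an empty array is a design choice made here, not something the original specifies.

```javascript
// Arithmetic mean of an array of numbers. Returns NaN for an empty array
// so that callers can detect "no data" instead of reading a misleading 0.
function average(values) {
  if (values.length === 0) return NaN;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}
```

With this definition, a date range that contains no weekend nights yields the string "NaN" for `avgWeekendPrice`, which is worth checking for before displaying the result.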
Production Scaling with Apify
Running a Booking.com scraper in production requires dealing with proxies, browser pools, error handling, and data storage at scale. This is exactly what the Apify platform handles for you.
The Apify Store features dedicated Booking.com scraper actors that handle all the complexity — anti-bot bypassing, proxy rotation, session management, and structured data output.
Using Apify for Booking.com Scraping
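const { ApifyClient } = require('apify-client');
const client = new ApifyClient({
// Read the token from an environment variable instead of hard-coding it
token: process.env.APIFY_TOKEN
});
async function scrapeBookingWithApify() {
// 'booking-scraper' is a placeholder. Actor IDs normally look like
// 'username/actor-name'; copy the ID from the actor's Store page and check
// its input schema, since the field names below are illustrative.
const run = await client.actor('booking-scraper').call({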
destinations: ['Paris, France', 'Rome, Italy', 'Barcelona, Spain'],
checkin: '2026-05-01',
checkout: '2026-05-04',
adults: 2,
rooms: 1,
currency: 'USD',
language: 'en-us',
maxItems: 200,
includeReviews: true,
sortBy: 'review_score',
starRating: [3, 4, 5],
proxy: {
useApifyProxy: true,
apifyProxyGroups: ['RESIDENTIAL']
}
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Scraped ${items.length} hotels`);
// Process results
for (const hotel of items) {
console.log(`${hotel.name} - $${hotel.price}/night - ${hotel.reviewScore}/10`);
}
return items;
}
Benefits of Using Apify for Travel Data
- Residential proxy pools: Essential for Booking.com's geo-pricing
- Browser fingerprint rotation: Avoids detection patterns
- Automatic retry logic: Handles transient failures gracefully
- Scheduled runs: Monitor pricing daily, weekly, or hourly
- Webhooks: Get notified when scraping completes
- Dataset export: JSON, CSV, Excel, or direct API access
Practical Use Cases
Price Monitoring Dashboard
async function buildPriceMonitor(hotels, dates) {
const priceMatrix = {};
for (const hotel of hotels) {
priceMatrix[hotel.name] = {};
for (const date of dates) {
const price = await getPrice(hotel.id, date);
priceMatrix[hotel.name][date] = price;
}
}
// Find best deals
const deals = Object.entries(priceMatrix).map(([hotel, prices]) => {
const priceValues = Object.values(prices).filter(Boolean);
const avgPrice = average(priceValues);
const minPrice = Math.min(...priceValues);
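// Skip dates with no price; otherwise null compares as 0 and wins
const bestDate = Object.entries(prices)
.filter(([, price]) => Boolean(price))
.reduce((best, entry) => entry[1] < best[1] ? entry : best, [null, Infinity]);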
return {
hotel,
averagePrice: avgPrice.toFixed(2),
bestPrice: minPrice,
bestDate: bestDate[0],
savings: ((avgPrice - minPrice) / avgPrice * 100).toFixed(1) + '%'
};
});
return deals.sort((a, b) => a.bestPrice - b.bestPrice);
}
Competitive Analysis for Hoteliers
function analyzeCompetition(targetHotel, competitors) {
const analysis = {
targetHotel: targetHotel.name,
targetScore: targetHotel.reviewScore,
targetPrice: targetHotel.price,
competitorCount: competitors.length,
pricePosition: null,
scorePosition: null,
strengths: [],
weaknesses: []
};
// Price ranking
const allByPrice = [targetHotel, ...competitors].sort((a, b) => a.price - b.price);
analysis.pricePosition = allByPrice.findIndex(h => h.name === targetHotel.name) + 1;
// Score ranking
const allByScore = [targetHotel, ...competitors].sort((a, b) => b.reviewScore - a.reviewScore);
analysis.scorePosition = allByScore.findIndex(h => h.name === targetHotel.name) + 1;
// Category comparison
// Keys match the category names in the review data structure shown earlier
const categories = ['cleanliness', 'comfort', 'location', 'facilities', 'staff', 'value_for_money'];
for (const cat of categories) {
const avgCompetitor = average(competitors.map(c => c.categories[cat]));
const diff = targetHotel.categories[cat] - avgCompetitor;
if (diff > 0.3) {
analysis.strengths.push(`${cat}: ${diff.toFixed(1)} points above average`);
} else if (diff < -0.3) {
analysis.weaknesses.push(`${cat}: ${Math.abs(diff).toFixed(1)} points below average`);
}
}
return analysis;
}
Review Sentiment Analysis
function analyzeReviewSentiment(reviews) {
const sentimentKeywords = {
positive: {
'clean': 0, 'friendly': 0, 'location': 0, 'comfortable': 0,
'breakfast': 0, 'helpful': 0, 'quiet': 0, 'spacious': 0,
'modern': 0, 'view': 0, 'excellent': 0, 'perfect': 0
},
negative: {
'noise': 0, 'dirty': 0, 'small': 0, 'expensive': 0,
'rude': 0, 'slow': 0, 'old': 0, 'broken': 0,
'smell': 0, 'wait': 0, 'cold': 0, 'uncomfortable': 0
}
};
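for (const review of reviews) {
// Count positive keywords only in the positive text and negative keywords
// only in the negative text, so "not clean" in a complaint is not praise
const positiveText = (review.positive || '').toLowerCase();
const negativeText = (review.negative || '').toLowerCase();
for (const keyword of Object.keys(sentimentKeywords.positive)) {
if (positiveText.includes(keyword)) sentimentKeywords.positive[keyword]++;
}
for (const keyword of Object.keys(sentimentKeywords.negative)) {
if (negativeText.includes(keyword)) sentimentKeywords.negative[keyword]++;
}
}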
const topPositive = Object.entries(sentimentKeywords.positive)
.sort((a, b) => b[1] - a[1])
.slice(0, 5);
const topNegative = Object.entries(sentimentKeywords.negative)
.sort((a, b) => b[1] - a[1])
.slice(0, 5);
return {
totalReviews: reviews.length,
averageScore: average(reviews.map(r => r.score).filter(Boolean)).toFixed(1),
topPositiveThemes: topPositive.map(([k, v]) => `${k} (${v} mentions)`),
topNegativeThemes: topNegative.map(([k, v]) => `${k} (${v} mentions)`),
scoreDistribution: {
excellent: reviews.filter(r => r.score >= 9).length,
good: reviews.filter(r => r.score >= 7 && r.score < 9).length,
okay: reviews.filter(r => r.score >= 5 && r.score < 7).length,
poor: reviews.filter(r => r.score != null && r.score < 5).length // ignore reviews with no score
}
};
}
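Substring matching over-counts: "old" also matches "cold", and "wait" matches "waiter". A whole-word variant is sketched below; `countKeywordMentions` is a hypothetical helper, and allowing an optional plural "s" is a simplification rather than real stemming.

```javascript
// Count how many texts mention each keyword as a whole word (with an
// optional plural "s"), so that "old" does not match inside "cold".
function countKeywordMentions(texts, keywords) {
  const counts = Object.fromEntries(keywords.map((k) => [k, 0]));
  for (const keyword of keywords) {
    // Escape regex metacharacters in case a keyword contains any.
    const escaped = keyword.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    // No "g" flag, so test() keeps no state between calls.
    const pattern = new RegExp(`\\b${escaped}s?\\b`, 'i');
    for (const text of texts) {
      if (text && pattern.test(text)) counts[keyword]++;
    }
  }
  return counts;
}

const mentionCounts = countKeywordMentions(
  ['The room was cold', 'Old furniture', 'Long waits at check-in', 'The waiter was kind'],
  ['old', 'wait', 'cold']
);
// mentionCounts is { old: 1, wait: 1, cold: 1 }
```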
Best Practices for Booking.com Scraping
1. Respect Rate Limits
// Implement exponential backoff
async function fetchWithRetry(page, url, maxRetries = 3) {
for (let attempt = 0; attempt < maxRetries; attempt++) {
try {
await page.goto(url, { waitUntil: 'networkidle', timeout: 30000 });
return true;
} catch (error) {
if (attempt === maxRetries - 1) break; // no retries left, so skip the wait
const delay = Math.round(Math.pow(2, attempt) * 5000 + Math.random() * 5000);
console.log(`Attempt ${attempt + 1} failed. Retrying in ${delay}ms...`);
await new Promise(r => setTimeout(r, delay));
}
}
return false;
}
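The delay calculation can be pulled into its own function, which makes the schedule easy to inspect and test. In this sketch, `backoffDelay` is a hypothetical helper; the 60-second cap and the injectable `random` option are additions that the retry loop above does not have.

```javascript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped,
// plus random jitter. `random` is injectable to make the schedule testable.
function backoffDelay(attempt, options = {}) {
  const {
    baseMs = 5000,
    maxMs = 60000,   // assumed cap so late retries do not wait indefinitely
    jitterMs = 5000,
    random = Math.random
  } = options;
  const exponential = Math.min(baseMs * 2 ** attempt, maxMs);
  return Math.round(exponential + random() * jitterMs);
}

// Schedule without jitter, for inspection.
const schedule = [0, 1, 2, 3, 4].map((n) => backoffDelay(n, { random: () => 0 }));
// schedule is [5000, 10000, 20000, 40000, 60000]
```

The jitter matters when several workers fail at once: without it they would all retry at the same moments.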
2. Handle Currency Consistently
Always set the currency explicitly, for example with the selected_currency URL parameter used in buildSearchUrl, so that geo-based currency switching does not mix currencies within one dataset.
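A quick check after each run can catch a currency switch before it contaminates downstream averages. The sketch below assumes each record carries a `currency` field; `checkCurrency` is a hypothetical helper.

```javascript
// Hypothetical check: count scraped records by currency and flag any that
// do not match the currency you asked for. Assumes a `currency` field.
function checkCurrency(records, expected = 'USD') {
  const counts = {};
  const mismatched = [];
  for (const record of records) {
    const currency = (record.currency || 'UNKNOWN').toUpperCase();
    counts[currency] = (counts[currency] || 0) + 1;
    if (currency !== expected) mismatched.push(record);
  }
  return { consistent: mismatched.length === 0, counts, mismatched };
}
```

Records in `mismatched` are better re-scraped or excluded than converted, since the exchange rate applied at display time is not known.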
3. Account for Seasonal Pricing
Hotel prices fluctuate dramatically by season. Any meaningful analysis needs to account for seasonality by comparing like-for-like date ranges.
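As one illustration of a like-for-like comparison, the sketch below averages prices over the same calendar window in two consecutive years and reports the percent change. The observation shape `{ date, price }` and both function names are assumptions made for this example.

```javascript
// Average price inside one calendar window of a given year.
// Each observation is assumed to look like { date: 'YYYY-MM-DD', price: number }.
function windowAverage(observations, year, startMonthDay, endMonthDay) {
  const start = `${year}-${startMonthDay}`;
  const end = `${year}-${endMonthDay}`;
  // ISO date strings sort lexicographically, so string comparison works.
  const prices = observations
    .filter((o) => o.date >= start && o.date <= end && Number.isFinite(o.price))
    .map((o) => o.price);
  if (prices.length === 0) return null;
  return prices.reduce((sum, p) => sum + p, 0) / prices.length;
}

// Percent change between the window in `year` and the same window a year earlier.
function yearOverYearChange(observations, year, startMonthDay, endMonthDay) {
  const current = windowAverage(observations, year, startMonthDay, endMonthDay);
  const previous = windowAverage(observations, year - 1, startMonthDay, endMonthDay);
  if (current === null || previous === null) return null;
  return (current / previous - 1) * 100;
}
```

Comparing July against July this way avoids mistaking the normal summer peak for a price increase. Holidays that move between years, such as Easter, would still need separate handling.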
4. Validate Data Quality
function validateHotelData(hotel) {
const issues = [];
if (!hotel.name) issues.push('Missing hotel name');
if (!hotel.price || hotel.price <= 0) issues.push('Invalid price');
if (hotel.reviewScore && (hotel.reviewScore < 1 || hotel.reviewScore > 10)) {
issues.push('Review score out of range');
}
// Compare against null so a legitimate coordinate of 0 is not rejected
if (hotel.latitude == null || hotel.longitude == null) issues.push('Missing coordinates');
return {
isValid: issues.length === 0,
issues: issues
};
}
5. Legal and Ethical Considerations
- Review Booking.com's Terms of Service before scraping
- Use data for analysis and research, not to replicate their service
- Don't overload their servers with excessive request volume
- Consider using their official affiliate API for commercial applications
- Be transparent about data sources in your applications
Conclusion
Booking.com contains a treasure trove of hospitality data — from real-time pricing and availability to thousands of verified guest reviews. Whether you're building a price comparison tool, conducting market research, or optimizing your own hotel's competitive positioning, this data is incredibly valuable.
The technical challenges are real — dynamic rendering, anti-bot measures, and geo-pricing all make scraping non-trivial. For reliable, production-grade scraping, platforms like Apify provide the infrastructure you need: managed browsers, proxy rotation, automatic retries, and clean data output.
Check the Apify Store for ready-made Booking.com actors that let you start extracting data immediately without building and maintaining your own scraping infrastructure. Combined with scheduled runs and webhook integrations, you can build powerful travel data pipelines that run autonomously.
The travel industry thrives on data-driven decisions. With the right extraction tools and analysis pipeline, you can turn Booking.com's publicly available data into actionable competitive intelligence.
This article is for educational purposes. Always ensure your scraping activities comply with applicable terms of service and local regulations.