Airbnb has become one of the most valuable datasets in real estate tech. Investors use it to evaluate short-term rental markets. Property managers use it to price competitively. Researchers use it to study tourism impact on housing.
But Airbnb has no public API for listing data. And their frontend is a heavily JavaScript-rendered React application with serious anti-scraping measures.
Here's how to actually get Airbnb data in 2026 — from quick Python scripts to scalable solutions.
What Data Can You Extract from Airbnb?
Airbnb search results and listing pages contain:
- Listing details: Title, property type, bedrooms, bathrooms, max guests, amenities
- Pricing: Nightly rate, cleaning fee, service fee, total price for date range
- Reviews: Rating (overall + subcategories), review count, individual review text
- Host info: Name, superhost status, response rate, listings count
- Location: Neighborhood, coordinates (approximate), proximity info
- Availability: Calendar data, minimum/maximum stay requirements
Why Airbnb Is Hard to Scrape
Airbnb is one of the more challenging targets for web scraping:
- Full JavaScript rendering — The page loads a React shell, then fetches data via internal GraphQL APIs. Plain HTTP requests return an empty page.
- Aggressive bot detection — Fingerprinting, behavioral analysis, and device attestation.
- Dynamic selectors — CSS class names are hashed and change with every deployment.
- Rate limiting — Strict per-IP limits, especially on search and calendar endpoints.
- Legal stance — Airbnb's terms of service prohibit scraping, and the company has pursued scrapers legally. (Note: hiQ Labs v. LinkedIn, often cited in this context, concluded in 2022, involved LinkedIn rather than Airbnb, and ultimately went against the scraper on breach-of-contract grounds.)
You need a headless browser at minimum. A plain `requests.get()` returns no useful data.
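To see why, compare what a plain HTTP client gets back against a rendered page. A small heuristic check (the helper name and marker string are my own, based on the `data-testid` attributes used later in this article):

```python
def has_listing_cards(html: str) -> bool:
    """Heuristic: rendered Airbnb search pages contain listing card
    containers; the bare HTML a non-JS client receives does not.
    (Helper name and marker are illustrative, not an official check.)"""
    return 'data-testid="card-container"' in html


# With a plain HTTP client the check fails, because the server ships a
# React shell and the listings only appear after JavaScript runs:
#
#   import requests
#   html = requests.get("https://www.airbnb.com/s/Barcelona/homes").text
#   has_listing_cards(html)  # almost always False
```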
Method 1: Playwright + Python
Playwright gives you a real browser that executes JavaScript. Here's a working scraper for Airbnb search results:
```python
import asyncio
from playwright.async_api import async_playwright


async def scrape_airbnb_listings(
    location: str,
    checkin: str,
    checkout: str,
    max_pages: int = 3,
) -> list[dict]:
    """
    Scrape Airbnb search results using Playwright.

    Args:
        location: Search location (e.g., "Barcelona, Spain")
        checkin: Check-in date (YYYY-MM-DD)
        checkout: Check-out date (YYYY-MM-DD)
        max_pages: Number of result pages to scrape
    """
    listings = []

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            viewport={"width": 1920, "height": 1080},
            user_agent=(
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/125.0.0.0 Safari/537.36"
            ),
            locale="en-US",
        )
        page = await context.new_page()

        # Build search URL
        search_url = (
            f"https://www.airbnb.com/s/{location.replace(' ', '-')}/homes"
            f"?checkin={checkin}&checkout={checkout}"
            f"&adults=2&search_type=filter_change"
        )

        for page_num in range(max_pages):
            # NOTE: real Airbnb cursors are opaque tokens (see the pagination
            # section below); this numeric offset is a rough approximation
            # and may not work beyond the first pages.
            url = search_url if page_num == 0 else f"{search_url}&cursor={page_num * 20}"
            await page.goto(url, wait_until="networkidle", timeout=30000)
            await page.wait_for_timeout(3000)  # Let lazy-loaded content appear

            # Scroll to trigger lazy loading
            for _ in range(5):
                await page.mouse.wheel(0, 800)
                await page.wait_for_timeout(500)

            # Extract listing data from the page
            page_listings = await page.evaluate("""() => {
                const cards = document.querySelectorAll("div[data-testid='card-container']");
                return Array.from(cards).map(card => {
                    const titleEl = card.querySelector("div[data-testid='listing-card-title']");
                    const subtitleEl = card.querySelector("div[data-testid='listing-card-subtitle']");
                    // Hashed class -- fragile, breaks on redeploys (see Challenge 2)
                    const priceEl = card.querySelector("span._1y74zjx");
                    const ratingEl = card.querySelector("span[aria-label*='rating']");
                    const linkEl = card.querySelector("a[href*='/rooms/']");
                    const imgEl = card.querySelector("img");
                    return {
                        title: titleEl ? titleEl.innerText.trim() : null,
                        subtitle: subtitleEl ? subtitleEl.innerText.trim() : null,
                        price_per_night: priceEl ? priceEl.innerText.trim() : null,
                        rating: ratingEl ? ratingEl.getAttribute("aria-label") : null,
                        url: linkEl ? "https://www.airbnb.com" + linkEl.getAttribute("href").split("?")[0] : null,
                        image: imgEl ? imgEl.getAttribute("src") : null,
                    };
                }).filter(l => l.title);
            }""")

            listings.extend(page_listings)
            print(f"Page {page_num + 1}: found {len(page_listings)} listings")

            # Human-like delay between pages
            await page.wait_for_timeout(4000 + (page_num * 1000))

        await browser.close()

    return listings


# Usage
async def main():
    results = await scrape_airbnb_listings(
        location="Barcelona, Spain",
        checkin="2026-04-15",
        checkout="2026-04-20",
        max_pages=2,
    )
    print(f"\nTotal listings found: {len(results)}")
    for listing in results[:5]:
        print(f"  {listing['title']} — {listing['price_per_night']}/night")
        print(f"  {listing['rating'] or 'No rating'}")


asyncio.run(main())
```
Install dependencies first:

```bash
pip install playwright
playwright install chromium
```
Method 2: Intercepting Airbnb's Internal API
Airbnb's frontend talks to an internal GraphQL operation called `StaysSearch`. If you intercept those responses, you get clean JSON instead of parsing messy HTML. This is more reliable than DOM scraping because it doesn't break when Airbnb changes its CSS:
```python
async def scrape_airbnb_via_api(
    location: str, checkin: str, checkout: str
) -> list[dict]:
    """Intercept Airbnb's internal API to get structured listing data."""
    listings = []
    api_responses = []

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/125.0.0.0 Safari/537.36"
            ),
        )
        page = await context.new_page()

        # Intercept API responses
        async def handle_response(response):
            if "StaysSearch" in response.url:
                try:
                    data = await response.json()
                    api_responses.append(data)
                except Exception:
                    pass

        page.on("response", handle_response)

        search_url = (
            f"https://www.airbnb.com/s/{location.replace(' ', '-')}/homes"
            f"?checkin={checkin}&checkout={checkout}&adults=2"
        )
        await page.goto(search_url, wait_until="networkidle", timeout=45000)
        await page.wait_for_timeout(5000)

        # Parse the intercepted API data
        for response_data in api_responses:
            try:
                results = (
                    response_data.get("data", {})
                    .get("presentation", {})
                    .get("staysSearch", {})
                    .get("results", {})
                    .get("searchResults", [])
                )
                for result in results:
                    listing = result.get("listing", {})
                    pricing = result.get("pricingQuote", {})
                    listings.append({
                        "id": listing.get("id"),
                        "title": listing.get("name"),
                        "property_type": listing.get("roomTypeCategory"),
                        "bedrooms": listing.get("bedrooms"),
                        "bathrooms": listing.get("bathrooms"),
                        "max_guests": listing.get("personCapacity"),
                        "rating": listing.get("avgRating"),
                        "review_count": listing.get("reviewsCount"),
                        "superhost": listing.get("isSuperhost"),
                        "price_per_night": pricing.get("rate", {}).get("amount"),
                        "currency": pricing.get("rate", {}).get("currency"),
                        "total_price": pricing.get("priceString"),
                        "url": f"https://www.airbnb.com/rooms/{listing.get('id')}",
                    })
            except (KeyError, TypeError):
                continue

        await browser.close()

    return listings


# Usage
results = asyncio.run(
    scrape_airbnb_via_api("Lisbon, Portugal", "2026-05-01", "2026-05-05")
)
for r in results[:5]:
    print(
        f"{r['title']} — {r['price_per_night']} {r['currency']}/night "
        f"({r['rating']}★, {r['review_count']} reviews)"
    )
```
This method gives you much richer data — pricing breakdowns, exact ratings, property details — all in clean JSON.
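Once you have those dicts, persisting them is straightforward. Here is a sketch that writes Method 2's output to CSV (the field list mirrors the keys built above; extra keys are silently ignored):

```python
import csv

FIELDS = [
    "id", "title", "property_type", "bedrooms", "bathrooms",
    "max_guests", "rating", "review_count", "superhost",
    "price_per_night", "currency", "total_price", "url",
]


def save_listings_csv(listings: list[dict], path: str) -> None:
    """Write scraped listing dicts to a CSV file.
    extrasaction='ignore' drops keys not in FIELDS; missing
    keys are written as empty strings."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(listings)
```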
Method 3: Scraping Individual Listing Pages
For detailed property data (full amenities list, host info, neighborhood details), you need to visit individual listing pages:
```python
async def scrape_listing_details(listing_url: str) -> dict:
    """Scrape detailed data from a single Airbnb listing page."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()

        api_data = {}

        async def capture_api(response):
            if "/api/v3/PdpPlatformSections" in response.url:
                try:
                    api_data.update(await response.json())
                except Exception:
                    pass

        page.on("response", capture_api)

        await page.goto(listing_url, wait_until="networkidle", timeout=30000)
        await page.wait_for_timeout(3000)

        # Extract from page content as a fallback
        details = await page.evaluate("""() => {
            const getTextByTestId = (id) => {
                const el = document.querySelector(`[data-testid="${id}"]`);
                return el ? el.innerText.trim() : null;
            };

            // Amenities
            const amenities = Array.from(
                document.querySelectorAll("div[data-testid='amenity-row'] span")
            ).map(el => el.innerText.trim());

            // Host info
            const hostSection = document.querySelector("div[data-testid='host-profile']");
            const hostName = hostSection?.querySelector("h2")?.innerText;

            return {
                title: document.querySelector("h1")?.innerText?.trim(),
                amenities: amenities,
                host_name: hostName || null,
                description: getTextByTestId("listing-description"),
            };
        }""")

        # Merge API data if captured
        if api_data:
            details["api_data_available"] = True

        await browser.close()
        return details
```
The Proxy Problem
Even with Playwright, you'll get blocked after 20-30 listings from the same IP. Airbnb's detection is sophisticated — they track:
- IP reputation and ASN (datacenter IPs are instant blocks)
- Browser fingerprint consistency
- Navigation patterns and timing
- TLS fingerprint matching
You need residential proxies — real IP addresses from ISP networks that look like normal users.
Using ThorData for Residential Proxies
ThorData provides a large pool of residential IPs that work well with Airbnb. Here's how to integrate them with Playwright:
```python
async def scrape_with_proxy(
    location: str, checkin: str, checkout: str
) -> list[dict]:
    """Scrape Airbnb using ThorData residential proxies."""
    proxy_config = {
        "server": "http://proxy.thordata.com:9000",
        "username": "your_username",
        "password": "your_password",
    }

    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            proxy=proxy_config,
        )
        context = await browser.new_context(
            viewport={"width": 1920, "height": 1080},
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/125.0.0.0 Safari/537.36"
            ),
            locale="en-US",
        )
        page = await context.new_page()

        search_url = (
            f"https://www.airbnb.com/s/{location.replace(' ', '-')}/homes"
            f"?checkin={checkin}&checkout={checkout}&adults=2"
        )
        await page.goto(search_url, wait_until="networkidle", timeout=45000)
        # ... parse as shown in Methods 1 or 2

        await browser.close()
    return []
```
Residential proxies are essential for Airbnb scraping at any meaningful scale. Datacenter proxies will get you blocked almost immediately.
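Many residential providers let you pin a sticky session by embedding a session ID in the proxy username. The exact syntax is provider-specific, so treat the `-session-<id>` format below as a placeholder and check ThorData's docs for the real one:

```python
import uuid


def session_proxy(
    base_user: str,
    password: str,
    host: str = "proxy.thordata.com",
    port: int = 9000,
) -> dict:
    """Build a Playwright proxy config with a fresh sticky-session ID.
    NOTE: the '<user>-session-<id>' username format is a common
    residential-proxy convention, NOT ThorData's documented syntax --
    check your provider's docs for the actual format."""
    session_id = uuid.uuid4().hex[:8]
    return {
        "server": f"http://{host}:{port}",
        "username": f"{base_user}-session-{session_id}",
        "password": password,
    }
```

Rotating to a new session (and therefore a new exit IP) per search keeps any single residential IP under Airbnb's per-IP rate limits.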
Handling Common Challenges
Challenge 1: Currency and Language
Airbnb shows different prices based on your apparent location. Force consistency:
```python
context = await browser.new_context(
    locale="en-US",
    timezone_id="America/New_York",
    extra_http_headers={
        "Accept-Language": "en-US,en;q=0.9",
    },
)

# Add currency parameter to URL
url += "&currency=USD"
```
Challenge 2: Dynamic Class Names
Airbnb's CSS classes change constantly. Use `data-testid` attributes and ARIA labels instead:
```python
# Bad — breaks every deployment
price = await page.query_selector("span._1y74zjx")

# Good — stable selectors
price = await page.query_selector("[data-testid='price-element']")
rating = await page.query_selector("[aria-label*='rating']")
```
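One way to operationalize this is a fallback chain: list the stable selector first and keep a hashed class only as a last resort. A sketch for the async Playwright API:

```python
async def first_match(page, selectors: list[str]):
    """Return the first element matching any selector, in priority
    order. Put stable data-testid / ARIA selectors first and hashed
    class names last, so scrapes degrade gracefully on redeploys."""
    for selector in selectors:
        element = await page.query_selector(selector)
        if element is not None:
            return element
    return None
```

Usage: `await first_match(page, ["[data-testid='price-element']", "span._1y74zjx"])`.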
Challenge 3: Pagination
Airbnb uses cursor-based pagination, not page numbers. Capture the next cursor from the API response:
```python
# From the intercepted StaysSearch response:
pagination = (
    response_data["data"]["presentation"]["staysSearch"]
    ["results"]["paginationInfo"]
)
next_cursor = pagination.get("nextPageCursor")
# Append to the next request: &cursor={next_cursor}
```
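The fetch-parse-follow loop can be factored out. A sketch that walks cursors until exhaustion, with the page fetcher injected so it works with any of the methods above:

```python
from typing import Callable, Optional


def paginate(
    fetch_page: Callable[[Optional[str]], dict],
    max_pages: int = 10,
) -> list[dict]:
    """Walk cursor-based pagination. fetch_page takes a cursor (None
    for the first page) and returns a dict shaped like
    {'results': [...], 'next_cursor': str | None}, mirroring the
    nextPageCursor field in the StaysSearch response."""
    all_results: list[dict] = []
    cursor: Optional[str] = None
    for _ in range(max_pages):
        page = fetch_page(cursor)
        all_results.extend(page.get("results", []))
        cursor = page.get("next_cursor")
        if not cursor:  # no more pages
            break
    return all_results
```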
Key Takeaways
- You must use a headless browser — Airbnb is fully JavaScript-rendered; plain `requests` returns nothing.
- Intercept the internal API — parsing `StaysSearch` GraphQL responses is more reliable than DOM scraping.
- Use `data-testid` selectors — CSS class names are hashed and change constantly.
- Residential proxies are mandatory at scale — ThorData or similar; datacenter IPs get blocked instantly.
- Add human-like delays — 4-8 seconds between pages, vary randomly, scroll naturally.
- Force currency/locale — or your pricing data will be inconsistent across scraping sessions.
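The delay takeaway can be made concrete with a small jitter helper (the bounds are my reading of the 4-8 second guidance above):

```python
import random


def human_delay_ms(base_s: float = 4.0, spread_s: float = 4.0) -> int:
    """Random delay in milliseconds, uniform in
    [base_s, base_s + spread_s] seconds. Pass the result to
    page.wait_for_timeout() between page loads."""
    return int(random.uniform(base_s, base_s + spread_s) * 1000)
```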
Related Tools
For production-grade scraping without maintaining your own infrastructure, check out the scrapers on my Apify profile — pre-built actors that handle anti-bot detection, proxies, and data formatting out of the box.
Building more scrapers every week. Follow me on Apify for production-ready actors.