How to Scrape Airbnb in 2026: Listings, Prices, and Property Data

Airbnb has become one of the most valuable datasets in real estate tech. Investors use it to evaluate short-term rental markets. Property managers use it to price competitively. Researchers use it to study tourism impact on housing.

But Airbnb has no public API for listing data. And their frontend is a heavily JavaScript-rendered React application with serious anti-scraping measures.

Here's how to actually get Airbnb data in 2026 — from quick Python scripts to scalable solutions.

What Data Can You Extract from Airbnb?

Airbnb search results and listing pages contain:

  • Listing details: Title, property type, bedrooms, bathrooms, max guests, amenities
  • Pricing: Nightly rate, cleaning fee, service fee, total price for date range
  • Reviews: Rating (overall + subcategories), review count, individual review text
  • Host info: Name, superhost status, response rate, listings count
  • Location: Neighborhood, coordinates (approximate), proximity info
  • Availability: Calendar data, minimum/maximum stay requirements

Why Airbnb Is Hard to Scrape

Airbnb is one of the more challenging targets for web scraping:

  1. Full JavaScript rendering — The page loads a React shell, then fetches data via internal GraphQL APIs. Plain HTTP requests return an empty page.
  2. Aggressive bot detection — Fingerprinting, behavioral analysis, and device attestation.
  3. Dynamic selectors — CSS class names are hashed and change with every deployment.
  4. Rate limiting — Strict per-IP limits, especially on search and calendar endpoints.
  5. Legal stance — Airbnb actively fights scrapers in court, and U.S. case law on scraping publicly visible data (e.g., hiQ Labs v. LinkedIn) remains unsettled.

You need a headless browser at minimum. A simple requests.get() returns zero useful data.

Method 1: Playwright + Python

Playwright gives you a real browser that executes JavaScript. Here's a working scraper for Airbnb search results:

import asyncio
from playwright.async_api import async_playwright


async def scrape_airbnb_listings(
    location: str,
    checkin: str,
    checkout: str,
    max_pages: int = 3,
) -> list[dict]:
    """
    Scrape Airbnb search results using Playwright.

    Args:
        location: Search location (e.g., "Barcelona, Spain")
        checkin: Check-in date (YYYY-MM-DD)
        checkout: Check-out date (YYYY-MM-DD)
        max_pages: Number of result pages to scrape
    """
    listings = []

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            viewport={"width": 1920, "height": 1080},
            user_agent=(
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/125.0.0.0 Safari/537.36"
            ),
            locale="en-US",
        )

        page = await context.new_page()

        # Build search URL
        search_url = (
            f"https://www.airbnb.com/s/{location.replace(' ', '-')}/homes"
            f"?checkin={checkin}&checkout={checkout}"
            f"&adults=2&search_type=filter_change"
        )

        for page_num in range(max_pages):
            # Airbnb's real pagination is cursor-based (see Challenge 3 below);
            # this numeric offset is a simplification and may be ignored.
            url = search_url if page_num == 0 else f"{search_url}&cursor={page_num * 20}"

            await page.goto(url, wait_until="networkidle", timeout=30000)
            await page.wait_for_timeout(3000)  # Let lazy-loaded content appear

            # Scroll to trigger lazy loading
            for _ in range(5):
                await page.mouse.wheel(0, 800)
                await page.wait_for_timeout(500)

            # Extract listing data from the page
            page_listings = await page.evaluate("""() => {
                const cards = document.querySelectorAll("div[data-testid='card-container']");
                return Array.from(cards).map(card => {
                    const titleEl = card.querySelector("div[data-testid='listing-card-title']");
                    const subtitleEl = card.querySelector("div[data-testid='listing-card-subtitle']");
                    // Fragile: hashed class names like _1y74zjx change with
                    // every deploy; prefer data-testid selectors where they exist
                    const priceEl = card.querySelector("span._1y74zjx");
                    const ratingEl = card.querySelector("span[aria-label*='rating']");
                    const linkEl = card.querySelector("a[href*='/rooms/']");
                    const imgEl = card.querySelector("img");

                    return {
                        title: titleEl ? titleEl.innerText.trim() : null,
                        subtitle: subtitleEl ? subtitleEl.innerText.trim() : null,
                        price_per_night: priceEl ? priceEl.innerText.trim() : null,
                        rating: ratingEl ? ratingEl.getAttribute("aria-label") : null,
                        url: linkEl ? "https://www.airbnb.com" + linkEl.getAttribute("href").split("?")[0] : null,
                        image: imgEl ? imgEl.getAttribute("src") : null,
                    };
                }).filter(l => l.title);
            }""")

            listings.extend(page_listings)
            print(f"Page {page_num + 1}: found {len(page_listings)} listings")

            # Human-like delay between pages
            await page.wait_for_timeout(4000 + (page_num * 1000))

        await browser.close()

    return listings


# Usage
async def main():
    results = await scrape_airbnb_listings(
        location="Barcelona, Spain",
        checkin="2026-04-15",
        checkout="2026-04-20",
        max_pages=2,
    )
    print(f"\nTotal listings found: {len(results)}")
    for listing in results[:5]:
        print(f"  {listing['title']} — {listing['price_per_night']}/night")
        print(f"    {listing['rating'] or 'No rating'}")


asyncio.run(main())

Install dependencies first:

pip install playwright
playwright install chromium

Method 2: Intercepting Airbnb's Internal API

Airbnb's frontend talks to an internal GraphQL API called StaysSearch. If you intercept those requests, you get clean JSON instead of parsing messy HTML. This is more reliable than DOM scraping since it doesn't break when Airbnb changes their CSS:

async def scrape_airbnb_via_api(
    location: str, checkin: str, checkout: str
) -> list[dict]:
    """Intercept Airbnb's internal API to get structured listing data."""
    listings = []
    api_responses = []

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/125.0.0.0 Safari/537.36"
            ),
        )
        page = await context.new_page()

        # Intercept API responses
        async def handle_response(response):
            # "/api/v3/StaysSearch" already contains "StaysSearch", so one check suffices
            if "StaysSearch" in response.url:
                try:
                    data = await response.json()
                    api_responses.append(data)
                except Exception:
                    pass

        page.on("response", handle_response)

        search_url = (
            f"https://www.airbnb.com/s/{location.replace(' ', '-')}/homes"
            f"?checkin={checkin}&checkout={checkout}&adults=2"
        )

        await page.goto(search_url, wait_until="networkidle", timeout=45000)
        await page.wait_for_timeout(5000)

        # Parse the intercepted API data
        for response_data in api_responses:
            try:
                results = (
                    response_data.get("data", {})
                    .get("presentation", {})
                    .get("staysSearch", {})
                    .get("results", {})
                    .get("searchResults", [])
                )
                for result in results:
                    listing = result.get("listing", {})
                    pricing = result.get("pricingQuote", {})

                    listings.append({
                        "id": listing.get("id"),
                        "title": listing.get("name"),
                        "property_type": listing.get("roomTypeCategory"),
                        "bedrooms": listing.get("bedrooms"),
                        "bathrooms": listing.get("bathrooms"),
                        "max_guests": listing.get("personCapacity"),
                        "rating": listing.get("avgRating"),
                        "review_count": listing.get("reviewsCount"),
                        "superhost": listing.get("isSuperhost"),
                        "price_per_night": pricing.get("rate", {}).get("amount"),
                        "currency": pricing.get("rate", {}).get("currency"),
                        "total_price": pricing.get("priceString"),
                        "url": f"https://www.airbnb.com/rooms/{listing.get('id')}",
                    })
            except (KeyError, TypeError):
                continue

        await browser.close()

    return listings


# Usage
results = asyncio.run(
    scrape_airbnb_via_api("Lisbon, Portugal", "2026-05-01", "2026-05-05")
)
for r in results[:5]:
    print(
        f"{r['title']} — {r['price_per_night']} {r['currency']}/night "
        f"({r['rating']}★, {r['review_count']} reviews)"
    )

This method gives you much richer data — pricing breakdowns, exact ratings, property details — all in clean JSON.
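Intercepted responses can overlap across scroll-triggered fetches and pages, so it is worth deduplicating by listing id before saving. A small helper (the function name is mine):

```python
def dedupe_listings(listings: list[dict]) -> list[dict]:
    """Drop duplicate listings by id, keeping the first occurrence.
    Entries without an id are kept as-is."""
    seen: set[str] = set()
    unique: list[dict] = []
    for item in listings:
        lid = item.get("id")
        if lid is not None and lid in seen:
            continue  # already recorded this listing
        if lid is not None:
            seen.add(lid)
        unique.append(item)
    return unique
```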

Method 3: Scraping Individual Listing Pages

For detailed property data (full amenities list, host info, neighborhood details), you need to visit individual listing pages:

async def scrape_listing_details(listing_url: str) -> dict:
    """Scrape detailed data from a single Airbnb listing page."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()

        api_data = {}

        async def capture_api(response):
            if "/api/v3/PdpPlatformSections" in response.url:
                try:
                    api_data.update(await response.json())
                except Exception:
                    pass

        page.on("response", capture_api)

        await page.goto(listing_url, wait_until="networkidle", timeout=30000)
        await page.wait_for_timeout(3000)

        # Extract from page content as fallback
        details = await page.evaluate("""() => {
            const getTextByTestId = (id) => {
                const el = document.querySelector(`[data-testid="${id}"]`);
                return el ? el.innerText.trim() : null;
            };

            // Amenities
            const amenities = Array.from(
                document.querySelectorAll("div[data-testid='amenity-row'] span")
            ).map(el => el.innerText.trim());

            // Host info
            const hostSection = document.querySelector("div[data-testid='host-profile']");
            const hostName = hostSection?.querySelector("h2")?.innerText;

            return {
                title: document.querySelector("h1")?.innerText?.trim(),
                amenities: amenities,
                host_name: hostName || null,
                description: getTextByTestId("listing-description"),
            };
        }""")

        # Merge API data if captured
        if api_data:
            details["api_data_available"] = True

        await browser.close()
        return details

The Proxy Problem

Even with Playwright, you'll get blocked after 20-30 listings from the same IP. Airbnb's detection is sophisticated — they track:

  • IP reputation and ASN (datacenter IPs are blocked almost immediately)
  • Browser fingerprint consistency
  • Navigation patterns and timing
  • TLS fingerprint matching

You need residential proxies — real IP addresses from ISP networks that look like normal users.
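At scale you'll also want to rotate through a pool rather than pin every request to one exit IP. A minimal round-robin sketch — the proxy entries are placeholders, and the dict shape matches what Playwright's `launch(proxy=...)` expects:

```python
from itertools import cycle

# Placeholder pool — substitute real residential proxy endpoints and credentials
PROXY_POOL = [
    {"server": "http://proxy-a.example.com:9000", "username": "user", "password": "pass"},
    {"server": "http://proxy-b.example.com:9000", "username": "user", "password": "pass"},
]

_rotation = cycle(PROXY_POOL)


def next_proxy() -> dict:
    """Return the next proxy config, round-robin over the pool."""
    return next(_rotation)
```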

Using ThorData for Residential Proxies

ThorData provides a large pool of residential IPs that work well with Airbnb. Here's how to integrate them with Playwright:

async def scrape_with_proxy(
    location: str, checkin: str, checkout: str
) -> list[dict]:
    """Scrape Airbnb using ThorData residential proxies."""
    proxy_config = {
        "server": "http://proxy.thordata.com:9000",
        "username": "your_username",
        "password": "your_password",
    }

    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            proxy=proxy_config,
        )
        context = await browser.new_context(
            viewport={"width": 1920, "height": 1080},
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/125.0.0.0 Safari/537.36"
            ),
            locale="en-US",
        )

        page = await context.new_page()
        search_url = (
            f"https://www.airbnb.com/s/{location.replace(' ', '-')}/homes"
            f"?checkin={checkin}&checkout={checkout}&adults=2"
        )

        await page.goto(search_url, wait_until="networkidle", timeout=45000)
        # ... parse as shown in Methods 1 or 2

        await browser.close()
    return []

Residential proxies are essential for Airbnb scraping at any meaningful scale. Datacenter proxies will get you blocked almost immediately.

Handling Common Challenges

Challenge 1: Currency and Language

Airbnb shows different prices based on your apparent location. Force consistency:

context = await browser.new_context(
    locale="en-US",
    timezone_id="America/New_York",
    extra_http_headers={
        "Accept-Language": "en-US,en;q=0.9",
    },
)
# Add currency parameter to URL
url += "&currency=USD"
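Pinning the currency can be folded into a URL builder so every session requests identical pricing. A sketch that mirrors the URL shape used in the scrapers above; the exact set of query parameters Airbnb honors is an assumption:

```python
from urllib.parse import urlencode


def build_search_url(
    location: str,
    checkin: str,
    checkout: str,
    currency: str = "USD",
    adults: int = 2,
) -> str:
    """Build an Airbnb search URL with currency pinned for consistent pricing."""
    slug = location.replace(" ", "-")  # same slug scheme as the scrapers above
    params = urlencode({
        "checkin": checkin,
        "checkout": checkout,
        "adults": adults,
        "currency": currency,
    })
    return f"https://www.airbnb.com/s/{slug}/homes?{params}"
```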

Challenge 2: Dynamic Class Names

Airbnb's CSS classes change constantly. Use data-testid attributes and ARIA labels instead:

# Bad — breaks every deployment
price = await page.query_selector("span._1y74zjx")

# Good — stable selectors
price = await page.query_selector("[data-testid='price-element']")
rating = await page.query_selector("[aria-label*='rating']")

Challenge 3: Pagination

Airbnb uses cursor-based pagination, not page numbers. Capture the next cursor from the API response:

# From the intercepted StaysSearch response:
pagination = (
    response_data["data"]["presentation"]["staysSearch"]
    ["results"]["paginationInfo"]
)
next_cursor = pagination.get("nextPageCursor")
# Append to next request: &cursor={next_cursor}
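The cursor walk can be sketched as a loop, where `fetch_page` stands in for whatever actually issues the request (browser navigation or an intercepted API call) and the payload shape follows the StaysSearch structure parsed in Method 2:

```python
from typing import Callable, Optional


def paginate_search(
    fetch_page: Callable[[Optional[str]], dict],
    max_pages: int = 5,
) -> list[dict]:
    """Follow nextPageCursor through StaysSearch-shaped payloads,
    collecting searchResults until the cursor runs out."""
    results: list[dict] = []
    cursor: Optional[str] = None
    for _ in range(max_pages):
        data = fetch_page(cursor)
        block = (
            data.get("data", {})
            .get("presentation", {})
            .get("staysSearch", {})
            .get("results", {})
        )
        results.extend(block.get("searchResults", []))
        cursor = block.get("paginationInfo", {}).get("nextPageCursor")
        if not cursor:
            break  # last page reached
    return results
```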

Key Takeaways

  1. You must use a headless browser — Airbnb is 100% JavaScript-rendered; a plain requests.get() returns nothing.
  2. Intercept the internal API — Parsing StaysSearch GraphQL responses is more reliable than DOM scraping.
  3. Use data-testid selectors — CSS class names are hashed and change constantly.
  4. Residential proxies are mandatory at scale — ThorData or similar. Datacenter IPs get blocked instantly.
  5. Add human-like delays — 4-8 seconds between pages, vary randomly, scroll naturally.
  6. Force currency/locale — Or your pricing data will be inconsistent across scraping sessions.
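The delay advice in point 5 can be captured in a tiny helper to feed into page.wait_for_timeout():

```python
import random


def human_delay_ms(base_ms: int = 4000, jitter_ms: int = 4000) -> int:
    """Randomized delay in milliseconds, from base_ms up to base_ms + jitter_ms,
    so pauses land in the 4-8 second band recommended above."""
    return base_ms + random.randint(0, jitter_ms)


# e.g. await page.wait_for_timeout(human_delay_ms())
```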

Related Tools

For production-grade scraping without maintaining your own infrastructure, check out the scrapers on my Apify profile — pre-built actors that handle anti-bot detection, proxies, and data formatting out of the box.


Building more scrapers every week. Follow me on Apify for production-ready actors.
