DEV Community

agenthustler

Redfin Scraping: Extract Real Estate Listings and Market Data

Real estate data drives billions of dollars in decisions every year. From individual homebuyers comparing neighborhoods to hedge funds modeling housing market trends, access to accurate property data is a competitive advantage. Redfin, with its comprehensive MLS-sourced listings and proprietary market analytics, is one of the richest sources of real estate data on the web.

In this guide, we'll explore Redfin's data architecture, the types of information you can extract, and practical techniques for building reliable scrapers. Whether you're a real estate investor tracking price trends, a proptech startup building data products, or a researcher studying housing markets, this article covers the full technical landscape.

Understanding Redfin's Data Architecture

Redfin stands out from other real estate platforms because it operates as an actual brokerage. This means its data comes directly from MLS (Multiple Listing Service) feeds, making it more accurate and timely than aggregator sites.

URL Structure

Redfin uses a clean, predictable URL structure:

  • City search: redfin.com/city/30749/WA/Seattle
  • Zip code search: redfin.com/zipcode/98101
  • Individual listing: redfin.com/WA/Seattle/123-Main-St-98101/home/12345678
  • Neighborhood: redfin.com/neighborhood/529/WA/Seattle/Capitol-Hill

The numeric IDs at the end of listing URLs (/home/12345678) are Redfin's internal property IDs, which remain stable even if the address format changes.
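Because that ID outlives address changes, it makes a better primary key than the URL itself. A minimal sketch of pulling it out:

```python
import re

def extract_property_id(url):
    """Return Redfin's stable internal property ID from a listing URL."""
    match = re.search(r"/home/(\d+)", url)
    return match.group(1) if match else None

url = "https://www.redfin.com/WA/Seattle/123-Main-St-98101/home/12345678"
print(extract_property_id(url))  # 12345678
```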

Data Layers

Redfin organizes data in several interconnected layers:

  1. Search/Listing Layer: Property cards with summary data (price, beds, baths, sqft)
  2. Property Detail Layer: Full listing information, photos, description
  3. History Layer: Price changes, listing history, tax records
  4. Market Layer: Aggregated statistics for areas (median price, days on market, etc.)
  5. Agent Layer: Listing agent information, brokerage details

Redfin's Stingray API

Redfin uses an internal API (often called "Stingray") that powers its frontend. Many data requests go through endpoints like:

https://www.redfin.com/stingray/api/gis?al=1&region_id=16163&region_type=6
https://www.redfin.com/stingray/api/home/details/belowTheFold?propertyId=12345678

These API endpoints return JSON, often prefixed with {}&& as an anti-JSON-hijacking guard that must be stripped before parsing. Understanding these endpoints is key to efficient scraping.
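For example, stripping the prefix before parsing (the payload below is a made-up sample):

```python
import json
import re

def parse_stingray(text):
    """Strip the {}&& anti-JSON-hijacking prefix, then parse the JSON body."""
    return json.loads(re.sub(r"^\{\}&&", "", text))

raw = '{}&&{"payload": {"homes": [{"propertyId": 12345678}]}}'
data = parse_stingray(raw)
print(data["payload"]["homes"][0]["propertyId"])  # 12345678
```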

What Data Can You Extract?

Property Listing Data

| Data Point | Source | Notes |
| --- | --- | --- |
| Address | Listing header | Full street address with unit |
| List price | Price section | Current asking price |
| Beds/Baths | Property stats | Bedroom and bathroom count |
| Square footage | Property stats | Living area in sqft |
| Lot size | Property details | Land area |
| Year built | Property details | Construction year |
| Property type | Listing type | Single family, condo, townhouse, etc. |
| MLS number | Listing details | Unique MLS identifier |
| Days on market | Listing stats | Time since listing went active |
| HOA dues | Fee section | Monthly HOA if applicable |
| Parking | Property details | Garage type and spaces |
| Status | Listing badge | Active, pending, sold, etc. |

Price History Data

Each property has a price history tab containing:

  • Listing events: Date listed, price changes, taken off market, relisted
  • Sale records: Past sale dates and prices
  • Tax assessment history: Annual assessed values
  • Price per square foot over time

This data is invaluable for investment analysis and market modeling.

Agent and Brokerage Data

Each listing includes:

  • Listing agent name and contact information
  • Buyer's agent (for sold properties)
  • Brokerage name
  • Agent's active listings count
  • Agent's past sales

Market Statistics

Redfin publishes rich market data at various geographic levels:

  • Median sale price and year-over-year change
  • Median days on market
  • Number of homes sold
  • Sale-to-list price ratio
  • Inventory levels
  • Price drops percentage
  • Competition score (Redfin's proprietary metric)
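To make those metrics concrete, here's how the sale-to-list ratio and year-over-year change fall out of raw figures (the numbers below are made up for illustration):

```python
def market_summary(median_now, median_year_ago, sale_prices, list_prices):
    """Derive two headline market metrics from raw sale figures."""
    yoy_change = (median_now - median_year_ago) / median_year_ago
    # A ratio above 1.0 means homes are selling over asking on average
    ratios = [sale / lst for sale, lst in zip(sale_prices, list_prices)]
    sale_to_list = sum(ratios) / len(ratios)
    return {
        "yoy_change": round(yoy_change, 3),
        "sale_to_list_ratio": round(sale_to_list, 3),
    }

print(market_summary(
    median_now=820_000, median_year_ago=780_000,
    sale_prices=[810_000, 655_000], list_prices=[800_000, 660_000],
))
```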

Building a Redfin Scraper with Node.js

Let's build a comprehensive Redfin scraper using Crawlee.

Project Setup and Configuration

const { CheerioCrawler, Dataset, log } = require('crawlee');

const BASE_URL = 'https://www.redfin.com';

// Redfin-specific headers to mimic browser requests
const CUSTOM_HEADERS = {
    'User-Agent':
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
        + 'AppleWebKit/537.36 (KHTML, like Gecko) '
        + 'Chrome/120.0.0.0 Safari/537.36',
    'Accept':
        'text/html,application/xhtml+xml,'
        + 'application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Referer': 'https://www.redfin.com/',
};

const crawler = new CheerioCrawler({
    maxConcurrency: 1, // Redfin is strict about rate limiting
    maxRequestRetries: 3,
    requestHandlerTimeoutSecs: 90,
    additionalMimeTypes: ['application/json'],

    preNavigationHooks: [
        (crawlingContext) => {
            crawlingContext.request.headers = {
                ...CUSTOM_HEADERS,
            };
        },
    ],

    async requestHandler({ request, $, body, log }) {
        const { label } = request.userData;

        switch (label) {
            case 'SEARCH':
                await handleSearchPage($, request, log);
                break;
            case 'LISTING':
                await handleListingPage($, body, request, log);
                break;
            case 'API':
                await handleApiResponse(body, request, log);
                break;
            default:
                log.warning(`Unknown label: ${label}`);
        }
    },
});

Extracting Search Results

async function handleSearchPage($, request, log) {
    const listings = [];

    // Redfin renders property cards in the search results
    $('div.HomeCardContainer').each((i, el) => {
        const card = $(el);
        const listing = {
            address: card.find('div.homeAddressV2')
                         .text().trim(),
            price: card.find('span.homecardV2Price')
                       .text().trim()
                       .replace(/[^0-9]/g, ''),
            beds: card.find('div.HomeStatsV2 .beds')
                      .text().trim(),
            baths: card.find('div.HomeStatsV2 .baths')
                       .text().trim(),
            sqft: card.find('div.HomeStatsV2 .sqft')
                      .text().trim()
                      .replace(/[^0-9]/g, ''),
            url: BASE_URL + card.find('a.link-and-anchor')
                                .attr('href'),
            status: card.find('span.listingType')
                        .text().trim(),
        };

        if (listing.address) {
            listings.push(listing);
        }
    });

    log.info(
        `Found ${listings.length} listings on search page`
    );

    // Enqueue individual listing pages
    for (const listing of listings) {
        await crawler.addRequests([{
            url: listing.url,
            userData: { label: 'LISTING', searchData: listing },
        }]);
    }

    // Handle pagination
    const nextButton = $('button.PageArrow[data-rf-test-id="react-data-paginate-next"]');
    if (nextButton.length) {
        const currentPage = parseInt(
            $('span.pageText').text().match(/\d+/)?.[0] || '1'
        );
        const nextUrl = request.url.includes('/page-')
            ? request.url.replace(
                  /\/page-\d+/,
                  `/page-${currentPage + 1}`
              )
            : `${request.url}/page-2`;

        await crawler.addRequests([{
            url: nextUrl,
            userData: { label: 'SEARCH' },
        }]);
    }
}

Extracting Detailed Property Information

async function handleListingPage($, body, request, log) {
    const property = {
        url: request.url,
        scrapedAt: new Date().toISOString(),
    };

    // Basic property information
    property.address = $('h1[data-rf-test-id="abp-homeinfo-homeaddress"]')
        .text().trim();
    property.price = $('div[data-rf-test-id="abp-price"] span')
        .text().trim().replace(/[^0-9]/g, '');
    property.status = $('span[data-rf-test-id="abp-status"]')
        .text().trim();

    // Property stats (beds, baths, sqft)
    property.beds = $('div[data-rf-test-id="abp-beds"] .statsValue')
        .text().trim();
    property.baths = $('div[data-rf-test-id="abp-baths"] .statsValue')
        .text().trim();
    property.sqft = $('div[data-rf-test-id="abp-sqFt"] .statsValue')
        .text().trim().replace(/[^0-9]/g, '');

    // Description
    property.description = $('div[data-rf-test-id="listing-remarks"]')
        .text().trim();

    // Property details from the key details section
    property.details = {};
    $('div.keyDetail').each((i, el) => {
        const label = $(el).find('span.header').text().trim();
        const value = $(el).find('span.content').text().trim();
        if (label && value) {
            property.details[label] = value;
        }
    });

    // Extract price history
    property.priceHistory = extractPriceHistory($);

    // Extract agent information
    property.listingAgent = {
        name: $('div.agent-basic-details span.agent-name')
              .text().trim(),
        brokerage: $('div.agent-basic-details span.office-name')
                   .text().trim(),
        phone: $('div.agent-basic-details a[href^="tel:"]')
               .text().trim(),
    };

    // School information
    property.schools = [];
    $('div.school-card').each((i, el) => {
        property.schools.push({
            name: $(el).find('span.school-name').text().trim(),
            rating: $(el).find('span.school-rating')
                         .text().trim(),
            distance: $(el).find('span.school-distance')
                           .text().trim(),
            type: $(el).find('span.school-type').text().trim(),
        });
    });

    log.info(`Extracted details for: ${property.address}`);
    await Dataset.pushData(property);
}

function extractPriceHistory($) {
    const history = [];

    $('table.property-history-table tbody tr').each((i, el) => {
        const cells = $(el).find('td');
        if (cells.length >= 4) {
            history.push({
                date: $(cells[0]).text().trim(),
                event: $(cells[1]).text().trim(),
                price: $(cells[2]).text().trim()
                           .replace(/[^0-9]/g, ''),
                pricePerSqft: $(cells[3]).text().trim(),
            });
        }
    });

    return history;
}

Python Approach for Redfin Scraping

Here's a Python implementation focused on Redfin's internal API:

import requests
import json
import time
import re
from urllib.parse import quote

class RedfinScraper:
    BASE_URL = "https://www.redfin.com"
    STINGRAY_URL = f"{BASE_URL}/stingray/api"

    HEADERS = {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/120.0.0.0 Safari/537.36"
        ),
        "Accept": "application/json",
        "Referer": "https://www.redfin.com/",
    }

    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update(self.HEADERS)

    def _parse_stingray_response(self, text):
        """Strip Redfin's {}&& JSON comment prefix."""
        cleaned = re.sub(r'^\{\}&&', '', text)
        return json.loads(cleaned)

    def search_properties(self, query, num_homes=20):
        # Step 1: Resolve location via autocomplete
        auto_url = (
            f"{self.STINGRAY_URL}/v1/search/"
            f"typeahead?input={quote(query)}"
            f"&num_homes={num_homes}"
        )
        resp = self.session.get(auto_url)
        location_data = self._parse_stingray_response(
            resp.text
        )

        if not location_data.get("payload", {}).get(
            "sections"
        ):
            print("No location found for query.")
            return []

        first_result = (
            location_data["payload"]["sections"][0]
            ["rows"][0]
        )
        region_id = first_result.get("id")
        region_type = first_result.get("type")

        print(
            f"Found region: {first_result.get('name')} "
            f"(ID: {region_id}, Type: {region_type})"
        )
        time.sleep(2)

        # Step 2: Fetch listings via GIS API
        gis_url = (
            f"{self.STINGRAY_URL}/gis?"
            f"al=1&region_id={region_id}"
            f"&region_type={region_type}"
            f"&num_homes={num_homes}"
        )
        resp = self.session.get(gis_url)
        gis_data = self._parse_stingray_response(resp.text)

        properties = []
        homes = (
            gis_data.get("payload", {})
            .get("homes", [])
        )

        for home in homes:
            prop = {
                "property_id": home.get("propertyId"),
                "listing_id": home.get("listingId"),
                "address": (
                    home.get("streetLine", {})
                    .get("value", "")
                ),
                "city": home.get("city"),
                "state": home.get("state"),
                "zip": home.get("zip"),
                "price": home.get("price", {}).get(
                    "value"
                ),
                "beds": home.get("beds"),
                "baths": home.get("baths"),
                "sqft": home.get("sqFt", {}).get("value"),
                "lot_size": home.get("lotSize", {}).get(
                    "value"
                ),
                "year_built": home.get("yearBuilt", {})
                    .get("value"),
                "property_type": home.get(
                    "propertyType"
                ),
                "listing_status": home.get("status"),
                "days_on_market": home.get("dom", {})
                    .get("value"),
                "price_per_sqft": home.get(
                    "pricePerSqFt", {}
                ).get("value"),
                "hoa_dues": home.get("hoa", {}).get(
                    "value"
                ),
                "url": (
                    f"{self.BASE_URL}"
                    f"{home.get('url', '')}"
                ),
            }
            properties.append(prop)

        return properties

    def get_property_details(self, property_id):
        url = (
            f"{self.STINGRAY_URL}/home/details/"
            f"belowTheFold?propertyId={property_id}"
            f"&accessLevel=1"
        )
        resp = self.session.get(url)
        data = self._parse_stingray_response(resp.text)
        return data.get("payload", {})

    def get_price_history(self, property_id):
        details = self.get_property_details(property_id)
        history = (
            details.get("propertyHistoryInfo", {})
            .get("events", [])
        )

        return [
            {
                "date": event.get("eventDate"),
                "event_type": event.get(
                    "eventDescription"
                ),
                "price": event.get("price"),
                "price_per_sqft": event.get(
                    "pricePerSqFt"
                ),
                "source": event.get("source"),
            }
            for event in history
        ]

    def get_market_stats(self, region_id, region_type=6):
        url = (
            f"{self.STINGRAY_URL}/market-tracker/"
            f"overview?regionId={region_id}"
            f"&regionType={region_type}"
        )
        resp = self.session.get(url)
        data = self._parse_stingray_response(resp.text)
        payload = data.get("payload", {})

        return {
            "median_sale_price": payload.get(
                "medianSalePrice"
            ),
            "median_dom": payload.get("medianDom"),
            "homes_sold": payload.get("homesSold"),
            "inventory": payload.get("inventory"),
            "sale_to_list_ratio": payload.get(
                "saleToListRatio"
            ),
            "price_drops_pct": payload.get(
                "priceDropsPct"
            ),
            "yoy_change": payload.get(
                "medianSalePriceYoyChange"
            ),
        }


# Usage example
scraper = RedfinScraper()

# Search for properties in Seattle
properties = scraper.search_properties(
    "Seattle, WA", num_homes=10
)
print(f"Found {len(properties)} properties")

for prop in properties[:3]:
    # price can be missing on some listings, so guard the format
    price = f"${prop['price']:,}" if prop['price'] else "N/A"
    print(f"\n{prop['address']}, {prop['city']}: {price}")

    # Get price history for each property
    time.sleep(3)  # Respectful delay
    history = scraper.get_price_history(
        prop['property_id']
    )
    for event in history[:5]:
        print(
            f"  {event['date']}: "
            f"{event['event_type']} - "
            f"${event.get('price', 'N/A')}"
        )

Handling Redfin's Anti-Scraping Defenses

Redfin has some of the more sophisticated anti-scraping measures among real estate sites.

1. Request Fingerprinting

Redfin tracks browser fingerprints. Your requests need consistent headers:

// Maintain session consistency
const sessionHeaders = {
    'Cookie': 'RF_BROWSER_ID=abc123; RF_BID_UPDATED=1;',
    'X-Requested-With': 'XMLHttpRequest',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Dest': 'empty',
};
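The snippet above hard-codes cookie values, but in practice you want a real first visit to set them. In Python, a shared `requests.Session` keeps headers and cookies consistent across calls (a sketch; Redfin sets RF_BROWSER_ID and related cookies on the first response):

```python
import requests

def build_session():
    """Create a session with consistent headers for all Redfin requests."""
    session = requests.Session()
    session.headers.update({
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/120.0.0.0 Safari/537.36"
        ),
        "Accept-Language": "en-US,en;q=0.9",
        "Referer": "https://www.redfin.com/",
    })
    return session

def warm_up(session):
    """Visit the homepage once so Redfin sets its browser-ID cookies;
    the session then sends them automatically on later API calls."""
    session.get("https://www.redfin.com/", timeout=30)
    return session.cookies.get_dict()
```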

2. Rate Limiting and CAPTCHAs

Redfin aggressively rate-limits automated requests:

import time
import random

class RateLimiter:
    def __init__(self, base_delay=3.0):
        self.base_delay = base_delay
        self.consecutive_errors = 0

    def wait(self):
        delay = self.base_delay * (
            2 ** self.consecutive_errors
        )
        jitter = random.uniform(0.5, 1.5)
        actual_delay = delay * jitter
        time.sleep(min(actual_delay, 60))

    def success(self):
        self.consecutive_errors = max(
            0, self.consecutive_errors - 1
        )

    def failure(self):
        self.consecutive_errors += 1
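Setting jitter aside, the wait this limiter picks grows exponentially with the error streak and is capped at 60 seconds. A quick check of that schedule:

```python
def backoff_delay(base_delay, consecutive_errors, cap=60):
    """Deterministic part of the limiter's wait: exponential growth, capped."""
    return min(base_delay * (2 ** consecutive_errors), cap)

# With the default 3s base: 3, 6, 12, 24, 48, then pinned at the 60s cap
for errors in range(6):
    print(errors, backoff_delay(3.0, errors))
```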

3. JavaScript-Rendered Content

Some property details only appear after JavaScript execution:

const { PlaywrightCrawler } = require('crawlee');

const crawler = new PlaywrightCrawler({
    headless: true,
    maxConcurrency: 1,

    async requestHandler({ page, request, log }) {
        // Wait for the main content to load
        await page.waitForSelector(
            'div[data-rf-test-id="abp-price"]',
            { timeout: 20000 }
        );

        // Scroll to trigger lazy-loaded sections
        await page.evaluate(async () => {
            const delay = ms =>
                new Promise(r => setTimeout(r, ms));
            for (let i = 0; i < 5; i++) {
                window.scrollBy(0, 800);
                await delay(1000);
            }
        });

        // Wait for price history table
        await page.waitForSelector(
            'table.property-history-table',
            { timeout: 10000 }
        ).catch(() => {
            log.warning('Price history table not found');
        });

        // Now extract the fully rendered content
        const data = await page.evaluate(() => {
            // ... extraction logic
        });

        log.info(
            `Extracted: ${data.address} - $${data.price}`
        );
    },
});

Using Apify for Redfin Scraping

For production-grade Redfin scraping, Apify handles the infrastructure complexity:

Running a Redfin Actor on Apify

const Apify = require('apify');

// Note: top-level await needs an async context, e.g. wrap this
// in Apify.main(async () => { ... }).
const run = await Apify.call('redfin/property-scraper', {
    searchUrls: [
        {
            url: 'https://www.redfin.com/city/30749/WA/Seattle'
        },
        {
            url: 'https://www.redfin.com/city/11203/CA/Los-Angeles'
        },
    ],
    maxItems: 200,
    includeDetails: true,
    includePriceHistory: true,
    includeSchools: true,
    proxy: {
        useApifyProxy: true,
        apifyProxyGroups: ['RESIDENTIAL'],
    },
});

const dataset = await Apify.openDataset(
    run.defaultDatasetId
);
const { items } = await dataset.getData();

console.log(`Scraped ${items.length} properties`);

// Filter for investment opportunities
const deals = items.filter(item => {
    const pricePerSqft = parseInt(item.price)
        / parseInt(item.sqft);
    const avgForArea = 450; // Example area average
    return pricePerSqft < avgForArea * 0.85;
});

console.log(
    `Found ${deals.length} potential below-market deals`
);

Automated Market Monitoring

Set up regular scraping to track market trends:

// Apify scheduled task for weekly market monitoring
const task = {
    actorId: 'redfin/market-tracker',
    name: 'seattle-market-weekly',
    options: {
        build: 'latest',
        memoryMbytes: 4096,
        timeoutSecs: 3600,
    },
    input: {
        regions: [
            { name: 'Seattle', regionId: 16163 },
            { name: 'Bellevue', regionId: 1528 },
            { name: 'Redmond', regionId: 14470 },
        ],
        metrics: [
            'medianSalePrice',
            'medianDom',
            'inventory',
            'saleToListRatio',
        ],
        outputFormat: 'csv',
    },
    scheduleExpression: '0 8 * * 1', // Mondays 8 AM
};

Practical Use Cases for Redfin Data

1. Investment Property Analysis

import pandas as pd
import numpy as np

def analyze_investment_potential(properties):
    df = pd.DataFrame(properties)

    # Clean numeric columns
    for col in ['price', 'sqft', 'year_built', 'days_on_market']:
        df[col] = pd.to_numeric(df[col], errors='coerce')

    # Calculate metrics
    df['price_per_sqft'] = df['price'] / df['sqft']
    df['age'] = pd.Timestamp.now().year - df['year_built']

    # Score: lower price/sqft and lower DOM = better deal
    df['price_score'] = 1 - (
        df['price_per_sqft'].rank(pct=True)
    )
    df['dom_score'] = df['days_on_market'].rank(pct=True)

    # High DOM + low price = motivated seller
    df['deal_score'] = (
        df['price_score'] * 0.6
        + df['dom_score'] * 0.4
    )

    return df.nlargest(10, 'deal_score')[[
        'address', 'price', 'sqft',
        'price_per_sqft', 'days_on_market',
        'deal_score',
    ]]

2. Neighborhood Comparison Dashboard

def compare_neighborhoods(market_stats):
    comparison = pd.DataFrame(market_stats)

    # Calculate relative value
    avg_price = comparison['median_sale_price'].mean()
    comparison['price_index'] = (
        comparison['median_sale_price'] / avg_price * 100
    ).round(1)

    # Buyer's vs seller's market indicator
    comparison['market_type'] = comparison[
        'sale_to_list_ratio'
    ].apply(
        lambda x: "Seller's" if x > 1.0 else "Buyer's"
    )

    return comparison.sort_values('price_index')

3. Price Trend Forecasting Data Prep

def prepare_trend_data(price_histories):
    all_events = []

    for prop_id, history in price_histories.items():
        for event in history:
            if event['event_type'] == 'Sold':
                all_events.append({
                    'property_id': prop_id,
                    'date': pd.to_datetime(
                        event['date']
                    ),
                    'price': event['price'],
                    'price_per_sqft':
                        event['price_per_sqft'],
                })

    df = pd.DataFrame(all_events)
    df = df.set_index('date').sort_index()

    # Monthly median price trends
    monthly = df.resample('M')['price'].agg([
        'median', 'count', 'std'
    ])
    monthly['yoy_change'] = monthly['median'].pct_change(
        periods=12
    )

    return monthly

Data Export and Integration

Exporting to Common Formats

const { Dataset } = require('crawlee');

// After scraping is complete
const dataset = await Dataset.open('redfin-properties');

// exportToCSV/exportToJSON write the dataset contents to the
// default key-value store under the given key (they return nothing)
await dataset.exportToCSV('redfin-properties');

// Export as JSON for API consumption
await dataset.exportToJSON('redfin-properties');

// Direct integration with Google Sheets
const { google } = require('googleapis');

async function exportToSheets(data, spreadsheetId) {
    const sheets = google.sheets({ version: 'v4' });
    const rows = data.map(item => [
        item.address,
        item.price,
        item.beds,
        item.baths,
        item.sqft,
        item.pricePerSqft,
        item.daysOnMarket,
        item.url,
    ]);

    await sheets.spreadsheets.values.append({
        spreadsheetId,
        range: 'Properties!A:H',
        valueInputOption: 'USER_ENTERED',
        resource: { values: rows },
    });
}

Legal and Ethical Considerations

Real estate data scraping has specific legal nuances:

  • MLS data is copyrighted by local MLS organizations — be aware of licensing terms
  • Redfin's Terms of Service prohibit automated access — assess your risk tolerance
  • Fair Housing Act implications: ensure scraped data isn't used for discriminatory purposes
  • Personal data (agent info) may be subject to privacy regulations
  • Rate limiting is not just ethical — aggressive scraping can impact the platform for other users
  • Commercial use of scraped data may have additional legal requirements

Always consult with a legal professional before using scraped real estate data commercially.

Conclusion

Redfin scraping opens powerful possibilities for real estate analysis, investment research, and market intelligence. The platform's rich data — from granular property details and price histories to aggregated market statistics — provides the foundation for sophisticated real estate analytics.

The key challenges are Redfin's anti-scraping defenses and the need for consistent, reliable data collection. Whether you build a custom scraper tailored to your specific needs or leverage Apify's managed infrastructure, the techniques in this guide give you the technical foundation to extract real estate data effectively.

Start with a small geographic area, validate your data against what you see on the site, and scale gradually. Real estate data has real commercial value — the investment in building a solid scraping pipeline pays dividends through better-informed property decisions and market insights.

Remember: the goal isn't just to collect data, but to transform it into actionable intelligence. Combine scraped Redfin data with other sources (census data, economic indicators, permit records) to build a comprehensive view of any real estate market.
