Jordan Hansen

Originally published at javascriptwebscrapingguy.com

Jordan Scrapes Redfin

Demo code here

Video walk through

Today we go to Redfin! This is in the real estate data arena. It pairs with the post I wrote about scraping real estate auctions: you would find the auction you are looking for and then go to Redfin.com to get estimated pricing and other data.

Investigation

When scraping a real estate site like this there are really two steps. The first is using an address to find the details page on the site. The second is more obvious: scraping that page for the desired data.

Redfin is a modern site and it returns live property suggestions as you type. Each suggestion lets the user go directly to the details page for that address, which almost certainly means we can use the same response to find our way to the details page.

Check it.

web scraping redfin's search

On the left you can see the searched data and the exact property discovered. On the right you can see the XHR requests that return the following data:

{}&&{"version":348,"errorMessage":"Success","resultCode":0,"payload":{"sections":[{"rows":[{"id":"1_60647192","type":"1","name":"3950 Callahan Dr","subName":"Memphis, TN, USA","url":"/TN/Memphis/3950-Callahan-Dr-38127/home/60647192","active":true,"claimedHome":false,"invalidMRS":false,"businessMarketIds":[58],"countryCode":"US"}],"name":"Addresses"}],"exactMatch":{"id":"1_60647192","type":"1","name":"3950 Callahan Dr","subName":"Memphis, TN, USA","url":"/TN/Memphis/3950-Callahan-Dr-38127/home/60647192","active":true,"claimedHome":false,"invalidMRS":false,"businessMarketIds":[58],"countryCode":"US"},"extraResults":{},"responseTime":0,"hasFakeResults":false,"isGeocoded":false,"isRedfinServiced":false}}

This data is kind of funny because it’s not quite JSON. Remove the leading {}&& and the rest is valid JSON. And inside…we see a url! Bingo. We’re in business.
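Here is a minimal sketch of that prefix stripping on its own, with no network call. The `AutocompleteResponse` interface and the `parseAutocompleteBody` name are my own, hypothetical and inferred from the sample response above, and the sample body is a trimmed-down copy of that response.

```typescript
// Only the fields used here, inferred from the sample response (an assumption,
// not a documented Redfin type).
interface AutocompleteResponse {
    resultCode: number;
    errorMessage: string;
    payload: {
        exactMatch?: { name: string; subName: string; url: string };
    };
}

// Strip the leading {}&& only when it really is a prefix, then parse the rest as JSON.
function parseAutocompleteBody(body: string): AutocompleteResponse {
    const prefix = '{}&&';
    const json = body.startsWith(prefix) ? body.slice(prefix.length) : body;
    return JSON.parse(json) as AutocompleteResponse;
}

// Trimmed-down copy of the response shown above.
const sampleBody =
    '{}&&{"version":348,"errorMessage":"Success","resultCode":0,"payload":{"exactMatch":' +
    '{"id":"1_60647192","type":"1","name":"3950 Callahan Dr","subName":"Memphis, TN, USA",' +
    '"url":"/TN/Memphis/3950-Callahan-Dr-38127/home/60647192"}}}';

const parsed = parseAutocompleteBody(sampleBody);
console.log(parsed.payload.exactMatch?.url); // /TN/Memphis/3950-Callahan-Dr-38127/home/60647192
```

Checking for the prefix first means the function still works if the body ever arrives as plain JSON.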

With this url, we can go directly to the webpage we are looking for. At the top, what do we find? The property value that we were looking for!

Redfin estimate.

Unfortunately, the details page doesn’t have any XHR requests with property data. The easiest way to confirm this is to open the network tab in developer tools and check the “Doc” filter. If the page request comes back fully rendered, the server is returning the HTML already fleshed out.

scraping Redfin Doc page

I’ll just use cheerio for this part and parse the HTML to get the price I’m looking for.

The Code

The code is pretty simple. The async block that handles it all looks like this:

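import axios from 'axios';
import * as cheerio from 'cheerio';

const exampleAddresses = [
    '3950 CALLAHAN DR, Memphis, TN 38127',
    '17421 Deforest Ave, Cleveland, OH 44128',
    '1226 DIVISION AVENUE, San Antonio, TX 78225'
];

// Helper not shown in the original snippet: resolves after ms milliseconds
function timeout(ms: number) {
    return new Promise(resolve => setTimeout(resolve, ms));
}

(async () => {

    for (let i = 0; i < exampleAddresses.length; i++) {
        // Look up the details page path for this address
        const path = await getUrl(exampleAddresses[i]);

        console.log('path', path);

        // Fetch the details page and pull out the estimate
        const price = await getPrice(path);

        console.log('price', price);

        // Pause between addresses so we don't hammer the site
        await timeout(2000);
    }
})();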

You’d loop through your target addresses, get the url (really the path), and use that when you get the price.

async function getUrl(address: string) {
    // location and v are required query parameters; encode the address so spaces and commas survive
    const url = `https://www.redfin.com/stingray/do/location-autocomplete?location=${encodeURIComponent(address)}&v=2`;

    const axiosResponse = await axios.get(url);

    // The body is JSON prefixed with {}&&, so strip that prefix before parsing
    const parsedData = JSON.parse(axiosResponse.data?.replace('{}&&', ''));

    return parsedData.payload.exactMatch.url;
}

The above function gets the path from that almost-JSON response. We fetch the data and then strip the {}&& with replace before parsing.

The getPrice function is a simple call with axios and a parse with cheerio.
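One thing getUrl doesn't handle is a response with no exactMatch, which would throw when reading .url. Below is a rough sketch of a more defensive lookup. It assumes (I haven't confirmed this) that Redfin leaves exactMatch out when nothing matches exactly, and it falls back to the first suggested row, which may not be the property you wanted. `pickDetailsPath` and the interfaces are hypothetical names based on the sample response.

```typescript
// Field names taken from the sample response; the types themselves are assumptions.
interface AutocompleteRow {
    name: string;
    url: string;
}

interface AutocompletePayload {
    sections?: { name: string; rows: AutocompleteRow[] }[];
    exactMatch?: AutocompleteRow;
}

// Prefer the exact match; otherwise fall back to the first suggested row, else null.
function pickDetailsPath(payload: AutocompletePayload): string | null {
    if (payload.exactMatch?.url) {
        return payload.exactMatch.url;
    }

    for (const section of payload.sections ?? []) {
        if (section.rows.length > 0) {
            return section.rows[0].url;
        }
    }

    return null;
}

// Usage with a payload shaped like the sample response, minus exactMatch.
const fallbackPath = pickDetailsPath({
    sections: [{
        name: 'Addresses',
        rows: [{ name: '3950 Callahan Dr', url: '/TN/Memphis/3950-Callahan-Dr-38127/home/60647192' }]
    }]
});
console.log(fallbackPath);
```

Returning null instead of throwing lets the calling loop skip an address and keep going.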

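async function getPrice(path: string) {
    const url = `https://redfin.com${path}`;

    // The details page comes back server-rendered, so a plain GET is enough
    const axiosResponse = await axios.get(url);

    const $ = cheerio.load(axiosResponse.data);

    // Try the first selector for the estimate
    let price = $('[data-rf-test-id="avm-price"] .statsValue').text();

    // Fall back to the second selector when the first finds nothing
    if (!price) {
        price = $('[data-rf-test-id="avmLdpPrice"] .value').text();
    }

    return price;
}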

Bam. And that’s the end. We got ourselves some property prices from Redfin.
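getPrice returns the price as text, so if you want to sort or compare prices you'll probably want a number. Here is a small sketch of one way to do that. `parsePrice` is a hypothetical helper, not part of the demo code, and the sample strings are made up for illustration rather than taken from Redfin.

```typescript
// Turn price text such as "$123,456" into a number; returns null when there is
// nothing numeric to parse (for example an empty string from a missed selector).
function parsePrice(text: string): number | null {
    // Keep digits and the decimal point, dropping "$", "," and whitespace.
    const digits = text.replace(/[^0-9.]/g, '');

    if (digits === '') {
        return null;
    }

    const value = Number(digits);
    return Number.isFinite(value) ? value : null;
}

console.log(parsePrice('$123,456')); // 123456
console.log(parsePrice(''));         // null
```

Getting null back for empty text is also a handy signal that neither selector matched and the page layout may have changed.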

The post Jordan Scrapes Redfin appeared first on Javascript Web Scraping Guy.
