Neil R. Zamora

How to Extract Airbnb Pages Using ProxyCrawl in .NET

Building a web scraper from the ground up to extract Airbnb data is no easy task. If you do not have the proper knowledge and tools at your disposal, you will most likely end up with your IP address blocked.

So in this project, I will share an easy way to create a web scraper with the help of ProxyCrawl’s Scraper API. Because the API is built on top of rotating proxies, it lets you avoid most IP blocks and CAPTCHAs. It also scrapes the web for you at scale and returns parsed content instead of the complete HTML source code, which saves you the time and effort of building your own parser.

We will use my favorite platform, Microsoft's .NET, to demonstrate how simple it is to integrate the Scraper API into a web crawler and retrieve parsed data from Airbnb search results.

What you'll need

  • Knowledge of the C# programming language
  • Familiarity with Microsoft Visual Studio
  • Microsoft Visual Studio installed on Windows
  • A ProxyCrawl account to use the Scraper API

Code Setup

First, create a new C# Console Application Project in Microsoft Visual Studio. You can copy and paste the sample code below:

using System;

namespace ConsoleApp
{
    class Program
    {
        static void Main(string[] args)
        {

        }
    }
}

We will use the NuGet package called ProxyCrawlAPI (2.0.0), an easy-to-use wrapper library for ProxyCrawl services. Add it to the project through the NuGet Package Manager in Visual Studio before continuing.

The Scraper Code

To scrape Airbnb search results, we have to use the following URL format: https://www.airbnb.com/s/YOUR_PLACE_HERE/homes. For this example, we will search for places in Beirut. Write the following code in the Main method:

using System;

namespace ConsoleApp
{
    class Program
    {
        static void Main(string[] args)
        {
            var api = new ProxyCrawl.ScraperAPI("YOUR_PROXYCRAWL_TOKEN_HERE");
            api.Get("https://www.airbnb.com/s/Beirut/homes");
            Console.WriteLine(api.Body);
        }
    }
}
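
If you want to search for places other than Beirut, the URL format above can be wrapped in a small helper. The following is only a sketch: the AirbnbSearchUrl class and its ForPlace method are hypothetical names and not part of the ProxyCrawl library, and it assumes Airbnb accepts a percent-encoded place name in the path.

```csharp
using System;

public static class AirbnbSearchUrl
{
    // Builds a search URL in the https://www.airbnb.com/s/<place>/homes format.
    // Hypothetical helper for illustration; not part of ProxyCrawlAPI.
    public static string ForPlace(string place)
    {
        if (string.IsNullOrWhiteSpace(place))
        {
            throw new ArgumentException("A place name is required.", nameof(place));
        }

        // Escape spaces and other reserved characters so the place name
        // stays inside a single path segment.
        string segment = Uri.EscapeDataString(place.Trim());
        return "https://www.airbnb.com/s/" + segment + "/homes";
    }
}
```

With this in place, the call in Main could become api.Get(AirbnbSearchUrl.ForPlace("Beirut")), which requests the same URL as the hard-coded version.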

The api.Body property contains the Airbnb search results as structured JSON. You can see an example of the output below:

{
  "residents": [
    {
      "position": 1,
      "title": "Romy's Apartment at The Cube",
      "superHost": true,
      "residentType": "Entire apartment",
      "location": "Horch Tabet",
      "samplePhotoUrl": "https://a0.muscache.com/im/pictures/miso/Hosting-48845122/original/876dcd11-337b-464b-a4a7-b575858ed18f.jpeg?im_w=720",
      "accommodation": {
        "guests": "2 guests",
        "bedrooms": "1 bedroom",
        "beds": "1 bed",
        "baths": "1.5 baths"
      },
      "amenities": [
        "Wifi",
        "Air conditioning",
        "Kitchen",
        "Washer"
      ],
      "rating": "5.0",
      "personReviewed": "10",
      "costs": {
        "PricePerNight": "$67"
      }
    },
    {
      "position": 2,
      "title": "Michele's Apartment at The Cube",
      "superHost": true,
      "residentType": "Entire apartment",
      "location": "El Fil",
      "samplePhotoUrl": "https://a0.muscache.com/im/pictures/f36baf12-17d6-46a0-9b31-2229677ef43b.jpg?im_w=720",
      "accommodation": {
        "guests": "3 guests",
        "bedrooms": "2 bedrooms",
        "beds": "2 beds",
        "baths": "2 baths"
      },
      "amenities": [
        "Wifi",
        "Air conditioning",
        "Kitchen",
        "Washer"
      ],
      "rating": "4.81",
      "personReviewed": "73",
      "costs": {
        "PricePerNight": "$90"
      }
    }
  ],
  "residentsFound": 20
}
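
Note that the accommodation values in the sample output above, such as "2 guests" and "1.5 baths", arrive as text rather than numbers. If you need them as numbers, for example to filter by guest count, something like the sketch below could work. ListingText and LeadingNumber are hypothetical names, and the sketch assumes the number always comes first in the string, as it does in the sample.

```csharp
using System;
using System.Globalization;

public static class ListingText
{
    // Returns the number at the start of values such as "2 guests" or
    // "1.5 baths", or null when the text does not start with a number.
    // Hypothetical helper for illustration.
    public static decimal? LeadingNumber(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
        {
            return null;
        }

        // Take the first space-separated token, e.g. "1.5" from "1.5 baths".
        string firstToken = text.Trim().Split(' ')[0];

        // Invariant culture so "1.5" parses the same on every machine.
        if (decimal.TryParse(firstToken, NumberStyles.Number,
                CultureInfo.InvariantCulture, out decimal value))
        {
            return value;
        }

        return null;
    }
}
```

Returning a nullable decimal keeps the "no number found" case explicit instead of silently reporting zero.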

Extracting Data Using C# Objects

Working with raw JSON is painful, so we will map it to C# objects with the Newtonsoft.Json package for this example. That said, let’s create the following classes first:

using System;

using Newtonsoft.Json;

namespace ConsoleApp
{
    class Program
    {
        #region Inner Classes

        public class AirBnbScraperResult
        {
            [JsonProperty("residents")]
            public AirBnbResident[] Residents { get; set; }
        }

        public class AirBnbResident
        {
            [JsonProperty("title")]
            public string Title { get; set; }

            [JsonProperty("superHost")]
            public bool? SuperHost { get; set; }

            [JsonProperty("residentType")]
            public string ResidentType { get; set; }

            [JsonProperty("location")]
            public string Location { get; set; }

            [JsonProperty("samplePhotoUrl")]
            public string SamplePhotoUrl { get; set; }

            [JsonProperty("rating")]
            public decimal? Rating { get; set; }

            [JsonProperty("personReviewed")]
            public int? PersonReviewed { get; set; }

            [JsonProperty("accommodation")]
            public AirBnbResidentAccommodation Accommodation { get; set; }

            [JsonProperty("amenities")]
            public string[] Amenities { get; set; }

            [JsonProperty("costs")]
            public AirBnbResidentCost Costs { get; set; }
        }

        public class AirBnbResidentAccommodation
        {
            [JsonProperty("guests")]
            public string Guests { get; set; }

            [JsonProperty("bedrooms")]
            public string Bedrooms { get; set; }

            [JsonProperty("beds")]
            public string Beds { get; set; }

            [JsonProperty("baths")]
            public string Baths { get; set; }
        }

        public class AirBnbResidentCost
        {
            [JsonProperty("priceCurrency")]
            public string PriceCurrency { get; set; }

            [JsonProperty("pricePerNight")]
            public string PricePerNight { get; set; }
        }

        #endregion

        ...
    }
}
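
The sample output shown earlier carries the price as a single string such as "$67" and has no separate currency field, so PriceCurrency would presumably stay null for that response. As a minimal sketch, the string could be split into a symbol and an amount as shown below. PriceText and TrySplit are hypothetical names, and the sketch assumes the symbol precedes the digits.

```csharp
using System;
using System.Globalization;

public static class PriceText
{
    // Splits a price such as "$67" into its currency symbol and amount.
    // Returns false when no amount can be read from the text.
    // Hypothetical helper for illustration.
    public static bool TrySplit(string price, out string currency, out decimal amount)
    {
        currency = null;
        amount = 0m;
        if (string.IsNullOrWhiteSpace(price))
        {
            return false;
        }

        string trimmed = price.Trim();

        // Everything before the first digit is treated as the currency symbol.
        int firstDigit = trimmed.IndexOfAny("0123456789".ToCharArray());
        if (firstDigit < 0)
        {
            return false;
        }

        currency = trimmed.Substring(0, firstDigit).Trim();
        string number = trimmed.Substring(firstDigit);

        // AllowThousands accepts values like "1,250"; invariant culture keeps
        // the result independent of the machine's regional settings.
        return decimal.TryParse(
            number,
            NumberStyles.AllowDecimalPoint | NumberStyles.AllowThousands,
            CultureInfo.InvariantCulture,
            out amount);
    }
}
```

A Try-style method is used here so that an unexpected price format can be skipped without throwing in the middle of a loop over listings.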

Then, in the Main method above, we deserialize api.Body into the classes we just created and walk through the resulting objects.

using System;

using Newtonsoft.Json;

namespace ConsoleApp
{
    class Program
    {
        ...

        static void Main(string[] args)
        {
            var api = new ProxyCrawl.ScraperAPI("YOUR_PROXYCRAWL_TOKEN_HERE");
            api.Get("https://www.airbnb.com/s/Beirut/homes");

            AirBnbScraperResult results = JsonConvert.DeserializeObject<AirBnbScraperResult>(api.Body);
            foreach (var resident in results.Residents)
            {
                if (resident.SuperHost.HasValue && resident.SuperHost.Value)
                {
                    Console.WriteLine("{0} <SuperHost>", resident.Title);
                }
                else
                {
                    Console.WriteLine(resident.Title);
                }
                Console.WriteLine("Type: {0}", resident.ResidentType);
                Console.WriteLine("Location: {0}", resident.Location);
                Console.WriteLine("Photo: {0}", resident.SamplePhotoUrl);
                Console.WriteLine("Rating: {0}", resident.Rating);
                Console.WriteLine("Reviewers count: {0}", resident.PersonReviewed);
                if (resident.Amenities != null && resident.Amenities.Length > 0)
                {
                    Console.WriteLine("Amenities: {0}", string.Join(", ", resident.Amenities));
                }
                if (resident.Accommodation != null)
                {
                    Console.WriteLine("Accommodation");
                    Console.WriteLine("  * Bedrooms: {0}", resident.Accommodation.Bedrooms);
                    Console.WriteLine("  *     Beds: {0}", resident.Accommodation.Beds);
                    Console.WriteLine("  *    Baths: {0}", resident.Accommodation.Baths);
                    Console.WriteLine("  *   Guests: {0}", resident.Accommodation.Guests);
                }
                if (resident.Costs != null)
                {
                    Console.WriteLine("Costs");
                    Console.WriteLine("  * Currency: {0}", resident.Costs.PriceCurrency);
                    Console.WriteLine("  *Per Night: {0}", resident.Costs.PricePerNight);
                }

                Console.WriteLine();
            }
        }
    }
}
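
Once the results are plain C# objects, you can also query them with LINQ instead of only printing them. The sketch below returns Superhost listings above a minimum rating, best rated first. To stay self-contained it uses a trimmed-down stand-in class named Listing rather than the AirBnbResident class above; Listing, ListingQueries, and TopSuperHosts are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for the listing class, trimmed to the fields used here.
public class Listing
{
    public string Title { get; set; }
    public bool? SuperHost { get; set; }
    public decimal? Rating { get; set; }
}

public static class ListingQueries
{
    // Returns Superhost listings rated at least minRating, best rated first.
    public static List<Listing> TopSuperHosts(IEnumerable<Listing> listings, decimal minRating)
    {
        return listings
            .Where(l => l.SuperHost == true) // a null flag counts as "not a Superhost"
            .Where(l => l.Rating.HasValue && l.Rating.Value >= minRating)
            .OrderByDescending(l => l.Rating.Value)
            .ToList();
    }
}
```

The same query should work against results.Residents if you swap Listing for AirBnbResident, since the property names match.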

The program prints the details of each listing to the console, similar to the screenshot below.
(Screenshot: web scraper result)

See the complete source code at https://github.com/neilrzamora/proxycrawl-airbnb-scraper-dotnet.git

Conclusion

So there you go: extracting data from Airbnb with the help of ProxyCrawl’s .NET library is a breeze. There’s no need to compile a list of proxies or write extra code to avoid CAPTCHAs and parse the page. With just one line of code, the Scraper API handles the parsing and the proxies so you can concentrate on the returned data.

ProxyCrawl is truly a versatile platform for web crawling and scraping. Feel free to utilize and expand the example code in this tutorial based on your needs.
