DEV Community

X-Byte Enterprise Crawling


How Python and BeautifulSoup Can Help You Scrape Listings from Airbnb

Scraping Airbnb listings with Python and BeautifulSoup is one of the most popular web scraping applications. Typical uses include monitoring rates, building an aggregator, or improving the user experience of existing hotel booking services.

This can be accomplished with a short script. We will use BeautifulSoup to extract data from Airbnb.com.
To begin, we need some code that fetches an Airbnb.com search page and sets up BeautifulSoup so that we can query the page for useful data using CSS selectors.

# -*- coding: utf-8 -*-

from bs4 import BeautifulSoup
import requests

# Browser-like User-Agent header sent with the request
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.airbnb.co.in/s/New-York--NY--United-States/homes?query=New York, NY, United States&checkin=2020-03-12&checkout=2020-03-19&adults=4&children=1&infants=0&guests=5&place_id=ChIJOwg_06VPwokRYv534QaPC8g&refinement_paths[]=/for_you&toddlers=0&source=mc_search_bar&search_type=unknown'

# Fetch the search page and parse it
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

To avoid getting blocked, we also pass User-Agent headers so that the request looks like it comes from a browser.
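The search URL in the script above is written as one long string with unencoded spaces and brackets. As a minimal sketch, the same query could be assembled from a dictionary with the standard library's urllib.parse.urlencode, which percent-encodes the values. The helper name build_search_url is hypothetical, and only a subset of the original parameters is shown.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.airbnb.co.in/s/New-York--NY--United-States/homes"


def build_search_url(base_url, params):
    """Return base_url with params appended as a percent-encoded query string."""
    return f"{base_url}?{urlencode(params)}"


# Subset of the parameters used in the article's URL
search_params = {
    "query": "New York, NY, United States",
    "checkin": "2020-03-12",
    "checkout": "2020-03-19",
    "adults": 4,
    "children": 1,
    "infants": 0,
}

url = build_search_url(BASE_URL, search_params)
print(url)
```

Keeping the parameters in a dictionary makes it easier to change dates or guest counts without editing a long string by hand.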

Now, let us look at the Airbnb search results for a given destination. The image below shows what the page looks like.

When we inspect the page, we notice that each listing is enclosed in a tag that has the attribute itemprop with the value itemListElement.

We can simply split the HTML document into these cards, each of which contains the data for an individual listing, as shown below.

# -*- coding: utf-8 -*-

from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.airbnb.co.in/s/New-York--NY--United-States/homes?query=New York, NY, United States&checkin=2020-03-12&checkout=2020-03-19&adults=4&children=1&infants=0&guests=5&place_id=ChIJOwg_06VPwokRYv534QaPC8g&refinement_paths[]=/for_you&toddlers=0&source=mc_search_bar&search_type=unknown'

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

# print(soup.select('.a-carousel-card')[0].get_text())
# Print the raw HTML of every listing card
for item in soup.select('[itemprop=itemListElement]'):
    try:
        print('----------------------------------------')
        print(item)
        print('----------------------------------------')
    except Exception as e:
        # raise e
        print('')

Once you run the code:

python3 scrapeAirbnb.py

You can see that the code isolates the HTML cards.
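To try the card selection without hitting Airbnb, the same selector can be run against a small inline document. This is only a sketch: the markup below is made up to imitate the structure described above and is not real Airbnb HTML, the helper name select_cards is hypothetical, and the built-in html.parser is used here instead of lxml to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Made-up markup that imitates the structure described above; not real Airbnb HTML.
SAMPLE_HTML = """
<div>
  <div itemprop="itemListElement"><a aria-label="Cozy loft" href="/rooms/1">Cozy loft</a></div>
  <div itemprop="itemListElement"><a aria-label="Sunny studio" href="/rooms/2">Sunny studio</a></div>
  <div class="footer">Not a listing</div>
</div>
"""


def select_cards(html):
    """Return every element whose itemprop attribute equals itemListElement."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.select("[itemprop=itemListElement]")


cards = select_cards(SAMPLE_HTML)
print(len(cards))
```

Only the two elements carrying the itemprop attribute are returned; the footer div is ignored.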

On closer inspection, the name of each listing is always stored in the aria-label attribute of the card's link. So let's see if we can retrieve it.

# -*- coding: utf-8 -*-

from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.airbnb.co.in/s/New-York--NY--United-States/homes?query=New York, NY, United States&checkin=2020-03-12&checkout=2020-03-19&adults=4&children=1&infants=0&guests=5&place_id=ChIJOwg_06VPwokRYv534QaPC8g&refinement_paths[]=/for_you&toddlers=0&source=mc_search_bar&search_type=unknown'

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

# print(soup.select('.a-carousel-card')[0].get_text())
for item in soup.select('[itemprop=itemListElement]'):
    try:
        print('----------------------------------------')
        # The listing name is the aria-label of the first link in the card
        print(item.select('a')[0]['aria-label'])
        # name = item.find("meta", itemprop="name")
        print('----------------------------------------')
    except Exception as e:
        # raise e
        print('')

This will display the result:
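The script above relies on a broad try/except to skip cards that lack a link or an aria-label. One possible alternative, sketched here with a hypothetical helper called first_attr, is to return None when the element or attribute is missing. The card markup is made up for illustration, and html.parser is used instead of lxml.

```python
from bs4 import BeautifulSoup


def first_attr(card, selector, attr):
    """Return attr of the first element matching selector, or None if absent."""
    match = card.select_one(selector)
    if match is None:
        return None
    return match.get(attr)


# Made-up card markup for illustration only
card_html = '<div itemprop="itemListElement"><a aria-label="Cozy loft" href="/rooms/1"></a></div>'
card = BeautifulSoup(card_html, "html.parser").select_one("[itemprop=itemListElement]")

name = first_attr(card, "a", "aria-label")
missing = first_attr(card, "span", "aria-label")
print(name, missing)
```

Returning None makes it explicit which field was missing, whereas the blanket except in the original script hides every kind of error.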

Now let us extract the other pieces of information.

# -*- coding: utf-8 -*-

from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.airbnb.co.in/s/New-York--NY--United-States/homes?query=New York, NY, United States&checkin=2020-03-12&checkout=2020-03-19&adults=4&children=1&infants=0&guests=5&place_id=ChIJOwg_06VPwokRYv534QaPC8g&refinement_paths[]=/for_you&toddlers=0&source=mc_search_bar&search_type=unknown'

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

for item in soup.select('[itemprop=itemListElement]'):
    try:
        print('----------------------------------------')
        # Listing name and link
        print(item.select('a')[0]['aria-label'])
        print(item.select('a')[0]['href'])
        # Remaining fields, located by their CSS class names
        print(item.select('._krjbj')[0].get_text())
        print(item.select('._krjbj')[1].get_text())
        print(item.select('._16shi2n')[0].get_text())
        print(item.select('._zkkcbwd')[0].get_text())
        print('----------------------------------------')
    except Exception as e:
        # raise e
        print('')

When you run the code:

It displays all the data we need, including reviews, ratings, links, and the discounted price.
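Printing is fine for a quick check, but for later analysis it can help to collect each card into a record and write the records out as CSV. The following is a rough sketch under several assumptions: the markup is made up, the helper names nth_text and card_to_record are hypothetical, and the column names simply echo the CSS classes from the script above because the article does not say which class holds which field.

```python
import csv
import io

from bs4 import BeautifulSoup

# Made-up card markup for illustration; real Airbnb HTML differs and changes often.
SAMPLE_HTML = """
<div itemprop="itemListElement">
  <a aria-label="Cozy loft" href="/rooms/1"></a>
  <span class="_krjbj">first text</span>
  <span class="_krjbj">second text</span>
  <span class="_16shi2n">third text</span>
</div>
<div itemprop="itemListElement">
  <a aria-label="Sunny studio" href="/rooms/2"></a>
</div>
"""

FIELDS = ["name", "href", "krjbj_0", "krjbj_1", "16shi2n", "zkkcbwd"]


def nth_text(card, selector, index=0):
    """Return the stripped text of the index-th match, or None if there is none."""
    matches = card.select(selector)
    return matches[index].get_text(strip=True) if index < len(matches) else None


def card_to_record(card):
    """Collect the fields printed in the script above into one dict."""
    link = card.select_one("a")
    return {
        "name": link.get("aria-label") if link else None,
        "href": link.get("href") if link else None,
        "krjbj_0": nth_text(card, "._krjbj", 0),
        "krjbj_1": nth_text(card, "._krjbj", 1),
        "16shi2n": nth_text(card, "._16shi2n"),
        "zkkcbwd": nth_text(card, "._zkkcbwd"),
    }


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
records = [card_to_record(card) for card in soup.select("[itemprop=itemListElement]")]

# Write to an in-memory buffer; a real run could use open("listings.csv", "w", newline="").
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()
print(csv_text)
```

Missing fields become empty CSV cells instead of causing the whole card to be skipped.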

In more demanding setups, you will also have to rotate the User-Agent string so that Airbnb cannot tell that all the requests come from the same browser. Go a step further and you will find that Airbnb blocks your IP address regardless of all these efforts. This is frustrating, and it is where the majority of web crawling programs fall short.
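User-Agent rotation can be sketched in a few lines. In the example below, the first string is the one used in the scripts above, while the other two are illustrative values added for this sketch; the helper name random_headers is hypothetical.

```python
import random

USER_AGENTS = [
    # From the scripts above
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9",
    # Illustrative additions, not taken from the article
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0",
]


def random_headers(rng=random):
    """Return request headers with a User-Agent picked at random from the pool."""
    return {"User-Agent": rng.choice(USER_AGENTS)}


headers = random_headers()
print(headers["User-Agent"])
```

The resulting dictionary can be passed as the headers argument of requests.get, with a fresh call to random_headers for each request.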

Overcoming IP Blocks
Investing in a private rotating proxy service such as Proxies API can often make the difference between a pain-free web scraping operation that consistently gets the job done and one that does not.

Plus, with the current offer of 1000 free API requests, there is almost nothing to lose by trying our rotating proxy and comparing notes. Integration takes only a single line, so it is hardly intrusive.

Our rotating proxy service, Proxies API, is a simple API designed to solve IP blocking issues instantly. It combines thousands of high-speed rotating proxies located around the globe, automatic IP rotation so that your IP address keeps changing, automatic rotation of the User-Agent string (which mimics requests from various valid web browsers and browser versions), and automatic CAPTCHA-solving technology.

Thousands of our clients have used this simple API to solve the problem of IP restrictions.

A simple API call like the one below can be made from any programming language to access the whole system.

curl "https://xbyte.io/?key=API_KEY&url=https://example.com"
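From Python, the same call could be prepared roughly as follows. The endpoint and the key and url parameter names are taken from the curl example above; the helper name build_proxy_url is hypothetical, and percent-encoding the target URL is an assumption about what the service accepts.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_ENDPOINT = "https://xbyte.io/"


def build_proxy_url(api_key, target_url):
    """Return the API URL that asks the service to fetch target_url."""
    query = urlencode({"key": api_key, "url": target_url})
    return f"{API_ENDPOINT}?{query}"


# API_KEY is a placeholder, as in the curl example
proxy_url = build_proxy_url("API_KEY", "https://example.com")
print(proxy_url)
```

The result can then be passed to requests.get in place of the direct Airbnb URL. Encoding the target matters once it carries its own query string, since an unencoded & would otherwise be read as a separator between API parameters.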

For further assistance, contact X-Byte Enterprise Crawling.
