In these tough times, each of us should share knowledge and collaborate. I was trying to build a dataset of people wearing masks and people without masks, and I have collected a little data so far. In this post I share how you can scrape Google Images to do this task. Here is my video explaining the concept.
First of all, we need Selenium and a WebDriver, e.g. ChromeDriver for Chrome/Chromium.
Here is the code:
import os
import time
import urllib.request
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
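# Path to the ChromeDriver executable; adjust it for your machine
driver = webdriver.Chrome("C:\\Users\\Sourabh\\chromedriver.exe")
driver.get('https://www.google.com/')  # opens Google
# Note: find_element_by_* is the Selenium 3 API; Selenium 4.3+ uses find_element(By.NAME, 'q') instead
search = driver.find_element_by_name('q')  # 'q' is the name of the search box
search.send_keys('people wearing mask', Keys.ENTER)  # type the query and press Enter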
Now, we need to go to the images section
elem = driver.find_element_by_link_text('Images')  # the "Images" tab on the results page
elem.click()
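As an alternative to clicking the Images link, whose text may differ depending on the browser language, the image search could be opened directly by URL. This is only a minimal sketch: it assumes Google still accepts the `tbm=isch` query parameter for image results (an assumption that may stop holding), and `image_search_url` is a hypothetical helper name, not part of Selenium.

```python
from urllib.parse import urlencode

def image_search_url(query):
    """Build a Google Images search URL for the given query (hypothetical helper)."""
    # Assumption: tbm=isch selects the image results tab
    params = {"q": query, "tbm": "isch"}
    return "https://www.google.com/search?" + urlencode(params)

print(image_search_url("people wearing mask"))
```

The returned string could then be passed to `driver.get(...)` in place of typing into the search box and clicking the link.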
Now we need to scroll the page so that more images load and we can collect more src values from the img tags.
for i in range(50):  # scrolls the page 50 times
    # Scroll down by 100 pixels on each step
    driver.execute_script('window.scrollBy(0, 100);')
    time.sleep(4)  # wait for newly loaded images to appear
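The JavaScript passed to `execute_script` is an ordinary Python string, so any number has to be formatted into it before the call; Python expressions written inside the quotes are not evaluated. A small sketch of building that string, where `scroll_script` is a hypothetical helper name chosen for illustration:

```python
def scroll_script(pixels):
    """Return JavaScript that scrolls the window down by `pixels` (hypothetical helper)."""
    # window.scrollBy(x, y) scrolls relative to the current position
    return "window.scrollBy(0, {});".format(int(pixels))

for step in (100, 250):
    print(scroll_script(step))
```

With this in place, a call such as `driver.execute_script(scroll_script(100))` would scroll by a step size that is easy to change in one place.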
Now we need to find the class or id of the img tags so that we can read their src attribute. At the time of writing, the img tags in Google Images carry three classes. Keep in mind that Google changes these periodically, so this selector may stop working after a few weeks.
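# Select every img tag that carries all three classes
elements = driver.find_elements_by_xpath('//img[contains(@class,"rg_i") and contains(@class, "Q4LuWd") and contains(@class, "tx8vtf")]')
# Create the output folder if it does not exist yet
try:
    os.mkdir('peoplewithmask')
except FileExistsError:
    pass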
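Because the class names change from time to time, it may help to keep them in one list and build the XPath from that list. The following is a sketch under that assumption; `class_xpath` and `IMG_CLASSES` are hypothetical names, and the three classes are simply the ones quoted in this post.

```python
def class_xpath(tag, classes):
    """Build an XPath for `tag` elements whose class attribute contains every name in `classes`."""
    # contains() does substring matching on the whole class attribute
    conditions = " and ".join('contains(@class, "{}")'.format(name) for name in classes)
    return "//{}[{}]".format(tag, conditions)

IMG_CLASSES = ["rg_i", "Q4LuWd", "tx8vtf"]  # update this list when Google changes the classes
print(class_xpath("img", IMG_CLASSES))
```

The resulting string can be handed to the same XPath lookup used above, so only the list needs editing when the page markup changes.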
Finally, we need to retrieve the image links and download the images.
count = 0
for element in elements:
    src = element.get_attribute('src')
    if src is None:
        continue  # skip img elements that have no src attribute
    count += 1
    # Save into the 'peoplewithmask' folder created earlier
    urllib.request.urlretrieve(src, os.path.join('peoplewithmask', 'image' + str(count) + '.jpg'))
    if count % 10 == 0:
        print("downloaded", count, "images")
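The src values collected this way may be ordinary http(s) links or inline `data:` URIs, and `urllib.request.urlretrieve` accepts both. Here is a minimal sketch of the download step wrapped in a function that also skips links that fail; `download_all` and its parameters are hypothetical names, not something from the code above.

```python
import os
import urllib.error
import urllib.request

def download_all(sources, out_dir, prefix="image"):
    """Save each usable src into out_dir and return how many files were written (hypothetical helper)."""
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    for src in sources:
        if not src:
            continue  # skip missing or empty src values
        target = os.path.join(out_dir, "{}{}.jpg".format(prefix, count + 1))
        try:
            urllib.request.urlretrieve(src, target)
        except (urllib.error.URLError, ValueError):
            continue  # skip links that cannot be fetched or parsed
        count += 1
    return count
```

It could be called as `download_all([e.get_attribute('src') for e in elements], 'peoplewithmask')`, which keeps the folder name in a single place.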
Done. That was all for today. Feel free to reach out if you need help.
I did not explain how to inspect the page and find the class, id, etc., because I expect most developers already know how. If you still run into problems, please refer to this video tutorial.
Thanking You
Sourabh