DEV Community

Airo

Automating Image Scraping from Google Lens with Python

Images are essential to many applications, from machine learning training datasets to visual content aggregation. Google Lens, Google’s visual search tool, returns images covering a wide range of topics, but collecting them by hand is slow and tedious. In this post, we’ll explore how to automate scraping images from a Google Lens results page using Python, Selenium, and Requests.

import os
import time
import requests
from selenium import webdriver

# Define the URL to scrape
WEB_URL = "ENTER YOUR GOOGLE LENS URL HERE"

# Initialize a Firefox webdriver
driver = webdriver.Firefox()

# Open the webpage
driver.get(WEB_URL)

# Wait for 5 seconds to let the page load
time.sleep(5)

# Execute JavaScript to get src attributes of specific elements
elements = driver.execute_script("""
    var elements = document.getElementsByClassName("wETe9b jFVN1");
    var srcList = [];
    for (let element of elements) {
        srcList.push(element.src);
    }
    return srcList; 
""")

# Close the webdriver
driver.quit()

# Create directory to save downloaded images if it doesn't exist
os.makedirs('downloaded_images', exist_ok=True)

# Download each image
for index, image_url in enumerate(elements):
    try:
        # Use a timeout so an unresponsive server cannot hang the script
        response = requests.get(image_url, timeout=10)
        if response.status_code == 200:
            # Save the image to disk
            with open(f"downloaded_images/image_{index}.jpg", 'wb') as f:
                f.write(response.content)
            print(f"Image {index} downloaded successfully.")
        else:
            print(f"Failed to download image {index}: HTTP status code {response.status_code}")
    except Exception as e:
        print(f"Error downloading image {index}: {str(e)}")

print("All images downloaded.")
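
The fixed five-second pause in the script above is a guess about how long the page needs. One alternative is to poll until the images actually appear. The following is a minimal sketch of that idea using only the standard library; `wait_until` and `fake_image_lookup` are hypothetical names introduced here for illustration, and the fake lookup stands in for a real query against the page.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    `wait_until` is a hypothetical helper, not part of Selenium.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a page whose images only show up on the third check.
checks = {"count": 0}


def fake_image_lookup():
    checks["count"] += 1
    return ["https://example.com/a.jpg"] if checks["count"] >= 3 else []


found = wait_until(fake_image_lookup, timeout=5.0, poll_interval=0.01)
print(found)
```

With a real browser session, the condition could be something like `lambda: driver.find_elements(By.CSS_SELECTOR, ".wETe9b.jFVN1")`, assuming those class names still match the page. Selenium also ships its own `WebDriverWait` class in `selenium.webdriver.support.ui`, which is likely the better choice in practice.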

This Python script uses the Selenium and Requests libraries to automate scraping images from Google Lens. Here’s how it works:

Import Libraries: We import the necessary libraries, including os, time, requests, and webdriver from Selenium.

Define URL: We define the URL of the Google Lens search results page to scrape.

Initialize WebDriver: We initialize a Firefox webdriver instance to interact with the webpage.

Open Webpage: We open the specified webpage using the WebDriver.

Wait for Page Load: We pause for 5 seconds to give the page time to load. A fixed delay is a simple heuristic and does not guarantee that every image has finished loading.

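Execute JavaScript: We execute JavaScript in the page to collect the src attribute of every element that has both the wETe9b and jFVN1 classes, which are the elements containing the images we want to scrape. These class names come from the page markup and may change, so verify them in your browser’s developer tools before running the script.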

Close WebDriver: We close the WebDriver after extracting the image URLs.

Download Images: We create a directory to save the downloaded images and iterate over the extracted image URLs. We use the Requests library to download each image and save it to the local filesystem.

Handle Errors: We catch any exception raised while downloading an image and report non-200 responses, printing a message that identifies the affected image.

Completion Message: Finally, we print a completion message. It is printed once the loop finishes, whether or not every individual download succeeded.
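
Two details of the download step can be made more robust. The src values collected from the page may include empty entries or inline `data:` URIs, which Requests cannot fetch because it only handles http and https URLs, and not every image is a JPEG. The following is a minimal sketch using only the standard library; `extension_for`, `decode_data_uri`, `split_sources`, and the `EXTENSIONS` table are hypothetical helpers introduced here, not part of the original script.

```python
import base64
from urllib.parse import urlsplit

# Assumed mapping from common image MIME types to file extensions.
EXTENSIONS = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
    "image/webp": ".webp",
}


def extension_for(content_type, default=".jpg"):
    """Pick a file extension from a Content-Type value such as 'image/png; q=1'."""
    mime = (content_type or "").split(";")[0].strip().lower()
    return EXTENSIONS.get(mime, default)


def decode_data_uri(src):
    """Return (mime_type, raw_bytes) for a base64 'data:' URI, or None otherwise."""
    if not src.startswith("data:"):
        return None
    header, _, payload = src.partition(",")
    if not header.endswith(";base64"):
        return None
    mime = header[len("data:"):-len(";base64")]
    return mime, base64.b64decode(payload)


def split_sources(sources):
    """Separate src values into fetchable http(s) URLs and inline data URIs."""
    remote, inline = [], []
    for src in sources:
        if not src:
            continue  # the element had no usable src value
        if urlsplit(src).scheme in ("http", "https"):
            remote.append(src)
        elif decode_data_uri(src) is not None:
            inline.append(src)
    return remote, inline


# Example input mixing the cases described above.
sample = [
    "https://example.com/photo.png",
    "",
    "data:image/gif;base64," + base64.b64encode(b"GIF89a").decode("ascii"),
    "blob:https://example.com/1234",
]
remote, inline = split_sources(sample)
mime, data = decode_data_uri(inline[0])
print(remote, extension_for(mime), data)
```

In the script, the `remote` list would be passed to the download loop, with `extension_for(response.headers.get("Content-Type"))` choosing the file suffix, while the bytes decoded from `inline` entries could be written to disk directly.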

Conclusion:
Automating image scraping from Google Lens with Python is useful for data-driven projects and research. With a web automation tool like Selenium, developers can collect image URLs from a rendered page and download them in bulk instead of saving each image by hand. The code in this post is a practical starting point that you can adapt to your own data collection tasks.
