DEV Community

Tahsin Abrar


Automate Image Downloads from Excel Using Google Colab

Introduction

Downloading images one at a time from a dataset is slow and error-prone. If the image URLs are stored in an Excel file, you can automate the job with a short Python script in Google Colab. This tutorial walks through a script that reads the spreadsheet, downloads each image, saves it under the row's LM ID, and packages the results as a ZIP file.

Prerequisites

To follow this tutorial, you need:

  • A Google account (to use Google Colab)
  • An Excel file (Alumni.xlsx) containing the following columns:
    • LM ID (Unique identifier for each entry)
    • image (URL of the image to be downloaded)
  • Basic knowledge of Python

Steps to Download Images

Step 1: Upload the Excel File

Google Colab lets you upload files interactively. The following code opens a file picker so you can upload Alumni.xlsx, then loads it into a pandas DataFrame:

from google.colab import files
import pandas as pd

# Upload the file
uploaded = files.upload()
file_path = list(uploaded.keys())[0]  # Get the uploaded file name

# Load the Excel file
df = pd.read_excel(file_path)
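`files.upload()` returns a dictionary that maps each uploaded filename to its contents, so `list(uploaded.keys())[0]` simply takes whichever file came first. If you might upload more than one file, a small helper can pick the first Excel workbook instead. `pick_excel_file` is a hypothetical name; this is a minimal sketch that runs on a plain dictionary, so you can try it outside Colab.

```python
def pick_excel_file(uploaded: dict) -> str:
    """Return the first uploaded filename ending in .xlsx.

    Hypothetical helper: `uploaded` is assumed to map filenames to file
    contents, like the dictionary returned by google.colab.files.upload().
    """
    for name in uploaded:
        if name.lower().endswith(".xlsx"):
            return name
    raise ValueError(f"No .xlsx file among uploads: {list(uploaded)}")


# Usage with a stand-in dictionary (no Colab needed):
fake_upload = {"notes.txt": b"...", "Alumni.xlsx": b"PK..."}
print(pick_excel_file(fake_upload))  # Alumni.xlsx
```

In the notebook you would then write `file_path = pick_excel_file(uploaded)` in place of the `list(uploaded.keys())[0]` line.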

Step 2: Ensure Necessary Columns Exist

Before downloading anything, check that the required columns (LM ID and image) are present in the uploaded file. The names must match exactly, including capitalization and spaces:

required_columns = {'LM ID', 'image'}
if not required_columns.issubset(df.columns):
    raise ValueError(f"Missing columns: {required_columns - set(df.columns)}")
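Because the check above is an exact string match, a header typed as "LM ID " (with a trailing space) counts as missing even though it looks identical in Excel. The sketch below uses a hypothetical helper, `missing_columns`, that strips whitespace before comparing. In pandas you could instead normalize the headers in place with `df.columns = df.columns.str.strip()` before running the original check; either way, stripped headers are what later lookups such as `row['LM ID']` need.

```python
def missing_columns(columns, required=("LM ID", "image")):
    """Return the required column names that are absent, ignoring stray spaces.

    Hypothetical helper: `columns` can be any iterable of header names,
    for example df.columns.
    """
    present = {str(col).strip() for col in columns}
    return sorted(set(required) - present)


# Usage with header lists as they might come back from df.columns:
print(missing_columns(["LM ID ", "Name", "image"]))  # []
print(missing_columns(["LM ID", "photo"]))          # ['image']
```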

Step 3: Download Images

The script iterates over each row, reads the image URL, downloads the file, and saves it in the alumni_images folder under the row's LM ID with a .jpg extension. Rows with an empty URL are skipped:

import os
import requests

# Create a folder for the downloaded images
output_folder = "alumni_images"
os.makedirs(output_folder, exist_ok=True)

for _, row in df.iterrows():
    lm_id = str(row['LM ID'])
    image_url = row['image']

    # Empty cells come back as NaN (a float), so this also skips blanks
    if not (isinstance(image_url, str) and image_url.strip()):
        print(f"Invalid image URL for LM ID: {lm_id}")
        continue

    image_url = image_url.strip()
    try:
        # The timeout stops one unresponsive server from hanging the loop;
        # the with-block releases the streamed connection when done
        with requests.get(image_url, stream=True, timeout=30) as response:
            response.raise_for_status()  # raise on 4xx/5xx responses
            # Every file is saved as .jpg, whatever its actual format
            image_path = os.path.join(output_folder, f"{lm_id}.jpg")
            with open(image_path, 'wb') as file:
                for chunk in response.iter_content(chunk_size=8192):
                    file.write(chunk)
        print(f"Downloaded: {lm_id}.jpg")
    except (requests.RequestException, OSError) as e:
        print(f"Failed to download {image_url} for LM ID {lm_id}: {e}")
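The loop names every file `{lm_id}.jpg`, which has two side effects: PNG or WebP images get a misleading extension, and if the LM ID column contains blank cells, pandas reads it as floats, so `str(row['LM ID'])` produces names like `101.0.jpg`. A minimal sketch of a filename builder that addresses both, under stated assumptions: `build_filename` is a hypothetical helper, the extension follows the server's `Content-Type` header with `.jpg` as a fallback, and characters other than letters, digits, `-` and `_` are replaced with `_`.

```python
def build_filename(lm_id, content_type=None):
    """Build a filename such as '101.png' from an LM ID and a Content-Type.

    Hypothetical helper (not part of the original script). Assumptions:
    float IDs like 101.0 become '101', unsafe characters become '_',
    and unknown or missing content types fall back to '.jpg'.
    """
    # pandas stores an ID column that has blank cells as floats: 101 -> 101.0
    if isinstance(lm_id, float) and lm_id.is_integer():
        lm_id = int(lm_id)
    # Keep only characters that are safe in file names
    stem = "".join(c if c.isalnum() or c in "-_" else "_" for c in str(lm_id).strip())

    extensions = {
        "image/jpeg": ".jpg",
        "image/png": ".png",
        "image/gif": ".gif",
        "image/webp": ".webp",
    }
    # Drop parameters such as "; charset=binary" before the lookup
    mime = (content_type or "").split(";")[0].strip().lower()
    return stem + extensions.get(mime, ".jpg")


print(build_filename(101.0, "image/png"))         # 101.png
print(build_filename("A-17", "image/jpeg; q=1"))  # A-17.jpg
print(build_filename("x/y", None))                # x_y.jpg
```

Inside the download loop, the path would become `os.path.join(output_folder, build_filename(row['LM ID'], response.headers.get("Content-Type")))`. Relying on the header is a judgment call: some servers send a generic type such as `application/octet-stream`, in which case the sketch falls back to `.jpg`.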

Step 4: Download the Images as a ZIP File

Once the downloads finish, zip the alumni_images folder and download the archive to your computer:

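import shutil
from google.colab import files

# Zip the contents of alumni_images/ into alumni_images.zip;
# make_archive returns the path of the archive it created
archive_path = shutil.make_archive("alumni_images", "zip", "alumni_images")

# Trigger a browser download of the ZIP (works only inside Colab)
files.download(archive_path)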
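Before downloading, you may want to confirm what actually went into the archive. Because `"alumni_images"` is passed as the root directory, the images sit at the top level of the ZIP (`101.jpg`, not `alumni_images/101.jpg`). The sketch below uses a hypothetical helper, `zip_folder`, and the standard-library `zipfile` module to list the members; it runs in a temporary directory with placeholder files, so it works outside Colab.

```python
import os
import shutil
import tempfile
import zipfile


def zip_folder(folder, archive_base):
    """Zip the contents of `folder` into `archive_base`.zip; return member names.

    Hypothetical helper. With root_dir=folder, paths inside the archive are
    relative to the folder itself.
    """
    archive_path = shutil.make_archive(archive_base, "zip", root_dir=folder)
    with zipfile.ZipFile(archive_path) as zf:
        return sorted(zf.namelist())


# Demo with placeholder files; the archive is written next to the folder,
# not inside it, so it does not try to include itself.
with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "alumni_images")
    os.makedirs(folder)
    for name in ("101.jpg", "102.png"):
        with open(os.path.join(folder, name), "wb") as f:
            f.write(b"placeholder")
    names = zip_folder(folder, os.path.join(tmp, "alumni_images"))

print(names)  # ['101.jpg', '102.png']
```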
