DEV Community

Allen Yang
Allen Yang

Posted on

How to Extract Images from Excel Using Python

Extract Images from Excel Using Python

In everyday data processing tasks, we often encounter Excel files that contain a large number of images. Whether it’s product catalogs, employee records, or asset inventories, these images usually carry important visual information. However, when we need to batch extract these images for further processing, analysis, or archiving, manual operations are not only inefficient but also highly error-prone. Imagine an Excel file containing hundreds of images—copying and pasting them one by one would be a nightmare.

Fortunately, Python, often regarded as the Swiss Army knife of data processing, can perfectly address this pain point. By leveraging powerful third-party libraries, we can easily automate the extraction of images from Excel files, significantly improving productivity. In this article, we will take an in-depth look at how to use Python together with the Spire.XLS for Python library to provide a detailed, practical Excel image extraction tutorial, helping you transform tedious manual work into an efficient automated workflow.


Understanding How Images Are Stored in Excel and Choosing the Right Python Library

Before extracting images, it is important to understand how Excel files (especially the .xlsx format) store them. Typically, images are not stored directly inside cells. Instead, they exist as independent embedded objects within a worksheet and are associated with specific cell ranges. When you insert an image into Excel, the application embeds the image data into the file structure and records metadata such as position and size.

To efficiently and accurately handle these embedded images, we need a powerful and easy-to-use Python library. Among the many libraries available for working with Excel files, Spire.XLS for Python stands out.

Introduction to Spire.XLS for Python:

Spire.XLS for Python is a professional Excel processing library that provides rich functionality for creating, reading, writing, converting, and printing Excel files within Python applications. Its advantages include:

  • Comprehensive features: Supports handling various Excel elements, including cells, rows, columns, formulas, charts, images, shapes, and comments.
  • Strong compatibility: Works with multiple Excel formats (such as .xls, .xlsx, .xlsm, .xltx, .xltm) and supports versions from Excel 97 through Excel 2019.
  • Ease of use: Offers an intuitive API that enables developers to accomplish complex tasks with minimal code.

Installation Guide:

Installing Spire.XLS for Python in your Python environment is simple. Just run the following pip command:

pip install spire.xls
Enter fullscreen mode Exit fullscreen mode


`


Step-by-Step Guide to Extracting Images with Spire.XLS for Python

Next, we will demonstrate step by step how to use Spire.XLS for Python to extract images from an Excel file.

For demonstration purposes, suppose we have an Excel file named sample_with_images.xlsx that contains several images. We want to extract them into a folder called output_images.

Step 1: Load the Excel Workbook

First, we need to load the target Excel file. In Spire.XLS for Python, this is done using the Workbook class.

`python
from spire.xls import *
from spire.xls.common import *
import os

Ensure the output directory exists

output_dir = "output_images"
if not os.path.exists(output_dir):
os.makedirs(output_dir)

Create a Workbook object

workbook = Workbook()

Load the Excel file

excel_file_path = "sample_with_images.xlsx" # Replace with your Excel file path
workbook.LoadFromFile(excel_file_path)

print(f"Successfully loaded Excel file: {excel_file_path}")
`

Step 2: Iterate Through Worksheets

An Excel file may contain multiple worksheets, and images may be distributed across different sheets. Therefore, we need to iterate through each worksheet in the workbook.

`python

... (continue from the previous step)

for sheet_index in range(workbook.Worksheets.Count):
sheet = workbook.Worksheets.get_Item(sheet_index)
print(f"\nProcessing worksheet: {sheet.Name} (Index: {sheet_index})")
# Image extraction code will go here
`

Step 3: Identify and Extract Images

Spire.XLS for Python provides a Pictures collection for each worksheet, which contains all image objects within that sheet. We can iterate through this collection and use the SaveToImage method to save each image locally.

`python

... (continue from the previous step)

if sheet.Pictures.Count > 0:
for i in range(sheet.Pictures.Count):
picture = sheet.Pictures.get_Item(i)
    # Construct the image save path and filename
    # To avoid filename conflicts, combine the worksheet name and image index
    # You may also attempt to use the image's original name if available

    # Save as PNG format by default (other formats like Jpeg are also supported)
    filepath = os.path.join(output_dir, f"{sheet.Name}_picture_{i}.png")

    # Save the image to the specified path
    picture.SaveToImage(filepath)
    print(f"  Image saved to: {filepath}")
Enter fullscreen mode Exit fullscreen mode

else:
print(f" No images detected in worksheet '{sheet.Name}'.")

Enter fullscreen mode Exit fullscreen mode




Release resources after processing

workbook.Dispose()
print("\nAll images have been successfully extracted!")
`

Complete Code Example:

`python
from spire.xls import *
from spire.xls.common import *
import os

def extract_images_from_excel(excel_path, output_folder="extracted_images"):
"""
Extract all images from an Excel file and save them to the specified folder.

Args:
excel_path (str): Path to the Excel file.
output_folder (str): Target folder for saving extracted images.
"""

Ensure the output directory exists

if not os.path.exists(output_folder):
os.makedirs(output_folder)
print(f"Created output directory: {output_folder}")

Create a Workbook object

workbook = Workbook()

try:
# Load the Excel file
workbook.LoadFromFile(excel_path)
print(f"Successfully loaded Excel file: {excel_path}")

image_count = 0
for sheet_index in range(workbook.Worksheets.Count):
    sheet = workbook.Worksheets.get_Item(sheet_index)
    print(f"\nProcessing worksheet: {sheet.Name} (Index: {sheet_index})")

    if sheet.Pictures.Count > 0:
        for i in range(sheet.Pictures.Count):
            picture = sheet.Pictures.get_Item(i)

            # Construct image filename
            # For better robustness, consider adding timestamps or UUIDs
            filename = f"{sheet.Name}_picture_{i}.png"
            filepath = os.path.join(output_folder, filename)

            # Save the image (you can also use ImageFormat.Jpeg, ImageFormat.Gif, etc.)
            picture.SaveToImage(filepath)
            print(f"  Image saved to: {filepath}")
            image_count += 1
    else:
        print(f"  No images detected in worksheet '{sheet.Name}'.")

print(f"\nAll worksheets processed. A total of {image_count} images were extracted.")
Enter fullscreen mode Exit fullscreen mode

except Exception as e:
print(f"An error occurred while processing the Excel file: {e}")
finally:
# Ensure resources are released
workbook.Dispose()
print("Workbook resources have been released.")

Enter fullscreen mode Exit fullscreen mode




Call the function

if name == "main":
# Create a dummy Excel file (if not exists) for testing
# In real-world scenarios, replace this with your actual Excel file
if not os.path.exists("sample_with_images.xlsx"):
print("Please create an Excel file named 'sample_with_images.xlsx' and insert some images for testing.")
print("Or modify the excel_path parameter in the extract_images_from_excel function.")
else:
extract_images_from_excel("sample_with_images.xlsx", "output_images")
`

Extraction Result Preview:

Result of Extracting Images from Excel with Python

Exception Handling:

In real-world applications, issues such as missing files, corrupted files, or worksheets without images may occur. The above code includes a try...except...finally block to catch potential errors and ensure that resources are properly released, which is considered good programming practice.


Advanced Applications and Considerations

Batch Processing

After encapsulating the logic into a function, you can easily process multiple Excel files in batch mode. Simply iterate through a list of Excel file paths and call the extract_images_from_excel function for each one.

`python

Suppose you have a list of Excel files

excel_files = ["file1.xlsx", "file2.xlsx", "another_data.xlsx"]

for excel_file in excel_files:
print(f"\n--- Starting processing for file: {excel_file} ---")
extract_images_from_excel(
excel_file,
f"output_images_{os.path.splitext(excel_file)[0]}"
)
print(f"--- Finished processing file: {excel_file} ---")
`

Accessing Image Metadata

In addition to saving images, the picture object provides access to image metadata, such as:

  • picture.Left: The horizontal position of the image’s left edge relative to the top-left corner of the worksheet (in points).
  • picture.Top: The vertical position of the image’s top edge relative to the top-left corner of the worksheet (in points).
  • picture.Width: The width of the image (in points).
  • picture.Height: The height of the image (in points).
  • picture.AlternativeText: The alternative text of the image (if any).

You can use this information to record the image’s position in Excel or perform further analysis.

`python

Add this inside the image extraction loop

...

        print(
f" Image position: Left={picture.Left}, Top={picture.Top}, "
f"Width={picture.Width}, Height={picture.Height}"
)
if picture.AlternativeText:
print(f" Alternative text: {picture.AlternativeText}")
Enter fullscreen mode Exit fullscreen mode




...

`

Performance Considerations

For Excel files containing a large number of images or very large file sizes, processing time may increase. While Spire.XLS for Python performs well in terms of efficiency, in extreme cases you may consider:

  • Chunked processing: If memory becomes a bottleneck, try loading and processing only certain worksheets at a time.
  • Optimizing file I/O: Ensure that the output directory is located on high-performance storage.

Common Issues and Solutions

  • File path issues: Ensure that the Excel file path and output folder path are correct and that you have read/write permissions.
  • Missing or incomplete images: In rare cases, if the Excel file itself is corrupted, image extraction may fail.
  • Filename conflicts: When extracting images in batch mode, ensure that generated filenames are unique. You can include the worksheet name, file prefix, or a unique ID to avoid overwriting files.

Conclusion

Through this detailed tutorial, you have learned how to use Python and the Spire.XLS for Python library to efficiently extract images from Excel files. With its intuitive and feature-rich API, Spire.XLS for Python greatly simplifies this otherwise complex data processing task. From loading workbooks to iterating through worksheets and precisely identifying and saving images, each step has been clearly explained and demonstrated with practical code examples.

Automation is the key to improving productivity, and Python excels in this domain. Automating repetitive tasks such as image extraction not only saves valuable time but also reduces human error. We encourage you to apply these techniques in your own projects and explore even more possibilities of Python automation in Excel processing. In the future, Python’s applications in Excel data cleaning, report generation, and data visualization will continue to expand, bringing greater convenience and efficiency to your workflow.

Top comments (0)