DEV Community

Jagroop Singh


Extract Data from zip folder using Python

Problem Statement:

We have a link to a zip folder that we need to download, extract into our working directory, and then load and visualise.

Solution:

Required Python modules:

  • requests: sends HTTP requests from Python (a third-party package).
  • zipfile: provides tools to create, read, write, append to, and list ZIP files (standard library).
  • pathlib: represents file and folder paths as objects, so paths can be joined with the / operator (standard library).
  • os: provides a portable way of using operating-system-dependent functionality (standard library).
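To see how zipfile and pathlib fit together before touching the real dataset, here is a minimal sketch that builds a tiny archive in memory and lists its entries. The file names are made up for illustration and only mimic the pizza/sushi folder layout.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Build a tiny ZIP archive in memory (hypothetical file names, for illustration only)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("pizza/img_1.jpg", b"fake image bytes")
    zf.writestr("sushi/img_2.jpg", b"fake image bytes")

# Reopen the same buffer for reading and inspect the entries
buffer.seek(0)
with zipfile.ZipFile(buffer, "r") as zf:
    names = zf.namelist()
    # Archive member names always use "/", so PurePosixPath splits them reliably
    classes = sorted({PurePosixPath(name).parent.name for name in names})

print(names)    # ['pizza/img_1.jpg', 'sushi/img_2.jpg']
print(classes)  # ['pizza', 'sushi']
```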

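Data link:

We're using the pizza_steak_sushi.zip archive from the mrdbourke/pytorch-deep-learning repository on GitHub:
https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip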


Step 1: Import the required Python modules

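import requests  # HTTP requests (third-party: pip install requests)
import zipfile  # read and extract ZIP archives (standard library)
from pathlib import Path  # object-oriented filesystem paths (standard library)
import os  # operating-system utilities (not used in the steps below)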

If you're using Google Colab, all of these modules are already available. If you're running the code locally, zipfile, pathlib and os are part of Python's standard library, so the only package you may need to install is requests: pip install requests
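If you're not sure what is already installed, a small check like the following (a sketch built on the standard importlib.util.find_spec) reports which of the modules are missing before you run anything else.

```python
import importlib.util

# Modules used in this post; only "requests" is a third-party package
modules = ["requests", "zipfile", "pathlib", "os"]

# find_spec returns None when a top-level module cannot be found
missing = [name for name in modules if importlib.util.find_spec(name) is None]

if missing:
    print("Install the missing packages with: pip install " + " ".join(missing))
else:
    print("All required modules are available.")
```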


Step 2: Set up paths and download the zip file

In our zip file, there are image folders for pizza, steak, and sushi, and each one contains images corresponding to its name.

# Set up the path to the data folder
data_path = Path("data/")
image_path = data_path / "pizza_steak_sushi"
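The / operator on Path objects is what builds image_path from data_path. A quick sketch of what those two lines produce:

```python
from pathlib import Path

data_path = Path("data/")
image_path = data_path / "pizza_steak_sushi"

# The trailing slash in "data/" is dropped, and "/" joins segments portably
print(data_path)                        # data
print(image_path.parts)                 # ('data', 'pizza_steak_sushi')
print(image_path.name)                  # pizza_steak_sushi
print(image_path.parent == data_path)   # True
```

Because the joining is done by pathlib, the same code produces the right separator on Windows, macOS and Linux.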


# Create the image folder if it doesn't exist yet

if image_path.is_dir():
    print(f"{image_path} directory already exists.")
else:
    print(f"{image_path} does not exist, creating it...")
    image_path.mkdir(parents=True, exist_ok=True)


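Note that is_dir() only tells you the folder exists, not that it holds any data: a run that created the folder but failed before extracting would leave an empty directory behind. A slightly stricter check, sketched here with a hypothetical helper named needs_download, treats an empty folder as missing too.

```python
import tempfile
from pathlib import Path


def needs_download(folder: Path) -> bool:
    """Return True if `folder` is missing or empty (hypothetical helper)."""
    if not folder.is_dir():
        return True
    # any() returns True as soon as iterdir() yields a single entry
    return not any(folder.iterdir())


# Usage with a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "pizza_steak_sushi"
    print(needs_download(folder))  # True: folder does not exist
    folder.mkdir()
    print(needs_download(folder))  # True: folder exists but is empty
    (folder / "pizza").mkdir()
    print(needs_download(folder))  # False: folder has content
```

Wrapping the download and unzip steps in if needs_download(image_path): would make the whole script safe to re-run without fetching the archive again.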

Next, download the zip file and write it to the data folder:

# Download the pizza, steak and sushi data
url = "https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip"
print("Downloading pizza, steak, sushi data...")
response = requests.get(url, timeout=60)
response.raise_for_status()  # stop here if the server returned an error status
with open(data_path / "pizza_steak_sushi.zip", "wb") as f:
    f.write(response.content)
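Reading the response's .content loads the entire archive into memory before anything is written. For larger files, a streaming variant is a common alternative. This is a sketch using requests' stream=True and iter_content; the function name download_file and the one-megabyte chunk size are our own choices, not part of the original post.

```python
from pathlib import Path


def download_file(url: str, dest: Path, chunk_size: int = 1024 * 1024) -> Path:
    """Stream `url` to `dest` chunk by chunk and return `dest` (hypothetical helper)."""
    import requests  # third-party: pip install requests

    dest.parent.mkdir(parents=True, exist_ok=True)
    # stream=True delays fetching the body until iter_content() is consumed
    with requests.get(url, stream=True, timeout=60) as response:
        # Raise an HTTPError on 4xx/5xx instead of saving an error page as a .zip
        response.raise_for_status()
        with open(dest, "wb") as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return dest
```

With the paths from Step 2 it would be called as download_file(<the GitHub URL above>, data_path / "pizza_steak_sushi.zip").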

Step 3: Unzip the downloaded zip folder

# Unzip pizza, steak, sushi data
with zipfile.ZipFile(data_path / "pizza_steak_sushi.zip", "r") as zip_ref:
    print("Unzipping data...")
    zip_ref.extractall(image_path)
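Before (or instead of) extracting, you can peek inside an archive to check that it is intact and see what it contains. A minimal sketch, assuming a hypothetical helper named summarise_zip: it runs ZipFile.testzip() to verify CRCs, then counts file entries per parent folder.

```python
import io
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarise_zip(zip_source) -> Counter:
    """Count file entries per parent folder; raise if any member fails its CRC check."""
    with zipfile.ZipFile(zip_source, "r") as zf:
        bad_member = zf.testzip()  # None if every member's CRC checks out
        if bad_member is not None:
            raise zipfile.BadZipFile(f"Corrupt member in archive: {bad_member}")
        # Directory entries end with "/"; skip them and count only files
        return Counter(
            PurePosixPath(name).parent.name
            for name in zf.namelist()
            if not name.endswith("/")
        )


# Usage with a small in-memory archive (hypothetical file names)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("pizza/a.jpg", b"x")
    zf.writestr("pizza/b.jpg", b"x")
    zf.writestr("sushi/c.jpg", b"x")
buffer.seek(0)
print(summarise_zip(buffer))  # Counter({'pizza': 2, 'sushi': 1})
```

The same function accepts a path, so summarise_zip(data_path / "pizza_steak_sushi.zip") would work on the downloaded file.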

The zip file has been successfully unzipped into data/pizza_steak_sushi, and you can now examine the images contained in each folder.
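One way to examine the extracted data is to count how many files ended up in each folder. This sketch uses os.walk from the os module; the helper name count_files_per_folder is our own, not part of any library.

```python
import os
from pathlib import Path


def count_files_per_folder(root: Path) -> dict:
    """Map each folder under `root` (relative path, "." for root) to its number of files."""
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        relative = Path(dirpath).relative_to(root).as_posix()
        counts[relative] = len(filenames)
    return counts


# Usage, assuming the unzip step has already run:
# for folder, n_files in sorted(count_files_per_folder(Path("data/pizza_steak_sushi")).items()):
#     print(f"{folder}: {n_files} files")
```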

That's all for this blog. 📝✨
