DEV Community

loading...

Automate the boring stuff with Python 3's pathlib module.

mekicha profile image Emeka Icha ・5 min read

The Pathlib module was introduced in Python 3.4 with the aim:

to provide a simple hierarchy of classes to handle filesystem paths and the common operations users do over them.

In this article, I intend to go through some of these common operations. Towards the end, I show how I keep the downloads folder on my computer clean and organized using the Pathlib module and some cron job. Let us get started.

As per the documentation:

If you’ve never used this module before or just aren’t sure which class is right for your task, Path is most likely what you need. It instantiates a concrete path for the platform the code is running on.

That is understood. Let's import the Path class then.

In [9]: from pathlib import Path

If you are following along with an interactive shell like Ipython or any other, it is usually a good idea to dir the imported class or object to get a clue on the attributes and methods the class provides.

Get the user home directory

# returns PosixPath('/home/mekicha')
# This is equivalent to os.path.expanduser('~')
home_dir = Path.home()

Get a directory in the home directory

# returns PosixPath('/home/mekicha/Downloads')
# Equivalent to: home_dir.joinpath('Downloads')
downloads_dir = home_dir / 'Downloads'

downloads_dir.exists() # True

Now, at this point, you might be thinking: "Is this all there is to this click-baiting article that promised to show me practical examples of using the Pathlib module?". Before you close your tab in fury at the waste of time and bandwith, I offer a calm answer to your urgent question: "No, friend, that is not all".

In fact, if that was all, I would have just pointed you to the documentation where such and more examples abound.
I just provided the examples above as some form of a ground work for the better things ahead(hopefully).
Now, what do you think about the following examples.

Given a directory, calculate the total size of all files

from pathlib import Path

def get_size(directory: Path):
  """ Given a directory, return total size of files in bytes """
  total_size = 0
  for file in directory.iterdir():
     total_size += file.stat().st_size  # in bytes
  return total_size

if __name__ == '__main__':
  # Let's get the size of my home directory
  directory = Path.home()
  size_in_byte = get_size(directory)

  # Nice to have: convert the size to megabyte or gigabyte

  mega = 1 << 20
  giga = 1 << 30 
  if size_in_byte / giga < 1:
    if size_in_byte / mega < 1:
      print(f'file size = {size_in_byte} bytes')
    print(f'file size = {size_in_byte / mega} megabytes')
  print(f'file size = {fize_in_byte / giga} gigabytes'

Given a directory, return the file with the biggest size

def get_max_file(directory: Path):
  return max((f.stat().st_size, f) for f in directory.iterdir())

# In the same vein, let's get last modified file in a directory

def get_last_modified(directory: Path):
  return max((f.stat().st_mtime, f) for f in directory.iterdir())

Given a directory, return a list of subdirectories in the directory

def get_dir_list(directory: Path):
  return [child for child in directory.iterdir() if child.is_dir()]

Organizing my computer's download folder

If you have read this far, hopefully, this is the beginning of what you came for.

My download folder used to be a mess. Files of different formats a lie side by side with one another in a helpless disorderly manner, each one crying for my attention, and perhaps feeling neglected when I did not click on them. Not anymore. Today, help has come their way.

I am going to put the files in their places, where they should belong. Images should go to the images sub-folder, documents(think pdf, doc, xls) should live in the documents folder and it is right for zipped files to dwell in the zipped folder. I will assign a cron agent to do the job.

The cron agent would be setup to run once every day, at a time when I should be asleep.

Here is the code:

#!usr/bin/env python3

import time
import logging
from pathlib import Path

logging.basicConfig(level=logging.DEBUG)

logger = logging.getLogger(__file__)


def organize_directory(directory: Path):
    """ Take a path to a directory and organizes the files there
        into subdirs.
     """

    sub_dirs = ['images', 'docs', 'zipped','media', 'others']
    images = ['.png', '.jpeg', '.jpg', '.gif']
    zipped = ['.zip', '.gz', 'tgz']
    docs = ['.doc', '.docx', '.pdf', '.xls', '.xlxs', '.PPT', '.txt']
    media = ['.mp4', '.mkv', '.avi', '.mp3']

    # create base subdirs.
    # Map each subdir to its subdir path
    dir_map = {}
    for subdir in sub_dirs:
        subdir_path = directory.joinpath(subdir)
        subdir_path.mkdir(parents=True, exist_ok=True)
        dir_map[subdir] = subdir_path

    start_time = time.monotonic()
    for child in directory.iterdir():
        child_name = child.name
        if child.is_file():
            if child.suffix in images:
                target = dir_map['images'].joinpath(child_name)
                child.replace(target)
                logger.info(f'moved {child} to images folder')
            elif child.suffix in docs:
                target = dir_map['docs'].joinpath(child_name)
                child.replace(target)
                logger.info(f'moved {child} to docs folder')
            elif child.suffix in zipped:
                target = dir_map['zipped'].joinpath(child_name)
                child.replace(target)
                logger.info(f'moved {child} to zipped folder')
            elif child.suffix in media:
                target = dir_map['media'].joinpath(child_name)
                child.replace(target)
                logger.info(f'moved {child} to media folder')
            else:
                target = dir_map['others'].joinpath(child_name)
                child.replace(target)
                logger.info(f'moved {child} to others folder')

    end_time = time.monotonic() - start_time
    logger.info(f'Organizing directory took {end_time} seconds')



if __name__ == '__main__':
    download_dir = Path.home().joinpath('Downloads')

    organize_directory(download_dir)


Again, the documentation entries for iterdir, mkdir and replace are very clear on what they do.

Running the above script gave me this:

INFO:download_folder_agent.py:Organizing directory took 0.01179114996921271 seconds

That's not all. It has also given me a much better organized downloads folder.

Setup the cron job

Setting up the cron job was fairly easy.

Here are the steps I took that gave me what I wanted:

  1. Open the crontab editor by typing: crontab -e
  2. Look up the crontab schedule expression on [crontab-guru][crontab guru]. I choose everyday at 5 am. 0 5 * * *
  3. Adding the following job entry in the file: `0 5 * * * python3 /home/mekicha/agent/download_folder_agent.py >> /home/mekicha/agent/log 2>&1
  4. I saved the changes and exited the file. 5.Voila! Everything was working.

Actually, I first set the schedule to run every minute * * * * *, checked that it was working before I committed the above schedule.

I set the schedule to run every day by 5 am, which is not really ideal. It's not like I am downloading stuff every now and then. But I will leave it at that for now. Feel free to choose whatever seems best to you.

If you are wondering what does this 2>&1 thing mean, this stackoverflow answer was what I read when I was wondering that too.

Okay, I think this is a good place to bring this whole business to an end.
In the next episode of automating the boring stuff, I will show how I keep the downloads folder clean by removing files that have outlived their usefulness.

Thank you!

P.S. This automate the boring stuff article series is an imitation of a book by the same name.

Discussion

pic
Editor guide