Our law firm recently switched to a new client management system. As part of the migration, we had to move the contents of each client's files from the old folder into the new folder that the new system points to. The migration team had placed the old and new client folders in the same directory, but the contents still needed to be moved. The old folders were named after the client's account number, while the new folders were named with the account number and the client's name. A fake but representative parent directory would hold pairs like "2020-0123" (old) and "2020-0123 - Smith, John" (new).
I have been wanting to explore Python recently and thought this would be a great opportunity to learn.
To start, I needed to import some modules:
import shutil
import os
import re
from pathlib import Path
Next, I needed the path to the parent directory of the client folders; in my case, "Test Directory":
source = '/Users/johnnyturco/Downloads/Test Directory/'
Note: This is a string of a macOS path; if you're on Windows, you will need to use the appropriate path with backslashes (\). Also, \ is a special character in Python strings, so you need to "escape" it with another \. To make this platform-independent, you could use os.path.join. Because only I will be running this script and it will be on a Mac, I decided to keep my code a little more readable and use a string for my source path.
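If you do want the platform-independent route, here is a minimal sketch of what it could look like. It assumes the parent folder sits in the current user's Downloads folder, and the Windows path is a hypothetical one, shown only to illustrate the escaping.

```python
import os
from pathlib import Path

# Build the path from its parts so the separator matches the current OS.
# Assumption: the parent folder lives in the user's Downloads folder.
home = os.path.expanduser('~')
source_joined = os.path.join(home, 'Downloads', 'Test Directory')

# pathlib does the same job with the / operator.
source_path = Path.home() / 'Downloads' / 'Test Directory'

# On Windows, each backslash is escaped with a second one...
windows_escaped = 'C:\\Users\\johnnyturco\\Downloads\\Test Directory'
# ...or a raw string (r'...') can be used so no escaping is needed.
windows_raw = r'C:\Users\johnnyturco\Downloads\Test Directory'
```

Both Windows strings hold exactly the same text; the raw string is just easier to read.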
I also needed a list of the directories inside of the parent source:
directories = os.listdir(source)
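One thing worth knowing: os.listdir returns bare names (not full paths), and it includes files as well as folders. My regular expression ends up ignoring the stray entries, but if you would rather start from folders only, something like the sketch below could work. The helper name list_subdirectories is my own, and the demo uses a throwaway temporary directory rather than my real one.

```python
import os
import tempfile


def list_subdirectories(parent):
    """Return the names of the entries in parent that are directories."""
    with os.scandir(parent) as entries:
        return sorted(entry.name for entry in entries if entry.is_dir())


# Throwaway demo directory so the sketch can run anywhere.
with tempfile.TemporaryDirectory() as demo_parent:
    os.mkdir(os.path.join(demo_parent, '2020-0123'))
    os.mkdir(os.path.join(demo_parent, '2020-0123 - Smith, John'))
    # A stray file, like the .DS_Store that macOS tends to create.
    open(os.path.join(demo_parent, '.DS_Store'), 'w').close()

    all_entries = sorted(os.listdir(demo_parent))   # files and folders
    only_dirs = list_subdirectories(demo_parent)    # folders only
```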
Because the source directory contained both the old (e.g., "2020-0123") and new folders (e.g., "2020-0123 - Smith, John"), I needed a way to target just one of a client's folders. I figured if I targeted the new folder naming convention with a regular expression, I could easily derive the corresponding old folder's name with a regular expression group:
folder_name_regex = re.compile(r'(^\d{4}-\d{4}) - .*')
For a quick primer on what regular expressions are, check out my blog post.
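To see what the pattern does, here is a small check against the two example folder names from above. The new-style name matches and the group gives back the account number; the old-style name does not match at all.

```python
import re

folder_name_regex = re.compile(r'(^\d{4}-\d{4}) - .*')

new_style = folder_name_regex.search('2020-0123 - Smith, John')
old_style = folder_name_regex.search('2020-0123')

# group(1) is the part of the match inside the parentheses.
account_number = new_style.group(1)

print(account_number)      # 2020-0123
print(old_style is None)   # True
```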
Putting all that together with the for-loop:
for directory in directories:
    # Only the new-style names ("2020-0123 - Smith, John") match.
    match = folder_name_regex.search(directory)
    if match:
        account_number = match.group(1)
        # Old folder (source) and new folder (target) share the same parent.
        dir_source = os.path.join(source, account_number)
        dir_target = os.path.join(source, directory)
        files = os.listdir(dir_source)
        for file in files:
            shutil.move(os.path.join(dir_source, file),
                        os.path.join(dir_target, file))
Let me explain what's going on here. For each directory inside the parent source, check whether its name matches the regular expression, which matches names like "2020-0123 - Smith, John" (as opposed to the counterpart, "2020-0123"). From the match, get the account number and assign it to a variable. Use that variable to make the path to the old source folder (2020-0123); assign it to dir_source. Create a path to the new destination, dir_target (2020-0123 - Smith, John). Finally, get a list of all the files and folders in the source folder (2020-0123), loop over each file/folder, and move it from the source to the target.
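Since a script like this moves real client files, a more cautious variant might be worth having. The sketch below is not what I ran; it is one possible take that adds a dry-run mode, skips clients that have no old folder, and leaves name clashes alone. The function name move_client_files and the dry_run parameter are my own inventions, and the demo runs against a temporary directory with made-up client folders.

```python
import os
import re
import shutil
import tempfile

folder_name_regex = re.compile(r'(^\d{4}-\d{4}) - .*')


def move_client_files(parent, dry_run=True):
    """Move each old folder's contents into its matching new folder.

    Returns the (source, target) pairs that were moved, or that would be
    moved when dry_run is True.
    """
    planned = []
    for directory in sorted(os.listdir(parent)):
        match = folder_name_regex.search(directory)
        if not match:
            continue
        dir_source = os.path.join(parent, match.group(1))
        dir_target = os.path.join(parent, directory)
        # Skip clients with no old folder instead of crashing in os.listdir.
        if not os.path.isdir(dir_source) or not os.path.isdir(dir_target):
            continue
        for name in sorted(os.listdir(dir_source)):
            src = os.path.join(dir_source, name)
            dst = os.path.join(dir_target, name)
            # Leave name clashes alone so nothing is overwritten or nested.
            if os.path.exists(dst):
                continue
            planned.append((src, dst))
            if not dry_run:
                shutil.move(src, dst)
    return planned


# Demo on a throwaway directory with made-up client folders.
with tempfile.TemporaryDirectory() as parent:
    old_dir = os.path.join(parent, '2020-0123')
    new_dir = os.path.join(parent, '2020-0123 - Smith, John')
    os.mkdir(old_dir)
    os.mkdir(new_dir)
    # A new-style folder with no old counterpart; it should be skipped.
    os.mkdir(os.path.join(parent, '2021-0456 - Doe, Jane'))
    open(os.path.join(old_dir, 'letter.txt'), 'w').close()

    plan = move_client_files(parent, dry_run=True)   # nothing moves yet
    still_in_old = os.listdir(old_dir)
    move_client_files(parent, dry_run=False)         # now move for real
    now_in_new = os.listdir(new_dir)
    left_in_old = os.listdir(old_dir)
```

Running with dry_run=True first gives a list of planned moves to look over before anything is touched.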
Here's all of the code together:
import shutil
import os
import re
from pathlib import Path
source = '/Users/johnnyturco/Downloads/Test Directory/'
directories = os.listdir(source)
folder_name_regex = re.compile(r'(^\d{4}-\d{4}) - .*')
for directory in directories:
    # Only the new-style names ("2020-0123 - Smith, John") match.
    match = folder_name_regex.search(directory)
    if match:
        account_number = match.group(1)
        # Old folder (source) and new folder (target) share the same parent.
        dir_source = os.path.join(source, account_number)
        dir_target = os.path.join(source, directory)
        files = os.listdir(dir_source)
        for file in files:
            shutil.move(os.path.join(dir_source, file),
                        os.path.join(dir_target, file))