These notes are a companion piece for the video:
Topic: Supplanting Functions
These are the show notes for a video demonstration of changing an existing Python program to add new functions that will supplant some existing function calls.
The purpose of the video is to show an example of doing this in the context of a real and non-trivial program. In this case it happens during an edit of "Foldatry" which is a multi-purpose file and folder tool that I am writing in Python.
Note: all the code on this page is actually "air code", written in a mere text editor as planned code which is then clipped in during the video. I've deliberately left intact the errors that quickly became apparent as development continued.
The Example Situation
From some runs of Foldatry I had the wish that I could know what quantity of data storage was at issue. While there are several places where I'd want to make that change, for this run-through I'll just look at one of its parts - Congruentry.
Note: if you want to inspect the code (as it was when these notes were written), use this specific link:
The Congruentry module is all about comparing two folder locations and determining whether they and all their sub-folders are the same. The main use for this is for confirming that a copy process - done by some other tool - was completed perfectly. While the simplest answer to that is either "Yes, the same" or "No, not the same", in practice when the answer is "No" we want to know something useful about the differences.
Congruentry is also capable of indicating when one of the two folders is a full subset of the other, in which case it can assemble a list of the additional items in the superset folder.
But, it does not bring back any information about how much file size those differences amount to. One reason that currently does not happen is that it uses three functions from the stock filecmp library to enact the comparisons - and none of those return any information about file sizes.
- As it happens, for other reasons, I am also writing my own replacement for the stock filecmp library, as another module inside Foldatry - but it is not ready yet.
So our target here is that we want to replace use of the stock filecmp library with calls to a "mock" set of functions, so that:
- for now they will just call the filecmp library anyway;
- then later they can be changed to call the new module when it has been written (and proven).
Actually we can go one better than that plan, and add an extra change in the sequence:
- initially they will just call the filecmp library anyway;
- then extensions can be made, so that after the filecmp calls, the list of differences can be traversed to collect file size information;
- then later again they can be changed to call the new module when it has been written (and proven).
What To Replace
Ok, so where are the calls that we will supplant?
We can easily trace them, because the top of the module congruentry.py has this import:
import filecmp as im_filecmp
Due to the alias there, the calls will all be like im_filecmp.something
And here they are (also showing the function nesting of their locations):
- first, we have a call that does most of the work:
def congruency_check_trees( cct_p_Path_A, cct_p_Path_B, cct_p_Stringent, cct_p_badwinfname, cct_p_badwinchars, \
    def trees_pass( tp_p_Stringent ):
        def congruency_check_subtrees( ccs_p_Path_A, ccs_p_Path_B, ccs_p_depth, ccs_p_still_congruent, ccs_p_extra_side ):
            dcmp = im_filecmp.dircmp( ccs_p_Path_A, ccs_p_Path_B ) # ? shallow=True
wherein the function called was dircmp which returned an object dcmp that gets further interactions afterwards - we'll discuss the details of that later.
- second, we have a call that is used to perform the "stringent" file comparisons of file contents:
def congruency_check_subtrees( ccs_p_Path_A, ccs_p_Path_B, ccs_p_depth, ccs_p_still_congruent, ccs_p_extra_side ):
    (fils_match, fils_mismatch, fils_errors) = im_filecmp.cmpfiles(
        ccs_p_Path_A, ccs_p_Path_B, dcmp.common_files, shallow=False)
wherein the function called was cmpfiles which returned three lists of filenames.
- third, we have a call that just compares two specific files:
def congruency_check_subtrees( ccs_p_Path_A, ccs_p_Path_B, ccs_p_depth, ccs_p_still_congruent, ccs_p_extra_side ):
    chckd_XAB_congruent = im_filecmp.cmp( pathfile_a, pathfile_b, shallow=False)
wherein the function called was cmp which returned a boolean.
Note: that part gets done as a follow-up comparison that copes with filenames that are non-identical - e.g. because of different character encodings in the file systems. For this exercise we can ignore the why of all that.
- fourth, because the Congruentry module provides a direct file vs file comparison, we have another call that just compares two specific files:
def congruentry_files_command( pathfile_a, pathfile_b, p_multi_log):
    chckd_congruent = im_filecmp.cmp(pathfile_a, pathfile_b, shallow=False)
wherein the function called was again cmp which returned a boolean.
So, that was usage of:
- dircmp = compare directories, returning an object - note that some comparison actions don't happen until parts of the object are called
- cmpfiles = compare the files in two directories
- cmp = compare two specified files
which are therefore the features for which we need to make supplanting functions. We may as well do these inside the Congruentry module.
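To make the three return shapes concrete, here is a small self-contained sketch that runs each stock call against two throwaway folders. The folder layout, the file names and the helper make_demo_dirs are invented for illustration and are not part of Foldatry.

```python
import filecmp
import os
import tempfile

def make_demo_dirs():
    """Create two temporary folders holding one identical and one differing file."""
    root = tempfile.mkdtemp()
    dir_a = os.path.join(root, "a")
    dir_b = os.path.join(root, "b")
    os.mkdir(dir_a)
    os.mkdir(dir_b)
    for folder, text in ((dir_a, "left"), (dir_b, "right")):
        with open(os.path.join(folder, "same.txt"), "w") as f:
            f.write("identical")
        with open(os.path.join(folder, "diff.txt"), "w") as f:
            f.write(text)
    return dir_a, dir_b

dir_a, dir_b = make_demo_dirs()

# dircmp returns an object; its attributes are worked out when first accessed
dcmp = filecmp.dircmp(dir_a, dir_b)
common = sorted(dcmp.common_files)

# cmpfiles returns three lists of names: match, mismatch, errors
match, mismatch, errors = filecmp.cmpfiles(dir_a, dir_b, common, shallow=False)

# cmp returns a boolean for one pair of files
same = filecmp.cmp(os.path.join(dir_a, "same.txt"),
                   os.path.join(dir_b, "same.txt"), shallow=False)

print(common, match, mismatch, errors, same)
```

None of the three results carries any size information, which is the gap the supplanting functions are meant to fill.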
The Mocking
Mock function names
We'll need three new functions. Here is my plan:
- fcmp_for_two_dirs_compare_get_object = to replace dircmp
- fcmp_for_dirs_compare_files_get_lists = to replace cmpfiles
- fcmp_for_two_files_compare_contents_get_bool = to replace cmp
Now it should be said that it took a few rounds of thought to get those names. Quite a bit of the following was written with the names being quite different. Eventually, I settled on a prefix fcmp_ for them all and the names to be a sequence of "for this" then "do this" then "get this".
Naive mock up
Time to use the names and start framing them into being Python functions.
def fcmp_for_two_dirs_compare_get_object()
def fcmp_for_dirs_compare_files_get_lists()
def fcmp_for_two_files_compare_contents_get_bool( )
Parameters
Now let's add the parameters. As my first target is to simply pass through to the filecmp library, I'll set out the same parameter sets.
def fcmp_for_two_dirs_compare_get_object( p_path_a, p_path_b, p_shallow=True )
def fcmp_for_dirs_compare_files_get_lists( p_path_a, p_path_b, p_lst_common_files, p_shallow=False )
def fcmp_for_two_files_compare_contents_get_bool( p_pathfile_a, p_pathfile_b, p_shallow=False )
I am cheating however, in only bothering about the parameters I'm currently using in my own code.
Usable Mocks
Let's now add enough code to pass through to the existing calls. I don't even need to bother with local variables and instead just put the calls in the return lines.
def fcmp_for_two_dirs_compare_get_object( p_path_a, p_path_b, p_shallow=True)
    return im_filecmp.dircmp( p_path_a, p_path_b, p_shallow )
def fcmp_for_dirs_compare_files_get_lists( p_path_a, p_path_b, p_lst_common_files, p_shallow=False )
    return im_filecmp.cmpfiles( p_path_a, p_path_b, p_lst_common_files, p_shallow)
def fcmp_for_two_files_compare_contents_get_bool( p_pathfile_a, p_pathfile_b, p_shallow=False )
    return im_filecmp.cmp(pathfile_a, pathfile_b, shallow=False)
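As air code, the versions above will not run as written: the def lines lack their colons, the last function uses names without the p_ prefix, and in the stock library the third positional parameter of dircmp is ignore rather than shallow (a keyword-only shallow was only added in Python 3.13). A minimal runnable sketch of the same pass-through idea, assuming the directory wrapper simply drops its shallow parameter, might look like this. The temporary folders at the end are only there to exercise it.

```python
import filecmp as im_filecmp
import os
import tempfile

def fcmp_for_two_dirs_compare_get_object(p_path_a, p_path_b):
    # assumption: no shallow parameter here, so this runs on any Python 3
    return im_filecmp.dircmp(p_path_a, p_path_b)

def fcmp_for_dirs_compare_files_get_lists(p_path_a, p_path_b, p_lst_common_files, p_shallow=False):
    return im_filecmp.cmpfiles(p_path_a, p_path_b, p_lst_common_files, shallow=p_shallow)

def fcmp_for_two_files_compare_contents_get_bool(p_pathfile_a, p_pathfile_b, p_shallow=False):
    return im_filecmp.cmp(p_pathfile_a, p_pathfile_b, shallow=p_shallow)

# quick exercise on two throwaway folders, each holding one identical file
with tempfile.TemporaryDirectory() as demo_a, tempfile.TemporaryDirectory() as demo_b:
    for folder in (demo_a, demo_b):
        with open(os.path.join(folder, "note.txt"), "w") as f:
            f.write("hello")
    demo_dcmp = fcmp_for_two_dirs_compare_get_object(demo_a, demo_b)
    demo_common = demo_dcmp.common_files
    demo_lists = fcmp_for_dirs_compare_files_get_lists(demo_a, demo_b, demo_common)
    demo_bool = fcmp_for_two_files_compare_contents_get_bool(
        os.path.join(demo_a, "note.txt"), os.path.join(demo_b, "note.txt"))
```

Passing shallow by keyword rather than by position means a later change to the parameter order in either layer cannot silently redirect the value.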
Customised Round One
But, the whole point of this exercise was to enable some changes, so let's do the first parts of how that might work.
For the simple comparison of two files (fcmp_for_two_files_compare_contents_get_bool), let's make it return whether or not a deep/content comparison was required and done.
For the comparison of just the files in the two directories (fcmp_for_dirs_compare_files_get_lists), let's return the total file size of those. For this, we'll put two extra functions inside it (pathfile_filesize and pathfile_filesizes_sum). And, one of those will, for now, only be a mock function as it will just always return None - we can work out what the valid Python for doing that is later.
def fcmp_for_two_dirs_compare_get_object( p_path_a, p_path_b, p_shallow=True):
    r_dcmp = im_filecmp.dircmp( p_path_a, p_path_b, p_shallow )
    return r_dcmp

def fcmp_for_dirs_compare_files_get_lists( p_path_a, p_path_b, p_lst_common_files, p_shallow=False ):
    def pathfile_filesize( p_path, p_file):
        return None
    def pathfile_filesizes_sum( p_path, p_lst_files):
        klang = False
        r_sum = 0
        for i_file in p_lst_files :
            i_sum = pathfile_filesize( p_path, i_file)
            if not i_sum is None :
                r_sum = r_sum + i_sum
            else:
                # a single failed filesize means the sum is invalid
                klang = klang or True
        if klang :
            r_sum = -1
        return r_sum
    (r_files_match, r_files_mismatch, r_files_errors) = im_filecmp.cmpfiles( p_path_a, p_path_b, p_lst_common_files, p_shallow)
    r_match_size_sum = pathfile_filesizes_sum( p_path_a, r_files_match)
    return r_files_match, r_files_mismatch, r_files_errors, r_match_size_sum

def fcmp_for_two_files_compare_contents_get_bool( p_pathfile_a, p_pathfile_b, p_shallow=False ):
    r_same_shallow = im_filecmp.cmp(pathfile_a, pathfile_b, True)
    if r_same_shallow and p_shallow :
        r_same_content = im_filecmp.cmp(pathfile_a, pathfile_b, False)
    else:
        r_same_content = False
    return r_same_shallow, r_same_content
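One way to read "return whether or not a deep/content comparison was required and done" is sketched below. It is an interpretation rather than the code used in Foldatry: the function name fcmp_two_files_same_and_depth is made up, and treating equal size plus equal modification time as a "shallow match" is an assumption modelled loosely on what filecmp does. It returns two booleans: whether the files are considered the same, and whether their contents were actually read to decide that.

```python
import filecmp as im_filecmp
import os
import tempfile

def fcmp_two_files_same_and_depth(p_pathfile_a, p_pathfile_b, p_shallow=False):
    # hypothetical helper: returns (same, content_was_compared)
    st_a = os.stat(p_pathfile_a)
    st_b = os.stat(p_pathfile_b)
    if st_a.st_size != st_b.st_size:
        # different sizes settle the question without reading any contents
        return False, False
    if p_shallow and st_a.st_mtime == st_b.st_mtime:
        # caller accepts a shallow match, so contents are not read
        return True, False
    # otherwise the contents have to be compared
    r_same = im_filecmp.cmp(p_pathfile_a, p_pathfile_b, shallow=False)
    return r_same, True

# small exercise with four throwaway files
with tempfile.TemporaryDirectory() as demo_dir:
    demo_paths = {}
    for name, text in (("one", "abcd"), ("two", "abcd"), ("three", "abcx"), ("four", "ab")):
        demo_paths[name] = os.path.join(demo_dir, name)
        with open(demo_paths[name], "w") as f:
            f.write(text)
    demo_equal = fcmp_two_files_same_and_depth(demo_paths["one"], demo_paths["two"])
    demo_same_size = fcmp_two_files_same_and_depth(demo_paths["one"], demo_paths["three"])
    demo_diff_size = fcmp_two_files_same_and_depth(demo_paths["one"], demo_paths["four"])
```

Note that any caller that used to receive a single boolean would need to unpack two values after a change like this.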
Customised Round Two
In which we extend the functionality of the object method so that it can return the collective sum of the files found to be in common.
As part of this we can lean on the two functions we've already built inside the function fcmp_for_dirs_compare_files_get_lists but to have those available to both that and the "object" function, we'll bring them to the outside. For clarity we'll add our prefix fcmp_ to their names.
# support functions

def fcmp_pathfile_filesize( p_path, p_file):
    return None

def fcmp_pathfile_filesizes_sum( p_path, p_lst_files):
    klang = False
    r_sum = 0
    for i_file in p_lst_files :
        i_sum = fcmp_pathfile_filesize( p_path, i_file)
        if not i_sum is None :
            r_sum = r_sum + i_sum
        else:
            # a single failed filesize means the sum is invalid
            klang = klang or True
    if klang :
        r_sum = -1
    return r_sum

# the replacement functions

def fcmp_for_two_dirs_compare_get_object( p_path_a, p_path_b, p_shallow=True):
    r_dcmp = im_filecmp.dircmp( p_path_a, p_path_b, p_shallow )
    i_files_match = r_dcmp.same_files
    r_match_size_sum = fcmp_pathfile_filesizes_sum( p_path_a, i_files_match)
    return r_dcmp, r_match_size_sum

def fcmp_for_dirs_compare_files_get_lists( p_path_a, p_path_b, p_lst_common_files, p_shallow=False ):
    (r_files_match, r_files_mismatch, r_files_errors) = im_filecmp.cmpfiles( p_path_a, p_path_b, p_lst_common_files, p_shallow)
    r_match_size_sum = fcmp_pathfile_filesizes_sum( p_path_a, r_files_match)
    return r_files_match, r_files_mismatch, r_files_errors, r_match_size_sum

def fcmp_for_two_files_compare_contents_get_bool( p_pathfile_a, p_pathfile_b, p_shallow=False ):
    r_same_shallow = im_filecmp.cmp(pathfile_a, pathfile_b, True)
    if r_same_shallow and p_shallow :
        r_same_content = im_filecmp.cmp(pathfile_a, pathfile_b, False)
    else:
        r_same_content = False
    return r_same_shallow, r_same_content
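Because fcmp_pathfile_filesize is still a mock that returns None, the size sum at this stage can only ever be 0 (nothing to add up) or -1 (at least one size was unavailable). The sketch below checks that plumbing on two throwaway folders. It makes two assumptions: the shallow parameter is left off the dircmp call, since the third positional parameter of dircmp is ignore, and the folder contents are invented for the demonstration.

```python
import filecmp as im_filecmp
import os
import tempfile

def fcmp_pathfile_filesize(p_path, p_file):
    return None  # still the mock at this stage

def fcmp_pathfile_filesizes_sum(p_path, p_lst_files):
    klang = False
    r_sum = 0
    for i_file in p_lst_files:
        i_sum = fcmp_pathfile_filesize(p_path, i_file)
        if i_sum is not None:
            r_sum = r_sum + i_sum
        else:
            # a single failed filesize means the sum is invalid
            klang = True
    if klang:
        r_sum = -1
    return r_sum

def fcmp_for_two_dirs_compare_get_object(p_path_a, p_path_b):
    r_dcmp = im_filecmp.dircmp(p_path_a, p_path_b)
    r_match_size_sum = fcmp_pathfile_filesizes_sum(p_path_a, r_dcmp.same_files)
    return r_dcmp, r_match_size_sum

with tempfile.TemporaryDirectory() as demo_a, tempfile.TemporaryDirectory() as demo_b:
    # two empty folders: no matching files, so the sum is a valid 0
    demo_dcmp, demo_sum_empty = fcmp_for_two_dirs_compare_get_object(demo_a, demo_b)
    for folder in (demo_a, demo_b):
        with open(os.path.join(folder, "note.txt"), "w") as f:
            f.write("hello")
    # one matching file: the mock size makes the sum invalid
    demo_dcmp, demo_sum_one = fcmp_for_two_dirs_compare_get_object(demo_a, demo_b)
    demo_same = list(demo_dcmp.same_files)
```

The call site changes shape too: a line that used to read dcmp = im_filecmp.dircmp(...) now has to unpack two values.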
Ready for Implanting
The above has been a "paper exercise" - just written in my programming notes tool rather than in the IDE where I actually work on my Python program. So from here, the exercise will shift into one of pasting this code into there and seeing if it will actually work.
Enacting the Size
So far, we've left the function for getting the file sizes quite unable to actually do that.
def fcmp_pathfile_filesize( p_path, p_file):
    return None
As it happens, I already have some code for doing this, as it was needed in another module.
In the Matchsubtry module we have a line:
size = im_os.path.getsize(fpath)
A quick check confirms that Congruentry has the same library imported:
import os as im_os
As our intended function fcmp_pathfile_filesize is currently taking two parameters, p_path and p_file, we will need something to combine them.
A quick read finds the place in Congruentry where I already do that:
new_subpath_a = im_osp.join(ccs_p_Path_A, subdir)
So this gives me enough to construct the pieces to make the function operative.
def fcmp_pathfile_filesize( p_path, p_file):
    i_pathfile = im_osp.join( p_path, p_file)
    r_size = im_os.path.getsize( i_pathfile)
    return r_size
Note that I didn't go straight to putting that all in the return line. I like giving myself the option of putting in some print statements (or some other "debug" calls) as I go to implement this for the first time.
e.g.
def fcmp_pathfile_filesize( p_path, p_file):
    print( p_path, p_file)
    i_pathfile = im_osp.join( p_path, p_file)
    print( i_pathfile)
    r_size = im_os.path.getsize( i_pathfile)
    print( r_size)
    return r_size
As we're dealing with the file system here, it is wise to not assume that we won't have things going wrong at run time.
The simplest thing is to wrap each file system interaction with a try except pair.
def fcmp_pathfile_filesize( p_path, p_file):
    try:
        i_pathfile = im_osp.join( p_path, p_file)
        path_ok = True
    except:
        path_ok = False
    try:
        r_size = im_os.path.getsize( i_pathfile)
        size_ok = True
    except:
        size_ok = False
    if not size_ok :
        r_size = None
    return r_size
That's actually slightly ambiguous, as it attempts to call getsize regardless of whether the path+file construction gave an error. I like to have the possible errors kept separate.
def fcmp_pathfile_filesize( p_path, p_file):
    # first form the path
    try:
        i_pathfile = im_osp.join( p_path, p_file)
        path_ok = True
    except:
        path_ok = False
    # then get the size
    size_ok = False
    if path_ok :
        try:
            r_size = im_os.path.getsize( i_pathfile)
            size_ok = True
        except:
            pass
    if not size_ok :
        r_size = None
    return r_size
Note that there are many ways to code that logic - for various degrees of optimisation and/or being Pythonic. For now, just getting the logic valid and bulletproof will do.
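For comparison, here is one of those other ways: a more compact sketch that keeps the two failure points distinct by catching specific exception types instead of using bare except clauses. It assumes that a bad argument type shows up as TypeError from the join and that file system trouble shows up as OSError from getsize. The os.path alias is written as im_osp to match the calls above, on the assumption that Congruentry imports it that way.

```python
import os as im_os
import os.path as im_osp
import tempfile

def fcmp_pathfile_filesize(p_path, p_file):
    # first form the path; a non-path argument raises TypeError
    try:
        i_pathfile = im_osp.join(p_path, p_file)
    except TypeError:
        return None
    # then get the size; a missing or unreadable file raises OSError
    try:
        return im_os.path.getsize(i_pathfile)
    except OSError:
        return None

# exercise the three cases on a throwaway folder
with tempfile.TemporaryDirectory() as demo_dir:
    with open(im_osp.join(demo_dir, "five.txt"), "w") as f:
        f.write("12345")
    demo_size = fcmp_pathfile_filesize(demo_dir, "five.txt")
    demo_missing = fcmp_pathfile_filesize(demo_dir, "absent.txt")
    demo_bad_type = fcmp_pathfile_filesize(None, "five.txt")
```

Naming the exception types means an unrelated mistake, such as a misspelled variable, would still surface as an error instead of quietly turning into a None size.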
See Also
While this article used work on Foldatry as an example, rather than being an article about Foldatry, if you are interested in what kind of thing Foldatry is, you can find it here:
That said, while it's real and I use it myself, I haven't yet gotten around to packaging it or even checking if it works on operating systems other than Xubuntu.