The Topic
Are you confused by Python functions that return functions, and by the places where Python expects you to understand them? If so, perhaps this description of one example use case will help.
Rationale
This article was written as a note-to-self, and if anyone else finds it helpful then that's good too. Some opinions will be expressed that are, hopefully, relevant to the code being discussed.
The Example
This example comes from a very small part of the process of writing the Python program Foldatry. That program is merely incidental to this article; however, it is a real example, from a real application that is being written and used.
Dictionary of Single Value Items
There is a part of Foldatry where it traverses a folder tree, noting the file extensions (the dot-something at the end of each file name) as it goes.
To store those, it created a dictionary, where each new extension became a new key, and the value at the key was the count of files it had seen with that extension.
Then, after the traversal was done and the dictionary built, the aim was to output the findings in order, from the extension with the most files found to the one with the fewest.
- Something to note about Python dictionaries is that the rules for the order of their keys have changed. In earlier versions of Python, the order of .keys() was not guaranteed. Perhaps to remove the ambiguity, and to match what the CPython implementation was already doing in 3.6, Python 3.7 made it official: dictionaries preserve insertion order, so .keys() gives the keys in the order they were added.
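A quick way to see the insertion-order guarantee (Python 3.7 and later) is to build a small dictionary and list its keys. The extension names and counts here are made up purely for illustration.

```python
# Requires Python 3.7+, where dicts are guaranteed to keep insertion order.
dct_demo = {}
dct_demo[".txt"] = 3
dct_demo[".py"] = 7
dct_demo[".md"] = 1

# The keys come back in the order they were first inserted,
# not sorted alphabetically or by value.
print(list(dct_demo.keys()))  # ['.txt', '.py', '.md']

# Updating the value of an existing key does not move that key.
dct_demo[".txt"] += 1
print(list(dct_demo))  # ['.txt', '.py', '.md']
```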
So, with the dictionary collected and held in the variable i_dct_counts, if we didn't care at all about the order in which to print them, we could do:
for k_ext in i_dct_counts.keys() :
    print( i_dct_counts[ k_ext ] )
- For the Python pedants: yes, this overlooks that iterating over a dictionary inherently goes through its keys, without explicitly calling .keys()
Most likely, though, what a user wants to see is which extension accounts for the most files. To show that, we will want to re-sort the list of keys into descending order of the discovered counts.
To do that, we can use the built-in sorted function that Python offers. The code will then look like:
lst_keys_sorted = sorted( i_dct_counts, key=i_dct_counts.get, reverse=True )
for k_ext in lst_keys_sorted :
    print( i_dct_counts[ k_ext ] )
Frankly, that kind of thing (the use of sorted) is easy to look up online and plug into place, and indeed that's what was done here. In doing so, note that the sorted function needed to be passed the slightly non-obvious thing i_dct_counts.get, which is the dictionary's own lookup method, handed over without being called. It clearly worked, so all was good.
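To see why i_dct_counts.get works as a key: it is a bound method, i.e. a function of one argument (a key) that returns that key's value, which is exactly the shape sorted wants for key=. A small sketch, using made-up counts in place of the real traversal results:

```python
# Hypothetical counts of files per extension, standing in for i_dct_counts.
i_dct_counts = {".txt": 3, ".py": 7, ".md": 1, ".jpg": 5}

# i_dct_counts.get is itself a function: given a key, it returns the value.
get_count = i_dct_counts.get
print(get_count(".py"))  # 7

# sorted() iterates over the dict's keys and calls the key function on each.
# reverse=True gives descending order, i.e. most files first.
lst_keys_sorted = sorted(i_dct_counts, key=i_dct_counts.get, reverse=True)
print(lst_keys_sorted)  # ['.py', '.jpg', '.txt', '.md']
for k_ext in lst_keys_sorted:
    print(k_ext, i_dct_counts[k_ext])
```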
Dictionary of Tuples
For a revision of the program, it was decided to have the dictionary also hold information about the sizes of the files it found. To that end a "named tuple" was made to be used as the new data item.
Here's the definition for it:
from collections import namedtuple
tpl_Extn_Info = namedtuple('tpl_Extn_Info', 'ei_Count ei_SizeSum ei_SizeMin ei_SizeMax')
for which you can ignore the Python mechanics and just see it as a compound data type holding these named elements:
- ei_Count - the count of files
- ei_SizeSum - the total size of the files encountered
- ei_SizeMin - the smallest file size encountered
- ei_SizeMax - the largest file size encountered
where these are all things that could just be updated as files are encountered during the traversal, and are meaningful at the end of the traversal.
This change required some other bits of code to handle creating and updating these tuples during the folder tree traversals, but those don't matter for this tutorial, where we will merely assume it results in a suitably created dictionary of tuples.
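To have some data to experiment with, here is one possible way such a dictionary of tuples could be built. This is a sketch only: the helper note_file and the sample file sizes are hypothetical, not taken from Foldatry. It relies on the namedtuple method _replace, which returns a new tuple with selected fields changed (tuples themselves are immutable).

```python
from collections import namedtuple

tpl_Extn_Info = namedtuple('tpl_Extn_Info', 'ei_Count ei_SizeSum ei_SizeMin ei_SizeMax')

def note_file(p_dct, p_ext, p_size):
    # Hypothetical helper: record one file of size p_size under extension p_ext.
    if p_ext not in p_dct:
        # First file with this extension: it is both the smallest and largest so far.
        p_dct[p_ext] = tpl_Extn_Info(1, p_size, p_size, p_size)
    else:
        old = p_dct[p_ext]
        # Named tuples are immutable, so store a replacement with updated fields.
        p_dct[p_ext] = old._replace(
            ei_Count=old.ei_Count + 1,
            ei_SizeSum=old.ei_SizeSum + p_size,
            ei_SizeMin=min(old.ei_SizeMin, p_size),
            ei_SizeMax=max(old.ei_SizeMax, p_size),
        )

i_dct_counts = {}
for ext, size in [(".py", 1200), (".txt", 300), (".py", 800), (".md", 50), (".py", 4000)]:
    note_file(i_dct_counts, ext, size)

print(i_dct_counts[".py"])
# tpl_Extn_Info(ei_Count=3, ei_SizeSum=6000, ei_SizeMin=800, ei_SizeMax=4000)
```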
But what this did change is what happens where this line executes:
lst_keys_sorted = sorted( i_dct_counts, key=i_dct_counts.get, reverse=True )
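Because, unlike before, the part key=i_dct_counts.get no longer gives the sorted function a simple value that can control the sorting; instead it now delivers a whole tuple. In this program that caused an error at run time, from inside the sorted function. Python does compare tuples element by element, so the exact behaviour depends on what the tuples hold: a pair of fields that cannot be ordered (such as None against an int) raises a TypeError, and even when no error occurs, the sort is on the whole tuple rather than on the count alone.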
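A small experiment shows how Python treats tuples as sort keys: it compares them field by field, so named tuples of plain ints sort without complaint (by ei_Count first, with the later fields as tie-breakers), while a field that cannot be ordered, such as None against an int, raises TypeError. The sample values are made up.

```python
from collections import namedtuple

tpl_Extn_Info = namedtuple('tpl_Extn_Info', 'ei_Count ei_SizeSum ei_SizeMin ei_SizeMax')

# All-int tuples compare lexicographically: ei_Count decides first.
a = tpl_Extn_Info(3, 6000, 800, 4000)
b = tpl_Extn_Info(3, 500, 100, 300)
c = tpl_Extn_Info(7, 10, 1, 5)
print(sorted([c, a, b]))  # b, then a (tie on ei_Count broken by ei_SizeSum), then c

# If some field holds None (e.g. a size not yet known), comparison can fail.
d = tpl_Extn_Info(3, None, None, None)
try:
    sorted([a, d])
except TypeError as err:
    print("TypeError:", err)
```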
This means doing one of two things:
- make our dictionary item a kind of thing that is inherently sortable;
- provide a better function to sorted than the .get
Here we will ignore the first of those options, partly because it just wasn't the implemented resolution, but also because it requires tackling a different aspect of how Python operates, and would thus be a tutorial about something else.
So the question then became: what kind of function needs to be provided to sorted? And how do we make such a thing?
Function that returns a Function
It is very tempting to show the blundering steps that were taken to work this out. It didn't take long, maybe 20 or 30 minutes, but those steps are harder to write about.
- And to be very clear, this is certainly not a claim that this is the best way to do it.
So let's jump to the code, and then talk it through.
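from collections import namedtuple

tpl_Extn_Info = namedtuple('tpl_Extn_Info', 'ei_Count ei_SizeSum ei_SizeMin ei_SizeMax')

# Given one tuple, return its count
def ei_Count_of( p_tpl_Extn_Info ):
    return p_tpl_Extn_Info.ei_Count

# Given a dictionary and a key, return the count for that key
def ei_Count_at_key( p_dct_tpl_Extn_Info, p_key ):
    return ei_Count_of( p_dct_tpl_Extn_Info[ p_key ] )

# Given a dictionary, return a one-argument function: key -> count in that dictionary
def fn_ei_Count_at_key( p_dct_tpl_Extn_Info ):
    def fn_ei_Count_of_dct_at_key( p_key ):
        return ei_Count_at_key( p_dct_tpl_Extn_Info, p_key )
    return fn_ei_Count_of_dct_at_key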
The first function is simple: ei_Count_of, when passed a tuple, will return its ei_Count value. Of course, when you have a particular tuple item, the code for this is so trivial as to not be worth writing a function, but our goal here is to have a function we can hand over by name. At first glance, this can seem to be the function required by sorted, but it cannot work: sorted will pass it each dictionary key rather than a tuple, and there is no way to tell sorted that this function should look the key up in the specific dictionary it is dealing with.
The next function, ei_Count_at_key, can be passed two things, the dictionary and the key, and will then return the count for that key. At first glance, this too can seem to be the function required by sorted, but it cannot work because sorted needs to be given a function that takes only the key as a parameter.
Finally we have the tricky bit. This is a function with a nested function. While that idea is not itself obscure, the reason for doing it here is different from the usual reason for this kind of nesting: here the point of the inner function is that the function itself, rather than what it does at execution time, is what the outer function returns.
Because, yes, when we call fn_ei_Count_at_key, passing it the dictionary we have in mind, what we get back is a function, and notably one that is customised for just that dictionary.
The reason this works is perhaps subtle: the inner function makes use of the parameter passed to the outer function, so it (the inner function) does not need the dictionary as a parameter. Python keeps that parameter available to the inner function even after the outer call has returned; this arrangement is commonly called a closure. This makes the inner function, customised by the outer call, exactly the kind of one-argument function that sorted needs to be told to use.
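To see that each call to fn_ei_Count_at_key produces its own customised function, here is a sketch that builds key functions for two small dictionaries (made-up data) and shows that each one reads from the dictionary it was created with. It repeats the article's definitions so that it can run on its own.

```python
from collections import namedtuple

tpl_Extn_Info = namedtuple('tpl_Extn_Info', 'ei_Count ei_SizeSum ei_SizeMin ei_SizeMax')

def ei_Count_of(p_tpl_Extn_Info):
    return p_tpl_Extn_Info.ei_Count

def ei_Count_at_key(p_dct_tpl_Extn_Info, p_key):
    return ei_Count_of(p_dct_tpl_Extn_Info[p_key])

def fn_ei_Count_at_key(p_dct_tpl_Extn_Info):
    # The inner function remembers p_dct_tpl_Extn_Info from this call (a closure).
    def fn_ei_Count_of_dct_at_key(p_key):
        return ei_Count_at_key(p_dct_tpl_Extn_Info, p_key)
    return fn_ei_Count_of_dct_at_key

dct_one = {".py": tpl_Extn_Info(3, 6000, 800, 4000)}
dct_two = {".py": tpl_Extn_Info(9, 90, 10, 10)}

fn_one = fn_ei_Count_at_key(dct_one)
fn_two = fn_ei_Count_at_key(dct_two)

# Same key, different answers: each function is tied to its own dictionary.
print(fn_one(".py"))  # 3
print(fn_two(".py"))  # 9

# The captured dictionary is held in the returned function's closure.
print(fn_one.__closure__[0].cell_contents is dct_one)  # True
```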
And here is how that new function gets used when supplying it to the sorted function:
lst_keys_sorted = sorted( i_dct_counts, key=fn_ei_Count_at_key( i_dct_counts), reverse=True )
for k_ext in lst_keys_sorted :
    print( i_dct_counts[ k_ext ] )
A major part of understanding what happens at run time is that when the above line is executed, the key= argument first calls fn_ei_Count_at_key to derive a function for the specific dictionary, and then passes that function in to sorted for it to use as it iterates through the dictionary.
A lesser part to be clear about is that, during that iteration, sorted goes through the keys of the dictionary, while our key function uses each key to reach the corresponding item (the tuple). Depending on your viewpoint, that is either obvious or a subtle thing about handling dictionaries. It is worth noting that sorted is quite generic: it accepts any iterable, as long as the elements (or the values the key function returns for them) can be compared with each other.
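A short sketch of both points: sorted walks a dictionary's keys (each key being what the key function receives), and it accepts other iterables too, such as a set or a generator expression, always returning a new list. The recording list seen_keys and the helper count_for are illustrative names, not from the program.

```python
dct_demo = {".txt": 3, ".py": 7, ".md": 1}

seen_keys = []

def count_for(p_key):
    # Record what sorted() hands us, then look up the item for that key.
    seen_keys.append(p_key)
    return dct_demo[p_key]

print(sorted(dct_demo, key=count_for, reverse=True))  # ['.py', '.txt', '.md']
print(seen_keys)  # the dict's keys (strings), not its values

# sorted() takes any iterable and always returns a new list.
print(sorted({3, 1, 2}))                     # [1, 2, 3]
print(sorted(len(w) for w in ["ab", "a"]))   # [1, 2]
```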
Process of Discovery and Aftermath Options
Those functions really record the steps taken toward building a function to pass to the sorted function. What sorted needs is a function that is not executed immediately in the sorted call, but that sorted itself calls as it iterates through the dictionary. Hence the work started with writing functions, rather than with expressions that dereference the objects directly.
Indeed, the meta-function approach came from realising that sorted was not happy to be given the function ei_Count_at_key, or even a use of ei_Count_of combined with the .get from the simpler non-tuple method.
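The failures that led there are easy to reproduce. sorted calls its key function with exactly one argument, a dictionary key, so handing it ei_Count_at_key (which wants two arguments) or ei_Count_of (which wants a tuple, not a key string) goes wrong. The exact error wording varies between Python versions, so only the exception types are relied on here; the sample data is made up.

```python
from collections import namedtuple

tpl_Extn_Info = namedtuple('tpl_Extn_Info', 'ei_Count ei_SizeSum ei_SizeMin ei_SizeMax')

def ei_Count_of(p_tpl_Extn_Info):
    return p_tpl_Extn_Info.ei_Count

def ei_Count_at_key(p_dct_tpl_Extn_Info, p_key):
    return ei_Count_of(p_dct_tpl_Extn_Info[p_key])

i_dct_counts = {".py": tpl_Extn_Info(3, 6000, 800, 4000),
                ".md": tpl_Extn_Info(1, 50, 50, 50)}

# Attempt 1: a two-parameter function, but sorted() supplies only the key.
try:
    sorted(i_dct_counts, key=ei_Count_at_key)
except TypeError as err:
    print("TypeError:", err)

# Attempt 2: ei_Count_of receives a key string such as ".py", which has no ei_Count.
try:
    sorted(i_dct_counts, key=ei_Count_of)
except AttributeError as err:
    print("AttributeError:", err)
```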
Having solved the problem of how to get a suitably sorted list of keys, the functions have been left in place. But should they now be revised to be more Pythonic, perhaps even to use a lambda in the sorted line?
Perhaps the code as written makes it quite clear what is going on. In terms of performance, there isn't much concern, because the set of extensions is generally small: typical usage finds between 5 and 100 distinct extensions.
Also, consider that maybe:
- briefer code can be problematic, because it depends on a deeper understanding of what Python does at run time;
- there isn't a problem with functions that are constructed but only used once, i.e. that alone is not a good enough reason to use nameless functions such as lambda
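For comparison, here is how the same sort might look with a lambda, or with functools.partial filling in the dictionary argument of ei_Count_at_key. These are alternatives to weigh against the points above, not what the program actually uses; the sample data is made up.

```python
from collections import namedtuple
from functools import partial

tpl_Extn_Info = namedtuple('tpl_Extn_Info', 'ei_Count ei_SizeSum ei_SizeMin ei_SizeMax')

def ei_Count_of(p_tpl_Extn_Info):
    return p_tpl_Extn_Info.ei_Count

def ei_Count_at_key(p_dct_tpl_Extn_Info, p_key):
    return ei_Count_of(p_dct_tpl_Extn_Info[p_key])

i_dct_counts = {".txt": tpl_Extn_Info(2, 600, 100, 500),
                ".py": tpl_Extn_Info(3, 6000, 800, 4000),
                ".md": tpl_Extn_Info(1, 50, 50, 50)}

# Option A: an inline lambda that looks up each key in i_dct_counts.
by_lambda = sorted(i_dct_counts, key=lambda k: i_dct_counts[k].ei_Count, reverse=True)

# Option B: partial() pre-fills the first argument, leaving a one-argument function.
by_partial = sorted(i_dct_counts, key=partial(ei_Count_at_key, i_dct_counts), reverse=True)

print(by_lambda)   # ['.py', '.txt', '.md']
print(by_partial)  # ['.py', '.txt', '.md']
```

Either form gives sorted the same thing the named closure does: a function of one key. The trade-off is the one discussed above, between brevity and how much the reader has to already know.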
p.s. the specific commit that this corresponds to is: this one