DEV Community

techocodger

Yet Another Functions of Functions Python Tutorial

The Topic

Are you confused by Python functions that return functions and the places and ways that Python requires you to understand that? If so, perhaps this description of one example use case may help you.

Rationale

This article was written as a note-to-self - and if anyone finds it helpful then that's good too. Some opinions will be expressed that hopefully are relevant to the code that's being discussed.

The Example

This example comes from a very small part of the process of writing the Python program Foldatry. However, that program is merely incidental for this article. It is, however, a real example from a real application that is being written and used.

Dictionary of Single Value Items

There is a part of Foldatry where it traverses a folder tree, noting the file extensions - the dot-something at the end of the name - as it goes.

To store those, it created a dictionary, where each new extension became a new key, and the value at the key was the count of files it had seen with that extension.

Thus, after the traversal was done, and the dictionary built, it was desired to output the findings, in order from the extension with the most files found, to the least found.

  • Something to note about Python and dictionaries, is that its idea of how the keys are ordered has changed. In earlier Python the order of .keys() was arbitrary and not to be relied upon. In CPython 3.6 the keys happened to come back in the order they were added, as a detail of the under-the-hood implementation, and from Python 3.7 onward the language defines dictionaries as preserving their order of insertion.
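
As a small illustration of that ordering rule, here is a sketch that counts extensions from a made-up list of sightings (the data and the names dct_counts and keys_in_order are invented for this example, not taken from Foldatry). It assumes Python 3.7 or later.

```python
# Hypothetical sample data: extensions in the order a traversal might meet them.
lst_seen = [".txt", ".py", ".txt", ".jpg", ".py", ".txt"]

dct_counts = {}
for ext in lst_seen:
    # The first sighting creates the key; later sightings only update the value.
    dct_counts[ext] = dct_counts.get(ext, 0) + 1

# From Python 3.7 on, the keys come back in the order they were first inserted.
keys_in_order = list(dct_counts.keys())
print(keys_in_order)  # ['.txt', '.py', '.jpg']
```

Updating the value at an existing key does not move that key, which is why the order reflects first sightings rather than counts.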

So, with the dictionary collected and held as the variable i_dct_counts - if we didn't care at all about the order to print them, we could do:

for k_ext in i_dct_counts.keys() :
    print( i_dct_counts[ k_ext ] )

  • for the Python pedants, yes, this is overlooking that iterating over the dictionary inherently goes through the keys - without explicitly calling on .keys()

Most likely though, what a user wants to see, is which extension accounts for the most files, so to do that we will want to re-sort the list of keys - to be in descending order by the discovered counts.

To do that, we can use a generic sorted function that Python offers. The code will then look like:

lst_keys_sorted = sorted( i_dct_counts, key=i_dct_counts.get, reverse=True )
for k_ext in lst_keys_sorted :
    print( i_dct_counts[ k_ext ] )


Frankly that kind of thing - the use of sorted - is easy to look up online and plug into place - and indeed that's what was done here. In doing so, note that the sorted function needed to be passed the slightly non-obvious thing: i_dct_counts.get - which is the dictionary's own get method, named but not called, so that sorted can call it with each key to fetch the count to sort by. It clearly worked so all was good.
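
To make the i_dct_counts.get part less mysterious, here is a minimal sketch with made-up counts. The name fn_lookup is invented for the illustration; the point is only that a method taken from a particular dictionary can be called later with a key alone.

```python
i_dct_counts = {".txt": 3, ".py": 7, ".jpg": 1}  # made-up counts

# A bound method remembers its dictionary, so it can be called with just a key.
fn_lookup = i_dct_counts.get
print(fn_lookup(".py"))  # 7, the same as i_dct_counts.get(".py")

# sorted calls the key function once per key and orders the keys by the results.
lst_ascending = sorted(i_dct_counts, key=fn_lookup)
lst_descending = sorted(i_dct_counts, key=fn_lookup, reverse=True)
print(lst_ascending)   # ['.jpg', '.txt', '.py']
print(lst_descending)  # ['.py', '.txt', '.jpg']
```

Without reverse=True the smallest count comes first, so the descending order described above needs that extra argument.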

Dictionary of Tuples

For a revision of the program, it was decided to have the dictionary also hold information about the sizes of the files it found. To that end a "named tuple" was made to be used as the new data item.

Here's the definition for it:

from collections import namedtuple

tpl_Extn_Info = namedtuple('tpl_Extn_Info', 'ei_Count ei_SizeSum ei_SizeMin ei_SizeMax')


for which you can ignore the Pythonist mechanics and just see it as a non-simple data type to hold named elements of:

  • ei_Count - to hold the counts of files
  • ei_SizeSum - to hold the total size of files encountered
  • ei_SizeMin - to hold the smallest size of file encountered
  • ei_SizeMax - to hold the largest size of file encountered

where these are all things that could just be updated as files are encountered during the traversal, and are meaningful at the end of the traversal.

This change required some other bits of code to handle creating and updating these tuples during the folder tree traversals - but those don't matter for this tutorial, where we will merely assume it results in a suitably created dictionary of tuples.

But what this did change is what happens where this line executes:

lst_keys_sorted = sorted( i_dct_counts, key=i_dct_counts.get, reverse=True )


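Because unlike before, the part key=i_dct_counts.get no longer tells the sorted function how to get a simple value that can control the sorting - instead it now delivers a tuple. In Foldatry this caused an error at run time, from inside the sorted function. Note that Python compares tuples element by element, so whether it fails depends on what the tuples contain (for example, a None in one field cannot be ordered against a number) - and even where the comparison succeeds, the order is decided by the whole tuple rather than by the one field we choose.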
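
Whether sorted fails at this point depends on what the tuples hold. Here is a small sketch of how Python treats tuples as sort keys, using made-up data; the None field in dct_bad is purely an assumption chosen to provoke a failure, and is not known to be what happened in Foldatry.

```python
from collections import namedtuple

tpl_Extn_Info = namedtuple('tpl_Extn_Info', 'ei_Count ei_SizeSum ei_SizeMin ei_SizeMax')

# Made-up data: the counts are equal, so the comparison falls through to ei_SizeSum.
dct_ok = {
    ".txt": tpl_Extn_Info(2, 500, 100, 400),
    ".py": tpl_Extn_Info(2, 90, 40, 50),
}
lst_ok = sorted(dct_ok, key=dct_ok.get)
print(lst_ok)  # ['.py', '.txt']

# Assumption for illustration: one field is None, which cannot be ordered against an int.
dct_bad = {
    ".txt": tpl_Extn_Info(2, None, 100, 400),
    ".py": tpl_Extn_Info(2, 90, 40, 50),
}
try:
    sorted(dct_bad, key=dct_bad.get)
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```

Either way, the tuple as a whole is not the sort key that is wanted here.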

This means needing to do one of two things:

  • have our dictionary item be a kind of thing that is inherently sortable ;
  • provide a better function to sorted than the .get

Here we will ignore the first of those options - partly because it just wasn't the implemented resolution, but also because it requires tackling a different aspect of how Python operates - and would thus be a tutorial about something else.

So the question then became: what kind of function needs to be provided to sorted? And how do we make such a thing?

Function that returns a Function

It is very tempting to show the blundering steps that were taken to work this out. It didn't take long - maybe 20 or 30 minutes - but those steps are harder to write about.

  • And to be very clear, this is certainly not claiming this is the best way to do this.

So let's jump to the code, and then talk it through.

from collections import namedtuple

tpl_Extn_Info = namedtuple('tpl_Extn_Info', 'ei_Count ei_SizeSum ei_SizeMin ei_SizeMax')

def ei_Count_of( p_tpl_Extn_Info ):
    return p_tpl_Extn_Info.ei_Count

def ei_Count_at_key( p_dct_tpl_Extn_Info, p_key ):
    return ei_Count_of( p_dct_tpl_Extn_Info[ p_key ] )

def fn_ei_Count_at_key( p_dct_tpl_Extn_Info ):
    def fn_ei_Count_of_dct_at_key( p_key ):
        return ei_Count_at_key( p_dct_tpl_Extn_Info, p_key )
    return fn_ei_Count_of_dct_at_key


The first function is simple - ei_Count_of, when passed a tuple, will return its ei_Count value. Of course, when you have a particular tuple item the code for this is so trivial as to not be worth writing a function - but we know our goal here is to have a function we can quote. At first glance, this can seem to be the function required by sorted, but it cannot work: sorted will hand it each key of the dictionary, not the tuple stored at that key, and there is no way to tell sorted that this function should operate on the specific dictionary that it is dealing with.

The next function ei_Count_at_key is one that can be passed two things, the dictionary and the key, and will then return the count for that key. At first glance, this can seem to be the function required by sorted, but it cannot work because sorted needs to be given a function that takes only the key as a parameter.
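
To see why neither of those two functions can be handed to sorted directly, here is a sketch that tries each one and reports the kind of exception raised. The helper try_sort and the sample data are invented for this illustration.

```python
from collections import namedtuple

tpl_Extn_Info = namedtuple('tpl_Extn_Info', 'ei_Count ei_SizeSum ei_SizeMin ei_SizeMax')

def ei_Count_of( p_tpl_Extn_Info ):
    return p_tpl_Extn_Info.ei_Count

def ei_Count_at_key( p_dct_tpl_Extn_Info, p_key ):
    return ei_Count_of( p_dct_tpl_Extn_Info[ p_key ] )

# Made-up data
i_dct_counts = {
    ".txt": tpl_Extn_Info(3, 460, 40, 300),
    ".py": tpl_Extn_Info(7, 900, 10, 400),
}

def try_sort( p_fn_key ):
    # Hypothetical helper: returns the exception's class name, or None if sorting worked
    try:
        sorted( i_dct_counts, key=p_fn_key )
    except Exception as e:
        return type(e).__name__
    return None

# sorted hands over a key (a string), and a string has no ei_Count attribute
err_count_of = try_sort( ei_Count_of )
# sorted supplies one argument, but this function requires two
err_count_at_key = try_sort( ei_Count_at_key )
print( err_count_of, err_count_at_key )  # AttributeError TypeError
```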

Finally we have the tricky bit. This is a function with a nested function. While that idea is not itself obscure, the reason for doing it here is different to the usual reason for this kind of nesting - because here the point of the inner function is not what it does when the outer function executes, but that the inner function itself is what the outer function returns.

Because, yes, when we call fn_ei_Count_at_key - passing it the dictionary we have in mind - what we get back is a function - and notably is a function that is customised for just that dictionary.

The reason this works is perhaps subtle - it is because the inner function makes use of the parameter passed to the outer function, so that it (the inner function) does not need the dictionary as a parameter. Python calls such a function a closure: it remembers the dictionary even after the outer call has returned. This makes the inner function - customised by the outer call - the kind of function that sorted needs to be told to use.
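
One way to convince yourself of that customisation is to call the outer function twice with different dictionaries. The sketch below uses a condensed version of fn_ei_Count_at_key (the inner body does the lookup directly rather than through the helper functions) and two made-up dictionaries, dct_a and dct_b.

```python
from collections import namedtuple

tpl_Extn_Info = namedtuple('tpl_Extn_Info', 'ei_Count ei_SizeSum ei_SizeMin ei_SizeMax')

def fn_ei_Count_at_key( p_dct_tpl_Extn_Info ):
    def fn_ei_Count_of_dct_at_key( p_key ):
        # The dictionary is not a parameter here; it is remembered from the outer call
        return p_dct_tpl_Extn_Info[ p_key ].ei_Count
    return fn_ei_Count_of_dct_at_key

# Two made-up dictionaries that share a key
dct_a = { ".txt": tpl_Extn_Info(3, 460, 40, 300) }
dct_b = { ".txt": tpl_Extn_Info(8, 999, 1, 500) }

fn_a = fn_ei_Count_at_key( dct_a )
fn_b = fn_ei_Count_at_key( dct_b )

# Same key, different answers, because each function is tied to its own dictionary
print( fn_a( ".txt" ) )  # 3
print( fn_b( ".txt" ) )  # 8
```

Each returned function takes only a key, which is the shape that sorted expects of its key= argument.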

And here is how that new function gets used for supplying to the sorted function.

lst_keys_sorted = sorted( i_dct_counts, key=fn_ei_Count_at_key( i_dct_counts), reverse=True )
for k_ext in lst_keys_sorted :
    print( i_dct_counts[ k_ext ] )


A major part of understanding what happens at run-time is that when the first line above is executed, the key= clause calls fn_ei_Count_at_key straight away, which derives a function for the specific dictionary, and then passes that in to the sorted function for it to use as it iterates through the dictionary.

A lesser part to be clear about there, is that when it does that iteration, it will iterate through the keys of the dictionary, while the key function looks up the item stored at each key to decide the order - so the result is a sorted list of keys. Depending on your viewpoint, that is either obvious, or a subtle thing about handling dictionaries. It is worth noting that sorted is quite generic, and operates on anything that is iterable.
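
A short sketch of that generic behaviour, using made-up values: sorted accepts a dictionary, a tuple, a string or a generator alike, and always returns a new list.

```python
i_dct_counts = {".txt": 3, ".py": 7, ".jpg": 1}  # made-up counts

# Iterating a dictionary yields its keys, so sorted receives keys and returns keys
lst_from_dict = sorted(i_dct_counts)
print(lst_from_dict)  # ['.jpg', '.py', '.txt']

# With a key function, the values decide the order but the result is still keys
lst_by_value = sorted(i_dct_counts, key=i_dct_counts.get)
print(lst_by_value)  # ['.jpg', '.txt', '.py']

# Other iterables work the same way
lst_from_tuple = sorted((3, 1, 2))
lst_from_str = sorted("cab")
lst_from_gen = sorted(n * n for n in (3, -1, 2))
print(lst_from_tuple, lst_from_str, lst_from_gen)
```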

Process of Discovery and Aftermath Options

A reality of those functions, is that they represent the steps of building towards having a function to pass to the sorted function. What is needed is a function, not to execute immediately in the sorted call, but for sorted to use as it iterates through the dictionary. Hence the work started by writing functions - rather than object de-references.

Indeed the meta-function approach arrived from realising that sorted was not happy to be given the function ei_Count_at_key or even a use of ei_Count_of combined with the get from the simpler non-tuple method.

Having solved the problem - of how to get a suitably sorted list of keys - the functions have been left in place. But should they now be revised to be more Pythonic, perhaps even to use a lambda in the sorted line?

Perhaps the code as written makes it quite clear what is going on. In terms of performance, there isn't much concern because the set of extensions is generally small - typical usage finds between 5 and 100 extensions.

Also, consider that maybe:

  • code that is more brief can be problematic by depending on deeper understanding of Python at run-time;
  • there isn't a problem with functions that are constructed but only used once - i.e. that this is not a good enough reason to use nameless functions, such as lambda
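
For comparison, here is roughly what a lambda version might look like, next to a condensed form of the named approach. This is only a sketch with made-up data, and is not a claim about which form Foldatry should use; both produce the same list of keys.

```python
from collections import namedtuple

tpl_Extn_Info = namedtuple('tpl_Extn_Info', 'ei_Count ei_SizeSum ei_SizeMin ei_SizeMax')

# Made-up data
i_dct_counts = {
    ".txt": tpl_Extn_Info(3, 460, 40, 300),
    ".py": tpl_Extn_Info(7, 900, 10, 400),
    ".jpg": tpl_Extn_Info(1, 2048, 2048, 2048),
}

# Named version, condensed from the functions discussed in the article
def fn_ei_Count_at_key( p_dct_tpl_Extn_Info ):
    def fn_ei_Count_of_dct_at_key( p_key ):
        return p_dct_tpl_Extn_Info[ p_key ].ei_Count
    return fn_ei_Count_of_dct_at_key

lst_by_named = sorted( i_dct_counts, key=fn_ei_Count_at_key( i_dct_counts ), reverse=True )

# Lambda version: the lambda refers to i_dct_counts from the surrounding code,
# much as the nested function remembers the parameter of its outer function
lst_by_lambda = sorted( i_dct_counts, key=lambda k: i_dct_counts[ k ].ei_Count, reverse=True )

print( lst_by_named )   # ['.py', '.txt', '.jpg']
print( lst_by_lambda )  # ['.py', '.txt', '.jpg']
```

The lambda is shorter, but the reader has to work out for themselves that it depends on i_dct_counts being in scope, which is the kind of deeper understanding the first bullet above is wary of.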

p.s. the specific commit that this corresponds to is: this one