Discussion on: Folder Organizer Using Python and Tkinter


Replies for: Cool, and thanks for sharing this. I also found myself with a particular need for a file tool and took the path of writing one in Python and turne...

Hi geraldew, thanks for your comment!

Yes, you are right: for match case to work, you need at least Python 3.10.

However, if you just want to use the program, you can also download the zip file containing only the executable here. In that case, it should work without issues.

Do let me know if you find any issues when using it so I can check them out.

I also had a look at your repository, and it seems like an excellent app with many more features than mine, haha.

Thanks again!

geraldew • Jul 3 '22

Well, I didn't feel like changing my Python version (that's a whole other topic) so I figured I'd just rewrite the match case construct to the older style.

But when I noticed all the logical Or symbols, I was prompted to rethink how I would handle that sort of thing. I decided that I preferred the run-time handling to be a simple dictionary lookup rather than a cascade of boolean comparisons.

So, after looking at some other code I'd written, here's what I came up with.

The prep work is to define a lookup resource, implemented as instructions for building two layers of dictionary.

from enum import Enum, unique, auto

@unique
class TypeOfFile(Enum):
    Document = auto() 
    Audio = auto() 
    Video = auto() 
    Picture = auto() 
    Executable = auto()
    Graphic2D = auto()
    Graphic3D = auto()
    Font = auto()
    Text = auto()
    Compressed = auto()
    DiskImage = auto()
    MobilePhone = auto()
    Databases = auto()

def SubFolder_For_TypeOfFile( tof):
    if tof in [ TypeOfFile.Document ]:
        return '/Documents/'
    elif tof in [ TypeOfFile.Audio ]:
        return '/Audio Files/'
    elif tof in [ TypeOfFile.Video ]:
        return '/Video Files/'
    elif tof in [ TypeOfFile.Picture ]:
        return '/Images/'
    elif tof in [ TypeOfFile.Executable ]:
        return '/Executable Files/'
    elif tof in [ TypeOfFile.Graphic2D ]:
        return '/Graphic Files/'
    elif tof in [ TypeOfFile.Graphic3D ]:
        return '/3D Graphics/'
    elif tof in [ TypeOfFile.Font ]:
        return '/Font Files/'
    elif tof in [ TypeOfFile.Text ]:
        return '/Text Files/'
    elif tof in [ TypeOfFile.Compressed ]:
        return '/Compressed Files/'
    elif tof in [ TypeOfFile.DiskImage ]:
        return '/Disk Images/'
    elif tof in [ TypeOfFile.MobilePhone ]:
        return '/Mobile Phone Related Files/'
    elif tof in [ TypeOfFile.Databases ]:
        return '/Databases Files/'

def Extensions_For_TypeOfFile( tof):
    if tof in [ TypeOfFile.Document ]:
        return [ '.abw','.aww','.chm','.cnt','.dbx','.djvu','.doc','.docm','.docx','.dot','.dotm','.dotx','.epub','.gp4','.ind','.indd','.key','.keynote','.mht','.mpp','.odf','.ods','.odt','.opx','.ott','.oxps','.pages','.pdf','.pmd','.pot','.potx','.pps','.ppsx','.ppt','.pptm','.pptx','.prn','.ps','.pub','.pwi','.rtf','.sdd','.sdw','.shs','.snp','.sxw','.tpl','.vsd','.wpd','.wps','.wri','.xps','.numbers','.ods','.sdc','.sxc','.xls','.xlsm','.xlsx' ]
    elif tof in [ TypeOfFile.Audio ]:
        return [ '.3ga','.aac','.aiff','.amr','.ape','.arf','.asf','.asx','.cda','.dvf','.flac','.gp4','.gp5','.gpx','.logic','.m4a','.m4b','.m4p','.midi','.mp3','.ogg','.opus','.pcm','.rec','.snd','.sng','.uax','.wav','.wma','.wpl','.zab' ]
    elif tof in [ TypeOfFile.Video ]:
        return [ '.264','.3g2','.3gp','.ard','.asf','.asx','.avi','.bik','.dat','.dvr','.flv','.h264','.m2t','.m2ts','.m4v','.mkv','.mod','.mov','.mp4','.mpeg','.mpg','.mts','.ogv','.prproj','.rec','.rmvb','.swf','.tod','.tp','.ts','.vob','.webm','.wlmp','.wmv' ]
    elif tof in [ TypeOfFile.Picture ]:
        return [ '.bmp','.cpt','.dds','.dib','.dng','.emf','.gif','.heic','.ico','.icon','.jpeg','.jpg','.pcx','.pic','.png','.psd','.psdx','.raw','.tga','.thm','.tif','.tiff','.wbmp','.wdp','.webp' ]
    elif tof in [ TypeOfFile.Executable ]:
        return [ '.air','.app','.application','.appx','.bat','.bin','.com','.cpl','.deb','.dll','.elf','.exe','.jar','.js' ]
    elif tof in [ TypeOfFile.Graphic2D ]:
        return [ '.abr','.ai','.ani','.cdt','.djvu','.eps','.fla','.icns','.ico','.icon','.mdi','.odg','.pic','.psb','.psd','.pzl','.sup','.vsdx','.xmp' ]
    elif tof in [ TypeOfFile.Graphic3D ]:
        return [ '.3d','.3ds','.c4d','.dgn','.dwfx','.dwg','.dxf','.ipt','.lcf','.max','.obj','.pro','.skp','.stl','.u3d','.x_t' ]
    elif tof in [ TypeOfFile.Font ]:
        return [ '.eot','.otf','.ttc','.ttf','.woff' ]
    elif tof in [ TypeOfFile.Text ]:
        return [ '.1st','.alx','.application','.asp','.csv','.htm','.html','.log','.lrc','.lst','.md','.nfo','.opml','.plist','.reg','.rtf','.srt','.sub','.tbl','.text','.txt','.xml','.xmp','.xsd','.xsl','.xslt','.ini' ]
    elif tof in [ TypeOfFile.Compressed ]:
        return [ '.001','.002','.003','.004','.005','.006','.007','.008','.009','.010','.7z','.7z.001','.7z.002','.7z.003','.7z.004','.7zip','.a00','.a01','.a02','.a03','.a04','.a05','.ace','.air','.appxbundle','.arc','.arj','.bar','.bin','.c00','.c01','.c02','.c03','.cab','.cbr','.cbz','.cso','.deb','.dlc','.gz','.gzip','.hqx','.inv','.isz','.jar','.msu','.nbh','.pak','.part1.exe','.part1.rar','.part2.rar','.pkg','.pkg','.r00','.r01','.r02','.r03','.r04','.r05','.r06','.r07','.r08','.r09','.r10','.rar','.rpm','.sit','.sitd','.sitx','.tar','.tar.gz','.tgz','.uax','.vsix','.webarchive','.z01','.z02','.z03','.z04','.z05','.zab','.zip','.zipx' ]
    elif tof in [ TypeOfFile.DiskImage ]:
        return [ '.000','.ccd','.cue','.daa','.dao','.dmg','.img','.img','.iso','.mdf','.mds','.mdx','.nrg','.tao','.tc','.toast','.uif','.vcd' ]
    elif tof in [ TypeOfFile.MobilePhone ]:
        return [ '.apk','.asec','.bbb','.crypt','.crypt14','.ipa','.ipd','.ipsw','.lqm','.mdbackup','.nbh','.nomedia','.npf','.pkpass','.rem','.rsc','.sbf','.sis','.sisx','.spd','.thm','.tpk','.vcf','.xap','.xapk' ]
    elif tof in [ TypeOfFile.Databases ]:
        return [ '.accdb','.accdt','.csv','.db','.dbf','.fdb','.gdb','.idx','.mdb','.mdf','.sdf','.sql','.sqlite','.wdb' ]

def make_ext_lookups():
    # make a lookup dictionary by TypeOfFile with each one's subfolder name
    dct_filetype_subfolder = {}
    for tof in TypeOfFile:
        dct_filetype_subfolder[ tof ] = SubFolder_For_TypeOfFile( tof)
    # map each extension to the first TypeOfFile that lists it, reporting later duplicates
    dct_extensions = {}
    for tof in TypeOfFile:
        for ext in Extensions_For_TypeOfFile( tof):
            if ext in dct_extensions:
                print( "Ignoring multiple use of " + ext + " for " + SubFolder_For_TypeOfFile( tof) + " is already in " +  dct_filetype_subfolder[ dct_extensions [ ext ] ])
            else:
                dct_extensions[ ext ] = tof
    return dct_filetype_subfolder, dct_extensions

I used an enumeration as the link between the two; in effect, it is a translation of your various case groups.
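To make the two layers concrete, here is a cut-down sketch with only two categories and a handful of extensions (the trimmed lists and the helper name subfolder_for are just for illustration). The first lookup turns an extension into a TypeOfFile member; the second turns that member into a subfolder name.

```python
from enum import Enum, auto, unique

@unique
class TypeOfFile(Enum):
    Audio = auto()
    Picture = auto()

# layer 1: extension -> category (trimmed lists, illustration only)
dct_extensions = {'.mp3': TypeOfFile.Audio, '.wav': TypeOfFile.Audio,
                  '.jpg': TypeOfFile.Picture, '.png': TypeOfFile.Picture}

# layer 2: category -> subfolder name
dct_filetype_subfolder = {TypeOfFile.Audio: '/Audio Files/',
                          TypeOfFile.Picture: '/Images/'}

def subfolder_for(ext):
    # hypothetical helper: two chained lookups, None when the extension is unknown
    tof = dct_extensions.get(ext)
    return dct_filetype_subfolder[tof] if tof is not None else None

print(subfolder_for('.png'))  # /Images/
print(subfolder_for('.xyz'))  # None
```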

Then, to the top of:

def organize(directory):

I added a line:

    dct_filetype_subfolder, dct_extensions = make_ext_lookups()

so that it builds an instance of the two lookup dictionaries.

Then, instead of your match structure, I do:

        # replace use of match with a two-level dictionary lookup
        if ext in dct_extensions:
            move_files( directory, file, dct_filetype_subfolder[ dct_extensions [ ext ] ] )
        else:
            move_files(directory, file, '/Others/')
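For anyone wanting to try the replacement logic on its own, here is a minimal sketch that returns the planned target folder for each filename instead of moving anything. It assumes the extension comes from os.path.splitext (the organize code isn't shown here), uses plain strings instead of the enum for brevity, and lowercases the extension, which is an extra assumption not in the code above. plan_moves is a hypothetical name.

```python
import os

# trimmed stand-ins for what make_ext_lookups() returns (strings instead of enum members)
dct_filetype_subfolder = {'Audio': '/Audio Files/', 'Picture': '/Images/'}
dct_extensions = {'.mp3': 'Audio', '.jpg': 'Picture'}

def plan_moves(filenames):
    """Hypothetical helper: return {filename: subfolder} without moving anything."""
    plan = {}
    for file in filenames:
        # splitext keeps the leading dot, matching keys like '.mp3'; lower() is an assumption
        ext = os.path.splitext(file)[1].lower()
        if ext in dct_extensions:
            plan[file] = dct_filetype_subfolder[dct_extensions[ext]]
        else:
            plan[file] = '/Others/'
    return plan

print(plan_moves(['song.MP3', 'photo.jpg', 'notes.xyz', 'README']))
# {'song.MP3': '/Audio Files/', 'photo.jpg': '/Images/', 'notes.xyz': '/Others/', 'README': '/Others/'}
```

One caveat if the extension is obtained this way: os.path.splitext returns only the last suffix, so multi-part keys in the lists such as '.tar.gz' or '.part1.rar' would never match.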

By the way, inside make_ext_lookups() I added a check to tell me whether I'd mistyped anything when I adapted the extension lists. That is the if ext in dct_extensions: test and the print that follows it.

I wasn't actually expecting that to show anything, but as it happened it did, printing the following:

Ignoring multiple use of .ods for /Documents/ is already in /Documents/
Ignoring multiple use of .gp4 for /Audio Files/ is already in /Documents/
Ignoring multiple use of .asf for /Video Files/ is already in /Audio Files/
Ignoring multiple use of .asx for /Video Files/ is already in /Audio Files/
Ignoring multiple use of .rec for /Video Files/ is already in /Audio Files/
Ignoring multiple use of .djvu for /Graphic Files/ is already in /Documents/
Ignoring multiple use of .ico for /Graphic Files/ is already in /Images/
Ignoring multiple use of .icon for /Graphic Files/ is already in /Images/
Ignoring multiple use of .pic for /Graphic Files/ is already in /Images/
Ignoring multiple use of .psd for /Graphic Files/ is already in /Images/
Ignoring multiple use of .application for /Text Files/ is already in /Executable Files/
Ignoring multiple use of .rtf for /Text Files/ is already in /Documents/
Ignoring multiple use of .xmp for /Text Files/ is already in /Graphic Files/
Ignoring multiple use of .air for /Compressed Files/ is already in /Executable Files/
Ignoring multiple use of .bin for /Compressed Files/ is already in /Executable Files/
Ignoring multiple use of .deb for /Compressed Files/ is already in /Executable Files/
Ignoring multiple use of .jar for /Compressed Files/ is already in /Executable Files/
Ignoring multiple use of .pkg for /Compressed Files/ is already in /Compressed Files/
Ignoring multiple use of .uax for /Compressed Files/ is already in /Audio Files/
Ignoring multiple use of .zab for /Compressed Files/ is already in /Audio Files/
Ignoring multiple use of .img for /Disk Images/ is already in /Disk Images/
Ignoring multiple use of .nbh for /Mobile Phone Related Files/ is already in /Compressed Files/
Ignoring multiple use of .thm for /Mobile Phone Related Files/ is already in /Images/
Ignoring multiple use of .csv for /Databases Files/ is already in /Text Files/
Ignoring multiple use of .mdf for /Databases Files/ is already in /Disk Images/

So you might want to check your source code for the same duplicates in your case lines.
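The check above keeps the first category and reports only the clash with it. When deciding by hand where each duplicate should go, it may help to see every category an extension appears in. Here is a small sketch using collections.defaultdict over an excerpt of the lists (only a few of the clashes from the log above are included; find_duplicates is a hypothetical name).

```python
from collections import defaultdict

# excerpt of the category lists, including some clashes reported above
categories = {
    '/Documents/': ['.pdf', '.rtf', '.ods', '.ods'],
    '/Text Files/': ['.txt', '.rtf', '.csv'],
    '/Databases Files/': ['.db', '.csv'],
}

def find_duplicates(categories):
    """Map each extension listed more than once to every category that lists it."""
    seen = defaultdict(list)
    for folder, extensions in categories.items():
        for ext in extensions:
            seen[ext].append(folder)
    return {ext: folders for ext, folders in seen.items() if len(folders) > 1}

for ext, folders in find_duplicates(categories).items():
    print(ext, folders)
```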

Anyway, now that I have a variant that runs on my older Python (which will be just whatever is installed on Xubuntu 20.04), it seems to work nicely.

I'm not sure I like "Others" as the folder name for unrecognised things; I'd prefer something that sorts alphabetically either before or after all the rest.

Nicolas Agudelo • Jul 3 '22

Hi geraldew,

It took me a moment to understand how your solution works, since I'm still learning a lot of new stuff and had not used enum before. But after testing for a while with the debugger, I understood how you solved the sorting using enum and dictionaries. I wanted to ask: does this make the code run more efficiently, or was it just your workaround to avoid match case? I'm not sure yet how to test whether code is more or less efficient, so I would appreciate it if you could tell me whether there is a difference in performance between the two approaches.

Regarding the duplicates, I did have a look and found that the source I got the extension list from does list some extensions in multiple categories, which is why your code found the same extension in different categories. I guess I'll have to decide manually which category each of those extensions should be sorted into.

Thanks again!

geraldew • Jul 4 '22

Well, I certainly made the change to not use match case.

My main motivation, though, was that I don't like seeing so much data-like material being hard-coded. Where I can, I like to move that kind of thing into a data structure. This often has the advantage of making the code simpler at the point of decision.

Another advantage is that it prepares the ground for loading that decision data from a config file, say a JSON file. That way, what the program does can be fine-tuned without rewriting it.
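As a rough sketch of that idea, assuming a JSON layout of subfolder name to list of extensions (the layout, the organizer.json filename and load_lookup are all hypothetical), the standard json module is enough to build a lookup. The text is inlined here so the example runs on its own.

```python
import json

# hypothetical config contents; in practice this might be read from organizer.json
CONFIG_TEXT = '''
{
  "/Audio Files/": [".mp3", ".wav"],
  "/Images/": [".jpg", ".png"]
}
'''

def load_lookup(text):
    """Build an extension -> subfolder dictionary from JSON text."""
    config = json.loads(text)
    lookup = {}
    for subfolder, extensions in config.items():
        for ext in extensions:
            # keep the first subfolder that claims an extension, like make_ext_lookups()
            lookup.setdefault(ext, subfolder)
    return lookup

lookup = load_lookup(CONFIG_TEXT)
print(lookup['.png'])  # /Images/
```

Editing the JSON then changes the sorting without touching the Python code.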

As for which is more "efficient", that depends on exactly what you mean by the word.

As you had a match case construct, answering that will partly depend on how it gets implemented under the hood, i.e. by CPython. Second-guessing that (or even checking the CPython source code) seems to be a popular game in some quarters. My personal view is that if that's a worry, then it's probably time to code in something other than Python. It is an interpreted scripting language, after all.

Anyway, I wasn't really comparing against match case, but rather against the tangle of if and elif blocks, plus all the Or operators, that I would have needed just to replace what you had.

When I see a lot of Or usage, I tend to remember that the speed of the operation varies quite a bit depending on the data. It was that thought of variance that prompted me to think that, for each distinct extension, we (the programmers) already know which categorisation should be used. So how do we best express that in Python? Well, in short, using a known value to get another known, related value is what dictionaries are good at.
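To illustrate that variance, here is a toy model (not the original code) of a cascade of ext == a or ext == b ... checks, written as a loop that stops at the first match the way or short-circuits, and counting how many comparisons each lookup costs. A dictionary lookup, by contrast, hashes the key and on average goes straight to the entry, wherever the extension sat in the lists.

```python
# toy model of an if/elif cascade of "or" comparisons, checked top to bottom
CASCADE = [('/Audio Files/', ['.mp3', '.wav', '.flac']),
           ('/Images/', ['.jpg', '.png', '.gif']),
           ('/Text Files/', ['.txt', '.md', '.csv'])]

def cascade_lookup(ext):
    """Return (subfolder, number_of_comparisons_made)."""
    comparisons = 0
    for subfolder, extensions in CASCADE:
        for candidate in extensions:
            comparisons += 1
            if ext == candidate:  # stop at the first match, like a short-circuiting "or"
                return subfolder, comparisons
    return '/Others/', comparisons  # unknown extensions pay for every comparison

for ext in ['.mp3', '.csv', '.zip']:
    print(ext, cascade_lookup(ext))
```

Here '.mp3' takes 1 comparison while '.csv' and an unknown '.zip' each take 9; with the full lists the spread would be far larger.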

I strongly suspect that the dictionary lookup is faster than a lot of cascading conditional logic operators, but that does raise the question of how dictionaries are handled by CPython.
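One way to check a suspicion like this, rather than guess, is the standard library's timeit module. Below is a minimal sketch comparing a small hard-coded if/elif chain with a dictionary lookup (both functions and the six-extension table are illustrative, not the real lists). No timings are shown because they depend on the machine and Python version; with lists this short the gap may well be small.

```python
import timeit

EXTENSIONS = {'.mp3': '/Audio Files/', '.wav': '/Audio Files/',
              '.jpg': '/Images/', '.png': '/Images/',
              '.txt': '/Text Files/', '.csv': '/Text Files/'}

def with_if_chain(ext):
    # hard-coded cascade of comparisons
    if ext == '.mp3' or ext == '.wav':
        return '/Audio Files/'
    elif ext == '.jpg' or ext == '.png':
        return '/Images/'
    elif ext == '.txt' or ext == '.csv':
        return '/Text Files/'
    return '/Others/'

def with_dict(ext):
    # single hash lookup with a fallback
    return EXTENSIONS.get(ext, '/Others/')

# time each approach for an early hit, a late hit and a miss
for ext in ['.mp3', '.csv', '.zip']:
    chain = timeit.timeit(lambda: with_if_chain(ext), number=200_000)
    table = timeit.timeit(lambda: with_dict(ext), number=200_000)
    print(f'{ext}: if/elif {chain:.4f}s, dict {table:.4f}s')
```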

For that matter, I could have constructed a dictionary that mapped each extension directly to its subfolder, rather than the two-dictionary method I first wrote. Such are the many, many ways of tackling these things.
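Collapsing the two layers into that single direct mapping is a one-line dictionary comprehension. A small sketch with trimmed dictionaries (the name dct_ext_subfolder is hypothetical):

```python
from enum import Enum, auto

class TypeOfFile(Enum):
    Audio = auto()
    Picture = auto()

dct_filetype_subfolder = {TypeOfFile.Audio: '/Audio Files/', TypeOfFile.Picture: '/Images/'}
dct_extensions = {'.mp3': TypeOfFile.Audio, '.jpg': TypeOfFile.Picture, '.png': TypeOfFile.Picture}

# collapse the two layers into one extension -> subfolder dictionary
dct_ext_subfolder = {ext: dct_filetype_subfolder[tof] for ext, tof in dct_extensions.items()}

print(dct_ext_subfolder.get('.png', '/Others/'))  # /Images/
```

The trade-off is that the enum no longer appears at lookup time, so anything that wants to reason about categories (rather than folder names) would need the two-layer form.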

As for when to construct those dictionaries, that comes down to knowing the scope and lifespan of the program. As a quick fix, I put the construction step (that is, the call to the function that does it) inside organize(directory). But given how your program currently works, it could have been done outside that function, so that it happens only once for all runs of organize. I was just too lazy to work out which way to do that: make it global, pass it in as a control parameter, etc.
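A minimal sketch of the build-once option, assuming a module-level constant is acceptable. make_lookup stands in for make_ext_lookups(), and this organize takes a list of filenames and returns target folders, so it is a hypothetical variant rather than the real function. The lookup is also exposed as a parameter, which covers the pass-it-in option too.

```python
import os

def make_lookup():
    # stand-in for make_ext_lookups(); trimmed to two extensions
    print('building lookup')
    return {'.mp3': '/Audio Files/', '.jpg': '/Images/'}

# built once when the module is loaded, then shared by every call below
EXT_TO_SUBFOLDER = make_lookup()

def organize(filenames, lookup=EXT_TO_SUBFOLDER):
    """Hypothetical variant: return target folders instead of moving files."""
    return [lookup.get(os.path.splitext(f)[1], '/Others/') for f in filenames]

print(organize(['a.mp3']))
print(organize(['b.jpg', 'c.zip']))
```

Running it prints "building lookup" only once, however many times organize is called.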

BTW, another thing the as-data approach enables is having alternative dictionaries to pass to the organize function. For example, there could be a few stock combinations for the user to choose from.
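A rough sketch of what stock profiles might look like: the profile names, their contents and target_folder are all hypothetical, and in the Tkinter app the choice could come from something like a dropdown. The lookup logic stays the same whichever dictionary is passed in.

```python
import os

# hypothetical stock profiles a user might pick from
PROFILES = {
    'detailed': {'.mp3': '/Audio Files/', '.jpg': '/Images/', '.psd': '/Graphic Files/'},
    'simple': {'.mp3': '/Media/', '.jpg': '/Media/', '.psd': '/Media/'},
}

def target_folder(filename, lookup):
    # same logic for every profile; only the data changes
    return lookup.get(os.path.splitext(filename)[1], '/Others/')

for name, lookup in PROFILES.items():
    print(name, target_folder('cover.psd', lookup))
```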

Nicolas Agudelo • Jul 10 '22 • Edited

Hi geraldew,

Sorry for the delay, I was offline for a few days. Yes, I can see what you mean: by using data structures instead of hard-coding, the decision data could be loaded from a file instead of rewriting the code each time you want to add or remove something.

As for efficiency, I was asking mostly because I'm currently reading about time and space complexity and was curious whether that played a role in why you decided to do it this way. If I'm not mistaken, Python uses a garbage collector, so space complexity is out of the programmer's hands (I think), but I'm still not sure how to tell when a program will have higher or lower time complexity.

Thanks for your comments; you have shared some really interesting stuff ^^