DEV Community

Cover image for The 12 Days of Christmas - Programmed vs. Compressed
Jack Beresford
Jack Beresford

Posted on

The 12 Days of Christmas - Programmed vs. Compressed

The famous 'Twelve days of Christmas' song has a cumulative nature, in that every day adds a gift, and therefore a line to the song.
The fact that there are only 12 unique gifts, the numbers 1 through 12, yet about 90 lines in the song, should tempt an exercise to shorten it.

The song can easily to be reproduced in a programming language, with the verbal ingredients as follows:

days = ["first","second","third","fourth",\
        "fifth","sixth","seventh","eighth",\
        "ninth","tenth","eleventh","twelth"]

gifts = ["A partridge in a pear tree.","Two turtle doves","Three French hens",\
         "Four calling birds","Five gold rings","Six geese a-laying",\
         "Seven swans a-swimming","Eight maids a-milking","Nine ladies dancing",\
         "Ten lords a-leaping","Eleven pipers piping","Twelve drummers drumming"]

onThe = "On the " 
gave = " day of Christmas my true love gave to me"
Enter fullscreen mode Exit fullscreen mode

With some simple iteration in Python (working along the days/gifts), we can write the song in a new text file as follows:

with open("12Days.txt", "w") as file:
    for i in range(len(days)):

        # First line of every day
        file.write(f"\n{onThe}{days[i]}{gave}:\n{gifts[i]}\n")

        # Avoid operations for first day
        if i == 0:
            continue

        # iterate until no more gifts to give
        next = i-1
        while next > 0:
            file.write(f"{gifts[next]}\n")
            next -= 1
        file.write(f"And {gifts[next].lower()}\n")
Enter fullscreen mode Exit fullscreen mode

The result is a text file ending like so:

Not bad. But one flaw of this reproduction is that it required understanding of the song's syntax and structure. Compression, which we see in zip files, jpegs, and so on, can 'shorten' without any understanding, as compression algorithms run on independent rules.

You can see this with the zlib compression algorithm, imported from Python https://docs.python.org/3/library/zlib.html. It is highly performant with files like this, as repeated lines can be replaced with 'pointers' to refer back to an original reference (e.g. the first utterance of 'a partridge in a pear tree').

Calling zlib.compress() with the original file will return a bytes object, looking like this:

This text might be incomprehensible, but at a size of 0.31 KB (just 13% of the size of the full song's 2.36 KB), the zlib algorithm shows its power and application for internet libraries and more.
Most importantly, it can decompress(), allowing the file to return to its original format.

To take this one step further, one can try the complete Works of Shakespeare: https://www.gutenberg.org/cache/epub/100/pg100.txt
I got this down to 2MB with a quick attempt, but I'm sure it can be made even smaller!

Top comments (0)