Michael Braverman

Posted on Nov 15, 2022

1. Applying Design Patterns in Python: GZIP Decorator

#python #oop #programming #100daysofcode

Understanding design patterns is a powerful skill that any programmer should have in their arsenal. It is as powerful as learning grammar of a spoken language. Although most native speakers go about their lives without giving second thought to how they write and utter sentences, those who learn the grammar of a language possess an advantageous power, regardless of whether they are native speakers or not. Not only does this power translate into a more intricate understanding of a given language, it also translates into an ability to understand universal rules that are common among all spoken languages.

The same analogy applies to Design Patterns in programming languages. And as with any skill, one can really learn something only through practice. So today we will be creating a GZIP compressor by implementing the Decorator Design Pattern, while also learning a few things about bytes, strings and UTF-8 encoding on the side.

The Decorator Design Pattern

The Decorator design pattern is used to dynamically add a new feature to an object without changing its implementation. It differs from inheritance because a new feature is added only to that particular object, not to the entire subclass.

Similar to how decorators in Python are functions that wrap other functions, the Decorator pattern on the other hand, is essentially a class for wrapping other classes and objects.

Creating a Base Class for our Data

Before we proceed compressing strings with GZIP, we can create a helper class that will allow us to manage our data that we will later “decorate”. This is necessary because GZIP works with the bytes (also referred to as a “blob”) type, not with the str type. The class Data will handle the conversion from strings to bytes and vice versa, as well as handle the encoding to and from UTF-8:

# Define a common encoding
ENCODING_TYPE = 'utf-8'

class Data:
    """Data Class
    Stores data in string or bytes. 
    Always returns bytes
    """

    # Decrease memory usage from 48 to 40 bytes
    __slots__ = ('_data')

    def __init__(self, data: str | bytes) -> None:
        self._data = data

    def _get_str(self) -> str:
        """Get string from stored data and convert using the set encoding"""
        if type(self._data) is not str:
            return str(self._data, encoding=ENCODING_TYPE)
        return self._data

    def _get_bytes(self) -> bytes:
        """Get bytes from stored data and convert using the set encoding"""
        if type(self._data) is not bytes:
            return bytes(self._data, encoding=ENCODING_TYPE)
        return self._data

    @property
    def text(self) -> str:
        return self._get_str()

    @text.setter
    def text(self, value: str | bytes) -> None:
        self._data = value

    @property
    def blob(self) -> bytes:
        return self._get_bytes()

    @blob.setter
    def blob(self, value: str | bytes) -> None:
        self._data = value

Our Data class is able to handle any data type, whether it is a byte or a str.

Creating an Abstract Class for our GZIP Decorator

As an extra step, we can create an abstract class called DataSourceDecorator. Abstract classes in Python are helpful when we want to define a blueprint for other classes. By defining an abstract base class, you can define a common Application Program Interface(API) for a set of subclasses. In this case, we want to create a class that handles GZIP compression and decompression. By defining a base class for our GZIP class, we can have the ability to extend how we want to manipulate our data later on. For example, we could later add a class that can handle bz2, lzma, zlib, tar, or zip compression — or perhaps, encrypt and decrypt our data.

from abc import ABC, abstractmethod

class DataSourceDecorator(ABC):
    """Abstract Data Source Decorator"""

    @abstractmethod
    def __init__(self, data: Data) -> None:
        self._data = data

    @abstractmethod
    def _compress(self, blob: bytes) -> None:
        ...

    @abstractmethod
    def _inflate(self, blob: bytes) -> None:
        ...

    def read(self) -> Data:
        return self._data

Regardless of our case, we can define __init__(), _compress(), and _inflate() abstract methods. By specifying @abstractmethod for a given method, we tell Python’s interpreter that this method can only be used once it is redefined in a child class. This makes sense because, for every compression algorithm that we would later want to implement, we have to define the various libraries that we might have to be initialized, as well as define the diverging inflation and compression steps. While the read() method is likely to be used everywhere the same way regardless of the compression algorithm we decide to implement and therefore does not have to be abstract.

Implementing GZIP

Python’s standard library has an excellent built-in gzip module that we will be using for implementing out GZIPDecorator class based on the DataSourceDecorator base class.

import gzip

class GZIPDecorator(DataSourceDecorator):
    """GZIP Compression Decorator"""

    def __init__(self, data: Data, compressed: bool = False, compresslevel: int = 9) -> None:
        if type(data) is Data:
            blob = data.blob
        else:
            raise TypeError(f'Unsupported type "{type(data)}"')

        self._compresslevel = compresslevel

        if compressed:
            self._inflate(blob)
        else:
            self._compress(blob)

    def _compress(self, blob: bytes):
        self._data = Data(gzip.compress(blob, self._compresslevel))

    def _inflate(self, blob: bytes):
        self._data = Data(gzip.decompress(blob))

In our GZIPDecorator class, we redefined our __init__() method so that we can specify whether the data we want to wrap is compressed or inflated, set the compression level (from 0 to 9), and require that the input data is of a Data type so that we can comfortably use the bytes of our data.

Seeing our Decorator in action

Now that we have our classes in place, we can test whether our compression decorator works as intended:

# Create a long string with duplicated data
our_data = Data('Our Data' * 1000)
print(f'Uncompressed size:\t{len(our_data.blob)}')

# Compress our data
compressed_data = GZIPDecorator(our_data)
print(f'Compressed size:\t{len(compressed_data.read().blob)}')

# Calculate compression ratio
ratio = len(our_data.blob) / len(compressed_data.read().blob)
print(f'Compressed by:  \t{ratio:.2f}X!')

# Test decompressed data
decompressed_data = GZIPDecorator(compressed_data.read(), True)
status = 'SUCCESS' if decompressed_data.read().blob == our_data.blob else 'FAILURE'
print(f'Decompression   \t{status}')

And we get the following result:

Uncompressed size:      8000
Compressed size:        57
Compressed by:          140.35X!
Decompression           SUCCESS

On a set of data that has a 8 byte-long string duplicated 1000 times we got a 140X compression ratio. Not too bad!

Conclusion

The Decorator can be a helpful pattern in the Python language. This example demonstrated how the GZIPDecorator class can effectively be used to wrap our Data class without changing its underlying functionality. By using the same pattern, we then create other types of decorator classes that could manipulate our Data class.

Code is available on GitHub.

DEV Community

1. Applying Design Patterns in Python: GZIP Decorator

The Decorator Design Pattern

Creating a Base Class for our Data

Creating an Abstract Class for our GZIP Decorator

Implementing GZIP

Seeing our Decorator in action

Conclusion

Top comments (0)

Read next

AI Model Achieves 64% Accuracy in Detecting Pronunciation Errors Using New HMamba Architecture

Careers in cloud computing

Million-Token AI Now Runs on Regular GPUs: New Method Slashes Memory Use by 8x

Advanced AI System Learns Complex Patterns Across Multiple Data Streams Over Time