Understanding design patterns is a powerful skill that any programmer should have in their arsenal. It is as powerful as learning the grammar of a spoken language. Although most native speakers go about their lives without giving a second thought to how they write and utter sentences, those who learn the grammar of a language gain an advantage, regardless of whether they are native speakers or not. Not only does this translate into a more intricate understanding of a given language, it also translates into an ability to understand rules that are common among spoken languages.
The same analogy applies to Design Patterns in programming languages. And as with any skill, one can really learn something only through practice. So today we will be creating a GZIP compressor by implementing the Decorator Design Pattern, while also learning a few things about bytes, strings and UTF-8 encoding on the side.
The Decorator Design Pattern
The Decorator design pattern is used to dynamically add a new feature to an object without changing its implementation. It differs from inheritance because the new feature is added only to that particular object, not to every instance of the class.
Just as decorators in Python are functions that wrap other functions, the Decorator pattern is essentially a class that wraps other classes and objects.
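To make the comparison concrete, here is a minimal sketch that is separate from the GZIP example in this article. The names `shout`, `greet`, `Greeter`, and `ShoutingGreeter` are hypothetical and exist only for illustration.

```python
import functools


def shout(func):
    """Function decorator: wraps a function and upper-cases its result."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper


@shout
def greet(name: str) -> str:
    return f'hello, {name}'


class Greeter:
    """A plain class whose implementation we do not want to touch."""

    def greet(self, name: str) -> str:
        return f'hello, {name}'


class ShoutingGreeter:
    """Decorator pattern: wraps one Greeter instance and adds behaviour."""

    def __init__(self, wrapped: Greeter) -> None:
        self._wrapped = wrapped

    def greet(self, name: str) -> str:
        # Delegate to the wrapped object, then add the new feature
        return self._wrapped.greet(name).upper()


plain = Greeter()
decorated = ShoutingGreeter(plain)

print(greet('world'))            # function decorator
print(plain.greet('world'))      # the original object is unchanged
print(decorated.greet('world'))  # only the wrapped instance gains the feature
```

The wrapper class exposes the same method as the object it wraps, so calling code can use either one interchangeably.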
Creating a Base Class for our Data
Before we start compressing strings with GZIP, we can create a helper class for managing the data that we will later “decorate”. This is necessary because GZIP works with the bytes type (also referred to as a “blob”), not with the str type. The Data class will handle the conversion from strings to bytes and vice versa, encoding to and decoding from UTF-8:
# Define a common encoding
ENCODING_TYPE = 'utf-8'


class Data:
    """Data Class

    Stores data as either a string or bytes.
    Returns str through `text` and bytes through `blob`.
    """
    # Decrease memory usage from 48 to 40 bytes
    # (the trailing comma makes this a one-element tuple)
    __slots__ = ('_data',)

    def __init__(self, data: str | bytes) -> None:
        self._data = data

    def _get_str(self) -> str:
        """Get string from stored data and convert using the set encoding"""
        if not isinstance(self._data, str):
            return str(self._data, encoding=ENCODING_TYPE)
        return self._data

    def _get_bytes(self) -> bytes:
        """Get bytes from stored data and convert using the set encoding"""
        if not isinstance(self._data, bytes):
            return bytes(self._data, encoding=ENCODING_TYPE)
        return self._data

    @property
    def text(self) -> str:
        return self._get_str()

    @text.setter
    def text(self, value: str | bytes) -> None:
        self._data = value

    @property
    def blob(self) -> bytes:
        return self._get_bytes()

    @blob.setter
    def blob(self, value: str | bytes) -> None:
        self._data = value
Our Data class is able to handle either data type, whether it is bytes or a str.
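The difference between the two types is easiest to see with text that contains non-ASCII characters. The following is a small sketch; the sample string is an arbitrary choice, written with escape sequences so that the character count is unambiguous.

```python
ENCODING_TYPE = 'utf-8'

# 'naïve café' written with escapes: 10 characters, two of them non-ASCII
text = 'na\u00efve caf\u00e9'

# Equivalent to bytes(text, encoding=ENCODING_TYPE)
blob = text.encode(ENCODING_TYPE)

# Characters and bytes are counted differently:
# in UTF-8, 'ï' and 'é' take two bytes each.
print(len(text))   # number of characters
print(len(blob))   # number of bytes

# Decoding with the same encoding restores the original text.
# Equivalent to str(blob, encoding=ENCODING_TYPE)
assert blob.decode(ENCODING_TYPE) == text
```

This is why the size figures later in the article are measured on the blob rather than on the text.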
Creating an Abstract Class for our GZIP Decorator
As an extra step, we can create an abstract class called DataSourceDecorator. Abstract classes in Python are helpful when we want to define a blueprint for other classes. By defining an abstract base class, you can define a common Application Programming Interface (API) for a set of subclasses. In this case, we want to create a class that handles GZIP compression and decompression. Defining a base class for our GZIP class lets us extend how we manipulate our data later on. For example, we could later add a class that handles bz2, lzma, zlib, tar, or zip compression, or one that encrypts and decrypts our data.
from abc import ABC, abstractmethod


class DataSourceDecorator(ABC):
    """Abstract Data Source Decorator"""

    @abstractmethod
    def __init__(self, data: Data) -> None:
        self._data = data

    @abstractmethod
    def _compress(self, blob: bytes) -> None:
        ...

    @abstractmethod
    def _inflate(self, blob: bytes) -> None:
        ...

    def read(self) -> Data:
        return self._data
Regardless of our case, we can define __init__(), _compress(), and _inflate() as abstract methods. By marking a method with @abstractmethod, we tell Python that the class cannot be instantiated until a child class overrides that method. This makes sense because every compression algorithm that we might later implement needs its own initialization, as well as its own compression and inflation steps. The read() method, on the other hand, is likely to be used the same way regardless of the compression algorithm, and therefore does not have to be abstract.
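The effect of @abstractmethod can be checked directly. The sketch below uses a stripped-down stand-in for the abstract class, with a single abstract method and plain bytes instead of Data, plus a hypothetical `IdentityDecorator` subclass that stores the bytes unchanged.

```python
from abc import ABC, abstractmethod


class DataSourceDecorator(ABC):
    """Stripped-down stand-in for the abstract class above."""

    @abstractmethod
    def _compress(self, blob: bytes) -> None:
        ...

    def read(self) -> bytes:
        return self._data


class IdentityDecorator(DataSourceDecorator):
    """Hypothetical subclass that stores the bytes unchanged."""

    def __init__(self, blob: bytes) -> None:
        self._compress(blob)

    def _compress(self, blob: bytes) -> None:
        self._data = blob


try:
    DataSourceDecorator()   # abstract methods are still missing
except TypeError as error:
    print(f'Cannot instantiate: {error}')

# The concrete subclass inherits read() without redefining it
print(IdentityDecorator(b'abc').read())
```

Instantiating the abstract class raises a TypeError, while the subclass that overrides the abstract method works and reuses read() as is.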
Implementing GZIP
Python’s standard library has an excellent built-in gzip module that we will use to implement our GZIPDecorator class, based on the DataSourceDecorator base class.
import gzip


class GZIPDecorator(DataSourceDecorator):
    """GZIP Compression Decorator"""

    def __init__(self, data: Data, compressed: bool = False, compresslevel: int = 9) -> None:
        # Only Data instances are accepted, so that .blob is always available
        if not isinstance(data, Data):
            raise TypeError(f'Unsupported type "{type(data)}"')
        blob = data.blob
        self._compresslevel = compresslevel
        # Inflate data that is already compressed; compress everything else
        if compressed:
            self._inflate(blob)
        else:
            self._compress(blob)

    def _compress(self, blob: bytes) -> None:
        self._data = Data(gzip.compress(blob, compresslevel=self._compresslevel))

    def _inflate(self, blob: bytes) -> None:
        self._data = Data(gzip.decompress(blob))
In our GZIPDecorator class, we redefined the __init__() method so that we can specify whether the data we want to wrap is already compressed, set the compression level (from 0 to 9), and require that the input data is of the Data type so that we can comfortably use the bytes of our data.
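To get a feel for the compression level, the following sketch calls the gzip module directly, without the decorator classes, on the same kind of repetitive input used later in the article. The choice of levels 0, 1, and 9 is only for illustration.

```python
import gzip

blob = b'Our Data' * 1000

stored = gzip.compress(blob, compresslevel=0)   # level 0: no compression
fast = gzip.compress(blob, compresslevel=1)     # fastest compression
best = gzip.compress(blob, compresslevel=9)     # slowest, most compression

for label, result in (('level 0', stored), ('level 1', fast), ('level 9', best)):
    print(f'{label}:\t{len(result)} bytes')

# Whatever the level, decompression restores the original bytes
assert gzip.decompress(stored) == gzip.decompress(fast) == gzip.decompress(best) == blob
```

At level 0 the output is slightly larger than the input, because the data is stored as is and the GZIP header and trailer are added on top.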
Seeing our Decorator in action
Now that we have our classes in place, we can test whether our compression decorator works as intended:
# Create a long string with duplicated data
our_data = Data('Our Data' * 1000)
print(f'Uncompressed size:\t{len(our_data.blob)}')
# Compress our data
compressed_data = GZIPDecorator(our_data)
print(f'Compressed size:\t{len(compressed_data.read().blob)}')
# Calculate compression ratio
ratio = len(our_data.blob) / len(compressed_data.read().blob)
print(f'Compressed by: \t{ratio:.2f}X!')
# Test decompressed data
decompressed_data = GZIPDecorator(compressed_data.read(), True)
status = 'SUCCESS' if decompressed_data.read().blob == our_data.blob else 'FAILURE'
print(f'Decompression \t{status}')
And we get the following result:
Uncompressed size: 8000
Compressed size: 57
Compressed by: 140.35X!
Decompression SUCCESS
On a set of data that consists of an 8-byte string repeated 1000 times, we got a 140X compression ratio. Not too bad!
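A ratio this high comes from how repetitive the input is. As a rough comparison, the sketch below compresses the same repetitive input next to random bytes of the same length, using the gzip module directly. The random data comes from `os.urandom`, so its compressed size varies slightly from run to run.

```python
import gzip
import os

repetitive = b'Our Data' * 1000   # 8000 highly redundant bytes
random_blob = os.urandom(8000)    # 8000 bytes with no pattern to exploit

for label, blob in (('repetitive', repetitive), ('random', random_blob)):
    compressed = gzip.compress(blob)
    print(f'{label}:\t{len(blob)} -> {len(compressed)} bytes')
```

Random bytes barely shrink at all, so the ratio you see in practice depends heavily on the data being compressed.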
Conclusion
The Decorator can be a helpful pattern in the Python language. This example demonstrated how the GZIPDecorator class can effectively be used to wrap our Data class without changing its underlying functionality. By using the same pattern, we could then create other types of decorator classes that manipulate our Data class.
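As one possible illustration of such an extension, here is a sketch of a hypothetical `BZ2Decorator` built on the standard library's bz2 module. It is not part of the original code. To keep the block self-contained, Data and DataSourceDecorator are replaced by minimal stand-ins (bytes only, no string conversion).

```python
import bz2
from abc import ABC, abstractmethod


class Data:
    """Minimal stand-in for the article's Data class (bytes only)."""

    def __init__(self, data: bytes) -> None:
        self.blob = data


class DataSourceDecorator(ABC):
    """Minimal stand-in for the article's abstract decorator."""

    @abstractmethod
    def _compress(self, blob: bytes) -> None:
        ...

    @abstractmethod
    def _inflate(self, blob: bytes) -> None:
        ...

    def read(self) -> Data:
        return self._data


class BZ2Decorator(DataSourceDecorator):
    """Hypothetical bz2 counterpart of GZIPDecorator."""

    def __init__(self, data: Data, compressed: bool = False, compresslevel: int = 9) -> None:
        if not isinstance(data, Data):
            raise TypeError(f'Unsupported type "{type(data)}"')
        self._compresslevel = compresslevel
        if compressed:
            self._inflate(data.blob)
        else:
            self._compress(data.blob)

    def _compress(self, blob: bytes) -> None:
        self._data = Data(bz2.compress(blob, self._compresslevel))

    def _inflate(self, blob: bytes) -> None:
        self._data = Data(bz2.decompress(blob))


original = Data(b'Our Data' * 1000)
packed = BZ2Decorator(original)
unpacked = BZ2Decorator(packed.read(), compressed=True)

print(len(original.blob), len(packed.read().blob))
assert unpacked.read().blob == original.blob
```

Only the two abstract methods and the import change. The constructor logic and read() follow the same shape as in the GZIP version.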
Code is available on GitHub.