DEV Community

Yuan Gao

Python's new Protocols (Structural subtyping), Abstract Base Classes, and Factories

I've been exploring Python's new Protocols (structural subtyping), introduced in PEP 544, as a better way to build factories and abstract factories.

Motivation

I'm writing an application that needs to periodically store data in bucket storage. To make it easy to handle this, I am writing methods for upload(), download() and a few other tasks that are needed. A trivial example would look like this:

# upload some data
upload(data, path_to_upload)

# download some data
data = download(path_to_download)

It's early on in the project, and I expect this storage format to gradually evolve as the project requirements change. So I, or someone else, will need to write new versions of these functions with all the new bells and whistles (the actual application has a few more methods for pulling out different pieces of metadata, but I'm reducing it to just upload/download functions for this post). However, old data already stored in the bucket still needs to be retrievable, so we would keep a copy of the old functions that can read data stored in the old format.

The naive way

The naive way of handling multiple versions of these functions is to just stick everything inside if statements (assume the version variable is coming from a database entry or some other way of retrieving metadata, like the filename):

# upload some data
if version == "old":
  upload(data, path_to_upload)
elif version == "new":
  new_upload(data, path_to_upload)
else:
  raise NotImplementedError(f"{version=} not supported")

# download some data
if version == "old":
  data = download(path_to_download)
elif version == "new":
  data = new_download(path_to_download)
else:
  raise NotImplementedError(f"{version=} not supported")

This is positively ugly, and will only get worse as new file handlers are added: whoever implements a new version of the file handler would need to sift through the code looking for places to add a new elif, and would also need to figure out exactly which methods they actually need to implement.

Doing better with OOP

Of course we can do better. Using an object immediately removes most of these if-elif:

class FileHandler:
  def download(self, path):
    ...

  def upload(self, data, path):
    ...

class NewFileHandler:
  def download(self, path):
    ...

  def upload(self, data, path):
    ...

By defining a class for each of the file handlers, our application code can now look like this:

# create handler
if version == "old":
  handler = FileHandler()
elif version == "new":
  handler = NewFileHandler()

# upload some data
handler.upload(data, path_to_upload)

# download some data
data = handler.download(path_to_download)

Much cleaner. Now, if we needed to add new handler versions, we just need to add new handler objects, and add a single elif.

However, this still relies on the next person implementing the new handler to figure out what methods are called by the code, and what methods only exist for handling version-specific details. Perhaps NewFileHandler() has a few internal encode() and decode() methods. Does the new file handler need to have these too? What do they do? There's currently no easy way to tell without reading extensive code or documentation.

Doing even better with Abstract Base Class

Instead of writing some documentation that says "new file handlers must have an upload() that takes the data as bytes and path as string for uploading; and a download() that takes the path as string that returns data as bytes", what if we write it in code instead, and use it with some automatic way of checking that all the needed parts are there? That way there's no ambiguity about whether the documentation still matches the code, since this definition of the class is what is used at runtime, and will error out if the implementation was incorrect.

Python has a built-in library for this called abc, which stands for Abstract Base Class. The idea is to define an abstract base class for the file handler, against which new concrete implementations of different file handlers can be built. A concrete file handler that fails to implement one of the abstract methods cannot be instantiated.

from abc import ABC, abstractmethod

class FileHandlerInterface(ABC):
  @abstractmethod
  def download(self, path):
    raise NotImplementedError

  @abstractmethod
  def upload(self, data, path):
    raise NotImplementedError

With the abstract base class defined, the concrete file handlers inherit from it.

class FileHandler(FileHandlerInterface):
  def download(self, path):
    ...

  def upload(self, data, path):
    ...

class NewFileHandler(FileHandlerInterface):
  def download(self, path):
    ...

  def upload(self, data, path):
    ...

Note: at this point, the application code need not change at all. All we've done is define a formal interface, the FileHandlerInterface, against which all concrete file handlers are implemented. If key methods are left out of a concrete handler, Python raises a TypeError when that handler is instantiated, informing the developer that there is an implementation issue. (The raise NotImplementedError bodies only run if a subclass explicitly calls the base method via super().)
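
To make the run-time check concrete, here is a minimal sketch. IncompleteHandler is a hypothetical class invented for illustration (it is not part of the application); it deliberately leaves out upload(), and the error only shows up once we try to instantiate it.

```python
from abc import ABC, abstractmethod


class FileHandlerInterface(ABC):
  @abstractmethod
  def download(self, path):
    raise NotImplementedError

  @abstractmethod
  def upload(self, data, path):
    raise NotImplementedError


# Hypothetical handler that forgets to implement upload()
class IncompleteHandler(FileHandlerInterface):
  def download(self, path):
    return b""


# Defining the class is fine; instantiating it is what fails
try:
  IncompleteHandler()
except TypeError as err:
  outcome = f"TypeError: {err}"
else:
  outcome = "instantiated"

print(outcome)
```

The class definition itself is accepted, which is why this kind of mistake is only caught when the code path that creates the handler actually runs.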

This is great! We've reduced duplicate code, and also added some safety against future implementation errors! However, we still have to actually run the code to find out there is an issue. Can we do better?

Doing the bestest with Protocol and static analysis

Since around Python 3.5, we've had type hinting and static analysis. Static analysis is a powerful tool to help catch type and structural errors before having to actually run the code. To use static analysis, we have to first annotate some of our code with types. This may seem counter-intuitive: why are we adding types back into a language that has thus far greatly benefited from being dynamically typed?

That's a complex question to answer. It's undeniable that Python's flexibility with types is a key part of what makes it a powerful but easy-to-learn language; yet many mistakes and bugs can be prevented by introducing some types into the code, including the kind of bugs that we are trying to prevent using the Abstract Base Class above. So perhaps some type checks are beneficial, particularly at the interface between different modules or components. Fortunately, Python lets you add type annotations where you want more type safety, while leaving the rest of the code unannotated.

The actual task of doing static analysis and type checking is left to external applications, and is not handled by the runtime. Much like how pylint can be used to spot syntax and style issues before running the code (static analysis); tools like mypy are used to do type checking before running anything.

PEP 544 introduces Protocols, which are used for structural subtyping and have been available as typing.Protocol since Python 3.8. Protocols can stand in for ABCs here: where ABCs are checked at run-time, Protocols are checked statically, before the code runs.

Our protocol looks like this:

from typing import Protocol

class Handler(Protocol):
  def download(self, path: str) -> bytes:
    raise NotImplementedError

  def upload(self, data: bytes, path: str):
    raise NotImplementedError


The file handlers no longer need to inherit from the Handler Protocol:

class FileHandler:
  def download(self, path):
    ...

  def upload(self, data, path):
    ...

class NewFileHandler:
  def download(self, path):
    ...

  def upload(self, data, path):
    ...

Our little factory needs an additional type annotation in order for the static analyzer to understand that we want this Protocol to be applied.

# this tells the static analyzer that anything assigned to handler must match the Handler protocol
handler: Handler

# create handler
if version == "old":
  handler = FileHandler()
elif version == "new":
  handler = NewFileHandler()

Now, in the event that someone defines a new file handler that doesn't have the requisite methods outlined by the protocol, the static checker will complain. For example, if NewFileHandler is missing the upload() method, running mypy produces the following error:

error: Incompatible types in assignment (expression has type "NewFileHandler", variable has type "Handler")
'NewFileHandler' is missing following 'Handler' protocol member:
    upload
Found 1 error in 1 file (checked 1 source file)
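
The same structural matching applies to function parameters. The sketch below is an illustration under assumptions: InMemoryHandler and round_trip() are hypothetical names that are not part of the application. InMemoryHandler never inherits from Handler, yet it should satisfy the annotation because it has methods with matching signatures.

```python
from typing import Dict, Protocol


class Handler(Protocol):
  def download(self, path: str) -> bytes:
    ...

  def upload(self, data: bytes, path: str) -> None:
    ...


# Hypothetical handler: no inheritance from Handler, only matching methods
class InMemoryHandler:
  def __init__(self) -> None:
    self._store: Dict[str, bytes] = {}

  def download(self, path: str) -> bytes:
    return self._store[path]

  def upload(self, data: bytes, path: str) -> None:
    self._store[path] = data


# Hypothetical helper: accepts anything that structurally matches Handler
def round_trip(handler: Handler, data: bytes, path: str) -> bytes:
  handler.upload(data, path)
  return handler.download(path)


result = round_trip(InMemoryHandler(), b"hello", "demo/path")
```

At run-time the annotation is not enforced at all; it only matters to the static checker, which compares the methods of InMemoryHandler against those of Handler.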

In fact, we needn't have gotten rid of the @abstractmethod decorators and the Handler class inheritance: keeping them gives us both static checking and run-time checks.

from typing import Protocol
from abc import abstractmethod

class Handler(Protocol):
  @abstractmethod
  def download(self, path: str) -> bytes:
    raise NotImplementedError

  @abstractmethod
  def upload(self, data: bytes, path: str):
    raise NotImplementedError

class FileHandler(Handler):
  def download(self, path):
    ...

  def upload(self, data, path):
    ...

class NewFileHandler(Handler):
  def download(self, path):
    ...

  def upload(self, data, path):
    ...
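
As a quick check of the run-time half of that claim, here is a minimal sketch. PartialHandler is a hypothetical class made up for this example; it explicitly inherits from the Handler protocol but omits upload(), so instantiating it should fail in the same way an incomplete ABC subclass does.

```python
from abc import abstractmethod
from typing import Protocol


class Handler(Protocol):
  @abstractmethod
  def download(self, path: str) -> bytes:
    raise NotImplementedError

  @abstractmethod
  def upload(self, data: bytes, path: str) -> None:
    raise NotImplementedError


# Hypothetical explicit subclass that forgets upload()
class PartialHandler(Handler):
  def download(self, path: str) -> bytes:
    return b""


try:
  PartialHandler()
except TypeError:
  protocol_outcome = "TypeError at instantiation"
else:
  protocol_outcome = "instantiated"

print(protocol_outcome)
```

Note that this run-time check only applies to classes that explicitly inherit from Handler; a class that merely matches the protocol structurally is only checked by the static analyzer.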

What about factories?

Recall we had a little if statement for producing different implementations of Handler depending on the string version.

if version == "old":
  handler = FileHandler()
elif version == "new":
  handler = NewFileHandler()
else:
  raise NotImplementedError

This is a small example of a factory pattern. Its job is to produce an instance of the relevant class (the relevant concrete implementation of the Handler base class/protocol). We would probably want to put this code inside a function that lives with the rest of the handlers to keep these concerns separated, so that the application code doesn't need to know about the existence of the different types of Handler, and instead just needs to know that if it calls a make_handler(version) function, it'll receive an object that implements the Handler protocol:

def make_handler(version):
  if version == "old":
    handler = FileHandler()
  elif version == "new":
    handler = NewFileHandler()
  else:
    raise NotImplementedError

  return handler

This is a little awkward to write. We could choose to change this into a dictionary, saving us a large if/elif chain. Note that we instantiate the class as we pull it out of the dictionary; we don't want to instantiate the classes when we define the ALL_HANDLERS dictionary:

ALL_HANDLERS = {
  "old": FileHandler,
  "new": NewFileHandler,
}

def make_handler(version):
  try:
    handler = ALL_HANDLERS[version]()
  except KeyError as err:
    raise NotImplementedError from err

  return handler

If we were to add static checking, it would become a bit more complex here: you'd likely need to define ALL_HANDLERS as ALL_HANDLERS: Dict[str, Type[Handler]] or similar.
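
A rough sketch of what that annotated dictionary factory could look like follows. The handler bodies are placeholders added so the example runs, and I have not run this exact snippet through a type checker, so treat the annotations as a starting point rather than a verified result.

```python
from typing import Dict, Protocol, Type


class Handler(Protocol):
  def download(self, path: str) -> bytes:
    ...

  def upload(self, data: bytes, path: str) -> None:
    ...


class FileHandler:
  def download(self, path: str) -> bytes:
    return b"old"  # placeholder body for illustration

  def upload(self, data: bytes, path: str) -> None:
    pass


class NewFileHandler:
  def download(self, path: str) -> bytes:
    return b"new"  # placeholder body for illustration

  def upload(self, data: bytes, path: str) -> None:
    pass


# Maps version strings to handler classes (not instances)
ALL_HANDLERS: Dict[str, Type[Handler]] = {
  "old": FileHandler,
  "new": NewFileHandler,
}


def make_handler(version: str) -> Handler:
  try:
    handler_cls = ALL_HANDLERS[version]
  except KeyError as err:
    raise NotImplementedError(f"{version=} not supported") from err

  # Instantiate only once the right class has been selected
  return handler_cls()
```

With the return type declared as Handler, callers get the benefit of the protocol without needing to know which concrete class they received.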

Better factories with introspection

Can we do better than this? As it turns out, we can. Instead of having to explicitly create a new dict entry or elif for every available handler, we can instead add a class variable to each handler to identify them with a string, and then use introspection with the __subclasses__() dunder to fetch all the subclasses of Handler:

class FileHandler(Handler):
  version = "old"

  def download(self, path):
    ...

  def upload(self, data, path):
    ...

class NewFileHandler(Handler):
  version = "new"

  def download(self, path):
    ...

  def upload(self, data, path):
    ...

The factory then picks the subclass whose version matches:

def make_handler(version):
  try:
    # Handler.__subclasses__() lists the classes that directly inherit from Handler
    handler = next(filter(lambda P: P.version == version, Handler.__subclasses__()))()
  except StopIteration as err:
    raise NotImplementedError from err

  return handler
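
One caveat worth knowing: __subclasses__() only returns direct subclasses, so a handler that inherits from another handler would not be found. Below is a minimal sketch of a recursive lookup; iter_subclasses() and CompressedFileHandler are hypothetical names introduced just for this example.

```python
from typing import Iterator, Protocol


class Handler(Protocol):
  def download(self, path: str) -> bytes:
    ...

  def upload(self, data: bytes, path: str) -> None:
    ...


class FileHandler(Handler):
  version = "old"

  def download(self, path):
    return b""

  def upload(self, data, path):
    pass


# Hypothetical handler that extends another handler rather than Handler itself
class CompressedFileHandler(FileHandler):
  version = "old-compressed"


def iter_subclasses(base: type) -> Iterator[type]:
  """Yield every subclass of base, however deeply nested."""
  for sub in base.__subclasses__():
    yield sub
    yield from iter_subclasses(sub)


def make_handler(version: str) -> Handler:
  for candidate in iter_subclasses(Handler):
    if getattr(candidate, "version", None) == version:
      return candidate()
  raise NotImplementedError(f"{version=} doesn't exist")


handler = make_handler("old-compressed")
```

Whether this matters depends on whether handlers ever build on one another; if every handler inherits directly from Handler, the simpler one-level lookup is enough.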

Even better factories with metaprogramming

As it turns out, we can also use metaprogramming/decorators to achieve this in a slightly nicer way:

class HandlerFactory:
  handlers = {}

  @classmethod
  def make_handler(cls, version):
    try:
      retval = cls.handlers[version]
    except KeyError as err:
      raise NotImplementedError(f"{version=} doesn't exist") from err
    return retval

  @classmethod
  def register(cls, type_name):
    def deco(deco_cls):
      cls.handlers[type_name] = deco_cls
      return deco_cls
    return deco


@HandlerFactory.register('old')
class FileHandler(Handler):
  def download(self, path):
    ...

  def upload(self, data, path):
    ...


@HandlerFactory.register('new')
class NewFileHandler(Handler):
  def download(self, path):
    ...

  def upload(self, data, path):
    ...

Now the instantiation just looks like the following, delegating the task of selecting the correct handler to the factory. Note that make_handler() as written returns the registered class, so we call the result to get an instance:

handler = HandlerFactory.make_handler(version)()

Much neater - the decorators register the classes under particular version strings, storing them in the HandlerFactory class's dictionary of handlers, to be recalled later on.

Metaprogramming and static analysis? Have we gone too far?

A type-checked version of this should be possible; however, it's somewhat verbose, and currently our Python type checking tools aren't quite mature enough to handle this cleanly. I'd estimate it would look something like this:

from typing import Callable, ClassVar, Dict, Type

class HandlerFactory:
  handlers: ClassVar[Dict[str, Type[Handler]]] = {}

  @classmethod
  def make_handler(cls, version: str) -> Type[Handler]:
    """ Factory for making Handlers """
    try:
      retval: Type[Handler] = cls.handlers[version]
    except KeyError as err:
      raise NotImplementedError(f"{version=} doesn't exist") from err
    return retval

  @classmethod
  def register(cls, type_name:str) -> Callable[[Type[Handler]], Type[Handler]]:
    def deco(deco_cls: Type[Handler]) -> Type[Handler]:
      cls.handlers[type_name] = deco_cls
      return deco_cls
    return deco

Though please be warned: perhaps I'm making a mistake here since I'm unable to test it. Please leave a comment if you see a problem with it!

Top comments (2)

Bart Marinissen

I think MyPy has progressed to the point where your metaprogramming and static analysis code actually works. I managed to get it to work locally, and it complains if I make mistakes.

I made a minor change, having HandlerFactory.make_handler actually return an instance of a handler rather than just the class.

I tried registering classes that aren't handlers, and MyPy caught that. Even cooler, if I create a new handler that does not inherit from Handler but does correctly implement the protocol, MyPy recognizes that and allows it, but only if I correctly implement the protocol.

from typing import Callable, ClassVar, Dict, Optional, Type

from typing import Protocol
from abc import abstractmethod


class Handler(Protocol):
    @abstractmethod
    def download(self, path: str) -> bytes:
        raise NotImplementedError

    @abstractmethod
    def upload(self, data: bytes, path: str):
        raise NotImplementedError


class HandlerFactory:
    handlers: ClassVar[Dict[str, Type[Handler]]] = {}

    @classmethod
    def make_handler(cls, version: str) -> Handler: # Minor change, I make the factory actually create an instance rather than just returning the Class type itself
        """Factory for making Handlers"""
        try:
            retval: Type[Handler] = cls.handlers[version]
        except KeyError as err:
            raise NotImplementedError(f"{version=} doesn't exist") from err
        return retval() # Minor change, I make the factory actually create an instance rather than just returning the Class type itself

    @classmethod
    def register(cls, type_name: str) -> Callable[[Type[Handler]], Type[Handler]]:
        def deco(deco_cls: Type[Handler]) -> Type[Handler]:
            cls.handlers[type_name] = deco_cls
            return deco_cls

        return deco


@HandlerFactory.register("structural")
class StructuralHandler:
    def download(self, path):
        ...

    def upload(self, data, path):
        ...


# only Mypy error in this file: Argument 1 has incompatible type "Type[typoHandler]"; expected "Type[Handler]"
# Note that this means that MyPy does allow the above defined StructuralHandler despite the lack of inheritance.
@HandlerFactory.register("structural_broken") 
class typoHandler:
    def bownload(self, path):  # download misspelled
        ...

    def upload(self, data, path):
        ...


@HandlerFactory.register("old")
class FileHandler(Handler):
    def download(self, path):
        ...

    def upload(self, data, path):
        ...


@HandlerFactory.register("new")
class NewFileHandler(Handler):
    def download(self, path):
        ...

    def upload(self, data, path):
        ...


h: Optional[Handler] = HandlerFactory.make_handler("new")

D-stefaang

The metaprogramming can be done without decorators by extending the __init_subclass__ of the base class.
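
A minimal sketch of that idea, under some assumptions: BaseHandler is a hypothetical plain base class (used here instead of the Protocol to keep the example simple), and the version keyword in the class statement is an invented convention for this sketch.

```python
from typing import ClassVar, Dict, Type


class BaseHandler:
    """Hypothetical plain base class that keeps a registry of its subclasses."""

    registry: ClassVar[Dict[str, Type["BaseHandler"]]] = {}

    def __init_subclass__(cls, version: str, **kwargs):
        # Runs automatically whenever a subclass is defined
        super().__init_subclass__(**kwargs)
        BaseHandler.registry[version] = cls

    def download(self, path: str) -> bytes:
        raise NotImplementedError

    def upload(self, data: bytes, path: str) -> None:
        raise NotImplementedError


class FileHandler(BaseHandler, version="old"):
    def download(self, path: str) -> bytes:
        return b"old"  # placeholder body for illustration

    def upload(self, data: bytes, path: str) -> None:
        pass


class NewFileHandler(BaseHandler, version="new"):
    def download(self, path: str) -> bytes:
        return b"new"  # placeholder body for illustration

    def upload(self, data: bytes, path: str) -> None:
        pass


def make_handler(version: str) -> BaseHandler:
    try:
        handler_cls = BaseHandler.registry[version]
    except KeyError as err:
        raise NotImplementedError(f"{version=} doesn't exist") from err
    return handler_cls()
```

The trade-off compared with the decorator approach is that registration is tied to inheritance, so a handler that only matches the protocol structurally would not be registered.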