Artem

Posted on Sep 29

A Design Pattern Every Python Developer Should Know

#architecture #designpatterns #python

Today we will look at a very useful design pattern in modern Python development.

Meet the Template Method pattern

The Template Method is a design pattern that gives you a master plan for an algorithm. Think of it like a recipe with some specific steps left blank. The main recipe (the "template") is fixed, so you can't change the overall order of the steps. However, you can fill in the blank parts with your own custom code.
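The recipe analogy can be sketched in a few lines. This is only an illustration: the Beverage, Tea and Coffee classes and their step names are made up for this example and are not part of the crawler service discussed below.

```python
from abc import ABC, abstractmethod


class Beverage(ABC):
    """Hypothetical example: the recipe is fixed, two steps are left blank."""

    def prepare(self) -> list[str]:
        # Template method: the order of the steps is decided here, once.
        return [
            "boil water",
            self.brew(),
            "pour into cup",
            self.add_extras(),
        ]

    @abstractmethod
    def brew(self) -> str:
        """Blank step that every subclass must fill in."""

    @abstractmethod
    def add_extras(self) -> str:
        """Another blank step that every subclass must fill in."""


class Tea(Beverage):
    def brew(self) -> str:
        return "steep tea bag"

    def add_extras(self) -> str:
        return "add lemon"


class Coffee(Beverage):
    def brew(self) -> str:
        return "drip coffee through filter"

    def add_extras(self) -> str:
        return "add milk"


print(Tea().prepare())
print(Coffee().prepare())
```

Both subclasses share prepare() and differ only in the two steps they fill in.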

The problem the Template Method solves

Let's take a real-life problem. Imagine you are building a Crawler IP Management Service responsible for securely identifying incoming requests from known search engine bots (Applebot, Bingbot, Googlebot, etc.) to improve the SEO performance of your app. For each specific bot, the system must perform a sequence of standard steps to fetch and cache its valid list of IP addresses:

Check Cache: Look up the IP list using a bot-specific key in the cache storage.

Fetch Data: If the cache is empty, request the data from the bot's API URL.

Process Data: Extract the actual list of IPv4 and IPv6 addresses from the API response format.

Save Cache: Store the resulting IP list for a set duration.
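The four steps above can be sketched as one plain function before any classes are involved. This is a rough sketch under stated assumptions: the dict-based cache (with no expiry), the get_bot_ips() name, the fake fetch and extract callables, and the "ipv4Prefix"/"ipv6Prefix" keys are all invented for illustration.

```python
from typing import Callable

# Stand-in for a real cache backend: a plain dict with no expiry (assumption).
_cache: dict[str, list[str]] = {}


def get_bot_ips(
    cache_key: str,
    fetch: Callable[[], dict],
    extract: Callable[[dict], list[str]],
) -> list[str]:
    # 1. Check cache
    cached = _cache.get(cache_key)
    if cached:
        return cached
    # 2. Fetch data (only on a cache miss)
    raw = fetch()
    # 3. Process data
    ips = extract(raw)
    # 4. Save cache (a real cache would also take a timeout here)
    _cache[cache_key] = ips
    return ips


calls = []


def fake_fetch() -> dict:
    # Hypothetical response shape, used only for this demo.
    calls.append(1)
    return {"prefixes": [{"ipv4Prefix": "192.0.2.0/24"}, {"ipv6Prefix": "2001:db8::/32"}]}


def fake_extract(raw: dict) -> list[str]:
    return [next(iter(prefix.values())) for prefix in raw["prefixes"]]


first = get_bot_ips("examplebot", fake_fetch, fake_extract)
second = get_bot_ips("examplebot", fake_fetch, fake_extract)  # served from the cache
print(first, len(calls))
```

The second call never reaches fake_fetch(), which is the whole point of steps 1 and 4.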

The initial implementation involves creating a separate class for every bot (e.g., AppleCrawlersService, DuckDuckGoCrawlersService).

class DuckDuckGoCrawlersService:
    """
    A standalone service for managing DuckDuckGo bot IPs.
    This class also contains all the logic, further demonstrating duplication.
    """
    _cache_key = "duckduckgo_bot"
    _ips_list_url = settings.DUCKDUCK_BOT_IPS_URL

    def _extract_ips_from_api_response(self, data):
        """Extracts the list of IPs from the DuckDuckGo API response."""
        content = data.content
        if not content:
            logger.warning(f"No data available to extract IPs for key {self._cache_key}")
            return []

        processed_ips = []
        for line in content.splitlines():
            if line.startswith("- "):
                # Remove '-' and all spaces
                cleaned_line = re.sub(r"[-\s]", "", line)
                processed_ips.append(cleaned_line)
        return processed_ips

    # --- THE DUPLICATED LOGIC STARTS HERE ---

    def _fetch_ips(self, url: str):
        http_client = MockHTTPClientBase(api_base_url=url, api_secret="")
        response = http_client.make_request()
        logger.info(f"Fetched IPs with key: {self._cache_key}")
        return response.content

    def _get_ips_from_cache(self, key: str):
        logger.info(f"Getting crawlers ips from cache: {key}")
        return cache.get(key)

    def _save_ips_to_cache(self, key: str, ip_list: list):
        cache.set(key, ip_list, settings.DEFAULT_CRAWLERS_CACHE_TIMEOUT)
        logger.info(f"Saved crawlers IPs to cache with key: {key}")

    def get_ips_list(self) -> list[str]:
        """
        This entire method is duplicated from the other service classes.
        """
        ips = self._get_ips_from_cache(self._cache_key)
        if ips:
            return ips
        urls = self._ips_list_url if isinstance(self._ips_list_url, list) else [self._ips_list_url]

        with concurrent.futures.ThreadPoolExecutor() as executor:
            responses = list(executor.map(self._fetch_ips, urls))

        ips = list(chain.from_iterable(self._extract_ips_from_api_response(resp) for resp in responses))
        self._save_ips_to_cache(self._cache_key, ips)
        return ips

    # --- DUPLICATED LOGIC ENDS HERE ---


class BingCrawlersService:
    """
    A standalone service for managing Bingbot IPs.
    This class is nearly identical to the Apple service.
    """
    _cache_key = "bingbot"
    _ips_list_url = settings.BINGBOT_IPS_URL

    def _extract_ips_from_api_response(self, data):
        """Extracts the list of IPs from the Bingbot API response."""
        ip_addresses = []
        prefixes = data.get("prefixes", [])
        for prefix in prefixes:
            if settings.CRAWLERS_IP_V4_PREFIX in prefix:
                ip_addresses.append(prefix[settings.CRAWLERS_IP_V4_PREFIX])
            elif settings.CRAWLERS_IP_V6_PREFIX in prefix:
                ip_addresses.append(prefix[settings.CRAWLERS_IP_V6_PREFIX])
        return ip_addresses

    # --- THE DUPLICATED LOGIC STARTS HERE ---

    def _fetch_ips(self, url: str):
        http_client = MockHTTPClientBase(api_base_url=url, api_secret="")
        response = http_client.make_request()
        logger.info(f"Fetched IPs with key: {self._cache_key}")
        return response.content

    def _get_ips_from_cache(self, key: str):
        logger.info(f"Getting crawlers ips from cache: {key}")
        return cache.get(key)

    def _save_ips_to_cache(self, key: str, ip_list: list):
        cache.set(key, ip_list, settings.DEFAULT_CRAWLERS_CACHE_TIMEOUT)
        logger.info(f"Saved crawlers IPs to cache with key: {key}")

    def get_ips_list(self) -> list[str]:
        """
        This entire method is duplicated from the Apple service class.
        """
        ips = self._get_ips_from_cache(self._cache_key)
        if ips:
            return ips
        urls = self._ips_list_url if isinstance(self._ips_list_url, list) else [self._ips_list_url]

        with concurrent.futures.ThreadPoolExecutor() as executor:
            responses = list(executor.map(self._fetch_ips, urls))

        ips = list(chain.from_iterable(self._extract_ips_from_api_response(resp) for resp in responses))
        self._save_ips_to_cache(self._cache_key, ips)
        return ips

    # --- DUPLICATED LOGIC ENDS HERE ---

This works totally fine! But now imagine you need to add a new crawler parser to the system or change something in the existing fetching logic. You would have to copy and paste code into a new class, or edit the same working code in every existing class. This approach is error-prone and takes extra effort.

Now look at the code again. Do you see any common logic in these classes?

These methods are the same across all crawler services: _fetch_ips(), _get_ips_from_cache(), _save_ips_to_cache(), and get_ips_list()

And these details may differ per class: _extract_ips_from_api_response(), ips_list_url, cache_key

We can create a new abstract base class, implement all the common methods there, and mark the bot-specific parsing step as an abstract method, leaving the other bot-specific details for subclasses to provide.

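import concurrent.futures
from abc import ABC, abstractmethod
from itertools import chain

import requests

# settings, cache, logger and HTTPClientBase come from the surrounding project.


class CrawlersServiceABC(ABC):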
    @property
    def cache_key(self) -> str:
        """
        The name of the key with which to associate the cached list of crawlers IPs

        Returns:
             str: A name of the key for the cache
        """
        raise NotImplementedError

    @property
    def ips_list_url(self) -> list | str:
        """
        The URL of the resource that the _fetch_ips() method uses to get IP addresses

        Returns:
             list | str: URL of the resource, or a list of URLs
        """
        raise NotImplementedError

    @abstractmethod
    def _extract_ips_from_api_response(self, data: dict | requests.Response) -> list[str]:
        """
        Extracts the list of IPs from the response provided by resource

        Returns:
             list[str]: A list of IP addresses associated with the crawler
        """
        raise NotImplementedError

    def _fetch_ips(self, url: str) -> dict | requests.Response:
        http_client = HTTPClientBase(api_base_url=url, api_secret="")

        response = http_client.make_request()
        logger.info(f"Fetched IPs with key: {self.cache_key}")

        return response.content

    def _get_ips_from_cache(self, key: str):
        logger.info(f"Getting crawlers ips from cache: {key}")
        return cache.get(key)

    def _save_ips_to_cache(self, key: str, ip_list: list):
        cache.set(key, ip_list, settings.DEFAULT_CRAWLERS_CACHE_TIMEOUT)
        logger.info(f"Saved crawlers IPs to cache with key: {key}")

    def get_ips_list(self) -> list[str]:
        """
        Single public method that is an entry point for the client's code.
        Either it makes a request to a resource that contains a list of crawlers IPs
        or takes a list of IPs from the cache

        Returns:
             list[str]: A list of IP addresses associated with the crawler
        """
        ips = self._get_ips_from_cache(self.cache_key)
        if ips:
            return ips
        urls = self.ips_list_url if isinstance(self.ips_list_url, list) else [self.ips_list_url]

        # Use ThreadPoolExecutor to fetch IPs concurrently
        with concurrent.futures.ThreadPoolExecutor() as executor:
            responses = list(executor.map(self._fetch_ips, urls))

        # Extract IPs from all responses
        ips = list(chain.from_iterable(self._extract_ips_from_api_response(resp) for resp in responses))
        self._save_ips_to_cache(self.cache_key, ips)
        return ips

Do not pay too much attention to the details. This code might look a bit complicated, but the main part here is the get_ips_list() method.

This is our template method. It defines the fixed skeleton of the algorithm, so you should generally not override it in subclasses.
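One way to signal "do not override this" is the typing.final decorator. The sketch below uses hypothetical ReportABC and SalesReport classes rather than the crawler service. Note that @final is not enforced at runtime; it is a hint that static type checkers such as mypy report on.

```python
from abc import ABC, abstractmethod
from typing import final


class ReportABC(ABC):
    """Hypothetical base class used only to illustrate @final."""

    @final
    def build(self) -> str:
        # Template method: marked @final so type checkers flag any override.
        return f"{self.header()}\n{self.body()}"

    def header(self) -> str:
        # Step with a default implementation.
        return "REPORT"

    @abstractmethod
    def body(self) -> str:
        """Step that every subclass must provide."""


class SalesReport(ReportABC):
    def body(self) -> str:
        return "sales: 42"


print(SalesReport().build())
```

The same decorator could be placed on get_ips_list() if you want that intent checked by tooling rather than left to convention.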

Updated subclasses look like this:

class AppleCrawlersService(CrawlersServiceABC):
    _cache_key = "applebot"
    _ips_list_url = settings.APPLEBOT_IPS_URL

    def _extract_ips_from_api_response(self, data):
        ip_addresses = []
        prefixes = data.get("prefixes", [])

        for prefix in prefixes:
            if settings.CRAWLERS_IP_V4_PREFIX in prefix:
                ip_addresses.append(prefix[settings.CRAWLERS_IP_V4_PREFIX])
            elif settings.CRAWLERS_IP_V6_PREFIX in prefix:
                ip_addresses.append(prefix[settings.CRAWLERS_IP_V6_PREFIX])

        return ip_addresses

    @property
    def cache_key(self) -> str:
        return self._cache_key

    @property
    def ips_list_url(self) -> str:
        return self._ips_list_url

class DuckDuckGoCrawlersService(CrawlersServiceABC):
    _cache_key = "duckduckgo_bot"
    _ips_list_url = settings.DUCKDUCK_BOT_IPS_URL

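    def _fetch_ips(self, url: str):
        try:
            # Fetch the URL passed in by get_ips_list(), which may iterate over several URLs
            response = requests.get(url)
            response.raise_for_status()
            logger.info(f"Fetched IPs with key: {self.cache_key}")
            return response
        except requests.RequestException as e:
            logger.warning(f"Error fetching IPs with key {self.cache_key}: {e}")
            # Signal failure; _extract_ips_from_api_response() treats None as "no data"
            return None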

    def _extract_ips_from_api_response(self, data):
        if not data or not data.text:
            logger.warning(f"No data available to extract IPs for key {self.cache_key}")
            return []

        content = data.text
        processed_ips = []

        for line in content.splitlines():
            if line.startswith("- "):
                # Remove '-' and all spaces
                cleaned_line = re.sub(r"[-\s]", "", line)
                processed_ips.append(cleaned_line)

        return processed_ips

    @property
    def cache_key(self) -> str:
        return self._cache_key

    @property
    def ips_list_url(self) -> str:
        return self._ips_list_url

Now each of them must implement the _extract_ips_from_api_response() method.

But the common methods _get_ips_from_cache() and _save_ips_to_cache() are inherited from CrawlersServiceABC.

All common logic lives in the CrawlersServiceABC abstract class, the child classes supply the concrete logic for the abstract methods, and the template method ties the whole algorithm together. Finally, the client code calls the template method:

apple_crawler_service = AppleCrawlersService()
duck_duck_go_crawler_service = DuckDuckGoCrawlersService()

apple_crawler_service.get_ips_list()
duck_duck_go_crawler_service.get_ips_list()
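The "must implement" part is enforced by Python itself: a subclass of an ABC that leaves an abstract method unimplemented cannot be instantiated. Here is a minimal sketch with hypothetical ServiceABC, IncompleteService and CompleteService classes standing in for the crawler services.

```python
from abc import ABC, abstractmethod


class ServiceABC(ABC):
    """Hypothetical, stripped-down stand-in for an abstract service class."""

    @abstractmethod
    def _extract(self, data: dict) -> list[str]:
        """Step that every subclass must provide."""

    def run(self, data: dict) -> list[str]:
        # Template method: delegates the variable step to the subclass.
        return self._extract(data)


class IncompleteService(ServiceABC):
    pass  # forgot to implement _extract()


class CompleteService(ServiceABC):
    def _extract(self, data: dict) -> list[str]:
        return list(data.get("ips", []))


error = ""
try:
    IncompleteService()
except TypeError as exc:
    # Instantiation fails before any method can be called.
    error = str(exc)
    print(f"Refused: {error}")

print(CompleteService().run({"ips": ["192.0.2.1"]}))
```

The mistake surfaces when the object is created, not later when the missing step is first called.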

You might notice that we also overrode the _fetch_ips() method in DuckDuckGoCrawlersService, even though it is not an abstract method.

Such methods are called hooks. Hooks are methods that have a default, common implementation in the base class, but which can be optionally overridden by a subclass to provide specific behavior.

We can do the same with the cache_key and ips_list_url properties if we want to.
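As a sketch of what a property hook could look like, the base class below provides a default cache_key that subclasses may keep or override. The ServiceBase, ExampleBotService and LegacyBotService names, and the idea of deriving the key from the class name, are assumptions made for this example only.

```python
class ServiceBase:
    """Hypothetical base class with a property that acts as a hook."""

    @property
    def cache_key(self) -> str:
        # Hook: default derived from the class name (an illustrative choice).
        return type(self).__name__.lower()

    def describe(self) -> str:
        return f"caching under '{self.cache_key}'"


class ExampleBotService(ServiceBase):
    pass  # keeps the default hook


class LegacyBotService(ServiceBase):
    @property
    def cache_key(self) -> str:
        # Overrides the hook to keep an existing key.
        return "legacy_bot_v1"


print(ExampleBotService().describe())
print(LegacyBotService().describe())
```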

So why is it important for Python devs?

Now that we've learned a new design pattern, we need to make sure we know how to recognize when someone else used it. Recognizing the Template Method pattern makes it much easier to understand someone else's code.

The most ubiquitous example of the Template Method pattern in Python is found in the Magic Methods (or "Dunder" methods, like __init__, __new__, __str__, etc.).

When you implement a class, you often override these built-in methods to customize how your object behaves. The secret here is that Python calls your overrides the same way a template method calls the hooks defined in subclasses.

The process of creating an object in Python is one of the clearest examples of the Template Method pattern being executed by the interpreter itself.

The Template Method (the overall instantiation process) dictates the fixed order: __new__ then __init__. The Primitive Operation that developers override to inject their unique setup logic is __init__.

class BaseObject:
    # __new__ is part of the FIXED TEMPLATE. It creates the object.
    def __new__(cls, *args, **kwargs):
        print("TEMPLATED STEP 1: Allocating memory and creating raw object instance.")
        instance = super().__new__(cls)
        return instance

class UserProfile(BaseObject):
    # __init__ is the hook we override.
    # This is where we put our own setup logic.
    def __init__(self, username, is_admin=False):
        print(f"TEMPLATED STEP 2: Running initialization logic for UserProfile('{username}')")
        self.username = username
        self.is_admin = is_admin
        print("TEMPLATED STEP 2 COMPLETE: Object configuration finished.")

# When we call the constructor, the Template Method (instantiation) runs:
print("--- Starting UserProfile(username='Alice') ---")
user = UserProfile(username="Alice")
# The final step of the template returns the initialized object.

print(f"Resulting object: {user.username}, Admin: {user.is_admin}")
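To make the fixed order visible, here is a simplified Python model of what happens when a class is called. The instantiate() function and the Point class are hypothetical; the real logic lives inside the interpreter and does more (for example, it rejects an __init__ that returns a value).

```python
def instantiate(cls, *args, **kwargs):
    """Simplified model of what calling cls(*args, **kwargs) does."""
    # Step 1 (fixed): create the raw instance.
    obj = cls.__new__(cls, *args, **kwargs)
    # Step 2 (fixed): initialize it, but only if __new__ returned an instance of cls.
    if isinstance(obj, cls):
        obj.__init__(*args, **kwargs)
    # Step 3 (fixed): hand the object back to the caller.
    return obj


class Point:
    # Only the hook is overridden; the order of the steps stays untouched.
    def __init__(self, x, y):
        self.x = x
        self.y = y


p = instantiate(Point, 1, y=2)
print(p.x, p.y)
```

The three steps never change; only the code inside __new__ and __init__ does.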

Another real-world example is the design of DRF's view system.

Remember how you manage your views using DRF's generic views?

Here is the code of the ListModelMixin class, whose list() method is implemented as a template method.

class ListModelMixin:
    """
    List a queryset.
    """
    def list(self, request, *args, **kwargs):
        queryset = self.filter_queryset(self.get_queryset())

        page = self.paginate_queryset(queryset)
        if page is not None:
            serializer = self.get_serializer(page, many=True)
            return self.get_paginated_response(serializer.data)

        serializer = self.get_serializer(queryset, many=True)
        return Response(serializer.data)

When you create your custom view like this one:

class PostListView(ListModelMixin, GenericAPIView):
    # Primitive Operation 1: Tell the Template which data to use.
    queryset = Post.objects.filter(published=True)

    # Primitive Operation 2: Tell the Template how to structure the data.
    serializer_class = PostSerializer

    # Entry point: get() delegates to the template method, ListModelMixin.list()
    def get(self, request, *args, **kwargs):
        return self.list(request, *args, **kwargs)

# The list() method (inside ListModelMixin) is the Template:
# 1. Calls get_queryset() (which uses our Primitive Operation: queryset)
# 2. Checks for pagination
# 3. Calls get_serializer() (which uses our Primitive Operation: serializer_class)
# 4. Returns Response

You are using the Template Method pattern without even noticing it: Django REST framework creates the skeleton of the algorithm for you, and you just override the hooks.

Conclusion

The Template Method pattern removes code duplication by defining a fixed algorithm in a base class, so subclasses implement only the unique steps. Python developers may use this pattern implicitly in their code, but understanding the Template Method makes it much easier to read other people's code and helps create a flexible architecture.
