Be More Lazy, Become More Productive

Sune Debel ・ 6 min read

In this oxymoronically titled article, we study laziness as a core aspect of functional programming in Python. I'm not talking about hammock-driven development or some such leisurely titled paradigm, but lazy evaluation. We'll see how lazy evaluation can make you more productive by improving reusability and composability, as we refactor a small example and introduce laziness along the way.

Simply put, lazy evaluation means that expressions are not evaluated until their results are needed. Contrast this with eager evaluation, which is the norm in imperative programming. Under eager evaluation, functions immediately compute their results (and perform their side-effects) when they are called. As an example, consider this Python function called get_json, which calls a web API as a side-effect and parses the response as JSON:

import urllib.request
import json


def get_json(url: str) -> dict:
    with urllib.request.urlopen(url) as response:
        content = response.read()
        return json.loads(content)

Now, imagine that we want to implement a retry strategy with a simple back-off mechanism. We could take the eager approach and adapt get_json:

import time


def get_json(url: str, retry_attempts: int = 3) -> dict:
    last_exception: Exception
    for attempt in range(1, retry_attempts + 1):
        try:
            with urllib.request.urlopen(url) as response:
                content = response.read()
                return json.loads(content)
        except Exception as e:
            time.sleep(attempt)
            last_exception = e
    raise last_exception

That works, but the solution has a huge shortcoming: we can't reuse the retry strategy for other types of HTTP requests. Or alternatively, but equivalently: get_json violates the single responsibility principle because it now has three responsibilities:

  • Calling an API
  • Parsing JSON
  • Retrying

This makes it hard to reuse. Let's fix it by being lazy. To keep things general, we'll define a type alias that models lazy values that are produced with or without side-effects. Let's call this alias Effect, since it allows us to treat side-effects as first-class values that can be manipulated by our program, thereby taking the "side" out of "side-effect".

from typing import Callable, TypeVar


A = TypeVar('A')
Effect = Callable[[], A]
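To get a feel for what an Effect is, here is a small, hypothetical example (the name say_hello is mine, not part of the article's code): nothing happens until the returned effect is actually called.

```python
from typing import Callable, TypeVar

A = TypeVar('A')
Effect = Callable[[], A]


# Hypothetical example: the side-effect is deferred until the
# returned effect is actually called
def say_hello(name: str) -> Effect[str]:
    def effect() -> str:
        greeting = f'Hello, {name}!'
        print(greeting)  # side-effect happens here, not when say_hello runs
        return greeting
    return effect


greet = say_hello('world')  # nothing printed yet: greet is just a value
result = greet()            # prints 'Hello, world!' and returns it
```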

We'll use this alias to implement the function retry which can take any Effect and retry it with the same backoff mechanism from before:

def retry(request: Effect[A],
          retry_attempts: int = 3) -> A:
    last_exception: Exception
    for attempt in range(1, retry_attempts + 1):
        try:
            return request()
        except Exception as e:
            time.sleep(attempt)
            last_exception = e
    raise last_exception


def get_json_with_retry(url: str, retry_attempts: int = 3) -> dict:
    return retry(lambda: get_json(url), retry_attempts=retry_attempts)

retry treats the results of (for example) HTTP requests as lazy values that can be manipulated. I'm using the term lazy values here because it fits nicely with our theme, but really I could just have said that retry uses get_json as a higher-order function. By treating functions as lazy values that can be passed around, retry achieves more or less total reusability.
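To see that reusability in action with no HTTP involved, here is a self-contained sketch (repeating the retry definition from above) that retries a hypothetical flaky effect which fails on its first call:

```python
import time
from typing import Callable, TypeVar

A = TypeVar('A')
Effect = Callable[[], A]


def retry(request: Effect[A],
          retry_attempts: int = 3) -> A:
    last_exception: Exception
    for attempt in range(1, retry_attempts + 1):
        try:
            return request()
        except Exception as e:
            time.sleep(attempt)
            last_exception = e
    raise last_exception


# hypothetical flaky effect: fails on the first call, succeeds on the second
attempts = []

def flaky() -> str:
    attempts.append(None)
    if len(attempts) < 2:
        raise RuntimeError('transient failure')
    return 'success'


result = retry(flaky)  # first attempt fails, back off 1 second, then succeed
```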

So far so good. Now let's implement a function, called run_async, that executes lazy values in parallel. Obviously, we want to be able to use run_async with get_json and retry:

from typing import Iterable
from multiprocessing import Pool


def run_async(effects: Iterable[Effect[A]]) -> Iterable[A]:
    with Pool(5) as pool:
        results = [pool.apply_async(effect) for effect in effects]
        return [result.get() for result in results]


def get_json_async_with_retry(urls: Iterable[str], retry_attempts: int = 3) -> Iterable[dict]:
    # lambda url=url is a small "hack" that binds url at definition time,
    # so each closure captures its own url instead of the loop's final value
    effects = [lambda url=url: get_json_with_retry(url, retry_attempts) for url in urls]
    return run_async(effects)

(This won't actually work as written, since multiprocessing can't pickle lambdas without third-party libraries, but bear with me.)
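If you do want it to work, one possible fix (my sketch, not from the article) is to replace the lambdas with functools.partial objects, which can be pickled when they wrap a module-level function. Here get_json_with_retry is a trivial stand-in for the real function:

```python
import pickle
from functools import partial


# stand-in for the real get_json_with_retry; it must live at module
# level so that pickle can find it by name
def get_json_with_retry(url: str, retry_attempts: int = 3) -> dict:
    return {'url': url, 'attempts': retry_attempts}


# lambdas can't be pickled, which is exactly what breaks multiprocessing here
lambda_picklable = True
try:
    pickle.dumps(lambda: get_json_with_retry('https://example.com'))
except Exception:
    lambda_picklable = False

# ...but a partial over a module-level function pickles fine,
# so it can be shipped to worker processes
effect = partial(get_json_with_retry, 'https://example.com', 3)
restored = pickle.loads(pickle.dumps(effect))
```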

That's all fine, but you might object that we've made a mess of our code. I agree. In particular, I think that our "glue" functions get_json_with_retry and get_json_async_with_retry are embarrassingly clumsy. What's missing, in my view, is a general solution for gluing lazy values together that would make these specialized glue functions redundant.

To achieve that, we'll use the following dogma:

  • Functions that perform side-effects return Effect instances. In other words, rather than performing a computation or side-effect directly, they return a lazy description of a result (and/or side-effect) that can be combined with functions that operate on Effect instances.
  • Functions that operate on Effect instances return new Effect instances.

With this scheme, any lazy result can be composed infinitely with functions that operate on lazy results. Indeed, functions that operate on lazy results can be composed with themselves!

So let's maximize the re-usability of our solution by turning our laziness up to The Dude levels of lethargy.


Let's start by refactoring get_json to return an Effect:

def get_json(url: str) -> Effect[dict]:
    def effect() -> dict:
        with urllib.request.urlopen(url) as response:
            content = response.read()
            return json.loads(content)
    return effect

Pretty straightforward. Now let's do the same for retry and run_async:

def retry(request: Effect[A],
          retry_attempts: int = 3) -> Effect[A]:
    def effect() -> A:
        last_exception: Exception
        for attempt in range(1, retry_attempts + 1):
            try:
                return request()
            except Exception as e:
                time.sleep(attempt)
                last_exception = e
        raise last_exception
    return effect


def run_async(effects: Iterable[Effect[A]]) -> Effect[Iterable[A]]:
    def effect() -> Iterable[A]:
        with Pool(5) as pool:
            # name the loop variable 'e' to avoid shadowing the
            # enclosing 'effect' function
            results = [pool.apply_async(e) for e in effects]
            return [result.get() for result in results]
    return effect

With this in hand, we can compose any variation of our functions to our heart's desire with minimal effort:

url = ...
get_json_with_retry: Effect[dict] = retry(get_json(url))

urls = [...]
get_json_async_with_retry: Effect[Iterable[dict]] = run_async(
    [retry(get_json(url)) for url in urls]
)

First, realise that we could further reuse both get_json_with_retry and get_json_async_with_retry with any function that operates on Effect instances. Also, notice that laziness (or higher-order functions) is what enables us to program with this degree of reuse, and what allows compositionality at this high level of abstraction (ultimately at the level of entire programs).
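For instance, since the lazy retry both accepts and returns an Effect, it even composes with itself. A self-contained sketch (repeating the lazy retry from above, with a hypothetical flaky effect that fails three times): two inner attempts per outer attempt give up to four attempts in total, so the composed effect succeeds where a single retry with retry_attempts=2 would have failed.

```python
import time
from typing import Callable, TypeVar

A = TypeVar('A')
Effect = Callable[[], A]


def retry(request: Effect[A],
          retry_attempts: int = 3) -> Effect[A]:
    def effect() -> A:
        last_exception: Exception
        for attempt in range(1, retry_attempts + 1):
            try:
                return request()
            except Exception as e:
                time.sleep(attempt)
                last_exception = e
        raise last_exception
    return effect


# hypothetical flaky effect: fails three times, succeeds on the fourth call
calls = []

def flaky() -> str:
    calls.append(None)
    if len(calls) < 4:
        raise RuntimeError('transient failure')
    return 'success'


# retry composed with itself: 2 inner attempts per outer attempt,
# so up to 4 attempts in total
resilient: Effect[str] = retry(retry(flaky, retry_attempts=2), retry_attempts=2)
result = resilient()
```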

When functional programmers claim that programming in functional style makes you more productive, this is the reason: functional programming (which often involves lazy evaluation) can drastically improve reusability and compositionality, which means you can do much more with much less. All of these advantages were part of my motivation for authoring the library pfun that makes it possible to write Python in functional style without all of the boilerplate and ceremony in this example.

As an added bonus, functional programs are often more predictable and easy to reason about. Also, with few modifications, the pattern we have developed here can be extended to enable completely type-safe dependency injection and error handling. Moreover, static type checking becomes more useful because all functions must return values, even if they merely produce side-effects (in which case they'll return Effect[None]).
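As a sketch of that last point, here is a hypothetical put_line (my example, not part of the article's code) whose only job is a side-effect; under the dogma it still returns a value the type checker can track, namely an Effect[None]:

```python
from typing import Callable, TypeVar

A = TypeVar('A')
Effect = Callable[[], A]


# hypothetical side-effect-only function: instead of returning nothing,
# it returns a lazy Effect[None] that a type checker can reason about
def put_line(line: str) -> Effect[None]:
    def effect() -> None:
        print(line)
    return effect


greeting = put_line('hello')  # nothing printed yet
greeting()                    # prints 'hello'
```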

In our effort to refactor our example, we have halfway re-invented a common functional pattern: The enigmatic IO type (which we call Effect in our example, and which I hope you don't find mysterious at all at this point). There is much confusion about what IO is and does, especially outside the functional programming clubs of Haskell and Scala programmers. As a consequence you'll sometimes hear IO explained as:

  • Functions that perform side-effects return IO
  • Functions that do not perform side-effects do not

While this explanation is not, strictly speaking, incorrect, it's wildly inadequate because it doesn't mention anything about laziness, which is a core feature of IO. Based on this naive explanation, you might be tempted to try something like:

from typing import TypeVar, Generic


A = TypeVar('A')


class IO(Generic[A]):
    def __init__(self, value: A):
        self.value = value


def get_json(url: str) -> IO[dict]:
    with urllib.request.urlopen(url) as response:
        content = response.read()
        return IO(json.loads(content))

This IO implementation simply tags return values of functions that perform IO, making it clear to the caller that side-effects are involved. Whether this is useful or not is a matter of some controversy, but it doesn't bring any of the functional programming benefits that we have discussed in this article because this IO version is eager.
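To make that concrete, here is a minimal sketch (using a counting side-effect in place of a real request) showing that with this eager IO, the side-effect has already happened by the time anything could retry it:

```python
from typing import Generic, TypeVar

A = TypeVar('A')


class IO(Generic[A]):
    def __init__(self, value: A):
        self.value = value


calls = []

def eager_effect() -> IO[str]:
    # the side-effect runs here, before the IO value is even constructed
    calls.append(None)
    return IO('result')


io_value = eager_effect()
# io_value is just a plain wrapped value: a retry function receiving it
# would have nothing left to re-run, so the composition benefits are lost
```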

In summary: laziness (or higher-order functions) enables radical reuse and compositionality, both of which make you more productive. To get started with functional programming in Python, check out the pfun documentation, or the GitHub repository.

GitHub: suned / pfun — Functional, composable, asynchronous, type-safe Python.
