I had two courses in programming at the university, in Java and C. The Java course taught object-oriented programming: objects, classes, inheritance and all that.
After graduating as a physics major, I got my first job as an algorithm and software developer. My first algorithm required loading data into the algorithm when the application starts. The algorithm would then use the data and dynamic inputs to calculate outputs.
Having learned Java, I thought: "I need to create a class for my algorithm, and it should have setters and getters for the data and the algorithm inputs." My first implementation of the algorithm then looked like this:
algorithm = Algorithm()
algorithm.add_data(data)
# Wait for an input event
algorithm.process(input)
output = algorithm.get_result()
# Wait for the next input
algorithm.process(input)
output = algorithm.get_result()
The add_data method is a setter for the data required by the algorithm, and process is a method that turns algorithm inputs into outputs and stores them in the internal state for later retrieval with the get_result method.
I thought this was a flexible design. With the add_data method, users of the class would be able to add more data to the algorithm at run-time. The process method was also very flexible in that it didn't return anything. Instead, it stored the algorithm output in the internal state and left it to the user to decide how they'd like to retrieve it. Maybe they would like to get the result in pretty-printed form instead of the "raw" format! In that case, I could just add another method, get_result_prettyprinted.
However, this is not a good design. Any user of the Algorithm class would need a user guide to know which methods to call and in which order. It would also be hard for the developer to cover all the corner cases of what should happen when the user calls the methods in the "wrong" order.
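To make the call-order problem concrete, here is a minimal sketch of what such a stateful class might look like. The class body, the placeholder computation, and the choice to raise RuntimeError are all assumptions made for illustration, not the original implementation.

```python
class Algorithm:
    """Hypothetical stateful design: setter, process step, and getter."""

    def __init__(self):
        self._data = None
        self._result = None

    def add_data(self, data):
        self._data = data

    def process(self, input):
        if self._data is None:
            # One of the corner cases the class author has to anticipate.
            raise RuntimeError("add_data() must be called before process()")
        # Placeholder computation, assumed for illustration only.
        self._result = sum(self._data) * input

    def get_result(self):
        return self._result


algorithm = Algorithm()

# "Wrong" order 1: asking for a result before anything was processed
# silently hands back None.
stale = algorithm.get_result()

# "Wrong" order 2: processing before any data was added.
try:
    algorithm.process(2)
    error = None
except RuntimeError as exc:
    error = str(exc)
```

Every public method multiplies the number of orderings that need a defined behavior, and nothing in the class interface tells the caller which ordering is the intended one.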
Luckily I was surrounded by more experienced programmers who, through the wonders of code review, taught me better.
We can get rid of the add_data method by loading the data in the constructor:
algorithm = Algorithm(data)
algorithm.process(input)
output = algorithm.get_result()
What if we need to add more data to the Algorithm at run-time? That's what the add_data method was so good for! But we don't have that requirement now and may never have it, so why add it? Even if we had to add more data to Algorithm, we could create a new Algorithm instance by combining the old and new data with something like Algorithm(old_data + new_data).
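As a rough sketch of that idea, assuming the data is a plain list and using a made-up placeholder computation, building a new instance leaves the old one untouched:

```python
class Algorithm:
    """Hypothetical version that receives all of its data up front."""

    def __init__(self, data):
        # Copy into a tuple so later changes to the caller's list have no effect.
        self._data = tuple(data)
        self._result = None

    def process(self, input):
        # Placeholder computation, assumed for illustration only.
        self._result = sum(self._data) * input

    def get_result(self):
        return self._result


old_data = [1, 2]
new_data = [3]

old_algorithm = Algorithm(old_data)
# "Adding" data means creating a fresh instance from the combined data.
new_algorithm = Algorithm(old_data + new_data)

old_algorithm.process(10)
new_algorithm.process(10)
old_output = old_algorithm.get_result()  # 30
new_output = new_algorithm.get_result()  # 60
```

Code that still holds a reference to the old instance keeps getting results based on the old data, which is usually easier to reason about than data changing underneath it.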
Can we get rid of the process method? Yes:
algorithm = Algorithm(data)
output = algorithm.compute(input)
We return the output from the compute method and get rid of process. But what about the great plan of the Algorithm class having multiple alternative methods for accessing the result in the format required by the user? Surely that would be useful!
Maybe it would, but it's not the algorithm's job to do formatting. If we need pretty-printing, that is the responsibility of another class or function.
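Here is a minimal sketch of that split. The format_result name, the output format, and the placeholder computation inside compute are assumptions made up for illustration.

```python
class Algorithm:
    """Hypothetical class whose only job is to compute."""

    def __init__(self, data):
        self._data = tuple(data)

    def compute(self, input):
        # Placeholder computation, assumed for illustration only.
        return sum(self._data) * input


def format_result(output):
    """Hypothetical formatter that lives outside the algorithm."""
    return f"Result: {output:,.2f}"


algorithm = Algorithm([1.5, 2.5])
output = algorithm.compute(1000)  # raw value for further calculations
pretty = format_result(output)    # "Result: 4,000.00"
```

A new output format then means a new formatting function, and the algorithm itself does not change.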
Can we trim the class even more? Some say that classes with one (public) method should not be classes. Let's get rid of the class as well:
from functools import partial

def algorithm(data, input):
    ...
    return output

output = algorithm(data, input)

# Or if you need the algorithm elsewhere
algo = partial(algorithm, data)
# data is already bound, so only the input is passed
output = algo(input)
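For a version that can actually be run, here is a small sketch with a made-up placeholder computation and made-up data. Only functools.partial itself comes from the standard library; everything else is assumed for illustration.

```python
from functools import partial


def algorithm(data, input):
    # Placeholder computation, assumed for illustration only.
    return sum(data) * input


# Bind the data once, for example at application start-up.
algo = partial(algorithm, [1, 2, 3])

# Later, call it with each new input as it arrives.
outputs = [algo(x) for x in (1, 2, 3)]  # [6, 12, 18]
```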
Instead of using functools.partial, we could also write our own factory function returning a named function. This version also carries type annotations, so a static type checker can verify how it is used:
import typing

# Data types, use e.g. frozen dataclasses
AlgorithmData = ...
AlgorithmInput = ...
AlgorithmOutput = ...

# Function type, use `typing.Protocol` for more flexibility
Algorithm = typing.Callable[[AlgorithmInput], AlgorithmOutput]

def make_algorithm(data: AlgorithmData) -> Algorithm:
    def algorithm(input: AlgorithmInput) -> AlgorithmOutput:
        ...
    return algorithm

data = ...
algorithm = make_algorithm(data)
input = ...
output = algorithm(input)
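To show how the placeholders might be filled in, here is one possible concrete version using frozen dataclasses. The field names (weights, scale, value) and the computation are assumptions made up for this sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AlgorithmData:
    weights: tuple[float, ...]  # hypothetical field


@dataclass(frozen=True)
class AlgorithmInput:
    scale: float  # hypothetical field


@dataclass(frozen=True)
class AlgorithmOutput:
    value: float  # hypothetical field


Algorithm = Callable[[AlgorithmInput], AlgorithmOutput]


def make_algorithm(data: AlgorithmData) -> Algorithm:
    def algorithm(input: AlgorithmInput) -> AlgorithmOutput:
        # Placeholder computation, assumed for illustration only.
        return AlgorithmOutput(value=sum(data.weights) * input.scale)

    return algorithm


algorithm = make_algorithm(AlgorithmData(weights=(1.0, 2.0, 3.0)))

# The same input always gives the same output, with no hidden state in between.
first = algorithm(AlgorithmInput(scale=2.0))
second = algorithm(AlgorithmInput(scale=2.0))
```

Because the dataclasses are frozen, neither the captured data nor a returned output can be modified after creation, which keeps the returned function free of surprises.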
And there you have it: we've replaced the whole class and its internal state with a pure function with no side effects, which means we ended up doing some functional programming. The key point in all of the above is not, however, that we replaced the class with a function. The key point is that we trimmed our algorithm down to a single responsibility.
My takeaway from the above is that there's a danger lurking in object-oriented designs: it's easy to end up with massive classes that have too many responsibilities. In my opinion, functional programming tends to lead to more decoupled designs. This doesn't mean that one paradigm is better than the other. But be careful with classes.