Pedro S

Posted on Jul 20, 2020

When to use OOP in Python if you are not developing large applications

#python #beginners #oop #datatypes

Having started studying programming from a data science perspective, I had a lot of trouble understanding not exactly what object-oriented programming (OOP) was, but why I should use it. After all, the code I wrote seemed to do just fine without custom classes and objects.

Inspired by an introduction to OOP in Python that I watched at a recent online conference (in Brazilian Portuguese; congrats on the talk, Maria Clara!), I decided to write an introduction to OOP that does not focus on syntax or Python perks (I will assume you already know fairly well how to declare classes in Python), but on when OOP should be used. I will present: (1) the kind of code and problems OOP was created for, (2) whether it has a place in non-complex applications and (3) what features Python offers, besides ordinary class declarations, that are enough for most projects.

Why OOP in the first place?

OOP was developed as a strategy to organize code in large and complex applications.

Let's say we are making an application for a bank that reads the balance of an account from a database and decreases it at a user's request, if there are enough funds. It could follow this very simplified layout:

from db_connection import db_connection

def get_balance(db_connection, account_no):
    ...
    return balance

def withdraw(balance, amount):
    if amount <= balance:
        return balance - amount
    else:
        raise Exception('Insufficient funds')

def update_balance(db_connection, account_no, new_balance):
    ...

if __name__ == '__main__':
    account_no = input('Type account number: ')
    to_withdraw = float(input('Type amount to be withdrawn: '))
    old_balance = get_balance(db_connection, account_no)
    new_balance = withdraw(old_balance, to_withdraw)
    update_balance(db_connection, account_no, new_balance)

What if we want to allow a certain category of bank accounts to withdraw more money than available in the balance under a certain interest rate, as a loan? Our code quickly becomes way more complex:

from db_connection import db_connection

ACCOUNT_CATEGORIES_ALLOWED_TO_LOAN = set(...)

def get_balance(db_connection, account_no):
    ...
    return balance

def get_account_category(db_connection, account_no):
    ...
    return account_category

def withdraw(balance, amount, is_allowed_to_loan):
    if not is_allowed_to_loan and amount > balance:
        raise Exception('Insufficient funds')
    else:
        return balance - amount

def update_balance(db_connection, account_no, new_balance):
    ...

def register_loan(db_connection, account_no, amount_loaned):
    ...

if __name__ == '__main__':
    # Get account info
    account_no = input('Type account number: ')
    to_withdraw = float(input('Type amount to be withdrawn: '))
    acct_category = get_account_category(db_connection, account_no)
    # Withdrawal operation
    old_balance = get_balance(db_connection, account_no)
    is_allowed_to_loan = acct_category in ACCOUNT_CATEGORIES_ALLOWED_TO_LOAN
    new_balance = withdraw(old_balance, to_withdraw, is_allowed_to_loan)
    # Updates
    update_balance(db_connection, account_no, new_balance)
    if new_balance < 0:
        register_loan(db_connection, account_no, new_balance)

Now imagine that there could be many different bank account categories, each of them with different consequences for different operations and available funds. The code base would quickly become spaghetti code: a bunch of different functions and flow control statements that is difficult to understand and to maintain.

That is where OOP steps in. It associates a specific set of data with the specific functions that act on it, so that the relation between the information your code is dealing with and what it can do with that information is very clear. Let's check how this is done.

Abstraction

Code in OOP is organized through abstractions of real-life objects. In this sense, the bank account in our application example would be a class, a "thing" with its own characteristics (the data, called "attributes" in OOP) and actions (the "methods"), just like it is in the real world:

class Account:
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        if amount <= self.balance:
            self.balance -= amount
        else:
            raise Exception('Insufficient funds')

Just from taking a quick look at the snippet above, you can tell that (1) every account has a certain balance and (2) every account can receive a "withdraw" action. This is different from what we had in our previous spaghetti application: if we were to change anything regarding bank accounts in our system, we would have to search through all the code to find where the changes should be made – and we would have to hope that we were making the change in all the necessary places. Classes help keep everything in one place.

Our application could now be simplified to this:

from db_connection import db_connection
from classes.accounts import Account

def get_account(db_connection, account_no) -> Account:
    ...
    return Account(balance)

def update_database(db_connection, account_obj: Account):
    ...

if __name__ == '__main__':
    account_no = input('Type account number: ')
    to_withdraw = float(input('Type amount to be withdrawn: '))
    account = get_account(db_connection, account_no)
    account.withdraw(to_withdraw)
    update_database(db_connection, account)

Notice that all the actions that change the account's data (ie, the account's state) are no longer present in this script. The act of withdrawing money can now live in a separate script, where the Account class is defined: a package called "classes" with a script called "accounts.py", for example. Any change related to what happens when money is withdrawn from an account should be made in that separate script; any change related to how a user withdraws money (what information is requested, for example) should be made in our main script.

If you paid attention to the type annotations, you may have noticed that the database-related functions now deal with Account objects directly. This makes it easier if, in addition to withdrawing money, we also want the user to be able to call other methods from the Account class - that would just require the addition of some more lines, with no need to instantiate new objects.
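To make this concrete, here is a small sketch in which the same object receives two different calls. The deposit method is a hypothetical addition that is not part of the original example, and the database functions are left out so that the snippet runs on its own.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        if amount <= self.balance:
            self.balance -= amount
        else:
            raise Exception('Insufficient funds')

    def deposit(self, amount):
        # Hypothetical extra method, added only for illustration.
        if amount <= 0:
            raise Exception('Deposits must be positive')
        self.balance += amount


# The same object is reused for both operations.
account = Account(100)
account.withdraw(30)
account.deposit(50)
print(account.balance)  # 120
```

Supporting the new operation only takes one more line in the main script; the object retrieved from the database stays the same.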

Encapsulation

Our Account class can have its balance easily edited during runtime. If we do account.balance = 0.0005, the balance would change, even though that would be a strange amount for an ordinary account in dollars.

That is why it is recommended that the attributes of a class be encapsulated, ie, hidden from the outside world (the rest of the code). In Python, this can be done with the help of the @property decorator (or, alternatively, with the convention of naming attributes with leading underscores¹):

class Account:
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        if amount <= self.balance:
            self.balance -= amount
        else:
            raise Exception('Insufficient funds')

    @property
    def balance(self):
        # Nothing special about getting the balance;
        # we will just return it.
        return self._balance

    @balance.setter
    def balance(self, new_value):
        # When changing the balance, however,
        # we want to enforce certain rules
        # round() avoids float artifacts: (1.1 * 100) % 1 is not exactly 0.
        if round(new_value, 2) != new_value:
            raise Exception('Balance can only have up to two decimal places')
        else:
            self._balance = new_value

Now, any time the balance attribute is set to a value that breaks the rules defined in the Account class itself, an exception is raised:

>>> acct = Account(55.663)
...
Exception: Balance can only have up to two decimal places

Encapsulation keeps the implementation details of the attributes inside the code of the class itself. Instead of checking in our main script whether the new balance is a valid value, this check is reserved to the class declaration. Again, this results in more organized code.
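The same rule also applies to direct assignments made after the object is created, which is the case that motivated this section. The sketch below repeats a trimmed-down Account class so that it runs on its own. Using round() for the two-decimal check and raising ValueError instead of a bare Exception are assumptions of this sketch, not the only way to validate the value.

```python
class Account:
    def __init__(self, balance):
        # This assignment already goes through the setter below.
        self.balance = balance

    @property
    def balance(self):
        return self._balance

    @balance.setter
    def balance(self, new_value):
        # Assumed rule for this sketch: at most two decimal places.
        if round(new_value, 2) != new_value:
            raise ValueError('Balance can only have up to two decimal places')
        self._balance = new_value


acct = Account(55.66)

try:
    acct.balance = 0.0005  # rejected by the setter
except ValueError as error:
    print(error)

print(acct.balance)  # the previous value, 55.66, is kept
```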

Inheritance

Inheritance is a nice feature of OOP that allows classes to be related to each other. When one class inherits from another, all the attributes and methods of the parent class are automatically available in it, with no code repetition being necessary. In our bank application, this allows different account types to be easily implemented, such as a HighIncomeAccount type:

class HighIncomeAccount(Account):
    pass

These two lines are enough to create a different data structure that has the same attributes and methods as the main Account class. Its instances are also recognized as instances of Account (indirectly, through inheritance), while still being objects of a different type:

>>> simple_account = Account(55)
>>> high_income_account = HighIncomeAccount(99955)
>>> all(hasattr(acct, 'withdraw')
...     for acct in (simple_account, high_income_account))
True
>>> isinstance(high_income_account, Account)
True
>>> type(simple_account) is type(high_income_account)
False

In our application, we would have to change our get_account function to create either an Account or a HighIncomeAccount object depending on the case. However, besides that change, the rest of the code would be able to continue calling account.withdraw in the same way as before. This is how OOP programs are often described: as "messages" (such as the withdraw order) being transmitted from one part of the code to another.
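One way get_account could make that choice is sketched below. The category names and the plain dict standing in for the database are hypothetical, since the original example leaves both unspecified.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        if amount <= self.balance:
            self.balance -= amount
        else:
            raise Exception('Insufficient funds')


class HighIncomeAccount(Account):
    pass


# Hypothetical mapping from a stored category name to a class.
ACCOUNT_CLASSES = {
    'standard': Account,
    'high_income': HighIncomeAccount,
}

# Hypothetical stand-in for the database: account number -> (category, balance).
FAKE_DB = {
    '001': ('standard', 55),
    '002': ('high_income', 99955),
}


def get_account(db, account_no) -> Account:
    category, balance = db[account_no]
    account_class = ACCOUNT_CLASSES[category]
    return account_class(balance)


for account_no in ('001', '002'):
    account = get_account(FAKE_DB, account_no)
    account.withdraw(5)  # same call, whatever the concrete class
    print(type(account).__name__, account.balance)
```

Only get_account knows about the categories; the loop at the end never checks which class it received.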

Polymorphism

Inheritance can be put to better use in our application by taking advantage of polymorphism: the same method can produce different results depending on which object it is called on. We can, for example, change how withdraw works for a HighIncomeAccount:

class HighIncomeAccount(Account):
    def withdraw(self, amount):
        diff = self.balance - amount
        if diff >= 0:
            super().withdraw(amount)
        else:
            self.amount_loaned = diff
            self.balance = diff

That way, the exception regarding insufficient funds is raised only on Account objects, but not on HighIncomeAccount objects:

>>> simple_account = Account(55)
>>> high_income_account = HighIncomeAccount(99955)
>>> simple_account.withdraw(99999999)
...
Exception: Insufficient funds
>>> high_income_account.withdraw(99999999)
>>> high_income_account.balance
-99900044

And, once again, our main script representing the user interaction can remain unchanged (besides the database interactions, which should be updated to consider the new amount_loaned attribute). All the logic regarding the bank accounts is concentrated in the class definitions. The code base as a whole is, therefore, much easier to read and maintain.
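The sketch below shows what "unchanged" means in practice: one piece of calling code handles both kinds of account without ever checking their type. The try_withdraw helper is hypothetical and exists only for this illustration; the two classes are repeated so that the snippet runs on its own.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        if amount <= self.balance:
            self.balance -= amount
        else:
            raise Exception('Insufficient funds')


class HighIncomeAccount(Account):
    def withdraw(self, amount):
        diff = self.balance - amount
        if diff >= 0:
            super().withdraw(amount)
        else:
            self.amount_loaned = diff
            self.balance = diff


def try_withdraw(account, amount):
    # Hypothetical helper: it never checks which class it received.
    try:
        account.withdraw(amount)
        return 'ok'
    except Exception as error:
        return str(error)


accounts = [Account(55), HighIncomeAccount(55)]
results = [try_withdraw(account, 100) for account in accounts]
print(results)  # ['Insufficient funds', 'ok']
print([account.balance for account in accounts])  # [55, -45]
```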

Using OOP in simpler applications

All the above makes a lot of sense if you are dealing with code for complex systems. It is a different reality, however, if you write code for exploratory data analysis, for example, which is much more linear: given a certain dataset, tasks are executed one after another in order to provide certain insights (results). In this case, classes may not be necessary, as your code may not have to deal with different data structures. If everything is a DataFrame and all your functions can act on any DataFrame, there is not much reason to spend time creating classes and declaring different methods. Many of the features of OOP, such as inheritance and polymorphism, would in fact not be useful at all.

As a rule of thumb, creating custom classes is useful when you need to associate specific data with specific actions. That was the case in the application example above: we needed a way to associate a "balance" with a certain "withdraw" action. As a different example, it can also be useful if we are building a scraper that collects information from different sources or in different ways, such as a scraper for a hospital database that also looks up the distance between one hospital and another and checks a different database for the number of beds in each hospital:

class Hospital():
    def __init__(self, address):
        self.address = address
        self.beds_no = self.get_number_of_beds()

    def get_number_of_beds(self):
        ...
        return beds_no

    def get_distance(self, to: 'Hospital'):
        ...
        return distance

Using such a class in your program can make information be transmitted much more easily between different parts of your code. It is clearer and shorter to call one_hospital.get_distance(to=another_hospital) when necessary than to retrieve an address, call a separate function like get_distance(origin=one_address, to=another_address) and deal with scattered information.
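As a rough, runnable illustration of that call style, the sketch below swaps the address for hypothetical grid coordinates and computes a straight-line distance. Both choices are assumptions made to keep the example self-contained; a real scraper would more likely geocode the address or query a routing service.

```python
import math


class Hospital:
    def __init__(self, name, x_km, y_km):
        # Hypothetical attributes: a position on a flat grid, in kilometres.
        self.name = name
        self.x_km = x_km
        self.y_km = y_km

    def get_distance(self, to: 'Hospital') -> float:
        # Straight-line distance between the two positions.
        return math.hypot(to.x_km - self.x_km, to.y_km - self.y_km)


central = Hospital('Central', 0, 0)
north = Hospital('North', 3, 4)
print(central.get_distance(to=north))  # 5.0
```

The caller passes whole Hospital objects around and never has to unpack their coordinates.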

Another good application of data and actions being put together is when you need a different custom data type. In Python, data types such as list and dict can be seen as classes with special methods: as with any other class, you can inherit from them and change their behavior. Let's say you need a list that only accepts instances of dict, and you need to be sure of that, for whatever reason. Then you can be creative and do:

class ListOfDicts(list):
    def __init__(self):
        # We will not accept iterables as an argument to the constructor,
        # or else ListOfDicts({'a': 'dict'}) will result in ['a'].
        super().__init__()

    def append(self, item):
        self._execute_if_is_dict(super().append, item)

    def insert(self, idx, obj):
        self._execute_if_is_dict(super().insert, idx, obj)

    def __setitem__(self, idx, item):
        self._execute_if_is_dict(super().__setitem__, idx, item)

    def _execute_if_is_dict(self, action, *args):
        # The item being stored is always the last positional argument.
        if not isinstance(args[-1], dict):
            raise Exception('Only dicts are accepted as items')
        else:
            action(*args)

This approach should only be used if you are really sure of which methods you need to override: any method that is not overridden, such as extend, will still accept items of any type. It shows, however, how flexible Python can be. If you ever catch yourself wondering whether a certain data type could behave in a specific way, do some research: it is likely that someone has already written a custom class that does exactly what you need.
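The warning about choosing the methods to override can be seen in a trimmed-down sketch. This version guards only append, which is a deliberate simplification for the example, and shows that extend slips past the check.

```python
class ListOfDicts(list):
    # Trimmed-down version for illustration: only append() is guarded.
    def append(self, item):
        if not isinstance(item, dict):
            raise Exception('Only dicts are accepted as items')
        super().append(item)


records = ListOfDicts()
records.append({'name': 'Sherlock Holmes'})

try:
    records.append('not a dict')
except Exception as error:
    print(error)  # Only dicts are accepted as items

# extend() was not overridden, so it bypasses the check entirely.
records.extend(['also not a dict'])
print(records)
```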

Beyond classes: useful data structures

You may be tempted to create a class to encapsulate a simple set of information, for example:

>>> class Person:
...     def __init__(self, name, age, address):
...         self.name = name
...         self.age = age
...         self.address = address
>>> holmes = Person('Sherlock Holmes', 60, '221B Baker Street')

Do not do it this way for simple data structures like this one. You can aggregate data like that in a simple dict, and that will not raise questions regarding the possibility of any special method being attached to your Person class – which is, actually, very simple:

>>> holmes = {
...     'name': 'Sherlock Holmes',
...     'age': 60,
...     'address': '221B Baker Street'
... }

This will equally allow you to retrieve information from the "holmes" object in a very direct way. It is true, however, that you may need a template, ie, a way of ensuring that every possible Person has three different attributes associated with it: a name, an age, and an address. That is the use case of a named tuple.
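The lack of a template is easy to see in a short sketch. The second record and its misspelled key are made up for the example; nothing in a plain dict flags the problem until you check for it yourself.

```python
holmes = {
    'name': 'Sherlock Holmes',
    'age': 60,
    'address': '221B Baker Street',
}

# Hypothetical second record: one key is missing and another is misspelled,
# yet creating the dict raises no error.
watson = {'name': 'John Watson', 'adress': '221B Baker Street'}

expected_keys = {'name', 'age', 'address'}
missing_in_holmes = expected_keys - set(holmes)
missing_in_watson = expected_keys - set(watson)

print(missing_in_holmes)  # empty set: every expected key is present
print(missing_in_watson)  # 'age' and 'address' are missing
```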

Named tuples

A named tuple is, like a tuple, an immutable ordered collection. However, its items can also be retrieved by field name, much like the keys of a dict. In the end, named tuples are like immutable dicts that must be created from a specific template:

>>> from collections import namedtuple
>>> Person = namedtuple('Person', ['name', 'age', 'address'])
>>> holmes = Person('Sherlock Holmes', 60, '221B Baker Street')

Instantiating an object from a named tuple is very similar to instantiating an object from a custom class. Accessing the attributes is also done with dot notation and, besides all that, printing the object will exhibit a user-friendly representation:

>>> holmes.age
60
>>> print(holmes)
Person(name='Sherlock Holmes', age=60, address='221B Baker Street')
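A few more behaviors are worth a quick sketch: the template is enforced when the object is created, and the _asdict and _replace methods give you a dict or a modified copy. The second person and the new age are arbitrary example values.

```python
from collections import namedtuple

Person = namedtuple('Person', ['name', 'age', 'address'])
holmes = Person('Sherlock Holmes', 60, '221B Baker Street')

# Fields can be read by name or by position.
print(holmes.name, holmes[0])

# The template is enforced: a missing field raises a TypeError.
try:
    Person('John Watson', 58)
except TypeError as error:
    print(error)

# _asdict() converts to a dict; _replace() returns a modified copy
# and leaves the original untouched.
print(holmes._asdict())
older_holmes = holmes._replace(age=61)
print(older_holmes.age, holmes.age)  # 61 60
```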

Data classes

Named tuples may present issues in some applications:

A named tuple compares as equal to another one that carries the same values. The holmes object we created would be considered equal to a named tuple Character(name='Sherlock Holmes', age=60, address='221B Baker Street'), for example.
In the same way, a named tuple is also considered equal to a plain tuple carrying the same values: holmes == ('Sherlock Holmes', 60, '221B Baker Street') returns True.
Named tuples are iterable. Part of your code may iterate over a Person named tuple and expect it to return a name, an age and then an address; if you add a different field to the named tuple definition (a country attribute, for example), you may break this other part unintentionally.
You may want to change the value of an attribute in the named tuple. However, as tuples are immutable, that is not possible.
You may want more complexity. Maybe you want to query Wikipedia before creating your holmes object, and then save the resulting link to the named tuple itself. This is not possible, as you cannot change the methods underlying the named tuple (unless you create a new custom class yourself).
Composing the attributes in a named tuple based on other named tuples (as if doing class inheritance) is complicated and may result in obscure code.
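The first few issues in this list can be reproduced in a handful of lines. The Character type is the hypothetical second named tuple mentioned in the list.

```python
from collections import namedtuple

Person = namedtuple('Person', ['name', 'age', 'address'])
Character = namedtuple('Character', ['name', 'age', 'address'])

holmes = Person('Sherlock Holmes', 60, '221B Baker Street')
character = Character('Sherlock Holmes', 60, '221B Baker Street')

# Equality only looks at the values, not at the type.
print(holmes == character)  # True
print(holmes == ('Sherlock Holmes', 60, '221B Baker Street'))  # True

# Unpacking depends on the number and order of the fields,
# so adding a field to Person would break this line.
name, age, address = holmes

# Attributes cannot be reassigned.
try:
    holmes.age = 61
except AttributeError as error:
    print('AttributeError:', error)
```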

These issues call for a more complex data structure, which is what classes provide. However, much of the work related to the creation of a class to hold different attributes was made easier in Python 3.7 with the addition of the @dataclass decorator (see the documentation and the discussion reported in PEP 557). Its basic use eliminates some of the boilerplate necessary when creating a class, while also offering a lot of advanced functionality for when you need something more complex than both a dict and a named tuple:

from dataclasses import dataclass, field

@dataclass
class Person:
    name: str
    age: int
    address: str
    wikipedia_page: str = field(init=False, repr=False)

    def __post_init__(self):
        self.wikipedia_page = get_wikipedia_page(self.name)

def get_wikipedia_page(query):
    ...
    return page_address

The code above is roughly equivalent to this one (the decorator also generates an __eq__ method, omitted here for brevity):

class Person:
    def __init__(self, name: str, age: int, address: str):
        self.name = name
        self.age = age
        self.address = address
        self.wikipedia_page: str = get_wikipedia_page(self.name)

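    def __repr__(self):
        # Copy the attributes first: vars(self) is the instance's own
        # __dict__, so popping from it would delete the attribute.
        attrs_dict = dict(vars(self))
        attrs_dict.pop('wikipedia_page')
        attrs_as_str = ', '.join(f'{k}={v!r}' for k, v in attrs_dict.items())
        return f'{type(self).__name__}({attrs_as_str})'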

def get_wikipedia_page(query):
    ...
    return page_address

The @dataclass decorator looks for the class variables that carry a type annotation and generates an __init__ constructor, a __repr__ method and an __eq__ method from them. There is also extra functionality: the field function, for example, is telling the decorator that this field should not be a parameter of __init__ (it is filled in by the __post_init__ method instead) and that it should not show up in the __repr__ result. The dataclasses module also allows for finer control of how the object is instantiated (see the parameters of the decorator and of the field function), how it is compared to others (see the eq and order parameters of the decorator) and how it is transformed into different data types (asdict and astuple functions) or into a new object with different field values (replace function). This is a good amount of fine tuning in a much simpler code structure, as seen from the reduction in lines above.
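A brief sketch of those comparison and conversion features follows. It uses a simplified Person without the Wikipedia field, and the second person is an arbitrary example value.

```python
from dataclasses import dataclass, asdict, astuple, replace


@dataclass(order=True)
class Person:
    name: str
    age: int
    address: str


holmes = Person('Sherlock Holmes', 60, '221B Baker Street')
watson = Person('John Watson', 58, '221B Baker Street')

# eq=True is the default: instances compare field by field,
# but never as equal to a plain tuple.
print(holmes == Person('Sherlock Holmes', 60, '221B Baker Street'))  # True
print(holmes == ('Sherlock Holmes', 60, '221B Baker Street'))  # False

# order=True adds <, <=, > and >=, comparing fields in declaration order.
first = sorted([holmes, watson])[0]
print(first.name)  # John Watson

# Conversions and modified copies.
print(asdict(holmes))
print(astuple(holmes))
older_holmes = replace(holmes, age=61)
print(older_holmes)
```

Unlike a named tuple, the resulting object is also mutable by default: holmes.age = 61 would simply work.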

Conclusion

OOP does not have much of a place in simple, procedural programs. When necessary, however, it can add a lot of functionality to your data structures while making code easier to scale and maintain. For everyday scripting (as in many data science tasks), a dict, a named tuple or the simplification provided by the @dataclass decorator are all good alternatives to a custom class if there is no need to put specific data and functions together.

Let me know your thoughts and comments. This is my first article on programming and any criticism is much appreciated 😄

What do you think about OOP when not building complex applications? Do you think functional programming or other programming styles can scale just as well?

Cover image by Ross Sneddon on Unsplash.