Having started studying programming from a data science perspective, I had a lot of trouble understanding not exactly what object-oriented programming (OOP) was, but why I should use it. After all, the code I wrote seemed to do just fine without custom classes and objects.
Inspired by an introduction to OOP in Python that I watched at a recent online conference (in Brazilian Portuguese) – congrats on the talk, Maria Clara! – I decided to write an introduction to OOP that does not focus on syntax or Python perks (I will assume you already know fairly well how to declare classes in Python), but on when OOP should be used. I will present: (1) what kind of code and issues OOP was created for, (2) whether it has a place in non-complex applications and (3) what features Python offers, besides ordinary class declarations, that are enough for most projects.
Why OOP in the first place?
OOP was developed as a strategy to organize code in large and complex applications.
Let's say we are making an application in a bank that will read the balance of an account from a database and decrease it under request of a user, if there are enough funds. It could follow this very simplified layout:
from db_connection import db_connection

def get_balance(db_connection, account_no):
    ...
    return balance

def withdraw(balance, amount):
    if amount <= balance:
        return balance - amount
    else:
        raise Exception('Insufficient funds')

def update_balance(db_connection, account_no, new_balance):
    ...

if __name__ == '__main__':
    account_no = input('Type account number: ')
    to_withdraw = float(input('Type amount to be withdrawn: '))
    old_balance = get_balance(db_connection, account_no)
    new_balance = withdraw(old_balance, to_withdraw)
    update_balance(db_connection, account_no, new_balance)
What if we want to allow a certain category of bank accounts to withdraw more money than available in the balance under a certain interest rate, as a loan? Our code quickly becomes way more complex:
from db_connection import db_connection

ACCOUNT_CATEGORIES_ALLOWED_TO_LOAN = set(...)

def get_balance(db_connection, account_no):
    ...
    return balance

def get_account_category(db_connection, account_no):
    ...
    return account_category

def withdraw(balance, amount, is_allowed_to_loan):
    if not is_allowed_to_loan and amount > balance:
        raise Exception('Insufficient funds')
    else:
        return balance - amount

def update_balance(db_connection, account_no, new_balance):
    ...

def register_loan(db_connection, account_no, amount_loaned):
    ...

if __name__ == '__main__':
    # Get account info
    account_no = input('Type account number: ')
    to_withdraw = float(input('Type amount to be withdrawn: '))
    acct_category = get_account_category(db_connection, account_no)
    # Withdrawal operation
    old_balance = get_balance(db_connection, account_no)
    is_allowed_to_loan = acct_category in ACCOUNT_CATEGORIES_ALLOWED_TO_LOAN
    new_balance = withdraw(old_balance, to_withdraw, is_allowed_to_loan)
    # Updates
    update_balance(db_connection, account_no, new_balance)
    if new_balance < 0:
        register_loan(db_connection, account_no, new_balance)
Now imagine that there could be many different bank account categories, each of them with different consequences for different operations and available funds. The code base would quickly become spaghetti code: a bunch of different functions and flow control statements that is difficult to understand and to maintain.
That is where OOP steps in. It associates a specific set of data with specific functions to act on them, in a way that the relation between the information your code is dealing with and what it is able to do with it is very clear. Let's check how this is done.
Abstraction
Code in OOP is organized through abstractions of real-life objects. In this sense, the bank account in our application example would be a class, a "thing" with its own characteristics (the data, called "attributes" in OOP) and actions (the "methods"), just like it is in the real world:
class Account:
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        if amount <= self.balance:
            self.balance -= amount
        else:
            raise Exception('Insufficient funds')
Just from taking a quick look at the snippet above, you can tell that (1) every account has a certain balance and (2) every account can receive a "withdraw" action. This is different from what we had in our previous spaghetti application: if we were to change anything regarding bank accounts in our system, we would have to search through all the code to find where the changes should be made – and we would have to hope that we were making the change in all the necessary places. Classes keep everything related to an account in one place.
Our application could now be simplified to this:
from db_connection import db_connection
from classes.accounts import Account

def get_account(db_connection, account_no) -> Account:
    ...
    return Account(balance)

def update_database(db_connection, account_obj: Account):
    ...

if __name__ == '__main__':
    account_no = input('Type account number: ')
    to_withdraw = float(input('Type amount to be withdrawn: '))
    account = get_account(db_connection, account_no)
    account.withdraw(to_withdraw)
    update_database(db_connection, account)
Notice that the actions that change the data related to the account (ie, the account's state) are not present in this script anymore. The act of withdrawing money can now live in a separate script, where the definition of the Account class is - a package called "classes", with a script called "accounts.py", for example. Any change related to what happens when money is withdrawn from an account should be made in that separate script; any change related to how a user withdraws money (what information is requested, for example) should be made in our main script.
If you paid attention to the type annotations, you may have noticed that the database-related functions now deal with Account objects directly. This makes it easier if, in addition to withdrawing money, we also want the user to be able to call other methods from the Account class - that would just require the addition of some more lines, with no need to instantiate new objects.
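As a minimal sketch of that last point, suppose the Account class also gained a deposit method. That method is a hypothetical addition made only for illustration, and the Account(100) call stands in for get_account. The same object, fetched once, can then serve several operations:

```python
class Account:
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        if amount <= self.balance:
            self.balance -= amount
        else:
            raise Exception('Insufficient funds')

    def deposit(self, amount):
        # Hypothetical extra method, not part of the class shown in the text.
        if amount <= 0:
            raise ValueError('Deposit must be positive')
        self.balance += amount


# Stand-in for get_account(db_connection, account_no)
account = Account(100)

# Two operations on the same object; no new instance is needed.
account.deposit(50)
account.withdraw(30)
print(account.balance)  # 120
```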
Encapsulation
Our Account class can have its balance easily edited during runtime. If we do account.balance = 0.0005, the balance would change, even though that would be a strange amount for an ordinary account in dollars.
That is why it is recommended that the attributes of a class be encapsulated, ie, hidden from the outside world (the rest of the code). In Python, this can be done with the help of the @property decorator (or, alternatively, with the convention of naming attributes with leading underscores [1]):
class Account:
    def __init__(self, balance):
        # This assignment goes through the setter below,
        # so the rules apply from the moment the object is created.
        self.balance = balance

    def withdraw(self, amount):
        if amount <= self.balance:
            self.balance -= amount
        else:
            raise Exception('Insufficient funds')

    @property
    def balance(self):
        # Nothing special about getting the balance;
        # we will just return it.
        return self._balance

    @balance.setter
    def balance(self, new_value):
        # When changing the balance, however,
        # we want to enforce certain rules.
        # Comparing against the rounded value avoids
        # floating-point surprises with values such as 1.1.
        if round(new_value, 2) != new_value:
            raise Exception('Balance can only have up to two decimal places')
        else:
            self._balance = new_value
Now, any time the balance attribute of an Account object is set to a value that breaks the rules defined in the Account class itself, an exception is raised:
>>> acct = Account(55.663)
...
Exception: Balance can only have up to two decimal places
Encapsulation keeps the implementation details of the attributes inside the code of the class itself. Instead of checking in our main script whether the new balance is a valid value, this check is reserved to the class declaration. Again, this results in more organized code.
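The alternative mentioned earlier, the leading-underscore convention, can be sketched as follows. This is only an illustration under one assumption: that we want the balance to be readable from outside but changeable only through methods such as withdraw. Leaving out the setter is what makes the property read-only.

```python
class Account:
    def __init__(self, balance):
        # The leading underscore signals "internal use only" by convention;
        # Python itself does not block access to self._balance.
        self._balance = balance

    @property
    def balance(self):
        # No setter is defined, so the attribute is read-only from outside.
        return self._balance

    def withdraw(self, amount):
        if amount > self._balance:
            raise Exception('Insufficient funds')
        self._balance -= amount


acct = Account(55)
acct.withdraw(5)
print(acct.balance)  # 50

try:
    acct.balance = 0.0005
except AttributeError as error:
    print('Rejected:', error)
```

Note that acct._balance = 0.0005 would still work: the underscore is a request to other programmers, not a lock.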
Inheritance
Inheritance is a nice feature of OOP that allows classes to be related to each other. When one class inherits from another, all the attributes and methods of the parent class are automatically available to it, with no code repetition being necessary. In our bank application, this allows different Account types to be easily implemented, such as a HighIncomeAccount type:
class HighIncomeAccount(Account):
    pass
Just the lines of code above are enough to create a different data structure that has the same attributes and methods as the main Account class (and whose objects are recognized as instances of it, although "indirectly"), while still being recognized as an object of a different type:
>>> simple_account = Account(55)
>>> high_income_account = HighIncomeAccount(99955)
>>> all(hasattr(acct, 'withdraw')
...     for acct in (simple_account, high_income_account))
True
>>> isinstance(high_income_account, Account)
True
>>> type(simple_account) is type(high_income_account)
False
In our application, we would have to change our get_account function to create either an Account or a HighIncomeAccount object depending on the case. However, besides that change, the rest of the code would be able to continue calling account.withdraw in the same way as before. This is how OOP programs are seen to work: as "messages" (such as the withdraw order) being transmitted from one part of the code to another.
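One possible shape for that change to get_account is sketched below. The FAKE_DB dictionary and the ACCOUNT_CLASSES mapping are hypothetical stand-ins invented for this example (the real function would query the database), and the category names are made up:

```python
class Account:
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        if amount <= self.balance:
            self.balance -= amount
        else:
            raise Exception('Insufficient funds')


class HighIncomeAccount(Account):
    pass


# Hypothetical stand-in for the database: account number -> (category, balance)
FAKE_DB = {
    '001': ('simple', 55),
    '002': ('high_income', 99955),
}

# Hypothetical mapping from a stored category to the class that represents it
ACCOUNT_CLASSES = {
    'simple': Account,
    'high_income': HighIncomeAccount,
}


def get_account(db, account_no) -> Account:
    category, balance = db[account_no]
    account_class = ACCOUNT_CLASSES[category]
    return account_class(balance)


for number in ('001', '002'):
    account = get_account(FAKE_DB, number)
    account.withdraw(5)  # the same call, whatever the concrete type is
    print(type(account).__name__, account.balance)
```

The loop at the end never checks which kind of account it received; only get_account knows about the categories.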
Polymorphism
Inheritance can be better used in our application by taking advantage of polymorphism: the same method can produce different results depending on which object it is called on. We can, for example, change how withdraw works for a HighIncomeAccount:
class HighIncomeAccount(Account):
    def withdraw(self, amount):
        diff = self.balance - amount
        if diff >= 0:
            super().withdraw(amount)
        else:
            self.amount_loaned = diff
            self.balance = diff
That way, the exception regarding insufficient funds is raised only on Account objects, but not on HighIncomeAccount objects:
>>> simple_account = Account(55)
>>> high_income_account = HighIncomeAccount(99955)
>>> simple_account.withdraw(99999999)
...
Exception: Insufficient funds
>>> high_income_account.withdraw(99999999)
>>> high_income_account.balance
-99900044
And, once again, our main script representing the user interaction can remain unchanged (besides the database interactions, which should be updated to consider the new amount_loaned attribute). All the logic regarding the bank accounts is concentrated in the class definitions. The code base as a whole is, therefore, much easier to read and maintain.
Using OOP in simpler applications
All the above makes a lot of sense if you are dealing with code for complex systems. It is a different reality, however, if you write code for exploratory data analysis, for example, which is much more linear: given a certain dataset, tasks are executed one after another in order to provide certain insights (results). In this case, classes may not be necessary, as your code may not have to deal with different data structures. If everything is a DataFrame and all your functions can act on any DataFrame, there is not much reason to spend your time creating classes and declaring different methods. Many of the features of OOP, such as inheritance and polymorphism, would in fact not be useful at all.
As a rule of thumb, creating custom classes is useful when you need to associate specific data with specific actions. That was the case in the application example above: we needed a way to associate a "balance" with a certain "withdraw" action. As a different example, it can also be useful if we are building a scraper that collects information from different sources or in different ways, such as a scraper of a hospital database that also looks up the distance between one hospital and another and checks a different database for the number of beds in each hospital:
class Hospital:
    def __init__(self, address):
        self.address = address
        self.beds_no = self.get_number_of_beds()

    def get_number_of_beds(self):
        ...
        return beds_no

    def get_distance(self, to: 'Hospital'):
        ...
        return distance
Using such a class in your program can make information be transmitted much more easily between different parts of your code. It is clearer and shorter to call one_hospital.get_distance(to=another_hospital) when necessary than to retrieve an address, call a separate function like get_distance(origin=one_address, to=another_address) and deal with scattered information.
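To make the call site concrete, here is a small runnable sketch. It rests on assumptions that are not in the class above: the hospital stores latitude and longitude (hypothetical attributes, used instead of an address so that no geocoding service is needed), and the distance is the great-circle distance given by the haversine formula.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


class Hospital:
    def __init__(self, name, latitude, longitude):
        # Coordinates in decimal degrees (hypothetical attributes;
        # the class in the text stores an address instead).
        self.name = name
        self.latitude = latitude
        self.longitude = longitude

    def get_distance(self, to: 'Hospital') -> float:
        # Great-circle distance in kilometers (haversine formula).
        lat1, lon1 = radians(self.latitude), radians(self.longitude)
        lat2, lon2 = radians(to.latitude), radians(to.longitude)
        a = (sin((lat2 - lat1) / 2) ** 2
             + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
        return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Made-up coordinates, one degree of latitude apart.
hospital_a = Hospital('A', 0.0, 0.0)
hospital_b = Hospital('B', 1.0, 0.0)
print(round(hospital_a.get_distance(to=hospital_b), 1))  # 111.2
```

The caller only handles Hospital objects; where the coordinates come from and how the distance is computed stay inside the class.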
Another good application of data and actions being put together is when you need a different custom data type. In Python, data types such as list and dict can be seen as classes with special methods - as with any other class, you can inherit from them and change their behavior. Let's say you need a list that only accepts instances of a dict, and you need to be sure of that, for any obscure reason. Then you can be creative and do:
class ListOfDicts(list):
    def __init__(self):
        # We will not accept iterables as an argument to the constructor,
        # or else ListOfDicts({'a': 'dict'}) will result in ['a'].
        super().__init__()

    def append(self, item):
        self._execute_if_is_dict(super().append, item)

    def insert(self, idx, obj):
        self._execute_if_is_dict(super().insert, idx, obj)

    def __setitem__(self, idx, item):
        self._execute_if_is_dict(super().__setitem__, idx, item)

    def _execute_if_is_dict(self, action, *args):
        # The item to be stored is always the last argument.
        if not isinstance(args[-1], dict):
            raise Exception('Only dicts are accepted as items')
        else:
            action(*args)
This approach should only be used if you are really sure of which methods you need to override. It shows, however, how Python can be flexible. If you ever catch yourself asking what if a certain data type could behave in a specific way, do some research: it is probable that someone has already written a custom class that does exactly what you need.
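The warning about knowing which methods to override can be illustrated with a condensed variant of the class (only append is overridden here, and TypeError is used instead of a generic Exception; both are choices made for this sketch). Any list method that was not overridden keeps its original behavior and skips the check:

```python
class ListOfDicts(list):
    def __init__(self):
        super().__init__()

    def append(self, item):
        if not isinstance(item, dict):
            raise TypeError('Only dicts are accepted as items')
        super().append(item)


records = ListOfDicts()
records.append({'name': 'Sherlock Holmes'})

try:
    records.append('not a dict')
except TypeError as error:
    print('Rejected:', error)

# extend() was not overridden, so the original list implementation runs
# and the check in append() is never reached.
records.extend(['not a dict'])
print(records)  # [{'name': 'Sherlock Holmes'}, 'not a dict']
```

The same applies to other ways of adding items, such as += or slice assignment, unless they are also overridden.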
Beyond classes: useful data structures
You may be tempted to create a class to encapsulate a simple set of information, for example:
>>> class Person:
...     def __init__(self, name, age, address):
...         self.name = name
...         self.age = age
...         self.address = address
...
>>> holmes = Person('Sherlock Holmes', 60, '221B Baker Street')
Do not do it this way for simple data structures like this one. You can aggregate data like that in a simple dict, and that will not leave readers wondering whether any special method is attached to your Person class – which is, actually, very simple:
>>> holmes = {
...     'name': 'Sherlock Holmes',
...     'age': 60,
...     'address': '221B Baker Street'
... }
This will equally allow you to retrieve information from the "holmes" object in a very direct way. It is true, however, that you may need a template, ie, a way of ensuring that every possible Person has three different attributes associated with it: a name, an age, and an address. That is the use case of a named tuple.
Named tuples
A named tuple is, like a tuple, an immutable ordered collection. However, its items can be retrieved based on a named index, just like in a dict. In the end, they are like an immutable dict that must be created from a specific template:
>>> from collections import namedtuple
>>> Person = namedtuple('Person', ['name', 'age', 'address'])
>>> holmes = Person('Sherlock Holmes', 60, '221B Baker Street')
Instantiating an object from a named tuple is very similar to instantiating an object from a custom class. Accessing the attributes is also done with dot notation and, besides all that, printing the object will exhibit a user-friendly representation:
>>> holmes.age
60
>>> print(holmes)
Person(name='Sherlock Holmes', age=60, address='221B Baker Street')
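A few more properties of named tuples can be checked with the short sketch below. It uses only the standard collections.namedtuple API; the second person's data is made up for the example.

```python
from collections import namedtuple

Person = namedtuple('Person', ['name', 'age', 'address'])
holmes = Person('Sherlock Holmes', 60, '221B Baker Street')

# Items are available by position as well as by name.
print(holmes[0] == holmes.name)  # True

# Fields cannot be reassigned.
try:
    holmes.age = 61
except AttributeError as error:
    print('Rejected:', error)

# _replace returns a new object; the original is left untouched.
older_holmes = holmes._replace(age=61)
print(holmes.age, older_holmes.age)  # 60 61

# _asdict converts the named tuple into a regular dict.
print(holmes._asdict()['address'])  # 221B Baker Street

# A missing field is an error at creation time: the template is enforced.
try:
    Person('John Watson', 58)
except TypeError as error:
    print('Rejected:', error)
```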
Data classes
Named tuples may present issues in some applications:
- A named tuple can be compared as equal to another that carries the same fields. The holmes object we created could be considered equal to a named tuple Character(name='Sherlock Holmes', age=60, address='221B Baker Street'), for example;
- In the same way, a named tuple is also considered equal to a tuple carrying the same fields: holmes == ('Sherlock Holmes', 60, '221B Baker Street') returns True.
- Named tuples are iterable. Part of your code may iterate on a Person named tuple and expect it to return a name, an age and then an address; if you add a different field to the named tuple definition (a country attribute, for example), you may break this other part unwillingly.
- You may want to change the values of an attribute in the named tuple. However, as tuples are immutable, that is not possible.
- You may want more complexity. Maybe you want to query Wikipedia before creating your holmes object, and then save the resulting link to the named tuple itself. This is not possible, as you cannot change the methods underlying the named tuple (unless you create a new custom class yourself).
- Composing the attributes in a named tuple based on other named tuples (as if doing class inheritance) is complicated and may result in obscure code.
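The first three issues in the list above can be reproduced in a few lines. This is a sketch using the Character name from the list; PersonWithCountry and the country value are hypothetical additions for the example:

```python
from collections import namedtuple

Person = namedtuple('Person', ['name', 'age', 'address'])
Character = namedtuple('Character', ['name', 'age', 'address'])

holmes = Person('Sherlock Holmes', 60, '221B Baker Street')
holmes_character = Character('Sherlock Holmes', 60, '221B Baker Street')

# Equality only looks at the values, not at the type.
print(holmes == holmes_character)  # True
print(holmes == ('Sherlock Holmes', 60, '221B Baker Street'))  # True

# Code that unpacks a Person relies on the number and order of fields.
name, age, address = holmes

# Adding a field to the definition breaks that unpacking.
PersonWithCountry = namedtuple(
    'PersonWithCountry', ['name', 'age', 'address', 'country'])
holmes_uk = PersonWithCountry(*holmes, 'United Kingdom')
try:
    name, age, address = holmes_uk
except ValueError as error:
    print('Unpacking broke:', error)
```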
These issues call for a more complex data structure - a problem that is solved with the use of classes. However, much of the work related to the creation of a class to hold different attributes was made easier in Python 3.7 with the addition of the @dataclass decorator (see the documentation and the discussion reported in PEP 557). Its basic use eliminates some of the boilerplate necessary when creating a class, while adding a lot of advanced functionality for when you need something more complex than both a dict and a named tuple:
from dataclasses import dataclass, field

@dataclass
class Person:
    name: str
    age: int
    address: str
    wikipedia_page: str = field(init=False, repr=False)

    def __post_init__(self):
        self.wikipedia_page = get_wikipedia_page(self.name)

def get_wikipedia_page(query):
    ...
    return page_address
The code above is roughly equivalent to this one (the decorator also generates an __eq__ method, which is omitted here):
class Person:
    def __init__(self, name: str, age: int, address: str):
        self.name = name
        self.age = age
        self.address = address
        self.wikipedia_page: str = get_wikipedia_page(self.name)

    def __repr__(self):
        # Build a new dict, so that the attributes of the object itself
        # are left untouched when wikipedia_page is excluded.
        attrs_dict = {k: v for k, v in vars(self).items() if k != 'wikipedia_page'}
        attrs_as_str = ', '.join(f'{k}={v!r}' for k, v in attrs_dict.items())
        return f'{type(self).__name__}({attrs_as_str})'

def get_wikipedia_page(query):
    ...
    return page_address
What the @dataclass decorator does is look for the class variables that have a type annotation and generate both an __init__ constructor and a __repr__ method from them. There is also extra functionality: the field function, for example, is telling the decorator that this field should not be a parameter of __init__ (it is filled in by the __post_init__ method instead) and that it should not show up in the __repr__ result. The dataclasses module also offers finer control over how objects are instantiated (see the options of the decorator and of the field function), compared to one another (see the eq and order parameters of the decorator) and transformed into different data types (the asdict and astuple functions) or into a new object with some fields changed (the replace function). This is a good amount of fine tuning in a much simpler code structure, as seen from the reduction of lines above.
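A brief sketch of those extras follows, using only functions from the standard dataclasses module. The Wikipedia field is left out to keep the example self-contained, and the second person's data is made up:

```python
from dataclasses import asdict, astuple, dataclass, replace


@dataclass(order=True)
class Person:
    name: str
    age: int
    address: str


holmes = Person('Sherlock Holmes', 60, '221B Baker Street')
watson = Person('John Watson', 58, '221B Baker Street')

# eq=True is the default: objects are compared field by field,
# but only against objects of the same class.
print(holmes == Person('Sherlock Holmes', 60, '221B Baker Street'))  # True
print(holmes == ('Sherlock Holmes', 60, '221B Baker Street'))  # False

# order=True adds <, <=, > and >=, comparing the fields in declaration order.
print(watson < holmes)  # True, since 'John Watson' < 'Sherlock Holmes'

# Conversion to built-in types.
print(asdict(holmes))
print(astuple(watson))

# replace() builds a new object with some fields changed.
older_holmes = replace(holmes, age=61)
print(holmes.age, older_holmes.age)  # 60 61
```

Unlike with named tuples, the comparison with a plain tuple is False here, which addresses the first two issues listed earlier.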
Conclusion
OOP does not have much space in simple, procedural programs. When necessary, however, it can add a lot of functionality to your data structures while making code easier to scale and maintain. For everyday scripting (as in many data science tasks), a dict, a named tuple or the simplification provided by the @dataclass decorator are all good alternatives to writing a custom class by hand if there is no need to put together specific data and functions.
Let me know your thoughts and comments. This is my first article on programming and any criticism is much appreciated 😄
What do you think about OOP when not building complex applications? Do you think functional programming or other programming styles can scale just as well?
Cover image by Ross Sneddon on Unsplash.
[1] Check the use of single and double leading underscores here: https://dbader.org/blog/meaning-of-underscores-in-python. This is a convention adopted in PEP 8, the style guide of the Python language: https://www.python.org/dev/peps/pep-0008/.