Python data model
Why len
and not list.length
?
This is a recurrent question when it comes to writing Python code, and if you are familiar with OOP, you probably asked the same question before.
To talk about this, we should take a look into special methods some other languages, like Ruby, call them magic methods.
These methods are not intended to be invoked directly by the user; instead, the interpreter invokes them.
As a result, we can deduce that the truth of the matter is len(sequence)
built-in function calls sequence.__len__()
behind the scenes.
That being said, our question remains unanswered, but actually, Python has a pretty good explanation for this behavior.
A user defined __len__
method to calculate the length of a sequence might be very custom, and vary from user to user.
But __len__
method of Python built-in types, like list
, tuple
, str
or any other sequence is more complicated, and faster, than that.
Since Python is written in C, invoking len
function with a built-in sequence type, Python, rather than iterating over the sequence and count how many items are inside it, looks for a C data structure field called ob_size
, which is way faster.
This is because, __len__
method of the built-in sequences is written to do so.
We can conclude, Python is quite flexible; therefore, we are free to create any custom type, customizing each special method of our classes as much as we need.
In this article, I will show you how defining a couple of special methods, you will be able to create classes which behaves as any built-in Python sequence.
Writing a Playlist class
To show you, how to create a Playlist
class which will behave as a Python builtin sequence
we will create a new Song
namedtuple , after all, what is a Playlist
without songs?
Why a namedtuple
and not a class
?
Since my Song
is only a “vessel” that don’t require any special logic and its data will remain the same during all the program execution, a namedtuple
is a better option to use, they also have a nice human-readable string representation, which is easier to read when we use print
.
from collections import namedtuple
song_attrs = ("name", "album", "artist")
Song = namedtuple("Song", song_attrs)
Ok, now our Song
type is done, then, let’s start with the Playlist
class.
from typing import List
class Playlist:
def __init__(self):
self.__songs: List[Song] = []
So far, we created a Playlist
class which has a songs
private attribute, this attribute is specifically a list
of Song
instances.
The next step is to implement a __len__
and a __getitem__
methods as well, Python does not use interfaces
, but it actually requires implementing at least those two methods in order to considere your class as a sequence
.
from typing import List
class Playlist:
def __init__(self):
self.__songs: List[Song] = []
@property
def songs(self) -> List[Song]:
return self.__songs
def __len__(self) -> int:
return len(self.songs)
def __getitem__(self, index: int) -> Song:
return self.songs[index]
def add_song(self, song: Song):
self.songs.append(song)
In the above’s example, we defined a property
which is actually optional, this is an equivalent of a getter
, the advantage of using properties is, since we didn’t define a setter
trying to set the songs
property will raise an exception rather than allowing the user to override the full attribute, making it “read-only”.
On the other hand, we defined a __len__
method, which actually returns the length of the list
of songs, then we can now use the next snippet.
my_playlist = Playlist()
album = "Soft Sounds"
artist = "Delta Sleep"
names = "Strongthany,Dotwork,Camp adventure,Sans Soleil"
for name in names.split(","):
my_playlist.add_song(Song(name, album, artist))
len(my_playlist)
# Out: 4
But we defined a __getitem__
as well, this method, “unlocks” the sequence[index]
syntax; therefore, we can access any of our songs by index.
my_playlist[0].name
# Out: Strongthany
my_playlist[-1]
# Out: Song(name='Sans Soleil', album='Soft Sounds', artist='Delta Sleep')
But that’s not all! It also “unlocks” slicing
making us able to use the following syntax:
my_playlist[1:3]
# Out: [Song(name='Dotwork', album='Soft Sounds', artist='Delta Sleep'), Song(name='Camp adventure', album='Soft Sounds', artist='Delta Sleep')]
my_playlist[::-1]
# Out: [Song(name='Sans Soleil', album='Soft Sounds', artist='Delta Sleep'), Song(name='Camp adventure', album='Soft Sounds', artist='Delta Sleep'), Song(name='Dotwork', album='Soft Sounds', artist='Delta Sleep'), Song(name='Strongthany', album='Soft Sounds', artist='Delta Sleep')]
This ability, allows us to iterate over our sequence
using a for
sentence, as if the Playlist
instances were built-in list
instances, this is because, we actually delegated all the sequence
implementation to our __songs
attribute.
for song in my_playlist:
print(song.name, "by ", song.artist)
# Out:
# Strongthany by Delta Sleep
# Dotwork by Delta Sleep
# Camp adventure by Delta Sleep
# Sans Soleil by Delta Sleep
Although, no, it is not all yet! Many of the Python built-in modules, require a sequence
to work, and, since our Playlist
class is technically a sequence
, we can use them with our instances.
for song in reversed(my_playlist):
print(song.name, "by ", song.artist)
# Out:
# Sans Soleil by Delta Sleep
# Camp adventure by Delta Sleep
# Dotwork by Delta Sleep
# Strongthany by Delta Sleep
from random import choice
choice(my_playlist)
# Out: Song(name='Dotwork', album='Soft Sounds', artist='Delta Sleep')
Conclusion
Python data model provides much flexibility, I haven’t worked (yet) with other language that actually allows you to emulate its built-in types, Go slices
are useful, yes, but you cannot create a custom structure that fulfills your program necessities and behave as a built-in slice
.
As you might have guessed, yes, you can emulate any other built-in type in Python.
For this article, I based myself on “Fluent Python, 2nd Edition” a great book written by Luciano Ramalho one of my biggest influences to become a Pythonista, please take a look into the book! You’ll learn many new things, just as I did.
Top comments (0)