When you need to describe data, you commonly use classes (instead of dicts), and Python offers a very elegant way to manage this.
A Data Object represents data that can be saved in a database. This concept is at the heart of SQLAlchemy, but, as the name suggests, SQLAlchemy is meant for SQL databases in general. Today there are also document databases (like MongoDB, ArangoDB, RethinkDB, which I love so much, or even PostgreSQL through its JSON types). With them, a piece of data is like a "structured and typed document" that you save "as is". That's not the same paradigm, and not the same controls. There are advantages and disadvantages, but we won't debate that here.
Today's topic: Python can help a lot in defining our "data classes" in a "generic" way, with controls and initialization.
Until now, you have probably been using a "pure plain object"… this is about to change.
For example, to describe a "user":
class User:
    username = ""
    email = ""
    password = ""
    level = 0
Of course, you need a bit more control: a constructor to initialize the properties, maybe a setter to handle the password and keep it out of the JSON representation...
And here comes "dataclasses", a very cool module from Python's standard library (available since Python 3.7).
Dataclasses?
If you read its documentation page, you'll discover a treasure. The module offers an easy-to-use yet powerful decorator, plus helper functions, to manage a "data object" defined as a class.
A dataclass automatically gets a constructor, a readable representation and an equality operator, all built from its fields. Only the class attributes that have a type annotation become fields; an attribute without an annotation is left alone, as a plain class attribute for internal use. This makes it very clear what is part of the data and what is not.
Let me show you the very basic way to use it, using annotations instead of values:
from dataclasses import dataclass

@dataclass
class User:
    username: str
    email: str
    password: str
    level: int
    example = "foo"  # This is not a field, no annotation
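A quick way to check which attributes the decorator actually treats as fields is dataclasses.fields(). Here is a minimal sketch, repeating the class above so it runs on its own:

```python
from dataclasses import dataclass, fields


@dataclass
class User:
    username: str
    email: str
    password: str
    level: int
    example = "foo"  # no annotation: a plain class attribute, not a field


# fields() returns one Field object per annotated attribute, in definition order
field_names = [f.name for f in fields(User)]
print(field_names)  # ['username', 'email', 'password', 'level']

# "example" still exists, but only as a class attribute shared by all instances
print(User.example)  # foo
```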
Now the class has an __init__ method that creates the object from keyword arguments (positional ones work too), a __repr__() method, and an __eq__ method that overrides the equality operator.
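To make this concrete, here is a simplified, hand-written approximation of the three methods generated for User. It is only a mental model (the class name HandWrittenUser is made up for illustration), not the code the dataclasses module actually produces, which also deals with defaults and recursion protection in __repr__:

```python
class HandWrittenUser:
    """Roughly what @dataclass generates for User (simplified sketch)."""

    def __init__(self, username: str, email: str, password: str, level: int):
        # one parameter per field, in definition order
        self.username = username
        self.email = email
        self.password = password
        self.level = level

    def __repr__(self) -> str:
        # class name followed by every field, using repr() for each value
        return (
            f"{self.__class__.__qualname__}(username={self.username!r}, "
            f"email={self.email!r}, password={self.password!r}, level={self.level!r})"
        )

    def __eq__(self, other):
        # only compare with the same class; otherwise let Python try the other operand
        if other.__class__ is not self.__class__:
            return NotImplemented
        # compare all fields in order, as if they were tuples
        return (self.username, self.email, self.password, self.level) == (
            other.username, other.email, other.password, other.level
        )
```

One side effect it shares with the real thing: defining __eq__ without __hash__ makes instances unhashable, which is also what a default dataclass (eq=True, frozen=False) does.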
There is more to see later, but let's check the usage:
user1 = User(
    username="John",
    email="me@test.com",
    password="foobar",
    level=0,
)
user2 = User(
    username="John",
    email="me2@test.com",
    password="foobar",
    level=0,
)
# show a nice object representation
print(user1)
# try comparison
print(user1 == user2)  # False, email differs
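For reference, this is what the generated __repr__ prints for user1, and what == returns when every field matches (user3 is an extra object added only for this comparison):

```python
from dataclasses import dataclass


@dataclass
class User:
    username: str
    email: str
    password: str
    level: int


user1 = User(username="John", email="me@test.com", password="foobar", level=0)
user3 = User(username="John", email="me@test.com", password="foobar", level=0)

print(user1)
# User(username='John', email='me@test.com', password='foobar', level=0)

# __eq__ compares the fields in order, so two distinct objects with equal fields are equal
print(user1 == user3)  # True
print(user1 is user3)  # False: they are still two different objects
```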
OK, that's nice, but we can do more… a lot more!
Dataclass fields
Let's imagine we want to create a user without setting its level, because we decided that the level should be 0 by default.
The issue is that the generated __init__ constructor requires it, so for now we must provide a value every time we build the object.
The dataclasses module provides a function named field() that makes this easy.
from dataclasses import dataclass, field

@dataclass
class User:
    username: str
    email: str
    password: str
    level: int = field(default=0)  # set field as optional

# test
user1 = User(
    username="joe",
    email="me@test.com",
    password="123456")  # no level provided
# but the level is set to 0
print(user1.level)
# >> 0
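A short sketch of two details about defaults, using only the standard library: field(default=0) behaves like the plain level: int = 0, and a field without a default can't come after one that has a default. The class names here are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class WithField:
    username: str
    level: int = field(default=0)


@dataclass
class WithPlainDefault:
    username: str
    level: int = 0  # a plain default is shorthand for field(default=0)


print(WithField("joe").level, WithPlainDefault("joe").level)  # 0 0

# A field without a default cannot follow a field that has one:
# the decorator rejects the class as soon as it is defined.
broken_rejected = False
try:
    @dataclass
    class Broken:
        level: int = 0
        username: str
except TypeError as exc:
    broken_rejected = True
    print("TypeError:", exc)
```

field() becomes necessary once you need options that a plain default can't express, such as kw_only, which shows up later in this article.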
And that's not all: there is a lot more we can do.
No need for a constructor: use "post init"
Sometimes you want to do something when an object is instantiated, and the first reaction is to write a constructor. But here the dataclass decorator already provides one, and it handles default values well.
That's why you can define a __post_init__ method instead: the generated __init__ calls it right after assigning the fields.
For example, let's check the password length.
""" User management """
from dataclasses import dataclass, field

@dataclass
class User:
    """A user object"""
    username: str
    email: str
    password: str
    level: int = field(default=0)  # set field as optional

    def __post_init__(self):
        if len(self.password.strip()) < 8:
            raise ValueError("Password must be at least 8 characters long")
That's enough for some basic validation. Of course, you could also handle this with a property setter; this is just an example.
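Here is the validation in action, as a self-contained sketch: a long enough password passes, while the "123456" used in the earlier example is now rejected when the object is built.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """A user object"""
    username: str
    email: str
    password: str
    level: int = field(default=0)

    def __post_init__(self):
        # called by the generated __init__ once every field has been assigned
        if len(self.password.strip()) < 8:
            raise ValueError("Password must be at least 8 characters long")


ok = User(username="joe", email="me@test.com", password="12345678")
print(ok.level)  # 0

short_rejected = False
try:
    User(username="joe", email="me@test.com", password="123456")
except ValueError as exc:
    short_rejected = True
    print(exc)  # Password must be at least 8 characters long
```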
When dataclasses become central
You may think it's only a gadget, an overly simple way to manage classes that represent data.
Now it's time to see how this simplicity gives you control.
I will show how this can help you build an API with Quart and a bit of Quart Schema. If you already use Flask, that won't be a problem, as Quart is a "fork" of Flask that makes it asynchronous.
Without this, you would probably do something like this:
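from quart import jsonify

@api.route("/user")
async def user():
    user = User(
        username="John",
        email="me@test.com",
        password="12345678",  # at least 8 characters, or __post_init__ raises
    )
    # build the JSON response by hand, field by field
    return jsonify({
        "username": user.username,
        "email": user.email,
        "level": user.level,
    })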
Of course, you probably wrote methods to convert the data to JSON or to a dict. But with dataclasses plus Quart Schema, it's much more explicit.
First, you must wrap the application with QuartSchema:
from quart import Quart
from quart_schema import QuartSchema, validate_response
api = Quart(__name__)
QuartSchema(api)
Then you can return the object directly, with no need for jsonify or any manual conversion!
@api.route("/user")
async def user() -> User:
    # Quart Schema converts the returned dataclass to JSON
    return User(
        username="John",
        email="me@test.com",
        password="12345678",
    )
That works:
http -b :5000/user
{
    "email": "me@test.com",
    "level": 0,
    "password": "12345678",
    "username": "John"
}
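Quart Schema's internals aren't covered in this article, so the following is only a rough, stdlib-only mental model (not its implementation) of the conversion it saves you from writing: turn the dataclass into a dict, then into JSON. It reproduces the same payload, password included:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class User:
    username: str
    email: str
    password: str
    level: int = field(default=0)


def to_json(user: User) -> str:
    # turn the dataclass into a plain dict, then into a JSON string
    return json.dumps(asdict(user), sort_keys=True)


payload = to_json(User(username="John", email="me@test.com", password="12345678"))
print(payload)
# {"email": "me@test.com", "level": 0, "password": "12345678", "username": "John"}
```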
OK, but… the password…
Hide fields
The problem, of course, is that the password is sent in the response.
You will hash the password in the database anyway, and you must never send the password back.
There are plenty of possible solutions; I'll propose the one I prefer.
In my design, there are several kinds of data to manage:
- a user: the representation of what I can show to everybody
- a data user: the representation of what I manage in the database
- specific user representations for the "login" or "registration" process
So, here is my example:
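from dataclasses import dataclass, field
from hashlib import sha1

@dataclass
class User:
    """User data"""
    username: str
    email: str
    level: int = field(default=0, kw_only=True)

@dataclass
class DataUser(User):
    """Data class for User in database"""
    password: str = field(kw_only=True)

    def hash_password(self):
        """Hash password with SHA1"""
        # SHA1 keeps the example short; real password storage needs a slow,
        # salted scheme such as hashlib.scrypt or hashlib.pbkdf2_hmac
        self.password = sha1(self.password.encode()).hexdigest()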
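The DataUser class inherits its fields from User. We must set kw_only=True (a field option available since Python 3.10) so that the field with a default value (level) doesn't interfere with the fields declared in the derived class: without it, adding a field without a default to the subclass raises a TypeError, because a non-default argument cannot follow a default one.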
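@api.route("/user/<email>")
async def get_user(email: str) -> User:
    """Get a user"""
    # db() is a helper from the author's project (not shown),
    # assumed here to return the stored document as a dict
    user = db("app").table("users").get(email)
    del user["password"]
    return User(**user)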
I want to insist on this: I'm SURE the password will never be sent back in the response, because constructing User() raises an exception (a TypeError) if I pass it a password argument.
And to save a user:
from dataclasses import asdict
from quart import request

@api.route("/user", methods=["POST"])
async def create_user():
    """Create a new user"""
    sent = await request.json
    user = DataUser(**sent)
    user.hash_password()
    # same db() helper as above, assumed to return the stored document
    res = db("app").table("users").insert(asdict(user))
    # check errors... then
    del res["password"]
    return User(**res)
asdict() comes from the dataclasses module: it builds a complete dictionary from your dataclass instance, recursing into nested dataclasses. So it's easy to use with a document database like MongoDB or RethinkDB.
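A small sketch of that recursion, with hypothetical Address and Customer classes: nested dataclasses become nested dicts, and lists are copied into new lists.

```python
from dataclasses import asdict, dataclass


@dataclass
class Address:
    city: str
    country: str


@dataclass
class Customer:
    username: str
    address: Address
    tags: list[str]


customer = Customer("John", Address("Paris", "France"), ["admin"])
doc = asdict(customer)
print(doc)
# {'username': 'John', 'address': {'city': 'Paris', 'country': 'France'}, 'tags': ['admin']}
```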
Note that using validate_request and validate_response from Quart Schema simplifies the method a lot. For example:
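from quart_schema import validate_request, validate_response

@api.route("/user", methods=["POST"])
@validate_request(DataUser)
@validate_response(User)
async def create_user(data: DataUser) -> User:
    """Create a new user"""
    # data is already a validated DataUser built from the JSON body
    data.hash_password()
    res = db("app").table("users").insert(asdict(data))
    # check errors... then
    del res["password"]
    return User(**res)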
Last words
Anyway, what I hope you take away is the value of using dataclass, along with fields(), field(), asdict() and the other helpers, to make your data structures easy to use and manage.