loading...
4Degrees

SQLAlchemy Table Relationships

mpraver1 profile image Moshe Praver ・4 min read

As a company that relies on data-driven intelligence and insights, 4Degrees naturally has a complex database. In a previous post we provided an introductory SQLAlchemy guide to help new interns and employees understand the basics of the SQLAlchemy ORM.

In this post we are going to build on that understanding and dive deeper into how multiple connected tables are mapped to python classes and how one might go about querying data that spans multiple tables.

Table Relationships

The most common table relationship is known as a 1:n relationship. As an example, a user may post to a blog and have multiple blog posts which are stored in the user_blog_post table. In order to accomplish this we define two python classes and add a column called user_id to the blog post table which serves as a link between the two tables.

class user(db.Model):
      id = db.Column(db.Integer, primary_key=True)
      username = db.Column(db.String)
      email = db.Column(db.String)

class user_blog_post(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String)
    body = db.Column(db.String)
        user_id = db.Column(db.Integer)

An astute reader may note that we could technically stop at this point. We have two tables and a column that links the two. While this is technically true, it leaves room for error. What if we accidentally created a user_blog_post table for a user that doesn’t exist?

To solve this problem, we introduce the main tool used in defining table relationships: the ForeignKey. The ForeignKey is an argument accepted by the Column object, which indicates that the value of this column must be constrained to an existing value in another table. In essence, if we intend for our two tables to be related, we ought to be sure that our user_id reference doesn’t refer to a record that doesn’t exist.

user_id = db.Column(db.Integer, ForeignKey('user.id'))

If we now tried to create a row in the user_blog_post table using a user_id that does not exist as an id value in the user table (eg. 55), we get an error and the row is not created:

psycopg2.errors.ForeignKeyViolation: insert or update on table "user_blog_post" violates foreign key constraint "user_blog_post_user_id_fkey"
DETAIL:  Key (user_id)=(55) is not present in table "user".

The existence of a ForeignKey in a table implies a many-to-one relationship between itself and its parent table. Put another way, if we choose to place the constrained user_id value in the user_blog_post class such that there must exist a matching id column in the user class, there can be multiple blog posts for each user, but only one user for each blog post.

This brings us to a second useful tool in SQLAlchemy for table relationships: the db.relationship() method. It is important to understand that this method is not truly necessary for defining table relationships. In fact, as we mentioned above, the only truly necessary item for a table relationship is the column that links two tables (ie. user_id).

Rather, the db.relationship() method leverages the implicit many-to-one relationship created by a ForeignKey column and exposes this linkage between the two tables to the mapped python classes:

blog_posts = db.relationship("user_blog_post", backref='user')

By placing this line within the user class, SQLAlchemy ORM establishes user.blog_posts and user_blog_posts.user attributes that can be accessed in our python environment. What’s more, our python environment understands the many-to-one relationship:

newBlogPost = user_blog_post(title='first', body='first post body', user_id=1)
print(newBlogPost.user)
 None
newUser = user(username='Moshe', email='moshe@4degrees.ai')
print(newUser.blog_posts)
 [ ]

Note, that the user table does not actually have a blog_post column and the user_blog_post table does not actually have a user column. This is clear in the SQLAlchemy code below and the SQL that it outputs:

newUser = user(username='Moshe', email='moshe@4degrees.ai')
newUser.blog_posts = [
    user_blog_post(title='first', body='first post body'),
    user_blog_post(title='second', body='second post body')
]
db.session.add(newUser)
db.session.commit()

SQL
INSERT INTO user (username, email) VALUES (?, ?)
('Moshe', 'moshe@4degrees.ai')
INSERT INTO user_blog_post (title, body, user_id) VALUES (?, ?, ?)
('first', 'first post body', 1)
INSERT INTO user_blog_post (title, body, user_id) VALUES (?, ?, ?)
('second', 'second post body', 1)

Querying Table Relationships

Now that we have an understanding of how to define new table relationships, let’s quickly dive into an exercise that will demonstrate how to query across multiple related tables. Suppose we want to email all users who have written a blog post which contain the words “machine learning” in the body of their blog posts. We will then send them all an invitation to an upcoming machine learning conference. How do we get this list?

The data we need is email in the user table, but the data we need to filter by is body in the user_blog_post table. In order to accomplish this we can use the Query.join() method:

theEmails = user.query.with_entities(user.email).join(user_blog_post).\
    filter(user_blog_post.body.like('%machine learning%')).all()

Interestingly, SQLAlchemy ORM implicitly knows how to join the user and user_blog_post tables, because there is only one ForeignKey between them. In a more complicated situation, that would not be the case and we would explicitly describe the ForeignKey relationship:

theEmails = user.query.with_entities(user.email).join(user_blog_post, user.id == user_blog_post.user_id).filter(user_blog_post.body.like('%machine learning%')).all()

4Degrees

We combine the ability to search your team's network easily and intuitively with automated data entry and intelligent alerts so you never miss the chance to move a relationship or deal forward.

Discussion

markdown guide