loading...

SQLAlchemy Cascading Deletes

zchtodd profile image zchtodd ・4 min read

I thought I’d do a quick write up of something that I’ve found confusing for a while now, but never taken the time to look into. Anyone who works with databases has probably heard of foreign key constraints — used wisely, they keep us from making a total hash of our database. Using the classic example of artists and songs, foreign key constraints can help us make sure our song table isn’t littered with mysterious orphans that have no associated artist.

When deleting an artist we’d usually like to have all of that artist’s songs deleted from the song table simultaneously. This is where the ON DELETE CASCADE option of foreign key constraints comes into play. Given that option, most databases will take care of “cascading” the parent deletion down to the children. Here’s an example from PostgreSQL.

CREATE TABLE artist (
        id SERIAL NOT NULL, 
        PRIMARY KEY (id)
)

CREATE TABLE song (
        id SERIAL NOT NULL, 
        artist_id INTEGER, 
        PRIMARY KEY (id), 
        FOREIGN KEY(artist_id) REFERENCES artist (id) ON DELETE cascade
)

How does this translate into SQLAlchemy land? This is where I had some faulty assumptions.

from sqlalchemy import create_engine, Column, Integer, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, relationship


Base = declarative_base()


class Artist(Base):
    __tablename__ = "artist"

    id = Column(Integer, primary_key=True)
    songs = relationship("Song", cascade="all, delete")


class Song(Base):
    __tablename__ = "song"

    id = Column(Integer, primary_key=True)
    artist_id = Column(Integer, ForeignKey("artist.id"))


if __name__ == "__main__":
    uri = "postgresql://pguser:pguser@localhost/test"
    engine = create_engine(uri, echo=True)

    Base.metadata.create_all(engine)

    Session = sessionmaker(bind=engine)
    session = Session()

    artist = Artist()
    artist.songs.append(Song())

    session.add(artist)
    session.commit()

    session.delete(artist)
    session.commit()

My assumption was that adding the cascade argument to the relationship resulted in the same schema as above. Unfortunately, that’s not… true. To be fair, the SQLAlchemy docs are pretty clear on this point. Basically, what’s happening is that the ORM is managing deletions on your behalf if the delete option is specified on the relationship cascade argument.

If we take a look at the SQL that’s emitted during the above session, this becomes clearer.

2018-06-23 15:16:17,868 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2018-06-23 15:16:17,870 INFO sqlalchemy.engine.base.Engine INSERT INTO artist DEFAULT VALUES RETURNING artist.id
2018-06-23 15:16:17,870 INFO sqlalchemy.engine.base.Engine {}
2018-06-23 15:16:17,873 INFO sqlalchemy.engine.base.Engine INSERT INTO song (artist_id) VALUES (%(artist_id)s) RETURNING song.id
2018-06-23 15:16:17,874 INFO sqlalchemy.engine.base.Engine {'artist_id': 3}
2018-06-23 15:16:17,880 INFO sqlalchemy.engine.base.Engine COMMIT
2018-06-23 15:16:17,887 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2018-06-23 15:16:17,888 INFO sqlalchemy.engine.base.Engine SELECT artist.id AS artist_id 
FROM artist 
WHERE artist.id = %(param_1)s
2018-06-23 15:16:17,888 INFO sqlalchemy.engine.base.Engine {'param_1': 3}
2018-06-23 15:16:17,893 INFO sqlalchemy.engine.base.Engine SELECT song.id AS song_id, song.artist_id AS song_artist_id 
FROM song 
WHERE %(param_1)s = song.artist_id
2018-06-23 15:16:17,893 INFO sqlalchemy.engine.base.Engine {'param_1': 3}
2018-06-23 15:16:17,898 INFO sqlalchemy.engine.base.Engine DELETE FROM song WHERE song.id = %(id)s
2018-06-23 15:16:17,898 INFO sqlalchemy.engine.base.Engine {'id': 3}
2018-06-23 15:16:17,901 INFO sqlalchemy.engine.base.Engine DELETE FROM artist WHERE artist.id = %(id)s
2018-06-23 15:16:17,901 INFO sqlalchemy.engine.base.Engine {'id': 3}
2018-06-23 15:16:17,903 INFO sqlalchemy.engine.base.Engine COMMIT

SQLAlchemy is using its knowledge of the Song and Artist relationship to first delete all songs related to an artist before attempting to delete from the artist table.

This works fine, but gives us a couple of problems.

First, this will only work in the context of the SQLAlchemy ORM. The database still has the foreign key constraint, so deleting via raw SQL from the artist table is going to fail if there are related song rows. Secondly, this is a simple example, but real relationships could have several levels to them, and SQLAlchemy will end up issuing a blizzard of queries in some cases. This is much slower than allowing the database to enforce the relationship internally.

To have SQLAlchemy create the same schema we originally saw, we can define the foreign key as follows:

artist_id = Column(Integer, ForeignKey("artist.id", ondelete="cascade"))

The interesting thing here is what happens if we define “cascade” on the foreign key but not on the parent relationship. When deleting artists through the ORM, the ON DELETE CASCADE inside the database is ignored, and the ORM continues to handle deletions itself. The default behavior in this case is to set artist_id to NULL on any records whose parent have been deleted.

Defining “cascade” in both places results in the ORM performing separate deletes on the song and artist tables.

If we want to rely on the database to perform these cascading deletions, we need to use the passive_deletes argument as follows:

songs = relationship("Song", cascade="all, delete", passive_deletes=True)

I believe the reasoning behind this behavior is that relying on the database to cascade changes might not work in some cases, if for example using SQLite, where foreign key constraints are off by default.

Posted on by:

zchtodd profile

zchtodd

@zchtodd

I'm a software developer that loves to write, make cool little apps, and help others learn about programming!

Discussion

pic
Editor guide
 

So just the two, right?

artist_id = Column(Integer, ForeignKey("artist.id", ondelete="cascade"))
songs = relationship("Song", cascade="all, delete", passive_deletes=True)

with passive_deletes=True set, do you need cascade="all, delete"? It sounds like passive_deletes says "skip doing what I do, defer to DB", so my impression is we can remove cascades?

 

I'd love to see how you handle many-to-many relationships with association tables such as "users_roles" where each user can have many roles, and vice versa. :)

 

Thanks a lot for this clear content! it helped me a lot!