If you are using the default isolation level in PostgreSQL (Read Committed), an ACID transaction will not save you from race conditions when your web application updates the database. The code below is unsafe as soon as more than one process runs it concurrently, and multiple processes are very common for web servers.
from sqlalchemy import create_engine
from sqlalchemy.orm import Session
# setup
engine = create_engine(...)
session = Session(engine)
# read from the table Account
account = session.query(Account).get(1)
# modify the record; amount is a Decimal column
account.amount = account.amount + 100
session.commit()
Why is this not safe? Because a plain SELECT does not take any row-level lock. The row is only locked exclusively when the UPDATE runs (the same row-lock strength as FOR UPDATE / FOR NO KEY UPDATE), so by then both transactions may already have read the old value. A race condition can happen in this scenario:
| Transaction 1 (T1) | Transaction 2 (T2) |
|---|---|
| START | |
| SELECT amount | START |
| amount += 100 | SELECT amount |
| UPDATE amount | amount += 100 |
| COMMIT | |
| | UPDATE amount |
| | COMMIT (the SAME value as T1) |
Since T1 does not commit before T2 reads, T2 selects the same value as T1 did and eventually commits the same result again, so one of the two increments is lost.
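To make the interleaving concrete, here is a minimal sketch (assuming the Account model and engine set up above) that reproduces the lost update with two sessions in one process, standing in for two web workers:
# two sessions play the roles of T1 and T2
s1, s2 = Session(engine), Session(engine)
a1 = s1.query(Account).get(1)    # T1 reads amount, say 1000
a2 = s2.query(Account).get(1)    # T2 reads the same amount, 1000
a1.amount = a1.amount + 100      # T1 computes 1100
s1.commit()                      # T1 writes and commits 1100
a2.amount = a2.amount + 100      # T2 also computes 1100 from its stale read
s2.commit()                      # T2 overwrites with 1100 -- one +100 is lost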
So, how do we prevent this situation in the default isolation level (Read Committed)?
Solution 1: Just don't read
The snippet above implements an anti-pattern called read-modify-write. One way to prevent the race is to skip the read entirely and update the column in place with a SQL expression, since reading first is not really necessary in this scenario.
# same session setting as above
session.query(Account).filter_by(id=1)\
.update({"amount": Account.amount + 100})
session.commit()
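Because the right-hand side is a SQL expression, the addition happens inside the database in a single atomic UPDATE statement (roughly UPDATE account SET amount = amount + 100 WHERE id = 1), so no stale value is ever read into the application at all.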
Solution 2: Update Lock
There are scenarios where you need to make a more complicated modification and you simply have to read first. In that case, you can take an update lock when you read from the database. In SQLAlchemy, the method with_for_update locks the rows you select with a FOR UPDATE lock. FOR UPDATE is not compatible with itself, so other transactions have to wait until this transaction releases the lock.
# same session setting as above
# locks the row that id = 1
account = session.query(Account).filter_by(id=1)\
.with_for_update().one()
account.amount = account.amount + 100
session.commit() # save and release the lock
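If you would rather fail fast than queue behind someone else's lock, with_for_update also accepts nowait and skip_locked flags; a small sketch of the nowait variant, under the same session setup as above:
# raise immediately if another transaction already holds the row lock,
# instead of blocking until it commits
account = session.query(Account).filter_by(id=1)\
    .with_for_update(nowait=True).one()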
Solution 3: Version tracking (optimistic locking)
If you would rather have every other concurrent update on the same row fail and roll back, you can add a version column to the table to track updates to that row. Say a row currently has version v: when you update it, you look it up by version v and bump the version to v + 1.
update account set version = v + 1, ... where version = v ...;
This is quite annoying to implement ourselves. Fortunately, most ORM libraries support version tracking. In SQLAlchemy, you can set the version column in the mapper; it will then manage the version for you on every update, and if another process has modified the same row in the meantime, it raises an exception.
from sqlalchemy.orm.exc import StaleDataError

class Account(Base):
    __tablename__ = "account"
    ...
    version = Column(Integer, nullable=False)
    __mapper_args__ = {"version_id_col": version}

def version_tracking(change):
    try:
        account = session.query(Account).get(1)
        account.amount = account.amount + change
        print_account(account, change)
        session.commit()
    except StaleDataError:
        session.rollback()
        print("someone has changed the account, plz retry.")
        # some actions...
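Because the losing transaction is simply rolled back, a common pattern is to retry it a few times with a fresh read. A minimal sketch (the retry count and function name are only for illustration):
def update_with_retry(change, attempts=3):
    for _ in range(attempts):
        try:
            account = session.query(Account).get(1)
            account.amount = account.amount + change
            session.commit()
            return True
        except StaleDataError:
            session.rollback()   # throw away the stale state and try again
    return False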
All the solutions above assume we are using Read Committed, the default isolation level in Postgres. I also assume you are not using any aggregation when you select the target rows. There are more ways to prevent the same problem, such as a stricter isolation level that makes transactions behave as if they ran serially, but we will not go through that in this article.
For more info:
All Source Code
SQLAlchemy for update (kite)
SQLAlchemy Version Tracking
PSQL Transaction Isolation
PSQL Locking