DEV Community

Vinay Arvind Badgujar

Understanding Serialization Anomalies from First Principles

So what exactly is a serialization anomaly in the context of database transactions?

Suppose we have two transactions, T1 and T2, running concurrently. They both read and write some of the same rows.

After both transactions finish, the database reaches some final state.

As we all know, if transactions are allowed to freely interleave their reads and writes, we can end up with anomalies like lost updates, dirty reads, non-repeatable reads, etc.

One way databases reason about correctness is through serializability.

The idea is simple:

  • Run T1 then T2 (serial execution) → Final State 1
  • Run T2 then T1 (serial execution) → Final State 2

These two serial executions don't necessarily produce the same final state.

Now suppose T1 and T2 actually execute concurrently. If the final state produced by this concurrent execution matches at least one of the serial executions (Final State 1 or Final State 2), then the schedule is serializable.

If the concurrent execution produces a state that cannot be obtained by any serial ordering of those transactions, then you've hit a serialization anomaly (or equivalently, the schedule is not serializable).
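The definition above can be turned into a small check. This is only a minimal sketch under some assumptions: the database state is modeled as a plain dict, each transaction is a Python function that mutates that dict, and the names `serial_outcomes`, `is_final_state_serializable`, `add_one`, and `double` are made up for illustration. It compares final states only, which is the notion of equivalence used in this post.

```python
from itertools import permutations

def serial_outcomes(initial, transactions):
    """Return the final state of every serial ordering of the transactions."""
    outcomes = []
    for order in permutations(transactions):
        state = dict(initial)  # every ordering starts from the same initial state
        for txn in order:
            txn(state)  # a transaction reads and writes the state in place
        outcomes.append(state)
    return outcomes

def is_final_state_serializable(initial, transactions, observed):
    """True if the observed final state matches at least one serial ordering."""
    return observed in serial_outcomes(initial, transactions)

# Two toy transactions (hypothetical, for illustration only).
def add_one(state):
    state["x"] = state["x"] + 1

def double(state):
    state["x"] = state["x"] * 2

initial = {"x": 1}
print(serial_outcomes(initial, [add_one, double]))
# [{'x': 4}, {'x': 3}]
print(is_final_state_serializable(initial, [add_one, double], {"x": 3}))  # True
print(is_final_state_serializable(initial, [add_one, double], {"x": 2}))  # False
```

The state `x = 2` is what you would get if both transactions read `x = 1` and one write overwrote the other. No serial ordering produces it, so the check reports it as not serializable.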

Example

Suppose we have two rows:

A = 100
B = 100

T1

read A
read B
write A = A + B

T2

read A
read B
write B = A + B

At first glance, you might expect:

  • T1 updates A to 200
  • T2 updates B to 200

giving the final state:

A = 200
B = 200

Let's compare that with the serial executions.

T1 → T2

T1:
reads A=100, B=100
writes A=200

State:
A=200
B=100

T2:
reads A=200, B=100
writes B=300

Final:
A=200
B=300

T2 → T1

T2:
reads A=100, B=100
writes B=200

State:
A=100
B=200

T1:
reads A=100, B=200
writes A=300

Final:
A=300
B=200

Neither serial execution produces (A=200, B=200).

That result is only possible if both transactions read the original values (100, 100), that is, if neither transaction saw the other's write.

That's the serialization anomaly.

Each transaction made a decision based on a snapshot that the other transaction later invalidated. The resulting database state cannot be explained by any serial execution of T1 and T2, so the concurrent schedule is not serializable.
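To make the example concrete, here is a rough simulation of it. It is a sketch, not how a real database works: the state is a dict, each transaction is modeled as a function that reads from whatever state it is given and returns the writes it wants to make, and the helper names (`t1_plan`, `t2_plan`, `run_serial`, `run_on_shared_snapshot`) are invented for this illustration.

```python
def t1_plan(view):
    # T1: read A, read B, write A = A + B
    return {"A": view["A"] + view["B"]}

def t2_plan(view):
    # T2: read A, read B, write B = A + B
    return {"B": view["A"] + view["B"]}

def run_serial(initial, plans):
    """Run transactions one after another; each one sees all earlier writes."""
    state = dict(initial)
    for plan in plans:
        state.update(plan(state))
    return state

def run_on_shared_snapshot(initial, plans):
    """Run transactions so that every one reads the original values only."""
    snapshot = dict(initial)  # what each transaction reads
    state = dict(initial)     # where the writes end up
    for plan in plans:
        state.update(plan(snapshot))
    return state

initial = {"A": 100, "B": 100}

t1_then_t2 = run_serial(initial, [t1_plan, t2_plan])
t2_then_t1 = run_serial(initial, [t2_plan, t1_plan])
concurrent = run_on_shared_snapshot(initial, [t1_plan, t2_plan])

print(t1_then_t2)  # {'A': 200, 'B': 300}
print(t2_then_t1)  # {'A': 300, 'B': 200}
print(concurrent)  # {'A': 200, 'B': 200}
print(concurrent in (t1_then_t2, t2_then_t1))  # False
```

The last line prints `False`: the state reached when both transactions read the original values matches neither serial order, which is exactly the anomaly described above.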