Lance Ivy

No one should have to learn about transaction isolation levels.

Once upon a time, I was trying to fix a bug in a checkout system that would occasionally cause a limited item to be oversold. We didn’t have an appreciation yet for the wide world of race conditions, so we started by just putting everything into a transaction. Transactions are good, right?

We still oversold the last item sometimes. The core problem was a “check for absence before creating an entry” pattern. Transactions don’t solve this at all. We needed a locking system. Ruby on Rails had really good support for SELECT FOR UPDATE, so we used that liberally and hoped for the best.
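
To make that concrete, here’s roughly the shape of the code (the `Item`/`Order` models and their columns are made up for illustration, not our actual schema). In Rails, `lock` is what issues the SELECT FOR UPDATE:

```ruby
# The racy "check for absence before creating an entry" pattern.
# Two parallel checkouts can both pass the check before either
# creates its row, overselling the last item.
Order.transaction do
  item = Item.find(item_id)
  if item.orders.count < item.quantity_available  # check for absence...
    item.orders.create!(buyer_id: buyer_id)       # ...then create
  end
end

# The attempted fix: take a row lock first. `lock` issues
# SELECT ... FOR UPDATE, so parallel checkouts that take the
# same lock serialize on the item row.
Order.transaction do
  item = Item.lock.find(item_id)
  if item.orders.count < item.quantity_available
    item.orders.create!(buyer_id: buyer_id)
  end
end
```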

We still oversold the last item sometimes. People got pretty upset, so after a year or two my boss told me it was time to actually fix it, and I set aside a week to figure it out. Now, this process was complex enough that we needed to lock multiple things in the database before we could make changes. And of course, the “create” process had a near match in the “update” process that also needed to cooperate in the locking mechanism.

Oh and I forgot about the deadlocks. Oh man, the deadlocks. Wrapping transactions around everything really brings out the deadlocks. We implemented some kind of “retry on deadlock” failure handling and tried to forget about them. I suppose the forgetting part was successful, in the end.
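
The retry wrapper was roughly this shape (a sketch, not what we actually shipped; modern Rails raises `ActiveRecord::Deadlocked` for this, while back then you had to match on the raw error message):

```ruby
# Retry-on-deadlock: when the database kills our transaction to break
# a deadlock, re-run the whole transaction a few times before giving up.
def with_deadlock_retry(attempts: 3)
  tries = 0
  begin
    ActiveRecord::Base.transaction { yield }
  rescue ActiveRecord::Deadlocked
    tries += 1
    raise if tries >= attempts
    sleep(0.05 * tries)  # small backoff so the other transaction can finish
    retry
  end
end

# Hypothetical usage:
with_deadlock_retry { complete_checkout!(cart) }
```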

Anyhow, so now we have a process that tries to SELECT FOR UPDATE on multiple records in the database before it proceeds to do work. We really, really, really wanted this process to be high-throughput, so we tried to minimize the amount of time that various things were locked. We would lock one record, then run some validations, then lock a more important record, then run more validations, etc., all to hold each lock as briefly as possible and maximize parallelism. That’s good, right?
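
The shape was something like this sketch (the records and validation helpers are hypothetical):

```ruby
Order.transaction do
  sale = Sale.lock.find(sale_id)        # lock the less important record
  validate_sale_open!(sale)             # work while holding one lock

  item = sale.items.lock.find(item_id)  # now lock the contended record
  validate_availability!(item)          # more work, now holding two locks

  create_order!(item, buyer_id)         # finally, the actual write
end
```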

No. No it’s bad. See, databases have an esoteric concept of transaction isolation levels that no one usually cares about, because you just want transactions to be transactions and work transactionally; this stuff is hard enough already. But what does “transactionally” mean? Well, the default level (in MySQL’s InnoDB, at least) is REPEATABLE READ, which works by “consistent reads”: every plain SELECT in the transaction views the database as it existed at the transaction’s first read.

Remember, we were working with a “check for absence before creating an entry” pattern. If your first and most important query is “check for absence”, then congratulations: you’ve pinned your database’s point-in-time for consistent reads. Any time your process now takes to perform other work is time when parallel processes can be doing the exact same thing, with the exact same “it’s all going to be fine” assumptions.
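
Here’s the failure in miniature, assuming InnoDB’s REPEATABLE READ (the helper names are hypothetical):

```ruby
Order.transaction do
  # The first plain SELECT in the transaction pins the consistent-read
  # snapshot; every later plain SELECT sees the database as of this moment.
  item = Item.find(item_id)

  expensive_validations!(item)  # parallel checkouts commit during this window,
                                # but our snapshot predates them

  # This "check for absence" reads from our snapshot, not current data,
  # so it can't see the orders that just committed. Oversold.
  if item.orders.count < item.quantity_available
    item.orders.create!(buyer_id: buyer_id)
  end
end
```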

The solution was disappointing. We implemented a distributed mutex and accepted the loss of parallelism in checkout. It was the right trade.
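
I won’t claim this is exactly what we built, but the idea fits in a few lines. Here’s a minimal sketch of a single-Redis mutex using plain SET NX with a TTL (the key name, TTL, and error handling are illustrative):

```ruby
require "redis"
require "securerandom"

# A minimal distributed mutex: at most one checkout per item at a time,
# across every app server. SET with NX only succeeds if the key is absent;
# the TTL keeps a crashed holder from wedging checkouts forever.
def with_checkout_lock(redis, item_id, ttl_ms: 10_000)
  key   = "checkout-lock:#{item_id}"
  token = SecureRandom.uuid

  raise "checkout busy, try again" unless redis.set(key, token, nx: true, px: ttl_ms)

  begin
    yield
  ensure
    # Release only if we still hold the lock. (A Lua script would make
    # this get-and-delete atomic; good enough for a sketch.)
    redis.del(key) if redis.get(key) == token
  end
end
```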
