CodingBlocks

Designing Data-Intensive Applications – Lost Updates and Write Skew

Mar 20 '23

What are lost updates, and what can we do about them? Maybe we don’t do anything and accept the write skew? Also, Allen has sharp ears, Outlaw’s gort blah spotterfiles, and Joe is just thinking about breakfast.

The full show notes for this episode are available at https://www.codingblocks.net/episode206.

News

Thank you for the amazing reviews!
- iTunes: JomilyAnv
Want to help us out? Leave us a review.

Great book!

Preventing Lost Updates

Last episode we talked about weak isolation, committed reads, and snapshot isolation
There is one major problem we didn’t discuss called “The Lost Update Problem”
Consider a read-modify-write transaction, now imagine two of them happening at the same time
Even with snapshot isolation, it’s possible that read can happen for transaction A before B, but the write for A happens first
- Incrementing/Decrementing values (counters, bank accounts)
- Updating complex values (JSON for example)
- CMS updates that send the full page as an update
Solutions:
- Atomic Writes – Some databases support atomic updates that effectively combine the read and write
  - Cursor Stability – locking the read object until the update is performed
  - Single Threading – Force all atomic operations to happen serially through a single thread
- Explicit Locking
  - The application can be responsible for explicitly locking objects, placing responsibility in the devs hands
  - This makes sense in certain situations – imagine a multiplayer game where multiple players can move a shared object. It’s not enough to lock the data and then apply both updates in order since the shared game world can react. (ie: showing that the item is in use)

Detecting Lost Updates

Locks can be tricky, what if we reused the snapshot mechanism we discussed before?
We’re already keeping a record of the last transactionId to modify our data, and we know our current transactionId. What if we just failed any updates where our current transaction id was less than the transactionId of the last write to our data?
This allows for naive application code, but also gives you fewer options…retry or give up
Note: MySQL’s InnoDB’s Repeatable Read feature does not support this, so some argue it doesn’t qualify as snapshot isolation

What if you didn’t have transactions?

If you didn’t have transactions, let alone a snapshot number, you could get similar behavior by doing a compare-and-set
Example: update account set balance = 10 where balance = 9 and id = ABC
This works best in simple databases that support atomic updates, but not great with snapshot isolation
Note: it’s up to the application code to check that updates were successful – Updating 0 records is not an error

Conflict resolution and replication

We haven’t talked much about replicas lately, how do we handle lost updates when we have multiple copies of data on multiple nodes?
Compare-and-Set strategies and locking strategies assume a single up-to-date copy of the data….uh oh
The options are limited here, so the strategy is to accept the writes and have an application process to decide what to do
- Merge: Some operations, like incrementing a counter, can be safely merged. Riak has special datatypes for these
- Last Write Wins: This is a common solution. It’s simple but inaccurate. Also the most common solution.

Write Skew and Phantoms

Write skew – when a race condition occurs that allows writes to different records to take place at the same time that violates a state constraint
- The example given in the book is the on-call doctor rotation
- If one record had been modified after another record’s transaction had been completed, the race condition would not have taken place
- write-skew is a generalization of the lost update problem
Preventing write-skew
- Atomic single-object locks won’t work because there’s more than one object being updated
- Snapshot isolation also doesn’t work in many implementations – SQL Server, PostgreSQL, Oracle, and MySQL won’t prevent write skew
  - Requires true serializable isolation
- Most databases don’t allow you to create constraints on multiple objects but you may be able to work around this using triggers or materialized views as your constraint
- They mention if you can’t use serializable isolation, your next best option may be to lock the rows for an update in a transaction meaning nothing else can access them while the transaction is open
Phantoms causing write skew
- Pattern
  - The query for some business requirement – ie there’s more than one doctor on call
  - The application decides what to do with the results from the query
  - If the application decides to go forward with the change, then an INSERT, UPDATE, or DELETE operation will occur that would change the outcome of the previous step’s Application decision
    - They mention the steps could occur in different orders, for instance, you could do the write operation first and then check to make sure it didn’t violate the business constraint
  - In the case of checking for records that meet some condition, you could do a SELECT FOR UPDATE and lock those rows
  - In the case that you’re querying for a condition by checking on records to exist, if they don’t exist there’s nothing to lock, so the SELECT FOR UPDATE won’t work and you get a phantom write – a write in one transaction changes the search result of a query in another transaction
Snapshot isolation avoids phantoms in read-only queries, but can’t stop them in read-write transactions

Materializing conflicts

The problem we mentioned with phantom is there’d no record/object to lock because it doesn’t exist
What if you were to have a set of records that could be used for locking to alleviate the phantom writes?
- Create records for every possible combination of conflicting events and only use those to lock when doing a write
  - “materializing conflicts” because you’re taking the phantom writes and turning them into lock records that will prevent those conflicts
    - This can be difficult and prone to errors trying to create all the combinations of locks AND this is a nasty leakage of your storage into your application
      - Should be a last resort

Resources We Like

The 12 Factor App and Google Cloud (cloud.google.com)

Tip of the Week

Docker’s Buildkit is their backend builder that replaces the “legacy” builder by adding new non-backward compatible functionality. The way you enable buildkit is a little awkward, either passing flags or setting variables as well as enabling the features per Dockerfile, but it’s worth it! One of the cool features is the “mount” flag that you can pass as part of a RUN statement to bring in files that are not persisted past that layer. This is great for efficiency and security. The “cache” type is great for utilizing Docker’s cache to save time in future builds. The “bind” type is nice for mounting files you only need temporarily. like source code in for a compiled language. The “secret” is great for temporarily bringing in environment variables without persisting them. Type “ssh” is similar to “secret”, but for sharing ssh keys. Finally “tmpfs” is similar to swap memory, using an in-memory file system that’s nice for temporarily storing data in primary memory as a file that doesn’t need to be persisted. (github.com)
Did you know Google has a Google Cloud Architecture diagramming tool? It’s free and easy to use so give it a shot! (cloud.google.com)
ChatGTP has an app for slack. It’s designed to deliver instant conversation summaries, research tools, and writing assistance. Is this the end of scrolling through hundreds of messages to catch up on whatever is happening? /chatgpt summarize (salesforce.com)
Have you heard about ephemeral containers? It’s a convenient way to spin up temporary containers that let you inspect files in a pod and do other debugging activities. Great for, well, debugging! (kubernetes.io)

Episode source