DEV Community

yazn zamel
yazn zamel

Posted on

Rethinking ACID: An Inquiry Into Their Fundamental Properties

ACID

Defining ACID by "THE TRANSACTION" example

Image description

Banking systems relies on transaction for money going in and out , they rely on a method called "double entry" , which goes as following : when a user deposit a certain amount of cash a Transaction of addition (++) is made on the liability side indicating the bank owes the customer XX$ amount of money , also another transaction is made on the Assets side which indicate the bank now have more cash. The Assets and Liabilities amounts are stored in a separate account , At the end of the day to ensure accuracy and balance, the bank will subtract the total liabilities from the total assets. In a properly functioning system, this difference must equal zero. This means that every dollar (or other currency) added to the assets has a corresponding entry on the liabilities side, maintaining the balance.

Now, imagine a scenario where multiple failures occur during the processing of online transactions. Unlike physical transactions, which can be easily traced, a lost online transaction can vanish without a trace, leaving no physical record behind. This creates a critical challenge for the bank, as tracking and resolving these issues becomes difficult, potentially leading to significant financial losses and undermining the trust of customers."

Transaction : a series of queries written by a developer to execute a certain logic. The transaction ends once "COMMIT" word is present.

Image description

Atomicity

This concept came from the actual atom concept that the atom can not be divided (at least at the past time) , and they applied the same concept on the a database transaction.
Physical atom properties :

All-or-Nothing Nature:
  • Atom: An atom is the smallest unit of an element that can't be divided without losing its fundamental properties. If an atom is split, it no longer retains the characteristics of the original element.

  • Transaction: A transaction in a database must be fully completed or not at all. If any part of the transaction fails, the entire transaction is rolled back to ensure nothing partial remains.

Integrity
  • Atom: Integrity in an atom means it maintains the specific identity of the element (like the number of protons) that defines it. It remains consistent with its element’s characteristics unless it undergoes a fundamental change.

  • Transaction: Integrity in a transaction ensures the database remains consistent and correct. All changes are applied together to maintain the database’s accuracy. If a transaction fails, it prevents any partial, inconsistent states by rolling back.

Jack : why atomicity , why don't we just keep things simple ?

Author : Since you asked, let's consider a straightforward scenario. Imagine that at the end of the month, your company sends your expected salary to your bank account. Now, suppose the bank, responding to your complaint about things being too complex, decides to simplify by hosting everything on a single database but fails to apply the Atomic property. During the transaction from the company's account to yours, the system crashes and needs to be restarted. Unfortunately, the first part of the transaction—where the money is withdrawn from the company's account—completes successfully, but just as the system is about to deposit the money into your account, it crashes!!

Author : So now what ? do you want to keep it simple ? does it sound simple now to you when you have bills , family , loan ?

yeah man , that's why we care about every detail and not make things complex.

How we as software engineers , Developers or people who care manage such kind of issues ?

When such failure happens , we will have to rollback the partial successfully executed queries (which is the withdrawn from company account) to ensure the atomic property.

what if the transaction was very long that consist of 10s or 100 of queries ? well it's better that we stay safe in such situation. However there is a method in postgresql called 2PC (Two Phase Commit) that can be used in distributed transactions. With 2PC, a transaction is prepared and partially committed in a way that it can be safely completed or rolled back later. This process involves:

Consistency (The sacrificed property)

Jack : now what is this consistency ? we already solved everything with the Atomicity ....

Author : you wish , lemme ask you this : if a transaction is committed with a new change , will another transaction running at the same time or after it in a few see that change ?

Jack : huh ! , just go ahead with your example man.

Alright , following up on the company salary transfer , imagine it's the end of the month and you have promised your son you will get him the new PS5 , once your salary is transferred. Now probably your son is waiting for your salary more than you do. You opened the bank application and still can't find it there , you open it again and there it is !!!. Now you went with him to buy him the ps5 and while you are at the cashier you get the following message "your transaction is declined , due to insufficient balance" , huh ???

Look what happened :

Image description

A lot of databases gave up on consistency as a trade-off for performance.

When looking at Consistency , you should consider two points :

  1. consistency of read operation
  2. consistency of referential relationship

consistency of read operation

This type of consistency is usually tied to type of synchronization between databases in Leader-Follower architecture .

The inconsistency arise due to the type of synchronization between follower - leader database , options comes as follow :

Image description

Isolation

Isolation :

The capability of a database system to enable multiple transactions to access the same data simultaneously without causing interference with one another.

Isolation is like whether a database want to be social , but it will be vulnerable (not security wise though) and it's somehow something we can configure , it can very social , moderately social , not social.

Very Social (Lower isolation) : Increases the ability of many users to access the same data at the same time

Moderately Social : Somehow in between , wait till your read the third point....

Not Social (Lower Isolation) : Decrease the ability of many users to access the same data (lower concurrency) --> needs more system resources (some stakeholders are fine with it though).

Let's go back to jack , he makes us explain it better.

Image description

Jack : i will be chill with this one , let's see what you got now . How can this property save my bank account ?

Author : thanks man , this time i want us to blend to the real world scenarios , in a bank account you have a credit card with certain balance if this credit card goes in negative under 10000$ you either lose some credit card score or you get charged some fee.

How isolation could saves you from unnecessary fees :

Dirty Read : when two transactions (t1 , t2) are accessing the same data , however one of those transaction reads something that is not really written to the database , but promised to be there .

Dirty read : because it read a dirty record

Image description

Nonrepeatable read : when two transactions (t1 , t2) are happening the same time , one of the transactions read a value twice (or more) in the same transaction but one or more reads produces different values.

Image description

Phantom read : when two transactions (t1 , t2) are happening the same time , one of the transactions insert a row which will be found in the second transaction as it is a value from the void.

You may ask what is the problem with phantom read , the issue resides that while i am doing a transaction the data should be consistent.

Image description

Durability

Empower a Commited transaction to survive a system failure. means , i commit - you write to disk.

The question is "isn't this how databases operate ? Once i commit it writes to the disk ?"

The answer to this question could vary depending on the database engine you are using , since some database have been implemented to be highly performing and reduce latency which in fact makes it write to your backend server cache or to the cache of the database server and then writing the data in batches to the disk. This doesn't only make writing to the database fast but in fact makes reading from the database fast as well since the recent written data is still cached.

So the key takeaways from this property is to always check the configurations available for the database and tune it based on your needs and application requirements.

Critical applications where data loss tolerance is near zero requires you to flush committed transactions from the database engine to the disk often.

Top comments (0)