
Iman Tumorang

Originally published at blog.xendit.engineer

How I was performing Row Locking for Read-Write Transactions in Postgres

It took me 2 days to attempt to solve this, but the solution is actually very simple and elegant.

[Image: Concurrent Access, by Xendit]

Today's problem was a complex one with a non-obvious solution. I have been working on refactoring our core payment system, which handles the payment transactions to and from each bank that we support.

While developing this new core service, one of its components is responsible for generating payment codes; let's call it the Payment Code Generator service. The payment_code needs to look random but also be unique. In short, generating one requires a counter, which means we need to store that counter.

To simplify the scenario, let’s say I’m building a service that will generate a payment code.

In this service, I have 2 tables, master_counter and payment_code.

The master_counter table schema

BEGIN;
CREATE TABLE IF NOT EXISTS master_counter(
  id varchar(255) NOT NULL PRIMARY KEY, 
  user_id VARCHAR(255) NOT NULL, 
  counter bigint NOT NULL, 
  created_at timestamptz NOT NULL, 
  updated_at timestamptz NOT NULL, 
  deleted_at timestamptz, 
  CONSTRAINT master_counter_user_id_unique_idx UNIQUE (user_id)
);
COMMIT;

And the payment_code schema

BEGIN;
CREATE TABLE IF NOT EXISTS payment_code(
  id varchar(255) NOT NULL PRIMARY KEY, 
  payment_code varchar(255) NOT NULL, 
  user_id varchar(255) NOT NULL, 
  created_at timestamptz NOT NULL, 
  updated_at timestamptz NOT NULL, 
  CONSTRAINT payment_code_unique_idx UNIQUE(payment_code, user_id)
);
COMMIT;

In master_counter, there is a counter column that increases with every request received. Drawn as a diagram, the flow looks something like this:

[Diagram: flow of generating a payment code]

  1. When the user requests a new payment code, we read their current counter value.
  2. The counter value is increased by 1 and the algorithm is applied (put simply: the counter value is hashed into a random-looking string; see the sketch after this list).
  3. The hashed value is stored in the payment_code table, which carries the unique constraint.
  4. The new counter value is written back to master_counter for that user ID.
  5. The hashed counter value is returned to the user. This is the payment code the user will use.

While there are many algorithms that could be used here, the flow above is a fair representation of the simplified overall process.
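
To make step 2 concrete, here is a minimal sketch of what the hashing step could look like in Go. applyAlgorithm is a hypothetical stand-in for our real generator logic; it only illustrates the shape of the idea:

package main

import (
    "crypto/sha256"
    "encoding/base32"
    "fmt"
)

// applyAlgorithm hashes the user ID together with the new counter value
// and encodes the digest into a short, random-looking payment code.
func applyAlgorithm(userID string, counter int64) string {
    sum := sha256.Sum256([]byte(fmt.Sprintf("%s:%d", userID, counter)))
    // Base32 keeps the code alphanumeric; truncate it to 8 characters.
    return base32.StdEncoding.EncodeToString(sum[:])[:8]
}

func main() {
    fmt.Println(applyAlgorithm("user-123", 42)) // prints an 8-character code
}

As long as every request sees a distinct counter value, the hash plus the unique constraint gives each user a distinct payment code — which is exactly what the race condition below breaks.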

Problems Started

Initially, when written in SQL, the query looks similar to this:

BEGIN;
SELECT counter FROM master_counter;

-- do the addition to the counter in the application
-- apply the generator logic in the application

INSERT INTO payment_code(payment_code) VALUES(generated_value_with_counter);

UPDATE master_counter SET counter=<new_value_from_application>;

COMMIT;

In my case, however, the service is written in Go, whose database/sql package has native transaction support. This can be seen in the example below:

tx, err := db.Begin() // begin the transaction
if err != nil {
    return err
}

counter, err := SelectCounter(tx) // read the current counter
if err != nil {
    tx.Rollback() // roll back and stop on any error
    return err
}

counter++ // increase the counter
generatedCode := ApplyAlgorithm(counter) // apply the algorithm

if err = SaveGeneratedCode(tx, generatedCode); err != nil { // save the payment_code
    tx.Rollback()
    return err
}

if err = UpdateCounter(tx, counter); err != nil { // write the new counter back
    tx.Rollback()
    return err
}

return tx.Commit() // commit the transaction
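A side note on the shape of this code: tx.Rollback() is a no-op once the transaction has been committed, so a common Go idiom is to defer it once instead of calling it on every error path. The same flow, sketched with that idiom and the same hypothetical helpers:

tx, err := db.Begin()
if err != nil {
    return err
}
defer tx.Rollback() // harmless after a successful Commit; covers every early return

counter, err := SelectCounter(tx)
if err != nil {
    return err
}
counter++ // increase the counter

if err = SaveGeneratedCode(tx, ApplyAlgorithm(counter)); err != nil {
    return err
}
if err = UpdateCounter(tx, counter); err != nil {
    return err
}
return tx.Commit()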

This function works perfectly for a single request. However, load testing it with just 2 concurrent users produced the following error:

pq: duplicate key value violates unique constraint "my_unique_payment_code_index"

The duplication error happens because the generated payment_code must be unique.

While the application is behaving correctly by returning the error, from a business point of view this is a problem: a user requesting a brand-new payment_code that has never been used should not get a duplication error.

My first assumption was that a duplication implies a race condition, and that the race occurred between the read and the update of the counter value. The counter is hashed to produce the payment code, and the payment code must be unique; because of the race, two concurrent transactions read the same counter value, hash it to the same code, and violate the unique key.

In short, the error occurs because the counter has not yet been updated when concurrent requests come in, so parallel requests generate duplicate payment codes.

First Attempt! — Using Isolation Level

Knowing this, I took to the internet to search and learn about transactions, which brought me to several articles on transaction isolation levels.

I realized that transactions also have an isolation level. Reading those articles took me back to my college days: I remembered learning this, but I had never hit a case where I really needed to understand or apply it in practice.

From this research, I concluded that the default isolation level is not strict enough to guard this concurrent read-write-update cycle in the database.

To address this, I set a stricter isolation level from the application:

// Raise the isolation level before any query runs in this transaction.
_, err := tx.ExecContext(ctx, "SET TRANSACTION ISOLATION LEVEL REPEATABLE READ")

if err != nil {
   return
}
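As an aside, database/sql can also request the isolation level when the transaction is opened, instead of issuing SET TRANSACTION by hand. A minimal sketch of the equivalent call:

import (
    "context"
    "database/sql"
)

// beginRepeatableRead opens a transaction that runs at REPEATABLE READ,
// equivalent to executing SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
// as the first statement of the transaction.
func beginRepeatableRead(ctx context.Context, db *sql.DB) (*sql.Tx, error) {
    return db.BeginTx(ctx, &sql.TxOptions{Isolation: sql.LevelRepeatableRead})
}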

After this fix, I re-ran the application, which returned a new error:

pq: could not serialize access due to concurrent update
Enter fullscreen mode Exit fullscreen mode

This change avoided the duplication error, but it also effectively blocked concurrent requests: every transaction that lost the race simply failed, so many concurrent requests performed no better than one.
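
For completeness: at REPEATABLE READ this error is expected behaviour rather than a bug, and the standard remedy is to retry the whole transaction. A sketch of what such a retry wrapper could look like with the lib/pq driver (the helper names here are mine, not from our codebase):

import (
    "errors"

    "github.com/lib/pq"
)

// isSerializationFailure reports whether err is Postgres error 40001
// ("could not serialize access"), which is safe to retry.
func isSerializationFailure(err error) bool {
    var pqErr *pq.Error
    return errors.As(err, &pqErr) && pqErr.Code == "40001"
}

// withRetry re-runs fn, which should execute one whole transaction, until it
// succeeds, fails with a non-retryable error, or runs out of attempts.
func withRetry(attempts int, fn func() error) error {
    var err error
    for i := 0; i < attempts; i++ {
        if err = fn(); err == nil || !isSerializationFailure(err) {
            return err
        }
    }
    return err
}

Retrying wastes work under heavy contention, though, so I kept looking for an approach that avoids the failed transactions in the first place.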

Since, from a business standpoint, the system needs to handle requests concurrently, I broadened my search and asked for advice from peers: colleagues, friends outside of work, and even several senior DBAs from other companies.

Thanks to this crowdsourced knowledge sharing, I realized that the solution was row-level locking. Since I only need to read and update a single counter row, a row-level locking mechanism is all that is required.

Second Attempt! — Using SELECT FOR UPDATE

Since I just want row-level locking, I didn't need a stricter isolation level: an isolation level governs the whole transaction, while all I need is a lock on the single row being read and updated.

So my second try is:

  • Isolation level: READ COMMITTED (Postgres default isolation level)
  • Row-level locking: SELECT FOR UPDATE

The only change is adding FOR UPDATE to the SELECT query that reads my counter value. The new SQL query looks as follows:

BEGIN;
SELECT counter FROM master_counter FOR UPDATE; -- notice the FOR UPDATE

-- do the addition to the counter in the application
-- apply the generator logic in the application

INSERT INTO payment_code(payment_code) VALUES(generated_value_with_counter);

UPDATE master_counter SET counter=<new_value_from_application>;

COMMIT;
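On the Go side, nothing changes except the query inside the counter lookup. A sketch, assuming a hypothetical SelectCounter helper like the one used earlier; it must run on the transaction, not directly on the connection pool, or the lock is pointless:

// SelectCounter reads the user's counter and takes a row-level lock on it.
// Concurrent transactions calling this for the same user will queue here
// until the lock holder commits or rolls back.
func SelectCounter(tx *sql.Tx, userID string) (int64, error) {
    var counter int64
    err := tx.QueryRow(
        `SELECT counter FROM master_counter WHERE user_id = $1 FOR UPDATE`,
        userID,
    ).Scan(&counter)
    return counter, err
}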

This worked successfully, and in the end all it took was adding FOR UPDATE to solve the problem. While this may not be the right end-state solution, for now at least it solves our current problem.

I discovered that there are a lot of things you need to understand about SELECT ... FOR UPDATE — I found this article most helpful: http://shiroyasha.io/selecting-for-share-and-update-in-postgresql.html

It's amazing that this issue could be solved with a simple SELECT ... FOR UPDATE. Maybe later there will be a new issue, but we'll leave that for our future selves 😎

