Recently I faced weak consistency for the first time in my professional career (< 2 yrs). I didn't know that the database we were using in our PROD environment was weakly consistent when reads are served from a secondary replica. (How silly of me 😓)
Weak/eventual consistency is a guarantee that when an update is made in a distributed database, the update will eventually be reflected in all nodes that store the data. Until replication completes, though, different nodes may return different responses for the same query.
What was I doing?
Consider two tables:

- **Users** table, with the columns `id`, `name`, `age`
- **Relationship** table, with the columns `contactId`, `id`, where `id` is the primary key of the Users table
Usually a relationship is created whenever some info is to be persisted for a particular user in the Users table. When no relationship exists yet, a new row is created in the Users table, and a new entry mapping to it is saved in the Relationship table. When further details are to be saved, the user's `id` is fetched from the Relationship table using the `contactId`, and the details are persisted.
What was the problem?
Consider this scenario: when someone lands on a webpage, we can auto-collect details like `ip`, `timezone`, and `email`. When saving the first field, `email`, we check the Relationship table; if no data is present, a new row is created in the Users table and a new Relationship entry is created as well. When the next update, `ip`, is to be saved, we check the Relationship table, get the `id`, and do the update.
```javascript
function getRelationship(contactId) {}
function createRelationship(contactId, user) {}
function createUser() {}
function updateUser(relationship, dataToUpdate) {}

function forEachUpdate(contactId, dataToUpdate) {
  let relationship = getRelationship(contactId);
  if (!relationship) {
    const newUser = createUser();
    relationship = createRelationship(contactId, newUser);
  }
  updateUser(relationship, dataToUpdate);
}
```
```javascript
// main
forEachUpdate('sasi@hotmail.com', { email: 'sasi@hotmail.com' });
forEachUpdate('sasi@hotmail.com', { ip: '127.0.0.1' });
```
The problem happens when we don't find a relationship on a check made a few milliseconds after it was created. But how? A new entry was added just before! What happened was: writes go to the primary replica, while reads go to a secondary replica. Reads and writes were hitting different servers, which do not hold exactly the same data at any given moment. The delay before a write reaches the secondaries is called replication lag, and because of it duplicate user entries were created in the Users table.
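To make the failure concrete, here is a minimal, self-contained simulation of replication lag. The `Map`-based primary/replica stores and all names are illustrative, not the real application code:

```javascript
const primary = new Map();  // writes land here
const replica = new Map();  // reads are served from here
const pending = [];         // writes waiting to be replicated

function write(key, value) {
  primary.set(key, value);
  pending.push([key, value]); // the replica sees this only after replicate() runs
}

function read(key) {
  return replica.get(key);    // may return stale data
}

function replicate() {        // runs "later" in a real system
  while (pending.length) {
    const [k, v] = pending.shift();
    replica.set(k, v);
  }
}

let usersCreated = 0;
function forEachUpdate(contactId) {
  if (!read(contactId)) {          // stale read against the replica!
    usersCreated += 1;
    write(contactId, { userId: usersCreated });
  }
}

forEachUpdate('sasi@hotmail.com'); // creates user 1
forEachUpdate('sasi@hotmail.com'); // replica hasn't caught up -> creates user 2
console.log(usersCreated);         // 2 — a duplicate, even though we "checked first"
replicate();                       // too late: the damage is done
```

Both calls see an empty replica because replication only happens afterwards, which is exactly how the duplicate user rows appeared.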
What is the intermediate fix?
Pointing the read queries to the primary solved the issue. But read replicas exist for a reason: they keep the primary from being overloaded with read queries so it can respond quickly to writes. So this was not a permanent fix. A permanent fix would have meant switching to a different database, which I didn't have time for.
Solution:
Every time I have to update some data, the updates happen sequentially. So, whenever a `relationship` is created, a copy of it is stored in Redis (an in-memory cache) too.
The pseudo-code looks something like this:
```javascript
function saveInRedis(contactId, user) {}
function saveInDB(contactId, user) {}

function createRelationship(contactId, user) {
  saveInDB(contactId, user);
  saveInRedis(contactId, user); // keep the cache in sync with the primary
}
```
Therefore, before every update, Redis is checked first for a relationship; only if none is found there is the DB checked. The `getRelationship` function is transformed like so:
```javascript
function getDataFromRedis(contactId) {}
function getDataFromDB(contactId) {}

function getRelationship(contactId) {
  let relationship = getDataFromRedis(contactId);
  if (!relationship) {
    relationship = getDataFromDB(contactId); // fall back to the database
  }
  return relationship;
}
```
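To see why this sidesteps the replication lag, here is a self-contained sketch of the whole fix with in-memory stand-ins for Redis and the database (the `Map` stores and sample data are illustrative assumptions, not the original code):

```javascript
const redis = new Map(); // stands in for the Redis cache
const db = new Map();    // stands in for the primary database

function saveInRedis(contactId, user) { redis.set(contactId, user); }
function saveInDB(contactId, user) { db.set(contactId, user); }

function createRelationship(contactId, user) {
  saveInDB(contactId, user);
  saveInRedis(contactId, user); // write-through: cache updated alongside the DB
}

function getRelationship(contactId) {
  let relationship = redis.get(contactId); // check the cache first
  if (!relationship) {
    relationship = db.get(contactId);      // fall back to the database
  }
  return relationship;
}

// Usage: a relationship created moments ago is served from the cache,
// so a lagging secondary replica can no longer cause a duplicate user.
createRelationship('sasi@hotmail.com', { id: 1 });
console.log(getRelationship('sasi@hotmail.com')); // { id: 1 }
```

Because the cache is written in the same sequential flow as the primary, the very next read finds the relationship in Redis without ever touching a possibly stale replica.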