
Chanond Wongpiya

πŸš€ Scaling to 93K RPM: Moving Quota Management from SQL to Redis

Picture this: your favorite artist just announced an upcoming concert near you. You eagerly go online to order tickets, only to get error responses and a painfully slow experience on the e-ticketing platform. Ultimately, customers get angry because they can't buy tickets to their favorite artist's concert.

So.....what happened?

What happened was that the DB's CPU rapidly hit 100%, causing subsequent requests to be slow or to drop due to timeouts. Upon further investigation, the culprit was a DB lock on a very hot row used to count the quota for the event. We have a table that keeps track of purchased tickets per event in order to prevent orders from going over the quota.

eventId | showtimeId | quota  | reserved
1       | 1          | 15,000 | 10,000

This particular row gets queried and updated very rapidly, which causes lock contention and eventually drives the DB's CPU to 100%, since each DB process is waiting for its turn to execute.
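
To make the contention concrete, here is a rough sketch of what a SQL-based reservation flow like this typically looks like. The event_quota table, its columns, and the tx.query helper are illustrative assumptions, not our exact implementation.

// Hypothetical sketch of the SQL-based reservation flow (names are illustrative)
async function reserveTicketSql(eventId, showtimeId, amount) {
    await db.transaction(async (tx) => {
        // Every concurrent order locks this same row until the transaction commits
        const [row] = await tx.query(
            'SELECT quota, reserved FROM event_quota WHERE event_id = ? AND showtime_id = ? FOR UPDATE',
            [eventId, showtimeId]
        );
        if (row.reserved + amount > row.quota) {
            throw new Error('not enough tickets');
        }
        // ... order create logic ...
        await tx.query(
            'UPDATE event_quota SET reserved = reserved + ? WHERE event_id = ? AND showtime_id = ?',
            [amount, eventId, showtimeId]
        );
    });
}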

The solution?

Enter Redis to the rescue. Redis keeps data in RAM, which is much faster than the DB's disk I/O....Sure, Redis is faster, but consider this.

async function reserveTicket(showtimeId, amount, quota) {
    const reserved = parseInt(await redis.get('{key}'), 10) || 0;
    if (reserved + amount > quota) {
        throw new Error("not enough tickets");
    }
    // .....
    // Order create logic
    // ....
    await redis.incrBy('{key}', amount);
    // ....
}

What's wrong here? By the time we reach redis.incrBy, the reserved amount might have already reached or surpassed the actual quota, because another request can slip in between the GET and the INCRBY. After a bit more research, we found Lua scripting. A Lua script executes on the Redis server as a single atomic operation, which ensures the quota check and the increment happen together. The reserve logic became this.

const reserveLuaScript = `
    local key = KEYS[1]
    local incomingQty = ARGV[1]
    local quota = ARGV[2]
    -- treat a missing key as zero reserved so the first reservation can succeed
    local currentQty = tonumber(redis.call("GET", key)) or 0

    if not tonumber(incomingQty) or (quota and not tonumber(quota)) then
        return -1  -- error code indicating invalid input
    end

    if not quota or currentQty + tonumber(incomingQty) <= tonumber(quota) then
        -- increase the reserved qty inside the same atomic script
        redis.call("INCRBY", key, tonumber(incomingQty))
        return 1
    else
        return 0
    end
`;

async function reserveTicket(showtimeId, amount, quota) {
    await db.transaction(async (tx) => {
        // ....
        // order create logic here
        // ....
        const result = await redis.eval(reserveLuaScript, {
            keys: ['{key}'],
            arguments: [String(amount), String(quota)],
        });
        if (result !== 1) {
            throw new Error('failed to reserve tickets');
        }
    });
    // ..... other logic here
}

This seems to be the end of it, right? Well, not exactly. Since we use Redis as a cache, the data there is not our durable source of truth. We decided to sync it back to the source of truth (the database). The DB became a "passive observer": a background worker polls Redis every 5 seconds and updates the count in the DB. This gives us eventual consistency that doesn't kill performance.
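
As a minimal sketch of that background worker, assuming one Redis key per showtime, a hypothetical db.query helper, and the same illustrative event_quota table as above:

// Hypothetical sketch of the sync worker (names and schema are illustrative)
function startQuotaSyncWorker(eventId, showtimeId, intervalMs = 5000) {
    return setInterval(async () => {
        // Read the latest reserved count from Redis; missing key counts as 0
        const reserved = parseInt(await redis.get('{key}'), 10) || 0;
        // Write it back to the source of truth
        await db.query(
            'UPDATE event_quota SET reserved = ? WHERE event_id = ? AND showtime_id = ?',
            [reserved, eventId, showtimeId]
        );
    }, intervalMs);
}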

Result?

We went from accepting 5,000 orders per minute to 93,240 orders per minute. Our DB load decreased significantly. And ultimately, we can stop praying for each event to be a success.

Key takeaway

  1. It is important to identify why a certain behavior happens. In this case, it was a major increase in DB lock contention during every on-sale event.
  2. Your system design should reflect the actual usage and traffic behavior of your users. A "hot" row is not wrong if traffic arrives gradually, but in an e-ticketing system with sudden spikes, it is a bad design.

Found this helpful? Feel free to suggest a topic you want me to deep dive on!
