DEV Community

Hitender Shekhawat


The 10-Minute Race: Scaling the "Cancel Order" Button to 100K+ Requests Per Second

We've all been there. You order a pint of ice cream on an ultra-fast delivery app like Flipkart Minutes. The app promises it'll arrive in 10 minutes.

But what if 10 minutes pass, your ice cream is stuck in traffic, and you want to cancel? The app should give you a "Cancel Order" button.

Now look at it from the driver's perspective. If they just arrived at your doorstep, you shouldn't be able to cancel the order at the last second and get free ice cream. The button must vanish the exact millisecond the driver arrives.

Handling this for one person is easy. But how do you build this when 60,000 people are ordering at the same time, flooding your system with 100,000+ requests every single second?

Let's dive in.


1. The Context: A Tale of Two Pages

To understand the scale, we have to look at how customers actually behave on the app:

  • The Map Viewers — Most users sit on the map screen watching the delivery icon move. The UI automatically refreshes every 30 seconds. Over a 15-minute delivery, that's 30 hits per user.
  • The Chat Viewers — Some users open the customer support chat. The app checks cancellation eligibility once when the page opens. But if the driver arrives while the user is typing, the button needs to disappear in real-time without a page refresh.

Combined, these actions generate a massive wall of traffic — 100,000+ Requests Per Second (RPS) — hitting our backend.


2. The Problem Statement

We have three core systems to work with:

| System | Responsibility |
| --- | --- |
| UI | The customer's app (Map and Chat pages) |
| Eligibility Service | The brain that decides: "Should I show the cancel button right now?" |
| Fulfillment | The muscle on the ground tracking driver GPS and status |

The Rules:

  • If Time > 10 minutes → Show the Cancel Button.
  • If Driver Status = "Arrived at Doorstep" → Hide the Cancel Button immediately.
  • The system must handle 100k+ RPS without breaking a sweat.

3. The Solutions: From Bad to Brilliant

Solution 1: The Chain Reaction (Direct API Calls)

The most obvious idea is a simple chain. Every time a user's app refreshes, the Eligibility Service asks Fulfillment for the latest update.

Why it's a bad idea: The Fulfillment system is already busy updating live GPS coordinates and assigning drivers. If you add 100,000 read requests every second just to check a button status, those reads compete with its real work for database capacity, and an overloaded Fulfillment system can take order tracking down for everyone.
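To make the cost concrete, here is a minimal sketch of the chain. The `fulfillmentStub` object and its `getOrderStatus` method are hypothetical stand-ins (the real Fulfillment API is not shown in this post). The point is only that every UI refresh turns into exactly one Fulfillment read, so the load passes straight through.

```javascript
// Hypothetical stand-in for the Fulfillment service; not a real API.
const fulfillmentStub = {
    reads: 0,
    getOrderStatus(orderId) {
        this.reads += 1; // every call is one more read against Fulfillment
        return { orderId, dp_status: "OUT_FOR_DELIVERY" };
    },
};

// Solution 1: the Eligibility Service asks Fulfillment on every UI refresh.
function checkEligibilityDirect(orderId, currentTime, promisedTime) {
    const { dp_status } = fulfillmentStub.getOrderStatus(orderId);
    return currentTime > promisedTime && dp_status !== "ARRIVED_AT_DOORSTEP";
}

// Simulate 1,000 UI refreshes: Fulfillment receives 1,000 reads.
for (let i = 0; i < 1000; i++) {
    checkEligibilityDirect(`order-${i}`, 1000, 2000);
}
console.log(fulfillmentStub.reads); // 1000
```

Scale the loop up to 100,000 iterations per second and the stub's counter is your Fulfillment database.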


Solution 2: The "Are We There Yet?" Loop (Background Workers)

To protect Fulfillment, we introduce background workers in the Eligibility Service to constantly poll order statuses and save them to a local cache.

Why it's a bad idea: Running continuous loops over 60,000 active orders is incredibly wasteful. More than 95% of the time, the driver hasn't arrived and the time hasn't run out. You are burning massive amounts of server energy asking the same question over and over. Plus, polling adds delay: with, say, a 2-second interval, the cached status can be stale for up to 2 seconds after the driver arrives, which opens a window where a clever user could abuse the cancel button.
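A toy simulation shows how little of that polling does anything useful. Everything here is made up for illustration: 100 orders, 150 polling rounds, and drivers that arrive at staggered rounds. The worker is simulated with a plain loop instead of real timers so the numbers are deterministic.

```javascript
// Hypothetical order data: each driver arrives at a different polling round.
const polledOrders = new Map();
for (let i = 0; i < 100; i++) {
    polledOrders.set(`order-${i}`, { arrivesAtTick: 50 + i });
}

const localCache = new Map(); // what the Eligibility Service would serve from
let totalPolls = 0;
let usefulPolls = 0;

// One pass of the background worker over every active order.
function pollOnce(tick) {
    for (const [orderId, order] of polledOrders) {
        totalPolls += 1;
        const status = tick >= order.arrivesAtTick ? "ARRIVED_AT_DOORSTEP" : "OUT_FOR_DELIVERY";
        if (localCache.get(orderId) !== status) {
            localCache.set(orderId, status); // only these polls changed anything
            usefulPolls += 1;
        }
    }
}

// Simulate 150 polling rounds.
for (let tick = 0; tick < 150; tick++) {
    pollOnce(tick);
}
console.log({ totalPolls, usefulPolls }); // { totalPolls: 15000, usefulPolls: 200 }
```

In this run only 200 of the 15,000 polls changed the cache: one initial write and one arrival per order. The rest asked a question whose answer had not changed.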


Solution 3: Smart Math + Event-Driven Push (The Big Tech Way)

This is the kind of approach large platforms use to scale to millions of users. We do two things:

  1. Stop using timers to track time.
  2. Use events for real-world changes.

Step A — Do the Math Upfront (The Read Path)

When an order is placed at 8:00 PM, we don't set a 10-minute timer. Instead, the Eligibility Service saves a dead-simple stamp in a super-fast in-memory cache (Redis):

Cancellation_Allowed_After: 8:10 PM
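As a rough sketch, writing that stamp could look like the code below. A plain `Map` stands in for Redis, and `recordOrderPlaced`, `eligibilityCache`, and the initial `dp_status` value are hypothetical names chosen for this example. Note that nothing is scheduled: the function does one addition and one write.

```javascript
const SLA_MINUTES = 10; // from the rules: cancellation opens after 10 minutes

// Stand-in for Redis; a real deployment would use a Redis client instead of a Map.
const eligibilityCache = new Map();

// Called once when the order is placed. No timer is scheduled.
function recordOrderPlaced(orderId, placedAtMs) {
    const cancellationAllowedAfterMs = placedAtMs + SLA_MINUTES * 60 * 1000;
    eligibilityCache.set(orderId, {
        cancellation_allowed_after: cancellationAllowedAfterMs,
        dp_status: "OUT_FOR_DELIVERY", // hypothetical initial status
    });
    return cancellationAllowedAfterMs;
}

const placedAt = Date.UTC(2024, 0, 1, 20, 0, 0); // 8:00 PM UTC on an arbitrary date
const allowedAfter = recordOrderPlaced("order-123", placedAt);
console.log(new Date(allowedAfter).toISOString()); // 2024-01-01T20:10:00.000Z
```

Storing the deadline as a millisecond timestamp keeps the later comparison a single integer check.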

When the Map UI polls every 30 seconds, the Eligibility Service does a sub-millisecond lookup from the cache and runs this ultra-fast block of code:

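// promised_delivery_timestamp is the cached Cancellation_Allowed_After stamp.
function shouldShowCancelButton(current_time, promised_delivery_timestamp, dp_status) {
    const passed_sla = current_time > promised_delivery_timestamp;
    const is_at_doorstep = (dp_status === "ARRIVED_AT_DOORSTEP");

    // Show the button only when the promise is broken and the driver has not arrived.
    return passed_sla && !is_at_doorstep;
}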

No databases are harmed. No background loops are running. The clock does the work for us.
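Putting the lookup and the check together, the whole read path might look something like this. Again a `Map` stands in for Redis, and `readCache`, `getCancelEligibility`, and the choice to hide the button for unknown orders are assumptions made for this sketch, not details from a real system.

```javascript
// Stand-in for Redis, pre-filled with one order's cached entry (values are illustrative).
const readCache = new Map([
    ["order-123", { cancellation_allowed_after: 600000, dp_status: "OUT_FOR_DELIVERY" }],
]);

// Read path: one cache lookup plus two comparisons, no call to Fulfillment.
function getCancelEligibility(orderId, nowMs) {
    const entry = readCache.get(orderId);
    if (!entry) {
        return { show_cancel_button: false }; // unknown order: hide the button (an assumption)
    }
    const passed_sla = nowMs > entry.cancellation_allowed_after;
    const is_at_doorstep = entry.dp_status === "ARRIVED_AT_DOORSTEP";
    return { show_cancel_button: passed_sla && !is_at_doorstep };
}

console.log(getCancelEligibility("order-123", 300000)); // { show_cancel_button: false }
console.log(getCancelEligibility("order-123", 900000)); // { show_cancel_button: true }
```

Every Map-page poll costs the same tiny amount of work, whether the order is one minute old or fifteen.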

Step B — The Live Switch (The Write / Push Path)

What about the support chat page that needs real-time accuracy?

  1. The moment the driver reaches the doorstep, Fulfillment emits a notification to a message broker (Kafka): "Order 123: Driver Arrived!"
  2. The Eligibility Service listens to this event and immediately flips the status in Redis to Driver_Arrived = True.
  3. It then pushes a small message to the active Chat UI over a persistent connection (WebSockets).
  4. The Chat UI hides the button as soon as the message arrives, with no page refresh.
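The steps above can be sketched as a single event handler. This is a simplified, in-process model: a `Map` stands in for Redis, a `Map` of send callbacks stands in for open WebSocket connections, and the handler is called directly instead of being wired to a Kafka consumer. The names `onDriverArrived`, `chatConnections`, and the `HIDE_CANCEL_BUTTON` message type are all hypothetical.

```javascript
// Stand-ins: a Map for Redis, a Map of send callbacks for open WebSocket connections.
const statusCache = new Map([["order-123", { driver_arrived: false }]]);
const chatConnections = new Map(); // orderId -> function that sends a message to that chat UI

// Hypothetical handler the Eligibility Service would run for each
// "driver arrived" event consumed from the broker.
function onDriverArrived(event) {
    // Step 2: flip the flag in the cache so the read path sees it too.
    const entry = statusCache.get(event.orderId) || {};
    statusCache.set(event.orderId, { ...entry, driver_arrived: true });

    // Step 3: push to the chat page, but only if one is open for this order.
    const send = chatConnections.get(event.orderId);
    if (send) {
        send(JSON.stringify({ type: "HIDE_CANCEL_BUTTON", orderId: event.orderId }));
    }
}

// Simulate an open chat page and one event from Fulfillment.
const received = [];
chatConnections.set("order-123", (message) => received.push(message));
onDriverArrived({ orderId: "order-123", type: "DRIVER_ARRIVED" });
console.log(received); // [ '{"type":"HIDE_CANCEL_BUTTON","orderId":"order-123"}' ]
```

In a real service, `send` would wrap a WebSocket connection's send method and the handler would be registered with a broker consumer. Both are left out here to keep the sketch runnable on its own.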


The Takeaway

Scaling systems to massive traffic isn't about buying bigger servers — it's about being lazy in a smart way.

  • Don't track time with active loops — calculate timestamps upfront.
  • Don't ask for updates — let the real world send you events when things change.
