In most restaurants, billing looks simple from the outside: take an order, calculate totals, print a bill. In reality, billing sits at the center of a noisy, highly concurrent system.
At peak hours, multiple things happen at once:
- Captains create or modify orders
- The kitchen updates item status
- Discounts or taxes change mid-order
- Inventory updates happen asynchronously
- Network latency spikes
- Printers misbehave at the worst possible moment
The requirement sounds small: billing must never block operations. But the implication is big. If billing slows down, the queue grows. If the queue grows, staff panic. If staff panic, they bypass the system.
This article shares real-world lessons from designing a real-time billing system under these conditions.
No theory. Just constraints, mistakes, and hard-earned trade-offs.
Constraints (The Ones That Actually Matter)
Before touching architecture, we had to accept some non-negotiable constraints.
Peak load is unpredictable: Lunch rush, dinner rush, festival days, weekends — traffic is bursty. A system that works fine at 20 bills/hour can collapse at 200.
Latency tolerance is near zero: A billing screen that freezes for 2 seconds feels broken to staff. Humans perceive slowness faster than engineers expect.
Hardware is inconsistent: Low-end Android devices, old printers, mixed network quality. You cannot assume ideal conditions.
Data correctness > elegance: A wrong total is worse than a slow UI. Financial data must be correct, auditable, and replayable.
These constraints shaped every decision that followed.
What Went Wrong (Early Mistakes)
Mistake 1: Treating billing as a synchronous operation
Our first approach tightly coupled:
- Order creation
- Tax calculation
- Inventory update
- Bill generation
- Print trigger

All in one request. Under load, a slow printer or inventory lock would block billing entirely. The UI froze because the backend was “doing the right thing.”

Lesson: Billing is not one action. It’s a pipeline.
Mistake 2: Recalculating everything on every change
Every time an item was added or removed, we recomputed:
- Subtotals
- Taxes
- Discounts
- Round-offs

This worked in isolation but failed under concurrency. Multiple rapid edits caused race conditions and inconsistent totals.

Lesson: Idempotent, incremental calculations beat full recomputation.
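The sketch below shows the shape of that idea: line items are keyed by id, the subtotal is adjusted incrementally, and re-applying the same edit is harmless. The types and names are illustrative assumptions, not our production code.

```typescript
// Illustrative sketch: incremental, idempotent line-item updates.
// Amounts are integer cents to avoid floating-point drift.

type LineItem = { id: string; unitPrice: number; qty: number };

type BillState = {
  items: Map<string, LineItem>; // keyed by line-item id
  subtotal: number;             // derived incrementally, in cents
};

function emptyBill(): BillState {
  return { items: new Map(), subtotal: 0 };
}

// Applying the same item twice with the same id is an update, not a
// double count: the old contribution is subtracted before the new one is added.
function upsertItem(bill: BillState, item: LineItem): BillState {
  const previous = bill.items.get(item.id);
  const previousAmount = previous ? previous.unitPrice * previous.qty : 0;
  const nextAmount = item.unitPrice * item.qty;

  const items = new Map(bill.items);
  items.set(item.id, item);
  return { items, subtotal: bill.subtotal - previousAmount + nextAmount };
}

function removeItem(bill: BillState, itemId: string): BillState {
  const previous = bill.items.get(itemId);
  if (!previous) return bill; // removing a missing item is safe (idempotent)

  const items = new Map(bill.items);
  items.delete(itemId);
  return { items, subtotal: bill.subtotal - previous.unitPrice * previous.qty };
}

// Usage: rapid, repeated edits converge to the same subtotal.
let bill = emptyBill();
bill = upsertItem(bill, { id: "li-1", unitPrice: 250, qty: 2 });
bill = upsertItem(bill, { id: "li-1", unitPrice: 250, qty: 2 }); // duplicate edit
console.log(bill.subtotal); // 500, not 1000
```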
Mistake 3: Assuming the network is reliable
We initially assumed the backend was the source of truth. When the network dropped, billing stalled.
Staff didn’t wait. They wrote bills manually.
Lesson: If your system pauses, humans route around it. Permanently.
Solution Approach (High-Level, No Secrets)
The final design wasn’t fancy. It was defensive.
1. Event-driven billing model
Instead of “generate bill,” we moved to billing events:
- ITEM_ADDED
- ITEM_REMOVED
- DISCOUNT_APPLIED
- TAX_UPDATED
- BILL_FINALIZED

Each event is immutable and timestamped. The bill is a projection of these events (a minimal sketch of that projection follows the list below).
Why this helped:
- Easy to replay
- Easy to audit
- Partial failures don’t corrupt state
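Here is a minimal sketch of the projection. Only the event names come from the list above; the field names and the projection shape are assumptions made for illustration.

```typescript
// Illustrative sketch: the bill as a projection over immutable events.

type BillingEvent =
  | { type: "ITEM_ADDED"; at: number; itemId: string; amount: number }
  | { type: "ITEM_REMOVED"; at: number; itemId: string }
  | { type: "DISCOUNT_APPLIED"; at: number; amount: number }
  | { type: "TAX_UPDATED"; at: number; ratePercent: number }
  | { type: "BILL_FINALIZED"; at: number };

type BillProjection = {
  items: Map<string, number>; // itemId -> amount in cents
  discount: number;
  taxRatePercent: number;
  finalized: boolean;
};

// Replaying the same event log always yields the same projection,
// which is what makes audits and replays cheap.
function project(events: BillingEvent[]): BillProjection {
  const bill: BillProjection = {
    items: new Map(),
    discount: 0,
    taxRatePercent: 0,
    finalized: false,
  };

  for (const event of events) {
    if (bill.finalized) break; // nothing mutates a finalized bill
    switch (event.type) {
      case "ITEM_ADDED":
        bill.items.set(event.itemId, event.amount);
        break;
      case "ITEM_REMOVED":
        bill.items.delete(event.itemId);
        break;
      case "DISCOUNT_APPLIED":
        bill.discount += event.amount;
        break;
      case "TAX_UPDATED":
        bill.taxRatePercent = event.ratePercent;
        break;
      case "BILL_FINALIZED":
        bill.finalized = true;
        break;
    }
  }
  return bill;
}
```

Because events are append-only, a partial failure at worst leaves a shorter log; it never leaves a half-updated bill.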
2. Separate “fast path” and “slow path”
We split operations into two categories:
Fast path (must be instant):
- UI updates
- Line-item totals
- Running subtotal
Slow path (can lag slightly):
- Inventory sync
- Printer communication
- Analytics
- Remote sync

Billing completion only depends on the fast path.
Key idea: Never block the fast path on external systems.
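A rough sketch of the split, assuming a simple in-memory queue for slow-path work; the queue, the retry delay, and the stand-in printer/inventory functions are illustrative, not our actual integrations.

```typescript
// Illustrative sketch: the fast path returns as soon as local state is
// updated; slow-path work is queued and drained in the background.

type SlowTask = () => Promise<void>;

const slowQueue: SlowTask[] = [];
let draining = false;

// Fast path: must never await printers, inventory, or the network.
function finalizeBillFastPath(billId: string, totalCents: number): void {
  // 1. Update local state / the UI model (synchronous, in-memory here).
  console.log(`bill ${billId} finalized locally at ${totalCents} cents`);

  // 2. Enqueue slow-path work instead of doing it inline.
  enqueueSlow(() => printBill(billId));
  enqueueSlow(() => syncInventory(billId));
}

function enqueueSlow(task: SlowTask): void {
  slowQueue.push(task);
  if (!draining) void drain();
}

async function drain(): Promise<void> {
  draining = true;
  while (slowQueue.length > 0) {
    const task = slowQueue.shift()!;
    try {
      await task();
    } catch {
      slowQueue.push(task); // retry later; failures never touch the fast path
      await new Promise((resolve) => setTimeout(resolve, 1_000));
    }
  }
  draining = false;
}

// Stand-ins for real integrations.
async function printBill(billId: string): Promise<void> {
  console.log(`printing ${billId}`);
}
async function syncInventory(billId: string): Promise<void> {
  console.log(`syncing inventory for ${billId}`);
}
```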
3. Local-first with eventual sync
The device maintains a local ledger:
- Bills are finalized locally
- Each finalized bill gets a local unique ID
- Sync happens asynchronously

Conflict resolution is simple:
- Bills are append-only
- No bill is ever edited after finalization

This eliminated entire classes of network-related failures.
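A simplified sketch of the local ledger, assuming a Node-style runtime on the device and a hypothetical `upload` callback for sync; real storage and retry handling are more involved.

```typescript
// Illustrative sketch: an append-only local ledger with asynchronous sync.

import { randomUUID } from "node:crypto";

type FinalizedBill = {
  localId: string;     // generated on the device, never reused
  totalCents: number;
  finalizedAt: number; // epoch millis
  synced: boolean;
};

// Append-only: bill contents are never edited after finalization;
// only the sync flag flips once the server has the record.
const ledger: FinalizedBill[] = [];

function finalizeLocally(totalCents: number): FinalizedBill {
  const bill: FinalizedBill = {
    localId: randomUUID(),
    totalCents,
    finalizedAt: Date.now(),
    synced: false,
  };
  ledger.push(bill);
  return bill; // billing is "done" from the staff's point of view
}

// Runs whenever connectivity allows; re-sending is safe because the
// server can deduplicate on localId.
async function syncPending(upload: (bill: FinalizedBill) => Promise<void>) {
  for (const bill of ledger) {
    if (bill.synced) continue;
    try {
      await upload(bill);
      bill.synced = true;
    } catch {
      return; // network is down; try again on the next sync cycle
    }
  }
}
```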
4. Deterministic calculation engine
We moved all calculations into a deterministic module:
- Same inputs always produce the same output
- No floating-point surprises
- Explicit rounding rules
- Versioned tax logic

This allowed:
- Safe replays
- Backward compatibility
- Debugging past bills reliably
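A small sketch of what deterministic, versioned calculation can look like; the tax rates, version labels, and rounding rule below are made up for illustration.

```typescript
// Illustrative sketch: a deterministic, versioned calculation step.
// Integer cents, explicit rounding, and a version tag stored with the bill.

type TaxRule = { version: string; ratePercent: number };

// Historical rules stay available so old bills replay with the logic
// that was in force when they were finalized. Values here are invented.
const TAX_RULES: Record<string, TaxRule> = {
  "2023-04": { version: "2023-04", ratePercent: 5 },
  "2024-01": { version: "2024-01", ratePercent: 12 },
};

// Half-up rounding on integer cents: no floating-point surprises.
function roundHalfUp(numerator: number, denominator: number): number {
  return Math.floor((numerator + denominator / 2) / denominator);
}

function computeTotal(subtotalCents: number, taxVersion: string): number {
  const rule = TAX_RULES[taxVersion];
  if (!rule) throw new Error(`unknown tax version: ${taxVersion}`);
  const taxCents = roundHalfUp(subtotalCents * rule.ratePercent, 100);
  return subtotalCents + taxCents;
}

// Same inputs, same output, regardless of when or where it runs.
console.log(computeTotal(1999, "2024-01")); // 1999 + 240 = 2239
```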
5. Idempotent operations everywhere
Every billing action includes an idempotency key.
If the same event is sent twice:
- It is safely ignored
- Or merged without side effects

This mattered during retries, crashes, and reconnects.
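A minimal sketch of idempotent application, assuming a hypothetical in-memory key store; a real system would persist applied keys so deduplication survives a crash.

```typescript
// Illustrative sketch: deduplicating billing actions by idempotency key.

type BillingAction = { idempotencyKey: string; type: string; payload: unknown };

const appliedKeys = new Set<string>();

// Returns true if the action was applied, false if it was a duplicate.
function applyOnce(action: BillingAction, apply: (a: BillingAction) => void): boolean {
  if (appliedKeys.has(action.idempotencyKey)) {
    return false; // already processed: retries and reconnects are harmless
  }
  apply(action);
  appliedKeys.add(action.idempotencyKey);
  return true;
}

// Usage: the same action sent twice only takes effect once.
const action = { idempotencyKey: "bill-42:ITEM_ADDED:li-7", type: "ITEM_ADDED", payload: {} };
applyOnce(action, (a) => console.log("applied", a.type)); // applied ITEM_ADDED
applyOnce(action, (a) => console.log("applied", a.type)); // duplicate, ignored
```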
Performance Decisions That Actually Helped
Avoid shared locks
We stopped locking “the bill” as a whole. Instead:
- Line items are updated independently
- Totals are derived, not locked
Precompute where humans wait
Humans wait on screen transitions, not background syncs. We optimized perceived performance, not raw throughput.
Backpressure instead of failure
If the system is under stress:
- Slow non-critical features
- Never drop billing actions

Dropping logs is acceptable. Dropping bills is not.
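A toy sketch of that policy, with made-up queue names and thresholds: billing actions are always accepted, while lower-priority work is shed once the backlog grows.

```typescript
// Illustrative sketch: shed non-critical work under load, never billing.

type Priority = "billing" | "analytics" | "logs";

const queues: Record<Priority, unknown[]> = { billing: [], analytics: [], logs: [] };
const SOFT_LIMIT = 500; // total queued tasks before shedding starts (arbitrary)

function enqueue(priority: Priority, task: unknown): boolean {
  const backlog =
    queues.billing.length + queues.analytics.length + queues.logs.length;

  // Billing actions are never dropped, no matter the backlog.
  if (priority === "billing") {
    queues.billing.push(task);
    return true;
  }

  // Under stress, non-critical work is dropped instead of slowing billing.
  if (backlog >= SOFT_LIMIT) {
    return false; // acceptable for logs and analytics
  }

  queues[priority].push(task);
  return true;
}
```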
Lessons Learned
Billing is a trust system: Once staff distrust billing totals, no UI improvement will fix it.
Real-time does not mean synchronous: Real-time means predictable latency, not doing everything at once.
Auditability beats cleverness: If you can’t explain a bill 6 months later, the design is wrong.
Humans optimize faster than code: If the system slows them down, they will invent workarounds immediately.
Final Takeaway
A billing system survives peak hours because it is forgiving.
Forgiving of:
- Network failures
- Hardware limitations
- Human behavior
- Operational chaos
Design for failure first. Performance follows naturally.