
Siddhant Jain

Posted on • Originally published at keelstack.me

Why Your SaaS Node Backend Will Fail at 10k Requests/Minute (and How to Stress‑Proof It Without Rewriting)

At 1k active users, your Node backend feels like a rock.

At 3k–5k users, Stripe webhooks start retrying, background jobs pile up, and you notice the first “duplicate charge” ticket.

At 8k–10k requests per minute, you’re in a live incident: jobs vanish on deploy, webhook duplicates double‑bill customers, and MFA state drifts, leaving users locked out.

Node is great—but naïve implementations won’t survive SaaS‑scale.

Here’s exactly what breaks and how to stress‑proof it without a full rewrite.

If you’re:

  • building a Node.js + TypeScript SaaS backend,
  • handling Stripe webhooks, background jobs, and auth,
  • and worried that your current architecture will fall apart at 3k–10k requests per minute,

then this post is for you.


What Actually Breaks at 10k RPM in Node

1. Silent Job Loss & Race Conditions

If your background jobs rely on setTimeout or an in‑memory array, every deploy or process restart wipes them out.

But the real pain starts when workers race for the same job.

Example: A Stripe checkout.session.completed event triggers a job to deliver a license.

Two workers both see the job as “pending” → both claim it → customer receives two licenses.

Pattern that fails:

// Naive in‑memory queue
const jobs = [];

setInterval(() => {
  const job = jobs.shift();
  if (job) handleJob(job); // handleJob is your own handler; `process` is a Node global object, not a function
}, 1000);

What survives:

  • Persistent queue (Redis, RabbitMQ, Postgres with SKIP LOCKED).
  • Atomic claim: the first worker to “lock” the job wins; others skip it.
  • Crash recovery: jobs are persisted before execution, so a worker crash doesn’t lose them.
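
A minimal sketch of the atomic claim, assuming a Postgres table named jobs with id, status, payload, created_at, and locked_at columns (all hypothetical names) and a client whose query(text, values) resolves to { rows }, which is the shape node-postgres uses. The inner SELECT ... FOR UPDATE SKIP LOCKED makes concurrent workers skip rows that another transaction has already locked, so only one worker moves a given job to 'running'.

```javascript
// Hypothetical schema: jobs(id, status, payload, created_at, locked_at).
const CLAIM_SQL = `
  UPDATE jobs
     SET status = 'running', locked_at = now()
   WHERE id = (
     SELECT id
       FROM jobs
      WHERE status = 'pending'
      ORDER BY created_at
      LIMIT 1
        FOR UPDATE SKIP LOCKED
   )
  RETURNING id, payload`;

// `client` is assumed to expose query(text, values) -> Promise<{ rows }>.
async function claimNextJob(client) {
  const { rows } = await client.query(CLAIM_SQL);
  return rows[0] ?? null; // null means nothing was claimable
}

// Mark the job done only after the handler succeeds, so a crash mid-run
// leaves the row in 'running' where a recovery sweep can find it.
async function runOneJob(client, handler) {
  const job = await claimNextJob(client);
  if (!job) return false;
  await handler(job.payload);
  await client.query("UPDATE jobs SET status = 'done' WHERE id = $1", [job.id]);
  return true;
}
```

Each worker would call runOneJob in a loop and sleep briefly whenever it returns false. Because the job row exists before any worker touches it, a restart loses nothing that was still pending.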

2. Stripe Webhook Race Conditions

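Stripe retries webhook deliveries that time out or return a non‑2xx response, so the same event can reach your handler more than once. If your handler is not idempotent, each retry can create another invoice, subscription, or email.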

Fragile handler:

app.post('/stripe-webhook', async (req, res) => {
  const event = req.body;
  await db.invoices.insert({ stripeId: event.id });
  await sendReceiptEmail();
  res.sendStatus(200);
});

If two identical events arrive concurrently, both will insert duplicate rows.

Idempotency fix:

  • Use a unique constraint on (stripe_event_id, event_type).
  • Or wrap the handler in an atomic guard that checks a “processed” flag before doing work.
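
One way to sketch the atomic guard: claim the event ID first, and run side effects only if the claim succeeded. The in‑memory Set below stands in for a table with a unique constraint on the event ID (in Postgres, an INSERT ... ON CONFLICT DO NOTHING whose affected‑row count tells you whether you won the claim). createEventGuard, handleEvent, and deliver are hypothetical names, not part of Stripe's SDK.

```javascript
// In-memory stand-in for a table with a unique constraint on event ID.
// A real deployment needs shared storage; a Set only protects one process.
function createEventGuard() {
  const seen = new Set();
  return {
    // Returns true only for the first caller presenting a given event ID.
    claim(eventId) {
      if (seen.has(eventId)) return false;
      seen.add(eventId);
      return true;
    },
    // Release the claim so a later retry can redo failed work.
    release(eventId) {
      seen.delete(eventId);
    },
  };
}

async function handleEvent(event, guard, deliver) {
  if (!guard.claim(event.id)) {
    return { status: 200, duplicate: true }; // acknowledge, do nothing
  }
  try {
    await deliver(event);
    return { status: 200, duplicate: false };
  } catch (err) {
    guard.release(event.id); // a 500 makes the sender retry later
    return { status: 500, duplicate: false };
  }
}
```

The important detail is the order: the claim happens before any side effect, and it is a single check‑and‑set rather than a read followed by a separate write.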

3. Auth & MFA State Drift

When your authentication relies on in‑memory sessions or local cookies without server‑side validation, you risk:

  • An attacker with a stolen session token skipping MFA entirely.
  • “MFA required” being enforced only in the UI, not on the API.

Example: A user enables MFA, but the API still allows them to change their billing email without a second factor. An attacker with a stolen session can compromise the account.

What’s needed:

  • Stateless tokens (JWT) with explicit permissions.
  • Per‑action MFA enforcement on sensitive routes (e.g., POST /api/billing/change-email), not just a flag in the UI.
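
A sketch of per‑action enforcement as Express‑style middleware. It assumes an earlier middleware has already verified the JWT and attached its claims to req.auth, and that one of those claims is mfaVerifiedAt, a Unix timestamp in seconds. Those two names and requireRecentMfa are illustrative, not taken from any particular library.

```javascript
// Returns middleware that rejects the request unless the caller passed an
// MFA challenge within the last `maxAgeSeconds`.
function requireRecentMfa(maxAgeSeconds, now = () => Date.now()) {
  return function mfaGate(req, res, next) {
    const verifiedAt = req.auth && req.auth.mfaVerifiedAt; // hypothetical claim
    const ageSeconds = now() / 1000 - verifiedAt;
    const valid =
      typeof verifiedAt === "number" &&
      ageSeconds >= 0 &&
      ageSeconds <= maxAgeSeconds;
    if (!valid) {
      res.status(403).json({ error: "mfa_required" });
      return;
    }
    next();
  };
}

// Usage with an Express app (the handler name is assumed):
// app.post("/api/billing/change-email", requireRecentMfa(300), changeEmail);
```

Because the check runs on the server for the specific route, hiding or showing an MFA prompt in the UI no longer decides whether the action is allowed.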

How to Stress‑Test Your SaaS Node Backend

Before you hit 10k RPM, know where you’ll break. Here’s a simple stress‑test recipe you can run today:

Tools

  • autocannon or hey for HTTP load.
  • Stripe CLI to replay webhooks.
  • A script to kill workers randomly.

Tests to Run

  1. Auth endpoint
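    autocannon -c 100 -p 10 -m POST http://localhost:3000/api/v1/auth/login
    autocannon sends GET requests by default, and a login route normally expects POST, so set the method explicitly and add -H and -b with a test account's credentials so the request exercises the real code path.
    Watch for 5xx errors and 99th‑percentile latency. If you see spikes >1s, your session store might be the bottleneck.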

  2. Concurrent Stripe webhooks
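    Each stripe trigger run creates a new event with a new ID, so to get truly identical events, trigger once and then resend that same event 50 times in parallel (EVENT_ID holds the evt_ ID shown in your stripe listen output):
    stripe trigger checkout.session.completed
    for i in $(seq 50); do stripe events resend "$EVENT_ID" & done; wait
    Then check your DB for duplicate records. If you see any, your webhook handler isn’t idempotent.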

  3. Crash recovery
    Start a long‑running job (e.g., 10s sleep).
    While it’s running, kill the worker process (kill -9).
    Verify the job is retried or resumed, not lost.
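
What makes the crash‑recovery test pass is usually a lease (sometimes called a visibility timeout): a job that is still 'running' but was locked longer ago than the lease allows is assumed to belong to a dead worker and goes back to 'pending'. A minimal in‑memory sketch, assuming each job record carries status, lockedAt (milliseconds), and attempts fields, all hypothetical names:

```javascript
// Requeue jobs whose worker appears to have died: still 'running', but
// locked longer ago than the lease allows. Mutates the records in place
// and returns the IDs that were requeued.
function requeueStaleJobs(jobs, nowMs, leaseMs) {
  const requeued = [];
  for (const job of jobs) {
    if (job.status === "running" && nowMs - job.lockedAt > leaseMs) {
      job.status = "pending";
      job.lockedAt = null;
      job.attempts += 1; // lets you cap retries for jobs that always crash
      requeued.push(job.id);
    }
  }
  return requeued;
}
```

The lease has to be longer than the slowest legitimate job, or the worker has to extend it with a heartbeat. Otherwise a healthy job gets run twice, which is one more reason for handlers to be idempotent.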

What to Measure

  • 5xx error rate (should stay at or near 0%).
  • Job loss count (should be 0).
  • Duplicate transaction count (should be 0).

How KeelStack Already Hardens This

KeelStack Engine was built to survive exactly these failure modes on a production‑like SaaS workload. It ships with:

  • Atomic job queue using Redis‑Lua or PostgreSQL SKIP LOCKED. Jobs are persisted before execution; if a worker crashes, they’re re‑claimed by another worker with exponential backoff.
  • Idempotency guard for all mutating endpoints. Stripe webhooks are wrapped with a composite key (event_id + event_type), and the result is cached. Duplicate events return a 200 without re‑executing business logic. In stress‑tests with KeelStack, we see <1% error rate and zero duplicate transactions even when firing 100 identical Stripe webhooks per second.
  • Per‑action MFA enforcement at the API level. The auth module includes a requireMfaFor(route) helper that validates the MFA token on sensitive operations—not just on login.
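
For readers implementing the first pattern themselves, exponential backoff usually means doubling the retry delay per attempt up to a cap, with random jitter so that retries do not arrive in lockstep. The following is a generic sketch with assumed defaults, not KeelStack's code:

```javascript
// "Full jitter" backoff: pick a random delay between 0 and
// min(capMs, baseMs * 2^attempt). `attempt` starts at 0.
// `random` is injectable so the function can be tested deterministically.
function backoffDelayMs(
  attempt,
  { baseMs = 1000, capMs = 60000, random = Math.random } = {}
) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

A re‑claimed job would then be made visible again only after backoffDelayMs(job.attempts) has elapsed, where attempts is whatever retry counter your queue keeps.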

These aren’t marketing claims; they’re the exact patterns you’d need to implement yourself. KeelStack ships them by default so you can focus on your unique product logic.


Practical Checklist: Hardening Your Node SaaS Before 10k RPM

  1. Use persistent queues – Redis, RabbitMQ, or Postgres with SKIP LOCKED. Never rely on in‑memory arrays or setTimeout for jobs.
  2. Idempotency keys on all webhooks and billing actions – store the result of every mutating operation keyed by a unique identifier (e.g., Stripe event ID + user ID).
  3. Stateless sessions + per‑action MFA enforcement – store only a JWT; validate MFA on sensitive API endpoints, not just in the UI.
  4. Crash‑safe job runners – jobs should be saved to the database before execution starts, and marked as done after success.
  5. Stress‑test with 2–3x your expected peak – use autocannon and simulate webhook floods to catch race conditions early.
  6. Add structured logging – correlate logs with request IDs so you can trace a job from creation to completion across worker restarts.
  7. Enforce test coverage – write integration tests for failure scenarios (e.g., duplicate webhooks, worker crashes). If you can’t reproduce it in CI, it will happen in production.
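
To make item 6 concrete, here is a small sketch of a logger that stamps every line with the same correlation fields. createLogger and the field names (requestId, jobId, worker) are illustrative choices, not a specific logging library's API.

```javascript
// Build a logger that adds the same context fields to every JSON line.
// `write` defaults to stdout; it is injectable so tests can capture output.
function createLogger(context, write = (line) => console.log(line)) {
  const log = (level, message, extra = {}) =>
    write(
      JSON.stringify({
        time: new Date().toISOString(),
        level,
        message,
        ...context,
        ...extra,
      })
    );
  return {
    info: (message, extra) => log("info", message, extra),
    error: (message, extra) => log("error", message, extra),
    // Derive a logger that carries additional fields, e.g. a job ID.
    child: (more) => createLogger({ ...context, ...more }, write),
  };
}

// The request handler would store requestId in the job payload; the worker
// rebuilds a logger from it, so both sides log the same ID.
const requestLog = createLogger({ requestId: "req_123" });
requestLog.info("job enqueued", { jobId: "job_9" });
const workerLog = requestLog.child({ jobId: "job_9", worker: "w2" });
workerLog.info("job completed");
```

Searching your logs for requestId "req_123" then returns the enqueue line and every worker line for that job, even if a different worker picked it up after a restart.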



Ship Safe, Not Just Fast

If you’re building a SaaS backend in Node, you don’t have to rediscover these hard‑earned lessons at 3am when your first real‑world traffic spike hits. The patterns above are proven and can be integrated incrementally—or you can start from a foundation that already has them built in.

KeelStack Engine is a production‑tested Node + TypeScript starter that includes idempotency, persistent job queues, per‑user LLM token budgets, and a full auth/billing stack. You get the full source code under its license terms and can deploy it anywhere.

👉 Get instant access to KeelStack Engine – skip the weeks of wiring and jump straight to building features that matter.
