Siddhant Jain

Why Your "Vibe Coded" SaaS Will Fail at 100 Users (and How to Fix It)

It's 2026. You just built a functional SaaS MVP in four hours using Cursor and Claude.
It looks great, the happy path works, and you're ready to tweet your launch.

But there's a hidden tax on AI-generated code: Architectural Debt.

When you vibe-code without a strict foundation, the LLM takes the path of least
resistance. It puts database logic in your routes, skips error handling, and ignores
race conditions. It builds a prototype, not a product.

This isn't a skill problem. It's a structural problem. And it only shows up at scale.


The "Vibe Coding" Trap

Most developers hit their first wall not at launch — but at 100 users.

That's when:

  • Two users double-click "Subscribe" at the same time.
  • Stripe retries a slow webhook and hits your server twice.
  • A background job fails silently, and the user never gets their report.
  • One user with an AI feature loops a prompt and burns $200 of your OpenAI credits in 20 minutes.

None of these show up in development. None of them show up in your happy-path tests.
They show up in production, at 2am, when you're not watching.

The fix isn't "write better prompts." The fix is building on a foundation that makes
these failure modes structurally impossible.


1. The Race Condition That Kills Conversions

Most AI-generated Stripe integrations look like this:

```
1. Receive webhook.
2. Check if processed = true in DB.
3. If not, provision the license.
```

This is broken.

Stripe retries webhooks. If two requests hit your server at the same millisecond —
which happens regularly under real load — both will see processed = false, and
you'll double-provision (or double-charge) the user.

This isn't hypothetical. Stripe's documentation explicitly warns that endpoints may
receive the same event more than once, and under real load two deliveries can arrive
close enough together that both read processed = false before either writes the flag.

The Fix: Atomic Idempotency

The correct approach is not "check then set." It's atomic SET NX (Set if Not Exists).

In Redis, this means:

```typescript
// WRONG — race condition between check and set
const isProcessed = await store.isProcessed(eventId);
if (!isProcessed) {
  await store.markProcessed(eventId);
  await provisionLicense();
}

// CORRECT — atomic, no race condition
const claimed = await store.tryClaimKey(eventId);
if (claimed) {
  await provisionLicense();
}
```

The difference: tryClaimKey() is a single atomic Redis SET NX operation.
Either you claim it or you don't. There is no window between the check and the claim.
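To make the shape concrete, here is a minimal sketch of what an idempotency store with a `tryClaimKey()` method could look like. The article doesn't show KeelStack's internals, so this is illustrative: in production the claim would be a single Redis command, `SET <key> 1 NX EX <ttl>`, which succeeds only for the first caller. An in-memory Map stands in here so the example is self-contained.

```typescript
// Hypothetical sketch — not KeelStack's actual implementation.
// In production, tryClaimKey() would issue one atomic Redis command:
//   SET <key> 1 NX EX <ttl>
// which returns OK only for the first caller.
interface IdempotencyStore {
  tryClaimKey(key: string): Promise<boolean>;
}

class InMemoryIdempotencyStore implements IdempotencyStore {
  // key -> expiry timestamp (ms)
  private claimed = new Map<string, number>();

  constructor(private ttlMs: number = 24 * 60 * 60 * 1000) {}

  async tryClaimKey(key: string): Promise<boolean> {
    const now = Date.now();
    const expiry = this.claimed.get(key);
    if (expiry !== undefined && expiry > now) return false; // already claimed
    // No await between the check and the set, so within Node's single-threaded
    // event loop this is atomic. Across processes you need Redis SET NX.
    this.claimed.set(key, now + this.ttlMs);
    return true;
  }
}
```

The TTL matters: you want claims to expire eventually (Stripe stops retrying after a few days), but not before the retry window closes.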

In KeelStack Engine, every webhook handler uses webhookDeduplicationGuard
middleware which wraps tryClaimKey() automatically:

```typescript
router.post(
  '/webhooks/stripe',
  webhookDeduplicationGuard(idempotencyStore, 'stripe'),
  stripeWebhookHandler,
)
```

Pro tip: If your backend doesn't use an Idempotency-Key header for mutating
requests, you are not production-ready.
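The same claim-once idea extends to client-supplied Idempotency-Key headers. As a sketch (the names here are illustrative, not KeelStack's API), a wrapper can guarantee a mutating handler runs at most once per key and that duplicates, including concurrent ones, get the original result:

```typescript
// Hypothetical sketch: at-most-once execution per Idempotency-Key.
type Handler<T> = () => Promise<T>;

class IdempotentExecutor {
  private results = new Map<string, unknown>();          // finished requests
  private inFlight = new Map<string, Promise<unknown>>(); // concurrent duplicates

  // Runs the handler at most once per key; duplicate requests receive the
  // original result instead of re-executing the mutation.
  async run<T>(key: string, handler: Handler<T>): Promise<T> {
    if (this.results.has(key)) return this.results.get(key) as T;
    const pending = this.inFlight.get(key);
    if (pending) return pending as Promise<T>;
    const p = handler().then((result) => {
      this.results.set(key, result);
      this.inFlight.delete(key);
      return result;
    });
    this.inFlight.set(key, p);
    return p;
  }
}
```

The in-flight map is the detail most hand-rolled versions miss: without it, two concurrent requests with the same key both start the mutation before either finishes.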


2. Why "Spaghetti Prompts" Break Your Architecture

As your project grows, your AI context window gets cluttered. With a flat file structure,
the AI starts hallucinating. It forgets where your auth logic lives, starts inventing new
ways to call your database, and quietly breaks layer boundaries you thought were stable.

This isn't a Cursor or Claude problem. It's a map problem.

AI agents write better code when they have clear, enforced boundaries. Without them,
they wander.

The Fix: The 8-Layer "Constitution"

KeelStack Engine uses a strict Hexagonal (Ports & Adapters) architecture across
eight explicit layers:

| Layer | Purpose | AI Write? |
|---|---|---|
| 01-Core | Security, errors, middleware, guards | ❌ NO |
| 02-Common | DTOs, types, utilities | ✅ YES |
| 03-Policies | Business rules, billing gates, access guards | ❌ NO |
| 04-Modules | Feature modules: auth, billing, users, tasks | ✅ YES |
| 05-Infra | DB schema, Stripe/Redis/Resend gateways | ❌ NO |
| 06-Background | Worker pool, retry-safe job runner, event bus | ✅ YES |
| 07-AI | LLMClient, cost controls, AI boundary rules | ❌ NO |
| 08-Web | Express routes, OpenAPI spec | ✅ YES |

The .cursorrules file enforces these boundaries at the Cursor / Claude level:

  • AI can write to 02-Common, 04-Modules, 06-Background, 08-Web.
  • AI cannot touch 01-Core, 03-Policies, 05-Infra/schema.ts, or 07-AI/LLMClient.ts.

The result: your AI agent writes architecture-compliant code the first time, without
you needing to explain the layer rules in every prompt.

This .cursorrules file is free and open source on GitHub. Drop it in any
Node.js project root and Cursor loads it automatically.
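For a sense of what such rules look like, here is an illustrative excerpt, not the actual file contents, with the layer paths taken from the table above:

```
# Illustrative excerpt — not the actual KeelStack .cursorrules
You are working in a strictly layered codebase. Follow these rules:
- NEVER modify files under src/01-Core, src/03-Policies,
  src/05-Infra/schema.ts, or src/07-AI/llm/LLMClient.ts.
- New features live in src/04-Modules/<feature>/ with routes in src/08-Web.
- All database access goes through the gateways in src/05-Infra;
  never import the DB driver directly.
- All LLM calls go through llmClient.complete();
  never call a provider SDK directly.
```

Because the rules travel with the repo, every prompt starts from the same constitution instead of whatever context happens to fit in the window.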


3. The $500 AI Loop

You've seen the horror stories. A developer leaves an AI agent running, a loop occurs,
and they wake up to a $500 OpenAI bill. One user finds a way to trigger your AI feature
in a loop, and your margins disappear by end of day.

If you're building an AI SaaS, you cannot rely on the AI to behave. You need
hard governance at the infrastructure level.

The Fix: Centralized LLM Client with Hard Budget Caps

Every LLM call in KeelStack Engine goes through a single llmClient singleton
in src/07-AI/llm/LLMClient.ts. No exceptions.

This client enforces:

  • Per-user token budgets — hard caps on what a single user can spend per hour, per day, or per feature.
  • Cost attribution — every call includes a feature field so you know exactly which part of your product is eating your margin.
  • Automatic retry on 429/503 — rate limit errors don't crash your app; they backoff and retry.
  • Request timeouts — runaway prompts are killed after a configurable threshold.

```typescript
const response = await llmClient.complete({
  userId: 'usr_123',
  feature: 'report_gen',        // cost attribution
  systemPrompt: 'You are...',
  userMessage: userInput,
  // budget, timeout, retry — all enforced automatically
})
```

One user cannot burn your monthly budget in an afternoon. It's structurally prevented.
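As a sketch of what a per-user hourly cap could look like (the article doesn't show KeelStack's internals; this in-memory version is illustrative, and a production version would keep the counter in Redis so it survives restarts and scales across processes):

```typescript
// Hypothetical sketch of a hard per-user token budget with a fixed
// one-hour window. Production would use Redis INCRBY on an expiring key.
class TokenBudget {
  private usage = new Map<string, { windowStart: number; tokens: number }>();

  constructor(private maxTokensPerHour: number) {}

  // Returns false (and spends nothing) once the user's hourly cap is hit.
  trySpend(userId: string, tokens: number): boolean {
    const now = Date.now();
    const hourMs = 60 * 60 * 1000;
    const entry = this.usage.get(userId);
    if (!entry || now - entry.windowStart >= hourMs) {
      this.usage.set(userId, { windowStart: now, tokens: 0 }); // new window
    }
    const current = this.usage.get(userId)!;
    if (current.tokens + tokens > this.maxTokensPerHour) return false;
    current.tokens += tokens;
    return true;
  }
}
```

The key property: the check happens before the LLM call is made, so a looping user hits a wall instead of your credit card.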


4. The Background Job That Vanishes

AI-generated background job implementations typically look like this:

```typescript
setTimeout(async () => {
  await processReport(jobId);
}, 0);
```

This is not a background job. This is a deferred function call with no retry,
no timeout, no logging, and no recovery.

If your server restarts, the job disappears. If processReport() throws, the user
never gets their result and you never find out why.

The Fix: Retry-Safe Job Runner with Dead-Letter Logging

KeelStack Engine uses real Node.js worker_threads — not setTimeout, not
setImmediate — with a RetryableJobRunner that provides:

  • Exponential backoff with jitter — failed jobs retry at increasing intervals, not all at once.
  • Per-attempt timeouts — a stuck job doesn't block the worker thread forever.
  • Dead-letter logging — jobs that exhaust retries are logged with full context, not silently dropped.
  • NonRetryableError — for bad-input errors that should fail fast without burning retry budget.

```typescript
const runner = new RetryableJobRunner(async (payload) => {
  if (!isValid(payload)) throw new NonRetryableError('Bad payload')
  await processReport(payload)
  return { ok: true }
}, { maxAttempts: 5, baseDelayMs: 500, timeoutMs: 30_000 })
```
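The retry loop behind an interface like this can be sketched as follows. This is a minimal illustrative implementation of the bullets above (backoff with jitter, per-attempt timeout, NonRetryableError fast-fail), not KeelStack's actual code:

```typescript
// Hypothetical sketch of a retry-safe runner.
class NonRetryableError extends Error {}

interface RunnerOptions {
  maxAttempts: number;
  baseDelayMs: number;
  timeoutMs: number;
}

class RetryableJobRunner<P, R> {
  constructor(
    private job: (payload: P) => Promise<R>,
    private opts: RunnerOptions,
  ) {}

  async run(payload: P): Promise<R> {
    let lastError: unknown;
    for (let attempt = 1; attempt <= this.opts.maxAttempts; attempt++) {
      try {
        // Per-attempt timeout: a hung job fails this attempt instead of
        // blocking the worker forever.
        return await Promise.race([
          this.job(payload),
          new Promise<never>((_, reject) =>
            setTimeout(() => reject(new Error('Attempt timed out')), this.opts.timeoutMs),
          ),
        ]);
      } catch (err) {
        if (err instanceof NonRetryableError) throw err; // fail fast on bad input
        lastError = err;
        if (attempt < this.opts.maxAttempts) {
          // Exponential backoff with full jitter: 0..baseDelayMs * 2^(attempt-1).
          // Jitter prevents all failed jobs from retrying in lockstep.
          const cap = this.opts.baseDelayMs * 2 ** (attempt - 1);
          await new Promise((r) => setTimeout(r, Math.random() * cap));
        }
      }
    }
    // Dead-letter: in production, log the payload and error with full context here.
    throw lastError;
  }
}
```

Full jitter (a random delay between zero and the exponential cap) is the standard defense against retry stampedes: without it, every job that failed together retries together.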

The async pattern exposed to clients is 202 + poll — the canonical
production pattern for long-running operations:

```
POST /api/v1/tasks        → { status: "accepted", jobId: "...", pollUrl: "..." }
GET  /api/v1/tasks/:jobId → { status: "processing" | "done" | "failed", result }
```
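From the client's side, the pattern is a simple bounded loop. A sketch, where `fetchStatus` stands in for a GET to the `pollUrl` returned by the 202 response (the names are illustrative):

```typescript
// Hypothetical client-side sketch of the 202 + poll flow.
type JobStatus =
  | { status: 'processing' }
  | { status: 'done'; result: unknown }
  | { status: 'failed'; error: string };

async function pollUntilDone(
  fetchStatus: () => Promise<JobStatus>,
  intervalMs = 1000,
  maxPolls = 60,
): Promise<unknown> {
  for (let i = 0; i < maxPolls; i++) {
    const job = await fetchStatus();
    if (job.status === 'done') return job.result;
    if (job.status === 'failed') throw new Error(job.error);
    await new Promise((r) => setTimeout(r, intervalMs)); // still processing
  }
  throw new Error('Gave up waiting for job');
}
```

The maxPolls bound matters: a client that polls forever against a job the server silently dropped is the front-end half of the vanishing-job bug.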

5. The Auth Bug That Leaks User Data

AI-generated password comparison often looks like this:

```typescript
if (storedHash === inputHash) {
  return user;
}
```

This is vulnerable to timing attacks. A naive equality check can return as soon as
the first differing byte is found, so an attacker who measures response times can
learn how much of a secret matched and recover it incrementally; similar timing
differences on the user-lookup path let attackers enumerate valid usernames.

The correct approach is crypto.timingSafeEqual() — a constant-time comparison
that doesn't leak information through timing.
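A minimal wrapper using Node's built-in crypto, assuming hex-encoded digests (the helper name is illustrative):

```typescript
// Constant-time comparison of two hex-encoded digests.
import { timingSafeEqual } from 'node:crypto';

function safeCompare(a: string, b: string): boolean {
  const bufA = Buffer.from(a, 'hex');
  const bufB = Buffer.from(b, 'hex');
  // timingSafeEqual throws on buffers of different lengths, so check first.
  // Digests of the same algorithm are always equal length; a mismatch here
  // means malformed input, not a near-miss password.
  if (bufA.length !== bufB.length) return false;
  return timingSafeEqual(bufA, bufB);
}
```

Note that with Argon2 or bcrypt you'd normally call the library's own verify function, which is already constant-time; timingSafeEqual is for comparing raw digests, tokens, and HMAC signatures.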

KeelStack Engine uses:

  • Argon2id password hashing (OWASP 2023 parameters: 65MB memory, 3 iterations).
  • crypto.timingSafeEqual() for all password comparisons.
  • Brute-force lockout per IP on auth endpoints (30 req / 10 min).
  • Refresh token rotation — tokens are single-use and rotated on every refresh.
  • Transparent PBKDF2 → Argon2id migration on next login for any legacy hashes.
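Refresh token rotation in particular is simple to sketch. This illustrative version (not KeelStack's code) uses an in-memory store where a database table would normally sit; the point is that each token works exactly once:

```typescript
// Hypothetical sketch of single-use refresh token rotation.
import { randomBytes } from 'node:crypto';

class RefreshTokenStore {
  private tokens = new Map<string, string>(); // token -> userId

  issue(userId: string): string {
    const token = randomBytes(32).toString('hex');
    this.tokens.set(token, userId);
    return token;
  }

  // A successful refresh invalidates the old token and issues a new one.
  // A replayed (stolen) token is rejected because it was already consumed.
  rotate(token: string): { userId: string; newToken: string } | null {
    const userId = this.tokens.get(token);
    if (!userId) return null;
    this.tokens.delete(token);
    return { userId, newToken: this.issue(userId) };
  }
}
```

Rotation turns a stolen refresh token from a permanent backdoor into a one-shot race: whichever party refreshes second gets rejected, which is also your breach detection signal.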

None of this is complicated to implement. It's just easy to skip when you're
prompting an AI to "add auth."


What 100 Users Actually Reveals

Here's the honest summary of what breaks at 100 users when you build on an
AI-generated flat foundation:

| Failure Mode | Root Cause | Production Cost |
|---|---|---|
| Duplicate Stripe charges | No atomic idempotency on webhooks | Chargebacks, trust loss |
| Double-provisioned licenses | Race condition in check-then-set | Revenue leak |
| Jobs vanishing silently | setTimeout instead of real workers | User churn, support tickets |
| $500 AI bill overnight | No per-user LLM budget caps | Direct margin destruction |
| Auth timing leaks | String comparison instead of timingSafeEqual | Potential data breach |
| Architecture rot | Flat file structure, no layer boundaries | Weeks of refactoring debt |

All of these are structurally preventable. None of them require more prompts.
They require a foundation that makes the wrong thing hard to build.


Stop Building Prototypes. Start Shipping Engines.

You can spend three weeks debugging AI-generated spaghetti after your first 100 users
expose every race condition and edge case. Or you can start with a foundation that
already handles them.

KeelStack Engine is not a template. It's a production-grade Node.js + TypeScript
environment designed specifically for the AI coding era:

  • 563 unit tests · 37 e2e checks · 91.7% statement coverage, enforced by CI
  • Idempotency middleware, webhook deduplication guard, retry-safe job runner
  • Per-user LLM token budgets with cost attribution
  • Open-source .cursorrules — AI writes architecture-compliant code the first time
  • 15 copy-paste prompts for Cursor, Claude, and Copilot
  • SaaS blueprints: AI Report Generator, Lead Finder API
  • One-time payment. Your source code, your rules.

Explore KeelStack Engine →
