Siddhant Jain

Why Your "Vibe Coded" SaaS Will Fail at 100 Users (and How to Fix It)

It's 2026. You just built a functional SaaS MVP in four hours using Cursor and Claude.
It looks great, the happy path works, and you're ready to tweet your launch.

But there's a hidden tax on AI-generated code: Architectural Debt.

When you vibe-code without a strict foundation, the LLM takes the path of least
resistance. It puts database logic in your routes, skips error handling, and ignores
race conditions. It builds a prototype, not a product.

This isn't a skill problem. It's a structural problem. And it only shows up at scale.


The "Vibe Coding" Trap

Most developers hit their first wall not at launch — but at 100 users.

That's when:

  • Two users double-click "Subscribe" at the same time.
  • Stripe retries a slow webhook and hits your server twice.
  • A background job fails silently, and the user never gets their report.
  • One user with an AI feature loops a prompt and burns $200 of your OpenAI credits in 20 minutes.

None of these show up in development. None of them show up in your happy-path tests.
They show up in production, at 2am, when you're not watching.

The fix isn't "write better prompts." The fix is building on a foundation that makes
these failure modes structurally impossible.


1. The Race Condition That Kills Conversions

Most AI-generated Stripe integrations look like this:

```
1. Receive webhook.
2. Check if processed = true in DB.
3. If not, provision the license.
```

This is broken.

Stripe retries webhooks. If two requests hit your server at the same millisecond —
which happens regularly under real load — both will see processed = false, and
you'll double-provision (or double-charge) the user.

This isn't hypothetical. Stripe's documentation explicitly warns that endpoints may
receive the same event more than once, and under real load two deliveries can arrive
close enough together that both read processed = false before either writes the flag.

The Fix: Atomic Idempotency

The correct approach is not "check then set." It's atomic SET NX (Set if Not Exists).

In Redis, this means:

```typescript
// WRONG — race condition between check and set
const isProcessed = await store.isProcessed(eventId);
if (!isProcessed) {
  await store.markProcessed(eventId);
  await provisionLicense();
}

// CORRECT — atomic, no race condition
const claimed = await store.tryClaimKey(eventId);
if (claimed) {
  await provisionLicense();
}
```

The difference: tryClaimKey() is a single atomic Redis SET NX operation.
Either you claim it or you don't. There is no window between the check and the claim.
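To make the shape concrete, here is a minimal sketch of what an idempotency store with a `tryClaimKey()` method could look like. The article doesn't show KeelStack's internals, so this is illustrative: in production the claim would be a single Redis command, `SET <key> 1 NX EX <ttl>`, which succeeds only for the first caller. An in-memory Map stands in here so the example is self-contained.

```typescript
// Hypothetical sketch — not KeelStack's actual implementation.
// In production, tryClaimKey() would issue one atomic Redis command:
//   SET <key> 1 NX EX <ttl>
// which returns OK only for the first caller.
interface IdempotencyStore {
  tryClaimKey(key: string): Promise<boolean>;
}

class InMemoryIdempotencyStore implements IdempotencyStore {
  // key -> expiry timestamp (ms)
  private claimed = new Map<string, number>();

  constructor(private ttlMs: number = 24 * 60 * 60 * 1000) {}

  async tryClaimKey(key: string): Promise<boolean> {
    const now = Date.now();
    const expiry = this.claimed.get(key);
    if (expiry !== undefined && expiry > now) return false; // already claimed
    // No await between the check and the set, so within Node's single-threaded
    // event loop this is atomic. Across processes you need Redis SET NX.
    this.claimed.set(key, now + this.ttlMs);
    return true;
  }
}
```

The TTL matters: you want claims to expire eventually (Stripe stops retrying after a few days), but not before the retry window closes.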

In KeelStack Engine, every webhook handler uses webhookDeduplicationGuard
middleware which wraps tryClaimKey() automatically:

```typescript
router.post(
  '/webhooks/stripe',
  webhookDeduplicationGuard(idempotencyStore, 'stripe'),
  stripeWebhookHandler,
)
```

Pro tip: If your backend doesn't use an Idempotency-Key header for mutating
requests, you are not production-ready.
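The same claim-once idea extends to client-supplied Idempotency-Key headers. As a sketch (the names here are illustrative, not KeelStack's API), a wrapper can guarantee a mutating handler runs at most once per key and that duplicates, including concurrent ones, get the original result:

```typescript
// Hypothetical sketch: at-most-once execution per Idempotency-Key.
type Handler<T> = () => Promise<T>;

class IdempotentExecutor {
  private results = new Map<string, unknown>();          // finished requests
  private inFlight = new Map<string, Promise<unknown>>(); // concurrent duplicates

  // Runs the handler at most once per key; duplicate requests receive the
  // original result instead of re-executing the mutation.
  async run<T>(key: string, handler: Handler<T>): Promise<T> {
    if (this.results.has(key)) return this.results.get(key) as T;
    const pending = this.inFlight.get(key);
    if (pending) return pending as Promise<T>;
    const p = handler().then((result) => {
      this.results.set(key, result);
      this.inFlight.delete(key);
      return result;
    });
    this.inFlight.set(key, p);
    return p;
  }
}
```

The in-flight map is the detail most hand-rolled versions miss: without it, two concurrent requests with the same key both start the mutation before either finishes.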


2. Why "Spaghetti Prompts" Break Your Architecture

As your project grows, your AI context window gets cluttered. With a flat file structure,
the AI starts hallucinating. It forgets where your auth logic lives, starts inventing new
ways to call your database, and quietly breaks layer boundaries you thought were stable.

This isn't a Cursor or Claude problem. It's a map problem.

AI agents write better code when they have clear, enforced boundaries. Without them,
they wander.

The Fix: The 8-Layer "Constitution"

KeelStack Engine uses a strict Hexagonal (Ports & Adapters) architecture across
eight explicit layers:

| Layer | Purpose | AI Write? |
|---|---|---|
| 01-Core | Security, errors, middleware, guards | ❌ NO |
| 02-Common | DTOs, types, utilities | ✅ YES |
| 03-Policies | Business rules, billing gates, access guards | ❌ NO |
| 04-Modules | Feature modules: auth, billing, users, tasks | ✅ YES |
| 05-Infra | DB schema, Stripe/Redis/Resend gateways | ❌ NO |
| 06-Background | Worker pool, retry-safe job runner, event bus | ✅ YES |
| 07-AI | LLMClient, cost controls, AI boundary rules | ❌ NO |
| 08-Web | Express routes, OpenAPI spec | ✅ YES |

The .cursorrules file enforces these boundaries at the Cursor / Claude level:

  • AI can write to 02-Common, 04-Modules, 06-Background, 08-Web.
  • AI cannot touch 01-Core, 03-Policies, 05-Infra/schema.ts, or 07-AI/LLMClient.ts.

The result: your AI agent writes architecture-compliant code the first time, without
you needing to explain the layer rules in every prompt.

This .cursorrules file is free and open source on GitHub. Drop it in any
Node.js project root and Cursor loads it automatically.
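For a sense of what such rules look like, here is an illustrative excerpt, not the actual file contents, with the layer paths taken from the table above:

```
# Illustrative excerpt — not the actual KeelStack .cursorrules
You are working in a strictly layered codebase. Follow these rules:
- NEVER modify files under src/01-Core, src/03-Policies,
  src/05-Infra/schema.ts, or src/07-AI/llm/LLMClient.ts.
- New features live in src/04-Modules/<feature>/ with routes in src/08-Web.
- All database access goes through the gateways in src/05-Infra;
  never import the DB driver directly.
- All LLM calls go through llmClient.complete();
  never call a provider SDK directly.
```

Because the rules travel with the repo, every prompt starts from the same constitution instead of whatever context happens to fit in the window.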


3. The $500 AI Loop

You've seen the horror stories. A developer leaves an AI agent running, a loop occurs,
and they wake up to a $500 OpenAI bill. One user finds a way to trigger your AI feature
in a loop, and your margins disappear by end of day.

If you're building an AI SaaS, you cannot rely on the AI to behave. You need
hard governance at the infrastructure level.

The Fix: Centralized LLM Client with Hard Budget Caps

Every LLM call in KeelStack Engine goes through a single llmClient singleton
in src/07-AI/llm/LLMClient.ts. No exceptions.

This client enforces:

  • Per-user token budgets — hard caps on what a single user can spend per hour, per day, or per feature.
  • Cost attribution — every call includes a feature field so you know exactly which part of your product is eating your margin.
  • Automatic retry on 429/503 — rate limit errors don't crash your app; they backoff and retry.
  • Request timeouts — runaway prompts are killed after a configurable threshold.

```typescript
const response = await llmClient.complete({
  userId: 'usr_123',
  feature: 'report_gen',        // cost attribution
  systemPrompt: 'You are...',
  userMessage: userInput,
  // budget, timeout, retry — all enforced automatically
})
```

One user cannot burn your monthly budget in an afternoon. It's structurally prevented.
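As a sketch of what a per-user hourly cap could look like (the article doesn't show KeelStack's internals; this in-memory version is illustrative, and a production version would keep the counter in Redis so it survives restarts and scales across processes):

```typescript
// Hypothetical sketch of a hard per-user token budget with a fixed
// one-hour window. Production would use Redis INCRBY on an expiring key.
class TokenBudget {
  private usage = new Map<string, { windowStart: number; tokens: number }>();

  constructor(private maxTokensPerHour: number) {}

  // Returns false (and spends nothing) once the user's hourly cap is hit.
  trySpend(userId: string, tokens: number): boolean {
    const now = Date.now();
    const hourMs = 60 * 60 * 1000;
    const entry = this.usage.get(userId);
    if (!entry || now - entry.windowStart >= hourMs) {
      this.usage.set(userId, { windowStart: now, tokens: 0 }); // new window
    }
    const current = this.usage.get(userId)!;
    if (current.tokens + tokens > this.maxTokensPerHour) return false;
    current.tokens += tokens;
    return true;
  }
}
```

The key property: the check happens before the LLM call is made, so a looping user hits a wall instead of your credit card.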


4. The Background Job That Vanishes

AI-generated background job implementations typically look like this:

```typescript
setTimeout(async () => {
  await processReport(jobId);
}, 0);
```

This is not a background job. This is a deferred function call with no retry,
no timeout, no logging, and no recovery.

If your server restarts, the job disappears. If processReport() throws, the user
never gets their result and you never find out why.

The Fix: Retry-Safe Job Runner with Dead-Letter Logging

KeelStack Engine uses real Node.js worker_threads — not setTimeout, not
setImmediate — with a RetryableJobRunner that provides:

  • Exponential backoff with jitter — failed jobs retry at increasing intervals, not all at once.
  • Per-attempt timeouts — a stuck job doesn't block the worker thread forever.
  • Dead-letter logging — jobs that exhaust retries are logged with full context, not silently dropped.
  • NonRetryableError — for bad-input errors that should fail fast without burning retry budget.

```typescript
const runner = new RetryableJobRunner(async (payload) => {
  if (!isValid(payload)) throw new NonRetryableError('Bad payload')
  await processReport(payload)
  return { ok: true }
}, { maxAttempts: 5, baseDelayMs: 500, timeoutMs: 30_000 })
```
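The retry loop behind an interface like this can be sketched as follows. This is a minimal illustrative implementation of the bullets above (backoff with jitter, per-attempt timeout, NonRetryableError fast-fail), not KeelStack's actual code:

```typescript
// Hypothetical sketch of a retry-safe runner.
class NonRetryableError extends Error {}

interface RunnerOptions {
  maxAttempts: number;
  baseDelayMs: number;
  timeoutMs: number;
}

class RetryableJobRunner<P, R> {
  constructor(
    private job: (payload: P) => Promise<R>,
    private opts: RunnerOptions,
  ) {}

  async run(payload: P): Promise<R> {
    let lastError: unknown;
    for (let attempt = 1; attempt <= this.opts.maxAttempts; attempt++) {
      try {
        // Per-attempt timeout: a hung job fails this attempt instead of
        // blocking the worker forever.
        return await Promise.race([
          this.job(payload),
          new Promise<never>((_, reject) =>
            setTimeout(() => reject(new Error('Attempt timed out')), this.opts.timeoutMs),
          ),
        ]);
      } catch (err) {
        if (err instanceof NonRetryableError) throw err; // fail fast on bad input
        lastError = err;
        if (attempt < this.opts.maxAttempts) {
          // Exponential backoff with full jitter: 0..baseDelayMs * 2^(attempt-1).
          // Jitter prevents all failed jobs from retrying in lockstep.
          const cap = this.opts.baseDelayMs * 2 ** (attempt - 1);
          await new Promise((r) => setTimeout(r, Math.random() * cap));
        }
      }
    }
    // Dead-letter: in production, log the payload and error with full context here.
    throw lastError;
  }
}
```

Full jitter (a random delay between zero and the exponential cap) is the standard defense against retry stampedes: without it, every job that failed together retries together.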

The async pattern exposed to clients is 202 + poll — the canonical
production pattern for long-running operations:

```
POST /api/v1/tasks        → { status: "accepted", jobId: "...", pollUrl: "..." }
GET  /api/v1/tasks/:jobId → { status: "processing" | "done" | "failed", result }
```
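From the client's side, the pattern is a simple bounded loop. A sketch, where `fetchStatus` stands in for a GET to the `pollUrl` returned by the 202 response (the names are illustrative):

```typescript
// Hypothetical client-side sketch of the 202 + poll flow.
type JobStatus =
  | { status: 'processing' }
  | { status: 'done'; result: unknown }
  | { status: 'failed'; error: string };

async function pollUntilDone(
  fetchStatus: () => Promise<JobStatus>,
  intervalMs = 1000,
  maxPolls = 60,
): Promise<unknown> {
  for (let i = 0; i < maxPolls; i++) {
    const job = await fetchStatus();
    if (job.status === 'done') return job.result;
    if (job.status === 'failed') throw new Error(job.error);
    await new Promise((r) => setTimeout(r, intervalMs)); // still processing
  }
  throw new Error('Gave up waiting for job');
}
```

The maxPolls bound matters: a client that polls forever against a job the server silently dropped is the front-end half of the vanishing-job bug.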

5. The Auth Bug That Leaks User Data

AI-generated password comparison often looks like this:

```typescript
if (storedHash === inputHash) {
  return user;
}
```

This is vulnerable to timing attacks. A naive equality check can return as soon as
the first differing byte is found, so an attacker who measures response times can
learn how much of a secret matched and recover it incrementally; similar timing
differences on the user-lookup path let attackers enumerate valid usernames.

The correct approach is crypto.timingSafeEqual() — a constant-time comparison
that doesn't leak information through timing.
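A minimal wrapper using Node's built-in crypto, assuming hex-encoded digests (the helper name is illustrative):

```typescript
// Constant-time comparison of two hex-encoded digests.
import { timingSafeEqual } from 'node:crypto';

function safeCompare(a: string, b: string): boolean {
  const bufA = Buffer.from(a, 'hex');
  const bufB = Buffer.from(b, 'hex');
  // timingSafeEqual throws on buffers of different lengths, so check first.
  // Digests of the same algorithm are always equal length; a mismatch here
  // means malformed input, not a near-miss password.
  if (bufA.length !== bufB.length) return false;
  return timingSafeEqual(bufA, bufB);
}
```

Note that with Argon2 or bcrypt you'd normally call the library's own verify function, which is already constant-time; timingSafeEqual is for comparing raw digests, tokens, and HMAC signatures.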

KeelStack Engine uses:

  • Argon2id password hashing (OWASP 2023 parameters: 65MB memory, 3 iterations).
  • crypto.timingSafeEqual() for all password comparisons.
  • Brute-force lockout per IP on auth endpoints (30 req / 10 min).
  • Refresh token rotation — tokens are single-use and rotated on every refresh.
  • Transparent PBKDF2 → Argon2id migration on next login for any legacy hashes.
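Refresh token rotation in particular is simple to sketch. This illustrative version (not KeelStack's code) uses an in-memory store where a database table would normally sit; the point is that each token works exactly once:

```typescript
// Hypothetical sketch of single-use refresh token rotation.
import { randomBytes } from 'node:crypto';

class RefreshTokenStore {
  private tokens = new Map<string, string>(); // token -> userId

  issue(userId: string): string {
    const token = randomBytes(32).toString('hex');
    this.tokens.set(token, userId);
    return token;
  }

  // A successful refresh invalidates the old token and issues a new one.
  // A replayed (stolen) token is rejected because it was already consumed.
  rotate(token: string): { userId: string; newToken: string } | null {
    const userId = this.tokens.get(token);
    if (!userId) return null;
    this.tokens.delete(token);
    return { userId, newToken: this.issue(userId) };
  }
}
```

Rotation turns a stolen refresh token from a permanent backdoor into a one-shot race: whichever party refreshes second gets rejected, which is also your breach detection signal.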

None of this is complicated to implement. It's just easy to skip when you're
prompting an AI to "add auth."


What 100 Users Actually Reveals

Here's the honest summary of what breaks at 100 users when you build on an
AI-generated flat foundation:

| Failure Mode | Root Cause | Production Cost |
|---|---|---|
| Duplicate Stripe charges | No atomic idempotency on webhooks | Chargebacks, trust loss |
| Double-provisioned licenses | Race condition in check-then-set | Revenue leak |
| Jobs vanishing silently | setTimeout instead of real workers | User churn, support tickets |
| $500 AI bill overnight | No per-user LLM budget caps | Direct margin destruction |
| Auth timing leaks | String comparison instead of timingSafeEqual | Potential data breach |
| Architecture rot | Flat file structure, no layer boundaries | Weeks of refactoring debt |

All of these are structurally preventable. None of them require more prompts.
They require a foundation that makes the wrong thing hard to build.


Stop Building Prototypes. Start Shipping Engines.

You can spend three weeks debugging AI-generated spaghetti after your first 100 users
expose every race condition and edge case. Or you can start with a foundation that
already handles them.

KeelStack Engine is not a template. It's a production-grade Node.js + TypeScript
environment designed specifically for the AI coding era:

  • 563 unit tests · 37 e2e checks · 91.7% statement coverage, enforced by CI
  • Idempotency middleware, webhook deduplication guard, retry-safe job runner
  • Per-user LLM token budgets with cost attribution
  • Open-source .cursorrules — AI writes architecture-compliant code the first time
  • 15 copy-paste prompts for Cursor, Claude, and Copilot
  • SaaS blueprints: AI Report Generator, Lead Finder API
  • One-time payment. Your source code, your rules.

Explore KeelStack Engine →
