AI-Generated Backends Break in Production. We Replaced Code with Specs.

#ai #backend #architecture #security

AI backend tools generate code. Fast. The problem is not how the code is generated - it's what's missing from the output.

Generated code ships without structural guarantees. No enforced transaction boundaries. No mandatory audit logging. No invariant checks before commit. No risk classification before deployment.

The code works until it doesn't, and when it breaks in production, you're debugging code you didn't write.

We made a different choice.

The Problem with Generated Code

When an AI generates your backend code, you get:

No structural safety: Transaction boundaries are optional, not enforced. A payment can process while the order creation fails.
No audit trail: Logging is ad-hoc. "What happened at 3am?" has no guaranteed answer.
Hidden complexity: Six months later, nobody knows what the generated code does. Including the AI that wrote it.
Security gaps: By one reported estimate, 45% of AI-generated code contains vulnerabilities (Stanford/UIUC). Not because the AI is bad - because code is too flexible a medium to guarantee safety.
No rollback strategy: External calls with side effects aren't separated from reversible operations.

For prototypes, this is fine. For production backends handling payments, reservations, and user data? It's a structural risk.

The Alternative: Specs Instead of Code

Fascia doesn't generate code. It generates structured specifications - and executes them directly.

When you describe your business in natural language, AI produces specs that define:

Entities: Business objects with fields, relationships, status machines, and invariants
Tools: API endpoints with typed input/output, trigger types, and flow graphs
Policies: Design-time rules that block unsafe patterns before deployment
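
To make the three parts concrete, here is a rough sketch of what such a spec could look like as plain data. It is written in Python for brevity, and every key name, entity, and rule below is an assumption made for illustration; this is not Fascia's actual spec format.

```python
# Hypothetical spec shape. All keys and names are illustrative assumptions,
# not Fascia's real format.
SPEC = {
    "entities": {
        "Reservation": {
            "fields": {"id": "uuid", "guest_id": "uuid",
                       "party_size": "int", "status": "string"},
            # Status machine: current status -> statuses it may move to
            "status_machine": {
                "pending": ["confirmed", "cancelled"],
                "confirmed": ["cancelled", "completed"],
                "cancelled": [],
                "completed": [],
            },
            "invariants": ["party_size > 0"],
        }
    },
    "tools": {
        "confirm_reservation": {
            "trigger": "http",
            "input": {"reservation_id": "uuid"},
            "output": {"status": "string"},
            # Flow graph: each node lists the nodes it runs after
            "flow": [
                {"id": "load", "type": "Read", "entity": "Reservation", "after": []},
                {"id": "save", "type": "Write", "entity": "Reservation", "after": ["load"]},
            ],
        }
    },
    "policies": [{"rule": "writes_require_transaction", "severity": "red"}],
}


def can_transition(spec, entity, current, target):
    """Return True if the entity's status machine allows current -> target."""
    machine = spec["entities"][entity]["status_machine"]
    return target in machine.get(current, [])
```

Because the spec is data rather than code, a question like "can a cancelled reservation be confirmed?" becomes a lookup: `can_transition(SPEC, "Reservation", "cancelled", "confirmed")` is `False`.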

At runtime, a deterministic executor (written in Go, ~50ms cold start on Cloud Run) reads the spec and follows it. No code interpretation. No LLM inference. No variability.

The "no LLM at runtime" rule isn't about what competitors do - it's about what production systems need. Determinism. Auditability. Predictable behavior under every condition.

The 9-Step Execution Contract

Every API endpoint follows the same sequence:

1. Validate input against JSON Schema from the spec
2. Authorize - JWT verification, RBAC role check, row-level ownership
3. Check policies - design-time rules enforced deterministically
4. Start transaction - explicit boundary, no auto-commit
5. Execute flow graph - a DAG of typed nodes (Read, Write, Transform, If/Switch)
6. Enforce invariants - business rules checked before commit
7. Commit or rollback - all-or-nothing, no partial state
8. Write audit log - append-only, unconditional, every execution
9. Return typed response - matches the output schema from the spec
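
The sequence above can be sketched as a single function that every request passes through. This is a simplified illustration in Python using an in-memory SQLite database, not the Go executor itself. The `TOOL` dictionary, the `execute` function, and the `orders` table are all hypothetical, and the validation and authorization steps are reduced to stand-ins.

```python
import sqlite3

# Hypothetical per-endpoint spec. Every key name here is an assumption.
TOOL = {
    "input_required": {"order_id": int, "amount": int},
    "allowed_roles": {"staff"},
    # Invariant expressed as a query that must return 0 violations
    "invariant": "SELECT COUNT(*) FROM orders WHERE amount < 0",
}


def execute(db, tool, request, audit):
    """Run one request through the fixed sequence of steps."""
    outcome = "rejected"
    try:
        # 1. Validate input (stand-in for JSON Schema validation)
        for name, typ in tool["input_required"].items():
            if not isinstance(request["body"].get(name), typ):
                return {"ok": False, "error": f"invalid field: {name}"}
        # 2. Authorize (JWT verification and ownership checks omitted)
        if request["role"] not in tool["allowed_roles"]:
            return {"ok": False, "error": "forbidden"}
        # 3. Policy checks would run here (omitted in this sketch)
        # 4. Start an explicit transaction
        db.execute("BEGIN")
        try:
            # 5. Execute the flow (a single Write node in this sketch)
            db.execute("INSERT INTO orders (id, amount) VALUES (?, ?)",
                       (request["body"]["order_id"], request["body"]["amount"]))
            # 6. Enforce invariants before commit
            (violations,) = db.execute(tool["invariant"]).fetchone()
            if violations:
                raise ValueError("invariant violated")
            # 7. Commit ...
            db.execute("COMMIT")
            outcome = "committed"
        except Exception as exc:
            # 7. ... or roll back, leaving no partial state
            db.execute("ROLLBACK")
            outcome = "rolled_back"
            return {"ok": False, "error": str(exc)}
        # 9. Return the response
        return {"ok": True, "order_id": request["body"]["order_id"]}
    finally:
        # 8. The audit entry is written on every path, success or failure
        audit.append({"tool": "create_order", "outcome": outcome})


# isolation_level=None lets the code issue BEGIN/COMMIT/ROLLBACK itself
db = sqlite3.connect(":memory:", isolation_level=None)
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
audit = []
ok = execute(db, TOOL, {"role": "staff", "body": {"order_id": 1, "amount": 500}}, audit)
bad = execute(db, TOOL, {"role": "staff", "body": {"order_id": 2, "amount": -5}}, audit)
```

In this sketch the second request inserts a row, fails the invariant check, and is rolled back, so only the first order remains in the table while both executions appear in the audit list.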

No shortcuts. No "well, this endpoint is special." The rigidity is the feature.

With generated code, any of these steps can be missing. With specs, they're mandatory. The format doesn't allow you to skip step 4 or forget step 8.
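
The flow graph step is where endpoint-specific behavior lives, so it is worth a small illustration. The sketch below runs a DAG of typed nodes in dependency order using Python's standard `graphlib` module. The node fields (`after`, `source`, `multiply`, `greater_than`, `target`) are invented for this example and say nothing about how Fascia encodes its nodes.

```python
from graphlib import TopologicalSorter

# Hypothetical flow graph. Node fields are assumptions for illustration.
FLOW = [
    {"id": "load", "type": "Read", "after": [], "source": "order"},
    {"id": "total", "type": "Transform", "after": ["load"], "multiply": ["qty", "price"]},
    {"id": "large", "type": "If", "after": ["total"], "field": "total", "greater_than": 1000},
    {"id": "save", "type": "Write", "after": ["large"], "target": "orders"},
]


def run_flow(flow, store):
    """Execute nodes in dependency order; each node type has one fixed behavior."""
    nodes = {n["id"]: n for n in flow}
    # Maps each node to its predecessors; a cycle raises graphlib.CycleError
    order = TopologicalSorter({n["id"]: n["after"] for n in flow}).static_order()
    ctx, writes = {}, []
    for node_id in order:
        node = nodes[node_id]
        if node["type"] == "Read":
            ctx.update(store[node["source"]])
        elif node["type"] == "Transform":
            a, b = node["multiply"]
            ctx[node_id] = ctx[a] * ctx[b]
        elif node["type"] == "If":
            ctx[node_id] = ctx[node["field"]] > node["greater_than"]
        elif node["type"] == "Write":
            # Collected rather than applied, so the caller controls the transaction
            writes.append((node["target"], dict(ctx)))
        else:
            raise ValueError(f"unknown node type: {node['type']}")
    return ctx, writes


ctx, writes = run_flow(FLOW, {"order": {"qty": 3, "price": 400}})
```

The point of the design is that the node types form a closed set: a spec can combine them, but it cannot introduce a new kind of behavior that the executor has never seen.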

Design-Time Safety

The Risk Engine classifies every API endpoint before deployment:

Green: All writes in transactions, no unbounded queries, no unsafe patterns. Deploy freely.
Yellow: External call without retry, or a write that may affect a large number of rows. Deploy with acknowledgment.
Red: Payment without rollback, write without transaction, unbounded UPDATE. Cannot deploy.

Red signals cannot be dismissed. Fix the design.
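
A classifier of this kind can be approximated in a few lines, because it only inspects the spec and never runs it. The sketch below is a guess at the general shape, not the Risk Engine's real rule set: the signal names, the `External` and `Payment` node types, and fields such as `in_transaction`, `where`, `retries`, and `compensation` are all assumptions, and the row-impact check is left out.

```python
# Hypothetical signal names. The real Risk Engine's rules are not shown here.
RED = {"payment_without_rollback", "write_without_transaction", "unbounded_update"}
YELLOW = {"external_call_without_retry"}


def find_signals(flow):
    """Inspect flow nodes statically and collect risk signals."""
    signals = set()
    for node in flow:
        kind = node["type"]
        if kind == "Write" and not node.get("in_transaction", False):
            signals.add("write_without_transaction")
        if kind == "Write" and node.get("op") == "update" and not node.get("where"):
            signals.add("unbounded_update")
        if kind == "External" and node.get("retries", 0) == 0:
            signals.add("external_call_without_retry")
        if kind == "Payment" and not node.get("compensation"):
            signals.add("payment_without_rollback")
    return signals


def classify(flow):
    """Return the worst level found: red beats yellow beats green."""
    signals = find_signals(flow)
    if signals & RED:
        return "red"
    if signals & YELLOW:
        return "yellow"
    return "green"


def can_deploy(flow, acknowledged=False):
    """Red always blocks; yellow needs an explicit acknowledgment."""
    level = classify(flow)
    if level == "red":
        return False
    if level == "yellow":
        return acknowledged
    return True


safe = [{"type": "Read"}, {"type": "Write", "op": "insert", "in_transaction": True}]
flaky = safe + [{"type": "External", "retries": 0}]
unsafe = [{"type": "Write", "op": "update", "in_transaction": True}]  # no "where"
```

Here `classify(safe)` is `"green"`, `classify(flaky)` is `"yellow"`, and `classify(unsafe)` is `"red"`. Passing `acknowledged=True` unblocks the yellow flow but has no effect on the red one.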

This is the key difference from generated code. In a codebase, these patterns are caught by code review (maybe) or by production incidents (definitely). In a spec-driven system, they're caught by static analysis before anything runs.

The Tradeoff

All business logic must be captured in the spec at design time. The runtime cannot improvise. This means:

Complex conditional logic must be modeled as flow graph branches
Custom business rules use a restricted Value DSL (no arbitrary code)
External API calls are explicit nodes with retry/timeout configuration
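
To show what "restricted" can mean in practice, here is a minimal sketch of an expression evaluator that accepts comparisons, simple arithmetic, and `and`/`or`, and rejects everything else. It borrows Python's expression syntax and `ast` module purely as a stand-in; the actual Value DSL's syntax and operators are not described in this post, so treat the grammar below as an assumption.

```python
import ast
import operator

# Whitelisted operators. Any node type outside these is rejected.
_BIN = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}
_CMP = {ast.Gt: operator.gt, ast.GtE: operator.ge, ast.Lt: operator.lt,
        ast.LtE: operator.le, ast.Eq: operator.eq, ast.NotEq: operator.ne}


def evaluate(expr, values):
    """Evaluate a restricted expression against a dict of named values."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float, str, bool)):
            return node.value
        if isinstance(node, ast.Name):
            return values[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN:
            return _BIN[type(node.op)](walk(node.left), walk(node.right))
        if (isinstance(node, ast.Compare) and len(node.ops) == 1
                and type(node.ops[0]) in _CMP):
            return _CMP[type(node.ops[0])](walk(node.left), walk(node.comparators[0]))
        if isinstance(node, ast.BoolOp):
            # Operands are evaluated eagerly; there is no short-circuiting
            results = [walk(v) for v in node.values]
            return all(results) if isinstance(node.op, ast.And) else any(results)
        # Calls, attribute access, subscripts, lambdas, and so on land here
        raise ValueError(f"expression not allowed: {type(node).__name__}")
    return walk(ast.parse(expr, mode="eval"))
```

With this evaluator, a rule such as `evaluate("party_size > 0 and party_size <= max_size", {"party_size": 4, "max_size": 8})` returns `True`, while an attempt to smuggle in a function call, for example `evaluate("__import__('os')", {})`, raises `ValueError` before anything executes.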

This is a real constraint. Generated code is more flexible - you can do anything, including unsafe things.

We think the constraint is worth it. Production backends should be provable, not flexible.

Specs vs Generated Code

Dimension	Generated Code	Spec-Driven (Fascia)
Transaction boundaries	Optional (developer's choice)	Mandatory (format enforces)
Audit logging	Ad-hoc	Every execution, unconditional
Unsafe pattern detection	Code review / production incident	Static analysis before deploy
Readability at month 6	Depends on code quality	Spec is always readable
Rollback strategy	Hope for the best	Explicit compensation flows
Response time	Varies by implementation	12-50ms (deterministic executor)