DEV Community

ppcvote

Posted on • Originally published at ultralab.tw

Why Your SaaS Needs AI-Ready Interfaces: Architecture Lessons from Three Products

TL;DR

Within three years, someone will ask of every system you're building today: "Can this integrate AI?"

If your answer is "we'd need to rewrite it," you've lost. If your answer is "it's already wired up — just flip the switch," you've won.

This article is a practical guide distilled from the real-world pitfalls we hit across three production products (Mind Threads, UltraProbe, Ultra Advisor). This isn't theory — it's running code.


Current State: One API Key Powering Three Products

Let's be honest. All three of our products currently run on Google Gemini only:

| Product | AI Use Case | Model |
|---|---|---|
| Mind Threads | Social copy generation (35 posts/day) | Gemini 2.0 Flash |
| UltraProbe | AI security scanning (12 attack vectors) | Gemini 2.5 Flash |
| Ultra Advisor | Insurance OCR + product classification | Gemini 2.0 Flash |

This was the right call early on. Gemini Flash's free tier is generous, it's fast, and its Chinese language capability is serviceable. But this architecture has three fatal problems:

Problem 1: Single Point of Failure

In February 2026, Google had an API rate-limiting incident. All three of our products went down simultaneously. One API key, one provider, three products. This isn't architecture — it's gambling.

Problem 2: Use-Case Mismatch

Gemini Flash works great for "generating social copy" but isn't stable enough for "precise security analysis" or "structured OCR." Different tasks need different models, but we were locked into one.

Problem 3: No Upgrade Path

When clients asked, "Can your system use Claude? Can it run GPT-4o?", our only answer was: "Yes, but we'd need to change the code." That's not a product-ready answer.


The Solution: 7 Design Principles for AI-Ready Architecture

Every principle below comes from real mistakes we made in production.

Principle 1: Model Router — Don't Lock Into Any Single Provider

// Wrong approach: importing a specific SDK directly
import { GoogleGenerativeAI } from '@google/generative-ai'

// Right approach: unified interface + routing
interface AIProvider {
  generate(prompt: string, config: AIConfig): Promise<AIResponse>
}

const router = createAIRouter({
  primary: 'gemini-2.5-flash',
  fallback: ['claude-sonnet-4-6', 'gpt-4o-mini'],
  routing: {
    'content-generation': 'gemini',    // Copy uses Gemini (fast)
    'security-analysis': 'claude',      // Security analysis uses Claude (precise)
    'structured-extraction': 'gemini',  // OCR uses Gemini (multimodal)
    'code-generation': 'claude',        // Code uses Claude (logic)
  }
})

Why it matters: Models update on a monthly cadence. Today's best model may be surpassed in three months. Your architecture shouldn't require business logic changes just to swap models.
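To make the routing-plus-fallback idea concrete, here is a minimal sketch of what a createAIRouter could look like. The interfaces and provider names are illustrative assumptions, not our production code; the point is that callers depend only on the router, never on a vendor SDK.

```typescript
// Minimal model router sketch: route by task, fall back on provider failure.
// AIProvider/AIConfig/AIResponse shapes are assumptions for this sketch.
interface AIConfig { temperature?: number }
interface AIResponse { text: string }

interface AIProvider {
  generate(prompt: string, config?: AIConfig): Promise<AIResponse>
}

type Task = 'content-generation' | 'security-analysis'

function createAIRouter(
  providers: Record<string, AIProvider>,
  routing: Record<Task, string>,
  fallbackOrder: string[],
) {
  return {
    async generate(task: Task, prompt: string, config?: AIConfig): Promise<AIResponse> {
      // Try the routed provider first, then each fallback in order.
      const order = [routing[task], ...fallbackOrder.filter(p => p !== routing[task])]
      let lastError: unknown
      for (const name of order) {
        try {
          return await providers[name].generate(prompt, config)
        } catch (err) {
          lastError = err // provider down or rate-limited; try the next one
        }
      }
      throw lastError
    },
  }
}
```

Business logic calls `router.generate('security-analysis', prompt)` and never learns which provider answered, which is exactly what makes a later model swap a config change instead of a rewrite.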

Principle 2: Prompt Template Registry — Prompts Are Assets, Not Strings

Our biggest mistake: hardcoding prompts directly in API handlers.

// Wrong approach: prompts scattered across API files
const ANALYSIS_PROMPT = `You are an AI security auditor specializing in prompt injection defense...`

// Right approach: centralized management + version control
const promptRegistry = {
  'probe.scan-prompt': {
    version: '2.1',
    template: loadTemplate('probe/scan-prompt.md'),
    model: 'claude-sonnet-4-6',
    temperature: 0.3,
    maxTokens: 4096,
    schema: ScanResultSchema,  // Zod schema for validation
  },
  'threads.generate-post': {
    version: '1.4',
    template: loadTemplate('threads/generate-post.md'),
    model: 'gemini-2.0-flash',
    temperature: 1.0,
    maxTokens: 1024,
  }
}

Why it matters: Prompts are your core IP. When they're scattered throughout your code, you can't track which version performs best, can't A/B test them, and can't let non-engineers optimize them.
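The registry only pays off if templates are rendered through one code path. A minimal sketch of that path, assuming an in-memory template store (production would read versioned .md files) and a naive {{variable}} syntax, both illustrative:

```typescript
// Sketch: template loading plus {{variable}} interpolation.
// The in-memory map stands in for versioned .md files on disk.
const templates: Record<string, string> = {
  'threads/generate-post.md': 'Write a {{persona}} Threads post about {{topic}}.',
}

function loadTemplate(path: string): string {
  const t = templates[path]
  if (!t) throw new Error(`unknown template: ${path}`)
  return t
}

function renderPrompt(template: string, vars: Record<string, string>): string {
  // Fail loudly on a missing variable instead of sending "{{topic}}" to the model.
  return template.replace(/\{\{(\w+)\}\}/g, (_match, key) => {
    if (!(key in vars)) throw new Error(`missing prompt variable: ${key}`)
    return vars[key]
  })
}
```

Failing on a missing variable matters more than it looks: a literal `{{topic}}` in a prompt is the kind of silent degradation you only discover in output quality weeks later.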

Principle 3: Response Cache — Don't Ask the Same Question Twice

Ultra Advisor got this right: insurance product classification results are cached in Firestore. The same product queried a second time returns from cache instead of hitting Gemini.

// Ultra Advisor's caching strategy (in production)
async function lookupProduct(insurer: string, name: string) {
  const cached = await db.collection('productCache')
    .where('insurer', '==', insurer)
    .where('productName', '==', name)
    .limit(1).get()

  if (!cached.empty) {
    await cached.docs[0].ref.update({
      searchCount: FieldValue.increment(1)
    })
    return cached.docs[0].data()
  }

  // Cache miss: call Gemini, then store under the same field names the query uses
  // (writing `name` here while querying `productName` above would mean the cache
  // is never hit on the second lookup)
  const result = await gemini.generate(classifyPrompt(insurer, name))
  await db.collection('productCache').add({ ...result, insurer, productName: name, searchCount: 1 })
  return result
}

Result: Same-product queries dropped from 2-3 seconds to 50ms. Gemini API costs reduced by 60%.
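One detail worth adding on top of the code above: exact-match lookups miss trivially different spellings of the same product. A small sketch of key normalization that would raise the hit rate; the normalization rules here are assumptions, not Ultra Advisor's actual code:

```typescript
// Sketch: normalize cache keys so "  Foo Plan " and "foo plan" resolve to the
// same cached entry. Rules (trim, lowercase, collapse whitespace) are assumptions.
function cacheKey(insurer: string, productName: string): string {
  const norm = (s: string) => s.trim().toLowerCase().replace(/\s+/g, ' ')
  return `${norm(insurer)}::${norm(productName)}`
}
```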

Principle 4: Structured Output — AI Responses Must Be Validatable

A pitfall UltraProbe hit: Gemini sometimes returns malformed JSON, breaking the entire scan result.

// Right approach: validate AI output with Zod
import { z } from 'zod'

const ScanResultSchema = z.object({
  grade: z.enum(['A', 'B', 'C', 'D', 'E', 'F']),
  score: z.number().min(0).max(100),
  vulnerabilities: z.array(z.object({
    name: z.string(),
    severity: z.enum(['CRITICAL', 'HIGH', 'MEDIUM', 'LOW', 'NONE']),
    finding: z.string().max(100),
    suggestion: z.string().max(100),
  }))
})

// Validate immediately after the AI response. Note that JSON.parse itself can
// throw on malformed output (the exact pitfall above), so guard it too
const raw = await aiRouter.generate(prompt)
let parsed: ReturnType<typeof ScanResultSchema.safeParse> | null = null
try {
  parsed = ScanResultSchema.safeParse(JSON.parse(raw))
} catch {
  // malformed JSON: treat like a failed validation
}
if (!parsed?.success) {
  // Retry with a stricter prompt, or fall back to another model
}

Why it matters: AI output is non-deterministic. Your system can't throw a 500 error just because the model returned a malformed response.
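The "retry or fall back" branch generalizes into a small helper. A minimal sketch, assuming a caller-supplied generate function and validator (so it works with any provider and any schema library); the retry policy and the appended instruction are illustrative:

```typescript
// Sketch: retry an AI call until its JSON output passes a caller-supplied
// validator, tightening the prompt on each failed attempt.
async function generateValidated<T>(
  generate: (prompt: string) => Promise<string>,
  prompt: string,
  validate: (raw: unknown) => T | null, // return null when the shape is wrong
  maxAttempts = 3,
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await generate(prompt)
    try {
      const parsed = validate(JSON.parse(raw))
      if (parsed !== null) return parsed
    } catch {
      // Malformed JSON: fall through and retry.
    }
    // Tighten the prompt on retry (illustrative wording).
    prompt = `${prompt}\nRespond with valid JSON only.`
  }
  throw new Error(`AI output failed validation after ${maxAttempts} attempts`)
}
```

With Zod, the validator is just `(r) => { const p = Schema.safeParse(r); return p.success ? p.data : null }`, so the helper stays schema-agnostic.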

Principle 5: BYOK (Bring Your Own Key) — Let Clients Use Their Own Key

Mind Threads already implements this pattern: users can enter their own Gemini API key in settings, bypassing the platform's usage limits.

// Mind Threads BYOK implementation
async function getApiKey(userId: string) {
  const settings = await db.collection('userSettings').doc(userId).get()
  const userKey = settings.data()?.geminiApiKey

  if (userKey) {
    return { key: userKey, source: 'user', unlimited: true }
  }

  return { key: process.env.GEMINI_API_KEY, source: 'platform', unlimited: false }
}

Why it matters:

  • Reduces your API costs
  • Lets Pro users bypass platform limits
  • Future-proofs for multi-provider key support (Gemini key, OpenAI key, Anthropic key)
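That last bullet is a small extension of the getApiKey pattern above. A sketch of a multi-provider key resolver; the settings field names beyond `geminiApiKey` are assumptions about a future schema, not shipped code:

```typescript
// Sketch: resolve a key per provider, preferring the user's own key (BYOK).
// Field names other than geminiApiKey are assumed for illustration.
type Provider = 'gemini' | 'openai' | 'anthropic'

interface UserSettings {
  geminiApiKey?: string
  openaiApiKey?: string
  anthropicApiKey?: string
}

const fieldFor: Record<Provider, keyof UserSettings> = {
  gemini: 'geminiApiKey',
  openai: 'openaiApiKey',
  anthropic: 'anthropicApiKey',
}

function resolveKey(
  provider: Provider,
  settings: UserSettings,
  platformKeys: Partial<Record<Provider, string>>,
) {
  const userKey = settings[fieldFor[provider]]
  if (userKey) return { key: userKey, source: 'user' as const, unlimited: true }
  const platformKey = platformKeys[provider]
  if (!platformKey) throw new Error(`no key available for ${provider}`)
  return { key: platformKey, source: 'platform' as const, unlimited: false }
}
```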

Principle 6: MCP Server — Make Your System Callable by AI Agents

This is the most important trend of 2026. MCP (Model Context Protocol) lets AI Agents directly operate your system.

Ultra KB already has an Agent-Ready architecture (Notion knowledge base readable/writable by AI), but we don't have a formal MCP Server yet. Planned interfaces:

// Planned MCP Server tool definitions
const tools = [
  {
    name: 'ultraprobe_scan',
    description: 'Scan a System Prompt for security vulnerabilities',
    input: { prompt: 'string', language: 'zh-TW | en' },
    output: { grade: 'A-F', vulnerabilities: 'array' }
  },
  {
    name: 'ultrakb_query',
    description: 'Query documents on a specific topic from the knowledge base',
    input: { query: 'string', collection: 'string' },
    output: { documents: 'array', relevance: 'number' }
  },
  {
    name: 'threads_generate',
    description: 'Generate a Threads post',
    input: { topic: 'string', persona: 'viral|knowledge|story|quote' },
    output: { content: 'string', hashtags: 'array' }
  }
]

Why it matters: When users of Claude Desktop, Cursor, Windsurf, and similar tools can directly call your service, you're not just a SaaS — you're part of the AI ecosystem.
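Whatever MCP library ends up wrapping those tool definitions, the core you own is a name-to-handler dispatch table. A minimal sketch of that core with hypothetical stub handlers; this is deliberately not the MCP SDK's actual API, just the provider-agnostic layer it would call into:

```typescript
// Sketch: the tool-dispatch core an MCP server would wrap.
// Handlers here are stubs; real ones would call UltraProbe / Mind Threads services.
type ToolHandler = (input: Record<string, unknown>) => Promise<unknown>

const handlers: Record<string, ToolHandler> = {
  ultraprobe_scan: async (_input) => ({ grade: 'B', vulnerabilities: [] }), // stub
  threads_generate: async (input) => ({
    content: `Post about ${input.topic}`, // stub
    hashtags: [],
  }),
}

async function dispatchTool(name: string, input: Record<string, unknown>) {
  const handler = handlers[name]
  if (!handler) throw new Error(`unknown tool: ${name}`)
  return handler(input)
}
```

Keeping dispatch separate from the protocol layer means the same handlers can back an MCP server, a REST API, and internal jobs without duplication.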

Principle 7: Observability — AI Calls Must Be Trackable

The most critical gap in our current setup: AI call observability.

// Every AI call should be logged
interface AICallLog {
  id: string
  timestamp: Date
  model: string
  provider: string
  endpoint: string              // Which API triggered the call
  promptTokens: number
  completionTokens: number
  latencyMs: number
  cost: number                  // Estimated cost
  success: boolean
  retryCount: number
  cacheHit: boolean
  userId?: string               // Who triggered it
}

Why it matters: Without this, you don't know how much you're spending on AI per month, which features consume the most tokens, or which model fails most often. No data, no optimization.


Our Implementation Roadmap

Phase 1 (Now) — Brand Layer

  • [x] Label all services with AI-Ready capabilities
  • [x] Document existing AI integration points
  • [x] Standardize prompt management practices

Phase 2 (Q2 2026) — Technical Layer

  • [ ] Build AI Router middleware (multi-model)
  • [ ] Migrate all prompts to template registry
  • [ ] Add Zod schema validation for all AI outputs
  • [ ] Implement response cache layer

Phase 3 (Q3 2026) — Ecosystem Layer

  • [ ] UltraProbe MCP Server
  • [ ] Ultra KB semantic search (RAG)
  • [ ] AI Observability Dashboard
  • [ ] Multi-provider BYOK support

What You Should Do Right Now

Regardless of your product's stage, these three actions have the lowest cost and highest impact:

  1. Extract AI calls into standalone functions. Don't fetch(gemini_url) directly in your business logic. A single aiService.generate() is enough.

  2. Separate prompts into dedicated files. .md or .txt — doesn't matter, just don't hardcode them in your source.

  3. Log token counts and latency for every AI call. console.log is fine for now — build a dashboard later.

These three things take less than half a day combined, but they determine whether your system will be "AI-ready" or "needs a rewrite" three years from now.
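Steps 1 and 3 together are about ten lines. A sketch, where fetchFromProvider is a hypothetical stand-in for whatever SDK call your business logic makes today:

```typescript
// Sketch of steps 1 + 3: one aiService.generate wrapper that also logs latency.
// fetchFromProvider is a stub standing in for your real Gemini/OpenAI/etc. call.
async function fetchFromProvider(prompt: string): Promise<string> {
  return `echo: ${prompt}` // stub
}

async function generate(prompt: string): Promise<string> {
  const start = Date.now()
  const text = await fetchFromProvider(prompt)
  // console.log is fine for now; swap in a real sink when you build the dashboard.
  console.log(`[ai] ${Date.now() - start}ms, ${prompt.length} chars in, ${text.length} chars out`)
  return text
}
```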


Conclusion

AI isn't a feature — it's infrastructure.

Just as you wouldn't wait until you need search to add database indexes, you shouldn't wait until a client asks "can this integrate AI?" to start redesigning your architecture.

We validated this methodology across three products and tens of thousands of API calls. It's not perfect, but it's running.

If you're building a SaaS, add AI-ready interfaces now. Future you will thank present you.


Min Yi Chen — Founder, Ultra Creation Co., Ltd.
Currently operating 6 AI products with 200+ daily AI calls

Want to make your system AI-Ready? Free consultation


Originally published on Ultra Lab — we build AI products that run autonomously.

Try UltraProbe free — our AI security scanner checks your website for vulnerabilities in 30 seconds: ultralab.tw/probe
