Every SaaS founder I talk to right now is asking some version of: "How do we add AI to our product?" The instinct is right, but the framing is often wrong. The question isn't how to add AI; it's which specific user problems in your product have AI as the right solution, and how to integrate it without breaking what already works.
Here are the patterns I use when integrating AI into production SaaS products for US clients.
Pattern 1: LLM as a Feature, Not the Foundation
The most common mistake I see: building an entirely LLM-dependent product where every core workflow goes through an AI call. The problems compound:
- LLM API latency (1–10 seconds) is incompatible with synchronous UI flows
- LLM costs at scale are unpredictable and can make unit economics collapse
- LLM outputs are non-deterministic: the same input can produce different outputs, which makes testing and QA much harder
The better pattern is AI as an enhancement layer on top of deterministic core functionality:
Your SaaS product core → deterministic, fast, testable
↓ (optional AI layer)
AI enhancement → summarization, suggestions, classification
↓
User gets core value with or without AI; AI makes it better
If your product only works when the LLM call succeeds, you've built fragility into your core value proposition.
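One way to sketch this enhancement-layer idea in code is a wrapper that always returns the deterministic core value and attaches the AI result only when the call succeeds. This is a minimal illustration, not from the article; the function and type names are invented:

```typescript
// Hypothetical sketch: the user's core value never depends on the AI call.
type Enhanced<T> = { core: T; aiSummary?: string };

async function withOptionalAI<T>(
  core: T,
  enhance: (core: T) => Promise<string>,
): Promise<Enhanced<T>> {
  try {
    // Best-effort AI layer: a timeout or API error degrades gracefully.
    const aiSummary = await enhance(core);
    return { core, aiSummary };
  } catch {
    return { core }; // Core value survives the AI failure
  }
}
```

The key property: every failure mode of the AI layer collapses to "the user gets the product without the enhancement," never to an error page.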
Pattern 2: Async AI Processing with Job Queues
Don't call LLM APIs synchronously in request handlers. The latency will kill your UI responsiveness.
// Bad: synchronous LLM call in request handler
app.post('/documents/:id/summarize', async (req, res) => {
const document = await getDocument(req.params.id);
const summary = await openai.chat.completions.create({...}); // 3-8 seconds
await saveSummary(document.id, summary);
res.json({ summary }); // User waits 3-8 seconds staring at a spinner
});
// Good: queue job, return immediately, update UI via polling or WebSocket
app.post('/documents/:id/summarize', async (req, res) => {
const job = await aiQueue.add('summarize-document', {
documentId: req.params.id,
userId: req.user.id,
});
res.json({ jobId: job.id, status: 'processing' }); // Returns in <100ms
});
// Worker process handles the actual LLM call
aiQueue.process('summarize-document', async (job) => {
const document = await getDocument(job.data.documentId);
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{ role: 'system', content: 'Summarize the following document concisely.' },
{ role: 'user', content: document.content },
],
});
const summary = response.choices[0].message.content;
await saveSummary(document.id, summary);
await notifyUser(job.data.userId, 'summarize-complete', { documentId: document.id });
});
The frontend polls /jobs/:jobId/status or receives a WebSocket event when the summary is ready.
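The polling side can be a small generic helper. This is a sketch with an injected fetcher (so it works with any HTTP client); the `JobStatus` shape mirrors the job response above but is otherwise an assumption:

```typescript
// Hypothetical polling helper: check status until the job leaves 'processing'.
type JobStatus = { status: 'processing' | 'complete' | 'failed'; result?: string };

async function pollJob(
  fetchStatus: () => Promise<JobStatus>,
  { intervalMs = 1000, maxAttempts = 30 } = {},
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await fetchStatus();
    if (job.status !== 'processing') return job; // complete or failed
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('Job polling timed out');
}
```

In production you'd add exponential backoff and surface the `failed` state in the UI rather than treating it like a timeout.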
Pattern 3: Prompt Engineering as Code
Prompts are code. They should be versioned, tested, and reviewed like code. Instead of scattering inline prompt strings through your codebase, centralize them in versioned modules:
// prompts/summarization.ts
export const DOCUMENT_SUMMARY_PROMPT = {
version: 'v2.1',
system: `You are a precise document summarizer for a US business context.
Produce summaries that are:
- 3-5 sentences maximum
- Written in active voice
- Focused on actionable information and key decisions
- Free of filler phrases like "The document discusses..." or "This text covers..."`,
user: (document: string) => `Summarize this document:\n\n${document}`,
};
// Usage
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{ role: 'system', content: DOCUMENT_SUMMARY_PROMPT.system },
{ role: 'user', content: DOCUMENT_SUMMARY_PROMPT.user(document.content) },
],
});
Version your prompts. When a prompt change improves quality, it should go through code review. When a prompt regression slips through, you can identify exactly which version introduced it.
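Treating prompts as code also means they can be unit-tested. A sketch of a guard test that pins the prompt's key constraints so a careless edit fails CI (the prompt object is copied inline here to keep the sketch self-contained; in a real codebase it would be imported from prompts/summarization.ts):

```typescript
const DOCUMENT_SUMMARY_PROMPT = {
  version: 'v2.1',
  system: `You are a precise document summarizer for a US business context.
Produce summaries that are:
- 3-5 sentences maximum
- Written in active voice`,
  user: (document: string) => `Summarize this document:\n\n${document}`,
};

// Guard checks: fail loudly if a prompt edit drops a key constraint.
function validatePrompt(prompt: typeof DOCUMENT_SUMMARY_PROMPT): string[] {
  const problems: string[] = [];
  if (!/^v\d+\.\d+$/.test(prompt.version)) problems.push('version must look like v2.1');
  if (!prompt.system.includes('3-5 sentences')) problems.push('length constraint missing');
  if (!prompt.user('x').includes('x')) problems.push('user template must embed the document');
  return problems;
}
```

This doesn't test output quality (that needs an eval set), but it catches the cheapest class of regression: structural prompt breakage.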
Pattern 4: Output Validation
LLM outputs are untrustworthy for structured data. Always validate:
import { z } from 'zod';
const ClassificationSchema = z.object({
category: z.enum(['bug', 'feature', 'question', 'billing', 'other']),
priority: z.enum(['low', 'medium', 'high', 'urgent']),
confidence: z.number().min(0).max(1),
reasoning: z.string(),
});
async function classifyTicket(ticketText: string) {
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{ role: 'system', content: 'Classify the support ticket. Respond as JSON with keys: category (bug|feature|question|billing|other), priority (low|medium|high|urgent), confidence (0-1), reasoning.' },
{ role: 'user', content: ticketText },
],
response_format: { type: 'json_object' }, // Force JSON output
});
const raw = JSON.parse(response.choices[0].message.content!);
const result = ClassificationSchema.safeParse(raw);
if (!result.success) {
// LLM returned malformed output, fall back to manual review queue
await queueForManualClassification(ticketText);
return { category: 'other', priority: 'medium', confidence: 0, reasoning: 'AI classification failed' };
}
return result.data;
}
Never pass raw LLM output directly to a database write or a downstream system without validation.
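Validation pairs naturally with a retry: reissue the call once before falling back. A minimal sketch of that wrapper, generic over any validated LLM call (names invented; the convention here is that the call returns null when validation fails):

```typescript
// Hypothetical retry-then-fallback wrapper for validated LLM calls.
async function withRetry<T>(
  call: () => Promise<T | null>, // null signals validation failure
  fallback: T,
  retries = 1,
): Promise<T> {
  for (let i = 0; i <= retries; i++) {
    const result = await call();
    if (result !== null) return result;
  }
  return fallback; // e.g. the safe default classification plus a manual-review queue entry
}
```

One retry recovers most transient formatting failures; beyond that you're paying tokens to re-roll dice, and the manual-review fallback is cheaper.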
Pattern 5: Cost Control
LLM costs can surprise you at scale. Build cost controls from the start:
// Track LLM costs per tenant per month
async function trackLLMUsage(tenantId: string, model: string, tokens: number) {
const costPerToken = MODEL_COSTS[model] || 0;
const cost = tokens * costPerToken;
await db.query(`
INSERT INTO llm_usage (tenant_id, model, tokens, cost_usd, date)
VALUES ($1, $2, $3, $4, CURRENT_DATE)
ON CONFLICT (tenant_id, model, date)
DO UPDATE SET tokens = llm_usage.tokens + $3, cost_usd = llm_usage.cost_usd + $4
`, [tenantId, model, tokens, cost]);
// Check monthly budget against the tenant's plan
const tenant = await getTenant(tenantId);
const monthlyUsage = await getMonthlyUsage(tenantId);
if (monthlyUsage.cost_usd > PLAN_LIMITS[tenant.plan].monthly_ai_budget) {
throw new Error('AI_BUDGET_EXCEEDED');
}
}
Separate AI costs by tenant so you can charge for them, enforce plan limits, and identify which customers are unprofitable at their current plan tier.
Pattern 6: Retrieval-Augmented Generation (RAG) for Accuracy
For any AI feature that needs to answer questions about your product's data (customer documents, knowledge bases, CRM notes), pure LLM answers will hallucinate. Use RAG:
// Simplified RAG pattern
async function answerWithContext(question: string, tenantId: string): Promise<string> {
// 1. Embed the question
const questionEmbedding = await openai.embeddings.create({
input: question,
model: 'text-embedding-3-small',
});
// 2. Find relevant context from vector store
const relevantChunks = await vectorDb.query({
vector: questionEmbedding.data[0].embedding,
filter: { tenantId },
topK: 5,
});
// 3. Use retrieved context in the prompt, not the LLM's training data
const context = relevantChunks.map(c => c.text).join('\n\n');
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{ role: 'system', content: `Answer based only on this context:\n\n${context}` },
{ role: 'user', content: question },
],
});
return response.choices[0].message.content!;
}
RAG grounds LLM answers in your actual data, dramatically reducing hallucinations for domain-specific questions.
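Before any of this works, documents have to be chunked and embedded into the vector store. Chunking strategy has a large effect on retrieval quality; a naive fixed-size chunker with overlap is a common starting point (sizes here are illustrative, and production systems usually chunk on semantic boundaries like paragraphs instead):

```typescript
// Naive fixed-size chunker with overlap for RAG ingestion.
// Overlap keeps sentences that straddle a boundary retrievable from both chunks.
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  if (chunkSize <= overlap) throw new Error('chunkSize must exceed overlap');
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}
```

Each chunk is then embedded and stored with its tenantId, which is what makes the `filter: { tenantId }` in the query above possible.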
AI integration done right enhances your SaaS product without introducing fragility. The patterns above are what I use when building AI-powered features for US clients in production environments.
If you're adding AI capabilities to a SaaS product and want it built properly, see my AI development work at waqarhabib.com/services/ai-development.
Originally published at waqarhabib.com