Everyone's building AI agents. Twitter is full of "I built an AI agent in 15 minutes" posts. Cool. Now try running it for 10,000 users with real money on the line.
We've shipped AI-powered systems across fintech, edtech, SaaS, and Web3 at Gerus-lab. Not demos. Production systems that handle real traffic, real users, real edge cases. Here's what we learned — and what most tutorials won't tell you.
The Demo Trap
Most AI agent tutorials follow the same pattern:
```python
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}]
)
print(response.choices[0].message.content)
```
That's not a system. That's a function call with a credit card attached.
A production AI system needs:
- Failure boundaries — what happens when the LLM hallucinates?
- Cost controls — how do you stop a runaway agent from burning $500/hour?
- Observability — can you explain WHY it made that decision?
- Graceful degradation — does the app die when OpenAI has a bad day?
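The cost-control point deserves a concrete shape. Here is a minimal sketch of a per-tenant spend guard; the class name, the hourly window, and the limit are all illustrative assumptions, not code from our systems:

```python
import time

class CostGuard:
    """Hypothetical hourly spend limiter. Check allow() before each LLM call
    and record() the estimated cost after it."""

    def __init__(self, max_usd_per_hour: float):
        self.max_usd_per_hour = max_usd_per_hour
        self.window_start = time.monotonic()
        self.spent = 0.0

    def record(self, usd: float) -> None:
        # Reset the window once an hour has elapsed
        now = time.monotonic()
        if now - self.window_start >= 3600:
            self.window_start = now
            self.spent = 0.0
        self.spent += usd

    def allow(self) -> bool:
        return self.spent < self.max_usd_per_hour

guard = CostGuard(max_usd_per_hour=5.0)
guard.record(4.0)      # one expensive batch
print(guard.allow())   # True: still under budget
```

A runaway agent hits the cap and gets refused instead of burning $500/hour while you sleep.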
Pattern 1: The Hybrid Architecture
Pure AI systems are fragile. The best production systems use AI as one layer in a deterministic pipeline.
When we built ASHKA, an AI analytics dashboard, we didn't let the LLM touch raw data directly. Instead:
User Query → Intent Parser (LLM) → Query Builder (deterministic) → Database → Response Formatter (LLM)
The LLM handles natural language in and out. Everything in between is predictable code.
```python
# Intent parsing with structured output
def parse_intent(user_query: str) -> AnalyticsIntent:
    response = client.chat.completions.create(
        model="gpt-4",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": INTENT_SCHEMA},
            {"role": "user", "content": user_query}
        ]
    )
    return AnalyticsIntent.model_validate_json(
        response.choices[0].message.content
    )

# Deterministic query building — no LLM here
def build_query(intent: AnalyticsIntent) -> SQLQuery:
    builder = QueryBuilder(allowed_tables=WHITELIST)
    return builder.from_intent(intent)  # Safe, predictable, auditable
```
Result: API responses under 200ms, AI accuracy above 90%, and zero SQL injection vectors. The LLM never sees your database.
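To make the whitelist idea concrete, here is a toy version of that deterministic middle layer. The real QueryBuilder and AnalyticsIntent are richer; every name, table, and column below is illustrative:

```python
from dataclasses import dataclass

# Hypothetical stand-in for the real intent model
@dataclass
class Intent:
    table: str
    metric: str
    group_by: str

# Illustrative whitelist: which metrics and grouping columns each table allows
ALLOWED = {
    "orders": {"metrics": {"revenue", "order_count"},
               "columns": {"country", "month"}},
}

def build_sql(intent: Intent) -> str:
    spec = ALLOWED.get(intent.table)
    if spec is None:
        raise ValueError(f"table not allowed: {intent.table}")
    if intent.metric not in spec["metrics"]:
        raise ValueError(f"metric not allowed: {intent.metric}")
    if intent.group_by not in spec["columns"]:
        raise ValueError(f"column not allowed: {intent.group_by}")
    # Every identifier comes from the whitelist, never from raw LLM output,
    # so an injection payload has nothing to reach
    return (
        f"SELECT {intent.group_by}, SUM({intent.metric}) "
        f"FROM {intent.table} GROUP BY {intent.group_by}"
    )
```

If the LLM hallucinates a table name (or a user smuggles `; DROP TABLE` into the intent), validation raises before any SQL exists.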
Pattern 2: Queue-Based Agent Orchestration
Synchronous AI calls are a scalability killer. When we built Obrazno, an AI interior design SaaS, image generation could take 10-30 seconds. You can't hold an HTTP connection open that long.
The fix: task queues with progress tracking.
```typescript
// NestJS + BullMQ pattern we use at Gerus-lab
@Processor('ai-tasks')
export class AITaskProcessor {
  @Process('generate-design')
  async handleGeneration(job: Job<DesignRequest>) {
    const { roomImage, style, preferences } = job.data;

    // Step 1: Generate mask (AI)
    await job.updateProgress(10);
    const mask = await this.maskService.generate(roomImage);

    // Step 2: In-painting (AI)
    await job.updateProgress(40);
    const result = await this.inpaintService.apply(
      roomImage, mask, style
    );

    // Step 3: Post-processing (deterministic)
    await job.updateProgress(80);
    const final = await this.postProcess(result, preferences);

    await job.updateProgress(100);
    return { imageUrl: final.url, moodboard: final.moodboard };
  }
}
```
Key decisions:
- BullMQ over raw Redis for retry logic and dead letter queues
- Progress tracking so the frontend can show real-time status
- Turborepo monorepo to share types between frontend and backend
- Each AI step is independently retryable — if in-painting fails, we don't re-generate the mask
Pattern 3: Smart Fallback Chains
Your AI will fail. Plan for it.
When we integrated ChatGPT as a first-line customer support system, we built a three-tier fallback:
```python
async def handle_support_query(query: str, context: dict) -> Response:
    # Tier 1: AI response with confidence scoring
    ai_response = await get_ai_response(query, context)
    if ai_response.confidence > 0.85:
        return Response(
            answer=ai_response.text,
            source="ai",
            needs_review=False
        )

    # Tier 2: Template matching for known patterns
    template = match_template(query, KNOWN_PATTERNS)
    if template:
        return Response(
            answer=template.fill(context),
            source="template",
            needs_review=False
        )

    # Tier 3: Route to human with AI-prepared context
    return Response(
        answer="Connecting you with our team...",
        source="human_escalation",
        needs_review=True,
        ai_summary=ai_response.text,  # Help the human
        suggested_tags=ai_response.tags
    )
The AI handles ~70% of queries autonomously. The remaining 30% get routed to humans — but with AI-prepared context, so the human response time drops by 40%.
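Where does a confidence number like the 0.85 threshold come from? The article above doesn't show its scoring method, so treat this as one plausible approach: if your provider returns per-token logprobs, their exponentiated mean is the geometric-mean token probability, a cheap proxy for how certain the model was:

```python
import math

def confidence_from_logprobs(token_logprobs: list[float]) -> float:
    """Geometric-mean token probability in [0, 1] from per-token logprobs.
    An illustrative heuristic, not the article's actual scoring method."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))
```

Calibrate the threshold against labeled escalations rather than picking a number that feels right; 0.85 for one model and prompt is not 0.85 for another.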
Pattern 4: On-Chain Settlement, Off-Chain Intelligence
This one's specific to Web3, but the principle applies everywhere: keep AI off critical paths.
Our Solana raffle platform Ruffles handles 1000+ concurrent raffles. The AI could theoretically pick winners. But we'd never trust that for real money.
- AI Layer (off-chain): Fraud detection, analytics, UX optimization
- Settlement Layer (on-chain): Winner selection, payouts, escrow
- Indexer (off-chain): Transaction monitoring, state sync
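The split can be sketched in a few lines. In this hypothetical gate (names and threshold are ours, not the platform's), the fraud model's worst-case power is holding an entry for human review; it never selects winners or moves funds, because settlement belongs to the on-chain program:

```python
def admit_entry(fraud_score: float, threshold: float = 0.9) -> str:
    """Off-chain AI gates entry; on-chain code settles. Illustrative only."""
    if fraud_score >= threshold:
        return "held_for_review"     # worst case: a human looks at it
    return "submitted_on_chain"      # winner selection stays in the program
```

A wrong fraud score costs a support ticket. A wrong winner selection costs trust and money. Keep the AI where its failure mode is the cheap one.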
The architecture handles $100K+ in daily volume. AI adds intelligence. Blockchain adds trust. Neither replaces the other.
Pattern 5: The Telegram-Native AI Stack
Telegram Mini Apps are eating the traditional app market in certain verticals. We've built several — including ITOhub (social asset marketplace on TON) and TON DCA (automated investment protocol).
The pattern that works:
```typescript
// Telegram Mini App + AI pipeline
const handleUserAction = async (ctx: TMAContext) => {
  // 1. TonConnect for wallet auth (no passwords)
  const wallet = await ctx.tonConnect.getWallet();

  // 2. AI for smart defaults and recommendations
  const suggestion = await aiService.suggestStrategy({
    portfolio: await getPortfolio(wallet.address),
    marketConditions: await getMarketData(),
    userHistory: await getUserPatterns(wallet.address)
  });

  // 3. Smart contract for execution (non-custodial)
  // User signs, contract executes, no middleman
  const tx = buildDCATx(suggestion.params);
  await ctx.tonConnect.sendTransaction(tx);
};
```
The key insight: AI recommends, blockchain executes, Telegram distributes. Each layer does what it's best at.
The Checklist
Before you ship an AI feature to production, ask yourself:
- What happens when the AI is wrong? If you can't answer this, you're not ready.
- What happens when OpenAI/Anthropic is down? Fallback plan or graceful degradation.
- Can you explain the decision? Logging, tracing, confidence scores.
- Is the AI on the critical path? Move it off if possible.
- What's the cost per request? Model choice matters. GPT-4 for classification is like using a Ferrari for grocery runs.
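The cost question is answerable with arithmetic before you ship. A back-of-envelope helper; the prices below are placeholders, not real rates, so substitute your provider's current pricing table:

```python
# (input, output) USD per 1K tokens -- illustrative numbers only
PRICE_PER_1K_TOKENS = {
    "small-model": (0.0005, 0.0015),
    "large-model": (0.01, 0.03),
}

def cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated cost of one request from token counts and a price table."""
    price_in, price_out = PRICE_PER_1K_TOKENS[model]
    return (prompt_tokens / 1000 * price_in
            + completion_tokens / 1000 * price_out)
```

Multiply by expected daily request volume and the Ferrari-for-groceries cases jump out of the spreadsheet before they jump out of your invoice.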
Wrapping Up
The gap between an AI demo and an AI system is enormous. It's not about the model — it's about the architecture around it. Queues, fallbacks, hybrid pipelines, separation of concerns.
We've been building these systems across different domains at Gerus-lab — from DeFi protocols to SaaS products to Telegram bots. The patterns are surprisingly consistent regardless of the domain.
If you're building something with AI and want to skip the "demo phase" entirely, check out our work at gerus-lab.com or reach out directly. We ship systems, not demos.
What patterns have you found essential for production AI? Drop them in the comments.