TL;DR: 18 months building AI for a restaurant chain with 236 employees. Real production metrics, real code, real mistakes. This is what actually works (and what doesn't).
Production numbers:
- 94% accuracy on 23 KPI queries in natural language
- €12,000 in suspicious transactions caught in 6 months
- 40% reduction in HR tickets
- Response time: 180ms (down from 1.2s — this mattered more than accuracy)
- 2 → 140 daily messages after one change
I'll explain all of these. Starting with the one change that 10x'd adoption.
## The One Change That 10x'd Adoption
Daily active messages: 2 → 140. One change.
I moved the interface from a web app to WhatsApp.
The AI was identical. The responses were identical. But managers had WhatsApp open all day. Opening a browser tab was friction they wouldn't accept.
The lesson: employees don't want AI. They want answers in the place they already look.
## System 1: Natural Language → SQL KPI Engine
Managers were drowning in Excel. I built intent detection that converts plain language to one of 23 pre-validated query templates.
```php
// Intent detection — NOT fine-tuned, just prompt engineering
$intent = llm_detect_intent($query, $schema_context);

// Map to one of 23 query templates (not free-form SQL generation)
$template = QueryRegistry::get($intent['type']);

// Fill params from entity extraction
$params = EntityExtractor::extract($intent['entities'], $date_context);

// Execute against read replica only — never production writes
$result = DB::readReplica()->execute($template, $params);
```
Why templates instead of free-form LLM SQL generation:
Started with full LLM SQL generation. Disaster. Hallucinated JOINs, wrong table names, one query that locked a table for 40 seconds in production.
Switched to template matching. The LLM only does intent classification now. 23 templates cover 94% of real queries. Much safer, much cheaper.
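The template-registry idea can be sketched in a few lines. This is a minimal illustration, not the production code: the registry contents, the `run_kpi_query` helper, and the SQLite backend are all assumptions. The point is that the LLM only ever picks a key; parameters are bound by the database driver, so it can never write SQL.

```python
import sqlite3

# Pre-validated templates — the LLM chooses a key, never writes SQL itself.
QUERY_TEMPLATES = {
    "daily_revenue": "SELECT SUM(total) FROM sales WHERE location = ? AND day = ?",
    "labor_cost": "SELECT SUM(hours * rate) FROM shifts WHERE location = ? AND day = ?",
    # ... one entry per vetted KPI query
}

def run_kpi_query(conn, intent_type, params):
    """Look up a vetted template; refuse anything outside the registry."""
    template = QUERY_TEMPLATES.get(intent_type)
    if template is None:
        raise ValueError(f"unknown intent: {intent_type}")
    # Parameters are bound by the driver, never string-interpolated.
    return conn.execute(template, params).fetchone()
```

An unknown intent raises instead of falling back to generation, which is exactly the failure mode you want: loud, cheap, and safe.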
## System 2: RAG for HR/Policy Questions
40% reduction in HR tickets. 236 employees asking about schedules, policies, payroll.
The context size mistake everyone makes:
```python
# What everyone does:
context = vector_search(query, top_k=20, max_tokens=4000)

# What actually works:
context = vector_search(query, top_k=5, max_tokens=800)
# Re-rank by recency + exact keyword match
# Add only top 3 chunks
```
Smaller context → faster response → higher adoption. I measured it.
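The re-rank step can be sketched like this. Everything here is illustrative: the chunk dicts (with `similarity`, `updated`, and `text` fields) and the weighting constants are assumptions, not the production values.

```python
from datetime import datetime

def rerank(chunks, query, now=None, top_n=3):
    """Boost recent chunks and exact keyword hits; keep only top_n."""
    now = now or datetime.now()
    terms = set(query.lower().split())

    def score(chunk):
        age_days = (now - chunk["updated"]).days
        recency = 1.0 / (1 + age_days / 30)  # newer docs rank higher
        words = set(chunk["text"].lower().split())
        keyword = len(terms & words) / max(len(terms), 1)  # exact-match overlap
        return chunk["similarity"] + 0.3 * recency + 0.5 * keyword

    return sorted(chunks, key=score, reverse=True)[:top_n]
```

The weights would need tuning against real queries; the structure (vector score plus recency plus exact-match bonus, hard cap at three chunks) is the part that matters.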
The stack: a 240-page HR manual plus policy docs, chunked at 400 tokens with 50-token overlap. I started with LangChain but removed it after 3 weeks: too much abstraction over things I needed to control. Replaced it with ~200 lines of code I fully understand.
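A 400-token chunk with 50-token overlap can be sketched as below. This version approximates tokens with whitespace words to stay self-contained; a real tokenizer (tiktoken or similar) would be more accurate.

```python
def chunk_document(text, chunk_size=400, overlap=50):
    """Split text into overlapping chunks so no fact falls on a boundary."""
    words = text.split()
    step = chunk_size - overlap  # each chunk starts 350 words after the last
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```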
## System 3: Audio Meeting Intelligence
Shift handoffs by voice note. Manager leaves note at 11pm. Next manager arrives at 6am.
Voice note → Whisper (local, not API) → Structured extraction → Push to Notion
The prompt pattern that cut hallucinations by 60%:
```
You are extracting operational intelligence from a restaurant shift handoff.

Extract ONLY:
1. Problems that need action (with urgency: now/today/this-week)
2. Stock alerts
3. Staff incidents
4. Customer complaints needing followup

Format as JSON. If unclear, mark as "needs_clarification".
DO NOT summarize. DO NOT add context. Only factual operational items.
```
The "DO NOT summarize" instruction is the key. LLMs want to be helpful and add context. For operational data, you want facts only.
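The prompt only helps if the output is validated before it reaches Notion. A sketch of that guard, with assumed field names (`problems`, `urgency`) following the prompt format above:

```python
import json

def parse_handoff(raw):
    """Return (items, needs_review). Anything unparseable goes to review."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Model broke the format — route the whole note to a human.
        return [], [raw]
    items, review = [], []
    for item in data.get("problems", []):
        if item.get("urgency") == "needs_clarification":
            review.append(item)
        else:
            items.append(item)
    return items, review
```

The key design choice: malformed output never gets silently dropped or silently trusted; it lands in a review queue.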
## System 4: Fraud Detection (the one that paid for the whole project)
Simple statistical anomaly detection. Not ML. Not neural networks.
- Rolling 30-day average per employee per shift type
- Flag transactions > 2.5σ from mean
- Cross-reference with inventory consumption
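The 2.5σ rule above fits in a dozen lines of plain statistics, no ML library needed. A minimal sketch, assuming per-employee transaction amounts arrive as an ordered list (the real system would also key by shift type and cross-check inventory):

```python
from statistics import mean, stdev

def flag_anomalies(amounts, window=30, threshold=2.5):
    """Flag indices whose value deviates more than `threshold` sigma
    from the rolling mean of the previous `window` transactions."""
    flagged = []
    for i in range(window, len(amounts)):
        baseline = amounts[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and abs(amounts[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged
```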
Result: €12,000 in suspicious transactions flagged in 6 months.
Not all of it was fraud; some flags were data entry errors. But knowing someone was watching the patterns changed behavior on its own.
## Everything I Removed (and why)
| Removed | Reason |
|---|---|
| LangChain | Too much abstraction, replaced with 200 lines of custom code |
| Streaming responses | Managers started reading mid-sentence, got confused |
| GPT-4 for everything | Expensive + slow. Now: Haiku for classification, Opus for reasoning. Cost -80% |
| Conversation history > 3 exchanges | Context degraded after 3 turns. Truncate aggressively |
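The history truncation in the last row is trivial to implement. A sketch, assuming the usual chat-message dict shape (`role`/`content`) with two messages per exchange:

```python
def truncate_history(messages, max_exchanges=3):
    """Keep the system prompt plus only the last N user/assistant exchanges."""
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]
    return system + dialogue[-max_exchanges * 2:]  # 2 messages per exchange
```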
## 18-Month Results
| Metric | Before | After |
|---|---|---|
| HR tickets/week | 45 | 27 |
| Report generation | 2h manual | 0 (automated) |
| KPI query time | 20min Excel | 180ms |
| Fraud caught | unknown | €12,000 / 6 months |
| Daily AI interactions | 0 | 140+ |
## What I'd do differently
- Start with WhatsApp, not web. Would have saved 3 months building an interface nobody used.
- Template matching before LLM generation. For structured data queries, always.
- Measure adoption from day 1. I didn't track usage for the first 2 months. Flying blind.
- Smaller context windows. Instinct is to give LLM more context. Usually wrong.
## Want something like this for your business?
I do consulting on production AI systems for SMBs. Not "add ChatGPT to your website" — actual systems that replace manual work.
Typical projects: $500-1500, delivered in 2-4 weeks.
What I can build:
- Natural language → your database (no more Excel reports)
- Internal knowledge assistant (HR, policy, training)
- Meeting/audio intelligence → task extraction
- Anomaly detection on transaction data
Get in touch — I reply within 24h
Happy to answer any technical questions in the comments.