DEV Community

Ale Santini

Show HN: I built a full AI ops system for a restaurant chain (236 employees, 18 months in production)

TL;DR: 18 months building AI for a restaurant chain with 236 employees. Real production metrics, real code, real mistakes. This is what actually works (and what doesn't).


Production numbers:

  • 94% accuracy on 23 KPI queries in natural language
  • €12,000 in suspicious transactions caught in 6 months
  • 40% reduction in HR tickets
  • Response time: 180ms (down from 1.2s — this mattered more than accuracy)
  • 2 → 140 daily messages after one change

I'll explain all of these. Starting with the one change that 10x'd adoption.


The One Change That 10x'd Adoption

Daily active messages: 2 → 140. One change.

I moved the interface from a web app to WhatsApp.

The AI was identical. The responses were identical. But managers had WhatsApp open all day. Opening a browser tab was friction they wouldn't accept.

The lesson: employees don't want AI. They want answers in the place they already look.


System 1: Natural Language → SQL KPI Engine

Managers were drowning in Excel. I built intent detection that converts plain language to one of 23 pre-validated query templates.

// Intent detection — NOT fine-tuned, just prompt engineering
$intent = llm_detect_intent($query, $schema_context);

// Map to one of 23 query templates (not free-form SQL generation)
$template = QueryRegistry::get($intent['type']);

// Fill params from entity extraction
$params = EntityExtractor::extract($intent['entities'], $date_context);

// Execute against read replica only — never production writes
$result = DB::readReplica()->execute($template, $params);

Why templates instead of full SQL generation from LLM:

Started with full LLM SQL generation. Disaster. Hallucinated JOINs, wrong table names, one query that locked a table for 40 seconds in production.

Switched to template matching. The LLM only does intent classification now. 23 templates cover 94% of real queries. Much safer, much cheaper.
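A minimal sketch of what a template registry like this can look like (all names, tables, and SQL here are illustrative assumptions, not the production code — the point is that the LLM only produces an intent label, and unknown intents fail closed instead of falling back to free-form SQL):

```python
# Hypothetical template registry: intent label -> pre-validated,
# parameterized SQL. The LLM never writes SQL itself.
QUERY_TEMPLATES = {
    # 23 of these in production; two shown here as examples
    "sales_by_location": (
        "SELECT location, SUM(total) FROM sales "
        "WHERE sale_date BETWEEN %(start)s AND %(end)s GROUP BY location"
    ),
    "labor_cost_pct": (
        "SELECT location, SUM(labor_cost) / NULLIF(SUM(revenue), 0) "
        "FROM shifts WHERE shift_date BETWEEN %(start)s AND %(end)s "
        "GROUP BY location"
    ),
}

def run_kpi_query(intent_type: str, params: dict):
    # Fail closed: an unrecognized intent is an error, never generated SQL
    if intent_type not in QUERY_TEMPLATES:
        raise ValueError(f"no template for intent {intent_type!r}")
    sql = QUERY_TEMPLATES[intent_type]
    # db.read_replica().execute(sql, params)  # parameterized, reads only
    return sql, params
```

Because every template is hand-written and parameterized, hallucinated JOINs and table names are impossible by construction.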


System 2: RAG for HR/Policy Questions

40% reduction in HR tickets. 236 employees asking about schedules, policies, payroll.

The context size mistake everyone makes:

# What everyone does:
context = vector_search(query, top_k=20, max_tokens=4000)

# What actually works:
context = vector_search(query, top_k=5, max_tokens=800)
# Re-rank by recency + exact keyword match
# Add only top 3 chunks

Smaller context → faster response → higher adoption. I measured it.

The stack: a 240-page HR manual plus policy docs, chunked at 400 tokens with a 50-token overlap. I started with LangChain but removed it after 3 weeks: too much abstraction over things I needed to control. Replaced it with ~200 lines of code I fully understand.
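The re-ranking step can be sketched like this (the scoring weights and the `Chunk` shape are illustrative assumptions, not the production values — the idea is just: boost exact keyword matches and recent documents, then keep only the top 3):

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    age_days: int   # days since the source document was last updated
    score: float    # similarity score from the vector search

def rerank(query: str, candidates: list[Chunk], keep: int = 3) -> list[Chunk]:
    keywords = set(query.lower().split())
    def rank(c: Chunk) -> float:
        # Exact keyword overlap with the query
        overlap = sum(1 for w in keywords if w in c.text.lower())
        # Recency bonus: decays as the document ages
        recency = 1.0 / (1 + c.age_days / 30)
        return c.score + 0.5 * overlap + 0.3 * recency
    return sorted(candidates, key=rank, reverse=True)[:keep]
```

With small chunks and only 3 of them, the final prompt stays well under 800 tokens, which is what kept response times fast.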


System 3: Audio Meeting Intelligence

Shift handoffs by voice note. Manager leaves note at 11pm. Next manager arrives at 6am.

Voice note → Whisper (local, not API) → Structured extraction → Push to Notion

The prompt pattern that cut hallucinations by 60%:

You are extracting operational intelligence from a restaurant shift handoff.
Extract ONLY:
1. Problems that need action (with urgency: now/today/this-week)
2. Stock alerts
3. Staff incidents
4. Customer complaints needing followup

Format as JSON. If unclear, mark as "needs_clarification".
DO NOT summarize. DO NOT add context. Only factual operational items.

The "DO NOT summarize" instruction is the key. LLMs want to be helpful and add context. For operational data, you want facts only.
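Even with that prompt, it helps to enforce the contract in code. A sketch of validating the model's JSON output (the key names here are illustrative assumptions about the schema — the point is to drop anything outside the allowed sections and downgrade unknown urgency values):

```python
import json

# Hypothetical schema keys matching the four sections in the prompt
ALLOWED_KEYS = {"problems", "stock_alerts", "staff_incidents", "complaints"}
URGENCIES = {"now", "today", "this-week"}

def parse_handoff(raw: str) -> dict:
    data = json.loads(raw)
    # Models tend to add summaries and extra context despite instructions;
    # silently discard any key outside the contract
    data = {k: v for k, v in data.items() if k in ALLOWED_KEYS}
    for item in data.get("problems", []):
        # Unknown urgency values get flagged rather than guessed at
        if item.get("urgency") not in URGENCIES:
            item["urgency"] = "needs_clarification"
    return data
```

Prompt instructions reduce hallucinations; the validator guarantees that whatever slips through never reaches Notion.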


System 4: Fraud Detection (Paid for the whole project)

Simple statistical anomaly detection. Not ML. Not neural networks.

  • Rolling 30-day average per employee per shift type
  • Flag transactions > 2.5σ from mean
  • Cross-reference with inventory consumption

Result: €12,000 in suspicious transactions flagged in 6 months.

Not all were fraud — some were data entry errors. But the attention to patterns changed behavior.
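The whole rule fits in a few lines of Python (illustrative names, not the production code — the real version also cross-references inventory, which is omitted here):

```python
from statistics import mean, stdev

def flag_anomalies(history: dict[str, list[float]],
                   today: list[tuple[str, float]],
                   threshold: float = 2.5) -> list[tuple[str, float]]:
    """history: employee -> last 30 days of transaction amounts.
    today: (employee, amount) pairs to check against each baseline."""
    flagged = []
    for employee, amount in today:
        past = history.get(employee, [])
        if len(past) < 2:
            continue  # not enough data to establish a baseline
        mu, sigma = mean(past), stdev(past)
        # Flag transactions more than `threshold` sigma from the mean
        if sigma > 0 and abs(amount - mu) > threshold * sigma:
            flagged.append((employee, amount))
    return flagged
```

No model training, no feature engineering, nothing to retrain when menus change — just per-employee baselines and a z-score.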


Everything I Removed (and why)

  • LangChain: too much abstraction; replaced with ~200 lines of custom code
  • Streaming responses: managers started reading mid-sentence and got confused
  • GPT-4 for everything: expensive and slow. Now: Haiku for classification, Opus for reasoning. Cost down 80%
  • Conversation history beyond 3 exchanges: context degraded after 3 turns. Truncate aggressively
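On the conversation-history point, "truncate aggressively" is a one-liner. A minimal sketch (assuming OpenAI-style message dicts with a `role` key; this is illustrative, not the production code):

```python
def truncate_history(messages: list[dict], max_exchanges: int = 3) -> list[dict]:
    # Keep the system prompt, drop everything except the most recent
    # user/assistant exchanges (2 messages per exchange)
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    return system + turns[-2 * max_exchanges:]
```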

18-Month Results

  • HR tickets/week: 45 → 27
  • Report generation: 2h manual → 0 (automated)
  • KPI query time: 20 min in Excel → 180ms
  • Fraud caught: unknown → €12,000 / 6 months
  • Daily AI interactions: 0 → 140+

What I'd do differently

  1. Start with WhatsApp, not web. Would have saved 3 months building an interface nobody used.
  2. Template matching before LLM generation. For structured data queries, always.
  3. Measure adoption from day 1. I didn't track usage for the first 2 months. Flying blind.
  4. Smaller context windows. Instinct is to give LLM more context. Usually wrong.

Want something like this for your business?

I do consulting on production AI systems for SMBs. Not "add ChatGPT to your website" — actual systems that replace manual work.

Typical projects: $500-1500, delivered in 2-4 weeks.

What I can build:

  • Natural language → your database (no more Excel reports)
  • Internal knowledge assistant (HR, policy, training)
  • Meeting/audio intelligence → task extraction
  • Anomaly detection on transaction data

Get in touch — I reply within 24h

Happy to answer any technical questions in the comments.
