In early 2024, I had a problem. My phone repair shop was processing thousands of customer inquiries a month across WhatsApp, phone calls, and walk-ins. My team was drowning in repetitive questions: "Is my phone ready?", "Do you have the screen for iPhone 14?", "Can I book for tomorrow at 5pm?"
Twelve months later, an AI agent named Jacobo was handling ~90% of those interactions autonomously. Customers got instant answers. My team focused on actual repairs. And when I sold the business in early 2025, the agent was a key part of what made it sellable.
Here's how I built it.
The Problem: Three Channels, One Bottleneck
Santifer iRepair had been running for 16 years when I started this project. We'd already automated the back office with Airtable — 12 connected databases handling repairs, inventory, invoicing, the works. But customer communication was still manual.
The pain points:
- WhatsApp: Customers expected instant replies. We couldn't deliver.
- Phone calls: Staff interrupted mid-repair to answer "what's my repair status?"
- Booking: Back-and-forth messages to find a slot that worked.
I needed something that could talk to customers across channels, understand what they wanted, and actually do things — not just generate text.
Architecture: A Router With Specialized Sub-Agents
The breakthrough came when I stopped thinking "chatbot" and started thinking "agent orchestration."
```
              ┌─────────────────────┐
              │  INCOMING REQUEST   │
              │  (Voice/WhatsApp)   │
              └──────────┬──────────┘
                         │
              ┌──────────▼──────────┐
              │     MAIN ROUTER     │
              │ (Intent Classifier) │
              └──────────┬──────────┘
                         │
       ┌─────────────────┼─────────────────┐
       │                 │                 │
┌──────▼───────┐  ┌──────▼───────┐  ┌──────▼───────┐
│ APPOINTMENTS │  │  DISCOUNTS   │  │    ORDERS    │
│  Sub-Agent   │  │  Sub-Agent   │  │  Sub-Agent   │
└──────┬───────┘  └──────┬───────┘  └──────┬───────┘
       │                 │                 │
       └─────────────────┼─────────────────┘
                         │
              ┌──────────▼──────────┐
              │    HITL HANDOFF     │
              │  (When confidence   │
              │   is low or         │
              │   escalation needed)│
              └─────────────────────┘
```
Main Router: Every incoming message hits the router first. It classifies intent and delegates to the right sub-agent via tool calling. No giant monolithic prompt trying to do everything.
Sub-Agents: Each one is laser-focused on a single domain:
- Appointments: Queries available slots from Airtable, handles booking logic, sends confirmation via WhatsApp
- Discounts: Pulls customer history, calculates applicable promos, explains the discount
- Orders: Validates stock against inventory DB, creates the order, sends ETA notification
HITL Handoff: When confidence drops below threshold or the customer explicitly asks for a human, Jacobo escalates — but passes the full conversation context so nobody starts from zero.
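Collapsed into pseudocode, the delegate-or-escalate logic looks roughly like this. It's a minimal Python sketch with hypothetical names — in production the intent label and confidence score came from an LLM classification step inside n8n, not from the caller:

```python
# Minimal sketch of the router's delegate-or-escalate shape. All names are
# hypothetical; in the real system, intent and confidence come from an LLM
# classification step inside an n8n workflow.

SUB_AGENTS = {
    "appointments": lambda msg: f"[appointments] {msg}",
    "discounts": lambda msg: f"[discounts] {msg}",
    "orders": lambda msg: f"[orders] {msg}",
}

def route(message: str, intent: str, confidence: float, threshold: float = 0.7):
    """Delegate to the matching sub-agent, or hand off to a human."""
    if confidence < threshold or intent not in SUB_AGENTS:
        # HITL handoff: the message (and full conversation) travels with it
        return ("human", message)
    return ("agent", SUB_AGENTS[intent](message))
```

The router stays deliberately thin: one decision, then delegation.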
The Stack
| Component | Tool | Why |
|---|---|---|
| LLM | Claude API | Best balance of reasoning + tool use at the time |
| Orchestration | n8n | Visual workflows, easy to debug, self-hosted |
| WhatsApp | WATI | Clean WhatsApp Business API wrapper |
| Voice | ElevenLabs | Natural-sounding Spanish TTS |
| Phone | Aircall | Cloud PBX with good API |
| Backend/DB | Airtable | Already our source of truth for everything |
The key insight: Airtable wasn't just storage — it was the agent's brain. Every sub-agent queried Airtable directly. Customer history, inventory levels, appointment slots — all live data, no sync issues.
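As a rough illustration, a live slot lookup against Airtable's REST API could be built like this. The base ID, table name, and field names are made up for the example, and the real request also needs an `Authorization: Bearer` header with the API key:

```python
# Hedged sketch: building (not sending) an Airtable lookup for free slots.
# Base ID, table name, and field names are illustrative placeholders.

def airtable_slot_query(base_id: str, date_iso: str, service: str) -> dict:
    """Return the URL and query params for a live slot lookup."""
    formula = (
        f"AND({{Date}} = '{date_iso}', "
        f"{{Service}} = '{service}', "
        f"{{Status}} = 'Free')"
    )
    return {
        "url": f"https://api.airtable.com/v0/{base_id}/Appointments",
        "params": {"filterByFormula": formula, "maxRecords": 10},
    }
```

Because every sub-agent hit the same base, a booking made by the agent was instantly visible to staff in the same views they already used.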
Key Technical Decisions (And Why)
1. Tool Calling Over Prompt Stuffing
Early versions tried to cram everything into the system prompt. "Here's how to check inventory, here's how to book appointments, here's our discount rules..."
It was brittle. The model would hallucinate discounts or book non-existent slots.
Tool calling changed everything. Each sub-agent has explicit tools:
```
check_available_slots(date, service_type) → returns actual slots
create_booking(customer_id, slot_id)      → books or fails with reason
calculate_discount(customer_id, service)  → returns applicable promo
```
The model reasons about what to do. The tools handle how. Clean separation.
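For concreteness, here is roughly how those three signatures could be declared in the Claude API's tool-use format (JSON Schema inputs). The descriptions and parameter types are my illustrative guesses, not the production definitions:

```python
# Sketch of the three tool signatures as Claude tool definitions.
# Parameter types and descriptions are illustrative assumptions.

TOOLS = [
    {
        "name": "check_available_slots",
        "description": "Return open appointment slots for a date and service type.",
        "input_schema": {
            "type": "object",
            "properties": {
                "date": {"type": "string", "description": "ISO date, e.g. 2024-03-01"},
                "service_type": {"type": "string"},
            },
            "required": ["date", "service_type"],
        },
    },
    {
        "name": "create_booking",
        "description": "Book a slot for a customer, or fail with a reason.",
        "input_schema": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string"},
                "slot_id": {"type": "string"},
            },
            "required": ["customer_id", "slot_id"],
        },
    },
    {
        "name": "calculate_discount",
        "description": "Return the promo applicable to a customer and service.",
        "input_schema": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string"},
                "service": {"type": "string"},
            },
            "required": ["customer_id", "service"],
        },
    },
]
```

These get passed as the `tools` parameter on a request; the model replies with a tool-use block, and the orchestrator executes it and returns the result.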
2. Sub-Agent Specialization Over One Big Agent
A single agent handling appointments, discounts, orders, and general FAQs? That's a recipe for confusion.
Each sub-agent has:
- Its own system prompt (focused, ~200 tokens)
- Its own tool set (only what it needs)
- Its own failure modes (easier to debug)
The router is dumb on purpose. It just classifies and delegates. Complexity lives at the edges.
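That separation can be sketched as a small config object per domain. The structure is hypothetical — in the real system each sub-agent lived as its own n8n workflow — but it captures the idea of a focused prompt plus a minimal tool set:

```python
# Hypothetical per-domain config; each sub-agent carries only its own
# prompt and the tools it actually needs.
from dataclasses import dataclass

@dataclass(frozen=True)
class SubAgent:
    name: str
    system_prompt: str        # focused, roughly ~200 tokens
    tools: tuple[str, ...]    # only what this domain needs

appointments = SubAgent(
    name="appointments",
    system_prompt="You book phone-repair appointments. Ask only for what the tools require.",
    tools=("check_available_slots", "create_booking"),
)
```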
3. Graceful HITL, Not Graceful Degradation
Some AI systems try to "degrade gracefully" — giving worse answers when uncertain. I took a different approach: escalate early, escalate with context.
When Jacobo wasn't confident:
- Customer got a message: "Let me connect you with the team"
- Staff got a Slack notification with full conversation history
- Average human response time: under 2 minutes
The 10% that needed humans got better service than before, because staff had full context.
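Mechanically, the handoff amounts to packaging two messages — one for the customer, one for staff — with the transcript attached (field names here are hypothetical):

```python
# Sketch of the escalation payload; the key design point is that the full
# transcript travels to Slack so staff never start from zero.

def build_handoff(conversation: list, reason: str) -> dict:
    """Package the customer-facing reply and the staff-facing context."""
    transcript = "\n".join(f"{m['role']}: {m['text']}" for m in conversation)
    return {
        "customer_message": "Let me connect you with the team",
        "slack_payload": {"text": f"Escalation ({reason}):\n{transcript}"},
    }
```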
Lessons Learned
Start with the most repetitive task. Appointment booking was 40% of all inquiries. Automating that alone bought us massive breathing room.
Your database is your agent's memory. Don't build a separate "AI database." Query what you already have. Airtable's API was fast enough for real-time lookups.
Tool calling > RAG for transactional tasks. RAG is great for knowledge retrieval. But when you need to do things — book, order, check status — tool calling is the architecture.
Measure deflection rate, not just accuracy. "Did the agent answer correctly?" matters less than "Did the customer get what they needed without human help?" We tracked both.
The Outcome
After 12 months in production:
- ~90% of customer interactions handled without human intervention
- Staff spent 70% more time on actual repairs
- Customer satisfaction stayed flat (no degradation — that was the goal)
- The system became a selling point when I exited the business
What I'd Do Differently
Voice was harder than expected. ElevenLabs sounds great, but latency in the voice → transcription → LLM → TTS loop was noticeable. I'd explore tighter integrations if rebuilding today.
More observability earlier. I added proper logging and trace monitoring late in the project. Should've been day one.
Simpler discount logic. The discount sub-agent had too many edge cases baked into the prompt. Should've moved more logic into deterministic code and kept the LLM for natural language understanding only.
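As a hedged sketch of what that deterministic code could look like: the LLM only extracts structured facts (visit count, service) from the conversation, and fixed rules do the rest. The rule values below are illustrative, not the shop's actual promos:

```python
# Illustrative deterministic discount rules; the LLM's only job is to
# extract (customer_visits, service) from natural language.

def applicable_discount(customer_visits: int, service: str) -> int:
    """Return a discount percentage from fixed, testable rules."""
    if customer_visits >= 5:
        return 15  # loyalty tier
    if service == "screen_replacement" and customer_visits >= 2:
        return 10  # repeat screen-repair promo
    return 0
```

Rules like these are unit-testable, auditable, and impossible to hallucinate — everything a prompt-embedded policy is not.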
Building Jacobo taught me that AI agents aren't magic — they're systems engineering with an LLM in the middle. The LLM handles the messy human language part. Everything else is APIs, databases, and good old-fashioned software architecture.
The 90% automation wasn't because the AI was brilliant. It was because we picked the right problems, built the right tools, and knew when to hand off to humans.
I'm currently open to AI Product Manager and Forward Deployed Engineer roles. Check my portfolio at santifer.io.