Building a Conversational AI Interface for Travel Data

#ai #nlp #travel #webdev

When we built the AI chat layer for eSIMDB AI, the core challenge was deceptively simple to state: take a user's natural language trip description and translate it into a structured database query across 15,000+ plans. Here's how we approached it and what we learned.

Why Not Just Use Filters?

The first version of eSIMDB had a traditional filter-based UI: dropdowns for destination, data size, validity, and budget. It worked, but it had a UX problem. Travelers don't think in database query parameters. They think "I'm going to Japan for 10 days and I work remotely so I need a hotspot-capable plan under €20."

Translating that thought into correct filter selections required the user to know that "Japan" maps to country code JP, that "10 days" should be entered as validity, that "under €20" needs a price_max parameter, and that "hotspot-capable" means filtering the tethering_allowed boolean.

Users made mistakes, missed filters, and got suboptimal results. Conversion was poor because users weren't confident they'd set filters correctly.

The NLP Extraction Layer

The core of the AI system is a query understanding model that extracts structured intent from unstructured input:

from dataclasses import dataclass

@dataclass
class TripQuery:
    """Structured intent for one trip; comments show the values for the Japan example."""
    countries: list[str]          # ["JP"]
    duration_days: int            # 10
    data_gb: float | None         # None (unspecified)
    budget_eur: float | None      # 20.0
    hotspot_required: bool        # True
    sim_count: int                # 1
    requires_5g: bool             # False
    voip_required: bool           # False

def extract_query(user_input: str) -> TripQuery:
    ...

We tried several approaches for this extraction:

Rule-based regex: Fast but brittle. "10 days in Japan" worked; "a week and a bit in JP" didn't.
GPT-4 with structured output: Good accuracy, expensive per-request, latency too high for real-time UI.
Fine-tuned small LLM (our current approach): We fine-tuned a 3B parameter model specifically on travel intent extraction. It runs in ~50ms, handles ambiguity well, and is cheap to operate.

Training data was the critical investment — we labeled 10,000+ example queries spanning real user phrasings, edge cases, and deliberate ambiguities.
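To make the brittleness of the rule-based approach concrete, here is a minimal sketch of a regex extractor. It is not our original code: the `COUNTRY_CODES` table, the patterns, and the function name are assumptions chosen for illustration.

```python
import re

# Hypothetical lookup table; a real system would need a full gazetteer.
COUNTRY_CODES = {"japan": "JP", "france": "FR", "switzerland": "CH"}

DURATION_RE = re.compile(r"(\d+)\s*days?", re.IGNORECASE)
COUNTRY_RE = re.compile(r"\b(" + "|".join(COUNTRY_CODES) + r")\b", re.IGNORECASE)

def extract_with_regex(text: str) -> dict:
    """Return whatever the patterns find; fields they miss stay None."""
    duration = DURATION_RE.search(text)
    country = COUNTRY_RE.search(text)
    return {
        "duration_days": int(duration.group(1)) if duration else None,
        "country": COUNTRY_CODES[country.group(1).lower()] if country else None,
    }

print(extract_with_regex("10 days in Japan"))
# {'duration_days': 10, 'country': 'JP'}
print(extract_with_regex("a week and a bit in JP"))
# {'duration_days': None, 'country': None}
```

Every new phrasing ("a week", "JP", "a fortnight") needs another hand-written pattern, which is the maintenance burden that pushed us toward a learned extractor.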

Handling Ambiguity

Real user queries are often ambiguous:

"Paris" → city (map to France, FR) or person's name?
"10 days" → validity must be ≥10 or exactly 10?
"unlimited" → truly unlimited or just a large data cap?
"I might go to Switzerland too" → hard requirement or optional?

Our approach: extract the high-confidence attributes, use sensible defaults for ambiguous ones, and surface the assumptions transparently: "Searching for France plans with 10+ day validity. Let me know if you also need Switzerland coverage."

For low-confidence extractions, we ask clarifying questions rather than guessing. Better to spend one more round-trip than return irrelevant results.
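One way to picture this policy is as a three-way triage on per-field confidence. The sketch below is an illustration only: the thresholds, the `Extraction` type, and the field names are assumptions, and it presumes the extraction model reports a confidence per field.

```python
from dataclasses import dataclass
from typing import Any

# Both thresholds are illustrative assumptions, not production values.
ACCEPT_AT = 0.8   # at or above: use the value without comment
ASK_BELOW = 0.4   # below: ask the user instead of guessing

@dataclass
class Extraction:
    field: str
    value: Any
    confidence: float  # assumed to be reported by the extraction model

def triage(extractions: list[Extraction]) -> dict:
    """Split extractions into filters, stated assumptions, and questions."""
    result = {"filters": {}, "assumptions": [], "questions": []}
    for e in extractions:
        if e.confidence < ASK_BELOW:
            # Too uncertain to guess: ask a clarifying question.
            result["questions"].append(e.field)
            continue
        result["filters"][e.field] = e.value
        if e.confidence < ACCEPT_AT:
            # Mid-confidence: apply the default but tell the user about it.
            result["assumptions"].append(f"{e.field}={e.value}")
    return result

example = triage([
    Extraction("countries", ["FR"], 0.95),
    Extraction("min_validity_days", 10, 0.6),
    Extraction("data_gb", None, 0.2),
])
print(example)
```

Here the country is used as-is, the validity default is applied and surfaced as an assumption, and the data allowance becomes a clarifying question.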

Plan Retrieval Architecture

Once we have a structured TripQuery, retrieval is a two-stage process:

Stage 1 — Coarse filter: SQL query to get candidate plans matching the hard requirements (country coverage, minimum validity, maximum price if specified, tethering if required). This typically returns 100–500 candidate plans.
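As a rough sketch of the coarse filter, the example below builds a parameterized query that applies only the constraints the user actually stated. The `plans` schema, column names, and sample rows are hypothetical, and it simplifies country coverage to a single country per plan.

```python
import sqlite3

# Hypothetical schema and rows; the real plan table is not shown here.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE plans (
        id INTEGER PRIMARY KEY,
        country TEXT,
        validity_days INTEGER,
        price_eur REAL,
        tethering_allowed INTEGER
    )
""")
conn.executemany(
    "INSERT INTO plans VALUES (?, ?, ?, ?, ?)",
    [
        (1, "JP", 15, 18.0, 1),
        (2, "JP", 7, 9.0, 1),    # validity too short
        (3, "JP", 30, 25.0, 1),  # over budget
        (4, "JP", 10, 12.0, 0),  # no tethering
        (5, "FR", 10, 10.0, 1),  # wrong country
    ],
)

def coarse_filter(country, min_days, budget_eur=None, hotspot_required=False):
    """Return candidate plan ids that satisfy the hard requirements."""
    sql = "SELECT id FROM plans WHERE country = ? AND validity_days >= ?"
    params = [country, min_days]
    if budget_eur is not None:
        # Price cap is optional: only applied when the user gave a budget.
        sql += " AND price_eur <= ?"
        params.append(budget_eur)
    if hotspot_required:
        sql += " AND tethering_allowed = 1"
    sql += " ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]

candidates = coarse_filter("JP", 10, budget_eur=20.0, hotspot_required=True)
print(candidates)  # [1]
```

Keeping the values in `params` rather than formatting them into the SQL string means nothing taken from user input is ever interpolated into the query text.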

Stage 2 — Scoring and ranking: Apply the composite scoring model (price efficiency, reliability, coverage quality, flexibility, features, activation speed) with the user's specific context. Surface top 3–5 with honest tradeoff explanations.
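A composite score of this kind can be sketched as a weighted sum over the factors listed above. The weights below are placeholders picked for illustration, and each factor is assumed to be normalized to the range 0 to 1 beforehand.

```python
# Placeholder weights: the factor names come from the text above,
# but the numbers are assumptions for illustration.
WEIGHTS = {
    "price_efficiency": 0.30,
    "reliability": 0.25,
    "coverage_quality": 0.20,
    "flexibility": 0.10,
    "features": 0.10,
    "activation_speed": 0.05,
}

def composite_score(factors: dict[str, float]) -> float:
    """Weighted sum of factor scores; a missing factor counts as 0."""
    return sum(weight * factors.get(name, 0.0) for name, weight in WEIGHTS.items())

def top_plans(candidates: list[dict], k: int = 5) -> list[dict]:
    """Rank candidate plans by composite score, best first, and keep k."""
    return sorted(
        candidates,
        key=lambda plan: composite_score(plan["factors"]),
        reverse=True,
    )[:k]

candidates = [
    {"id": "A", "factors": {"price_efficiency": 0.9, "reliability": 0.5}},
    {"id": "B", "factors": {"price_efficiency": 0.4, "reliability": 0.9,
                            "coverage_quality": 0.9}},
]
ranked = top_plans(candidates)
print([plan["id"] for plan in ranked])  # ['B', 'A']
```

User context could enter by adjusting the weights per query, for example raising `reliability` for a remote worker, though that adjustment is not shown here.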

The tradeoff generation uses the LLM — we give it the top 5 scored plans and the original query, and ask it to explain the differences in plain language. This is the piece users find most valuable.
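The prompt for this step might be assembled roughly as follows. This is a sketch under assumptions: the wording, the plan fields, and the function name are illustrative, and the call to the model itself is left out.

```python
def build_tradeoff_prompt(user_query: str, plans: list[dict]) -> str:
    """Assemble a prompt that restricts the model to the retrieved plans."""
    lines = [
        "Explain the tradeoffs between these eSIM plans in plain language.",
        "Use only the facts listed below; do not invent plans or prices.",
        f"User request: {user_query}",
        "Plans:",
    ]
    for rank, plan in enumerate(plans, start=1):
        lines.append(
            f"{rank}. {plan['name']}: {plan['data_gb']} GB, "
            f"{plan['validity_days']} days, EUR {plan['price_eur']:.2f}"
        )
    return "\n".join(lines)

# Hypothetical plans standing in for the top scored results.
prompt = build_tradeoff_prompt(
    "Japan for 10 days, hotspot needed, under 20 euros",
    [
        {"name": "Plan A", "data_gb": 5, "validity_days": 15, "price_eur": 18.0},
        {"name": "Plan B", "data_gb": 10, "validity_days": 10, "price_eur": 19.5},
    ],
)
print(prompt)
```

Because every plan fact in the prompt comes from the database, the model's job is reduced to comparison and wording rather than recall.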

What Didn't Work

Purely vector/embedding-based retrieval: We tried encoding plans as embeddings and doing similarity search against the encoded query. It worked poorly — the dimensions that matter most (data size, country coverage, price) are structured numerical attributes that embeddings don't handle well. The hybrid filter-then-rank approach is better.

Letting the LLM do all the filtering: "Let the model figure it out" is tempting, but it results in hallucinated plans and prices. We keep retrieval structured and database-backed, and use the LLM only for understanding and explanation.

Current State and Open Questions

The system handles 95%+ of real user queries correctly on our evaluation set. The remaining failures are mostly exotic edge cases (very unusual countries, highly specific technical requirements).

Open questions we're still working on: how to handle "tell me more about provider X" queries efficiently, better personalization without requiring sign-up, and multi-turn context over longer conversations.

The full product is live at esimdb.ai. Open to architecture discussions in the comments.