Most AI-powered flight search tools have a dirty secret: they're stateless.
You type a query, they fire an API call, they return results. You type a follow-up — "what if I fly into Osaka instead?" — and they fire another API call. Completely fresh. No memory of what you were just looking at.
The result feels like a chatbot that happens to search flights, not an intelligent assistant that's actually helping you find the best option.
We built JetSet AI differently. Here's the architecture decision that changed everything — and why it matters more than the AI model you choose.
## The Stateless Problem in AI Agents
When you build an AI tool on top of a standard LLM API, you're working with a fundamentally stateless system. Each API call is independent. To simulate "memory," most tools do one of two things:
- Stuff the entire conversation history into the context window on every call
- Use a vector database to retrieve "relevant" past messages
Both approaches work up to a point. But they have real limitations for a flight search use case:
Context stuffing gets expensive fast. A flight search conversation generates a lot of tokens — flight results, prices, airline codes, timestamps. Stuffing 10 turns of that into every API call burns through your context window and your budget simultaneously.
Vector retrieval is lossy. When a user says "what about business class for those flights?", the word "those" requires understanding the exact set of flights from the previous message — not a semantically similar set retrieved from a vector store. Approximate memory isn't good enough for transactional queries.
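To make the context-stuffing cost concrete, here's a minimal Python sketch — not JetSet AI's code, and the ~4-characters-per-token heuristic is a rough assumption — showing why re-sending the full history on every turn scales so badly:

```python
def estimate_tokens(text: str) -> int:
    """Crude heuristic: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def total_tokens_sent(turns: list[str]) -> int:
    """Tokens sent across all API calls when every call
    re-sends the entire conversation history so far."""
    total = 0
    history = ""
    for turn in turns:
        history += turn
        total += estimate_tokens(history)  # whole history on every call
    return total

# Ten turns of chunky flight-result output (illustrative payload).
turn = "LHR->HND 2024-04-12 dep 09:40 JL042 GBP612 ... " * 20
stuffed = total_tokens_sent([turn] * 10)
persistent = estimate_tokens(turn) * 10  # each turn processed once

assert stuffed > 5 * persistent  # quadratic growth vs linear
```

With persistent state, turn N costs roughly one turn of tokens; with stuffing, turn N pays for all N turns again.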
There's a third option that most teams don't consider because it's operationally heavier: give the agent a persistent runtime environment.
## The Architecture: Dedicated VMs Per Session
JetSet AI runs on SuperNinja — an AI agent platform where each app instance runs on a dedicated virtual machine rather than a serverless function.
What this means in practice:
Traditional AI tool:

```
User message → API call → LLM → Response → (state discarded)
```

JetSet AI:

```
User message → Persistent VM → LLM with full session state → Response
     ↑                                                           │
     └─────────────────────── state maintained ──────────────────┘
```
The VM persists for the duration of your session. Your conversation history, your search context, the specific flights you were looking at — all of it lives in memory on the VM, not reconstructed from a database on every turn.
This means:
- Zero context reconstruction overhead — the state is just there
- Exact reference resolution — "those flights" means exactly those flights
- Arbitrary tool state — the flight search tool can maintain its own internal state between calls, not just the conversation history
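As a rough sketch of what "the state is just there" can look like in practice — the names `SessionState`, `FlightOption`, and `resolve_reference` are invented for illustration, not JetSet AI's actual types:

```python
from dataclasses import dataclass, field

@dataclass
class FlightOption:
    origin: str
    destination: str
    depart: str
    price_gbp: int

@dataclass
class SessionState:
    # Lives in VM memory for the session's lifetime: no database
    # round-trip, no vector retrieval, no context reconstruction.
    query: dict = field(default_factory=dict)
    last_results: list = field(default_factory=list)

    def resolve_reference(self, phrase: str) -> list:
        # "those flights" resolves to the exact result set from the
        # previous turn — identity, not semantic similarity.
        if "those" in phrase or "these" in phrase:
            return self.last_results
        return []

session = SessionState()
session.query = {"origin": "LON", "destination": "TYO", "month": "April"}
session.last_results = [FlightOption("LON", "TYO", "2024-04-05", 512)]

exact = session.resolve_reference("business class for those flights?")
assert exact is session.last_results  # the same objects, not lookalikes
```

The `is` check is the whole point: a vector store can only ever hand back something *similar*; in-memory state hands back the thing itself.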
## Why This Changes the User Experience
Here's a concrete example. A user types:
"I want to fly from London to Tokyo in April. Flexible on dates, cheapest option, back by April 28th."
JetSet AI returns 8 live flight options with real pricing. Then the user says:
"What if I fly into Osaka instead?"
In a stateless system, this requires the user to re-specify: dates, return constraint, price preference, stop preference. The system has no idea what "instead" refers to.
In JetSet AI, the VM has the full context. "Instead" resolves correctly. The system knows:
- The origin (London)
- The date flexibility (April, cheapest)
- The return constraint (by April 28th)
- Any stop preference expressed earlier in the session
It just swaps the destination and re-runs the search. The user never repeats themselves.
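A minimal sketch of that refinement step, assuming the stored query is a plain mapping (the field names here are illustrative):

```python
def refine(query: dict, **changes) -> dict:
    """Return a new query with only the changed fields replaced;
    every other constraint carries over from session state."""
    return {**query, **changes}

original = {
    "origin": "LON",
    "destination": "TYO",
    "month": "April",
    "return_by": "2024-04-28",
    "sort": "cheapest",
}

# "What if I fly into Osaka instead?" touches exactly one field.
osaka = refine(original, destination="OSA")

assert osaka["origin"] == "LON"            # carried over
assert osaka["return_by"] == "2024-04-28"  # carried over
assert osaka["destination"] == "OSA"       # the only change
```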
Then: "What about business class — how much more is that?"
Again — the VM knows exactly which flights the user is comparing. It returns business class pricing for those specific options, not a generic business class search.
This is the difference between a search tool and a search conversation.
## The Technical Stack
The flight data layer uses live pricing APIs — not cached or estimated fares. Every search hits real inventory, which means:
- Prices are accurate at time of search
- Availability is real
- The "Book Now" link goes directly to the booking page with the fare pre-loaded
The LLM layer handles intent parsing — extracting origin, destination, date constraints, flexibility, price limits, stop preferences from natural language. This is where the model choice matters: you need something that handles ambiguous temporal expressions well ("sometime in April," "the first two weeks of March," "cheapest day next month").
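For a flavour of what that normalisation produces, here's a hand-rolled sketch that maps fuzzy phrases to concrete date windows. In JetSet AI the model does this parsing; `parse_window` is purely illustrative and only handles a couple of phrase shapes:

```python
from datetime import date, timedelta
import calendar

def parse_window(phrase: str, today: date) -> tuple:
    """Map a fuzzy temporal phrase to a (start, end) date window."""
    phrase = phrase.lower()
    if "next three weeks" in phrase:
        return today, today + timedelta(weeks=3)
    for month_num in range(1, 13):
        if calendar.month_name[month_num].lower() in phrase:
            # A month earlier in the calendar than today rolls to next year.
            year = today.year + (1 if month_num < today.month else 0)
            last_day = calendar.monthrange(year, month_num)[1]
            start = date(year, month_num, 1)
            end = date(year, month_num, last_day)
            if "first two weeks" in phrase:
                end = start + timedelta(days=13)
            return start, end
    raise ValueError(f"can't parse: {phrase!r}")

today = date(2024, 2, 10)
assert parse_window("sometime in April", today) == (date(2024, 4, 1), date(2024, 4, 30))
assert parse_window("the first two weeks of March", today) == (date(2024, 3, 1), date(2024, 3, 14))
```

The LLM's job is exactly this shape of output — a concrete window — but across the long tail of phrasings a rule-based parser can't cover.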
The booking handoff is a direct deep-link to the carrier or OTA with the specific fare parameters — no re-entering your search on the other side.
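As a sketch of that handoff — the base URL and parameter names below are invented for illustration; every carrier and OTA has its own deep-link scheme:

```python
from urllib.parse import urlencode

def booking_link(base: str, fare: dict) -> str:
    """Serialise the exact fare parameters into a pre-filled booking URL."""
    return f"{base}?{urlencode(fare)}"

fare = {
    "from": "LHR",
    "to": "KIX",
    "depart": "2024-04-05",
    "return": "2024-04-26",
    "cabin": "economy",
    "fare_id": "ABC123",  # illustrative fare identifier
}

url = booking_link("https://example-ota.test/book", fare)
assert url.startswith("https://example-ota.test/book?")
assert "depart=2024-04-05" in url
```

The point is that everything the user already specified travels with the link, so nothing gets re-typed on the other side.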
## What We Learned About Conversational Search
Building this taught us a few things that aren't obvious from the outside:
1. The hard problem isn't the initial search — it's the refinement.
Getting a good first result is table stakes. The experience that makes users come back is how well the tool handles "what if" questions. That's where persistent state pays off.
2. Temporal flexibility is underserved.
"Cheapest day in April" is a completely normal query that no traditional flight search tool handles well. The price calendar on Google Flights is the closest thing, but it requires you to already be in the right search context. Natural language temporal flexibility — "sometime in the next three weeks," "the cheapest weekend in May" — is a genuinely hard NLP problem that most tools punt on.
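Once a window is normalised, the "cheapest day" part reduces to a min over per-day fares. A toy sketch, with `fetch_daily_fares` standing in for a price-calendar call to the live API and entirely fabricated prices:

```python
from datetime import date, timedelta

def fetch_daily_fares(origin: str, dest: str, start: date, end: date) -> dict:
    # Stand-in for the live pricing API; numbers here are fabricated.
    fares, d, price = {}, start, 650
    while d <= end:
        fares[d] = price
        price += -7 if d.weekday() < 4 else 12  # toy weekday/weekend drift
        d += timedelta(days=1)
    return fares

def cheapest_day(origin: str, dest: str, start: date, end: date) -> tuple:
    fares = fetch_daily_fares(origin, dest, start, end)
    best = min(fares, key=fares.get)
    return best, fares[best]

day, price = cheapest_day("LON", "TYO", date(2024, 4, 1), date(2024, 4, 30))
assert date(2024, 4, 1) <= day <= date(2024, 4, 30)
```

The hard part isn't this min — it's getting from "cheapest day in April" to the window and the per-day fare calls in the first place.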
3. Direct booking links matter more than you'd think.
Every extra step between "I found a good flight" and "I booked it" is a drop-off point. Tools that show you results but make you go find the booking yourself lose users at the last mile. The handoff has to be seamless.
## Try It
If you want to see the persistent context in action, the best way is to try a multi-turn search:
- Go to JetSet AI
- Type a real flight query with some flexibility: "Cheapest flight from [your city] to [destination] in [month], flexible on dates"
- Get results
- Then refine: "What about flying a week later?" or "What if I do business class?"
The difference from a stateless tool is immediately obvious. You'll never want to go back to re-typing your constraints.
## What's Next
We're working on a few things that push the persistent VM model further:
- Price alerts with context — "tell me when this specific route drops below £500" — where "this route" means the one you were just looking at
- Multi-city trip planning — building a full itinerary across multiple searches in a single session
- Fare history with memory — "is this cheaper than last time I searched this route?"
The persistent runtime makes all of these significantly easier to build than they would be in a stateless architecture.
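For example, a contextual price alert can be little more than a stored (query, threshold) pair plus a price check, because "this route" is already pinned to a concrete query in session state. The names below are illustrative, not JetSet AI's implementation:

```python
from dataclasses import dataclass

@dataclass
class PriceAlert:
    query: dict      # the exact route the user was just looking at
    threshold: int   # e.g. 500 (GBP)

    def should_fire(self, current_price: int) -> bool:
        return current_price < self.threshold

# "this specific route" needs no re-specification: it comes from session state.
alert = PriceAlert(query={"origin": "LON", "destination": "TYO"}, threshold=500)

assert alert.should_fire(480) is True
assert alert.should_fire(520) is False
```

In a stateless architecture, the same feature first needs a way to reconstruct what "this route" even means.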
If you're building AI agents and wrestling with the stateless problem, I'd genuinely recommend thinking about persistent VM-based runtimes before reaching for a vector database. For transactional, multi-turn use cases, the operational overhead is worth it.
JetSet AI is built on SuperNinja — an AI agent platform where every app runs on a dedicated VM for persistent, context-aware conversations. Try it here.