Parveen Asnora

Posted on May 21

Why ChatGPT Cannot Replace Travel Agents — Notes from Building the Backend

#ai #architecture #backend #traveltech

Every few months a tech publication runs some variant of "ChatGPT will replace travel agents." The argument sounds airtight: travel planning is mostly research, LLMs are great at research, therefore the job is done.
I work as a backend developer at MindDMC, an AI itinerary platform built for travel agents and DMCs. When the team started, we tried to build the entire product on top of GPT-4. That approach failed — not because the model was not smart enough, but because we were trying to solve a transactional B2B problem with a generative consumer tool.
The architectural mismatch was the lesson. I think it is worth sharing because the same pattern shows up in healthcare, legal tech, financial services, and any other domain where AI gets pitched as a replacement for human professionals.
This is a technical post about why the architecture of consumer LLMs makes them structurally incapable of doing what a travel agent does — and what a system that can do that job actually needs to look like.

The demo trap
If you have used ChatGPT to plan a vacation, you have probably had the same experience I had the first time: it feels like magic.
You type "plan me a 10-day trip through Italy in October, mid-range budget, mix of cities and countryside." Out comes a beautifully structured day-by-day itinerary. Florence on day 1, the Tuscan countryside on day 3, the Amalfi Coast on day 7. Suggested hotels. Restaurant picks. Even a packing list.
This is the demo that has launched a hundred "AI will disrupt travel" think pieces.
The problem is what happens next.
Now imagine you are a travel agent and a client just paid you a USD 200 consultation fee. You need to turn that ChatGPT output into a bookable, contractable, deliverable trip in the next 48 hours. You need:

Real availability for those hotels on those exact dates
Actual pricing, not 2023 estimates pulled from training data
A confirmed transfer service from Florence to the Tuscan countryside
A proposal document with the agency's branding
The ability to swap any component without rewriting the whole itinerary
A version the client can approve, after which the booking goes through

ChatGPT cannot do any of those things. Not because it is poorly designed for its job — it is excellent at being a generative writing tool — but because none of those things are what a generative writing tool is built to do.
The failures are architectural. Let me walk through them.

Problem 1: The stateless generation problem

LLMs are stateless text-completion machines. You send tokens in, you get tokens out. There is no persistent state, no transactional layer, no external system being modified.
A travel agent's actual workflow is the opposite. Almost every step modifies external state:

Querying live hotel inventory in a Global Distribution System
Holding a room for 24 hours pending client approval
Confirming a rail seat with Eurostar
Issuing a booking through a wholesaler API
Generating a PNR

None of this exists inside the LLM. The LLM can describe a hotel beautifully, but it has no idea if room 412 at the Hotel Cipriani is available on October 14, and it has no mechanism to find out.

You can bolt on tool use (OpenAI function calling, MCP, agentic frameworks), and I will get to that. But the moment you do, you are no longer building "an LLM solution." You are building a traditional integration architecture where the LLM is one component among many — and the engineering complexity lives in the components the LLM does not provide.
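
A minimal sketch of that shift, assuming a hypothetical tool named `check_availability`, an in-memory `INVENTORY` table standing in for a live supplier system, and a made-up JSON shape for the model's request. None of this is a vendor's real API. The model only emits a structured request; deterministic code owns the lookup.

```python
import json
from datetime import date

# Hypothetical stand-in for live supplier inventory (normally a GDS or wholesaler API).
# The year is arbitrary and chosen only for illustration.
INVENTORY = {
    ("hotel-cipriani", date(2025, 10, 14)): 0,  # sold out
    ("hotel-cipriani", date(2025, 10, 15)): 3,
}

def check_availability(hotel_id: str, night: str) -> dict:
    """Deterministic lookup; the LLM never answers this question itself."""
    rooms = INVENTORY.get((hotel_id, date.fromisoformat(night)), 0)
    return {"hotel_id": hotel_id, "night": night,
            "available": rooms > 0, "rooms_left": rooms}

# Registry of tools the model is allowed to request.
TOOLS = {"check_availability": check_availability}

def dispatch(tool_call_json: str) -> dict:
    """Execute a structured tool request emitted by the model."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])

# What a model might emit instead of guessing (hard-coded here for illustration).
model_output = (
    '{"name": "check_availability", '
    '"arguments": {"hotel_id": "hotel-cipriani", "night": "2025-10-14"}}'
)
result = dispatch(model_output)
```

Everything that matters here (the registry, the lookup, the error handling) is ordinary backend code. The model contributes one JSON string.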

Problem 2: The pricing hallucination problem

This is the one that killed our first prototype.

We asked GPT-4 to generate a 7-day Switzerland itinerary with hotel pricing. It produced gorgeous output — and quoted CHF 380 per night for the Hotel Schweizerhof in Lucerne.

The actual price that week was CHF 740.

That is not a model defect. The training data has a cutoff. Hotel pricing fluctuates daily based on occupancy, season, events, and yield management algorithms run by the hotel chain. Even if the model had been trained on Hotel Schweizerhof's rate card from last year, it would still be wrong today.

For a consumer asking "roughly how much does a week in Switzerland cost?", the hallucinated number is fine, because they will check Booking.com anyway. For a travel agent quoting a client, CHF 380 against a real rate of CHF 740 is roughly half the true price, and that is catastrophic. Either you lose the booking when reality catches up, or you eat the difference.

The only fix is to ground every price in a live API call to actual inventory — HotelBeds, WebBeds, Stuba, or whichever wholesaler serves that region. The LLM's job becomes describing what the API returns, not generating the price itself.

This is RAG (Retrieval-Augmented Generation), but with one critical difference: in most RAG use cases the retrieved data is static documents. In travel, the retrieved data is a real-time pricing API response that expires in minutes.
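
One way to make the expiry explicit is to carry it on the retrieved data itself. The sketch below is illustrative only: the `Quote` type, the 15-minute `ttl`, and the timestamps are assumptions, not any wholesaler's response format. The price line is rendered from the retrieved quote, and rendering fails once the quote is stale.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class Quote:
    hotel: str
    currency: str
    nightly_rate: int      # whole currency units, copied from the supplier response
    fetched_at: datetime
    ttl: timedelta         # how long the rate is assumed to stay valid

    def is_fresh(self, now: datetime) -> bool:
        return now < self.fetched_at + self.ttl

class StaleQuoteError(Exception):
    """Raised when a price would be rendered from an expired quote."""

def price_line(quote: Quote, now: datetime) -> str:
    """Render the price from retrieved data; refuse to render an expired quote."""
    if not quote.is_fresh(now):
        raise StaleQuoteError(f"quote for {quote.hotel} expired; re-fetch before quoting")
    return f"{quote.hotel}: {quote.currency} {quote.nightly_rate} per night"

# Hypothetical fetch time; the rate is the real-world figure from the example above.
fetched = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
quote = Quote("Hotel Schweizerhof", "CHF", 740, fetched, ttl=timedelta(minutes=15))
line = price_line(quote, now=fetched + timedelta(minutes=5))
```

The model can still write the surrounding prose, but the number in `line` never passes through generation.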

Problem 3: The context window problem at scale

GPT-4 Turbo has a 128k context window. Claude has 200k. These sound enormous until you try to fit a real itinerary into one.

A single bookable 10-day itinerary, fully specified, looks like this:

10 hotel options per night with full details, amenities, cancellation policy, pricing tiers → roughly 80k tokens
Inter-city transfer options (train, car, regional flight) with timetables → 15k tokens
Daily activity options with operating hours, group sizes, weather contingencies → 25k tokens
Restaurant suggestions per location, dietary filters, reservation requirements → 12k tokens
Client preferences, travel history, past bookings → 8k tokens
Agency branding rules, output formatting, compliance disclaimers → 5k tokens

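That is 145k tokens before the LLM has done any reasoning. You have already blown through GPT-4 Turbo's 128k window, and you have used nearly three quarters of Claude's 200k before leaving any room for instructions, conversation history, or the output itself.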
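
The arithmetic, using the rough estimates from the list above and the window sizes quoted at the start of this section (the dictionary keys are just labels):

```python
# Token estimates copied from the list above, in thousands of tokens.
budget_k = {
    "hotel options": 80,
    "transfers": 15,
    "activities": 25,
    "restaurants": 12,
    "client profile": 8,
    "agency rules": 5,
}

# Context windows as quoted in this post, in thousands of tokens.
windows_k = {"GPT-4 Turbo": 128, "Claude": 200}

total_k = sum(budget_k.values())
# Negative headroom means the payload alone does not fit.
headroom_k = {model: window - total_k for model, window in windows_k.items()}
```

On these numbers the payload overshoots a 128k window by 17k tokens and leaves 55k of a 200k window for everything else.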

You can compress, summarize, or use retrieval to load only what is needed for each generation step. But now you are building a multi-stage pipeline with a retrieval system, a state manager, and a planning layer above the LLM. The LLM is one node in a graph, not the product.

Problem 4: The transactional integrity problem

This one is subtle and it is what most "AI travel" startups underestimate.
When a travel agent confirms a booking, three things must happen atomically:

The supplier confirms the room is held
The client's payment authorization is captured
The agent's commission tracking is updated

In database terms, this is a distributed transaction. If step 2 fails after step 1 succeeds, you have a held room with no payment. If step 3 fails after step 2 succeeds, the agent does not get paid for work they delivered.

LLMs do not do transactions. They generate text. To get transactional integrity you need an orchestration layer with rollback semantics, idempotency keys, and reconciliation logic. None of this is something you "prompt your way to."
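
A common way to get rollback semantics across systems that do not share a database is a saga: each step is paired with a compensating action, and a failure undoes the completed steps in reverse order. The sketch below is a toy version under that assumption. The step names and the in-memory `state` dictionary are hypothetical, and retries, idempotency keys, and reconciliation are left out.

```python
from typing import Callable

class StepFailed(Exception):
    """Raised by a step that could not complete."""

Step = tuple[str, Callable[[], None], Callable[[], None]]  # (name, action, compensation)

def run_saga(steps: list[Step]) -> list[str]:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    log: list[str] = []
    done: list[tuple[str, Callable[[], None]]] = []
    for name, action, compensate in steps:
        try:
            action()
        except StepFailed:
            log.append(f"failed:{name}")
            for prev_name, undo in reversed(done):
                undo()
                log.append(f"undone:{prev_name}")
            break
        log.append(f"ok:{name}")
        done.append((name, compensate))
    return log

# In-memory stand-in for the three external systems.
state = {"room_held": False, "payment_captured": False, "commission_recorded": False}

def hold_room() -> None:
    state["room_held"] = True

def release_room() -> None:
    state["room_held"] = False

def capture_payment() -> None:
    raise StepFailed("card declined")  # simulate step 2 failing

def refund_payment() -> None:
    state["payment_captured"] = False

def record_commission() -> None:
    state["commission_recorded"] = True

def void_commission() -> None:
    state["commission_recorded"] = False

log = run_saga([
    ("hold_room", hold_room, release_room),
    ("capture_payment", capture_payment, refund_payment),
    ("record_commission", record_commission, void_commission),
])
```

Here the payment step fails, so the held room is released and the commission step never runs. In a real system the compensations can fail too, which is where reconciliation logic comes in.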

This is why every serious AI travel platform — including ours — ends up looking architecturally a lot like a traditional B2B SaaS product, with an LLM acting as the natural-language interface to a deterministic backend. The LLM is the steering wheel. The transaction engine is the rest of the car.

Problem 5: The workflow problem

The final issue is the most boring and the most fatal.

A travel agent's deliverable is not a chatbot conversation. It is:

A branded PDF proposal with the agency's logo
A client portal where the customer can approve, modify, or reject the itinerary
An invoicing system tied to the agency's accounting
A reminder system for visa deadlines and check-in dates
A handoff to operations staff if the booking is complex
Post-trip feedback collection

Every one of these is a product surface. ChatGPT does not have any of them. It has a chat window.
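
Even the approval surface alone implies state that a chat window does not track. A minimal sketch of a proposal lifecycle as a state machine follows; the states and event names are hypothetical, and a real agency workflow will have more of both.

```python
from enum import Enum

class ProposalState(Enum):
    DRAFT = "draft"
    SENT = "sent"
    APPROVED = "approved"
    REJECTED = "rejected"
    BOOKED = "booked"

# Allowed transitions: (current state, event) -> next state.
TRANSITIONS = {
    (ProposalState.DRAFT, "send"): ProposalState.SENT,
    (ProposalState.SENT, "approve"): ProposalState.APPROVED,
    (ProposalState.SENT, "modify"): ProposalState.DRAFT,     # back to the agent for edits
    (ProposalState.SENT, "reject"): ProposalState.REJECTED,
    (ProposalState.APPROVED, "book"): ProposalState.BOOKED,  # booking only after approval
}

def apply(state: ProposalState, event: str) -> ProposalState:
    """Return the next state, or raise if the event is not allowed from this state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"cannot {event!r} a proposal in state {state.value!r}") from None

# Happy path: send, client approves, booking goes through.
state = ProposalState.DRAFT
for event in ("send", "approve", "book"):
    state = apply(state, event)
```

The table makes the business rule explicit: there is no path from `DRAFT` to `BOOKED` that skips client approval.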

You can ask ChatGPT to "format this as a proposal" and it will give you Markdown. That is not a deliverable. A real proposal needs typography, page breaks, image placement, the agency's brand kit, and a downloadable file the client can sign.

A travel agent uses an AI tool the way an architect uses Revit. The tool exists to accelerate a specific workflow, with specific outputs, in a specific business process. A general-purpose chatbot is not that tool.

What B2B travel AI actually needs

After we burned the GPT-4-only prototype, the team redesigned around what we now call the LLM + Travel API hybrid pattern:

An LLM layer for natural-language input parsing and prose generation. This is what GPT-4 and Claude are good at, and we use them for exactly that. Nothing more.
A real-time inventory layer with integrations into wholesale APIs — HotelBeds, RailEurope, WebBeds, Stuba, and regional DMC partners. Every price, every availability check, every booking confirmation goes through here. The LLM never invents these.
An orchestration layer that handles the pipeline: parse user intent, fetch live options, rank them against client preferences, generate the prose description, format the output, and prepare the deliverable.
A workflow layer that handles agency-specific concerns: white-label branding, proposal templates, client approval flow, payment handoff, post-booking operations.
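
The four layers above can be sketched as one small pipeline. This is an illustration of the shape, not production code: every function is a stub, the hotel names and rates are invented, and `parse_intent` and `describe` stand in for model calls that a real system would make.

```python
from dataclasses import dataclass

@dataclass
class Intent:
    city: str
    nights: int
    max_nightly_rate: int

@dataclass
class Option:
    hotel: str
    nightly_rate: int
    rating: float

def parse_intent(text: str) -> Intent:
    """LLM layer (stubbed): a real system would ask a model for structured output."""
    return Intent(city="Lucerne", nights=3, max_nightly_rate=800)

def fetch_live_options(intent: Intent) -> list[Option]:
    """Inventory layer (stubbed): a real system would call wholesaler APIs here."""
    return [Option("Hotel A", 740, 4.6), Option("Hotel B", 520, 4.2), Option("Hotel C", 910, 4.9)]

def rank(options: list[Option], intent: Intent) -> list[Option]:
    """Orchestration layer: deterministic filter and sort, no model involved."""
    affordable = [o for o in options if o.nightly_rate <= intent.max_nightly_rate]
    return sorted(affordable, key=lambda o: o.rating, reverse=True)

def describe(option: Option, intent: Intent) -> str:
    """LLM layer (stubbed): prose around numbers that were retrieved, not generated."""
    total = option.nightly_rate * intent.nights
    return f"{option.hotel} in {intent.city}: {intent.nights} nights, CHF {total} total."

def build_proposal(request_text: str, agency_name: str) -> str:
    """Workflow layer: wrap the ranked options in agency-specific output."""
    intent = parse_intent(request_text)
    ranked = rank(fetch_live_options(intent), intent)
    if not ranked:
        return f"{agency_name}: no matching options"
    return "\n".join([f"{agency_name} proposal"] + [describe(o, intent) for o in ranked])

proposal = build_proposal("3 nights in Lucerne under CHF 800", "Example Travel Co.")
```

Only two of the five functions would touch a model. Filtering, ranking, totals, and formatting stay deterministic.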

The LLM, in the final architecture, is maybe 15 percent of the system. The other 85 percent is the boring transactional infrastructure that lets the LLM be useful in a business context.

That ratio surprises engineers who come into travel tech thinking the LLM is the product. It is not. The integration plumbing is the product. The LLM is the user interface.

The general pattern

This is not a travel-specific lesson. It is the same architectural pattern that shows up everywhere a generative tool meets a transactional reality:

Legal AI — drafting a contract is generative; making it valid in a jurisdiction is transactional
Medical AI — describing a treatment plan is generative; integrating it with the EHR and prescribing system is transactional
Financial AI — analyzing a portfolio is generative; rebalancing it through brokerage APIs is transactional
HR AI — writing a job description is generative; running it through compliance, ATS, and payroll is transactional

In every one of these domains, the "AI will replace humans" narrative collapses at the same architectural seam: the moment you need to modify the state of the real world, you need an integration layer the LLM cannot provide.

The professionals are not safe because AI is dumb. They are safe because the AI is solving the easiest 15 percent of the job, and the other 85 percent still requires the system around the AI.

What this means if you are building in this space

A few practical takeaways from working on this kind of backend:

Pick the integration first, the model second. The hardest engineering problems in this category are not LLM problems. They are the deterministic boring infrastructure problems. Get those right and the LLM choice becomes interchangeable.

Treat the LLM as a renderer, not a brain. Use it for natural-language input parsing and natural-language output formatting. Do not use it for reasoning over state, computing prices, or making decisions that need to be deterministic.

Ground every claim in a real source. If your LLM is generating a number, an address, a phone number, a price, or a date, that value needs to be retrieved from an authoritative source, not generated. Hallucinated facts will eventually cost you a customer.

Build for the workflow, not the demo. The demo is the easy part. The work is in the seventeen tiny features that turn generated text into a deliverable inside a real business process.
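
For the "ground every claim" takeaway, one cheap guard is to check generated prose against the retrieved facts before it reaches a client. The function below is a deliberately crude sketch: the name `ungrounded_numbers` and the fact dictionary are assumptions, and it ignores thousands separators, dates, and numbers written as words.

```python
import re

def ungrounded_numbers(generated_text: str, retrieved_facts: dict) -> list[str]:
    """Return numbers in the text that do not match any retrieved value."""
    allowed = {str(value) for value in retrieved_facts.values()}
    found = re.findall(r"\d+(?:\.\d+)?", generated_text)
    return [number for number in found if number not in allowed]

# Values that came back from an authoritative source (illustrative).
facts = {"nightly_rate": 740, "nights": 7}

grounded = ungrounded_numbers("7 nights at CHF 740 per night.", facts)
hallucinated = ungrounded_numbers("7 nights at CHF 380 per night.", facts)
```

A non-empty result is a signal to regenerate or to block the output, not proof of what the correct value is.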

We are still early in figuring this out. The B2B AI patterns that will win in the next five years are not going to look like ChatGPT. They are going to look like ChatGPT plus a giant pile of integration code — and the integration code is where the moat is.

If you are working on similar problems in B2B travel or any other domain where LLMs need to meet transactional systems, happy to compare notes. You can find me through minddmc.ai or in the comments below.

Parveen Asnora is a Backend Developer at MindDMC, an AI itinerary platform for travel agents, tour operators, and destination management companies. He works on the integration layer between LLMs and travel inventory APIs.