Bokang Sibolla

Posted on Jun 1 • Originally published at bokangsibolla.com

I audited the world's biggest hotel platform. Here is what the AI travel agents are being trained to inherit.

#ai #api #architecture #startup

I run Sola, a travel app for people who move differently from the traveller the industry was built for. While building it, I kept hitting the same wall. The data I wanted to query did not exist. Not because nobody collected it, but because the schema underneath the whole industry never had a field for it.

So on 27 May 2026 I sat down and audited Booking.com. The homepage form, the currency selector, a Bangkok search results page. I wrote down what it accepts and what it refuses. Then I looked at the new AI travel agents shipping on top of it.

Here is what I found, and why it matters to anyone building in this space right now.

The form is the spec

Booking.com's homepage search bar exposes exactly four controls:

A destination, as a single text field
A check-in date and check-out date, as one range
An occupancy counter, defaulting to "2 adults · 0 children · 1 room"
A search button

That is the spec. An online travel agency (OTA) is a CRUD app over this spec, and Expedia, Agoda, and Hotels.com run the same four fields. Airbnb lets you skip the dates. The destination stays a single field everywhere.

Think about what a spec encodes. The default occupancy is a couple. Not a solo traveller, not a parent with one child, not three generations, not seven people eating from one host's kitchen. The form cannot accept a circuit ("Bangkok, then Hanoi, then Jakarta" forces three separate searches). It cannot accept an open date ("October, not sure which week"). It has no field for the part of a trip where you stay with family but spend money in restaurants.
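To make the constraint concrete, here is a minimal sketch of the form as a data type. The class and function names (SearchRequest, searches_for_circuit) are hypothetical and are not Booking.com's real API; the only things taken from the audit are the single destination, the exact date range, and the "2 adults · 0 children · 1 room" default.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class SearchRequest:
    """Hypothetical model of the four-field form, not a real API."""
    destination: str   # one free-text place, never a list of stops
    check_in: date     # exact dates only, no "sometime in October"
    check_out: date
    adults: int = 2    # the defaults the form ships with
    children: int = 0
    rooms: int = 1


def searches_for_circuit(stops, nights_each, start, **occupancy):
    """A multi-city trip has no single representation: one request per stop."""
    requests = []
    day = start
    for stop in stops:
        end = day + timedelta(days=nights_each)
        requests.append(SearchRequest(stop, day, end, **occupancy))
        day = end
    return requests


circuit = searches_for_circuit(["Bangkok", "Hanoi", "Jakarta"], 3, date(2026, 10, 1))
print(len(circuit))  # 3 separate searches for one trip
```

Anything the type cannot hold, such as an open date or a stay with no room booked, never reaches the search at all.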

When you fill that form, you have not searched. You have submitted to a schema. Most of the world's travellers fail the schema before they fail the search.

The data receipts

I am a builder, so I went for counts, not adjectives. Everything below rendered on the platform on 27 May 2026.

Currencies: 52 offered, about 180 in circulation.

Eight currencies sit featured at the top of the dropdown. On the day I ran the audit the order was EUR, USD, GBP, AED, SGD, AUD, JPY, PHP. The eighth slot is geo-personalised, and the test IP resolved to the Philippines, which bumped PHP up. The other 44 fall alphabetically below: BRL at position 12, EGP at 19, INR at 25, IDR at 26, MXN at 33, ZAR at 45, KRW at 46, THB at 49.

Now the absences. Currencies for these five economies are not in the dropdown at all:

Nigerian naira (220 million people)
Bangladeshi taka (170 million people)
Vietnamese dong (100 million people)
Kenyan shilling (55 million people)
Ghanaian cedi (33 million people)

That is around 580 million people from five countries alone. Add the unverified gaps across the rest of Africa and Southeast Asia and you clear 600 million people who cannot see a hotel price in their own money. They convert in their heads on every search and pay their bank to settle in a currency the platform never asked about.
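The arithmetic behind those figures, as a quick check. The populations are the rounded numbers quoted above, and the 180 is the approximate count of circulating currencies, so treat the outputs as estimates.

```python
# Populations in millions, as quoted in the list above (rounded figures).
unserved_millions = {
    "Nigerian naira": 220,
    "Bangladeshi taka": 170,
    "Vietnamese dong": 100,
    "Kenyan shilling": 55,
    "Ghanaian cedi": 33,
}
total_millions = sum(unserved_millions.values())

# Share of circulating currencies the dropdown covers (52 of about 180).
coverage_pct = round(52 / 180 * 100)

print(total_millions)  # 578
print(coverage_pct)    # 29
```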

Filters: 115 items across 22 groups, and the gaps are the story.

A Bangkok search hands you a wall of filters. You can narrow by price per night, property type (hotel, hostel, ryokan, capsule, villa, motel, boat, and more), bedrooms, bathrooms, review score, free cancellation, breakfast, pool, parking, wifi, airport shuttle, star rating, neighbourhood, distance from centre, hotel chain, bed preference, sauna, massage, happy hour, pets allowed, adults only.

You cannot filter for any of this:

Visa requirement
Halal food or hotel certification
Kosher, Jain, or vegetarian dietary needs
Prayer room or qibla direction
Community-owned or locally-owned property
Women-only floors or women-safe at night
Walking distance to a mosque, temple, synagogue, or church
Public transit access for travellers who do not drive
A family group larger than six

The words "visa," "halal," "kosher," and "prayer" appear zero times anywhere on the audited page. The filter group named "Travel group" exists. Its two options are "Pets allowed" and "Adults only." The category itself tells you what the platform thinks a travel group is.

The default sort is a paid position.

The default sort is labelled "Our top picks." It reads like a recommendation. Mechanically, it works like an auction. Booking.com charges hotels a base commission of roughly 10 to 25 percent of the room rate, and the Genius and Preferred Partner programmes add roughly 3 to 5 percent more in exchange for placement. Partner-facing material confirms that visibility tier, commission rate, and conversion data feed the default sort. The top result is the property that pays most and that the platform has the most data on.
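A worked example of what those ranges mean per booking. This sketch assumes the placement uplift simply adds to the base commission, which is how the figures above read, and the 100-unit room rate is a made-up number for illustration. The function name commission_range is hypothetical.

```python
def commission_range(room_rate, base_pct=(10, 25), placement_pct=(3, 5)):
    """Return (low, high) commission on one booking.

    Assumes the placement uplift is added to the base commission.
    Percentages are whole numbers, taken from the rough ranges quoted above.
    """
    low = room_rate * (base_pct[0] + placement_pct[0]) / 100
    high = room_rate * (base_pct[1] + placement_pct[1]) / 100
    return low, high


# Hypothetical room rate of 100 in any currency.
print(commission_range(100))  # (13.0, 30.0)
```

On those assumptions a property paying for placement hands over somewhere between 13 and 30 percent of the room rate.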

Why the schema never gets fixed

This is the part that matters for builders, because it is an architecture problem, not a UX oversight.

An OTA can only offer a filter the underlying data contains. Below Booking sits the Global Distribution System (GDS) layer: Sabre (operational across American Airlines by 1964), Amadeus (founded 1987 by four European airlines), and Travelport. These were built for airlines and travel agents decades before the consumer internet, and they set the data schema for inventory. Visa status is not in the GDS. Halal-friendliness is not in the GDS. Community ownership is not in the GDS.

Then commission economics locks it. To make 10 to 25 percent per booking work, you need volume, and volume needs standardisation. Adding a "community-owned" filter means a two-room guesthouse in Lagos or Jakarta has to enter custom structured data, and it will not do that without payment, and the margin model has no room to pay. The team in Amsterdam can see exactly what is missing. The business model does not let them fix it.

There are only two real exits. Hotels publish richer structured data, which needs a state mandate or platform leverage. Or the platform infers the missing dimensions from unstructured signals: reviews, photos, web mentions, third-party datasets. Booking can build the second one. They have not.
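The second exit can be sketched in a few lines. This is a deliberately naive illustration, not a description of anything Booking or any other platform runs: the keyword map, the attribute names, and the min_mentions threshold are all assumptions made up for the example.

```python
# Hypothetical keyword map; a real system would need far richer signals.
SIGNALS = {
    "halal_friendly": ("halal",),
    "prayer_space": ("prayer room", "qibla"),
    "women_friendly": ("women-only", "female dorm"),
}


def infer_attributes(reviews, min_mentions=2):
    """Return attributes mentioned in at least `min_mentions` distinct reviews.

    Plain substring matching: it cannot tell "halal" from "not halal",
    so this only shows the shape of the idea.
    """
    counts = {attr: 0 for attr in SIGNALS}
    for text in reviews:
        lowered = text.lower()
        for attr, phrases in SIGNALS.items():
            if any(phrase in lowered for phrase in phrases):
                counts[attr] += 1
    return {attr for attr, n in counts.items() if n >= min_mentions}


reviews = [
    "Great halal breakfast",
    "Halal options nearby and a prayer room",
    "Nice pool",
]
print(infer_attributes(reviews))  # {'halal_friendly'}
```

The point is where the field comes from: the property never filled in a form, yet the attribute exists once enough independent signals agree.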

What the agents inherit

On 6 October 2025, OpenAI announced its Apps SDK at DevDay, letting third-party services run inside ChatGPT. Booking.com was a launch partner. The same four fields, now reachable through a chat box.

The agent layer is the next top of the funnel, and here is the trap. The training data is the previous layer's product. The web that large language models read for travel is the Booking, Expedia, Airbnb, TripAdvisor, and Google Travel corpus, in English, with those platforms' defaults baked in. Ship an agent on that dataset and you ship the dataset's defaults. The schema reproduces itself through the training data.

I watched this happen in miniature. Booking added a "Smart filters" panel that takes natural language ("What are you looking for?"). Ask it for a halal-friendly hotel near a mosque and it returns the closest approximation it can express, which is nothing. The AI layer helps you navigate the 115 filters that exist. It cannot surface a filter that does not exist. The model on top is only as good as the schema underneath.
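A toy version of that limit, assuming the natural-language layer can only translate a request into filters that already exist. The filter set below is a small subset of the labels named earlier, and map_query_to_filters is a hypothetical helper, not how the "Smart filters" panel is actually implemented.

```python
# A small subset of filter labels named in this article; the audited page had 115.
EXISTING_FILTERS = {
    "pool", "parking", "wifi", "breakfast", "sauna", "pets allowed", "adults only",
}


def map_query_to_filters(wanted):
    """Split requested terms into those the schema can express and those it cannot."""
    normalised = [term.strip().lower() for term in wanted]
    expressible = [t for t in normalised if t in EXISTING_FILTERS]
    inexpressible = [t for t in normalised if t not in EXISTING_FILTERS]
    return expressible, inexpressible


print(map_query_to_filters(["Halal", "near a mosque", "Pool"]))
# (['pool'], ['halal', 'near a mosque'])
```

However good the language model in front is, the second list has nowhere to go.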

The break is also visible, which is the hopeful part. Agents can infer structured data from unstructured signals in a way an OTA's margin model never could. An agent can read a forum thread, a travel writer's caption, a comment about a women-friendly hostel, and synthesise an answer the filter set could never produce. Whether the next travel layer breaks the schema or repeats it comes down to who builds it, what they train it on, and which traveller they design for.

The builder takeaway

If you are building anything agentic on top of travel data, three things to sit with:

Audit your inputs as a schema, not a feature list. The absences encode who the product was built for. Count what is missing, not what is offered.
Do not let the training corpus pick your defaults for you. Inherited data ships inherited assumptions. The default sort, the currency list, the occupancy counter: someone chose those, and the choice is now in your weights unless you intervene.
The structural opportunity is inference. The whole reason OTAs cannot serve 600 million people is that the data model has no field for them. Agents can build that field from signals the OTA threw away. That is the actual product.

The traveller Booking.com was built for has been served for thirty years. The next default is being designed right now, in the training runs and the system prompts. The four fields were never neutral. Neither is whatever you build on top of them.

Method: findings come from a live walk-through of Booking.com on 27 May 2026, covering the homepage form, currency selector, and a Bangkok results page. Counts and label wording are taken directly from the platform as it rendered that day. Platforms iterate, so specific positions may have shifted. The structural argument does not depend on any single number. I write more on this at the link in my bio, and I run Sola, which is my attempt to build the field the schema is missing.
