I've been building a travel-planning agent and kept hitting the same wall. There's no official Google Flights API. QPX Express died in 2018, ITA Software is enterprise-only, and most of the "alternatives" I tried either returned stale data or wanted a five-figure contract before I could even run a test query.
For an agent that's supposed to actually book something cheap, none of that works. The agent needs one function call and structured JSON back, and it needs the call to succeed pretty much every time. So I built one.
By the end of this guide you'll have a Python module an agent can call as a tool. It scrapes Google Flights directly when it can (free, fast, one HTTP call) and falls back to SearchAPI.io's Google Flights endpoint when the scrape comes back empty. Same return shape either way, so the agent doesn't have to branch.
Code is in the companion repo.
Contents
- Why this is the right shape for an agent
- Setup
- Step 1: Encode the search as a `tfs` parameter
- Step 2: Fetch the page with the right cookies
- Step 3: Parse flights from aria-labels
- Where the scraper breaks (and why agents need a fallback)
- Wiring in SearchAPI as the fallback
- The dispatcher: one function for the agent to call
- Exposing it as a tool to your agent
- The complete `flights.py`
- FAQ
Why this is the right shape for an agent
Agents don't tolerate "sometimes it works." A tool that returns flight data on Tuesday and a blank page on Wednesday will poison the agent's reasoning. It'll either hallucinate around the gap or give up. So the goal here is one function the agent can call with the same inputs and get the same kind of output every time, no matter what's happening on Google's side.
That's why there are two layers. The free scrape path is a single httpx GET to Google Flights, no headless browser, no proxy stack. It works for a decent chunk of routes. When it doesn't (and we'll see why in a bit), the same function quietly falls back to SearchAPI and returns the same dataclass. The agent calls one function. It never has to know which path ran.
A few things about Google Flights make the scrape side weird:
- The search query lives in a base64-encoded protobuf. The URL looks like `google.com/travel/flights/search?tfs=GiQSCjIwMjYtMDYtMTU...`, where `tfs` is a hand-rolled binary message, not a normal querystring.
- Unconsented requests hit a cookie wall. Without the right consent cookies, Google bounces you to "Before you continue" and Flights never loads.
- CSS class names rotate per deploy. Anything anchored to `.gws-flights__result-row` will rot the next time Google ships a build. We parse `aria-label` attributes instead, since those are stable for accessibility reasons.
Setup
This guide assumes Python 3.10 or newer. Check yours with python3 --version.
mkdir flight-tool
cd flight-tool
python3 -m venv .venv
source .venv/bin/activate
pip install httpx selectolax
Windows: use `.venv\Scripts\activate`.
Create one file called flights.py. Everything goes in this one file — we add code to the bottom as we build.
Step 1: Encode the search as a tfs parameter
The tfs parameter is a base64url-encoded protobuf with this rough schema (reverse-engineered from public Google Flights URLs):
message Info {
repeated Leg legs = 3; // one per slice
int32 seat_class = 8; // 1=economy 2=premium 3=business 4=first
repeated int32 passengers = 9; // 1=adult 2=child 3=infant_seat 4=infant_lap
int32 trip_type = 19; // 1=round 2=oneway 3=multi
}
message Leg {
string date = 2; // YYYY-MM-DD
repeated Airport origin = 13;
repeated Airport destination = 14;
}
message Airport {
int32 loc_type = 1; // 1=airport (IATA), 2=city
string code = 2; // IATA code
}
You don't need a full protobuf library. Two primitives do all the work: varint for integers, length-prefixed bytes for strings and nested messages.
# flights.py — Step 1: protobuf encoding
import base64
from datetime import date, timedelta
def _varint(n: int) -> bytes:
out = bytearray()
while True:
b = n & 0x7F
n >>= 7
if n:
out.append(b | 0x80)
else:
out.append(b)
return bytes(out)
def _tag(field_num: int, wire_type: int) -> bytes:
return _varint((field_num << 3) | wire_type)
def _f_varint(field_num: int, value: int) -> bytes:
return _tag(field_num, 0) + _varint(value)
def _f_bytes(field_num: int, value: bytes) -> bytes:
return _tag(field_num, 2) + _varint(len(value)) + value
def _f_string(field_num: int, value: str) -> bytes:
return _f_bytes(field_num, value.encode("utf-8"))
def _encode_airport(code: str) -> bytes:
return _f_varint(1, 1) + _f_string(2, code)
def _encode_leg(origin: str, destination: str, date_str: str) -> bytes:
return (
_f_string(2, date_str)
+ _f_bytes(13, _encode_airport(origin))
+ _f_bytes(14, _encode_airport(destination))
)
def encode_trip(origin: str, destination: str, depart_date: str,
return_date: str | None = None) -> str:
buf = bytearray()
buf += _f_bytes(3, _encode_leg(origin, destination, depart_date))
if return_date:
buf += _f_bytes(3, _encode_leg(destination, origin, return_date))
buf += _f_varint(8, 1) # seat_class = economy
buf += _f_varint(9, 1) # passengers = [adult]
buf += _f_varint(19, 1 if return_date else 2)
return base64.urlsafe_b64encode(bytes(buf)).rstrip(b"=").decode("ascii")
if __name__ == "__main__":
depart = (date.today() + timedelta(days=30)).isoformat()
ret = (date.today() + timedelta(days=37)).isoformat()
tfs = encode_trip("SFO", "JFK", depart, ret)
print(f"https://www.google.com/travel/flights/search?tfs={tfs}")
Run it, copy the URL into a browser. If you see Google Flights load your SFO → JFK search, the encoding worked.
Step 2: Fetch the page with the right cookies
# flights.py — Step 2: fetching
import httpx
SEARCH_URL = "https://www.google.com/travel/flights/search"
HEADERS = {
"User-Agent": (
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/124.0.0.0 Safari/537.36"
),
"Accept-Language": "en-US,en;q=0.9",
}
# Static consent cookies — these bypass the "Before you continue" gate.
COOKIES = {
"CONSENT": "YES+cb",
"SOCS": "CAESHAgBEhJnd3NfMjAyNDA0MDgtMF9SQzIaAmVuIAEaBgiAvMqxBg",
}
def fetch_html(tfs: str, hl: str = "en", currency: str = "USD") -> str:
params = {"tfs": tfs, "hl": hl, "curr": currency, "gl": "US"}
with httpx.Client(timeout=20.0) as client:
resp = client.get(SEARCH_URL, params=params, headers=HEADERS, cookies=COOKIES)
resp.raise_for_status()
return resp.text
Those cookies aren't tied to an account. They're static values that say "this user has dismissed the consent dialog." Without them you get redirected and the page never loads flight data.
Update __main__ to test it:
if __name__ == "__main__":
depart = (date.today() + timedelta(days=30)).isoformat()
ret = (date.today() + timedelta(days=37)).isoformat()
tfs = encode_trip("SFO", "JFK", depart, ret)
html = fetch_html(tfs)
print(f"HTML length: {len(html):,}")
A few MB of HTML means Google server-rendered the page. Tiny response (~50KB) means the consent gate caught you — double-check the cookies.
Step 3: Parse flights from aria-labels
Google's flight rows have aria-labels that look like this:
From 625 US dollars round trip total.
1 stop flight with American.
Leaves San Francisco International Airport at 11:49 PM on Monday, June 15
and arrives at John F. Kennedy International Airport at 10:33 AM on Tuesday, June 16.
Total duration 7 hr 44 min.
Layover (1 of 1) is a 32 min layover at Charlotte Douglas International Airport in Charlotte.
... Select flight
Every field we want is in there, sentence-shaped. CSS classes rotate constantly; aria-labels don't, because they're an accessibility contract that screen readers depend on.
# flights.py — Step 3: parsing
import re
from dataclasses import dataclass
from selectolax.parser import HTMLParser
@dataclass
class Flight:
price: str
airline: str
stops: str
departure_airport: str
departure_time: str
arrival_airport: str
arrival_time: str
duration: str
PRICE_RE = re.compile(r"From (\d+(?:,\d+)*) (US dollars|euros|pounds)")
STOPS_RE = re.compile(r"(Nonstop|(\d+) stops?) flight with (.+?)\.")
DEPARTURE_RE = re.compile(r"Leaves (.+?) at (.+?) on")
ARRIVAL_RE = re.compile(r"arrives at (.+?) at (.+?) on")
DURATION_RE = re.compile(r"Total duration (.+?)\.")
def parse_flights(html: str) -> list[Flight]:
tree = HTMLParser(html)
flights = []
for node in tree.css("[aria-label*='Total duration']"):
label = node.attributes.get("aria-label", "")
price = PRICE_RE.search(label)
stops = STOPS_RE.search(label)
dep = DEPARTURE_RE.search(label)
arr = ARRIVAL_RE.search(label)
dur = DURATION_RE.search(label)
if not all((price, stops, dep, arr, dur)):
continue
flights.append(Flight(
price=f"${price.group(1)}",
airline=stops.group(3),
stops=stops.group(1),
departure_airport=dep.group(1),
departure_time=dep.group(2),
arrival_airport=arr.group(1),
arrival_time=arr.group(2),
duration=dur.group(1),
))
return flights
Replace __main__ to run the whole pipeline:
if __name__ == "__main__":
depart = (date.today() + timedelta(days=30)).isoformat()
ret = (date.today() + timedelta(days=37)).isoformat()
tfs = encode_trip("SFO", "JFK", depart, ret)
html = fetch_html(tfs)
for f in parse_flights(html)[:5]:
print(f"{f.price:>8} {f.airline:<12} {f.stops:<10} {f.duration}")
You should see something like this (prices and airlines will vary):
$407 American 1 stop 7 hr 44 min
$552 Frontier 1 stop 10 hr 41 min
$612 American Nonstop 5 hr 49 min
$797 Delta Nonstop 5 hr 32 min
$518 American 1 stop 7 hr 48 min
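You can also unit-test the regexes against the sample aria-label from earlier, no network required. This reproduces the five patterns and the example label verbatim:

```python
import re

# Same patterns as in parse_flights above.
PRICE_RE = re.compile(r"From (\d+(?:,\d+)*) (US dollars|euros|pounds)")
STOPS_RE = re.compile(r"(Nonstop|(\d+) stops?) flight with (.+?)\.")
DEPARTURE_RE = re.compile(r"Leaves (.+?) at (.+?) on")
ARRIVAL_RE = re.compile(r"arrives at (.+?) at (.+?) on")
DURATION_RE = re.compile(r"Total duration (.+?)\.")

# The sample aria-label from the section above, joined into one string.
label = (
    "From 625 US dollars round trip total. 1 stop flight with American. "
    "Leaves San Francisco International Airport at 11:49 PM on Monday, June 15 "
    "and arrives at John F. Kennedy International Airport at 10:33 AM on Tuesday, June 16. "
    "Total duration 7 hr 44 min."
)

assert PRICE_RE.search(label).group(1) == "625"
assert STOPS_RE.search(label).group(1) == "1 stop"
assert STOPS_RE.search(label).group(3) == "American"
assert DEPARTURE_RE.search(label).group(2) == "11:49 PM"
assert ARRIVAL_RE.search(label).group(1) == "John F. Kennedy International Airport"
assert DURATION_RE.search(label).group(1) == "7 hr 44 min"
```

Pinning the regexes to a known label like this means that when Google reshuffles the HTML, you can tell at a glance whether the label *format* changed or just the markup around it.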
Where the scraper breaks (and why agents need a fallback)
The scraper works. The catch is which queries Google actually answers with HTML.
Google server-renders Flights for some queries and silently doesn't for others, and it's not stable. Running the same route twice an hour apart can give you flights one time and an empty JS shell the next. Domestic vs international isn't the rule either. I've seen JFK → NRT and MIA → GRU return data, and LAX → LHR flip between full results and nothing on consecutive runs.
For a human running a script once, that's annoying. For an agent in a loop, it's a real problem. The agent can't reason about "the tool sometimes returns nothing for unclear reasons." It'll either retry forever or just invent results to fill the gap.
So the scrape stays as the cheap path, but for the routes Google won't serve we need a path that always returns something.
Wiring in SearchAPI as the fallback
SearchAPI's Google Flights endpoint does the rendering and parsing on its end and returns JSON. From your code, it's one HTTP call.
Sign up at searchapi.io, grab your API key, export it:
export SEARCHAPI_API_KEY="your_key_here"
Windows:
set SEARCHAPI_API_KEY=your_key_here
Add this to flights.py:
# flights.py — Step 4: SearchAPI fallback
import os
def search_via_searchapi(origin: str, destination: str, depart: str,
return_date: str | None = None, *, api_key: str) -> dict:
params = {
"engine": "google_flights",
"api_key": api_key,
"departure_id": origin,
"arrival_id": destination,
"outbound_date": depart,
"flight_type": "round_trip" if return_date else "one_way",
"travel_class": "economy",
"currency": "USD",
"gl": "us",
"hl": "en",
"adults": 1,
}
if return_date:
params["return_date"] = return_date
resp = httpx.get("https://www.searchapi.io/api/v1/search",
params=params, timeout=30.0)
resp.raise_for_status()
return resp.json()
def _format_minutes(mins: int) -> str:
h, m = divmod(int(mins or 0), 60)
if h and m:
return f"{h} hr {m} min"
return f"{h} hr" if h else f"{m} min"
def map_searchapi(entry: dict) -> Flight | None:
legs = entry.get("flights") or []
if not legs:
return None
first, last = legs[0], legs[-1]
return Flight(
price=f"${int(entry.get('price') or 0):,}",
airline=first.get("airline", ""),
stops="Nonstop" if len(legs) == 1
else f"{len(legs) - 1} stop{'s' if len(legs) > 2 else ''}",
departure_airport=first.get("departure_airport", {}).get("name", ""),
departure_time=first.get("departure_airport", {}).get("time", ""),
arrival_airport=last.get("arrival_airport", {}).get("name", ""),
arrival_time=last.get("arrival_airport", {}).get("time", ""),
duration=_format_minutes(entry.get("total_duration", 0)),
)
The thing to notice here is that map_searchapi returns the same Flight dataclass the scraper produces. Same shape, same fields. That's what lets the agent ignore which path ran.
The dispatcher: one function for the agent to call
This is the function the agent actually calls. Scrape first (free), fall back to SearchAPI when empty:
# flights.py — Step 5: the dispatcher
def search_with_fallback(origin: str, destination: str, depart: str,
return_date: str | None = None, *,
searchapi_key: str | None = None
) -> tuple[list[Flight], str]:
tfs = encode_trip(origin, destination, depart, return_date)
html = fetch_html(tfs)
flights = parse_flights(html)
if flights:
return flights, "scrape"
if not searchapi_key:
return [], "scrape"
data = search_via_searchapi(origin, destination, depart, return_date,
api_key=searchapi_key)
flights = [
f for f in (
map_searchapi(e)
for e in data.get("best_flights", []) + data.get("other_flights", [])
)
if f is not None
]
return flights, "searchapi"
The return value includes which path ran ("scrape" or "searchapi"), which is handy for logging and cost tracking. The agent itself doesn't need to look at it.
Exposing it as a tool to your agent
The function above is already shaped how an agent wants it (typed args, structured return), but most frameworks want a JSON schema for tool definitions. Here's the bridge for Claude tool use:
search_flights_tool = {
"name": "search_flights",
"description": "Search for live flight prices and itineraries. Returns a list of flights with price, airline, stops, departure/arrival airports and times, and total duration.",
"input_schema": {
"type": "object",
"properties": {
"origin": {"type": "string", "description": "IATA airport code, e.g. 'SFO'"},
"destination": {"type": "string", "description": "IATA airport code, e.g. 'JFK'"},
"depart_date": {"type": "string", "description": "YYYY-MM-DD"},
"return_date": {"type": "string", "description": "YYYY-MM-DD, omit for one-way"},
},
"required": ["origin", "destination", "depart_date"],
},
}
def run_search_flights(args: dict) -> list[dict]:
flights, _ = search_with_fallback(
args["origin"], args["destination"], args["depart_date"],
args.get("return_date"),
searchapi_key=os.environ.get("SEARCHAPI_API_KEY"),
)
return [f.__dict__ for f in flights]
The agent calls search_flights, the dispatcher decides whether to scrape or fall back, and the agent gets a clean list of dicts. From the model's side there's one tool that always works.
The same pattern slots into OpenAI function calling or LangChain tools without much change. If you're running the agent server-side and don't want the SearchAPI key floating around every agent process, I host search_with_fallback behind a small Flask endpoint on a VPS and give each agent an internal URL. One key, one place to rotate it.
The complete flights.py
# flights.py
import base64
import os
import re
from dataclasses import dataclass
from datetime import date, timedelta
import httpx
from selectolax.parser import HTMLParser
# --- Step 1: protobuf encoding ---
def _varint(n: int) -> bytes:
out = bytearray()
while True:
b = n & 0x7F
n >>= 7
if n:
out.append(b | 0x80)
else:
out.append(b)
return bytes(out)
def _tag(field_num: int, wire_type: int) -> bytes:
return _varint((field_num << 3) | wire_type)
def _f_varint(field_num: int, value: int) -> bytes:
return _tag(field_num, 0) + _varint(value)
def _f_bytes(field_num: int, value: bytes) -> bytes:
return _tag(field_num, 2) + _varint(len(value)) + value
def _f_string(field_num: int, value: str) -> bytes:
return _f_bytes(field_num, value.encode("utf-8"))
def _encode_airport(code: str) -> bytes:
return _f_varint(1, 1) + _f_string(2, code)
def _encode_leg(origin: str, destination: str, date_str: str) -> bytes:
return (
_f_string(2, date_str)
+ _f_bytes(13, _encode_airport(origin))
+ _f_bytes(14, _encode_airport(destination))
)
def encode_trip(origin: str, destination: str, depart_date: str,
return_date: str | None = None) -> str:
buf = bytearray()
buf += _f_bytes(3, _encode_leg(origin, destination, depart_date))
if return_date:
buf += _f_bytes(3, _encode_leg(destination, origin, return_date))
buf += _f_varint(8, 1)
buf += _f_varint(9, 1)
buf += _f_varint(19, 1 if return_date else 2)
return base64.urlsafe_b64encode(bytes(buf)).rstrip(b"=").decode("ascii")
# --- Step 2: fetching ---
SEARCH_URL = "https://www.google.com/travel/flights/search"
HEADERS = {
"User-Agent": (
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/124.0.0.0 Safari/537.36"
),
"Accept-Language": "en-US,en;q=0.9",
}
COOKIES = {
"CONSENT": "YES+cb",
"SOCS": "CAESHAgBEhJnd3NfMjAyNDA0MDgtMF9SQzIaAmVuIAEaBgiAvMqxBg",
}
def fetch_html(tfs: str, hl: str = "en", currency: str = "USD") -> str:
params = {"tfs": tfs, "hl": hl, "curr": currency, "gl": "US"}
with httpx.Client(timeout=20.0) as client:
resp = client.get(SEARCH_URL, params=params, headers=HEADERS, cookies=COOKIES)
resp.raise_for_status()
return resp.text
# --- Step 3: parsing ---
@dataclass
class Flight:
price: str
airline: str
stops: str
departure_airport: str
departure_time: str
arrival_airport: str
arrival_time: str
duration: str
PRICE_RE = re.compile(r"From (\d+(?:,\d+)*) (US dollars|euros|pounds)")
STOPS_RE = re.compile(r"(Nonstop|(\d+) stops?) flight with (.+?)\.")
DEPARTURE_RE = re.compile(r"Leaves (.+?) at (.+?) on")
ARRIVAL_RE = re.compile(r"arrives at (.+?) at (.+?) on")
DURATION_RE = re.compile(r"Total duration (.+?)\.")
def parse_flights(html: str) -> list[Flight]:
tree = HTMLParser(html)
flights = []
for node in tree.css("[aria-label*='Total duration']"):
label = node.attributes.get("aria-label", "")
price = PRICE_RE.search(label)
stops = STOPS_RE.search(label)
dep = DEPARTURE_RE.search(label)
arr = ARRIVAL_RE.search(label)
dur = DURATION_RE.search(label)
if not all((price, stops, dep, arr, dur)):
continue
flights.append(Flight(
price=f"${price.group(1)}",
airline=stops.group(3),
stops=stops.group(1),
departure_airport=dep.group(1),
departure_time=dep.group(2),
arrival_airport=arr.group(1),
arrival_time=arr.group(2),
duration=dur.group(1),
))
return flights
# --- Step 4: SearchAPI fallback ---
def search_via_searchapi(origin: str, destination: str, depart: str,
return_date: str | None = None, *, api_key: str) -> dict:
params = {
"engine": "google_flights",
"api_key": api_key,
"departure_id": origin,
"arrival_id": destination,
"outbound_date": depart,
"flight_type": "round_trip" if return_date else "one_way",
"travel_class": "economy",
"currency": "USD",
"gl": "us",
"hl": "en",
"adults": 1,
}
if return_date:
params["return_date"] = return_date
resp = httpx.get("https://www.searchapi.io/api/v1/search",
params=params, timeout=30.0)
resp.raise_for_status()
return resp.json()
def _format_minutes(mins: int) -> str:
h, m = divmod(int(mins or 0), 60)
if h and m:
return f"{h} hr {m} min"
return f"{h} hr" if h else f"{m} min"
def map_searchapi(entry: dict) -> Flight | None:
legs = entry.get("flights") or []
if not legs:
return None
first, last = legs[0], legs[-1]
return Flight(
price=f"${int(entry.get('price') or 0):,}",
airline=first.get("airline", ""),
stops="Nonstop" if len(legs) == 1
else f"{len(legs) - 1} stop{'s' if len(legs) > 2 else ''}",
departure_airport=first.get("departure_airport", {}).get("name", ""),
departure_time=first.get("departure_airport", {}).get("time", ""),
arrival_airport=last.get("arrival_airport", {}).get("name", ""),
arrival_time=last.get("arrival_airport", {}).get("time", ""),
duration=_format_minutes(entry.get("total_duration", 0)),
)
# --- Step 5: the dispatcher ---
def search_with_fallback(origin: str, destination: str, depart: str,
return_date: str | None = None, *,
searchapi_key: str | None = None
) -> tuple[list[Flight], str]:
tfs = encode_trip(origin, destination, depart, return_date)
html = fetch_html(tfs)
flights = parse_flights(html)
if flights:
return flights, "scrape"
if not searchapi_key:
return [], "scrape"
data = search_via_searchapi(origin, destination, depart, return_date,
api_key=searchapi_key)
flights = [
f for f in (
map_searchapi(e)
for e in data.get("best_flights", []) + data.get("other_flights", [])
)
if f is not None
]
return flights, "searchapi"
if __name__ == "__main__":
depart = (date.today() + timedelta(days=30)).isoformat()
ret = (date.today() + timedelta(days=37)).isoformat()
api_key = os.environ.get("SEARCHAPI_API_KEY")
flights, source = search_with_fallback(
"LAX", "LHR", depart, ret,
searchapi_key=api_key,
)
print(f"Got {len(flights)} flights from {source}\n")
for f in flights[:5]:
print(f"{f.price:>8} {f.airline:<12} {f.stops:<10} {f.duration}")
FAQ
Is there an official Google Flights API?
No. QPX Express was shut down in 2018 and ITA Software is gated behind enterprise agreements. Scraping and third-party APIs are the only practical options.
Is it legal to scrape Google Flights?
It violates Google's terms of service. Fine for personal projects and learning. For anything commercial or public-facing — including an agent product you ship to users — use the licensed third-party API path.
Do I need a headless browser?
No. The scrape is a single HTTP call plus regex; the SearchAPI path is a single HTTP call to a JSON endpoint. Agents stay fast.
Why does the scraper only work for some routes?
Google server-renders Flights HTML for some queries and not others, with no clean rule for which is which. When it doesn't, the response is a JS shell with no flight data in the HTML. No scraping workaround exists short of rendering the page yourself — which is exactly what the SearchAPI fallback does for you.
How fresh is the data?
Live in both paths. Both return prices at the moment of the request.
Can I get one-way or multi-city?
Both. On the scraper side, set the protobuf trip_type field (one-way = 2, multi-city = 3) and encode one Leg message per slice; on the SearchAPI side, use the flight_type parameter (one_way, multi_city).
What about non-English locales?
The scraper as written is English-only — the parser anchors on phrases like "flight with" and "Total duration". SearchAPI supports the full set of hl and gl locales.
Rate limits?
The scraper hits Google's bot detection if you push it (a handful of requests per minute from one IP is usually fine). SearchAPI limits depend on your plan — handle 429 with exponential backoff.
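A minimal backoff wrapper for the fallback call could look like this — a sketch, with arbitrary retry count and base delay, not a prescription:

```python
import random
import time

def with_backoff(fn, *, retries=4, base=1.0, should_retry=lambda exc: True):
    """Call fn(); on a retryable exception, sleep base * 2**attempt plus
    jitter and try again, up to `retries` extra attempts, then re-raise."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == retries or not should_retry(exc):
                raise
            time.sleep(base * (2 ** attempt) + random.uniform(0, base))

# Usage sketch: only retry HTTP 429 from the SearchAPI call.
# with_backoff(
#     lambda: search_via_searchapi("SFO", "JFK", "2026-06-15", api_key=key),
#     should_retry=lambda e: isinstance(e, httpx.HTTPStatusError)
#                            and e.response.status_code == 429,
# )
```

The jitter matters when several agents share one key: without it they all retry on the same schedule and hit the limit together again.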
Wrapping up
That's the whole thing. One function the agent calls, two paths underneath, same dataclass either way. Drop search_flights_tool into your tool list and the agent has live flight data.
A few things I'd add next if I were extending this:
- Pull carry-on, checked-bag, and emissions from the aria-labels (they're in there)
- Cache results in SQLite keyed on
(origin, destination, depart, return)for a few minutes, because agents tend to re-query the same routes - Move the dispatcher behind a Flask endpoint on a VPS so multiple agents can share one key
Code and a fuller CLI/web wrapper: github.com/SamJale/google-flights-api.
SearchAPI Google Flights docs.