Hamza

Posted on Jul 1 • Edited on Jul 4

Expedia Cars GraphQL Inspector: How I Turned Expedia's Network Traffic into Instant CSV Exports

#javascript #automation #productivity #userscript

Inside Expedia's GraphQL: Building a Tool That Captures 700+ Car Rentals in 30 Seconds. Behind every modern website is an API conversation. Once you learn how to listen to it, everything changes.

By Hamza

Article Revision 2.0

I got tired of clicking "Load More."

You know the drill: you search for rental cars, the page shows 25 results, and there are 30 more pages hiding behind that little button. Click. Wait. Scroll. Click. Wait. Scroll. Your cursor finger starts hurting. Your soul starts dying.

So I built something better.

Expedia GraphQL Inspector v2.5 is a browser tool that intercepts Expedia's internal API calls, collects every rental car across every page in seconds, and dumps it into a CSV you can open in Excel. It runs in your browser. No Python, no Docker, no headless Chrome, no PhD in web scraping.

This is the story of how it works, piece by piece.

1. Fetch Interception: Catching Data Before It Renders

Every time you visit a modern website, your browser plays middleman. It sends requests to servers, gets back JSON blobs, and hands them to JavaScript that paints pixels on screen.

The trick? Get in the middle before the pixels happen.

The script monkey-patches window.fetch (the browser's built-in HTTP request function) and wraps it with a spy. Every single network request passes through this wrapper. The script inspects each one, asks "is this an Expedia car rental GraphQL query?", and if yes, clones the response JSON before the page ever sees it.

const originalFetch = window.fetch;

window.fetch = async function (...args) {
  const request = args[0];
  const options = args[1] || {};

  // Skip non-GraphQL URLs immediately (input may be a string, a URL, or a Request)
  const url = typeof request === "string" ? request : (request.url ?? String(request));
  if (!url.includes("/graphql")) {
    return originalFetch.apply(this, args);
  }

  const response = await originalFetch.apply(this, args);

  // Only string bodies can be parsed as JSON; anything else is skipped
  let requestBody = null;
  if (typeof options.body === "string") {
    try { requestBody = JSON.parse(options.body); }
    catch { /* not JSON, skip */ }
  }

  // Filter by operation name, not response shape
  if (!requestBody || requestBody.operationName !== "CarSearchV3") {
    return response;
  }

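  // Never let a capture failure break the page's own request
  try {
    const json = await response.clone().json();
    captureData(json);
  } catch { /* unreadable response, skip */ }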
  return response;
};

That's the entire foundation. The original fetch still works normally; the page never even knows it's being watched. We just take a peek, copy what we need, and let the data flow through.

A critical detail: the script sets a __replaying flag during its own programmatic requests. This prevents the hook from re-capturing pages that the auto-loader fetched itself, which would otherwise corrupt the log with duplicates.
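One way to picture that guard, as a minimal sketch rather than the script's actual code: the names `hookState`, `shouldCapture`, and `withReplayFlag` are illustrative assumptions. All the technique needs is some flag that is raised while the script's own replays are in flight.

```javascript
// Illustrative state object; the real script's flag may live elsewhere.
const hookState = { replaying: false };

// Decide whether the fetch hook should record this request.
function shouldCapture(url, requestBody, state = hookState) {
  if (state.replaying) return false;            // our own replay: skip it
  if (!url.includes("/graphql")) return false;  // not a GraphQL call
  return requestBody?.operationName === "CarSearchV3";
}

// Run a replay with the flag raised, and always lower it afterwards.
async function withReplayFlag(task, state = hookState) {
  state.replaying = true;
  try {
    return await task();
  } finally {
    state.replaying = false;
  }
}
```

The `finally` block matters: if a replay throws, the flag still drops, so later user-initiated searches are captured again.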

Why this beats scraping HTML: HTML is messy. It's designed for human eyes, not data extraction. Prices show up as formatted strings ("$123.45"), layout changes break your CSS selectors, and missing fields cause silent errors. Raw API data is pristine: typed, structured, complete. Why reconstruct what the server already sent perfectly?

2. GraphQL: One Endpoint to Rule Them All

If you've only ever worked with REST APIs, GraphQL looks weird at first. Instead of ten different URLs (one for users, one for products, one for search results), GraphQL uses a single endpoint with a query language.

Expedia's car search runs on a single GraphQL endpoint. Every request (first page, next page, sort by price, filter by supplier) hits the same URL with a different query payload inside the POST body.

The script doesn't care about URLs. It cares about operation name. Every GraphQL request has one: a human-readable label like CarSearchV3. The script filters for exactly that:

if (requestBody.operationName === "CarSearchV3") {
  // This is the one we want
}

GraphQL responses have a predictable structure. The actual Expedia car search response looks like this (no edges, no nodes, no cursors):

{
  "data": {
    "carSearchOrRecommendations": {
      "carSearchResults": {
        "carsShoppingContext": {
          "searchId": "31c6ca70-7764-4452-a228-01f344ad6351"
        },
        "listings": [
          {
            "vehicle": {
              "category": "Economy",
              "description": "Chevrolet Spark or similar",
              "attributes": [
                { "icon": { "description": "transmission" }, "text": "Automatic" },
                { "icon": { "description": "mileage" }, "text": "Unlimited mileage" }
              ]
            },
            "vendor": { "image": { "description": "Budget" } },
            "priceSummary": {
              "lead": { "price": { "amount": 139.73, "currencyInfo": { "code": "USD" } } },
              "total": { "price": { "amount": 188.26, "currencyInfo": { "code": "USD" } } }
            },
            "review": { "rating": "46%" }
          }
        ],
        "loadMoreAction": {
          "searchPagination": {
            "startingIndex": 0,
            "hasNextPage": true,
            "size": 25
          }
        }
      }
    }
  }
}

One query, one shape, infinite cars. Beautiful.
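Given that shape, pulling out the three things the rest of the pipeline needs can be sketched as follows. The function name `extractPage` and the trimmed `sample` object are assumptions for illustration; the property paths come from the response shown above.

```javascript
// Pull the search id, the listings, and the pagination info out of one response.
// Returns null when the response does not have the expected shape.
function extractPage(json) {
  const results = json?.data?.carSearchOrRecommendations?.carSearchResults;
  if (!results || !Array.isArray(results.listings)) return null;
  return {
    searchId: results.carsShoppingContext?.searchId ?? null,
    listings: results.listings,
    pagination: results.loadMoreAction?.searchPagination ?? null,
  };
}

// Trimmed-down sample with the same structure as the response above.
const sample = {
  data: {
    carSearchOrRecommendations: {
      carSearchResults: {
        carsShoppingContext: { searchId: "31c6ca70-7764-4452-a228-01f344ad6351" },
        listings: [{ vehicle: { description: "Chevrolet Spark or similar" } }],
        loadMoreAction: {
          searchPagination: { startingIndex: 0, hasNextPage: true, size: 25 },
        },
      },
    },
  },
};

const page = extractPage(sample);
console.log(page.listings.length, page.pagination.hasNextPage); // prints: 1 true
```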

3. Pagination: The Offset Trap

Expedia doesn't send you 700 cars at once. That'd be too easy. It sends 25, plus an offset value that says "the next page starts here."

The page shows a "Show More" button. Click it, and the browser sends a new GraphQL query with an incremented page index in the request variables.

Manual pagination is torture. Auto-pagination is a solved problem:

Capture the first page's response
Read loadMoreAction.searchPagination.hasNextPage
Extract the current startingIndex and size
Replay the same GraphQL query with selPageIndex set to the next offset inside secondaryCriteria.selections
Repeat until hasNextPage is false

There's a subtle bug to watch out for: duplicates. Sometimes the same page offset appears twice (if the user refreshes or the session restores from localStorage). The script maintains a Set of searchId:startingIndex keys and skips any page it's already seen. The overhead is negligible: one Set.has() call per page, which is O(1).
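The offset arithmetic and the page-level dedupe can be sketched in a few lines. The helper names `nextOffset` and `markPageSeen` are assumptions for illustration, and the fallback page size of 25 mirrors the page size mentioned above.

```javascript
// Offset of the next page: where this page started plus how many rows it returned.
// Falls back to the advertised page size (or 25) when the count is unknown.
function nextOffset(pagination, listingCount) {
  const start = pagination?.startingIndex ?? 0;
  const count = listingCount || pagination?.size || 25;
  return start + count;
}

const seenPages = new Set();

// Returns true the first time a (searchId, startingIndex) pair is seen,
// false on every later sighting of the same page.
function markPageSeen(searchId, startingIndex, seen = seenPages) {
  const key = `${searchId}:${startingIndex}`;
  if (seen.has(key)) return false;
  seen.add(key);
  return true;
}
```

Note that this key identifies pages, not individual cars. A car that moves between pages during a run would need its own per-listing key.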

4. Replay Requests: Acting Like a Human, Moving Like a Machine

The first page is captured from the user's real search. But what about page 2, 3, 4 through 47?

The script replays requests programmatically. It takes the original GraphQL query payload, deep-clones it, increments the page offset inside secondaryCriteria.selections, and fires a fresh fetch call. Same endpoint, same headers (minus content-length and host), same authentication cookies, just a different page index.

async function replayLoadMore(prevEntry) {
  const body = structuredClone(prevEntry.request);

  const selections = body.variables.secondaryCriteria.selections;
  const nextIndex = prevEntry.loadMore.searchPagination.startingIndex
                  + prevEntry.listingCount;

  // Set the next page offset into the request variables
  setSelection(selections, "selPageIndex", nextIndex);
  setSelection(selections, "searchId", prevEntry.searchId);
  setSelection(selections, "selPageCount", 25);

  const response = await fetch(prevEntry.url, {
    method: "POST",
    credentials: "include",
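    // Simplified here: the full script replays the captured request headers
    // (minus content-length and host) instead of sending only content-type
    headers: { "content-type": "application/json" },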
    body: JSON.stringify(body)
  });

  return response.json();
}
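The snippet above relies on a `setSelection` helper and, in the full script, on a filtered copy of the captured headers. Here is a hedged sketch of both. It assumes each entry in `selections` is an `{ id, value }` pair with string values, which is a guess about the payload format and not something confirmed here; `replayHeaders` is likewise an illustrative name.

```javascript
// ASSUMPTION: selections is an array of { id, value } pairs with string values.
// Updates the entry with the given id, or appends one if it is missing.
function setSelection(selections, id, value) {
  const existing = selections.find((sel) => sel.id === id);
  if (existing) {
    existing.value = String(value);
  } else {
    selections.push({ id, value: String(value) });
  }
  return selections;
}

// Copy captured headers, dropping the ones the browser computes itself.
function replayHeaders(captured) {
  const skip = new Set(["content-length", "host"]);
  const headers = {};
  for (const [name, value] of Object.entries(captured ?? {})) {
    if (!skip.has(name.toLowerCase())) headers[name] = value;
  }
  return headers;
}
```

Header names are compared in lowercase because HTTP header names are case-insensitive, so `Content-Length` and `content-length` are both dropped.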

The magic is that the user's session is intact. No login tokens to manage. No cookies to rotate. The browser handles authentication automatically. The script just borrows the connection.

But there's a catch: you can't fire 50 requests instantly. Servers notice. So the script has three speed presets with random jitter to avoid pattern detection:

Speed	Base Delay	Jitter Range	Effective Range
Slow	4s	0–3s	4–7s
Medium	1.5s	0–1.5s	1.5–3s
Fast	0.6s	0–0.8s	0.6–1.4s
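The table translates almost directly into code. This is a minimal sketch using the numbers above; `SPEED_PRESETS`, `pickDelayMs`, and `sleep` are illustrative names, and falling back to the medium preset for unknown values is an assumption.

```javascript
// Base delay and jitter range in milliseconds, taken from the table above.
const SPEED_PRESETS = {
  slow:   { baseMs: 4000, jitterMs: 3000 },
  medium: { baseMs: 1500, jitterMs: 1500 },
  fast:   { baseMs: 600,  jitterMs: 800 },
};

// Pick a delay for one page. `random` is injectable so the function can be
// tested deterministically; it defaults to Math.random.
function pickDelayMs(speed, random = Math.random) {
  const preset = SPEED_PRESETS[speed] ?? SPEED_PRESETS.medium;
  return Math.round(preset.baseMs + random() * preset.jitterMs);
}

// Promise-based pause for use between replayed requests.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
```

Between two replays the loop would then do something like `await sleep(pickDelayMs("fast"))`.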

Smart Retry: If a request fails, the script retries with exponential backoff (base delay × 2^attempt). For 403 responses it backs off more aggressively (base delay × 3^attempt). When the server sends a Retry-After header, the script waits for that interval instead of guessing. The timeout is capped at 30 seconds per request.
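As a rough sketch of that retry policy, under stated assumptions: the 800 ms base delay, the decision to apply the 30-second cap to the computed backoff, and the names `parseRetryAfterMs` and `retryDelayMs` are all illustrative, not taken from the script.

```javascript
// ASSUMPTION: the 30-second cap is applied to the computed backoff delay.
const MAX_BACKOFF_MS = 30_000;

// Retry-After may be a number of seconds or an HTTP date.
// Returns a wait in milliseconds, or null when the header is absent or unusable.
function parseRetryAfterMs(headerValue, now = Date.now()) {
  if (headerValue == null || headerValue === "") return null;
  const seconds = Number(headerValue);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  const date = Date.parse(headerValue);
  return Number.isNaN(date) ? null : Math.max(0, date - now);
}

// Delay before retry number `attempt` (0-based).
function retryDelayMs(attempt, status, retryAfter, baseMs = 800) {
  const hinted = parseRetryAfterMs(retryAfter);
  if (hinted !== null) return hinted;        // the server told us how long to wait
  const factor = status === 403 ? 3 : 2;     // steeper curve for 403 responses
  return Math.min(baseMs * factor ** attempt, MAX_BACKOFF_MS);
}
```

In a real loop some jitter would normally be added on top of the returned delay so that retries do not land on a fixed beat.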

5. CSV Generation: From Nested JSON to Spreadsheet in One Click

GraphQL returns deeply nested JSON. A single car listing looks like this in the actual Expedia response:

{
  "vehicle": {
    "category": "Compact",
    "description": "Kia Soul or similar",
    "attributes": [
      { "icon": { "id": "transmission" }, "text": "Automatic" },
      { "icon": { "id": "speed" }, "text": "Unlimited mileage" }
    ]
  },
  "vendor": { "image": { "description": "Budget" } },
  "priceSummary": {
    "lead": { "price": { "amount": 139.73, "currencyInfo": { "code": "USD" } } },
    "total": { "price": { "amount": 188.26, "currencyInfo": { "code": "USD" } } }
  },
  "review": { "rating": "46%" },
  "actionableConfidenceMessages": [
    { "value": "Free cancellation" },
    { "value": "Pay at pick-up" }
  ],
  "tripsSaveItemWrapper": {
    "tripsSaveItem": {
      "attributes": {
        "categoryCode": "C",
        "typeCode": "C",
        "transmissionDriveCode": "A",
        "fuelAcCode": "R",
        "searchCriteria": {
          "pickUpDateTime": { "day": 11, "month": 7, "year": 2026, "hour": 10, "minute": 30 },
          "dropOffDateTime": { "day": 12, "month": 7, "year": 2026, "hour": 10, "minute": 30 },
          "pickUpLocation": { "airportCode": "LGA" }
        }
      }
    }
  }
}

Your boss doesn't want that. Your boss wants columns: Supplier, Vehicle, Total Price, Daily Price, Pickup Date, Dropoff Date.

The script extracts values by walking known paths directly, no recursive field map needed:

car.vendor?.image?.description;                     // Supplier
car.vehicle?.description;                            // Vehicle example
car.priceSummary?.total?.price?.amount;              // Total price
car.priceSummary?.lead?.price?.amount;               // Daily price
car.tripsSaveItemWrapper?.tripsSaveItem?.attributes
  ?.searchCriteria?.pickUpDateTime;                  // Pickup date/time
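Those lookups can be gathered into a single flattening step. This is a sketch only: `listingToRow` and `formatDate` are illustrative names, the YYYY-MM-DD date format is an assumption, and missing values become empty strings so that a gap shows up as a blank cell rather than the text "undefined".

```javascript
// Format a { day, month, year } object as YYYY-MM-DD; "" when the date is missing.
function formatDate(dt) {
  if (!dt) return "";
  const pad = (n) => String(n).padStart(2, "0");
  return `${dt.year}-${pad(dt.month)}-${pad(dt.day)}`;
}

// Flatten one nested listing into a flat row object.
function listingToRow(car) {
  const criteria =
    car.tripsSaveItemWrapper?.tripsSaveItem?.attributes?.searchCriteria;
  return {
    supplier: car.vendor?.image?.description ?? "",
    vehicle: car.vehicle?.description ?? "",
    currency: car.priceSummary?.total?.price?.currencyInfo?.code ?? "",
    totalPrice: car.priceSummary?.total?.price?.amount ?? "",
    dailyPrice: car.priceSummary?.lead?.price?.amount ?? "",
    pickupDate: formatDate(criteria?.pickUpDateTime),
    returnDate: formatDate(criteria?.dropOffDateTime),
  };
}
```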

Then it serializes to CSV using the standard RFC 4180 format; it handles commas inside values, wraps strings in quotes, and escapes double quotes. No library needed, just careful string building.
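The escaping rules fit in a few lines. This is a sketch of one common RFC 4180 approach, with `csvField` and `toCsv` as illustrative names. Unlike the description above, it quotes a field only when it contains a comma, a double quote, or a line break; quoting every string is also valid under the RFC.

```javascript
// Quote a field when it contains a comma, a double quote, or a line break,
// doubling any embedded double quotes as RFC 4180 requires.
function csvField(value) {
  const text = value == null ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build the CSV text: one header line, then one line per row object.
// RFC 4180 uses CRLF as the line separator.
function toCsv(rows, columns) {
  const lines = [columns.map(csvField).join(",")];
  for (const row of rows) {
    lines.push(columns.map((col) => csvField(row[col])).join(","));
  }
  return lines.join("\r\n");
}
```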

The download happens via a classic browser trick:

const blob = new Blob([csvContent], { type: "text/csv" });
const url = URL.createObjectURL(blob);
const a = document.createElement("a");
a.href = url;
a.download = "expedia_results.csv";
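a.click();
// Revoke on the next tick so the download has started before the URL goes away
setTimeout(() => URL.revokeObjectURL(url), 0);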

No server involved. The CSV is generated entirely in the browser and downloaded instantly. Private data never touches a third party.

The exported CSV includes 19 columns: Provider, Location Code, Pickup Date, Pickup Time, Return Date, Duration Days, Supplier, Supplier Raw, ACRISS code, Vehicle Example, Currency, Total Price, Price Daily, Rating, Transmission, Mileage, Free Cancellation, Pay At Pickup, and Prepaid.

Putting It All Together: The 30-Second Workflow

Search for cars on Expedia like you normally would, then wait for the results page to fully load
The script starts capturing automatically: you see a counter tick up
Click ↻ Auto (the gold button), and the progress bar appears
Watch it fly through 30+ pages in about 30 seconds on Fast speed (longer on Medium/Slow)
A CSV downloads automatically: 700+ cars, structured, ready

That's the difference between tapping the source directly vs. scraping rendered HTML. 30 seconds vs. 15 minutes. No Python setup, no Playwright config, no CAPTCHA headaches.

Note: The auto-loader requires at least one page of search results to be captured first. If you haven't searched yet, or you have clicked Clear, the Auto button will prompt you to perform a search. The times above are for the Fast speed setting; the default Medium speed takes roughly 45–90 seconds for 30 pages because of its longer jittered delays.

What I Learned Building This

Fetch interception is absurdly powerful. A dozen lines of monkey-patching gives you visibility into everything a website does. It's like having X-ray vision for network requests.

GraphQL makes scraping easier, not harder. One query shape, transparent variable passing, and deterministic responses. If you understand the one query, you understand the whole API.

Offset pagination is simpler than cursors for replay. No base64 tokens to manage, just increment an integer and set it in the request body's selections array.

Replay with real sessions is king, but has constraints. No proxies, no auth tokens, no session management. The user's browser handles all of that. You just borrow the connection. This also means you're locked into being single-user and real-time dependent: close the tab and it stops. That's by design (it's a userscript), but worth knowing if you're considering production use.

Schema versioning is the real maintenance headache. The CSV export traverses specific nested paths like priceSummary.total.price.amount. If Expedia restructures those fields, the export breaks silently. The fetch interception works as long as the CarSearchV3 operation name stays the same, but the extraction logic is coupled to the schema. Duck-typing (matching on shape rather than name) would make this more resilient, something to explore in a future version.
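A hedged sketch of what that could look like: one function recognizes a car-search response by its shape, and another reports which required paths are missing from a listing, so a schema change becomes a visible failure instead of a blank column. The names `getPath`, `looksLikeCarSearch`, `missingFields`, and the choice of `REQUIRED_PATHS` are assumptions; the paths themselves come from the response structure shown earlier.

```javascript
// Walk a dotted path such as "priceSummary.total.price.amount".
function getPath(obj, path) {
  return path.split(".").reduce((node, key) => node?.[key], obj);
}

// Fields the export cannot do without (an illustrative choice).
const REQUIRED_PATHS = [
  "vendor.image.description",
  "vehicle.description",
  "priceSummary.total.price.amount",
];

// Shape check: a listings array whose entries carry price and vendor data.
// This does not look at the operation name at all.
function looksLikeCarSearch(json) {
  const listings =
    json?.data?.carSearchOrRecommendations?.carSearchResults?.listings;
  return (
    Array.isArray(listings) &&
    listings.some((car) => Boolean(car?.priceSummary && car?.vendor))
  );
}

// Return the required paths that are missing from one listing.
function missingFields(car, paths = REQUIRED_PATHS) {
  return paths.filter((path) => getPath(car, path) == null);
}
```

An export step could call `missingFields` on each listing and stop, or warn loudly, when the returned array is not empty.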

CSV is still the universal format. JSON is for machines. CSV is for humans with Excel. Don't overthink it, just flatten, serialize, and download.

Expedia GraphQL Inspector v2.5 exists because your time is worth more than clicking "Load More" 47 times.

Known Limitations

Before using this in production or adapting it for other sites, be aware of these constraints:

Content Security Policy (CSP): If a site uses a strict CSP that forbids inline scripts or unsafe-eval, the userscript may be unable to inject its hook into the page, in which case the fetch monkey-patch never runs. Expedia doesn't do this, but sites like Airbnb do.
JWT/Session Expiry: The script borrows your browser session. If your authentication token expires mid-extraction (like a 30-minute JWT), subsequent replay requests will return 401s and the extraction stops.
Persisted Queries: Some sites (Airbnb, GitHub) use persisted queries, where the request carries an opaque hash in place of the query text and may not include a usable operation name. The operationName filter won't work there. You would need to match on URL patterns or the query hash instead.
Single-User, Real-Time: This is a userscript running in your browser. It's not a server-side daemon. Closing the tab ends the session.
Expedia Car Rentals Only: Only captures the CarSearchV3 query. Hotels, flights, and packages are not supported.
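For the persisted-query limitation, a matcher that accepts either an operation name or a query hash might look like the sketch below. It assumes the hash sits at `extensions.persistedQuery.sha256Hash` in the request body, which is a common convention but not something verified for any particular site. `matchesTarget` and the `target` object are illustrative names.

```javascript
// ASSUMPTION: persisted queries put their hash at
// extensions.persistedQuery.sha256Hash in the request body.
function matchesTarget(requestBody, target) {
  if (!requestBody) return false;
  // Prefer the human-readable operation name when the request has one.
  if (target.operationNames?.includes(requestBody.operationName)) return true;
  // Otherwise fall back to the persisted-query hash.
  const hash = requestBody.extensions?.persistedQuery?.sha256Hash;
  return Boolean(hash && target.hashes?.includes(hash));
}

// Example target: match by name, or by a known hash (placeholder value).
const carSearchTarget = {
  operationNames: ["CarSearchV3"],
  hashes: ["<known-sha256-hash>"],
};
```

The hash list has to be maintained by hand, since a new hash appears whenever the site changes the underlying query.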

Corrections by Users

Retry-After Support Added

After @nazar_boyko (from Dev.to) pointed out that a fixed retry can actually make a 429 worse, I added Retry-After support to v2.3. The script now reads the server's Retry-After header when it gets hit with rate limiting, and backs off to whatever Expedia actually asks for instead of guessing. If no Retry-After is present it still uses the exponential backoff as before. There was also a quiet bug in fetchWithRetry where non-200 responses would loop back immediately with zero delay, and that's fixed too.

Several Corrections In The Article Added

Thanks to @OnlineProxy for the thoughtful code review. His feedback led to the Known Limitations section above, the schema versioning warning in What I Learned, and several corrections throughout the article. If you are building extractors for other travel sites, his advice on CSP, JWT expiry, and persisted queries is worth listening to.

Source Code

The full source code is available on GitHub. Free, open-source, MIT licensed. Built by me (Hamza), powered by curiosity and an allergy to repetitive clicking.
Youtube link

Top comments (11)

Wren Calloway • Jul 3

The replay step has a fragility you're glossing over: you reconstruct the payload from baseVariables plus a swapped cursor, but Expedia's real client almost certainly sends more than an operation name and query string. GraphQL APIs behind CDNs frequently gate on persisted-query hashes, client-version headers, and signed tokens injected by their own bundle. The moment they rotate the query, add an extensions.persistedQuery.sha256Hash, or start validating a header your hardcoded CAR_SEARCH_QUERY doesn't carry, every replayed page 2+ silently returns an error or an empty edges array — while your captured page 1 still works, so it looks like it succeeded. That's the failure mode worth designing around, not the 429.

Concretely: instead of rebuilding the payload from a stored query constant, capture the actual request options from the intercepted first-page fetch (args[1] in your monkey-patch — body, headers, credentials) and reuse them verbatim, only mutating the cursor inside the parsed variables. Replay what the page really sent, not what you think it sent. It also sidesteps the whole "same authentication cookies" hand-wave, because you're not reasoning about which headers matter — you're copying all of them.

Hamza • Jul 3 • Edited

Thanks for the thoughtful write-up @wrencalloway.
Your general concern is valid: any script that reconstructs GraphQL payloads from hardcoded constants is fragile. But in this case, the actual script already follows the exact approach you are recommending.

The article you read is a simplified conceptual walkthrough, not the production code. The real Expedia Data Extractor.js does not use CAR_SEARCH_QUERY or baseVariables. Those strings do not exist anywhere in the script.

Here is what actually happens:

The fetch hook captures the entire original request (body, variables, query string, and every single header) straight from the intercepted fetch() call:

request: structuredClone(requestBody),             // full request body verbatim
requestHeaders: captureHeaders(request, options),  // all headers, any format
variables: structuredClone(requestBody.variables), // raw variables
query: requestBody.query,                          // the full GraphQL query

Then replayLoadMore deep-clones that captured request and reuses it verbatim. It only touches three pagination variables inside the already-existing structure:

const body = structuredClone(entry.request);     // clone the real captured request
const headers = { ...entry.requestHeaders };       // copy all original headers
// only modify pagination values:
setSelection(selections, "selPageIndex", nextIndex);
setSelection(selections, "searchId", entry.searchId);
setSelection(selections, "selPageCount", pageSize);

Nothing is reconstructed. Nothing is hardcoded. The script replays exactly what the page sent, which means:

If Expedia sends a persistedQuery.sha256Hash in the body, it is captured and replayed
If Expedia sends custom auth tokens or client-version headers, they are captured and replayed
The only things stripped are content-length and host, which the browser would reject or miscalculate on replay anyway

So your recommended approach of "capture the actual request options and reuse them verbatim, only mutating the pagination values" is exactly what the code does. The article just did not reflect that because it was written as a simplified explanation of the concept, not the real implementation.

By the way, full source code is available on GitHub. Free, open-source, MIT licensed.

I appreciate you taking the time to write it up though, it is always good to have extra eyes on the architecture.

Nazar Boyko • Jul 1

One thing that might save you a ban when the fast preset runs into a 429: a flat two second retry can actually dig you deeper, since the server is already asking you to slow down and you knock again at the same beat. When a Retry-After header is present, backing off to whatever it says instead of a fixed wait tends to keep the session alive a lot longer. Everything else here is solid, this is just the one edge that tends to bite after you've pointed the tool at a busy endpoint for a while.

Hamza • Jul 2

Good point! The script actually uses exponential backoff (not flat), so on the fast preset it's 800ms, 1600ms, 3200ms with jitter. But you're absolutely right that we never check Retry-After headers. A quick fix would be to read that header in fetchWithRetry when we get a 429 and back off to whatever it says. I'll add it. Thanks for the heads up.

Vinicius Pereira • Jul 1

Nice writeup, and keying on operationName instead of the URL is the right instinct, that alone survives a lot of what breaks naive scrapers. Since you asked for the sharp questions, here's where this bites in practice, from someone who reaches for the internal GraphQL over HTML every time.

The thing that kills these isn't the scraping logic, it's drift, and it comes in two flavors. One, the operation itself moves, CarSearchV3 becomes V4 or gets swapped for a persisted-query hash, and your interceptor just silently stops matching. Two, and this is the quieter one, the fieldMap. The moment Expedia renames pricing.total or nests it one level deeper, your recursive walker doesn't throw, it just writes a blank column, and now you've got a clean-looking CSV of 700 cars with an empty price field and no idea anything went wrong. That silent blank is worse than a crash, because a crash you actually notice. If I were hardening this I'd have it assert the shape it expects and shout when a mapped path comes back missing, instead of trusting the fieldMap forever.

The other one is the session-borrowing, which is genuinely the elegant part, no tokens to rotate is a real win. But it has a ceiling on a target as defended as Expedia. Your replayed fetch sends Content-Type and nothing else, while the real UI request carries a pile of client-info, traceparent, persisted-query and similar headers. On a heavily fingerprinted endpoint that difference is exactly what flags a replay, so the pattern that works great today can start getting soft-blocked or served degraded results with no obvious error. Cheapest insurance is to clone the original request's full header set when you replay, not just swap the cursor in the body.

None of that takes away from it, for a browser-side tool on your own session it's a clean approach. Just the two spots I'd watch for when it eventually stops working. Good first post.

Hamza • Jul 1

Thanks! I really appreciate this level of feedback. These are exactly the kinds of edge cases that make browser automation so interesting.

You’re absolutely right that long-term maintenance is usually less about the initial extraction logic and more about API drift over time. A few of the points you mentioned are things I’ve already started addressing in version 2.2:

Replay headers: the replay no longer sends only Content-Type. It now clones the original request headers as well, while excluding things like Content-Length and Host, which the browser regenerates. That makes the replay much closer to the browser’s original network behavior and noticeably more reliable.
Session borrowing: I intentionally rely on the user’s authenticated browser session instead of trying to recreate cookies or tokens. That has been one of the biggest wins in terms of simplicity and avoiding authentication headaches.
That said, I completely agree with your two bigger concerns.

Operation drift is inevitable. Right now I match on operationName because it is much more stable than URL matching, but if Expedia eventually moves to persisted queries or renames the operation entirely, the interceptor will need a smarter discovery mechanism rather than a hardcoded name.

Schema drift is probably the more dangerous failure mode. Silent blanks are much worse than exceptions because they produce exports that look believable but are wrong. I’m planning to move toward schema validation with required-field assertions so missing critical fields like pricing or supplier become visible failures instead of quietly generating incomplete CSV files.

In the longer term, I’d also like to make the extractor less Expedia-specific by separating:

network capture
replay engine
schema mapping
exporters

That way, adapting to future API changes, or even to other GraphQL-backed sites, becomes mostly a mapping problem rather than a full rewrite of the pipeline.

I really appreciate you taking the time to write such a thoughtful review. Those are exactly the kinds of comments that help move a project from “works today” to “still works next year.”

Vinicius Pereira • Jul 1

Love where you're taking it, and the capture/replay/mapping/exporter split is exactly the seam that makes this survive. One idea for the operation-drift problem you flagged: instead of a smarter way to find the operation by name, discover it by shape. You already know the response you consume looks like data then edges then nodes carrying vehicleId, supplier, pricing, so match on "the response that contains that shape" rather than on the label CarSearchV3. A rename to V4 or a persisted-query hash doesn't touch the shape, so the interceptor keeps working through the exact drift that would break a name match. And it pairs nicely with the required-field assertions you mentioned, since the shape you match on and the shape you validate against are really the same declared contract, one just used to find the response and the other to trust it. Genuinely nice work, following where this goes.

Hamza • Jul 1

Thanks, that shape-matching idea is genuinely clever and I hadn't thought of it. I just checked the actual response structure in the code and it turns out Expedia doesn't use edges/nodes/vehicleId at all. The real shape is a listings[] array where each listing has things like car.vendor, car.priceSummary.total.price.amount, car.vehicle.description, and car.tripsSaveItemWrapper.tripsSaveItem.attributes for the pickup/return details. So matching on the shape by checking for the combo of priceSummary + vendor + listings would survive an operation rename to V4 or even a persisted query hash, no changes needed. And like you said, those same fields are exactly what I'd validate against anyway, so the finder and the validator would share the same contract by default. I'm going to experiment with that approach, appreciate the insight.

Hamza • Jul 1

Thanks for Reading!
If you have questions about GraphQL interception, fetch hooking, pagination, or browser automation, leave a comment. I'll be happy to help.

More developer tools are coming soon.

Frank • Jul 1

How did you handle pagination with the GraphQL API to capture 700+ car rentals in 30 seconds? I'm facing a similar issue and would love to swap ideas on this.

Hamza • Jul 1

Hi there @frank_signorini, great question! Pagination was the trickiest part. The short answer: index-based pagination where I calculate the next page offset and inject it back into the same GraphQL query.

Expedia's CarSearchV3 doesn't use cursors. The response includes a loadMoreAction.searchPagination object with startingIndex, size, and hasNextPage. Each page returns listings at a given starting index, and the next page starts at currentIndex + listingCount.

Here's the stripped-down version of how it actually works:

async function replayLoadMore(previousEntry) {
  const body = structuredClone(previousEntry.request);
  const vars = body.variables ??= {};
  const secondary = vars.secondaryCriteria ??= {};
  const selections = secondary.selections ??= [];

  const pagination = previousEntry.loadMore?.searchPagination;
  const pageSize = pagination?.size ?? 25;
  const currentIndex = pagination?.startingIndex ?? 0;
  const currentCount = previousEntry.listingCount || pageSize;
  const nextIndex = currentIndex + currentCount;

  // Set the next page index
  setSelection(selections, "selPageIndex", nextIndex);
  setSelection(selections, "selPageCount", pageSize);
  setSelection(selections, "searchId", previousEntry.searchId);

  const response = await fetch(previousEntry.url, {
    method: "POST",
    credentials: "include",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body)
  });

  const json = await response.json();

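  return {
    // carry forward what the next replay needs
    request: body,
    url: previousEntry.url,
    searchId: previousEntry.searchId,
    listingCount: json.data.carSearchOrRecommendations.carSearchResults.listings.length,
    loadMore: json.data.carSearchOrRecommendations.carSearchResults.loadMoreAction,
    // store the full response too
  };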
}

Then the loop:

while (page.loadMore?.searchPagination?.hasNextPage === true) {
  const next = await replayLoadMore(page);
  // store results, update page tracker
  page = next;

  // delay so you don't overwhelm the API
  await new Promise(r => setTimeout(r, 1500));
}

A couple gotchas I ran into:

Deduplication is necessary. The same car can appear on different pages if results shift. I use a composite key of searchId + startingIndex to skip duplicates.
Speed matters. I landed on 1.5s between pages — fast enough to be useful, slow enough to avoid rate limits. Three presets (4s, 1.5s, 0.6s) let users tune it based on their connection.
Retry on failures. Random 500s happen every ~50 pages. I wrap the fetch with up to 3 retries and exponential backoff.

The key insight: you're replaying the exact same request the browser would send, just with a different page index. Your real session cookies handle auth — no tokens, no proxy, no CAPTCHA.

Happy to dive deeper! What API are you working with?
