DEV Community

Expedia Cars GraphQL Inspector: How I Turned Expedia's Network Traffic into Instant CSV Exports

Inside Expedia's GraphQL: Building a Tool That Captures 700+ Car Rentals in 30 Seconds. Behind every modern website is an API conversation. Once you learn how to listen to it, everything changes.

By Hamza


I got tired of clicking "Load More."

You know the drill: you search for rental cars, the page shows 20 results, and there are 30 more pages hiding behind that little button. Click. Wait. Scroll. Click. Wait. Scroll. Your cursor finger starts hurting. Your soul starts dying.

So I built something better.

Expedia GraphQL Inspector v2.2 is a browser tool that intercepts Expedia's internal API calls, collects every rental car across every page in seconds, and dumps it into a CSV you can open in Excel. It runs in your browser — no Python, no Docker, no headless Chrome, no PhD in web scraping.

This is the story of how it works, piece by piece.


1. Fetch Interception — Catching Data Before It Renders

Every time you visit a modern website, your browser plays middleman. It sends requests to servers, gets back JSON blobs, and hands them to JavaScript that paints pixels on screen.

The trick? Get in the middle before the pixels happen.

The script monkey-patches window.fetch — the browser's built-in HTTP request function — and wraps it with a spy. Every single network request passes through this wrapper. The script inspects each one, asks "is this an Expedia car rental GraphQL query?", and if yes, clones the response JSON before the page ever sees it.

const originalFetch = window.fetch;
window.fetch = async function (...args) {
  const response = await originalFetch.apply(this, args);
  // Inspect a clone so the page can still read the original body.
  response.clone().json()
    .then((json) => {
      // Is this a car rental response? If so, save it.
      if (json?.data?.carSearchV3) {
        captureData(json);
      }
    })
    .catch(() => {
      // Not JSON (images, HTML, and so on): ignore it instead of breaking the page.
    });
  return response;
};

That's the entire foundation in about a dozen lines. The original fetch still works normally, so the page never knows it's being watched. We just take a peek, copy what we need, and let the data flow through.

Why this beats scraping HTML: HTML is messy. It's designed for human eyes, not data extraction. Prices show up as formatted strings ("$123.45"), layout changes break your CSS selectors, and missing fields cause silent errors. Raw API data is pristine — typed, structured, complete. Why reconstruct what the server already sent perfectly?


2. GraphQL — One Endpoint to Rule Them All

If you've only ever worked with REST APIs, GraphQL looks weird at first. Instead of ten different URLs (one for users, one for products, one for search results), GraphQL uses a single endpoint with a query language.

Expedia's car search runs on a single GraphQL endpoint. Every request — first page, next page, sort by price, filter by supplier — hits the same URL with a different query payload inside the POST body.

The script doesn't care about URLs. It cares about the operation name. GraphQL requests usually carry one (the field is optional in general, but Expedia's client sends it): a human-readable label like CarSearchV3. The script filters for exactly that:

if (body && body.operationName === "CarSearchV3") {
  // This is the one we want
}
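
The `body` in that check has to come from somewhere. Here is a minimal sketch of one way to get it, assuming the page passes the payload as a JSON string in the second argument to fetch. The helper name `getOperationName` is hypothetical and not taken from the tool's source; requests built from a `Request` object or sent with a non-string body simply return null here.

```javascript
// Hypothetical helper: read the GraphQL operation name from fetch() arguments.
// Assumes the request body is a JSON string passed via the init object.
function getOperationName(input, init) {
  const rawBody = init?.body;
  if (typeof rawBody !== "string") return null; // no body, or not a string body

  try {
    const parsed = JSON.parse(rawBody);
    // Some clients batch several operations into an array; look at the first one.
    const first = Array.isArray(parsed) ? parsed[0] : parsed;
    return typeof first?.operationName === "string" ? first.operationName : null;
  } catch {
    return null; // body was not valid JSON
  }
}
```

Inside the fetch wrapper this would be called as `getOperationName(...args)` and compared against "CarSearchV3".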

GraphQL responses have a predictable structure too: a data object, then nested edges and nodes, then the actual fields. Once you know the shape, you can walk it blindfolded.

{
  "data": {
    "carSearchV3": {
      "pageInfo": {
        "hasNextPage": true,
        "endCursor": "eyJsYXN0SWQiOjQwfQ=="
      },
      "edges": [
        {
          "node": {
            "vehicleId": "abc123",
            "supplier": { "name": "Enterprise" },
            "pricing": { "total": 245.00 }
          }
        }
      ]
    }
  }
}

One query, one shape, infinite cars. Beautiful.
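
Walking that shape takes only a few lines. The sketch below assumes the response looks exactly like the sample above (data, carSearchV3, pageInfo, edges, node); `readCarPage` is an illustrative name, not a function from the tool.

```javascript
// Illustrative sketch: pull the car nodes and paging info out of one response.
// Returns null when the response is not a carSearchV3 result.
function readCarPage(json) {
  const result = json?.data?.carSearchV3;
  if (!result) return null;

  // Each edge wraps one node; drop edges that have no node at all.
  const nodes = (result.edges ?? []).map((edge) => edge?.node).filter(Boolean);

  return {
    nodes,
    hasNextPage: result.pageInfo?.hasNextPage === true,
    endCursor: result.pageInfo?.endCursor ?? null,
  };
}
```

Optional chaining keeps the walker from throwing when a level is missing, at the cost of returning an empty page rather than an error.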


3. Pagination — The Infinite Scroll Trap

Expedia doesn't send you 700 cars at once. That'd be too easy. It sends 25, plus a cursor — an opaque Base64 string that says "the next page starts here."

The page shows a "Show More" button. Click it, and the browser sends a new GraphQL query with the cursor from the previous response.

Manual pagination is torture. Auto-pagination is a solved problem:

  1. Capture the first page's response
  2. Read pageInfo.hasNextPage
  3. Extract pageInfo.endCursor
  4. Replay the same GraphQL query with the cursor injected into variables
  5. Repeat until hasNextPage is false

There's a subtle bug to watch out for: duplicates. Sometimes the same car appears on multiple pages (Expedia's backend shuffles results). The script maintains a Set of vehicleId values and skips any node it has already seen. Set lookups take constant time on average, so deduplication adds almost no overhead.
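
The five steps plus the duplicate check can be sketched as one loop. This is a simplified illustration rather than the tool's actual code: `collectAllCars` and the `maxPages` safety cap are assumptions, and the page fetcher is passed in as a function so the loop can be exercised without a network.

```javascript
// Simplified sketch of cursor pagination with deduplication.
// fetchPage(cursor) must resolve to a response shaped like the sample JSON above.
async function collectAllCars(fetchPage, maxPages = 100) {
  const seenIds = new Set();
  const cars = [];
  let cursor = null; // null means "first page"

  for (let page = 0; page < maxPages; page++) {
    const json = await fetchPage(cursor);
    const result = json?.data?.carSearchV3;
    if (!result) break; // unexpected shape: stop instead of looping blindly

    for (const edge of result.edges ?? []) {
      const id = edge?.node?.vehicleId;
      if (id == null || seenIds.has(id)) continue; // skip duplicates
      seenIds.add(id);
      cars.push(edge.node);
    }

    if (!result.pageInfo?.hasNextPage) break;
    cursor = result.pageInfo.endCursor;
  }
  return cars;
}
```

The `maxPages` cap is there so a response that always claims another page cannot keep the loop running indefinitely.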


4. Replay Requests — Acting Like a Human, Moving Like a Machine

The first page is captured from the user's real search. But what about page 2, 3, 4... 47?

The script replays requests programmatically. It takes the original GraphQL query payload, swaps out the cursor variable, and fires a fresh fetch call. Same endpoint, same headers, same authentication cookies (because the user's real browser session handles that), just a different cursor.

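async function fetchPage(cursor) {
  // Same operation and variables as the captured request; only the cursor changes.
  const payload = {
    operationName: "CarSearchV3",
    variables: {
      ...baseVariables,
      cursor: cursor || null
    },
    query: CAR_SEARCH_QUERY
  };

  const response = await fetch(GRAPHQL_URL, {
    method: "POST",
    credentials: "include", // send the session cookies with the replayed request
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload)
  });

  // Surface HTTP errors (429, 500, ...) so the caller can retry.
  if (!response.ok) {
    throw new Error(`CarSearchV3 replay failed: HTTP ${response.status}`);
  }
  return response.json();
}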

The magic is that the user's session is intact. No login tokens to manage. No cookies to rotate. The browser handles authentication automatically — the script just borrows the connection.

But there's a catch: you can't fire 50 requests instantly. Servers notice. So the script has three speed presets:

Speed        Delay    Notes
🐢 Slow      4s       Conservative, acts like a slow human
🐇 Medium    1.5s     Balanced, default setting
🚀 Fast      0.6s     Aggressive, for stable connections

Smart Retry: If a request fails (network glitch, server hiccup), the script waits and retries up to 3 times. Exponential backoff would be overkill — a simple 2-second retry window catches 99% of failures.
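
Here is a rough sketch of how the presets and the fixed-wait retry could fit together. The names `SPEED_PRESETS`, `sleep`, and `withRetry` are illustrative assumptions; only the delays, the retry count, and the 2-second wait come from the description above.

```javascript
// Delays between pages, in milliseconds, taken from the presets table.
const SPEED_PRESETS = { slow: 4000, medium: 1500, fast: 600 };

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Illustrative helper: run `task`, retrying after a fixed wait when it throws.
// retries = 3 means one initial attempt plus up to three retries.
async function withRetry(task, { retries = 3, waitMs = 2000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt < retries) await sleep(waitMs); // fixed wait, no backoff
    }
  }
  throw lastError; // every attempt failed
}
```

A page loop would then call something like `await withRetry(() => fetchPage(cursor))` followed by `await sleep(SPEED_PRESETS.medium)` before requesting the next page.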


5. CSV Generation — From Nested JSON to Spreadsheet in One Click

GraphQL returns deeply nested JSON. A single car listing looks like:

{
  "node": {
    "vehicleId": "abc123",
    "supplier": { "name": "Enterprise", "rating": 4.2 },
    "vehicle": {
      "make": "Toyota", "model": "Corolla",
      "type": "Compact", "doors": 4,
      "transmission": "Automatic"
    },
    "pricing": {
      "total": 245.00,
      "currency": "USD",
      "dailyRate": 48.99,
      "fees": [
        { "name": "Airport Surcharge", "amount": 12.50 },
        { "name": "Insurance", "amount": 30.00 }
      ]
    },
    "pickup": { "date": "2026-07-15", "time": "10:00" },
    "dropoff": { "date": "2026-07-20", "time": "10:00" }
  }
}

Your boss doesn't want that. Your boss wants columns: Vehicle, Supplier, Total Price, Daily Rate, Pickup Date, Dropoff Date.

The script flattens this nested structure with a recursive extractor. It walks the JSON tree, follows known paths (defined in a field map), and builds a flat array of values.

const fieldMap = [
  ["vehicleId"],
  ["supplier", "name"],
  ["supplier", "rating"],
  ["vehicle", "make"],
  ["vehicle", "model"],
  ["pricing", "total"],
  ["pricing", "dailyRate"],
  ["pickup", "date"],
  ["dropoff", "date"]
];
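
As a rough illustration of how such a field map can drive the flattening, here is a small sketch. It uses a reduce-based path lookup rather than recursion, and `getPath` and `flattenNode` are hypothetical names, not the tool's own functions.

```javascript
// Follow one path (an array of keys) into a nested object.
// Returns undefined as soon as any level is missing.
function getPath(node, path) {
  return path.reduce(
    (value, key) => (value == null ? undefined : value[key]),
    node
  );
}

// Turn one nested node into a flat row, one value per fieldMap entry.
// Missing fields become empty strings so every row has the same length.
function flattenNode(node, fieldMap) {
  return fieldMap.map((path) => {
    const value = getPath(node, path);
    return value === undefined || value === null ? "" : value;
  });
}
```

Note that a missing path silently becomes an empty cell in this sketch, so a renamed field upstream would show up as a blank column rather than an error.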

Then it serializes to CSV using the standard RFC 4180 format — handles commas inside values, wraps strings in quotes, escapes double quotes. No library needed, just careful string building.
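
The string building can be quite small. The sketch below is one way to do it, not the tool's exact code: it quotes a field only when it contains a comma, a double quote, or a line break, which RFC 4180 allows, and `escapeCsvField` and `toCsv` are illustrative names.

```javascript
// Quote a single value when needed and double any embedded quotes.
function escapeCsvField(value) {
  const text = String(value ?? "");
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Join a header row and data rows into CSV text with CRLF line endings.
function toCsv(header, rows) {
  return (
    [header, ...rows]
      .map((row) => row.map(escapeCsvField).join(","))
      .join("\r\n") + "\r\n"
  );
}
```

Quoting every string field unconditionally is also valid under the RFC; quoting only when required just keeps the file easier to read in a text editor.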

The download happens via a classic browser trick:

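// Build the file in memory and trigger a download through a temporary link.
const blob = new Blob([csvContent], { type: "text/csv;charset=utf-8" });
const url = URL.createObjectURL(blob);
const a = document.createElement("a");
a.href = url;
a.download = `expedia-cars-${Date.now()}.csv`;
document.body.appendChild(a); // older browsers may ignore clicks on detached links
a.click();
a.remove();
URL.revokeObjectURL(url); // release the blob URL once the download has been handed off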

No server involved. The CSV is generated entirely in the browser and downloaded instantly. Private data never touches a third party.


Putting It All Together: The 30-Second Workflow

  1. Search for cars on Expedia like you normally would
  2. The script starts capturing automatically — you see a counter tick up
  3. Click ↻ Auto (the gold button) — the progress bar appears
  4. Watch it fly through 30+ pages in about 30 seconds
  5. A CSV downloads automatically — 700+ cars, structured, ready

That's the difference between tapping the source directly vs. scraping rendered HTML. 30 seconds vs. 15 minutes. No Python setup, no Playwright config, no CAPTCHA headaches.


What I Learned Building This

Fetch interception is absurdly powerful. A dozen lines of monkey-patching gives you visibility into everything a website does. It's like having X-ray vision for network requests.

GraphQL makes scraping easier, not harder. One query shape, transparent variable passing, and deterministic responses. If you understand the one query, you understand the whole API.

Pagination is the bottleneck. Almost every "scraper that's too slow" problem traces back to inefficient pagination. Nail the pagination, and you nail the speed.

Replay with real sessions is king. No proxies, no auth tokens, no session management. The user's browser handles all of that. You just borrow the connection.

CSV is still the universal format. JSON is for machines. CSV is for humans with Excel. Don't overthink it — flatten, serialize, download.


The full source code is available on GitHub. Free, open-source, MIT licensed. Built by me (Hamza), powered by curiosity and an allergy to repetitive clicking.

Expedia GraphQL Inspector v2.2 — because your time is worth more than clicking "Load More" 47 times.

Top comments (8)

Nazar Boyko

One thing that might save you a ban when the fast preset runs into a 429: a flat two second retry can actually dig you deeper, since the server is already asking you to slow down and you knock again at the same beat. When a Retry-After header is present, backing off to whatever it says instead of a fixed wait tends to keep the session alive a lot longer. Everything else here is solid, this is just the one edge that tends to bite after you've pointed the tool at a busy endpoint for a while.

Vinicius Pereira

Nice writeup, and keying on operationName instead of the URL is the right instinct, that alone survives a lot of what breaks naive scrapers. Since you asked for the sharp questions, here's where this bites in practice, from someone who reaches for the internal GraphQL over HTML every time.

The thing that kills these isn't the scraping logic, it's drift, and it comes in two flavors. One, the operation itself moves, CarSearchV3 becomes V4 or gets swapped for a persisted-query hash, and your interceptor just silently stops matching. Two, and this is the quieter one, the fieldMap. The moment Expedia renames pricing.total or nests it one level deeper, your recursive walker doesn't throw, it just writes a blank column, and now you've got a clean-looking CSV of 700 cars with an empty price field and no idea anything went wrong. That silent blank is worse than a crash, because a crash you actually notice. If I were hardening this I'd have it assert the shape it expects and shout when a mapped path comes back missing, instead of trusting the fieldMap forever.

The other one is the session-borrowing, which is genuinely the elegant part, no tokens to rotate is a real win. But it has a ceiling on a target as defended as Expedia. Your replayed fetch sends Content-Type and nothing else, while the real UI request carries a pile of client-info, traceparent, persisted-query and similar headers. On a heavily fingerprinted endpoint that difference is exactly what flags a replay, so the pattern that works great today can start getting soft-blocked or served degraded results with no obvious error. Cheapest insurance is to clone the original request's full header set when you replay, not just swap the cursor in the body.

None of that takes away from it, for a browser-side tool on your own session it's a clean approach. Just the two spots I'd watch for when it eventually stops working. Good first post.

Hamza

Thanks! I really appreciate this level of feedback. These are exactly the kinds of edge cases that make browser automation so interesting.

You’re absolutely right that long-term maintenance is usually less about the initial extraction logic and more about API drift over time. A few of the points you mentioned are things I’ve already started addressing in version 2.2:

  • Replay headers: the replay no longer sends only Content-Type. It now clones the original request headers as well, while excluding things like Content-Length and Host, which the browser regenerates. That makes the replay much closer to the browser’s original network behavior and noticeably more reliable.

  • Session borrowing: I intentionally rely on the user’s authenticated browser session instead of trying to recreate cookies or tokens. That has been one of the biggest wins in terms of simplicity and avoiding authentication headaches.

That said, I completely agree with your two bigger concerns.

Operation drift is inevitable. Right now I match on operationName because it is much more stable than URL matching, but if Expedia eventually moves to persisted queries or renames the operation entirely, the interceptor will need a smarter discovery mechanism rather than a hardcoded name.

Schema drift is probably the more dangerous failure mode. Silent blanks are much worse than exceptions because they produce exports that look believable but are wrong. I’m planning to move toward schema validation with required-field assertions so missing critical fields like pricing or supplier become visible failures instead of quietly generating incomplete CSV files.

In the longer term, I’d also like to make the extractor less Expedia-specific by separating:

  • network capture
  • replay engine
  • schema mapping
  • exporters

That way, adapting to future API changes, or even to other GraphQL-backed sites, becomes mostly a mapping problem rather than a full rewrite of the pipeline.

I really appreciate you taking the time to write such a thoughtful review. Those are exactly the kinds of comments that help move a project from “works today” to “still works next year.”

Vinicius Pereira

Love where you're taking it, and the capture/replay/mapping/exporter split is exactly the seam that makes this survive. One idea for the operation-drift problem you flagged: instead of a smarter way to find the operation by name, discover it by shape. You already know the response you consume looks like data then edges then nodes carrying vehicleId, supplier, pricing, so match on "the response that contains that shape" rather than on the label CarSearchV3. A rename to V4 or a persisted-query hash doesn't touch the shape, so the interceptor keeps working through the exact drift that would break a name match. And it pairs nicely with the required-field assertions you mentioned, since the shape you match on and the shape you validate against are really the same declared contract, one just used to find the response and the other to trust it. Genuinely nice work, following where this goes.

Hamza

Thanks, that shape-matching idea is genuinely clever and I hadn't thought of it. I just checked the actual response structure in the code and it turns out Expedia doesn't use edges/nodes/vehicleId at all. The real shape is a listings[] array where each listing has things like car.vendor, car.priceSummary.total.price.amount, car.vehicle.description, and car.tripsSaveItemWrapper.tripsSaveItem.attributes for the pickup/return details. So matching on the shape by checking for the combo of priceSummary + vendor + listings would survive an operation rename to V4 or even a persisted query hash, no changes needed. And like you said, those same fields are exactly what I'd validate against anyway, so the finder and the validator would share the same contract by default. I'm going to experiment with that approach, appreciate the insight.

Hamza

Thanks for reading!
If you have questions about GraphQL interception, fetch hooking, pagination, or browser automation, leave a comment. I'll be happy to help.

More developer tools are coming soon.

Frank

How did you handle pagination with the GraphQL API to capture 700+ car rentals in 30 seconds? I'm facing a similar issue and would love to swap ideas on this.

Hamza

Hi there @frank_signorini, great question! Pagination was the trickiest part. The short answer: index-based pagination, where I calculate the next page offset and inject it back into the same GraphQL query.

Expedia's CarSearchV3 doesn't use cursors. The response includes a loadMoreAction.searchPagination object with startingIndex, size, and hasNextPage. Each page returns listings at a given starting index, and the next page starts at currentIndex + listingCount.

Here's the stripped-down version of how it actually works:

async function replayLoadMore(previousEntry) {
  const body = structuredClone(previousEntry.request);
  const vars = body.variables ??= {};
  const secondary = vars.secondaryCriteria ??= {};
  const selections = secondary.selections ??= [];

  const pagination = previousEntry.loadMore?.searchPagination;
  const pageSize = pagination?.size ?? 25;
  const currentIndex = pagination?.startingIndex ?? 0;
  const currentCount = previousEntry.listingCount || pageSize;
  const nextIndex = currentIndex + currentCount;

  // Set the next page index
  setSelection(selections, "selPageIndex", nextIndex);
  setSelection(selections, "selPageCount", pageSize);
  setSelection(selections, "searchId", previousEntry.searchId);

  const response = await fetch(previousEntry.url, {
    method: "POST",
    credentials: "include",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body)
  });

  const json = await response.json();

  return {
    listingCount: json.data.carSearchOrRecommendations.carSearchResults.listings.length,
    loadMore: json.data.carSearchOrRecommendations.carSearchResults.loadMoreAction,
    // store the full response too
  };
}

Then the loop:

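// Continue only while the API explicitly reports another page. With `!== false`,
// a response that omits loadMoreAction would keep the loop running forever.
while (page.loadMore?.searchPagination?.hasNextPage === true) {
  const next = await replayLoadMore(page);
  // store results, update page tracker
  page = next;

  // delay so you don't overwhelm the API
  await new Promise(r => setTimeout(r, 1500));
}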

A couple gotchas I ran into:

  • Deduplication is necessary. The same car can appear on different pages if results shift. I use a composite key of searchId + startingIndex to skip duplicates.
  • Speed matters. I landed on 1.5s between pages — fast enough to be useful, slow enough to avoid rate limits. Three presets (4s, 1.5s, 0.6s) let users tune it based on their connection.
  • Retry on failures. Random 500s happen every ~50 pages. I wrap the fetch with up to 3 retries and exponential backoff.

The key insight: you're replaying the exact same request the browser would send, just with a different page index. Your real session cookies handle auth — no tokens, no proxy, no CAPTCHA.

Happy to dive deeper! What API are you working with?