Last month I tried to buy concert tickets. Ticketmaster, StubHub, SeatGeek, VividSeats - I had eight tabs open, each with different prices, different sections, and zero way to compare them without a spreadsheet. I thought: what if an AI agent could just do this for me?
So I built one. Ticket Hunter is an autonomous AI agent that takes a natural-language query like "Taylor Swift Eras Tour NYC," searches Google for ticket platforms, opens real Chromium browsers on up to three sites simultaneously, visually navigates each one - clicking buttons, scrolling, filling forms - and returns structured, price-ranked ticket results. All streamed live to your screen.
Here's exactly how it works, what powers it, and how you can build something similar.
What You'll See in This Article
- How Bright Data's Browser API gives AI agents real browsers with built-in anti-bot bypass
- How Yutori's n1 model turns screenshots into browser actions — no DOM parsing needed
- The full 5-stage LangGraph pipeline that orchestrates everything
- Real code from the project with architecture decisions explained
- Gotchas I hit and how I solved them
The Tech Stack
| Layer | Technology |
|---|---|
| Framework | Next.js 16 + React 19 + TypeScript |
| Agent orchestration | LangGraph (from LangChain) |
| Web search | Bright Data SERP API |
| Browser infrastructure | Bright Data Browser API |
| Visual reasoning | Yutori n1 - pixels-to-actions LLM |
| Structured extraction | OpenRouter (Gemini) |
| Browser automation | Playwright Core over CDP |
| Rate limiting | Upstash Redis |
The Two Technologies That Make This Possible
Bright Data Browser API: Real Browsers in the Cloud
Here's the core problem with building a browsing agent: ticket platforms don't want bots. Ticketmaster, StubHub, and their peers use aggressive anti-bot measures — CAPTCHAs, browser fingerprinting, IP reputation scoring, JavaScript challenges. Run a standard Puppeteer script against them and you'll get blocked within seconds.
Bright Data's Browser API solves this at the infrastructure level. It provides fully managed Chromium instances in the cloud that you connect to via Chrome DevTools Protocol (CDP). Behind the scenes, Bright Data handles:
- Residential proxy rotation across 150M+ IPs in 195 countries
- Automatic CAPTCHA solving — no extra code on your side
- Browser fingerprint management — real user-agent strings, canvas fingerprints, WebGL hashes
- Intelligent session recovery — if a request fails, it retries with a different IP and fingerprint
The connection is a single WebSocket URL:
```typescript
// Connect Playwright to Bright Data's cloud browser
const browser = await chromium.connectOverCDP(
  "wss://brd-customer-<id>-zone-<zone>:<password>@brd.superproxy.io:9222"
);
const page = await browser.newPage();
await page.goto("https://www.ticketmaster.com/event/...");
```
From your code's perspective, it's just a Playwright browser. But it's running on Bright Data's infrastructure with all the unblocking built in. The Browser API handles more than 100M AI agent interactions per day across their customer base — this is battle-tested infrastructure, not a hobby project.
Why this matters for AI agents specifically: When your AI agent decides to click a button or navigate to a page, you can't afford a CAPTCHA popup breaking the flow 15 steps in. Bright Data solves this transparently at the network layer so the agent logic stays clean.
Yutori n1: The "Pixels-to-Actions" Model
Most browser automation works by parsing the DOM — find elements by CSS selectors, extract text from HTML nodes. This breaks constantly. Sites change their markup, use shadow DOM, render content dynamically, or generate randomized class names.
Yutori n1 takes a radically different approach: it looks at screenshots. Built by three former Meta FAIR researchers (Devi Parikh, Abhishek Das, and Dhruv Batra), n1 is a vision-language model specifically trained for web navigation. You give it:
- A screenshot of the current page
- A text description of the task
- History of previous actions
And it returns the next action: click at coordinates (423, 567), type "Taylor Swift", scroll down, etc.
The numbers are compelling:
| Benchmark | n1 | Claude Opus | GPT-5.4 | Gemini 2.5 Pro |
|---|---|---|---|---|
| Navi-Bench v1 (100 real tasks) | 91% | 88% | 78% | 53% |
| Online-Mind2Web (300 tasks, 136 sites) | 78.7% | 61.0% | — | 69.0% |
n1 also runs at 3.6 seconds per step — 2-3x faster than alternatives — and costs $0.75/M input tokens, roughly 8x cheaper than competing vision models. The API follows the OpenAI Chat Completions format, so integration is straightforward:
```typescript
import OpenAI from "openai";

const n1Client = new OpenAI({
  apiKey: process.env.YUTORI_API_KEY,
  baseURL: "https://api.yutori.com/v1",
});

const response = await n1Client.chat.completions.create({
  model: "n1-latest",
  messages: [
    {
      role: "system",
      content: "You are a web browsing agent specialized in finding ticket information..."
    },
    {
      role: "user",
      content: [
        { type: "text", text: "Find available tickets for Taylor Swift Eras Tour" },
        { type: "image_url", image_url: { url: screenshotDataUrl } }
      ]
    }
  ],
  tools: browserTools, // click, type, scroll, etc.
});
```
n1 responds with tool calls — left_click at (x, y), type "Taylor Swift", scroll down — which Playwright executes on the Bright Data browser. Then you screenshot again, send the new state back, and repeat.
The 5-Stage Pipeline
The agent runs as a LangGraph state graph with five stages:
```
                 ┌─────────────────┐
                 │   User Query    │
                 │  "Taylor Swift  │
                 │  Eras Tour NYC" │
                 └────────┬────────┘
                          │
                 ┌────────▼────────┐
                 │  Stage 1: SERP  │
                 │   Search via    │
                 │   Bright Data   │
                 └────────┬────────┘
                          │
         ┌────────────────┼────────────────┐
         │                │                │
   ┌─────▼──────┐   ┌─────▼──────┐   ┌─────▼──────┐
   │  Stage 2:  │   │  Stage 2:  │   │  Stage 2:  │
   │ Browser 1  │   │ Browser 2  │   │ Browser 3  │
   │ Ticketmstr │   │  StubHub   │   │  SeatGeek  │
   └─────┬──────┘   └─────┬──────┘   └─────┬──────┘
         │                │                │
   ┌─────▼──────┐   ┌─────▼──────┐   ┌─────▼──────┐
   │  Stage 3:  │   │  Stage 3:  │   │  Stage 3:  │
   │ n1 Browse  │   │ n1 Browse  │   │ n1 Browse  │
   │ (15 steps) │   │ (15 steps) │   │ (15 steps) │
   └─────┬──────┘   └─────┬──────┘   └─────┬──────┘
         │                │                │
   ┌─────▼──────┐   ┌─────▼──────┐   ┌─────▼──────┐
   │  Stage 4:  │   │  Stage 4:  │   │  Stage 4:  │
   │  Extract   │   │  Extract   │   │  Extract   │
   │  Tickets   │   │  Tickets   │   │  Tickets   │
   └─────┬──────┘   └─────┬──────┘   └─────┬──────┘
         │                │                │
         └────────────────┼────────────────┘
                          │
                 ┌────────▼────────┐
                 │ Stage 5: Merge  │
                 │  Deduplicate &  │
                 │  Rank by Price  │
                 └─────────────────┘
```
Let me walk through each stage.
Stage 1: SERP Search — Finding the Right Platforms
The agent doesn't hardcode URLs. It searches Google via Bright Data's SERP API:
```typescript
import { BrightDataClient } from "@brightdata/sdk";

const bdclient = new BrightDataClient({ token: process.env.BRIGHT_DATA_API_TOKEN });

const results = await bdclient.scrape({
  url: `https://www.google.com/search?q=${encodeURIComponent(query + " tickets buy")}&brd_json=1`,
  zone: "serp",
  format: "raw",
});
```
The raw SERP JSON gets parsed through a scoring algorithm that:
- Detects known ticket platforms (Ticketmaster, StubHub, SeatGeek, VividSeats, AXS, TickPick, Gametime, Eventbrite, Viagogo, TicketNetwork, Tickets.com)
- Scores each URL by relevance — event-specific URLs score higher than homepage results
- Checks for query token matches in titles and snippets
- Returns the top 3 platform URLs
This means the agent adapts to any event. Search "Lakers vs Celtics" and it finds basketball ticket pages. Search "Hamilton Broadway" and it finds theater listings. No hardcoded routes.
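Here's a sketch of what that scoring step can look like. The platform list, weights, and function names are illustrative, not the project's exact implementation:

```typescript
// Illustrative subset of the known-platform list
const KNOWN_PLATFORMS = [
  "ticketmaster.com", "stubhub.com", "seatgeek.com",
  "vividseats.com", "axs.com", "tickpick.com",
];

interface SerpResult { url: string; title: string; }

function rankSerpResults(results: SerpResult[], query: string, topN = 3): string[] {
  const tokens = query.toLowerCase().split(/\s+/);
  const scored = results.flatMap((r) => {
    const host = new URL(r.url).hostname.replace(/^www\./, "");
    const platform = KNOWN_PLATFORMS.find((p) => host === p || host.endsWith("." + p));
    if (!platform) return []; // drop non-ticket domains entirely
    let score = 10;
    // Event-specific URLs (deep paths) beat homepage results
    if (new URL(r.url).pathname.length > 1) score += 5;
    // Query token matches in the title add relevance
    score += tokens.filter((t) => r.title.toLowerCase().includes(t)).length;
    return [{ url: r.url, platform, score }];
  });
  scored.sort((a, b) => b.score - a.score);
  // Keep at most one URL per platform, then take the top N
  const seen = new Set<string>();
  return scored
    .filter((s) => !seen.has(s.platform) && seen.add(s.platform))
    .map((s) => s.url)
    .slice(0, topN);
}
```

The per-platform filter matters: without it, one site with three strong SERP hits would crowd out the other platforms entirely.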
Stage 2: Browser Connection — Opening Real Browsers
For each of the top 3 URLs, the agent opens a Bright Data Browser API session via CDP:
```typescript
const browser = await chromium.connectOverCDP(cdpUrl);
const context = browser.contexts()[0] ?? await browser.newContext();
const page = context.pages()[0] ?? await context.newPage();
await page.setViewportSize({ width: 1280, height: 800 });
```
Before navigating, the agent injects a script that prevents popups (ticket sites love target="_blank" links that would break the single-tab agent flow):
```typescript
// Rewrite target="_blank" to keep navigation in the same tab.
// addInitScript runs on every navigation, before the page's own scripts,
// so the observer survives page loads (page.evaluate would be wiped).
await page.addInitScript(() => {
  const watch = () => {
    const observer = new MutationObserver((mutations) => {
      for (const m of mutations) {
        m.addedNodes.forEach((node) => {
          if (node instanceof HTMLAnchorElement && node.target === "_blank") {
            node.target = "_self";
          }
        });
      }
    });
    observer.observe(document.body, { childList: true, subtree: true });
  };
  // document.body doesn't exist yet when an init script runs
  if (document.body) watch();
  else document.addEventListener("DOMContentLoaded", watch);
});
```
Navigation includes retry logic with exponential backoff. If the page returns a 403, 429, or shows CAPTCHA indicators, the agent retries — and Bright Data's infrastructure handles the actual unblocking behind the scenes.
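A minimal version of that retry policy might look like this. The delay values, attempt count, and helper names are assumptions rather than the project's actual code:

```typescript
// Exponential backoff with a cap. Attempt numbers start at 0:
// 1s, 2s, 4s, ... capped at 15s.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 15000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Hypothetical retry wrapper around a navigation call. Each failed
// attempt reconnects, and Bright Data serves the retry from a
// different IP and fingerprint.
async function withRetries<T>(fn: () => Promise<T>, maxAttempts = 4): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      await new Promise((r) => setTimeout(r, backoffDelayMs(attempt)));
    }
  }
  throw lastError;
}
```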
Stage 3: n1 Agentic Browse Loop — The Heart of the Agent
This is where n1 shines. For each platform, the agent runs a loop of up to 15 steps:
```typescript
for (let step = 0; step < maxSteps; step++) {
  // 1. Take screenshot
  const screenshot = await page.screenshot({ type: "png" });
  const dataUrl = `data:image/png;base64,${screenshot.toString("base64")}`;

  // 2. Append the latest screenshot and send the conversation to n1
  conversationHistory.push({
    role: "user",
    content: [{ type: "image_url", image_url: { url: dataUrl } }],
  });
  const response = await n1Client.chat.completions.create({
    model: "n1-latest",
    messages: conversationHistory,
    tools: BROWSER_TOOLS,
  });

  // 3. If no tool calls, n1 is done — extract final answer
  if (!response.choices[0].message.tool_calls?.length) {
    finalAnswer = response.choices[0].message.content;
    break;
  }

  // 4. Execute each action
  for (const toolCall of response.choices[0].message.tool_calls) {
    await executeAction(page, toolCall); // click, type, scroll, etc.
  }

  // 5. Stream screenshot to the UI
  emitEvent({ type: "screenshot", data: dataUrl, source: platform });
}
```
n1 supports 14 action types, including left_click, double_click, triple_click, right_click, type, key_press, scroll, hover, drag, goto_url, go_back, refresh, and wait. Coordinates arrive in a normalized 0-1000 space and get scaled to the 1280x800 viewport.
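To make the coordinate handling concrete, here is a hypothetical sketch of parsing one of those tool calls into a viewport action. The argument shapes follow the OpenAI tool-call format; the function and type names are mine, not the project's:

```typescript
// n1 emits coordinates in a 0-1000 normalized space; scale and clamp
// them to the 1280x800 viewport before handing them to Playwright.
const VIEWPORT = { width: 1280, height: 800 };

interface ViewportAction {
  name: string;
  x?: number;
  y?: number;
  text?: string;
}

function toViewport(nx: number, ny: number) {
  const x = Math.round((nx / 1000) * VIEWPORT.width);
  const y = Math.round((ny / 1000) * VIEWPORT.height);
  return {
    x: Math.max(0, Math.min(x, VIEWPORT.width - 1)),
    y: Math.max(0, Math.min(y, VIEWPORT.height - 1)),
  };
}

// In the Chat Completions format, toolCall.function.arguments is a JSON string
function parseToolCall(name: string, argsJson: string): ViewportAction {
  const args = JSON.parse(argsJson);
  const action: ViewportAction = { name };
  if (typeof args.x === "number" && typeof args.y === "number") {
    Object.assign(action, toViewport(args.x, args.y));
  }
  if (typeof args.text === "string") action.text = args.text;
  return action;
}
```

The result feeds a switch over the action name that calls `page.mouse.click(x, y)`, `page.keyboard.type(text)`, and so on.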
What makes this powerful: n1 sees the page. It doesn't care if Ticketmaster renames its CSS classes. It doesn't break when StubHub redesigns their layout. It looks at the screenshot and figures out what to click, just like you would.
Stage 4: Structured Extraction — From Text to Data
When n1 finishes browsing, it returns a natural-language summary of what it found. To get structured data, the agent sends this to OpenRouter's Gemini Flash with a strict JSON schema:
```typescript
const response = await openrouterClient.chat.completions.create({
  model: "google/gemini-3-flash-preview",
  messages: [
    { role: "system", content: "Extract ticket listings as structured JSON..." },
    { role: "user", content: n1FinalAnswer }
  ],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "ticket_results",
      strict: true,
      schema: {
        type: "object",
        properties: {
          tickets: {
            type: "array",
            items: {
              type: "object",
              properties: {
                eventName: { type: "string" },
                eventDate: { type: "string" },
                venue: { type: "string" },
                section: { type: "string" },
                price: { type: "string" },
                platform: { type: "string" },
                url: { type: "string" },
                // ... 7 more fields
              }
            }
          }
        }
      }
    }
  }
});
```
If extraction fails (it happens — n1 might report "no tickets found" for a sold-out event), the agent falls back to displaying n1's raw text answer.
Stage 5: Merge & Rank — The Final Output
All tickets from all platforms get merged, deduplicated (by a composite key of platform + section + row + seats + price + URL), and sorted by price ascending. The user sees a clean grid of ticket cards.
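A compact sketch of that merge step, assuming a Ticket shape that mirrors the extraction schema and dollar-formatted price strings (the function names are illustrative):

```typescript
interface Ticket {
  platform: string;
  section: string;
  row?: string;
  seats?: string;
  price: string; // e.g. "$185.00"
  url: string;
}

function priceValue(price: string): number {
  const n = parseFloat(price.replace(/[^0-9.]/g, ""));
  return Number.isFinite(n) ? n : Number.POSITIVE_INFINITY; // unpriced listings sink to the bottom
}

function mergeAndRank(batches: Ticket[][]): Ticket[] {
  const byKey = new Map<string, Ticket>();
  for (const ticket of batches.flat()) {
    // Composite key: the same listing surfaced twice collapses to one entry
    const key = [
      ticket.platform, ticket.section, ticket.row ?? "",
      ticket.seats ?? "", ticket.price, ticket.url,
    ].join("|");
    if (!byKey.has(key)) byKey.set(key, ticket);
  }
  return [...byKey.values()].sort((a, b) => priceValue(a.price) - priceValue(b.price));
}
```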
The Real-Time Streaming Architecture
One of the things I'm most proud of is the live streaming. While the agent works, users see:
- Live browser screenshots updating in real-time across up to 3 panels
- An activity log with color-coded events (SERP search started, browser connected, n1 clicking "View Tickets", extraction complete...)
- Agent statistics (SERP queries made, pages browsed, AI actions taken, total steps)
This uses Server-Sent Events (SSE) from the Next.js API route:
```typescript
// API route: POST /api/search
const encoder = new TextEncoder();
const stream = new ReadableStream({
  async start(controller) {
    const emit = (event: AgentEvent) => {
      // SSE frames must be enqueued as bytes for the Response body
      controller.enqueue(encoder.encode(`data: ${JSON.stringify(event)}\n\n`));
    };
    // Run the agent with the emitter
    await runTicketHunterAgent(initialState, emit);
    emit({ type: "done" });
    controller.close();
  },
});

return new Response(stream, {
  headers: { "Content-Type": "text/event-stream" },
});
```
On the client side, a custom useTicketSearch hook consumes the SSE stream and manages React state for browser sessions, status entries, and tickets.
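On the wire, each event is a `data:` line followed by a blank line, and a fetched chunk can end mid-frame. A hypothetical version of the frame parsing inside such a hook (the function name is mine, not the project's):

```typescript
// Split a streamed buffer into complete SSE frames, keeping any
// trailing partial frame for the next chunk.
function parseSseChunk(buffer: string): { events: unknown[]; rest: string } {
  const events: unknown[] = [];
  const parts = buffer.split("\n\n");
  const rest = parts.pop() ?? ""; // last part may be an incomplete frame
  for (const part of parts) {
    for (const line of part.split("\n")) {
      if (line.startsWith("data: ")) {
        events.push(JSON.parse(line.slice(6)));
      }
    }
  }
  return { events, rest };
}
```

A fetch-and-ReadableStream reader (rather than `EventSource`) is needed here because the endpoint is a POST route.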
The emitter function propagates through the async call stack using Node.js AsyncLocalStorage — this way, deeply nested functions (like the n1 browse loop) can emit events without passing callbacks through every function signature.
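A minimal sketch of that pattern using Node's AsyncLocalStorage (the helper names are illustrative):

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

type AgentEvent = { type: string; [key: string]: unknown };
type Emitter = (event: AgentEvent) => void;

// Each request runs inside its own store, so concurrent searches
// never cross their event streams.
const emitterStore = new AsyncLocalStorage<Emitter>();

function runWithEmitter<T>(emit: Emitter, fn: () => T): T {
  return emitterStore.run(emit, fn);
}

// Any function nested inside runWithEmitter can emit without
// taking a callback parameter.
function emitEvent(event: AgentEvent): void {
  emitterStore.getStore()?.(event);
}
```

The API route wraps the whole agent run in `runWithEmitter`, and the n1 browse loop just calls `emitEvent` wherever it has something to report.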
What Can Go Wrong (and What I Learned)
Popup chaos
Ticket sites open everything in new tabs. An AI agent navigating with left_click suddenly has its page go blank because the click opened a new window. I solved this with DOM MutationObservers that rewrite target="_blank" to target="_self" in real-time, plus logic to detect when Playwright spawns a popup and switch to it gracefully.
Coordinate translation bugs
n1 returns coordinates in a 0-1000 normalized space. Early on I had scaling bugs that shifted every click about 10 pixels to the right of its target. Small enough to go unnoticed on large buttons, devastating on dropdown menus. The fix: explicit clamping after scaling.
```typescript
const scaledX = Math.round((x / 1000) * 1280);
const scaledY = Math.round((y / 1000) * 800);
const clampedX = Math.max(0, Math.min(scaledX, 1279));
const clampedY = Math.max(0, Math.min(scaledY, 799));
```
Hard blocks vs. soft blocks
Not all "blocked" states are the same. A 403 response is a hard block — retry with a new session. But a Cloudflare "checking your browser" interstitial is a soft block that Bright Data's infrastructure will resolve if you wait a few seconds. I had to build detection for 11 hard-block patterns and 9 soft-block patterns, with different retry strategies for each.
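An illustrative subset of that detection logic. The patterns below are representative examples, not the project's full set of 11 hard and 9 soft patterns:

```typescript
// Hard blocks need a fresh session; soft blocks resolve themselves
// if you wait for Bright Data's infrastructure to clear them.
const HARD_BLOCK = [/access denied/i, /403 forbidden/i, /request blocked/i];
const SOFT_BLOCK = [/checking your browser/i, /just a moment/i, /verifying you are human/i];

type BlockKind = "hard" | "soft" | "none";

function classifyBlock(status: number, bodyText: string): BlockKind {
  if (status === 403 || status === 429) return "hard";
  if (HARD_BLOCK.some((p) => p.test(bodyText))) return "hard";
  if (SOFT_BLOCK.some((p) => p.test(bodyText))) return "soft";
  return "none";
}
```

A "hard" result triggers the reconnect-and-retry path; a "soft" result just waits a few seconds and re-screenshots.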
The 15-step limit
I set the n1 browse loop to max 15 steps per platform. Too few and the agent can't navigate complex flows (Ticketmaster's event → date selection → section → seat picker is 6+ clicks minimum). Too many and you burn tokens and time. 15 was the sweet spot — enough for most flows, with a forced final-answer call if the limit is reached.
Extraction failures
Sometimes n1 finds tickets but describes them in a way that the structured extraction LLM can't parse cleanly. The fallback — showing n1's raw text answer — saves the user experience. Better to show "Section 102, Row A, $185 each" as plain text than nothing at all.
The Result
A single search takes 30-90 seconds and typically returns 10-30 structured ticket listings across 3 platforms, ranked by price. The user watches the agent work in real-time — clicking through Ticketmaster's seat map, scrolling StubHub's listings, navigating SeatGeek's filters — which is genuinely fun to watch.
The full pipeline:
- 1 SERP query → top 3 platform URLs
- 3 parallel browser sessions on Bright Data's infrastructure
- Up to 45 n1 vision steps (15 per platform)
- 3 structured extractions via Gemini Flash
- 1 merged, deduplicated, price-ranked result set
Running It Yourself
The project is open source:
```shell
git clone https://github.com/brightdata/ticket-hunter-agent
cd ticket-hunter-agent
npm install
```
You'll need API keys for:
- Bright Data — Browser API zone + SERP API zone (sign up here)
- Yutori — n1 API key (docs.yutori.com)
- OpenRouter — for Gemini Flash extraction (openrouter.ai)
- Upstash Redis (optional) — for rate limiting
Create a .env.local:
```bash
BRD_CDP_URL=wss://brd-customer-<id>-zone-<zone>:<password>@brd.superproxy.io:9222
BRIGHT_DATA_API_TOKEN=your-token
YUTORI_API_KEY=your-key
OPENROUTER_API_KEY=your-key
UPSTASH_REDIS_REST_URL=your-url      # optional
UPSTASH_REDIS_REST_TOKEN=your-token  # optional
```
Then:
```shell
npm run dev
```
Open http://localhost:3000 and search for any event.
Key Takeaways
- Vision-based browsing > DOM parsing for sites that fight automation. Yutori n1 doesn't care about class name changes or shadow DOM — it sees pixels.
- Infrastructure matters. Bright Data's Browser API handles the hardest part (anti-bot bypass, CAPTCHAs, proxy rotation) so you can focus on agent logic instead of unblocking.
- LangGraph's Send API makes parallel branching clean — fan out to 3 browser sessions and merge results back without callback spaghetti.
- Always build fallbacks. Structured extraction fails sometimes. Have a graceful degradation path.
- Stream everything. Users are far more patient when they can watch the agent work.
FAQ
Q: Does this actually buy tickets?
A: No — Ticket Hunter finds and compares available tickets across platforms. It doesn't complete purchases. The result cards link directly to the platform where you can buy.
Q: How much does it cost to run one search?
A: Roughly $0.02-0.05 in Bright Data bandwidth (3 browser sessions), $0.01-0.03 in n1 tokens (up to 45 steps with screenshots), and <$0.01 for the SERP query and extraction. Total: under $0.10 per search.
Q: Can I add more ticket platforms?
A: Yes — the SERP search automatically discovers platforms. Add the domain to the platform detection map in bright-data.ts and the agent will start browsing it.
Q: Why not just scrape the HTML directly?
A: Ticket platforms use heavy JavaScript rendering, dynamic pricing, interactive seat maps, and anti-scraping measures. A vision-based agent that navigates like a human handles all of these naturally.
Q: What if Bright Data's Browser API is down?
A: Bright Data reports 99.99% uptime across their infrastructure. In 3 months of development, I've had zero Browser API outages.
Have questions about building browsing agents? Drop them in the comments — I'll answer all of them. If you build something cool with Bright Data's Browser API or Yutori n1, I'd love to see it.