KazKN
How I shipped a cross-platform watch arbitrage tracker on Apify in 2 weeks (and the 5 production bugs that almost killed the launch)

TL;DR — I built Watch Arbitrage Tracker (Apify Store, GitHub): a Crawlee + Camoufox actor that scrapes 6 luxury-watch marketplaces in parallel, computes the cross-platform median price for any Patek/Rolex/AP reference, and pings Telegram the moment a listing drops more than X% below market. Sub-$1/month for typical dealer usage. Doubles as an MCP server so Claude Desktop / Cursor / ChatGPT can query the live feed in plain English.

The interesting part isn't the build — it's the 5 bugs I had to debug in production after pushing public, and the cross-platform median math that turns "scraped data" into a real arbitrage signal.


The problem (real, validated, painful)

Pro watch dealers — the people who flip pre-owned Patek 5711, Rolex Daytona, AP Royal Oak — spend 3+ hours a day refreshing 6 dealer marketplaces looking for mispriced inventory. The job is mechanical: open Chrono24, search 10 reference numbers, compare prices, switch to WatchBox, repeat, switch to Bobs Watches, repeat...

There are existing tools (Watchcharts $79/mo, ChronoPulse $500/mo, Bezel Club) but they all have the same flaw: single-platform anchoring. They tell you the median price on Chrono24, not across the market. That's useless for arbitrage — the whole point is finding spreads between platforms.

The math that actually matters:

spread = cross_platform_median(refX) - listing_price(refX, platformY)

If a Submariner 124060 is listed at $10,050 on WatchBox but the true market median (computed across Chrono24 + WatchBox + Bobs) is $13,988 — that's a 28.2% spread. That's the alert worth waking a dealer at 3am for.
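The spread formula above is a one-liner. A minimal sketch (function name is mine, not from the repo):

```typescript
// spread% = (crossPlatformMedian - listingPrice) / crossPlatformMedian * 100
function spreadPct(crossPlatformMedian: number, listingPrice: number): number {
  return ((crossPlatformMedian - listingPrice) / crossPlatformMedian) * 100;
}

// WatchBox listing at $10,050 against a $13,988 cross-platform median:
console.log(spreadPct(13_988, 10_050).toFixed(1)); // 28.2
```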

No tool I could find computes a TRUE cross-platform median. So I built one.


The stack

Standard Apify stack with one custom twist:

  • Crawlee + Camoufox (stealthy Firefox fork) for anti-bot resilience. Chrono24 + Bobs Watches sit behind Cloudflare; Camoufox + Apify proxy rotation handles them reliably.
  • TypeScript everywhere (strict mode, Node 24).
  • Per-platform crawler files (src/crawlers/{chrono24,watchbox,bobs,...}.ts) — each ~100 lines, all conform to the same Listing shape so the aggregator doesn't care which platform a listing came from.
  • Aggregator (src/aggregator.ts) — groups listings by extracted sub-reference (more on this in Bug #5 below), computes a trimmed median, detects spreads.
  • Alert dispatcher (src/alerts.ts) — Telegram per-opportunity with 24h dedup.
  • Dual mode — same codebase runs as a batch crawler (scheduled cron) AND as an MCP server in Apify Standby mode, exposing 3 HTTP tools for AI agents.

Total: ~2000 LOC across 25 files. Repo: github.com/DataKazKN/watch-arbitrage-mcp.
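To make the "same Listing shape" idea concrete, here's a guess at what that shared interface might look like — field names are mine, not copied from the repo:

```typescript
// Hypothetical sketch of the shared Listing shape every per-platform
// crawler emits, so the aggregator never needs platform-specific logic.
interface Listing {
  platform: string;      // "chrono24" | "watchbox" | "bobs" | ...
  ref: string;           // user-supplied search reference, e.g. "5711/1A-010"
  subRef: string | null; // extracted sub-reference (see Bug #5)
  title: string;
  priceUsd: number;
  url: string;
  scrapedAt: string;     // ISO timestamp
}

const example: Listing = {
  platform: "watchbox",
  ref: "124060",
  subRef: "124060",
  title: "Rolex Submariner No Date 124060",
  priceUsd: 10_050,
  url: "https://www.the1916company.com/",
  scrapedAt: new Date().toISOString(),
};
console.log(example.platform, example.priceUsd);
```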


The 5 bugs I had to fix LIVE in production

I shipped the actor as a paid public Pay-Per-Event Apify Actor after my last test run looked clean. Then I ran the actor with my own real Telegram bot token + 3 references the day after launch, and immediately found 5 bugs that would have made the actor look broken to first-time users.

Bug #1 — WatchBox redirected every search to a splash page (0 listings extracted)

The crawler URL was:

https://www.the1916company.com/search/pre-owned/?q=rolex+116500LN

In my 2026-05-04 verification, this URL returned a tile grid with 8 listings. Two days later: zero. Why?

Live DOM inspection via Playwright revealed: WatchBox now redirects any query containing a brand keyword (rolex, patek, audemars) to a brand-suggest splash page that has NO product tiles. The previous URL pattern broke silently.

The fix was tiny but only findable by going hands-on:

// BEFORE — included brand prefix → redirect to splash → 0 tiles
return `https://www.the1916company.com/search/pre-owned/?q=${encodeURIComponent(`${brand} ${ref}`)}`;

// AFTER — bare ref → lands on real /search/?q= results page
return `https://www.the1916company.com/search/?q=${encodeURIComponent(ref)}`;

Lesson: never trust documented URL patterns past 30 days for sites you don't control. Schedule monthly DOM verification runs, even on stable platforms.

Bug #2 — Bobs Watches wrong search endpoint (0 products rendered)

Same pattern, different cause. Bobs Watches' homepage form:

<form action="/shop" method="get">
  <input name="query" type="text">
</form>

The actual search endpoint is /shop?query=124060 — but I had been routing through their old /{brand}-{model}-{page}.html catalog URLs (which only covered top 7 collections AND inflated sample size beyond the user's exact ref).

// BEFORE — stale catalog URL routing
return `https://www.bobswatches.com/rolex-submariner-1.html`;

// AFTER — actual search endpoint
return `https://www.bobswatches.com/shop?query=${encodeURIComponent(ref)}`;

Bonus: Cloudflare's "Un instant..." interstitial takes ~8s to clear with Camoufox. The original 30s waitForSelector timeout was occasionally too tight; bumped to 45s. Fewer false-zero runs.

Bug #3 — A previous defensive filter was now stripping 100% of legitimate data

This one was sneaky. I had added a strict ref-matching filter in every crawler to defend against an earlier bug where WatchBox returned Calatrava listings tagged with the wrong reference. The filter was:

// "5711/1A-010" → normalized → "57111a010" → required substring in title+href
const refCore = refLower.replace(/[^\w]/g, '');
const haystack = `${title} ${href}`.toLowerCase().replace(/[^\w]/g, '');
if (!haystack.includes(refCore)) continue;

This worked on Chrono24 (which lists titles with full sub-variants like 5711/1A-010) but was destructive on European Watch Co, where titles use base refs only (e.g. 5711/1A):

INFO  europeanwatch: 312 raw cards pre-dedupe for ref="116500LN"
INFO  europeanwatch: extracted 0 listings for ref="116500LN"

312 cards found, 0 extracted. The filter was working as designed but the design was wrong for brand-grid platforms. Fix:

// Match BASE prefix instead of full sub-variant.
// "5711/1A-010" → match "57111a"; aggregator's extractSubRef() then
// groups detected sub-variants for accurate median.
const baseMatch = refLower.replace(/[^\w]/g, '').match(/^(\d{4,6}[a-z]{0,3})/);
const basePrefix = baseMatch ? baseMatch[1] : refLower.replace(/[^\w]/g, '');

Lesson: defensive code added to fix bug N can cause bug N+M months later. Keep filters per-platform when the platforms have meaningfully different data shapes.
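Putting the two fragments together, the corrected filter looks roughly like this (a sketch with hypothetical names, assembled from the snippets above):

```typescript
// Match on the BASE reference prefix so brand-grid platforms that list
// "5711/1A" (no dash suffix) are no longer discarded, while full
// sub-variant titles like "5711/1A-010" still match.
function matchesBaseRef(ref: string, title: string, href: string): boolean {
  const refLower = ref.toLowerCase();
  const normalized = refLower.replace(/[^\w]/g, "");           // "57111a010"
  const baseMatch = normalized.match(/^(\d{4,6}[a-z]{0,3})/);  // → "57111a"
  const basePrefix = baseMatch ? baseMatch[1] : normalized;
  const haystack = `${title} ${href}`.toLowerCase().replace(/[^\w]/g, "");
  return haystack.includes(basePrefix);
}

// A base-ref title from European Watch Co now passes:
console.log(matchesBaseRef("5711/1A-010", "Patek Philippe Nautilus 5711/1A", "/nautilus-5711-1a")); // true
```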

Bug #4 — Actor.call('apify/send-mail') fails silently for public actors

I had wired up email digests for users who didn't want Telegram. Worked perfectly when I tested as the developer. Failed for every public-actor user with:

ApifyApiError: Insufficient permissions for the Actor.
Make sure you're passing a correct API token and that it has the required permissions.

After research: Apify injects a sandboxed runtime token for public Actor runs. That token doesn't have actor:write scope, so Actor.call('apify/send-mail') returns 403. There's no warning at build time — the failure happens at runtime, per-user, silently.

Worse: the dispatcher was catching the error in a try/catch and reporting email_sent: true anyway. So users would think their emails were sent when they weren't.

I made two fixes:

  1. Honest reporting — return a boolean from sendEmailDigest() and propagate it upstream:
   async function sendEmailDigest(...): Promise<boolean> {
     try {
       await Actor.call('apify/send-mail', {...});
       return true;
     } catch (err) {
       log.warning(`Email send failed`, { err: String(err) });
       return false;
     }
   }
  2. Drop email from the MVP — better to ship a smaller working feature set than a bigger one that lies. v0.2 will integrate Resend HTTP API directly (no actor-to-actor call needed).

Lesson: catch/log/return-true is the worst possible error handling pattern. If you can't recover, surface the failure.
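One way to propagate that boolean all the way up is to have the dispatcher return per-channel results instead of swallowing failures — a sketch with hypothetical names, not the repo's actual dispatcher:

```typescript
// Propagate delivery failures upstream instead of catch/log/return-true.
type ChannelResult = { channel: string; ok: boolean };

async function dispatchAlerts(
  senders: Array<{ channel: string; send: () => Promise<void> }>,
): Promise<ChannelResult[]> {
  const results: ChannelResult[] = [];
  for (const { channel, send } of senders) {
    try {
      await send();
      results.push({ channel, ok: true });
    } catch (err) {
      // Log AND surface the failure — never report success we didn't have.
      console.warn(`${channel} send failed:`, String(err));
      results.push({ channel, ok: false });
    }
  }
  return results;
}

// The run summary can now truthfully show email_sent: false.
dispatchAlerts([
  { channel: "telegram", send: async () => {} },
  { channel: "email", send: async () => { throw new Error("403 Insufficient permissions"); } },
]).then((r) => console.log(r));
```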

Bug #5 — Sub-reference grouping (the one that actually mattered)

This isn't a "bug fixed in production" — it's the architectural decision that made the whole tool work.

For broad reference searches like Nautilus, the actor returns listings across multiple sub-models: 5711, 5810, 5990, 7118, 7011, 4700. All of these are technically "Nautilus", but their median prices differ by 5-10x:

  • 5711/1A-010 (men's stainless steel): $130K
  • 7118/1A (women's): $50K
  • 4700/1 (vintage): $25K

Aggregating one median across all sub-models would produce a misleading $80K median that triggers false arbitrage alerts every time a women's 7118 is listed.

The fix: extract sub-references from each listing's title using brand-specific regex:

export function extractSubRef(title: string, brand: string): string | null {
  if (brand === 'patek-philippe') {
    // Patek: 5711/1A-010, 5990/1A, 5810G-001, 7118/1200R-010, 5167A
    const m = title.match(/\b([56]\d{3}\/?\d{0,4}[A-Z]?[-\s]?\d{0,3})\b/);
    if (m) return m[1].replace(/\s+/g, '').toUpperCase();
  }
  if (brand === 'rolex') {
    // Rolex: 116500LN, 124060, 126710BLNR
    // 5+ digits min to skip year matches (2024, 2026) in titles
    const m6 = title.match(/\b(\d{6}[A-Z]{0,5})\b/);
    if (m6) return m6[1].toUpperCase();
    const m5 = title.match(/\b(\d{5}[A-Z]{0,5})\b/);
    if (m5) return m5[1].toUpperCase();
  }
  if (brand === 'audemars-piguet') {
    // AP: 15500ST, 67600ST.OO.1210ST.01, 26331ST.OO.1220ST.01
    const m = title.match(/\b(\d{5}(?:ST|OR|BC|SP|CE)(?:\.[A-Z0-9.]+)?)\b/);
    if (m) return m[1].toUpperCase();
  }
  return null;
}

Then group listings by extracted sub-ref, NOT by user search term. Listings without a detectable sub-ref are kept in the dataset but excluded from median computation. Median is now per-sub-model, which is the only price that's actually comparable cross-platform.

This single change eliminated ~80% of the false-positive arbitrage signals in earlier builds.
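The grouping step itself can be sketched like this (invented listing data and hypothetical names — only the behavior described above is taken from the post):

```typescript
// Group by extracted sub-ref before taking the median. Listings with no
// detectable sub-ref stay in the dataset but are excluded from the median.
function median(values: number[]): number {
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

function medianPerSubRef(
  listings: Array<{ subRef: string | null; priceUsd: number }>,
): Map<string, number> {
  const groups = new Map<string, number[]>();
  for (const l of listings) {
    if (!l.subRef) continue; // undetected sub-ref: skip for median purposes
    const prices = groups.get(l.subRef) ?? [];
    prices.push(l.priceUsd);
    groups.set(l.subRef, prices);
  }
  return new Map([...groups].map(([subRef, prices]) => [subRef, median(prices)]));
}

// One "Nautilus" search, three sub-models → three very different medians:
const medians = medianPerSubRef([
  { subRef: "5711/1A-010", priceUsd: 128_000 },
  { subRef: "5711/1A-010", priceUsd: 132_000 },
  { subRef: "7118/1A", priceUsd: 50_000 },
  { subRef: "4700/1", priceUsd: 25_000 },
  { subRef: null, priceUsd: 99_000 }, // no detectable sub-ref → ignored
]);
console.log(medians.get("5711/1A-010")); // 130000
```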


Cross-platform median: the actual value prop

Once the bugs were fixed and 4 platforms were delivering listings reliably, the core math could finally do its job. From a real cloud run on 2026-05-06:

Sub-ref:    124060 (Rolex Submariner No Date)
Listings:   14 total
  Chrono24:        12 listings, range $11,872 – $15,729
  WatchBox:         1 listing,  $10,050
  Bobs Watches:     2 listings, $14,995 + $14,995
Median:     $13,988 (computed across all 14)

Spread alerts (>5% below median):
  1. WatchBox  $10,050 → 28.2% below median ✅ ALERT
  2. Chrono24  $11,872 → 15.1% below median ✅ ALERT
  3. Chrono24  $12,200 → 12.8% below median ✅ ALERT

The 28.2% WatchBox spread is what the dealer flips for $3,938 profit. That single alert pays for ~12 months of the actor's runtime cost.
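The alert math from that run reproduces exactly (threshold and field names are mine; prices and median are from the run log above):

```typescript
// A listing alerts when it sits more than 5% below the cross-platform median.
const MEDIAN = 13_988;
const THRESHOLD_PCT = 5;

const listings = [
  { platform: "WatchBox", price: 10_050 },
  { platform: "Chrono24", price: 11_872 },
  { platform: "Chrono24", price: 12_200 },
  { platform: "Bobs Watches", price: 14_995 }, // above median → no alert
];

const alerts = listings
  .map((l) => ({ ...l, spreadPct: ((MEDIAN - l.price) / MEDIAN) * 100 }))
  .filter((l) => l.spreadPct > THRESHOLD_PCT);

for (const a of alerts) {
  console.log(`${a.platform} $${a.price} → ${a.spreadPct.toFixed(1)}% below median`);
}
// WatchBox $10050 → 28.2% below median
// Chrono24 $11872 → 15.1% below median
// Chrono24 $12200 → 12.8% below median
```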


MCP integration: query your arbitrage feed from Claude Desktop

After getting the batch crawler stable, I wired the same data into a Model Context Protocol server using Apify Standby mode. Same Docker image: when the Apify-injected metaOrigin value is 'STANDBY', the entry point switches from runBatch() to a small Express server with three HTTP tools:

| Tool | Purpose |
| --- | --- |
| get_arbitrage_snapshot | Top N current arbitrage opportunities, optionally filtered by ref + min spread % |
| get_market_stats | Per-ref median, min, max, count across platforms |
| get_listings_by_ref | Raw listings for a ref, filterable by condition + box/papers, paginated |
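The dual-mode entry point can be sketched roughly like this — my reconstruction, assuming the metaOrigin value arrives via the APIFY_META_ORIGIN environment variable; the repo's actual wiring may differ:

```typescript
// Stubs standing in for the real entry points described in the post.
async function runBatch(): Promise<void> { /* scheduled crawl → dataset → alerts */ }
async function startMcpServer(): Promise<void> { /* Express server exposing the 3 MCP tools */ }

function selectMode(metaOrigin: string | undefined): "standby" | "batch" {
  return metaOrigin === "STANDBY" ? "standby" : "batch";
}

async function main(): Promise<void> {
  if (selectMode(process.env.APIFY_META_ORIGIN) === "standby") {
    await startMcpServer();
  } else {
    await runBatch();
  }
}

void main();
```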

Add this to your Claude Desktop config:

{
  "mcpServers": {
    "watch-arbitrage": {
      "url": "https://kazkn--watch-arbitrage-mcp.apify.actor/mcp?token=apify_api_YOUR_TOKEN",
      "transport": "streamable-http"
    }
  }
}

…and Claude can answer questions like:

  • "Show me the biggest Patek Nautilus spreads from the last 24h."
  • "What's the median price of a Rolex Daytona 116500LN this week?"
  • "Find me listings for the AP Royal Oak 15500ST under $35K."

The MCP server reads the latest dataset populated by the batch crawler. Same compute, different access pattern. Costs the same as the batch alerts: $0.50 per arbitrage query (you only pay for the value-extracting view).


Pricing as a Pay-Per-Event Apify Actor

I went with PPE rather than per-runtime because dealers care about per-alert ROI, not per-CPU-second cost:

| Event | Charge | When |
| --- | --- | --- |
| actor-start | $0.05 | Once per scheduled run |
| reference-monitored | $0.01 | Per ref scanned across all platforms |
| apify-default-dataset-item | $0.001 | Per listing scraped |
| spread-alert-triggered | $0.50 | Primary event — only when a real arbitrage opportunity is dispatched |

Typical dealer profile (15 refs, hourly schedule):

  • Light usage (1-2 alerts/day): ~$15-30/month
  • Heavy usage (10+ alerts/day, 24/7 monitoring): ~$300-500/month

Compare to ChronoPulse at $500/mo flat regardless of signal volume. PPE means you pay for value extracted, not compute consumed. And the spending limit is respected — every charge call honors ACTOR_MAX_TOTAL_CHARGE_USD so a runaway alert spike never blows past the cap.
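A spending-cap guard like the one described can be sketched as a running total that refuses any charge that would exceed the cap — hypothetical names and structure, not the Apify SDK's actual API:

```typescript
// PPE prices from the table above.
const PRICES_USD: Record<string, number> = {
  "actor-start": 0.05,
  "reference-monitored": 0.01,
  "apify-default-dataset-item": 0.001,
  "spread-alert-triggered": 0.5,
};

// Returns a function that approves a charge only if the running total
// stays at or under the cap (ACTOR_MAX_TOTAL_CHARGE_USD).
function makeChargeGuard(maxTotalUsd: number) {
  let spent = 0;
  return (event: string): boolean => {
    const price = PRICES_USD[event] ?? 0;
    if (spent + price > maxTotalUsd) return false; // cap reached: skip charge
    spent += price;
    return true; // caller proceeds with the actual charge call
  };
}

const canCharge = makeChargeGuard(1.0);
console.log(canCharge("actor-start"));            // true  (total $0.05)
console.log(canCharge("spread-alert-triggered")); // true  (total $0.55)
console.log(canCharge("spread-alert-triggered")); // false (would be $1.05)
```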


What I learned

  1. Ship public, then debug in production. I had test coverage. I had verified DOMs. Bugs surfaced anyway. The only way to find them was to run the actor with real user inputs at production scale.
  2. Defensive code rots. Bug #3 was a defense added to fix Bug #1 four weeks earlier. Today's defensive filter is tomorrow's data destroyer.
  3. try/catch/log is not error handling. It's burying the failure. Always propagate up to a layer that can act on the error.
  4. Per-platform crawler logic > universal crawler abstraction. Different sites have different DOM shapes, different anti-bot postures, different title formats. Pretending they're the same loses signal.
  5. Sub-reference grouping is the difference between a useful median and a misleading one. Generic "average price" tools don't do this. It's the entire moat.

Try it

Built by kazkn. If this approach (multi-source price arbitrage + Telegram alerts + MCP for AI agents) is useful in your domain, send me a note via Apify message — I'm building a portfolio of arbitrage actors across other verticals (sneakers, art, fine wine) using the same architecture.


