How I Built a Vinted Scraper That Survives Datadome (and 26 Country Redirects)
A reseller friend asked me a simple question last summer. "Can you write me something that watches when a Pokémon card drops on Vinted in Spain so I can buy it before someone in Italy snipes it?" I said yes. Two weeks of nights later I had something that worked on a single country and broke twice a day. Six months later it had run 97,900+ times across 26 EU markets, with 250+ active users and a 5.0 rating on Apify Store.
This is what I learned the hard way. I'm going to walk through the architecture, the Datadome strategy, the cross-country comparison engine that became the killer feature, and the bug that wiped out 60% of my MRR overnight. If you're thinking about scraping a Datadome-protected marketplace, this should save you about three months.
The actor lives here: Vinted Smart Scraper on Apify. Free tier covers ~9,000 results/month. Source patterns below are from the production codebase.
🎯 The problem in three lines
Vinted has no public API for indie devs. The Vinted Pro API requires a Pro account and manual approval from an internal allowlist that nobody outside enterprise gets onto. The public catalog is reachable in a browser, but every IP that sends more than ~3 requests in a row from a datacenter range gets flagged by Datadome inside 200ms. So you have a marketplace with €1.2B GMV, a real demand for cross-border arbitrage data, and zero programmatic access for everyone outside Vinted's own enterprise customers.
That's the gap. Let's fill it.
🧱 What "production-grade" actually means here
The first scraper I wrote was a requests.get() loop with a sleep. It died in 18 seconds. The second version added a residential proxy and a real User-Agent. It died in 4 minutes. The third version had a Playwright browser and a working JA3 fingerprint. It died after 80 requests because Vinted ramped its detection.
The thing that finally worked is not a single trick. It's six layers stacked. In rough order of impact:
- Residential or mobile IPs, rotating per session, never reused inside the Datadome cookie window
- Real browser fingerprint — TLS handshake, JA3 hash, Accept-Language matching the geo of the IP, viewport, navigator props
- Cookie persistence per session — reusing the Datadome challenge response as long as it's valid
- Geo-aware domain routing — `vinted.fr` is not the same backend as `vinted.de`, and the SPA payload differs
- Adaptive backoff — a 200 OK is not "you're good", it's "the layer above the bot detector said ok"
- Multi-mode architecture so a single user request can fan out across 5+ countries in parallel without burning sessions
If you only do (1) and (2), you'll get about 3-4 hours of uptime. If you do all six, you'll run for weeks. Below is how (3) through (6) actually look in code.
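Layer (5) is the one people skimp on, so here's a minimal sketch of the rate-shaping logic. The helper names and thresholds (`humanDelayMs`, `backoffMs`, the 60-request pause interval) are illustrative assumptions, not the actor's real code:

```typescript
// Illustrative sketch of layer (5): human-ish pacing plus exponential backoff.
// Helper names and thresholds are assumptions, not the actor's actual code.

/** Random delay in a human-ish 0.8-2.5s band, with a long pause every ~60 requests. */
export function humanDelayMs(requestCount: number): number {
  const base = 800 + Math.random() * 1700; // 0.8s-2.5s jitter, never uniform
  const longPause = requestCount > 0 && requestCount % 60 === 0;
  return longPause ? base + 15_000 + Math.random() * 10_000 : base;
}

/** Exponential backoff once responses look suspicious (403s, or a 200 with an empty payload). */
export function backoffMs(consecutiveSuspicious: number): number {
  const capped = Math.min(consecutiveSuspicious, 6); // cap the exponent at 2^6
  return 1000 * 2 ** capped;
}
```

The point the list makes applies here: a 200 with an empty or truncated payload should increment the suspicion counter just like a 403 does.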
🔄 Mode design — why I split the actor into 7 input modes
This was the design decision that made the codebase tractable. Instead of one giant scrape() function with branches, the actor accepts a mode field and dispatches to one of seven self-contained handlers:
```typescript
// vinted-actor/src/main.ts
type ScrapeMode =
  | "SEARCH"          // catalog search with filters
  | "ITEM_DETAIL"     // single item URL → full payload
  | "SELLER_PROFILE"  // seller URL → reviews, items, ratings
  | "CROSS_COUNTRY"   // 1 query × N countries → median/min/max
  | "PRICE_TRACK"     // recurring snapshot of items
  | "SOLD_ITEMS"      // recently sold, with realized price
  | "TRENDING";       // by favorites / engagement

const handlers: Record<ScrapeMode, Handler> = {
  SEARCH: searchHandler,
  ITEM_DETAIL: itemDetailHandler,
  SELLER_PROFILE: sellerHandler,
  CROSS_COUNTRY: crossCountryHandler,
  PRICE_TRACK: priceTrackHandler,
  SOLD_ITEMS: soldItemsHandler,
  TRENDING: trendingHandler,
};

export async function main(input: ActorInput) {
  const mode = input.mode ?? "SEARCH";
  const handler = handlers[mode];
  if (!handler) throw new Error(`Unknown mode: ${mode}`);
  return handler(input);
}
```
Splitting like this paid off twice. First, each mode evolves independently — when Vinted broke their item-detail SPA layout in v1.0.59, I patched only itemDetailHandler without touching the other six. Second, the input schema in Apify's UI becomes a clean dropdown, which is what 90% of non-dev users actually want.
```json
{
  "mode": "CROSS_COUNTRY",
  "query": "nike air max 90",
  "countries": ["fr", "de", "es", "it", "nl"],
  "maxItemsPerCountry": 200
}
```
That input fans out into 5 parallel searchHandler calls under the hood, then a reducer in crossCountryHandler aggregates the results. Which leads to the killer feature.
🌍 The cross-country engine — what made the actor worth paying for
The single mode that turned this from a side project into something resellers paid for is CROSS_COUNTRY. It runs the same query across multiple Vinted markets, normalizes prices via the European Central Bank reference rate, and returns one summary object with bestBuyCountry, bestSellCountry, and arbitrageSpread.
Here's roughly what the reducer does:
```typescript
// vinted-actor/src/routes/cross-country.ts
export async function crossCountryHandler(input: CrossCountryInput) {
  const { query, countries } = input;

  // Fan out — each country runs in parallel with its own session
  const results = await Promise.all(
    countries.map((c) =>
      runSearch({ query, country: c, maxItems: input.maxItemsPerCountry ?? 100 })
    )
  );

  // Normalize each country's prices to EUR via ECB rate
  const normalized = results.map((r, i) => ({
    country: countries[i],
    items: r.items,
    medianEur: medianPriceInEur(r.items),
    avgEur: avgPriceInEur(r.items),
    minEur: minPriceInEur(r.items),
    maxEur: maxPriceInEur(r.items),
    sampleSize: r.items.length,
  }));

  // Find the spread
  const cheapest = minBy(normalized, (n) => n.medianEur);
  const dearest = maxBy(normalized, (n) => n.medianEur);
  const spread = ((dearest.medianEur - cheapest.medianEur) / cheapest.medianEur) * 100;

  return {
    query,
    summary: {
      bestBuyCountry: cheapest.country,
      bestSellCountry: dearest.country,
      arbitrageSpread: `${spread.toFixed(1)}%`,
      sampledAt: new Date().toISOString(),
    },
    countries: normalized,
  };
}
```
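The reducer leans on price helpers like `medianPriceInEur`. The real actor pulls live ECB reference rates; this sketch hardcodes an illustrative snapshot (the GBP/PLN/CZK numbers are placeholders, not live rates) so the normalization logic is visible:

```typescript
// Sketch of the price helpers the reducer calls. The rate table is an
// illustrative snapshot; the real actor fetches ECB reference rates.
const ECB_EUR_RATES: Record<string, number> = {
  EUR: 1,
  GBP: 1.17, // placeholder, not a live rate
  PLN: 0.23,
  CZK: 0.04,
};

interface Item { price: number; currency: string; }

export function toEur(item: Item): number {
  const rate = ECB_EUR_RATES[item.currency];
  if (rate === undefined) throw new Error(`No EUR rate for ${item.currency}`);
  return item.price * rate;
}

export function medianPriceInEur(items: Item[]): number {
  const prices = items.map((it) => toEur(it)).sort((a, b) => a - b);
  const mid = Math.floor(prices.length / 2);
  // Even count: average the two middle values
  return prices.length % 2 ? prices[mid] : (prices[mid - 1] + prices[mid]) / 2;
}
```

Median rather than mean matters here: a handful of delusional €400 listings would otherwise drag a country's "price level" far above what anything actually sells for.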
Real output for "nike air max 90" last week:
{
"query": "nike air max 90",
"summary": {
"bestBuyCountry": "es",
"bestSellCountry": "it",
"arbitrageSpread": "78%",
"sampledAt": "2026-04-26T14:08:31Z"
},
"countries": [
{ "country": "es", "medianEur": 28, "sampleSize": 1240 },
{ "country": "de", "medianEur": 35, "sampleSize": 980 },
{ "country": "gb", "medianEur": 40, "sampleSize": 1120 },
{ "country": "fr", "medianEur": 42, "sampleSize": 1830 },
{ "country": "it", "medianEur": 50, "sampleSize": 660 }
]
}
A 78% spread between Spain and Italy on a single sneaker model, with sample sizes north of 600 per country. That kind of signal does not exist on Vinted's own UI. Resellers buy this report, fly to Madrid for a weekend, ship from Milan. The actor pays for itself on a single trip.
🛡️ The Datadome layer — what actually works
I need to be careful about how detailed I get here, because every public post on Datadome bypass leaks the bypass faster. The high-level shape, which you can find in any honest scraping discussion in 2026:
- Datacenter IPs are dead. Anything routed through AWS, GCP, Azure, OVH, Hetzner gets fingerprinted in the first 1-2 requests. Use residential or mobile proxies.
- TLS fingerprint matters more than User-Agent. A real `User-Agent: Chrome/124` with a Python-style JA3 hash is a dead giveaway. You either use a curl-impersonate-like client (`curl_cffi`, `tls-client`) or a real browser engine (Playwright with `chromium-stealth`, Camoufox, SeleniumBase UC mode).
- Rate is human, not constant. Spacing requests at 0.8-2.5s with jitter survives much longer than a uniform 1s. Long pauses every 50-80 requests survive even longer.
- Sessions are sticky per IP. Once you have a Datadome `dd_cookie` accepted, reuse that session for as many requests as it'll allow you (usually 50-200) before rotating IP.
What I do specifically inside the actor: lean on the Apify proxy infrastructure for residential rotation, run a headless engine with a known-good fingerprint, persist the Datadome cookie in the actor's key-value store between runs, and apply per-domain backoff. None of those four are individually clever. The combination is what survives.
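Here's a sketch of the cookie-persistence piece, with the Apify key-value store abstracted behind a minimal interface so the logic is visible. The TTL, the 150-request cap, and all the names (`SessionRecord`, `getReusableSession`) are assumptions for illustration, not the actor's real values:

```typescript
// Hedged sketch of layer (3): reuse a warmed Datadome session until it expires
// or hits a request budget. TTL, cap, and names are illustrative assumptions.

interface KV {
  getValue<T>(key: string): Promise<T | null>;
  setValue(key: string, value: unknown): Promise<void>;
}

interface SessionRecord {
  datadomeCookie: string;
  obtainedAt: number;   // epoch ms when the challenge was passed
  requestsUsed: number;
}

const COOKIE_TTL_MS = 30 * 60 * 1000; // assumed validity window
const MAX_REQUESTS = 150;             // rotate before Datadome forces it

export async function getReusableSession(store: KV, domain: string): Promise<SessionRecord | null> {
  const rec = await store.getValue<SessionRecord>(`dd-session-${domain}`);
  if (!rec) return null;
  const expired = Date.now() - rec.obtainedAt > COOKIE_TTL_MS;
  const exhausted = rec.requestsUsed >= MAX_REQUESTS;
  return expired || exhausted ? null : rec;
}

export async function saveSession(store: KV, domain: string, rec: SessionRecord): Promise<void> {
  await store.setValue(`dd-session-${domain}`, rec);
}
```

In production the `KV` role is played by the actor's key-value store, which is what lets a warm session survive across separate runs.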
💸 The pricing model evolution — and the bug that wiped 60% of MRR
The actor launched on Pay-Per-Result (PPR): users pay a flat $0.0005 per item returned. It was simple, the math was honest, and it worked for three months.
Then on April 22, 2026, my revenue cliffed by 60% in 24 hours. The dashboard graphs all turned south at the same time. Every reasonable hypothesis pointed at me: I had pushed a pricing change, I had broken a build, I had angered an algorithm.
What actually happened is one user — let's call them trim_kit — had reported an ITEM_DETAIL bug on April 13 that I responded to five days late, by which point they had already churned. trim_kit was a power user running 800+ runs a day. When they left, the curve dropped by exactly the percentage their usage represented.
Two lessons. First, revenue concentration is a silent risk in pay-per-use. With a flat free tier of $5 Apify credits, most users sit in the long tail at <50 results/month. A single power user at 800 runs/day represents a meaningful chunk of MRR, and you don't see it in the average.
Second, bug-report triage is part of the unit economics, not a customer-success cost. I now respond to Apify Console issues within 24 hours and post a public ETA even if the fix takes a week. Nothing stops a power user from churning silently like silence.
I also moved the actor from PPR to Pay-Per-Event (PPE) on May 12, with a $0.018 actor-start fee and $0.0005 per result. The start fee covers the fixed cost of provisioning a session and warming the Datadome handshake; the per-result fee is what users actually want to pay for. Apify supports tier discounts (Bronze -10%, Silver -15%, Gold -25%), so heavy users still get their volume break.
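Concretely, the PPE math looks like this — the fee constants are the ones quoted above, the tier table mirrors the Bronze/Silver/Gold breaks, and `estimateRunCostUsd` is an illustrative helper, not actor code:

```typescript
// Illustrative PPE cost calculator using the post's published numbers.
const START_FEE_USD = 0.018;    // per actor start
const PER_RESULT_USD = 0.0005;  // per item returned
const TIER_DISCOUNT = { FREE: 0, BRONZE: 0.10, SILVER: 0.15, GOLD: 0.25 } as const;

export function estimateRunCostUsd(
  results: number,
  tier: keyof typeof TIER_DISCOUNT = "FREE"
): number {
  const gross = START_FEE_USD + results * PER_RESULT_USD;
  return gross * (1 - TIER_DISCOUNT[tier]);
}
```

A 1,000-result run comes out to about $0.52 on the free tier and roughly $0.39 for a Gold user.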
🤖 Adding an MCP server — letting Claude run the scraper
The most fun extension was wiring a Model Context Protocol (MCP) server to the actor so that Claude Desktop or any MCP-compatible AI can run searches in plain English. The MCP wrapper lives in a sister actor, vinted-mcp-server, which exposes four tools to the LLM:
```typescript
// vinted-mcp-server/src/tools.ts
export const tools = [
  {
    name: "find_arbitrage_opportunities",
    description:
      "Compare prices for a product across N Vinted markets and return the best buy/sell countries.",
    inputSchema: { /* { query, countries[], minSpreadPct } */ },
  },
  {
    name: "monitor_keyword",
    description:
      "Subscribe to new Vinted listings matching a keyword. Triggers a webhook when items appear.",
    inputSchema: { /* { keyword, country, webhookUrl } */ },
  },
  {
    name: "analyze_seller",
    description:
      "Return reputation, inventory, and price-distribution analytics for a Vinted seller URL.",
    inputSchema: { /* { sellerUrl } */ },
  },
  {
    name: "get_item_details",
    description:
      "Return the full payload for a Vinted item URL including all photos and seller meta.",
    inputSchema: { /* { itemUrl } */ },
  },
];
```
After this shipped, my conversation with Claude turned into things like "check if Nike Air Force 1 is cheaper in Spain than in Italy and tell me how big the spread is" — and Claude calls find_arbitrage_opportunities, gets the JSON, and replies with a one-paragraph summary. That's the part of the future I'm most excited about: scrapers as ambient tools for AI agents, not standalone CLIs.
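If you want to wire this up yourself, one plausible route is Apify's generic MCP server for Claude Desktop. This config is a sketch, not tested against this specific actor — the `mcpServers` shape is Claude Desktop's documented format, and the `@apify/actors-mcp-server` invocation follows Apify's published pattern:

```json
{
  "mcpServers": {
    "vinted": {
      "command": "npx",
      "args": ["-y", "@apify/actors-mcp-server", "--actors", "kazkn/vinted-mcp-server"],
      "env": { "APIFY_TOKEN": "<your-apify-token>" }
    }
  }
}
```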
📊 What "production" looks like at 250+ users
Six numbers from the Apify dashboard at the time of writing:
- 97,900+ total runs since launch
- 250+ active users (run-in-30-days)
- 26 Vinted markets supported (FR, DE, GB, IT, ES, NL, PL, BE, AT, PT, CZ, SK, HU, RO, HR, FI, DK, SE, EE, LT, LV, SI, GR, IE, LU, US)
- 5.0 / 5 average rating, n=3 reviews (small but unanimous)
- 6 seconds median run duration on a 5-result query
- $0.018 + $0.0005/result current pricing — about $0.02 per run start plus $0.50 per 1,000 items
Run cost compares well: V-Tools is €80/month flat with weekend delays; Bright Data starts at $0.001/record but with enterprise-only onboarding; ScrapingBee is ~$0.005/request. For variable-volume users below 50K items/month, this actor is the cheapest path to clean Vinted data, full stop.
🧠 What I'd do differently
If I started this over today:
- Ship `CROSS_COUNTRY` first, not last. It was the killer feature, buried until version 1.0.40. I should have made it the demo on day one.
- Triage Apify Console issues like production incidents. Set a 24h SLA, post a public ETA, refund proactively when a regression bites a paying user.
- Track concentration risk explicitly. A daily "top-10 users by run count" sanity check would have caught the trim_kit churn signal before the revenue cliff.
- Multilingual README from week one. Vinted's three biggest markets are FR, DE, IT — all non-English. The 5-language README I shipped in week 22 should have been week 1.
- Lean harder on Apify's free tier as the on-ramp. $5/month of platform credits gets a new user ~9,000 results free with this actor. That's a generous trial, and most users either stay free or convert in their second month.
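The concentration-risk check from the list above is about ten lines. A sketch with hypothetical names (`RunStat`, the 40% threshold) that I don't claim match any real dashboard:

```typescript
// Hypothetical daily concentration check: flag when one user is an outsized
// share of total runs. Types, names, and the threshold are illustrative.
interface RunStat { userId: string; runs: number; }

export function topUserShare(stats: RunStat[]): { userId: string; sharePct: number } {
  const total = stats.reduce((sum, r) => sum + r.runs, 0);
  const top = stats.reduce((a, b) => (b.runs > a.runs ? b : a));
  return { userId: top.userId, sharePct: (top.runs / total) * 100 };
}

export function concentrationAlert(stats: RunStat[], thresholdPct = 40): string | null {
  const { userId, sharePct } = topUserShare(stats);
  return sharePct >= thresholdPct
    ? `User ${userId} is ${sharePct.toFixed(0)}% of daily runs — churn here is a revenue cliff`
    : null;
}
```

Run against the April numbers in this post, a user doing 800 runs/day in a small pool would have tripped this alert months before the cliff.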
🎬 If you want to try it
The actor is live at apify.com/kazkn/vinted-smart-scraper. Free Apify account → click Try for free → pick a mode → enter a query → click Start. ~30 seconds end-to-end on the cross-country mode against 5 countries.
If you build something interesting on top of it (especially with the MCP server for AI agents), I'd love to see it. Drop a comment or open an issue on the actor — I read everything that lands.
— kazkn