KazKN

Posted on

The Vinted Arbitrage War: Building a Scraper That Doesn't Get IP-Banned

War diary. Real failures. Real fix. No inspirational BS.


I'll be honest with you: I spent three weekends building a Vinted scraper that worked exactly once.

The second time I ran it, I got a 403. The third time, my entire datacenter IP range was silently blacklisted. By the fourth weekend, I wasn't writing Python anymore — I was googling "residential proxy Vinted" at 2am and reading forum posts from people who had clearly given up.

This is the story of how I tried to build an arbitrage pipeline for Vinted, why everything broke, what I learned from every failure, and why I eventually stopped reinventing the wheel.

If you're a dev or data engineer trying to extract data from Vinted for price monitoring, cross-border arbitrage, or inventory hunting — read this before you repeat my mistakes.


Why Does My Vinted Scraper Get Blocked? (The Bot Protection Problem)

Vinted is not a public API. It never was. Their internal API is undocumented, versioned inconsistently, and protected by a stack of bot-detection layers that get more aggressive every quarter.

The first mistake most devs make — including me — is treating Vinted like a simple HTML scrape target.

It's not. The listing data is rendered client-side via API calls. So step one is: forget BeautifulSoup. You're not scraping HTML. You're reverse-engineering JSON endpoints that change without warning.

That discovery costs you your first weekend.

Then come the real problems.

Datacenter IPs vs. Residential Proxies

Vinted's bot protection stack fingerprints requests at multiple layers:

  • IP reputation — Datacenter IPs (AWS, GCP, Hetzner, OVH) are flagged by default. Even if your request headers look perfect, your IP subnet is already on a deny list.
  • TLS fingerprinting — Python's requests library and standard httpx produce a TLS fingerprint that differs from a real browser. Vinted (like most modern platforms) uses JA3/JA4 fingerprinting to detect non-browser clients.
  • Behavioral analysis — Consistent intervals, identical User-Agents, no mouse events, no referrer chains. Bots don't browse like humans. Vinted knows.

I tested this empirically. Same request, two IP sources:

| IP Type | Result |
| --- | --- |
| Hetzner datacenter IP | 403 after 12 requests |
| Residential (France) | 200, stable for ~2 hours |
| Residential (wrong country) | 200 for 30 min, then soft-block |

The pattern is clear: residential proxies extend your session, but they're not a silver bullet if you're hitting the wrong geo.

Which brings us to the next problem.

The Geo-Blocking Nightmare (19 European Countries)

Vinted operates across 19 European markets. Each market has its own subdomain:

  • www.vinted.fr (France)
  • www.vinted.de (Germany)
  • www.vinted.pl (Poland)
  • www.vinted.es (Spain)
  • www.vinted.it (Italy)
  • …and 14 more

Here's what they don't tell you: these aren't just URL namespaces. The platform actively routes and restricts access based on your IP's geolocation. Making a request to vinted.de from a French IP? You'll get content, but category filtering, pricing, and shipping data will be inconsistent. Making the same request from a US IP? Silent redirect or empty results.

For Vinted arbitrage, this is catastrophic. The entire point is to monitor price gaps between markets — buy cheap in Poland, sell at a markup in France. If you can't reliably query both markets from a consistent perspective, your arbitrage pipeline is producing garbage data.

To do real cross-border arbitrage monitoring, you need:

  1. A residential proxy pool with per-country targeting
  2. Session management per country (cookies, tokens, headers)
  3. Rate limiting per country (they're independent backends)

I built this. It took two more weekends. It worked for about a week, then Vinted updated their token rotation and the whole thing broke again.


The Pagination Bottleneck and Rate Limits

Even when you solve the IP problem, Vinted's pagination is its own circle of hell.

Vinted's search API uses cursor-based pagination. That sounds fine until you realize:

  • The cursor becomes invalid after a few minutes
  • Deep pagination (page 10+) triggers additional verification
  • Rate limits are per-IP, not per-session, so rotating proxies too fast creates a new problem: you lose cursor continuity

Here's what a naive pagination loop looks like:

```python
import time
import random

import requests

session = requests.Session()
cursor = None
while True:
    response = session.get(
        "https://www.vinted.fr/api/v2/catalog/items",
        params={"search_text": "Nike Dunk Low", "cursor": cursor},
        headers=headers,   # browser-like headers, built elsewhere
        proxies=proxy,     # residential proxy config, built elsewhere
    )
    data = response.json()
    items = data.get("items", [])
    process(items)
    cursor = data.get("pagination", {}).get("next_cursor")
    if not cursor:
        break
    time.sleep(random.uniform(1.5, 3.5))
```

This works for the first two or three pages. Then one of these happens:

  • cursor expires mid-loop → restart from page 1, duplicates everywhere
  • Rate limit kicks in → 429, your proxy is flagged
  • The session token rotates → 401, re-auth required
  • The item count returns 0 unexpectedly → silent failure, you stop early without knowing

Handling all of this correctly requires retry logic, session refresh, cursor state management, and dead-letter queuing for failed pages. You're not writing a scraper anymore. You're writing a distributed job queue.
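The retry-plus-dead-letter part of that can be sketched in a few lines. This is illustrative only: `fetch_page` is a stand-in for whatever does the real HTTP call, and the function names are mine:

```python
import time


def fetch_with_retries(fetch_page, cursor, max_retries=3, backoff=1.5,
                       dead_letters=None, sleep=time.sleep):
    """Retry one page with exponential backoff; park it on failure.

    `fetch_page` stands in for the real HTTP call: it returns a dict
    on success and raises on 401/429/expired cursor.
    """
    for attempt in range(max_retries):
        try:
            return fetch_page(cursor)
        except Exception:
            sleep(backoff ** attempt)  # 1, 1.5, 2.25 ... seconds
    # All retries burned: record the cursor so a later run can replay it
    # instead of silently losing the page.
    if dead_letters is not None:
        dead_letters.append(cursor)
    return None
```

The dead-letter list is the part people skip, and it's the difference between "my dataset has mystery holes" and "I know exactly which pages failed last night."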

Why Asynchronous Requests Get You Banned Faster

The logical next step: "Let me make this faster with async."

Wrong move.

I tried asyncio + aiohttp to parallelize requests across multiple proxies. Result: banned in under 10 minutes across my entire proxy pool.

Here's why: async request patterns are a strong behavioral signal. Even with random delays, the statistical distribution of request timing differs from human browsing. Vinted's rate limits aren't just about volume — they're about velocity and pattern regularity.

The sweet spot for staying undetected is low concurrency (1-2 simultaneous sessions per country), with jittered delays drawn from distributions that approximate real browsing behavior. Not uniform random — actual distributions skewed toward longer pauses.
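A uniform `random.uniform(1.5, 3.5)` has the wrong shape. Human browsing gaps are right-skewed — many short pauses, an occasional long "reading" pause — which a lognormal approximates reasonably well. A sketch; the parameters are my guesses, not anything reverse-engineered from Vinted:

```python
import random


def human_delay(min_s=1.0, mu=1.0, sigma=0.6):
    """Right-skewed pause in seconds: most waits land near
    min_s + exp(mu), with a long tail of multi-second pauses."""
    return min_s + random.lognormvariate(mu, sigma)
```

Swap this in for the `time.sleep(random.uniform(...))` call and the inter-request timing histogram stops looking like a machine drew it.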

By the time I had tuned this correctly, my "fast scraper" was running at 40 items/minute per country. At that rate, monitoring 5 categories across 5 countries would take hours per full cycle.

This is not a pipeline. This is a liability.


What Would a Production Vinted Scraper Actually Require?

Let me describe the architecture I eventually sketched out before I abandoned it.

What you'd need to build a production-grade Vinted scraper from scratch:

  1. Residential proxy pool — country-targeted, with session stickiness per market. Cost: €80–200/month minimum for a reliable pool.
  2. Browser fingerprint spoofing — TLS client hello matching a real browser, User-Agent rotation, Accept-Language matching the target country.
  3. Session management — Vinted requires a guest token for API access. This token has to be obtained via a browser-like flow, then refreshed on a schedule. Store per-country, rotate on 401.
  4. Rate limit tracker — per-country, per-proxy, with exponential backoff and cool-down periods.
  5. Cursor state manager — checkpointing pagination cursors so a failure mid-run doesn't restart from scratch.
  6. Schema parser — Vinted's API response schema changes. You need a versioned parser with fallback fields.
  7. Output pipeline — JSON normalization, deduplication, webhook delivery (Discord, n8n, Make, custom endpoints).
  8. Monitoring & alerting — because this thing will break silently at 3am.
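Item 5 — cursor checkpointing — is the piece that saves you from restarting at page 1 after every failure. A minimal sketch; the file format and key scheme are my own invention:

```python
import json
from pathlib import Path


class CursorCheckpoint:
    """Persist the last good cursor per (country, query) so a crashed
    run resumes where it died instead of re-scraping from page 1."""

    def __init__(self, path):
        self.path = Path(path)
        self.state = (
            json.loads(self.path.read_text()) if self.path.exists() else {}
        )

    def save(self, country, query, cursor):
        self.state[f"{country}:{query}"] = cursor
        self.path.write_text(json.dumps(self.state))

    def resume_from(self, country, query):
        return self.state.get(f"{country}:{query}")
```

One caveat from the pagination section above: Vinted's cursors expire after minutes, so a checkpoint is only good for resuming a recent crash, not yesterday's run.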

Estimated build time: 4–6 weeks for a competent backend dev. Estimated maintenance: ongoing, every time Vinted updates their anti-bot stack.

And here's the part nobody tells you: even if you build all of this, you're in a constant arms race. Vinted has an entire security team whose job is to break your scraper. You're one person.

I'm a developer. I understand the appeal of building this from scratch. I felt it too. But at some point, I did the math.

My time × its value vs. the cost of a working solution = I should stop building this.


The Reality Check: DIY vs Managed Scraper

| Approach | Monthly Cost | Setup Time | Maintenance |
| --- | --- | --- | --- |
| DIY + residential proxies | €80–200 | 6+ weekends | Ongoing |
| Apify Vinted Smart Scraper | ~€5–15 | 4 hours | None |

Enter the Vinted Smart Scraper on Apify

(Note: This actor is actively maintained by KazKN and consistently updated against Vinted's latest anti-bot patches.)

After my third rebuild, a colleague pointed me to the Vinted Smart Scraper on Apify.

I was skeptical. I'd seen plenty of half-baked scrapers on Apify that stopped working after a month. But I read the documentation and ran a test, and here's what I found:

The Vinted Smart Scraper is built specifically for the problems I spent three weekends failing to solve. It's not a generic scraper with Vinted bolted on. It's purpose-built for the Vinted API surface, with the infrastructure already handled.

You pass in a search query, target category, and country. The actor handles:

  • Proxy rotation and country-targeting automatically
  • Session token management
  • Pagination across the full result set
  • Clean JSON output per item (title, price, condition, photos, seller info, URL, shipping)

The result set is consistent. The pagination completes. The data is structured.

Here's what a basic run config looks like:

```json
{
  "searchQuery": "Nike Dunk Low",
  "country": "FR",
  "maxItems": 500,
  "condition": ["Neuf avec étiquette", "Très bon état"]
}
```

And you get back clean, normalized items ready for your pipeline.

No proxy configuration. No session management. No cursor headaches.

Built-in Geo-Routing for European Arbitrage

The key feature for cross-border arbitrage: the Vinted Smart Scraper supports multi-country runs. You can run parallel actors per country and get consistent, comparable data across markets.

This is the part I couldn't reliably build myself. Geo-routing residential proxies to match Vinted's market segmentation is infrastructure-level work. It's solved here.

For a sneaker arbitrage use case, a typical setup looks like:

  1. Run actor for country: PL (Poland) — find underpriced listings
  2. Run actor for country: FR (France) — check equivalent items
  3. Compare price delta in n8n or a simple Python script
  4. Trigger Discord webhook alert when spread exceeds threshold

The Vinted Smart Scraper handles steps 1 and 2. You only have to write the delta logic.
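The delta logic from step 3 is short. A sketch, assuming items shaped like the actor's output (`title`, `size`, `price` fields) and a crude title match — a real matcher would key on brand + model + size, not raw titles:

```python
def find_spreads(cheap_market, rich_market, min_spread=0.30):
    """Pair listings across two markets by a naive (title, size) key
    and keep those where the buy-side price sits at least `min_spread`
    below the cheapest comparable sell-side price."""
    sell_side = {}
    for item in rich_market:
        key = (item["title"].lower(), item["size"])
        # Keep the lowest comparable sell price per key.
        if key not in sell_side or item["price"] < sell_side[key]:
            sell_side[key] = item["price"]

    deals = []
    for item in cheap_market:
        key = (item["title"].lower(), item["size"])
        sell_price = sell_side.get(key)
        if sell_price and item["price"] <= sell_price * (1 - min_spread):
            deals.append({**item, "sell_price": sell_price,
                          "spread": 1 - item["price"] / sell_price})
    return deals
```

Feed it the PL run as `cheap_market` and the FR run as `rich_market`, and everything it returns is a candidate for an alert.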

Clean JSON Output for n8n and Discord Webhooks

The output schema is consistent and webhook-ready. A sample item output:

```json
{
  "id": "4829301",
  "title": "Nike Dunk Low Retro White Black",
  "price": 68.00,
  "currency": "EUR",
  "condition": "Très bon état",
  "size": "42",
  "brand": "Nike",
  "photos": ["https://images.vinted.net/..."],
  "url": "https://www.vinted.fr/items/4829301",
  "seller": {
    "username": "sneaker_seller_42",
    "rating": 4.9,
    "total_reviews": 230
  },
  "shipping_options": [...],
  "created_at": "2025-05-15T10:23:00Z"
}
```

This pipes directly into n8n HTTP nodes, Discord embed webhooks, or Google Sheets via any standard automation platform. No parsing gymnastics. No field guessing.

For those of us running lightweight arbitrage monitors on n8n or Make: this is the missing data source that makes the rest of the automation trivial.
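For the Discord leg, the item fields map straight onto a webhook embed. A sketch: the payload shape follows Discord's standard webhook embed format, and the `item` field names assume the output schema above:

```python
def deal_to_discord_payload(item):
    """Turn one scraped listing into a Discord webhook embed payload.
    Field names on `item` assume the actor's output schema."""
    embed = {
        "title": f"{item['title']} ({item['price']:.2f} {item['currency']})",
        "url": item["url"],
        "fields": [
            {"name": "Size", "value": str(item["size"]), "inline": True},
            {"name": "Condition", "value": item["condition"], "inline": True},
        ],
    }
    # Thumbnail is optional; only attach it when a photo exists.
    if item.get("photos"):
        embed["thumbnail"] = {"url": item["photos"][0]}
    return {"embeds": [embed]}
```

POST the returned dict as JSON to your webhook URL — an n8n HTTP node or a plain `requests.post` both work.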


Frequently Asked Questions

Why does my Vinted scraper get a 403 Forbidden error?
Because datacenter IPs are blocked by default, and because non-browser HTTP clients (Python's requests, httpx) produce a TLS fingerprint that doesn't match a real browser. You need residential proxies and a client that mimics a browser's TLS handshake.

Does Vinted have an official public API?
No. They use an undocumented internal API for their frontend. Scraping requires reverse-engineering this API and managing guest tokens.

Is it legal to scrape Vinted?
Scraping publicly accessible data is generally lawful in the EU and US, but automated access violates Vinted's Terms of Service. If you build an automated system, use it responsibly and check the regulations that apply to you — this isn't legal advice.

Stop Reinventing the Wheel, Start Scaling

Here's what I actually run now:

  • Vinted Smart Scraper scheduled on Apify (every 4 hours, 5 countries, 3 search queries each)
  • n8n to receive the output webhook, compute price deltas, filter by condition
  • Discord webhook to push alerts when a deal meets criteria (min 30% below market average)
  • Google Sheets for deal history and price trend visualization

Total setup time: ~4 hours. Total recurring cost: a few euros/month of Apify compute credits.

Compare that to: 6+ weekends of my time, ongoing proxy costs (€80+/month), and a system that breaks every time Vinted updates their anti-bot.

The math is not close.


What I Would Have Told Myself at Weekend One

If you're starting this journey now:

  1. Don't start with raw HTTP requests. Vinted's bot protection will end you inside an hour on datacenter IPs.
  2. Residential proxies help, but geo-targeting matters. A UK residential proxy cannot reliably query vinted.de without soft-blocks.
  3. Async is a trap. High velocity is suspicious. Vinted's defenses aren't just rate limits — they're behavioral fingerprinting.
  4. Pagination is stateful. Cursor expiry plus proxy rotation equals data loss at scale.
  5. Your time has a cost. Building this yourself is an option. Maintaining it forever is the hidden cost.

The Vinted Smart Scraper on Apify is what I wish had existed at weekend one. It doesn't just scrape Vinted — it solves the infrastructure problem that makes Vinted scraping a grind in the first place.


Final Verdict

If you're building a Vinted arbitrage system, a price monitor, or any kind of Vinted web-scraping pipeline: the bottleneck isn't your logic. It's the IP ban, the geo-block, the cursor expiry, and the rotating tokens.

Solve the data access layer first. The Vinted Smart Scraper does exactly that.

Use it as your foundation. Build the arbitrage logic on top. Ship it in a weekend.

Get the Vinted Smart Scraper on Apify


Tags: #webdev #python #dataengineering #webscraping #vinted #sideproject

