The Hybrid Vinted Scraping Architecture That Outperforms Pure Browser Crawls
When you scrape Vinted at scale, you quickly hit a wall.
Not a firewall metaphor. A literal one. Datadome. Cloudflare. Aggressive rate limits. Token rotation that invalidates your session mid-crawl. And if you are still running headless Chromium for every single request, you are burning proxy credits and clock cycles for no reason.
After months of iteration — and enough failed runs to fill a datacenter — the architecture that actually works is hybrid: use a real browser only where Vinted forces you to, then switch to lightweight HTTP for the actual data extraction.
This is how Vinted Turbo Scraper implements that hybrid model, what makes it faster than pure-browser approaches, and why the architecture is the real product.
Why Pure Browser Crawling Is a Trap
Most tutorials tell you to fire up Playwright or Puppeteer, navigate to a Vinted search page, scroll endlessly, and extract DOM nodes. This works for five items. It collapses at scale.
Here is why:
| Problem | Browser-Only Impact |
|---|---|
| Proxy cost | Every image, font, and JS asset loads through your proxy. Bandwidth is not free. |
| Memory bloat | Chromium instances chew 200-500MB each. At concurrency 5, you are eating gigabytes. |
| Fingerprint fatigue | Datadome profiles browser behavior. Repeating the same navigation pattern = flag. |
| Session decay | Cookies and tokens expire. A pure browser crawl does not gracefully re-authenticate. |
| Speed ceiling | Rendering a full React-powered catalog page takes 2-5 seconds. Per page. |
A pure browser crawl is not "robust." It is expensive, slow, and detectable.
The insight is simple: Vinted serves catalog data via an internal JSON API. Once you have a valid session cookie, you can query that API directly with HTTP requests. No rendering. No DOM traversal. No asset loading.
The challenge is getting that cookie in the first place.
The Hybrid Model: Browser for Session, HTTP for Extraction
Vinted Turbo Scraper uses a two-phase approach:
- Phase One: Session initialization via Playwright — Navigate to the target catalog page once, let Datadome validate the browser fingerprint, capture cookies, and grab the user agent string.
- Phase Two: HTTP API extraction via got-scraping — Use the captured session to fire lightweight JSON API requests, paginating through results at ~200 items per minute.
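The hand-off between the two phases comes down to serializing the cookies Playwright captured into a single Cookie header for phase two's HTTP requests. A minimal sketch of that glue (the helper name and cookie shape are assumptions for illustration, not the actor's actual code):

```typescript
interface CapturedCookie {
  name: string;
  value: string;
}

// Serialize browser-captured cookies into the single Cookie header
// value that the phase-two HTTP requests will send.
function toCookieHeader(cookies: CapturedCookie[]): string {
  return cookies.map((c) => `${c.name}=${c.value}`).join('; ');
}
```

For example, `toCookieHeader([{ name: 'datadome', value: 'abc' }])` yields `datadome=abc`, which is exactly the format the got-scraping requests in phase two expect.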
This is not theoretical. Here is how the crawler initialization blocks media assets to keep proxy usage minimal:
preNavigationHooks: [
  async ({ page }) => {
    await page.route('**/*', (route) => {
      const type = route.request().resourceType();
      // Block images, media, fonts to save proxy bandwidth
      if (['image', 'media', 'font'].includes(type)) {
        route.abort().catch(() => {});
      } else {
        route.continue().catch(() => {});
      }
    });
  }
]
By aborting image and font requests before they hit the proxy, we cut bandwidth consumption by roughly 70%. On metered residential proxies, that translates directly to cost savings.
Translating Vinted Search URLs into API Calls
Vinted search URLs encode filter parameters in query strings: catalog[], brand_id[], size_id[], color_id[], status[], and more.
The internal API expects these same values but with slightly different parameter names and array bracket syntax. The Turbo Scraper extracts and rewrites these parameters automatically:
function translateToApiUrl(urlStr: string, domain: string): string | null {
  let u: URL;
  try {
    u = new URL(urlStr);
  } catch {
    return null; // malformed input URL: let the caller skip it
  }
  const params = new URLSearchParams(u.searchParams);
  const arrayMaps: Record<string, string> = {
    'catalog[]': 'catalog_ids',
    'color_id[]': 'color_ids',
    'size_id[]': 'size_ids',
    'status[]': 'status_ids',
    'brand_id[]': 'brand_ids',
  };
  const STRIP = new Set([
    'search_id', 'time', 'search_by_image_uuid',
    'search_by_image_id', 'currency', 'page', 'per_page'
  ]);
  const apiParams = new URLSearchParams();
  const accumulated: Record<string, string[]> = {};
  for (const [k, v] of params.entries()) {
    if (STRIP.has(k)) continue;
    if (arrayMaps[k]) {
      if (!accumulated[arrayMaps[k]]) accumulated[arrayMaps[k]] = [];
      accumulated[arrayMaps[k]].push(v);
    } else {
      apiParams.set(k, v);
    }
  }
  // Critical fix: append brackets for multi-value arrays
  for (const [key, vals] of Object.entries(accumulated)) {
    for (const v of vals) apiParams.append(`${key}[]`, v);
  }
  return `https://www.${domain}/api/v2/catalog/items?${apiParams.toString()}`;
}
This translator is the bridge between the URL your user copies from their browser and the internal API endpoint that returns raw JSON. Without it, you would need users to manually map catalog IDs — which defeats the purpose of a "zero-config" scraper.
The Human-Friendly Mapping Layer
Vinted uses numeric IDs for filters. Users do not know that "Nike" maps to brand ID 53 or that "new with tags" maps to status ID 6.
The actor maintains internal dictionaries that resolve plain text to these IDs:
const BRAND_MAP: Record<string, number> = {
  'nike': 53, 'zara': 12, 'h&m': 7, 'adidas': 14,
  'levis': 10, 'ralph lauren': 88, 'calvin klein': 33,
  'guess': 35, 'puma': 15, 'vans': 16, 'converse': 17,
  'tommy hilfiger': 94, 'lacoste': 93, 'the north face': 114,
  'asics': 631, 'new balance': 267, 'carhartt': 362, 'dickies': 1007
};
const CONDITION_MAP: Record<string, number> = {
  'neuf avec étiquette': 6, 'new': 6, 'new_with_tags': 6,
  'neuf sans étiquette': 3, 'new_without_tags': 3,
  'très bon état': 2, 'very_good': 2,
  'bon état': 1, 'good': 1,
  'satisfaisant': 4, 'satisfactory': 4,
};
const SIZE_MAP: Record<string, number> = {
  '35': 54, '36': 55, '37': 56, '38': 57, '39': 58, '40': 59,
  '41': 60, '42': 61, '43': 62, '44': 63, '45': 64, '46': 65, '47': 66,
  'xxs': 205, 'xs': 206, 's': 207, 'm': 208, 'l': 209, 'xl': 210, 'xxl': 211
};
This lets users pass intuitive inputs like ["Nike", "Adidas"] or ["new", "very_good"] instead of reverse-engineering Vinted's internal taxonomy. The actor falls back to raw numeric IDs for anything not in the map, so power users are not constrained either.
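That lookup-with-fallback behavior can be sketched as a small resolver (hypothetical helper name; the map shape matches the dictionaries above):

```typescript
// Resolve a user-supplied token (e.g. "Nike" or "631") to a numeric
// Vinted ID. Known names hit the dictionary; anything else is tried
// as a raw numeric ID so power users are not constrained.
function resolveId(input: string, map: Record<string, number>): number | undefined {
  const key = input.trim().toLowerCase();
  if (key in map) return map[key];
  const n = Number(key);
  return Number.isInteger(n) && n > 0 ? n : undefined;
}
```

So `resolveId('Nike', BRAND_MAP)` returns 53, while `resolveId('631', BRAND_MAP)` passes the raw ID straight through.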
HTTP Extraction Loop: Where the Speed Lives
Once the session cookie is captured, the actor switches to got-scraping for the heavy lifting:
const res = await gotScraping({
  url: apiReqUrl,
  responseType: 'json',
  proxyUrl,
  headers: {
    'User-Agent': userAgent,
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7',
    'Cookie': cookieStr,
    'Referer': `https://www.${domain}/`,
    'X-Money-Object-Enabled': 'true',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-origin',
  },
  timeout: { request: 15000 }
});
The Sec-Fetch-* headers are not decoration. They signal to Vinted's edge that this is a same-origin AJAX request, not an external scraper. Combined with a matching Referer and the validated Cookie string, the request sails through.
Each page returns 96 items. The loop paginates until data.pagination.current_page >= data.pagination.total_pages or the maxItems limit is hit.
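The loop's termination logic can be sketched independently of the HTTP layer (a hypothetical helper; the pagination field names match the response shape described above):

```typescript
interface Pagination {
  current_page: number;
  total_pages: number;
}

interface CatalogPage {
  items: unknown[];
  pagination: Pagination;
}

// Page through the catalog until the API reports the last page
// or the maxItems cap is reached, whichever comes first.
async function collectItems(
  fetchPage: (page: number) => Promise<CatalogPage>,
  maxItems: number,
): Promise<unknown[]> {
  const out: unknown[] = [];
  let page = 1;
  while (out.length < maxItems) {
    const data = await fetchPage(page);
    out.push(...data.items.slice(0, maxItems - out.length));
    if (data.pagination.current_page >= data.pagination.total_pages) break;
    page += 1;
  }
  return out;
}
```

Injecting `fetchPage` as a callback keeps the pagination logic testable without a live session.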
Result: ~200 items per minute sustained, with a memory footprint under 512MB per worker.
Input Schema Deep Dive
The actor accepts minimal but precise JSON input. Here is the exact schema:
{
  "maxItems": 100,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  },
  "startUrls": "https://www.vinted.co.uk/catalog?catalog[]=1844&brand_ids[]=53&size_ids[]=207&status_ids[]=6&price_from=20&price_to=50&currency=GBP&order=price_low_to_high"
}
| Field | Type | Required | Description |
|---|---|---|---|
| startUrls | string or array | Yes | One or more Vinted search URLs. Supports batch processing. |
| maxItems | number | No (default: 100) | Cap on results per run. Use for cost control. |
| proxyConfiguration | object | No (recommended) | Defaults to Apify residential proxies. Essential for Datadome evasion. |
You can pass multiple URLs as a comma-separated string or an array of objects with url keys. The actor processes them sequentially in a single run, combining outputs into one unified dataset.
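That normalization step might look like this (a sketch, assuming only the two input shapes described above):

```typescript
// Accept either a comma-separated string of URLs or an array of
// { url } request objects, and return a flat list of URL strings.
function normalizeStartUrls(input: string | Array<{ url: string }>): string[] {
  if (typeof input === 'string') {
    return input.split(',').map((s) => s.trim()).filter((s) => s.length > 0);
  }
  return input.map((req) => req.url);
}
```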
Integration Patterns: From Scraper to Pipeline
Raw data is worthless without a destination. The actor integrates with Apify's ecosystem for downstream automation:
| Destination | Trigger | Use Case |
|---|---|---|
| Google Sheets | Apify integration | Live inventory tracking |
| Slack | Webhook | Alert team on new listings |
| Airtable | Zapier/Make bridge | Visual database for resellers |
| Custom API | Dataset webhook | Push to your own backend |
| CSV/Excel | Manual download | One-off market analysis |
For recurring monitoring, pair the actor with Apify Scheduler. Set it to run every 15 minutes against a filtered search URL and pipe results to a Slack channel or Google Sheet. You catch new listings before manual browsers refresh the page.
Real-World Performance Benchmarks
Here are observed numbers from production runs across different proxy tiers:
| Proxy Type | Speed | Reliability | Cost per 1k Items | Best For |
|---|---|---|---|---|
| Apify Proxy (Datacenter) | ~300 items/min | Low (blocks after ~500 items) | ~$0.30 | Quick tests |
| Apify Proxy (Residential) | ~200 items/min | High (rarely blocked) | ~$1.50 | Production runs |
| Custom Proxy | Variable | Depends on quality | Variable | Power users |
The residential proxy is the sweet spot: fast enough for real-time workflows, reliable enough for continuous monitoring, and priced predictably per result.
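At the residential tier's listed price, cost scales linearly with item count and is easy to budget (a trivial helper for illustration, not part of the actor):

```typescript
// Estimate run cost at the residential tier's ~$1.50 per 1,000 items.
function estimateCostUsd(items: number, perThousandUsd: number = 1.5): number {
  return (items / 1000) * perThousandUsd;
}
```

A 10,000-item run therefore comes out to roughly $15 before proxy bandwidth.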
Architecture Comparison: Browser vs Hybrid vs Pure HTTP
| Approach | Speed | Cost | Reliability | Complexity |
|---|---|---|---|---|
| Pure Browser | ~20-40 items/min | High (full asset load) | Medium (detectable patterns) | Low |
| Pure HTTP | ~300+ items/min | Minimal | Low (session requires bootstrapping) | High |
| Hybrid (Turbo) | ~200 items/min | Low (blocked assets) | High (session + retry logic) | Medium |
Pure HTTP is fastest on paper, but without a valid session cookie, every request returns a 403. The hybrid approach trades absolute speed for operational reliability — the metric that actually matters when you are running automated workflows.
When to Use Turbo vs Smart Scraper
Vinted Turbo Scraper is part of a two-tool ecosystem. Choose based on your use case:
| Feature | Turbo Scraper | Smart Scraper |
|---|---|---|
| URL-based input | Yes | No (form-based) |
| Batch URL processing | Yes | No |
| Cross-country comparison | No | Yes |
| Seller analysis | No | Yes |
| Sold items tracking | No | Yes |
| Trending discovery | No | Yes |
| Price monitoring | Yes | Yes (cross-border) |
| Speed | Faster | Slower (richer data) |
| Cost | Lower | Higher |
Use Turbo when you have a Vinted search URL ready and need structured data fast. Use Smart when you are doing deep market intelligence, seller profiling, or cross-country arbitrage.
Anti-Ban Mechanisms Beyond Proxies
Proxy rotation is table stakes. The actor adds three additional layers:
- Request fingerprint rotation via Crawlee — built-in proxy configuration rotates IPs per session.
- Aggressive retry with exponential backoff — maxRequestRetries: 5 with a 30-second handler timeout.
- Graceful session recycling — if an HTTP request fails with a 403, the Playwright session is refreshed before retry.
The output is a clean JSON schema with optional lightweight mode:
{
  "id": 8464268321,
  "title": "Levi black skinny jeans 33\" waist",
  "url": "https://www.vinted.co.uk/items/8464268321-levi-black-skinny-jeans-33-waist",
  "price": 20,
  "currency": "GBP",
  "brand": "Levi's",
  "size": "M / UK 12-14",
  "condition": "New with tags",
  "photos": ["..."],
  "favouriteCount": 1,
  "seller": {
    "id": 73959532,
    "username": "maxi83199",
    "profileUrl": "https://www.vinted.co.uk/member/73959532-maxi83199"
  },
  "scrapedAt": "2026-03-24T10:25:41.604Z"
}
Structured. Timestamped. Ready for pipelines.
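For downstream TypeScript consumers, the record above maps to a straightforward type (derived from the single sample shown, so field optionality is an assumption):

```typescript
interface VintedSeller {
  id: number;
  username: string;
  profileUrl: string;
}

interface VintedItem {
  id: number;
  title: string;
  url: string;
  price: number;
  currency: string;      // ISO currency code, e.g. "GBP"
  brand: string;
  size: string;
  condition: string;
  photos: string[];
  favouriteCount: number;
  seller: VintedSeller;
  scrapedAt: string;     // ISO-8601 timestamp
}
```

Typing the dataset at the boundary catches schema drift early if Vinted changes a field.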
FAQ: Technical Details
Q: Does this use headless browsers for every request?
A: No. Only for initial session bootstrap. Data extraction uses lightweight HTTP requests via got-scraping.
Q: How many items can I extract per run?
A: The maxItems parameter lets you cap runs. We have tested up to 10,000 items in a single run without memory issues.
Q: Is there a Vinted API this connects to?
A: Vinted does not offer a public API for catalog data. This actor acts as a practical alternative by reverse-engineering the internal endpoints.
Q: Will my IP get banned?
A: With residential proxies and the hybrid architecture, blocks are rare. The actor implements retry logic and session refresh for edge cases.
Q: Can I run this on a schedule?
A: Yes, via Apify Scheduler or cron triggers. Ideal for monitoring new listings.
Q: What output formats are available?
A: JSON (structured), CSV, Excel, or direct API export to integrations.
The Honest Bottom Line
No scraper is "unbannable." Platforms evolve. What the hybrid architecture buys you is time — time between Vinted deploying a new detection mechanism and you pushing an update.
Because this is packaged as an Apify Actor, that update propagates to every user instantly. No pip upgrade. No breaking dependency chains. No "works on my machine."
If you are still maintaining a custom Python Selenium script that breaks every two weeks, you are not scraping Vinted. You are debugging Vinted.
Switch to infrastructure that was built to survive the platform, not chase it.
Ready to extract Vinted data at scale?
- Actor: Vinted Turbo Scraper on Apify
- Pricing: $1.50 per 1,000 results. No subscription. Free plan covers thousands of items.
Questions about the architecture or want to integrate this into a pipeline? Drop a comment below.
