I built an AutoTrader.ca scraper for a friend and it broke three times before shipping

Camilo Aguilar — Wed, 17 Jun 2026 19:09:21 +0000

A couple months ago a friend hit me up. He needed structured car data from AutoTrader.ca. Prices, specs, dealer contacts. He was building something around used car leads in Canada and wanted a clean feed of listings he could work with.

I said sure, how hard could it be.

Famous last words.

I'm mainly an iOS developer. Python scraping isn't really my day job, but I've done enough side projects to be comfortable with httpx and BeautifulSoup. A car listing site shouldn't be complicated, right?

First issue: half the examples I found online were outdated. AutoTrader.ca had quietly migrated to a new backend. They're now running on AutoScout24's infrastructure and the whole site is Next.js SSR. Old scrapers were hitting window['ngVdpModel'], which doesn't exist anymore. Just gone.

So I spent a couple days figuring out what does exist. Turns out Next.js inlines all the page data into a <script id="__NEXT_DATA__"> tag on every page. Every listing, every price, all the dealer info and equipment lists. It's sitting right there as JSON. You don't even have to parse HTML beyond finding that one script tag.

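import json

from bs4 import BeautifulSoup

# html is the raw page source you already fetched (e.g. with httpx)
soup = BeautifulSoup(html, "html.parser")
tag = soup.find("script", {"id": "__NEXT_DATA__"})
if tag is None or not tag.string:
    # the page layout changed, or you got a block page instead of a listing
    raise ValueError("__NEXT_DATA__ script tag not found")
data = json.loads(tag.string)
# everything you need is under data["props"]["pageProps"]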

Once I found that pattern, the rest was just mapping dict keys to output fields. Annoying but mechanical work.
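The mapping step looks roughly like the sketch below. The nested paths ("listing", "price", "amount" and so on) are made up for illustration; the real key layout under pageProps is whatever the site currently ships, so treat every path here as an assumption. The dig helper is also just a name I picked for this example.

```python
def dig(obj, *path):
    """Walk nested dicts/lists, returning None as soon as a step is missing."""
    for key in path:
        if isinstance(obj, dict):
            obj = obj.get(key)
        elif isinstance(obj, list) and isinstance(key, int) and -len(obj) <= key < len(obj):
            obj = obj[key]
        else:
            return None
    return obj


# Hypothetical paths: output field -> where it might live under pageProps.
FIELD_PATHS = {
    "price_cad": ("listing", "price", "amount"),
    "dealer_phone": ("listing", "dealer", "phone"),
    "latitude": ("listing", "location", "lat"),
}


def map_listing(page_props):
    """Build a flat record; any path that does not resolve becomes None."""
    return {field: dig(page_props, *path) for field, path in FIELD_PATHS.items()}
```

Keeping the paths in one dict means that when the site moves a key, the fix is a one-line change instead of a hunt through the parsing code.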

The proxy situation

My first runs were direct connections and worked fine on my machine. Then I started getting rate limited. I ended up routing through a residential proxy (DataImpulse, Canadian IPs) and that fixed it.

But it added latency. Direct was ~0.5s per request. Through the proxy: 1–3s. I was using asyncio.gather with a semaphore to run 50–100 requests in parallel, so the total runtime was still okay, but the per-request numbers looked worse.
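For anyone who hasn't used the gather-plus-semaphore pattern, here is a minimal sketch of it. The fetch_all name and the injected fetch coroutine are placeholders for this example, not the scraper's actual code; the limit of 50 just mirrors the low end of the range above.

```python
import asyncio


async def fetch_all(urls, fetch, limit=50):
    """Run fetch(url) for every URL with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        # Each task waits here until one of the `limit` slots frees up.
        async with semaphore:
            return await fetch(url)

    # gather returns results in input order; return_exceptions=True keeps
    # one failed request from throwing away the rest of the batch.
    return await asyncio.gather(
        *(bounded(url) for url in urls), return_exceptions=True
    )
```

In practice fetch would be a small wrapper around a shared httpx.AsyncClient, something like an async function that awaits client.get(url) and returns the response text. Failed requests come back as exception objects in the result list, so they can be filtered out or retried afterwards.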

First thing my friend said when I showed him the results:

"this other scraper is faster"

Cool.

The speed thing

He sent me a benchmark from a competing scraper. Looked quicker on the surface.

I dug into it and the difference was simple: the other scraper only hits search pages and skips individual listing detail pages. One search request gets you 100 listings. Fast. But detail pages are where the good stuff is: GPS coordinates, equipment lists, accident-free flag, Carfax links, dealer Google rating.

So I made it configurable:

Search only: one request per 100 listings, 5–6 seconds for 100 results. Good if you just need prices and basic specs.
Full detail: one extra request per listing, 8–10 seconds for 100 results with everything enriched.
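The cost difference between the two modes is easy to see as a request count. This is a back-of-the-envelope sketch; planned_requests and the full_detail flag are names made up for illustration, not the actor's real options.

```python
import math

SEARCH_PAGE_SIZE = 100  # from above: one search request returns 100 listings


def planned_requests(num_listings, full_detail=False):
    """Estimate how many HTTP requests a run needs in each mode."""
    search_requests = math.ceil(num_listings / SEARCH_PAGE_SIZE)
    # Full detail adds one detail-page request per listing.
    detail_requests = num_listings if full_detail else 0
    return search_requests + detail_requests
```

So 100 listings cost 1 request in search-only mode and 101 in full detail. The wall-clock gap stays small because the detail requests run in parallel rather than one after another.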

For lead gen my friend mostly needed prices, dealer contacts, and location data. Search-only was enough for his use case. Full detail is there when you need it.

He stopped mentioning the competitor after that.

What the output looks like

Each listing comes back with 50+ fields. The ones I find most useful for anything lead gen related:

price_cad + average_market_price: instant signal on whether something is over or under market
dealer_phone, dealer_address_full, dealer_google_rating: structured dealer contact, ready to use
accident_free + carfax_url: quality signal per listing
latitude / longitude: useful for radius filtering or mapping
all_equipment: flat list of every feature, easy to grep
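As a rough idea of how these fields get used downstream, here is a small sketch: a percent-vs-market signal and a radius filter based on the haversine formula. The field names come from the list above; the function names are just for this example, and the sketch assumes listings are plain dicts.

```python
import math


def price_delta_pct(listing):
    """Percent above (+) or below (-) average market price, or None if unknown."""
    price = listing.get("price_cad")
    market = listing.get("average_market_price")
    if price is None or not market:
        return None
    return round((price - market) / market * 100, 1)


def km_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in km using the haversine formula."""
    radius = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def within_radius(listings, lat, lon, radius_km):
    """Keep listings that have coordinates and sit inside the radius."""
    nearby = []
    for item in listings:
        item_lat, item_lon = item.get("latitude"), item.get("longitude")
        if item_lat is None or item_lon is None:
            continue  # no coordinates on this listing, so skip it
        if km_between(lat, lon, item_lat, item_lon) <= radius_km:
            nearby.append(item)
    return nearby
```

A negative price_delta_pct means the listing is priced under the market average, which is usually the first thing a lead gen filter sorts on.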

The schema is consistent across every record. Missing fields return null so you don't blow up downstream with KeyErrors.

Where it ended up

I packaged it as an Apify actor so my friend can run it without setting up Python environments or dealing with proxy config. He pastes a search URL, hits run, gets a dataset back. I handle it when AutoTrader changes something on their end.

If you want to try it: apify.com/kmiloaguilar/autotrader-ca-scraper

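The __NEXT_DATA__ extraction pattern is pretty portable too. If you're scraping a Next.js SSR site built on the Pages Router, this approach works the same way. Sites on the newer App Router don't emit that script tag, so check the page source first.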

I'm also curious if anyone's building lead gen tools around marketplace data. Dealer outreach, price alerts, that kind of thing. Drop a comment if you're working on something in that space.

DEV Community: Camilo Aguilar
