<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Camilo Aguilar</title>
    <description>The latest articles on DEV Community by Camilo Aguilar (@camilo_aguilar_36aa304bde).</description>
    <link>https://dev.to/camilo_aguilar_36aa304bde</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3989662%2F39047239-b2fd-4e7d-90e8-360cebb139d9.jpg</url>
      <title>DEV Community: Camilo Aguilar</title>
      <link>https://dev.to/camilo_aguilar_36aa304bde</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/camilo_aguilar_36aa304bde"/>
    <language>en</language>
    <item>
      <title>I built an AutoTrader.ca scraper for a friend and it broke three times before shipping</title>
      <dc:creator>Camilo Aguilar</dc:creator>
      <pubDate>Wed, 17 Jun 2026 19:09:21 +0000</pubDate>
      <link>https://dev.to/camilo_aguilar_36aa304bde/i-built-an-autotraderca-scraper-for-a-friend-and-it-broke-three-times-before-shipping-2h3a</link>
      <guid>https://dev.to/camilo_aguilar_36aa304bde/i-built-an-autotraderca-scraper-for-a-friend-and-it-broke-three-times-before-shipping-2h3a</guid>
      <description>&lt;p&gt;A couple months ago a friend hit me up. He needed structured car data from AutoTrader.ca. Prices, specs, dealer contacts. He was building something around used car leads in Canada and wanted a clean feed of listings he could work with.&lt;/p&gt;

&lt;p&gt;I said sure, how hard could it be.&lt;/p&gt;

&lt;p&gt;Famous last words.&lt;/p&gt;




&lt;p&gt;I'm mainly an iOS developer. Python scraping isn't really my day job, but I've done enough side projects to be comfortable with httpx and BeautifulSoup. A car listing site shouldn't be complicated, right?&lt;/p&gt;

&lt;p&gt;First issue: half the examples I found online were outdated. AutoTrader.ca had quietly migrated to a new backend. They're now running on AutoScout24's infrastructure and the whole site is Next.js SSR. Old scrapers were hitting &lt;code&gt;window['ngVdpModel']&lt;/code&gt;, which doesn't exist anymore. Just gone.&lt;/p&gt;

&lt;p&gt;So I spent a couple days figuring out what &lt;em&gt;does&lt;/em&gt; exist. Turns out Next.js inlines all the page data into a &lt;code&gt;&amp;lt;script id="__NEXT_DATA__"&amp;gt;&lt;/code&gt; tag on every page. Every listing, every price, all the dealer info and equipment lists. It's sitting right there as JSON. You don't even have to parse HTML beyond finding that one script tag.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bs4&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;soup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BeautifulSoup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;html.parser&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;script&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__NEXT_DATA__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# everything you need is under data["props"]["pageProps"]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once I found that pattern, the rest was just mapping dict keys to output fields. Annoying but mechanical work.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;The proxy situation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;My first runs were direct connections and worked fine on my machine. Then I started getting rate limited. I ended up routing through a residential proxy (DataImpulse, Canadian IPs) and that fixed it.&lt;/p&gt;

&lt;p&gt;But it added latency. Direct was ~0.5s per request. Through proxy: 1–3s. I was using &lt;code&gt;asyncio.gather&lt;/code&gt; with a semaphore to run 50–100 requests in parallel, so the total runtime was still okay, but per-request the numbers looked worse.&lt;/p&gt;
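
&lt;p&gt;For reference, here is a minimal sketch of the semaphore-plus-gather pattern, not the scraper's actual code. &lt;code&gt;fetch_listing&lt;/code&gt; is a hypothetical stand-in that just sleeps instead of making a proxied HTTP request, and the limit of 50 is an assumed value taken from the low end of the range above.&lt;/p&gt;

```python
import asyncio

MAX_CONCURRENCY = 50  # assumed cap, low end of the 50-100 range mentioned above


async def fetch_listing(url):
    # Hypothetical stand-in for the real request (for example an httpx call
    # routed through the proxy). It only sleeps so the sketch runs offline.
    await asyncio.sleep(0.01)
    return {"url": url, "ok": True}


async def fetch_all(urls, limit=MAX_CONCURRENCY):
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        # At most `limit` coroutines are inside this block at any moment.
        async with semaphore:
            return await fetch_listing(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))
```

&lt;p&gt;Calling &lt;code&gt;asyncio.run(fetch_all(urls))&lt;/code&gt; starts every task up front, but the semaphore keeps the number of in-flight requests at or below the limit, which is why total runtime stays reasonable even when each proxied request is slow.&lt;/p&gt;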

&lt;p&gt;First thing my friend said when I showed him the results:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"this other scraper is faster"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Cool.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;The speed thing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;He sent me a benchmark from a competing scraper. Looked quicker on the surface.&lt;/p&gt;

&lt;p&gt;I dug into it and the difference was simple: the other scraper only hits search pages and skips individual listing detail pages. One search request gets you 100 listings. Fast. But detail pages are where the good stuff is: GPS coordinates, equipment lists, accident-free flag, Carfax links, dealer Google rating.&lt;/p&gt;

&lt;p&gt;So I made it configurable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Search only&lt;/strong&gt;: one request per 100 listings, 5–6 seconds for 100 results. Good if you just need prices and basic specs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full detail&lt;/strong&gt;: one extra request per listing, 8–10 seconds for 100 results with everything enriched.&lt;/li&gt;
&lt;/ul&gt;
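
&lt;p&gt;One way to express that switch, as a rough sketch under assumptions: &lt;code&gt;build_records&lt;/code&gt;, the &lt;code&gt;fetch_detail&lt;/code&gt; callback, and the &lt;code&gt;url&lt;/code&gt; key are hypothetical names for illustration, not the actor's real interface.&lt;/p&gt;

```python
def build_records(search_rows, fetch_detail=None):
    """Return one record per listing.

    search_rows: dicts parsed from a search page (hypothetical shape).
    fetch_detail: optional callable taking a listing URL and returning a dict
    of extra fields. Leaving it as None means search-only mode.
    """
    records = []
    for row in search_rows:
        record = dict(row)  # copy so the input rows are not mutated
        if fetch_detail is not None:
            # Full-detail mode: one extra request per listing.
            record.update(fetch_detail(row["url"]))
        records.append(record)
    return records
```

&lt;p&gt;Search-only mode never calls the callback, so the request count stays at one per search page; passing a callback adds exactly one call per listing.&lt;/p&gt;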

&lt;p&gt;For lead gen my friend mostly needed prices, dealer contacts, and location data. Search-only was enough for his use case. Full detail is there when you need it.&lt;/p&gt;

&lt;p&gt;He stopped mentioning the competitor after that.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;What the output looks like&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each listing comes back with 50+ fields. The ones I find most useful for anything lead gen related:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;price_cad&lt;/code&gt; + &lt;code&gt;average_market_price&lt;/code&gt;: instant signal on whether something is over or under market&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dealer_phone&lt;/code&gt;, &lt;code&gt;dealer_address_full&lt;/code&gt;, &lt;code&gt;dealer_google_rating&lt;/code&gt;: structured dealer contact, ready to use&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;accident_free&lt;/code&gt; + &lt;code&gt;carfax_url&lt;/code&gt;: quality signal per listing&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;latitude&lt;/code&gt; / &lt;code&gt;longitude&lt;/code&gt;: useful for radius filtering or mapping&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;all_equipment&lt;/code&gt;: flat list of every feature, easy to grep&lt;/li&gt;
&lt;/ul&gt;
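
&lt;p&gt;Two of those signals are easy to compute downstream. The sketch below uses the field names from the list above; the helper names are hypothetical, and the distance is a plain haversine great-circle estimate with an assumed mean Earth radius of 6371 km.&lt;/p&gt;

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def market_delta(listing):
    """Price minus market average in CAD; negative means under market."""
    price = listing.get("price_cad")
    market = listing.get("average_market_price")
    if price is None or market is None:
        return None  # not enough data to compare
    return price - market


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(listings, lat, lon, radius_km):
    """Keep listings that have coordinates and sit inside the radius."""
    kept = []
    for item in listings:
        item_lat, item_lon = item.get("latitude"), item.get("longitude")
        if item_lat is None or item_lon is None:
            continue  # missing coordinates come back as null, so skip them
        if radius_km >= haversine_km(lat, lon, item_lat, item_lon):
            kept.append(item)
    return kept
```

&lt;p&gt;For example, &lt;code&gt;within_radius(listings, 43.6532, -79.3832, 100)&lt;/code&gt; would keep listings within roughly 100 km of downtown Toronto.&lt;/p&gt;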

&lt;p&gt;The schema is consistent across every record. Missing fields return &lt;code&gt;null&lt;/code&gt; so you don't blow up downstream with KeyErrors.&lt;/p&gt;
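
&lt;p&gt;A simple way to get that guarantee is to project every raw record onto a fixed field list. This is a sketch rather than the actor's real code: &lt;code&gt;FIELDS&lt;/code&gt; only contains the fields named above (the full schema has 50+), and &lt;code&gt;normalize&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
import json

# Illustrative subset of the output schema, using the field names listed above.
FIELDS = (
    "price_cad",
    "average_market_price",
    "dealer_phone",
    "dealer_address_full",
    "dealer_google_rating",
    "accident_free",
    "carfax_url",
    "latitude",
    "longitude",
    "all_equipment",
)


def normalize(raw):
    """Return a record with every schema field present, None when missing."""
    # dict.get returns None for absent keys; json.dumps writes None as null.
    return {field: raw.get(field) for field in FIELDS}


record = normalize({"price_cad": 18500, "dealer_phone": "555-0100"})
print(json.dumps(record))
```

&lt;p&gt;Every record ends up with the same keys in the same order, and anything outside the schema is dropped, so consumers can index fields directly.&lt;/p&gt;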




&lt;p&gt;&lt;strong&gt;Where it ended up&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I packaged it as an Apify actor so my friend can run it without setting up Python environments or dealing with proxy config. He pastes a search URL, hits run, gets a dataset back. I handle it when AutoTrader changes something on their end.&lt;/p&gt;

&lt;p&gt;If you want to try it: &lt;a href="https://apify.com/kmiloaguilar/autotrader-ca-scraper" rel="noopener noreferrer"&gt;apify.com/kmiloaguilar/autotrader-ca-scraper&lt;/a&gt;&lt;/p&gt;

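&lt;p&gt;The &lt;code&gt;__NEXT_DATA__&lt;/code&gt; extraction pattern is pretty portable too. If you're scraping a Next.js site that server-renders through the Pages Router, this approach works the same way. App Router sites don't emit this tag, so check the page source first.&lt;/p&gt;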
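
&lt;p&gt;If you'd rather not pull in BeautifulSoup on another project, here is a standard-library variant of the same idea. It is a sketch under assumptions: &lt;code&gt;NextDataParser&lt;/code&gt; and &lt;code&gt;extract_page_props&lt;/code&gt; are hypothetical names, and it returns &lt;code&gt;None&lt;/code&gt; when the tag isn't on the page instead of raising.&lt;/p&gt;

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collects the text of the script tag whose id is __NEXT_DATA__."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_page_props(html):
    """Return props.pageProps from a page, or None if the tag is absent."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    if not parser.chunks:
        return None
    payload = json.loads("".join(parser.chunks))
    return payload.get("props", {}).get("pageProps")
```

&lt;p&gt;The keys under &lt;code&gt;pageProps&lt;/code&gt; differ from site to site, so the mapping step still has to be written per target.&lt;/p&gt;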




&lt;p&gt;&lt;em&gt;I'm also curious if anyone's building lead gen tools around marketplace data. Dealer outreach, price alerts, that kind of thing. Drop a comment if you're working on something in that space.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>webdev</category>
      <category>scraping</category>
      <category>data</category>
    </item>
  </channel>
</rss>
