<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Omar Eldeeb</title>
    <description>The latest articles on DEV Community by Omar Eldeeb (@odeeb).</description>
    <link>https://dev.to/odeeb</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3961363%2F988e15d2-f7cd-489d-8bb8-4e3cfae6e4e2.png</url>
      <title>DEV Community: Omar Eldeeb</title>
      <link>https://dev.to/odeeb</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/odeeb"/>
    <language>en</language>
    <item>
      <title>How to Scrape Reddit Without the API (After the 2023 Price Changes)</title>
      <dc:creator>Omar Eldeeb</dc:creator>
      <pubDate>Sun, 31 May 2026 16:11:10 +0000</pubDate>
      <link>https://dev.to/odeeb/how-to-scrape-reddit-without-the-api-after-the-2023-price-changes-3nhm</link>
      <guid>https://dev.to/odeeb/how-to-scrape-reddit-without-the-api-after-the-2023-price-changes-3nhm</guid>
      <description>&lt;p&gt;If you've landed here, you already know the backstory: in 2023 Reddit's API went from free-and-generous to metered-and-expensive, third-party apps shut down, and a lot of data pipelines broke overnight. So the practical question for developers and data folks is no longer "should I use the API?" but &lt;strong&gt;how to scrape Reddit without the API&lt;/strong&gt; at all — cleanly, legally-aware, and without burning hours on requests that silently return &lt;code&gt;403&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This article walks through what genuinely works in 2026, what &lt;em&gt;looks&lt;/em&gt; like it works but doesn't, and the constraints you'll hit no matter which path you choose. You can verify the code paths yourself in a terminal; the rate limits, the ~250-result search cap, and the Pushshift and terms details are drawn from Reddit's docs and widely reported community experience (links where it matters), and real-world enforcement is more erratic than any documented figure.&lt;/p&gt;

&lt;h2&gt;
  
  
  The thing everyone tries first (and why it fails)
&lt;/h2&gt;

&lt;p&gt;The classic "no API" trick is appending &lt;code&gt;.json&lt;/code&gt; to any Reddit URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://www.reddit.com/r/programming/.json
https://www.reddit.com/r/programming/comments/&amp;lt;id&amp;gt;/.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a real, undocumented JSON view of the page. The problem is &lt;em&gt;where&lt;/em&gt; you call it from.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;From a browser (client-side JS):&lt;/strong&gt; it's &lt;strong&gt;CORS-blocked&lt;/strong&gt;. Reddit doesn't send &lt;code&gt;Access-Control-Allow-Origin&lt;/code&gt; for these endpoints, so &lt;code&gt;fetch()&lt;/code&gt; from your web app throws before you ever see data. No amount of header tweaking fixes CORS from the browser — it's enforced by the browser, not by your code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;From a datacenter server (AWS, GCP, a VPS):&lt;/strong&gt; the &lt;code&gt;.json&lt;/code&gt; endpoints increasingly return &lt;strong&gt;HTTP 403&lt;/strong&gt; from datacenter IP ranges. Enforcement appears to have tightened after the API changes, which in practice shuts down the "just hit &lt;code&gt;.json&lt;/code&gt; from a Lambda" pattern.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the &lt;code&gt;.json&lt;/code&gt; approach dies in the two places people most want to use it: the browser and cheap cloud servers. You can sometimes get it to work from a residential IP with a sane &lt;code&gt;User-Agent&lt;/code&gt;, but it's fragile and rate-limited, and it is not a foundation you want to build a pipeline on.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually works: old.reddit.com server-rendered HTML
&lt;/h2&gt;

&lt;p&gt;The most reliable no-API path is the &lt;strong&gt;old Reddit interface&lt;/strong&gt;, &lt;code&gt;old.reddit.com&lt;/code&gt;. Unlike the modern React SPA (which hydrates data client-side and is painful to parse), old Reddit ships &lt;strong&gt;fully server-rendered HTML, cookie-free&lt;/strong&gt;. You request a page, you get the listing already in the markup.&lt;/p&gt;

&lt;p&gt;Two important nuances I want to be honest about:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Subreddit listings and user-profile pages&lt;/strong&gt; parse fine and often work even &lt;strong&gt;from datacenter IPs&lt;/strong&gt;. These are the easy wins.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Search results and comment threads&lt;/strong&gt; are stricter — in practice you'll need &lt;strong&gt;residential IPs&lt;/strong&gt; to fetch them reliably, because Reddit rate-limits and challenges those routes harder.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's a minimal, correct example that pulls the front page of a subreddit from old Reddit and extracts post titles and links. It uses &lt;code&gt;requests&lt;/code&gt; + &lt;code&gt;BeautifulSoup&lt;/code&gt;, with a real User-Agent (Reddit reliably rejects the default &lt;code&gt;python-requests&lt;/code&gt; UA):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bs4&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt;

&lt;span class="n"&gt;HEADERS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;# A real, descriptive UA. Reddit blocks the default python-requests UA.
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User-Agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research-bot/1.0 (contact: you@example.com)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;scrape_subreddit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subreddit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://old.reddit.com/r/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;subreddit&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;HEADERS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# 403/429 will surface here
&lt;/span&gt;
    &lt;span class="n"&gt;soup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BeautifulSoup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;html.parser&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;posts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;thing&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;div.thing[data-fullname]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;title_el&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;thing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a.title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;title_el&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="n"&gt;posts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;thing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data-fullname&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;title_el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;permalink&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;thing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data-permalink&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;thing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data-score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;author&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;thing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data-author&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;subreddit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;thing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data-subreddit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;posts&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;scrape_subreddit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;programming&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;div.thing&lt;/code&gt; element carries most of what you need as &lt;code&gt;data-*&lt;/code&gt; attributes — &lt;code&gt;data-fullname&lt;/code&gt; (the post ID like &lt;code&gt;t3_abc123&lt;/code&gt;), &lt;code&gt;data-score&lt;/code&gt;, &lt;code&gt;data-author&lt;/code&gt;, &lt;code&gt;data-permalink&lt;/code&gt;. That's why old Reddit is so pleasant: the structure is stable and the data is right there in attributes instead of buried in a hydration blob.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pagination
&lt;/h3&gt;

&lt;p&gt;Old Reddit paginates with a &lt;code&gt;?count=25&amp;amp;after=&amp;lt;fullname&amp;gt;&lt;/code&gt; query string. The "next" button's &lt;code&gt;href&lt;/code&gt; gives you the URL directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;next_btn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;span.next-button a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;next_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;next_btn&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;href&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;next_btn&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Follow that link to walk listings. Add a polite delay (1–2 seconds) between requests and reuse a &lt;code&gt;requests.Session&lt;/code&gt; so connections are kept alive.&lt;/p&gt;
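
&lt;p&gt;To make the walk concrete, here is a minimal sketch of the loop, not a definitive implementation. The &lt;code&gt;fetch_page&lt;/code&gt; callable is a hypothetical hook of my own naming: in a real run it would wrap a &lt;code&gt;requests.Session&lt;/code&gt; request plus the &lt;code&gt;div.thing&lt;/code&gt; and &lt;code&gt;span.next-button a&lt;/code&gt; selectors shown above, and return the posts on the page together with the next URL (or &lt;code&gt;None&lt;/code&gt;). The page cap and the 1.5-second default delay are assumptions you should tune.&lt;/p&gt;

```python
import time
from typing import Callable, Iterator, Optional

# Hypothetical hook: takes a URL, returns (posts_on_page, next_url_or_None).
FetchPage = Callable[[str], tuple[list[dict], Optional[str]]]


def walk_listing(
    fetch_page: FetchPage,
    start_url: str,
    max_pages: int = 5,
    delay: float = 1.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield posts page by page, following the next link until it runs out."""
    url: Optional[str] = start_url
    seen_urls: set[str] = set()  # guards against a next link that loops back
    pages = 0
    while url and url not in seen_urls and max_pages > pages:
        seen_urls.add(url)
        posts, next_url = fetch_page(url)
        pages += 1
        yield from posts
        url = next_url
        if url and max_pages > pages:
            sleep(delay)  # polite pause before the next request
```

&lt;p&gt;Keeping the network call behind a callable means the loop can be exercised against canned pages before it ever touches Reddit, and the &lt;code&gt;sleep&lt;/code&gt; parameter lets a test skip the real delay.&lt;/p&gt;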

&lt;h2&gt;
  
  
  The hard limits you cannot engineer around
&lt;/h2&gt;

&lt;p&gt;Before you build anything ambitious, internalize these constraints. They're properties of Reddit, not of your scraper.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Search caps at ~250 results (observed).&lt;/strong&gt; In practice Reddit's search, whether via the API or the HTML interface, appears to return roughly the top 250 matches for a query and then stops, with no deep pagination past that. It's widely observed behavior rather than an officially documented number, but it's consistent enough to plan around. If your use case is "give me every post ever mentioning X," search alone will not deliver it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Comment indexing is weak.&lt;/strong&gt; Reddit search indexes &lt;em&gt;post&lt;/em&gt; titles and bodies far better than it indexes &lt;em&gt;comments&lt;/em&gt;. A keyword that lives only in comment threads will frequently not surface in search at all. This trips up sentiment and brand-monitoring projects constantly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pushshift is gone for you (probably).&lt;/strong&gt; Pushshift used to be the answer for historical, full-text, deep Reddit search. Since 2023 it has been &lt;strong&gt;restricted to verified subreddit moderators&lt;/strong&gt;. Unless you're a mod with approved access, treat Pushshift as unavailable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The official Data API is metered and commercial-use-restricted.&lt;/strong&gt; For completeness: the official route allows roughly &lt;strong&gt;100 requests/minute with OAuth&lt;/strong&gt; (about &lt;strong&gt;10/minute unauthenticated&lt;/strong&gt;), and Reddit's terms &lt;strong&gt;restrict commercial use&lt;/strong&gt; without a separate licensing/paid agreement. So even if you go "official," you're capped and legally boxed in for anything revenue-adjacent.&lt;/p&gt;
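
&lt;p&gt;Those per-minute figures translate directly into a minimum gap between requests: 60 / 100 = 0.6 seconds with OAuth, and 60 / 10 = 6 seconds unauthenticated. A rough sketch of a client-side throttle built on that arithmetic follows. The &lt;code&gt;Throttle&lt;/code&gt; class and its parameter names are my own illustration, not part of any Reddit library, and staying under the rate this way does not guarantee you won't be limited.&lt;/p&gt;

```python
import time


def min_interval(requests_per_minute: float) -> float:
    """Seconds to wait between requests to stay at or under the given rate."""
    if not requests_per_minute > 0:
        raise ValueError("requests_per_minute must be positive")
    return 60.0 / requests_per_minute


class Throttle:
    """Blocks just long enough to keep calls at least `interval` seconds apart."""

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = min_interval(requests_per_minute)
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

&lt;p&gt;Call &lt;code&gt;throttle.wait()&lt;/code&gt; immediately before each request. The clock and sleep functions are injectable only so the timing logic can be checked without actually waiting.&lt;/p&gt;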

&lt;p&gt;Put together: there is no magic endpoint that gives you unlimited, deep, full-text Reddit history for free. Anyone who tells you otherwise is selling something or about to get blocked.&lt;/p&gt;

&lt;h2&gt;
  
  
  A sane workflow: build the query first, then export
&lt;/h2&gt;

&lt;p&gt;A mistake I see often is jumping straight to code, then discovering the query was wrong after burning a bunch of requests. Because search is capped at ~250 results and comment indexing is weak, &lt;strong&gt;the precision of your query matters more than the speed of your scraper&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;So the workflow I'd recommend:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Compose and preview the query before you fetch anything.&lt;/strong&gt; A free, no-signup helper for this is the &lt;a href="https://datatooly.xyz/reddit-search-builder/" rel="noopener noreferrer"&gt;Reddit Search Builder&lt;/a&gt;. It lets you assemble a precise Reddit query (subreddit filters, time windows, sort, exact-phrase syntax) and previews the result schema so you know exactly which fields you'll get back before committing to a run. Getting the query right up front is the single biggest lever given the 250-result ceiling.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Run small from a residential context to validate&lt;/strong&gt; the HTML parser against real markup (selectors drift; verify before scaling).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scale the export with proper IP rotation.&lt;/strong&gt; This is where a DIY scraper gets painful — you need datacenter IPs for cheap subreddit/user listings, residential IPs for search and comments, retry/backoff on &lt;code&gt;403&lt;/code&gt;/&lt;code&gt;429&lt;/code&gt;, and dedup across pages. Maintaining that yourself is a real project.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
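
&lt;p&gt;Two pieces of step 3 are easy to sketch without any proxy infrastructure: retry with backoff on &lt;code&gt;403&lt;/code&gt;/&lt;code&gt;429&lt;/code&gt;, and dedup across pages. The version below is a simplified illustration under stated assumptions. The &lt;code&gt;fetch&lt;/code&gt; callable is hypothetical and returns a &lt;code&gt;(status, body)&lt;/code&gt; pair, the attempt count and delays are arbitrary starting points, and dedup keys on the &lt;code&gt;id&lt;/code&gt; field that holds &lt;code&gt;data-fullname&lt;/code&gt; in the earlier snippet. IP rotation is deliberately left out.&lt;/p&gt;

```python
import random
import time
from typing import Callable, Iterable, Iterator

RETRYABLE = {403, 429}


def fetch_with_backoff(
    fetch: Callable[[str], tuple[int, str]],  # hypothetical: returns (status, body)
    url: str,
    max_attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry on 403/429 with exponential backoff plus jitter; raise when attempts run out."""
    for attempt in range(max_attempts):
        status, body = fetch(url)
        if status == 200:
            return body
        if status not in RETRYABLE:
            raise RuntimeError(f"unexpected status {status} for {url}")
        if attempt + 1 != max_attempts:
            # base, 2x base, 4x base, ... plus up to 1s of jitter so
            # parallel workers do not all retry at the same moment
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    raise RuntimeError(f"gave up on {url} after {max_attempts} attempts")


def dedup_by_fullname(posts: Iterable[dict]) -> Iterator[dict]:
    """Drop posts whose `id` (the data-fullname, e.g. t3_abc123) was already seen."""
    seen = set()
    for post in posts:
        key = post.get("id")
        if key in seen:
            continue
        seen.add(key)
        yield post
```

&lt;p&gt;Dedup matters because listings shift while you paginate, so the same post can show up on two consecutive pages. In a fuller pipeline, a repeated &lt;code&gt;403&lt;/code&gt; would also be the signal to switch from a datacenter to a residential IP rather than simply waiting longer.&lt;/p&gt;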

&lt;p&gt;If you'd rather not run and maintain the proxy + retry + parsing stack, the &lt;a href="https://apify.com/constructive_calm/reddit-scraper-pro?fpr=v77kxu" rel="noopener noreferrer"&gt;Reddit Scraper Pro&lt;/a&gt; actor on Apify is the do-this-at-scale option I built around exactly the constraints above (disclosure: it's my actor). It runs &lt;strong&gt;five modes&lt;/strong&gt; (subreddit posts, search, comment threads, user profiles, and a monitor mode) and handles &lt;strong&gt;datacenter-first with residential fallback&lt;/strong&gt; so the easy routes stay cheap and the hard routes still work, with retry/backoff on &lt;code&gt;403&lt;/code&gt;/&lt;code&gt;429&lt;/code&gt; to keep success rates high. Pricing is &lt;strong&gt;$0.0025 per post with 10 free per run&lt;/strong&gt;, so you can validate output on a real query before spending anything. It's the same &lt;code&gt;old.reddit.com&lt;/code&gt; strategy described here, just with the IP rotation, backoff, and schema normalization already wired up.&lt;/p&gt;

&lt;h2&gt;
  
  
  A quick decision guide
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Need a few subreddit or user listings, occasionally?&lt;/strong&gt; The &lt;code&gt;old.reddit.com&lt;/code&gt; + BeautifulSoup snippet above is genuinely enough. Run it from a residential IP, be polite, done.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need search results or comment trees at any volume?&lt;/strong&gt; Plan for residential IPs and accept the ~250-result search ceiling. Build your query carefully first.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need scale, reliability, or scheduled monitoring?&lt;/strong&gt; Either invest serious time in a rotating-proxy pipeline, or hand it to a managed actor and spend your time on the analysis instead of the plumbing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  One honest closing note
&lt;/h2&gt;

&lt;p&gt;Whatever path you pick, respect the source. Reddit's terms prohibit unauthorized commercial use of its data, the official API is rate-limited for a reason, and aggressive scraping gets IPs and projects banned. Scrape conservatively, identify your bot honestly in the &lt;code&gt;User-Agent&lt;/code&gt;, cache what you fetch so you don't re-hammer the same pages, and don't republish content in ways that violate users' or Reddit's rights. "Without the API" is a technical choice — it isn't a license to ignore the terms behind it. Build accordingly, and your pipeline will outlast the next round of changes.&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>python</category>
      <category>api</category>
      <category>datascience</category>
    </item>
    <item>
      <title>How to Export Google Patents to CSV (Honest Guide to Every Real Path)</title>
      <dc:creator>Omar Eldeeb</dc:creator>
      <pubDate>Sun, 31 May 2026 16:02:23 +0000</pubDate>
      <link>https://dev.to/odeeb/how-to-export-google-patents-to-csv-honest-guide-to-every-real-path-2o9a</link>
      <guid>https://dev.to/odeeb/how-to-export-google-patents-to-csv-honest-guide-to-every-real-path-2o9a</guid>
      <description>&lt;p&gt;If you've ever needed to pull a few thousand patents into a spreadsheet — every filing by a competitor, every patent citing your portfolio, the legal status of an entire technology cluster — you've probably searched &lt;strong&gt;how to export Google Patents to CSV&lt;/strong&gt; and found a maze of half-answers. This guide cuts through it. I'll show you exactly what works, what's capped, and what's quietly impossible, with verified facts and a runnable example.&lt;/p&gt;

&lt;p&gt;Let me start with the single most important thing, because it shapes every decision below:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Google Patents has no public REST API.&lt;/strong&gt; There is no documented, supported HTTP endpoint you can hit to query patents programmatically. This is the root cause of nearly every frustration people run into.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;With that established, here are the three real paths, from simplest to most powerful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Path 1: The built-in CSV download (fast, but capped at 1,000)
&lt;/h2&gt;

&lt;p&gt;Google Patents &lt;em&gt;does&lt;/em&gt; have an export button, and for small jobs it's perfect. Run a search at &lt;a href="https://patents.google.com" rel="noopener noreferrer"&gt;patents.google.com&lt;/a&gt;, then look for the &lt;strong&gt;Download (CSV)&lt;/strong&gt; link near the results.&lt;/p&gt;

&lt;p&gt;It works. But it has a hard ceiling:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The built-in CSV export returns only the top 1,000 results.&lt;/strong&gt; If your query matches 40,000 patents, you get the first 1,000 by relevance and nothing more.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The exported columns are also fairly thin — typically id, title, assignee, inventor, priority/filing/publication/grant dates, and result link. You do &lt;strong&gt;not&lt;/strong&gt; get the abstract, the claims text, the full citation graph, or detailed legal-status events. For a quick competitor snapshot, this is fine. For analysis, it's a teaser.&lt;/p&gt;

&lt;p&gt;A practical tip: tighten your query so the 1,000 you get are the 1,000 you want. Combine fields:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;assignee:&lt;/span&gt;&lt;span class="s2"&gt;"Tesla"&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="n"&gt;AND&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;inventor:&lt;/span&gt;&lt;span class="s2"&gt;"Straubel"&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;AND&lt;/span&gt; &lt;span class="py"&gt;before:&lt;/span&gt;&lt;span class="nl"&gt;priority&lt;/span&gt;&lt;span class="dl"&gt;:&lt;/span&gt;&lt;span class="m"&gt;20200101&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Google Patents supports field-qualified search — &lt;code&gt;assignee:&lt;/code&gt;, &lt;code&gt;inventor:&lt;/code&gt;, &lt;code&gt;before:&lt;/code&gt;/&lt;code&gt;after:&lt;/code&gt; with &lt;code&gt;priority&lt;/code&gt;/&lt;code&gt;filing&lt;/code&gt;/&lt;code&gt;publication&lt;/code&gt;, country codes, CPC classifications, and free text. Narrowing first is the difference between a useful 1,000-row export and a useless one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Path 2: BigQuery — the only Google-supported bulk path
&lt;/h2&gt;

&lt;p&gt;When 1,000 rows isn't enough, there is exactly one path Google itself supports for bulk patent data, and it's a good one: the &lt;strong&gt;&lt;code&gt;patents-public-data&lt;/code&gt;&lt;/strong&gt; dataset on Google BigQuery.&lt;/p&gt;

&lt;p&gt;This is a genuinely first-class resource. The main table, &lt;code&gt;patents-public-data.patents.publications&lt;/code&gt;, contains bibliographic information on tens of millions of patent publications worldwide, with structured fields for assignees, inventors, titles, abstracts, claims, CPC/IPC classifications, citations, and priority/filing/publication dates — far richer than the CSV button.&lt;/p&gt;

&lt;p&gt;Two things to know before you commit:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;It requires SQL.&lt;/strong&gt; There's no point-and-click here. You write queries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing is generous but real.&lt;/strong&gt; On-demand BigQuery gives you the &lt;strong&gt;first 1 TiB of query data processed free every month&lt;/strong&gt;; beyond that, queries are billed per TiB scanned (Google has historically documented the patents dataset access at $5/TB, and current general on-demand US pricing is $6.25/TiB — check the &lt;a href="https://cloud.google.com/bigquery/pricing" rel="noopener noreferrer"&gt;official BigQuery pricing page&lt;/a&gt; for the rate that applies to you). The patents tables are large, so a careless &lt;code&gt;SELECT *&lt;/code&gt; can chew through your free tier in a single query. Always select only the columns you need and filter early.&lt;/li&gt;
&lt;/ol&gt;
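
&lt;p&gt;The pricing arithmetic is simple enough to keep in a helper. This is a back-of-the-envelope sketch using the figures quoted above (1 TiB free per month, $6.25 per TiB after that); the function name and parameters are my own, and it ignores per-query billing minimums and regional price differences, so treat the result as an estimate rather than a bill.&lt;/p&gt;

```python
TIB = 2 ** 40  # bytes in one tebibyte


def estimate_cost_usd(bytes_scanned, bytes_already_used=0, free_tib=1.0, usd_per_tib=6.25):
    """Rough on-demand cost of one query after the monthly free tier.

    bytes_already_used is how much of this month's free tier earlier queries consumed.
    """
    free_left = max(free_tib * TIB - bytes_already_used, 0)
    billable = max(bytes_scanned - free_left, 0)
    return billable / TIB * usd_per_tib


# A 300 GiB scan on a fresh month fits inside the free tier.
print(estimate_cost_usd(300 * 2 ** 30))
# The same scan after the free tier is used up is billed in full.
print(estimate_cost_usd(300 * 2 ** 30, bytes_already_used=TIB))
```

&lt;p&gt;Feed it the byte count from a BigQuery dry run to get a cost figure before running anything for real.&lt;/p&gt;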

&lt;p&gt;Here's a real, runnable example. It pulls US patents matching an assignee, flattens the repeated fields (titles and assignees are nested arrays in this schema), and writes a clean CSV. You'll need a Google Cloud project and &lt;code&gt;pip install google-cloud-bigquery pandas db-dtypes&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.cloud&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;bigquery&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bigquery&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-gcp-project-id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# title_localized and assignee_harmonized are REPEATED records, so UNNEST them.
# Filter by country and date FIRST to limit the bytes scanned (and the cost).
&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
SELECT
  pub.publication_number,
  title.text          AS title,
  assignee.name       AS assignee,
  pub.filing_date,
  pub.publication_date,
  pub.grant_date
FROM `patents-public-data.patents.publications` AS pub,
  UNNEST(pub.title_localized)      AS title,
  UNNEST(pub.assignee_harmonized)  AS assignee
WHERE pub.country_code = &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;US&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;
  AND title.language = &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;en&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;
  AND assignee.name LIKE &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%TESLA%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;
  AND pub.filing_date BETWEEN 20150101 AND 20231231
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="c1"&gt;# Dry run FIRST — see how many bytes this will scan before you pay a cent.
&lt;/span&gt;&lt;span class="n"&gt;dry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;job_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bigquery&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;QueryJobConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dry_run&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;This query will scan &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;dry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_bytes_processed&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;1e9&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; GB&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# If that looks acceptable, run it for real.
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;to_dataframe&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tesla_patents.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Exported &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; rows to tesla_patents.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;dry_run&lt;/code&gt; step is the habit that saves your bill. It returns BigQuery's estimate of the bytes the query will process &lt;em&gt;without&lt;/em&gt; running it, so you know the cost before you spend it. Dates in this dataset are stored as integers in &lt;code&gt;YYYYMMDD&lt;/code&gt; form (e.g. &lt;code&gt;20150101&lt;/code&gt;), which trips up newcomers; that is why the &lt;code&gt;BETWEEN&lt;/code&gt; filter above compares against bare integers rather than date literals.&lt;/p&gt;

&lt;p&gt;BigQuery is the right answer for academic analysis, large-scale landscaping, and anything where you control a GCP project and are comfortable with SQL. Its main downsides: the SQL learning curve for the nested schema, and the fact that some legal-status events and full citation context require joining additional tables.&lt;/p&gt;

&lt;h2&gt;
  
  
  Path 3 (the one people &lt;em&gt;expect&lt;/em&gt; to work): browser scraping — and why it doesn't
&lt;/h2&gt;

&lt;p&gt;This is where most tutorials go wrong, so let me be precise.&lt;/p&gt;

&lt;p&gt;Google Patents search is powered internally by an XHR endpoint (the one your browser hits as you type a query). The intuitive idea is: "I'll just &lt;code&gt;fetch()&lt;/code&gt; that endpoint from a little web page and read the JSON." It feels like it should work. It does not, and here's the exact reason:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The query endpoint does not send a permissive CORS header.&lt;/strong&gt; A browser running on any other origin cannot read the response — the browser blocks it before your JavaScript ever sees the data.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This isn't something you can header-hack around from client-side JS: the server decides which origins may read the response, and the browser enforces that decision before your code sees anything. So a pure in-browser scraper served from your own domain is a dead end. Combined with "no public REST API," this is why client-side patent tools can only ever &lt;em&gt;build&lt;/em&gt; a query and show you a curated sample.&lt;/p&gt;

&lt;p&gt;To actually fetch results at scale you need a &lt;strong&gt;server-side&lt;/strong&gt; process (your own backend, a cloud function, or a hosted scraper) that makes the request without a browser's CORS enforcement, handles pagination, parses the response, and respects rate limits. That's real work, and it's the gap the two tools below fill.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tying it together: a builder + a scraper
&lt;/h2&gt;

&lt;p&gt;If you just need to construct a precise query and preview what the data looks like — without writing SQL or standing up a backend — the free &lt;a href="https://datatooly.xyz/google-patents-search-builder/" rel="noopener noreferrer"&gt;Google Patents Search Builder&lt;/a&gt; lets you compose searches by assignee, inventor, and keyword and see a real sample of the structured output. Because of the CORS reality above, it's honest about what it is: a &lt;strong&gt;query builder with a real sample preview&lt;/strong&gt;, not a live in-browser scraper. It's a great way to nail your query before you spend BigQuery bytes or kick off a larger run.&lt;/p&gt;

&lt;p&gt;When you need the full export (thousands of rows, across 100+ patent offices, &lt;em&gt;with&lt;/em&gt; the fields the built-in CSV omits), the &lt;a href="https://apify.com/constructive_calm/google-patents-intelligence?fpr=v77kxu" rel="noopener noreferrer"&gt;Google Patents Intelligence actor on Apify&lt;/a&gt; (disclosure: I built it, and the free Search Builder above) runs the live search server-side and returns the citation graph, legal status, and claims count as CSV, JSON, Excel, or an API endpoint. It's the do-this-at-scale option for the cases where the 1,000-row cap bites and you'd rather not maintain SQL pipelines or your own scraping infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Which path should you pick?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A quick competitor snapshot, under 1,000 results?&lt;/strong&gt; Use the built-in CSV button. Narrow your query first.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Large-scale analysis and you know SQL?&lt;/strong&gt; BigQuery's &lt;code&gt;patents-public-data&lt;/code&gt; is the gold standard. Dry-run every query.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thousands of enriched rows without SQL or servers?&lt;/strong&gt; A hosted scraper is the pragmatic choice.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A closing note on doing this responsibly: scraping any site, Google Patents included, lives under its Terms of Service and applicable law. For bulk needs, the BigQuery dataset is the explicitly Google-supported route and the cleanest one to stand behind. Prefer it when SQL is on the table, and keep request volumes reasonable when it isn't. Build the right query once, and the export takes care of itself.&lt;/p&gt;

</description>
      <category>python</category>
      <category>api</category>
      <category>datascience</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Read Company Hiring Signals From Public Job Board APIs (with code)</title>
      <dc:creator>Omar Eldeeb</dc:creator>
      <pubDate>Sun, 31 May 2026 16:02:11 +0000</pubDate>
      <link>https://dev.to/odeeb/read-company-hiring-signals-from-public-job-board-apis-with-code-18i8</link>
      <guid>https://dev.to/odeeb/read-company-hiring-signals-from-public-job-board-apis-with-code-18i8</guid>
      <description>&lt;p&gt;A company's open roles are the most honest document it publishes. The careers page is marketing; the job board is the budget. If you learn to read &lt;strong&gt;company hiring signals&lt;/strong&gt; straight from the open requisitions, you can infer where a business is investing months before it shows up in a press release.&lt;/p&gt;

&lt;p&gt;And the best part for developers: most of the data is sitting behind public, no-auth JSON APIs. The applicant tracking systems (ATS) that power those careers pages — Greenhouse, Lever, Ashby, SmartRecruiters — expose job boards as plain endpoints. You can fetch them, parse them, and classify the role mix yourself.&lt;/p&gt;

&lt;p&gt;This article shows you how to do that, with a snippet that actually runs in a browser console.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why open roles encode strategy
&lt;/h2&gt;

&lt;p&gt;Headcount is the clearest expression of intent a company has. Every requisition is a funded decision someone fought for in a planning meeting. So the &lt;em&gt;mix&lt;/em&gt; of roles, not just the count, tells a story:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A wave of Account Executives and Sales Engineers&lt;/strong&gt; → they have a product that works and are pouring fuel on go-to-market. Likely just raised, or hitting a revenue inflection.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A spike in backend / infra / platform engineers&lt;/strong&gt; → scaling pains. The thing is growing faster than the architecture can handle.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;New "AI", "ML", or "Applied Scientist" titles where there were none&lt;/strong&gt; → a strategic bet that didn't exist last quarter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Roles concentrated in a new city or country&lt;/strong&gt; → geographic expansion. A "Country Manager, Germany" is a market-entry announcement disguised as a job post.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recruiters and People Ops hiring&lt;/strong&gt; → they expect to hire a &lt;em&gt;lot&lt;/em&gt; soon. Recruiting hires are often a leading indicator of broader expansion.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;First Compliance / Legal / Finance leadership&lt;/strong&gt; → maturing toward a fundraise, audit, or exit.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is exactly the kind of intelligence that sales teams pay for under the label "hiring intent" or "buying signals." You can derive a useful slice of it yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  The data source: public ATS job boards
&lt;/h2&gt;

&lt;p&gt;Greenhouse runs a dedicated read-only API for board content. The shape is dead simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;GET https://boards-api.greenhouse.io/v1/boards/{board_token}/jobs
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;board_token&lt;/code&gt; is usually the company's slug — &lt;code&gt;stripe&lt;/code&gt;, &lt;code&gt;airbnb&lt;/code&gt;, etc. No API key, no OAuth, no header dance. It returns &lt;code&gt;200 OK&lt;/code&gt; with &lt;code&gt;Content-Type: application/json&lt;/code&gt; and, crucially for front-end code, &lt;code&gt;Access-Control-Allow-Origin: *&lt;/code&gt;. That wildcard CORS header means the request genuinely succeeds from a browser on any origin — you can paste the fetch below straight into DevTools and it works.&lt;/p&gt;

&lt;p&gt;Here's the response shape (illustrative values — run it yourself for live data), so you know what you're parsing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"jobs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1234567&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Account Executive, Enterprise"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"updated_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-05-20T16:58:18-04:00"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"location"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"San Francisco, CA"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"absolute_url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://example.com/jobs/search?gh_jid=1234567"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each job gives you &lt;code&gt;title&lt;/code&gt;, &lt;code&gt;location.name&lt;/code&gt;, &lt;code&gt;updated_at&lt;/code&gt;, and a link. That's all you need to map role mix to intent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fetch + classify in ~50 lines
&lt;/h2&gt;

&lt;p&gt;Below is a self-contained function. It pulls a board, buckets each role into a category by keyword, and returns a sorted intent profile plus a naive "primary signal." Drop it in your browser console with any Greenhouse board token.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;SIGNALS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;sales&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;       &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\b(&lt;/span&gt;&lt;span class="sr"&gt;account executive|ae|sales|business development|bdr|sdr|revenue&lt;/span&gt;&lt;span class="se"&gt;)\b&lt;/span&gt;&lt;span class="sr"&gt;/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;marketing&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\b(&lt;/span&gt;&lt;span class="sr"&gt;marketing|growth|demand gen|content|brand|seo&lt;/span&gt;&lt;span class="se"&gt;)\b&lt;/span&gt;&lt;span class="sr"&gt;/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="c1"&gt;// ai_ml sits before engineering: classify() returns the first match, so "ML Engineer" would otherwise land in engineering.&lt;/span&gt;
  &lt;span class="na"&gt;ai_ml&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;       &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\b(&lt;/span&gt;&lt;span class="sr"&gt;machine learning|ml engineer|applied scientist|&lt;/span&gt;&lt;span class="se"&gt;\b&lt;/span&gt;&lt;span class="sr"&gt;ai&lt;/span&gt;&lt;span class="se"&gt;\b&lt;/span&gt;&lt;span class="sr"&gt;|research scientist|nlp&lt;/span&gt;&lt;span class="se"&gt;)\b&lt;/span&gt;&lt;span class="sr"&gt;/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;engineering&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\b(&lt;/span&gt;&lt;span class="sr"&gt;engineer|developer|sre|devops|infrastructure|platform|backend|frontend&lt;/span&gt;&lt;span class="se"&gt;)\b&lt;/span&gt;&lt;span class="sr"&gt;/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;product&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\b(&lt;/span&gt;&lt;span class="sr"&gt;product manager|&lt;/span&gt;&lt;span class="se"&gt;\b&lt;/span&gt;&lt;span class="sr"&gt;pm&lt;/span&gt;&lt;span class="se"&gt;\b&lt;/span&gt;&lt;span class="sr"&gt;|product designer|ux|ui designer&lt;/span&gt;&lt;span class="se"&gt;)\b&lt;/span&gt;&lt;span class="sr"&gt;/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;recruiting&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\b(&lt;/span&gt;&lt;span class="sr"&gt;recruiter|talent|people ops|hr business partner&lt;/span&gt;&lt;span class="se"&gt;)\b&lt;/span&gt;&lt;span class="sr"&gt;/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;finance_legal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\b(&lt;/span&gt;&lt;span class="sr"&gt;finance|accounting|controller|legal|counsel|compliance&lt;/span&gt;&lt;span class="se"&gt;)\b&lt;/span&gt;&lt;span class="sr"&gt;/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;support&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\b(&lt;/span&gt;&lt;span class="sr"&gt;support|customer success|csm|implementation|onboarding&lt;/span&gt;&lt;span class="se"&gt;)\b&lt;/span&gt;&lt;span class="sr"&gt;/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;label&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;re&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;SIGNALS&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;label&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;other&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;hiringSignals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;boardToken&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`https://boards-api.greenhouse.io/v1/boards/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;boardToken&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/jobs`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Board "&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;boardToken&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;" returned &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;jobs&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;byCity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;cat&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;cat&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;city&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;location&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Unknown&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nx"&gt;byCity&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;city&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;byCity&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;city&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;profile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;counts&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;topCities&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;byCity&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;board&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boardToken&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;totalRoles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;roleMix&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;topLocations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;topCities&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;primarySignal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]?.[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Try it:&lt;/span&gt;
&lt;span class="nf"&gt;hiringSignals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;stripe&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it and you get something shaped like this (example numbers — boards change daily, so your run will differ):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;board:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"stripe"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;totalRoles:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;470&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;roleMix:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s2"&gt;"engineering"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;160&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"sales"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;70&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"product"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;topLocations:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s2"&gt;"San Francisco, CA"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"New York, NY"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;38&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;primarySignal:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"engineering"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note &lt;code&gt;primarySignal&lt;/code&gt; is just &lt;code&gt;roleMix[0][0]&lt;/code&gt; — the highest-count category — and &lt;code&gt;classify()&lt;/code&gt; files each title under its &lt;em&gt;first&lt;/em&gt; matching pattern, so treat both as a rough first read, not gospel. From there, the interesting analysis isn't the snapshot — it's the &lt;strong&gt;delta&lt;/strong&gt;. Save today's &lt;code&gt;roleMix&lt;/code&gt; and diff it next week. A category that jumps from 3 to 18 roles is the signal. A new city appearing in &lt;code&gt;topLocations&lt;/code&gt; is the signal. Absolute counts are noisy; &lt;em&gt;changes&lt;/em&gt; are where intent lives.&lt;/p&gt;
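
&lt;p&gt;The week-over-week diff described above can be sketched roughly as follows. This is a minimal illustration, not part of the original snippet: &lt;code&gt;diffRoleMix&lt;/code&gt; is a hypothetical helper name, it assumes both snapshots use the &lt;code&gt;[category, count]&lt;/code&gt; pair shape that &lt;code&gt;hiringSignals()&lt;/code&gt; returns as &lt;code&gt;roleMix&lt;/code&gt;, and the two sample snapshots are made up.&lt;/p&gt;

```javascript
// Compare two roleMix snapshots (arrays of [category, count] pairs)
// and report every category whose count changed.
function diffRoleMix(previous, current) {
  const before = Object.fromEntries(previous);
  const after = Object.fromEntries(current);
  const categories = new Set([...Object.keys(before), ...Object.keys(after)]);

  const deltas = [];
  for (const category of categories) {
    const was = before[category] || 0;
    const now = after[category] || 0;
    if (now !== was) {
      // isNew flags a category that crossed from zero.
      deltas.push({ category, was, now, change: now - was, isNew: was === 0 });
    }
  }
  // Largest absolute movement first.
  return deltas.sort((a, b) => Math.abs(b.change) - Math.abs(a.change));
}

// Made-up snapshots, for illustration only.
const lastWeek = [["engineering", 40], ["sales", 3], ["product", 9]];
const thisWeek = [["engineering", 41], ["sales", 18], ["product", 9], ["ai_ml", 4]];
const deltas = diffRoleMix(lastWeek, thisWeek);
console.log(deltas);
```

&lt;p&gt;Unchanged categories are dropped on purpose so the output contains only movement. Where you persist last week's snapshot (a file, a database row, &lt;code&gt;localStorage&lt;/code&gt;) is up to you.&lt;/p&gt;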

&lt;h2&gt;
  
  
  Sharpen the read
&lt;/h2&gt;

&lt;p&gt;A few things to layer on once the basics work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Weight by recency.&lt;/strong&gt; Roles with a fresh &lt;code&gt;updated_at&lt;/code&gt; reflect current priorities more than ones reposted for months. Filter to roles updated in the last 30 days.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Watch for &lt;em&gt;firsts&lt;/em&gt;.&lt;/strong&gt; The first role in a category (first "Enterprise AE", first "Solutions Architect") often matters more than the tenth. Track which categories crossed from zero.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Seniority skew.&lt;/strong&gt; A batch of "Head of" / "Director" / "VP" postings signals a layer being built out — usually ahead of an org's scaling phase.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-reference with funding.&lt;/strong&gt; Sales-and-marketing hiring spikes that line up with a recent raise are the strongest go-to-market-expansion tell.&lt;/li&gt;
&lt;/ul&gt;
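
&lt;p&gt;Two of the refinements above, the recency filter and the seniority skew, might look something like this. Treat it as a sketch under assumptions: &lt;code&gt;recentJobs&lt;/code&gt;, &lt;code&gt;seniorityShare&lt;/code&gt; and the &lt;code&gt;SENIOR&lt;/code&gt; pattern are hypothetical names and a keyword list I picked for illustration, and the sample jobs are invented. It relies only on the &lt;code&gt;title&lt;/code&gt; and &lt;code&gt;updated_at&lt;/code&gt; fields shown in the response shape earlier.&lt;/p&gt;

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;
// Illustrative keyword list; extend it to match the titles you actually see.
const SENIOR = /\b(head of|director|vp|vice president|chief)\b/i;

// Keep jobs whose updated_at falls within the last `days` days.
// Jobs with a missing or unparseable date are dropped (NaN fails the comparison).
function recentJobs(jobs, days = 30, now = Date.now()) {
  const cutoff = now - days * DAY_MS;
  return jobs.filter((job) => Date.parse(job.updated_at) >= cutoff);
}

// Fraction of postings whose title looks like a leadership role.
function seniorityShare(jobs) {
  if (jobs.length === 0) return 0;
  const senior = jobs.filter((job) => SENIOR.test(job.title)).length;
  return senior / jobs.length;
}

// Invented sample data, with a fixed "now" so the result is reproducible.
const sampleJobs = [
  { title: "Director of Sales", updated_at: "2026-05-20T16:58:18-04:00" },
  { title: "Backend Engineer", updated_at: "2026-05-25T09:00:00-04:00" },
  { title: "Account Executive", updated_at: "2026-01-02T09:00:00-05:00" },
];
const asOf = Date.parse("2026-05-31T00:00:00Z");
const fresh = recentJobs(sampleJobs, 30, asOf);
console.log(fresh.length, seniorityShare(fresh));
```

&lt;p&gt;Passing &lt;code&gt;now&lt;/code&gt; as a parameter keeps the filter testable; in real use you would leave it at its default.&lt;/p&gt;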

&lt;p&gt;The regexes above are deliberately simple. Real titles are messy ("Staff Software Engineer, Payments Risk Platform"), and a keyword bucket will misfile some. For anything beyond exploration, an LLM that classifies each title against your taxonomy is far more robust than brittle patterns, but start with regex to understand your data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Want to eyeball one company right now?
&lt;/h2&gt;

&lt;p&gt;If you just want to point at a single Greenhouse board and see the role mix without writing code, there's a free browser tool that runs the same idea live: &lt;a href="https://datatooly.xyz/company-hiring-signals/" rel="noopener noreferrer"&gt;datatooly.xyz/company-hiring-signals&lt;/a&gt; (disclosure: I built it, and the Apify actor mentioned later). It fetches the public board client-side (thanks to that wildcard CORS header) and renders the intent breakdown. Good for a quick check on one prospect.&lt;/p&gt;

&lt;h2&gt;
  
  
  The other ATS platforms (and the hard one)
&lt;/h2&gt;

&lt;p&gt;Greenhouse is the easiest, but it's not alone. Several major ATS platforms expose public job boards:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Endpoint shapes and headers drift over time — test each ATS before depending on it in production.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lever&lt;/strong&gt; — &lt;code&gt;https://api.lever.co/v0/postings/{company}?mode=json&lt;/code&gt; returns a plain JSON array with &lt;code&gt;text&lt;/code&gt;, &lt;code&gt;categories.team&lt;/code&gt;, &lt;code&gt;categories.location&lt;/code&gt;, and &lt;code&gt;hostedUrl&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ashby&lt;/strong&gt; — a public job-posting API keyed by job-board name; like Greenhouse, it sends &lt;code&gt;Access-Control-Allow-Origin: *&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SmartRecruiters&lt;/strong&gt; — a public postings endpoint per company.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each has a slightly different response shape, so you'd normalize them into one schema (title, location, team, updated date, url) before classifying.&lt;/p&gt;
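
&lt;p&gt;As a rough sketch of that normalization step, here are two adapters that map a Lever posting and a Greenhouse job onto one shape. The Lever fields &lt;code&gt;text&lt;/code&gt;, &lt;code&gt;categories&lt;/code&gt; and &lt;code&gt;hostedUrl&lt;/code&gt; come from the list above; &lt;code&gt;createdAt&lt;/code&gt;, &lt;code&gt;location.name&lt;/code&gt;, &lt;code&gt;departments&lt;/code&gt; and &lt;code&gt;absolute_url&lt;/code&gt; are assumptions about the live responses, so check them against real data before relying on them:&lt;/p&gt;

```javascript
// Target schema from the text: title, location, team, updatedAt, url.
function fromLever(posting) {
  return {
    source: "lever",
    title: posting.text,
    location: posting.categories?.location ?? null,
    team: posting.categories?.team ?? null,
    // Assumption: createdAt is an epoch-milliseconds timestamp when present.
    updatedAt: posting.createdAt ? new Date(posting.createdAt).toISOString() : null,
    url: posting.hostedUrl,
  };
}

function fromGreenhouse(job) {
  return {
    source: "greenhouse",
    title: job.title,
    // Assumption: location is an object with a name field.
    location: job.location?.name ?? null,
    // Assumption: departments may be missing, so fall back to null.
    team: job.departments?.[0]?.name ?? null,
    updatedAt: job.updated_at ?? null,
    url: job.absolute_url,
  };
}
```

&lt;p&gt;Once every source produces the same shape, the classifier only ever needs to read &lt;code&gt;title&lt;/code&gt;, and adding another ATS means writing one more small adapter.&lt;/p&gt;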

&lt;p&gt;Then there's &lt;strong&gt;Workday&lt;/strong&gt;, which is the genuinely hard one. Workday tenants serve postings through a per-tenant CXS endpoint that you have to discover, and pagination is done via POST with an offset body rather than a clean GET — no friendly wildcard CORS, no single base URL. A meaningful share of large enterprises run on Workday, so any "company hiring signals" pipeline that ignores it has a blind spot exactly where the biggest budgets are.&lt;/p&gt;
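
&lt;p&gt;To make the offset-in-the-body pattern concrete, here is a hedged sketch of the paging loop. Workday's endpoint is not a documented public API, so everything specific here is an assumption: the &lt;code&gt;limit&lt;/code&gt;/&lt;code&gt;offset&lt;/code&gt; body, the &lt;code&gt;total&lt;/code&gt;/&lt;code&gt;jobPostings&lt;/code&gt; response fields, and the helper names. The endpoint URL still has to be discovered per tenant:&lt;/p&gt;

```javascript
// Hypothetical wiring: returns a function that POSTs a JSON body to one
// tenant endpoint. endpointUrl must be discovered per tenant.
const makePostPage = (endpointUrl) => async (body) => {
  const res = await fetch(endpointUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error("Request failed with status " + res.status);
  return res.json();
};

// Assumption: postPage(body) resolves to { total, jobPostings }.
// maxPages is a safety cap so a misbehaving endpoint cannot loop forever.
async function fetchAllPostings(postPage, limit = 20, maxPages = 50) {
  const all = [];
  for (let page = 0; page !== maxPages; page += 1) {
    const offset = page * limit;
    const { total, jobPostings = [] } = await postPage({ limit, offset });
    all.push(...jobPostings);
    // Stop on an empty page or once the reported total is reached.
    const reachedTotal = typeof total === "number" ? all.length >= total : false;
    if (jobPostings.length === 0 || reachedTotal) break;
  }
  return all;
}
```

&lt;p&gt;Passing the request function in keeps the loop testable with a fake, and lets you add delays or caching in one place before pointing it at a real tenant.&lt;/p&gt;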

&lt;h2&gt;
  
  
  Doing this at scale
&lt;/h2&gt;

&lt;p&gt;Reading one board by hand is a five-minute task. Tracking &lt;strong&gt;25,000+ companies&lt;/strong&gt;, normalizing four-plus ATS schemas (including the Workday pagination dance), running an AI classifier over messy titles, and diffing week-over-week to fire alerts when a category spikes — that's a data pipeline, not a console snippet.&lt;/p&gt;

&lt;p&gt;If you'd rather not build and maintain all of that, the &lt;a href="https://apify.com/constructive_calm/ats-hiring-intent-scraper?fpr=v77kxu" rel="noopener noreferrer"&gt;ATS Hiring-Intent Scraper on Apify&lt;/a&gt; does the heavy lifting: it pulls across the major ATS platforms, classifies role mix into intent categories, and is built for running on a schedule so you catch the &lt;em&gt;changes&lt;/em&gt; rather than just snapshots. Useful if hiring signals feed a sales or research workflow and you need them reliably, not as a one-off.&lt;/p&gt;

&lt;p&gt;But for learning the concept and prototyping on a handful of targets, the fetch-and-classify snippet above is all you need — and it's a genuinely fun afternoon of code.&lt;/p&gt;

&lt;p&gt;One honest note: these endpoints are public because companies &lt;em&gt;want&lt;/em&gt; their jobs found, but they're meant for candidates, not bulk harvesting. Keep request rates polite, cache aggressively, respect each platform's Terms of Service and &lt;code&gt;robots.txt&lt;/code&gt;, and don't republish personal data. Read the strategy, not the people.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>api</category>
      <category>sales</category>
    </item>
  </channel>
</rss>
