<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: James Taylor</title>
    <description>The latest articles on DEV Community by James Taylor (@james_taylor_037c857e0299).</description>
    <link>https://dev.to/james_taylor_037c857e0299</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3968529%2F0bdcd4df-d862-471b-a5a1-549d0d27b334.png</url>
      <title>DEV Community: James Taylor</title>
      <link>https://dev.to/james_taylor_037c857e0299</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/james_taylor_037c857e0299"/>
    <language>en</language>
    <item>
      <title>How we built a Reddit comment-tree scraper that returns upvote scores — through a residential proxy</title>
      <dc:creator>James Taylor</dc:creator>
      <pubDate>Thu, 04 Jun 2026 15:26:08 +0000</pubDate>
      <link>https://dev.to/james_taylor_037c857e0299/how-we-built-a-reddit-comment-tree-scraper-that-returns-upvote-scores-through-a-residential-proxy-565d</link>
      <guid>https://dev.to/james_taylor_037c857e0299/how-we-built-a-reddit-comment-tree-scraper-that-returns-upvote-scores-through-a-residential-proxy-565d</guid>
      <description>&lt;p&gt;Most "Reddit scrapers" quietly lie to you. They hand back a flat list of top-level comments with no upvote scores, no nesting, and no idea which reply was buried at the bottom of a 200-comment thread. That's because they're reading Reddit's RSS feed — the one endpoint Reddit still serves cheaply — and RSS throws away almost everything that makes a Reddit discussion &lt;em&gt;interesting&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;We needed the real thing: every comment, with its &lt;strong&gt;author, body, upvote score, depth, and parent&lt;/strong&gt;, plus the post's score and upvote ratio. So we built it, published it on the Apify Store as &lt;a href="https://apify.com/signalengine/reddit-deep-comments" rel="noopener noreferrer"&gt;Reddit Comment Tree Scraper&lt;/a&gt;, and this post walks through exactly how it works — the 403 wall, why a residential proxy is non-negotiable, and the one trick that keeps the cost sane.&lt;/p&gt;

&lt;h2&gt;Why Reddit is hard to scrape (and why RSS is a cop-out)&lt;/h2&gt;

&lt;p&gt;Reddit used to have a famously friendly JSON API: append &lt;code&gt;.json&lt;/code&gt; to any thread URL and you'd get the whole tree. Then they locked it down. Today, if you &lt;code&gt;fetch()&lt;/code&gt; a thread's &lt;code&gt;.json&lt;/code&gt; from a server, you get a &lt;code&gt;403&lt;/code&gt;. It's gated on two things at once:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;IP reputation.&lt;/strong&gt; Datacenter IPs (AWS, GCP, Hetzner, the usual suspects) are blocked outright. A residential IP from a real ISP passes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TLS / client fingerprint.&lt;/strong&gt; Even from a residential IP, a plain HTTP client gets challenged. Reddit fingerprints the TLS handshake and headers and can tell a &lt;code&gt;node-fetch&lt;/code&gt; from a real browser.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A datacenter IP plus a real browser still &lt;code&gt;403&lt;/code&gt;s. A residential IP plus &lt;code&gt;curl&lt;/code&gt; still gets challenged. You need &lt;strong&gt;both&lt;/strong&gt;: a residential IP &lt;em&gt;and&lt;/em&gt; a real browser. That's the whole problem, and it's why the cheap actors don't bother: they fall back to RSS, which is unauthenticated and gives you flat, scoreless comments.&lt;/p&gt;

&lt;p&gt;If all you need is "what are the new posts in r/SaaS," RSS is fine (and we use it ourselves for cheap discovery — more on that below). But if you need the &lt;em&gt;engagement data&lt;/em&gt; — which comment actually resonated, how deep the thread went, what the sentiment looked like at each level — RSS can't help you.&lt;/p&gt;

&lt;h2&gt;The approach: warm a real browser, then read the canonical JSON&lt;/h2&gt;

&lt;p&gt;Here's the core insight that makes the actor both reliable &lt;em&gt;and&lt;/em&gt; affordable:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You don't need to &lt;em&gt;render&lt;/em&gt; every page. You need a real browser to &lt;strong&gt;clear Reddit's gate once&lt;/strong&gt;, and then you can fetch the lightweight &lt;code&gt;.json&lt;/code&gt; from inside that same browser context as many times as you like.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So the flow is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Spin up a headless Chromium through a &lt;strong&gt;residential proxy&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Navigate to &lt;code&gt;old.reddit.com&lt;/code&gt; once — this clears the anti-bot gate and warms the session (cookies, fingerprint, the works).&lt;/li&gt;
&lt;li&gt;From inside that warmed page, &lt;code&gt;fetch()&lt;/code&gt; each thread's canonical &lt;code&gt;.json&lt;/code&gt;. Because the request now originates from a real, gate-cleared browser context, Reddit serves it.&lt;/li&gt;
&lt;li&gt;Parse the JSON into a clean post + comment tree.&lt;/li&gt;
&lt;/ol&gt;
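
&lt;p&gt;Steps 1 and 2 can be sketched roughly as follows. This is a minimal sketch under assumptions, not the actor's actual code: &lt;code&gt;browser&lt;/code&gt; is a Playwright &lt;code&gt;Browser&lt;/code&gt; you have already launched, &lt;code&gt;proxy&lt;/code&gt; is a hypothetical object holding your residential proxy's &lt;code&gt;server&lt;/code&gt;, &lt;code&gt;username&lt;/code&gt; and &lt;code&gt;password&lt;/code&gt;, and a successful navigation is taken to mean the gate was cleared.&lt;/p&gt;

```javascript
// Sketch only. `browser` is a Playwright Browser (for example from
// chromium.launch()); `proxy` is a hypothetical { server, username, password }.
async function openWarmedContext(browser, proxy, warmUrl = 'https://old.reddit.com/') {
  // One context per session, so each session can exit through its own proxy.
  const ctx = await browser.newContext({ proxy });
  const page = await ctx.newPage();
  try {
    // A single real navigation; later JSON fetches reuse this page's session.
    const response = await page.goto(warmUrl, { waitUntil: 'domcontentloaded' });
    if (!response || !response.ok()) {
      const status = response ? response.status() : 'none';
      throw new Error(`warm-up failed with status ${status}`);
    }
    return { ctx, page };
  } catch (err) {
    await ctx.close(); // do not leak a context that never cleared the gate
    throw err;
  }
}
```

&lt;p&gt;Returning both &lt;code&gt;ctx&lt;/code&gt; and &lt;code&gt;page&lt;/code&gt; lets the caller run in-page fetches on &lt;code&gt;page&lt;/code&gt; and later close &lt;code&gt;ctx&lt;/code&gt; to drop the session.&lt;/p&gt;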

&lt;p&gt;The key line is the in-page fetch. We use Playwright's &lt;code&gt;page.evaluate()&lt;/code&gt; to run the fetch &lt;em&gt;in the browser's own JS context&lt;/em&gt;, so it inherits the warmed session:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;json&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;Accept&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;__status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="nx"&gt;jsonUrl&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That &lt;code&gt;jsonUrl&lt;/code&gt; is just the thread URL with &lt;code&gt;.json?limit=200&amp;amp;raw_json=1&lt;/code&gt; tacked on. &lt;code&gt;raw_json=1&lt;/code&gt; stops Reddit from HTML-escaping the comment bodies, so you get clean text instead of &lt;code&gt;&amp;amp;amp;&lt;/code&gt; soup.&lt;/p&gt;
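
&lt;p&gt;A minimal sketch of that URL construction, assuming the input is an ordinary thread permalink. The helper name &lt;code&gt;buildJsonUrl&lt;/code&gt; is hypothetical and only here for illustration:&lt;/p&gt;

```javascript
// Hypothetical helper: turn a thread permalink into its .json URL.
function buildJsonUrl(threadUrl, limit = 200) {
  const url = new URL(threadUrl);
  // Drop any trailing slash, then append .json exactly once.
  let path = url.pathname.replace(/\/+$/, '');
  if (!path.endsWith('.json')) path += '.json';
  url.pathname = path;
  // Replace whatever query string was there with the two parameters we need.
  url.search = new URLSearchParams({ limit: String(limit), raw_json: '1' }).toString();
  url.hash = '';
  return url.toString();
}

console.log(buildJsonUrl('https://old.reddit.com/r/SaaS/comments/abc123/how_we_cut_churn/'));
```

&lt;p&gt;Going through &lt;code&gt;URL&lt;/code&gt; and &lt;code&gt;URLSearchParams&lt;/code&gt; rather than string concatenation means permalinks with or without a trailing slash, or with tracking parameters attached, end up in the same canonical form.&lt;/p&gt;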

&lt;h2&gt;Getting the &lt;em&gt;whole&lt;/em&gt; tree, not just the first page&lt;/h2&gt;

&lt;p&gt;Reddit serves roughly the top 200 comments per thread and collapses the rest into "load more comments" stubs. If you stop there, you silently lose the deepest, often most candid replies.&lt;/p&gt;

&lt;p&gt;Those stubs aren't dead ends: each one carries the IDs of the comments it's hiding. We collect those IDs and POST them to Reddit's &lt;code&gt;/api/morechildren&lt;/code&gt; endpoint (again, from inside the warmed browser context), 100 at a time, until we hit the user's &lt;code&gt;maxComments&lt;/code&gt; cap:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URLSearchParams&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;link_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;linkId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;// t3_&amp;lt;postId&amp;gt;&lt;/span&gt;
  &lt;span class="na"&gt;children&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;children&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     &lt;span class="c1"&gt;// up to 100 comment IDs&lt;/span&gt;
  &lt;span class="na"&gt;api_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;confidence&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;raw_json&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
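
&lt;p&gt;Collecting the hidden IDs and batching them might look like the sketch below. It assumes the usual shape of Reddit's thread JSON, where comments have &lt;code&gt;kind: 't1'&lt;/code&gt;, stubs have &lt;code&gt;kind: 'more'&lt;/code&gt; with a &lt;code&gt;data.children&lt;/code&gt; array of IDs, and a comment's &lt;code&gt;replies&lt;/code&gt; is either an empty string or a nested listing. The function names are hypothetical.&lt;/p&gt;

```javascript
// Walk a comment listing and gather every ID hidden behind a "more" stub.
function collectMoreIds(listing, out = []) {
  const children = listing?.data?.children ?? [];
  for (const child of children) {
    if (child.kind === 'more') {
      out.push(...child.data.children);
    } else if (child.kind === 't1') {
      // `replies` is an empty string when a comment has no replies.
      if (child.data.replies) collectMoreIds(child.data.replies, out);
    }
  }
  return out;
}

// Split the IDs into batches of at most `size` for /api/morechildren.
function chunk(ids, size = 100) {
  const batches = [];
  for (let i = 0; ids.length > i; i += size) {
    batches.push(ids.slice(i, i + size));
  }
  return batches;
}
```

&lt;p&gt;Each batch would then be joined with commas and sent as the &lt;code&gt;children&lt;/code&gt; field of the request body shown above.&lt;/p&gt;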



&lt;p&gt;This is the difference between a scraper that returns "the 200 comments Reddit felt like showing" and one that returns the actual discussion. Each comment comes back with its &lt;code&gt;depth&lt;/code&gt; and &lt;code&gt;parentId&lt;/code&gt;, so you can rebuild the exact nesting — or just use the flat list with scores attached.&lt;/p&gt;

&lt;h2&gt;The cost problem, and the trick that solves it&lt;/h2&gt;

&lt;p&gt;Residential proxy bandwidth is the floor on cost for any serious Reddit scrape. Apify's residential proxy runs about &lt;strong&gt;$8/GB&lt;/strong&gt;. If you naively launched a fresh browser and a fresh proxy IP for every single thread, you'd pay for a full page render and a new IP rotation on every request. That gets expensive fast.&lt;/p&gt;

&lt;p&gt;Two levers fix this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Warm once per session, then batch.&lt;/strong&gt; Each worker takes &lt;em&gt;one&lt;/em&gt; proxy IP, clears the gate &lt;em&gt;once&lt;/em&gt;, then fires up to &lt;code&gt;threadsPerSession&lt;/code&gt; (default 15) thread-&lt;code&gt;.json&lt;/code&gt; fetches through that same warmed context before rotating to a fresh IP. Browser startup and gate-clearing, the expensive parts, get amortised across up to 15 threads instead of being paid again on every thread. After that, you're mostly paying for lightweight JSON payloads, not page renders.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;threads&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;openWarmedContext&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;   &lt;span class="c1"&gt;// one IP, gate cleared once&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;inSession&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;threads&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;inSession&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;threadsPerSession&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ref&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;threads&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;shift&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetchThreadInPage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// cheap JSON fetch&lt;/span&gt;
      &lt;span class="nx"&gt;inSession&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;                    &lt;span class="c1"&gt;// rotate IP, repeat&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Bring your own residential proxy.&lt;/strong&gt; This is the big one. The actor uses Apify's &lt;code&gt;createProxyConfiguration&lt;/code&gt;, which transparently accepts a &lt;strong&gt;"Custom proxies"&lt;/strong&gt; option in the proxy input. Paste your own residential proxy URLs (providers like IPRoyal sell residential bandwidth at &lt;strong&gt;$1–2/GB&lt;/strong&gt;) and, against Apify's roughly $8/GB, you pay about &lt;strong&gt;4–8× less per GB&lt;/strong&gt;, with zero code changes. The actor rotates your IPs per session exactly the same way.&lt;/p&gt;

&lt;p&gt;That BYO-proxy support is deliberate. We run this actor inside our own product at high volume, and the proxy economics are the whole game at scale.&lt;/p&gt;
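
&lt;p&gt;To see how the two levers interact, here is a back-of-the-envelope sketch. The payload sizes are made-up placeholders, not measurements from the actor; only the $8/GB price and the default of 15 threads per session come from the text above.&lt;/p&gt;

```javascript
// All sizes below are made-up placeholders, not measurements.
function costPerThread({ warmupMb, jsonMb, threadsPerSession, pricePerGb }) {
  // The warm-up navigation is paid once per session and shared by its threads.
  const mbPerThread = warmupMb / threadsPerSession + jsonMb;
  return (mbPerThread / 1000) * pricePerGb; // treating 1 GB as 1000 MB
}

const assumed = { warmupMb: 3, jsonMb: 0.2, pricePerGb: 8 };
const naive = costPerThread({ ...assumed, threadsPerSession: 1 });
const batched = costPerThread({ ...assumed, threadsPerSession: 15 });
console.log({ naive, batched });
```

&lt;p&gt;Swapping &lt;code&gt;pricePerGb&lt;/code&gt; for your own provider's rate shows the second lever: the per-thread cost scales linearly with the price per GB, whatever the payload sizes turn out to be.&lt;/p&gt;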

&lt;h2&gt;Reliability: requeue on a fresh IP&lt;/h2&gt;

&lt;p&gt;Residential IPs are flaky by nature — some are slow, some are already rate-limited by Reddit, some just die mid-session. The actor treats a blocked or stale fetch as retryable: a thread that fails gets pushed back onto the queue (up to 3 tries) and picked up by the &lt;em&gt;next&lt;/em&gt; warmed session on a &lt;em&gt;fresh&lt;/em&gt; IP. A thread that comes back valid-but-empty (deleted/removed post) is not retried — there's nothing there to get.&lt;/p&gt;
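
&lt;p&gt;The retry policy can be sketched like this. It is a simplified single-queue version under assumptions: &lt;code&gt;fetchThread&lt;/code&gt; is a hypothetical function that resolves to an object whose &lt;code&gt;status&lt;/code&gt; is &lt;code&gt;'ok'&lt;/code&gt;, &lt;code&gt;'empty'&lt;/code&gt; or &lt;code&gt;'blocked'&lt;/code&gt;, and session rotation is left out, so "fresh IP" here just means "a later turn of the queue".&lt;/p&gt;

```javascript
// Drain a list of thread refs, retrying blocked ones up to `maxTries` times.
async function drainWithRetries(refs, fetchThread, maxTries = 3) {
  const queue = refs.map((ref) => ({ ref, tries: 0 }));
  const result = { done: [], skipped: [], failed: [] };
  while (queue.length > 0) {
    const job = queue.shift();
    job.tries += 1;
    let status;
    try {
      status = (await fetchThread(job.ref)).status;
    } catch (err) {
      status = 'blocked'; // a dead proxy or timeout is treated like a block
    }
    if (status === 'ok') {
      result.done.push(job.ref);
    } else if (status === 'empty') {
      result.skipped.push(job.ref); // deleted or removed: nothing to retry
    } else if (maxTries > job.tries) {
      queue.push(job); // back of the queue, so a later session picks it up
    } else {
      result.failed.push(job.ref);
    }
  }
  return result;
}
```

&lt;p&gt;Pushing a failed job to the back of the queue, rather than retrying it immediately, is what gives it a chance to land on a different session.&lt;/p&gt;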

&lt;p&gt;This is the difference between "works in a demo" and "works on 10,000 threads overnight." You assume IPs will fail and design the retry around it, rather than treating every failure as fatal.&lt;/p&gt;

&lt;h2&gt;Discovery for free&lt;/h2&gt;

&lt;p&gt;One more economy: you don't need the expensive browser path just to &lt;em&gt;find&lt;/em&gt; threads. Reddit's per-subreddit RSS listing is still served cheaply and unauthenticated. So when you give the actor a list of &lt;code&gt;subreddits&lt;/code&gt;, it pulls the listing via plain RSS to discover thread IDs, and only spends the residential-browser budget on the actual deep scrape of each thread. Cheap where you can be, expensive only where you must be.&lt;/p&gt;
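
&lt;p&gt;Pulling thread IDs out of a feed does not need a full XML parser. The sketch below assumes the feed text contains thread permalinks of the form &lt;code&gt;/r/{subreddit}/comments/{id}/&lt;/code&gt;; the function name is hypothetical and this is not the actor's actual parsing code.&lt;/p&gt;

```javascript
// Extract unique { subreddit, id } pairs from the text of a subreddit feed.
function extractThreadRefs(feedText) {
  const pattern = /\/r\/([A-Za-z0-9_]+)\/comments\/([a-z0-9]+)\//g;
  const seen = new Map();
  for (const match of feedText.matchAll(pattern)) {
    const [, subreddit, id] = match;
    // The same permalink can appear more than once per entry; keep the first.
    if (!seen.has(id)) seen.set(id, { subreddit, id });
  }
  return [...seen.values()];
}
```

&lt;p&gt;The resulting IDs are what the deep-scrape step would take as its queue.&lt;/p&gt;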

&lt;h2&gt;What you get back&lt;/h2&gt;

&lt;p&gt;One clean record per thread:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"post"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"subreddit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SaaS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"How we cut churn 30%"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;142&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"upvoteRatio"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.97&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"numComments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;88&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"comments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"author"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"growth_greg"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"body"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"What did your onboarding look like before?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"depth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"parentId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"t3_abc123"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every comment carries the score and the tree position. That's the data sentiment models, social-listening tools, and trend analysts actually need — and the data RSS-based scrapers structurally cannot give you.&lt;/p&gt;
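
&lt;p&gt;As a small usage sketch, here is one way to answer "which comment resonated, and how deep did the thread go" from such a record. It relies only on the &lt;code&gt;score&lt;/code&gt; and &lt;code&gt;depth&lt;/code&gt; fields shown in the sample; the function name is hypothetical.&lt;/p&gt;

```javascript
// Summarise one thread record: top comment, maximum depth, comments per depth.
function summarizeThread(record) {
  const comments = record.comments || [];
  let topComment = null;
  let maxDepth = 0;
  const perDepth = {};
  for (const c of comments) {
    if (topComment === null || c.score > topComment.score) topComment = c;
    if (c.depth > maxDepth) maxDepth = c.depth;
    perDepth[c.depth] = (perDepth[c.depth] || 0) + 1;
  }
  return { count: comments.length, topComment, maxDepth, perDepth };
}
```

&lt;p&gt;None of these numbers can be computed from an RSS-based scrape, since it carries neither scores nor depth.&lt;/p&gt;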

&lt;h2&gt;Compliance note&lt;/h2&gt;

&lt;p&gt;The actor reads &lt;strong&gt;public Reddit data only&lt;/strong&gt;. It never logs in, posts, votes, or messages. Use the data in line with Reddit's terms and whatever laws apply to you. We built it for research, analysis, and social listening — not for spamming subreddits.&lt;/p&gt;

&lt;h2&gt;Try it&lt;/h2&gt;

&lt;p&gt;The actor is live on the Apify Store: &lt;strong&gt;&lt;a href="https://apify.com/signalengine/reddit-deep-comments" rel="noopener noreferrer"&gt;Reddit Comment Tree Scraper — Full Threads + Scores&lt;/a&gt;&lt;/strong&gt;. Give it a subreddit or a list of thread URLs and you'll get back the full tree with scores. Drop in your own residential proxy to make it cheap at volume.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This scraper is one component of a much larger system. We use it inside &lt;a href="https://engine.signalsprint.io" rel="noopener noreferrer"&gt;SignalEngine&lt;/a&gt; — an autonomous outbound engine that turns Reddit (and other) conversations into qualified leads with AI-drafted, context-aware replies. If you'd rather have the conversations turned into pipeline automatically than wire up the data yourself, that's what the engine is for.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>javascript</category>
      <category>reddit</category>
      <category>apify</category>
    </item>
  </channel>
</rss>
