Devil Scrapes

Posted on Jun 2

Product Hunt Scraper: pull daily launches to JSON for $2/1K

#webscraping #python #apify #automation

Quick answer: Product Hunt publishes a public RSS feed at producthunt.com/feed covering each day's new launches. A Product Hunt scraper fetches that feed, parses every entry, and returns structured rows — name, tagline, maker, link, categories, posted-at timestamp — as typed JSON. The Apify Actor below does it for $0.002 per launch (~$2.00 per 1,000), with fingerprint rotation, residential proxy, and Pydantic-validated rows handled for you.

There is a narrow but high-signal window after a product launches on Product Hunt. SDRs want to reach a maker while launch-day attention is still warm. Makers watching competitor categories want to know the moment something new drops. Newsletter editors want yesterday's top launches formatted and ready. All of them need the same data — and Product Hunt's official API requires an OAuth application that most builders won't bother completing.

The free RSS feed is there. Parsing it reliably, from a cloud environment, under Product Hunt's rate limits — that part is where the work lives.

What is Product Hunt? 🔎

Product Hunt is a community platform where makers post new software, hardware, and side projects for upvoting and discussion. Founded in 2013 by Ryan Hoover, it has become the de facto launch pad for SaaS products, developer tools, AI apps, and consumer software. The daily leaderboard surfaces the most-upvoted launches.

For data consumers, PH is a structured, time-stamped stream of the startup launch pipeline: every entry has a product name, a tagline, a maker, a launch URL, categories, and a posted-at timestamp — genuinely useful for sales prospecting, competitor monitoring, and trend analysis, provided you can get it out cleanly.

Does Product Hunt have a public API? 🤔

Partially. Product Hunt offers an official GraphQL API, but accessing it requires registering a developer application and obtaining OAuth credentials, a process gated on manual approval that many indie builders skip. The free unauthenticated surface is the RSS feed at producthunt.com/feed, which exposes the most recent 20–50 launches with name, tagline, link, maker, categories, and published timestamp. Vote counts, comments, maker profile detail, and Hunter information are not in the feed; they require the authenticated API.

This Actor wraps the RSS layer — no OAuth dance, no token management. For votes and maker profiles, use the official API.

What the data looks like 📤

Each launch comes back as one flat, typed row — the exact fields from models.py:

{
  "feed_url": "https://www.producthunt.com/feed",
  "title": "Linktree Pro — One link for everything you create",
  "name": "Linktree Pro",
  "tagline": "One link for everything you create",
  "link": "https://www.producthunt.com/posts/linktree-pro",
  "author": "Alex Zaccaron",
  "content_html": "<p>We redesigned the link-in-bio...</p>",
  "categories": ["SaaS", "Marketing", "No-Code"],
  "published": "2026-05-30T08:00:00+00:00",
  "scraped_at": "2026-05-31T09:14:22+00:00"
}

Ten fields, the same shape every time, Pydantic-validated before the row is written. name and tagline are parsed from the RSS title by splitting on the " — " separator PH uses as its canonical title format, with fallbacks for titles that use a different separator; categories comes straight from the RSS <category> tags. It drops directly into pandas, n8n, Make, or a Notion database with zero post-processing.
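The mapping from feed item to row can be sketched with the standard library alone. This is an illustration under assumptions, not the Actor's code: the Actor parses with feedparser, the sample item below is made up to match the shape described here rather than copied from the live feed, and `item_to_row` is a hypothetical helper. The name/tagline split is left out to keep the focus on the XML fields.

```python
import xml.etree.ElementTree as ET

# Made-up sample shaped like the item described in this post; not real feed output.
SAMPLE_ITEM = """\
<item xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>Linktree Pro — One link for everything you create</title>
  <link>https://www.producthunt.com/posts/linktree-pro</link>
  <dc:creator>Alex Zaccaron</dc:creator>
  <category>SaaS</category>
  <category>Marketing</category>
  <category>No-Code</category>
</item>
"""

# Namespace map so "dc:creator" resolves to the Dublin Core element.
DC = {"dc": "http://purl.org/dc/elements/1.1/"}


def item_to_row(item_xml: str) -> dict:
    """Map one RSS <item> to a flat dict (hypothetical helper)."""
    item = ET.fromstring(item_xml)
    return {
        "title": item.findtext("title"),
        "link": item.findtext("link"),
        # None when the item carries no dc:creator element.
        "author": item.findtext("dc:creator", namespaces=DC),
        # One list entry per <category> tag, in document order.
        "categories": [c.text for c in item.findall("category")],
    }


row = item_to_row(SAMPLE_ITEM)
print(row["author"], "|", row["categories"])
```

Missing elements come back as None (or an empty list for categories) rather than raising, which is the behaviour a downstream validator needs in order to tell "absent" from "empty".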

The naive approach (and why it falls apart) ⚙️

The first attempt most developers make: open the RSS URL in a browser (works), fire requests.get("https://www.producthunt.com/feed") (200 OK, parses fine), then schedule it on a server — where it starts 403ing within a day. Three things break, in this order:

1. IP reputation at the CDN layer. Product Hunt throttles cloud datacenter IPs far more aggressively than residential ones. We route every request through Apify residential proxies with sticky sessions — the target sees residential traffic, not a cloud scanner.

2. TLS fingerprint inspection. Python's urllib3 sends a TLS ClientHello that CDNs have learned to associate with scrapers. We replace the HTTP client with curl-cffi, impersonating real Chrome, Firefox, and Safari TLS + HTTP/2 handshakes, rotating across ("chrome131", "chrome124", "firefox147", "safari180") so no single fingerprint repeats often enough to flag.

3. Non-200 responses and partial-success status. When the feed returns a non-200 status, the naive script either crashes or writes a half-empty dataset and reports green. We treat a non-200 as a non-result: nothing garbage gets pushed, and we surface what actually happened via set_status_message so a scheduled run never reports "Done" over an empty dataset.

None of it is clever. All of it separates a script that worked once on your laptop from a feed that lands clean every morning.
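Points 2 and 3 can be sketched together. What follows is a minimal illustration under assumptions, not the Actor's source: `fetch_feed`, `FetchResult`, and the injectable `get` parameter are hypothetical names, the 30-second timeout is arbitrary, and proxy configuration is omitted. The profile names are the ones listed above; whether each is accepted depends on the installed curl-cffi version.

```python
import random
from dataclasses import dataclass

# Impersonation targets as listed in the post.
PROFILES = ("chrome131", "chrome124", "firefox147", "safari180")


@dataclass
class FetchResult:
    ok: bool
    status_code: int
    body: str
    message: str  # text suitable for a run status message


def fetch_feed(url, get=None):
    """Fetch the feed with a randomly chosen browser profile.

    A non-200 response is treated as a non-result: the body is discarded
    and the caller gets a message describing what happened.
    """
    if get is None:
        # Imported lazily so the sketch can be exercised without curl-cffi.
        from curl_cffi import requests as cffi_requests
        get = cffi_requests.get

    profile = random.choice(PROFILES)
    response = get(url, impersonate=profile, timeout=30)

    if response.status_code != 200:
        return FetchResult(
            ok=False,
            status_code=response.status_code,
            body="",
            message=f"Feed returned HTTP {response.status_code}; no rows pushed",
        )
    return FetchResult(True, 200, response.text, f"Fetched feed as {profile}")
```

A caller would check `result.ok` before parsing and pass `result.message` on to the run status either way, so an empty dataset is never reported as a clean success.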

The Actor 🛠️

The Actor is on the Apify Store: apify.com/DevilScrapes/producthunt-launches-scraper.

Paste the feed URL in the Apify Console and click Start, or call it from Python:

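from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient("YOUR_APIFY_TOKEN")

# Start the Actor and wait for the run to finish.
run = client.actor("DevilScrapes/producthunt-launches-scraper").call(
    run_input={
        "feedUrl": "https://www.producthunt.com/feed",
        "maxResults": 50,
        "includeContent": True,
        "proxyConfiguration": {"useApifyProxy": True},
    }
)

# Stream rows from the run's default dataset.
# name and tagline can be None, so fall back to the raw title.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["name"] or item["title"], "—", item["tagline"] or "")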

Input parameters (all optional — defaults let you run with zero config):

Field	Type	Default	Notes
`feedUrl`	string	`https://www.producthunt.com/feed`	Global feed or any topic/collection RSS URL
`maxResults`	integer	20	Cap on rows returned; RSS exposes ~20–50 recent launches
`includeContent`	boolean	true	Whether to include the HTML body in `content_html`
`proxyConfiguration`	object	`{useApifyProxy: true}`	Residential proxy default

Pass any topic-specific feed URL — for example https://www.producthunt.com/topics/developer-tools/feed — to scope the output to a single category. Any valid Product Hunt RSS URL works without code changes.

Use cases 💡

Daily sales signal for SDRs. Schedule the Actor every 6 hours, filter categories for your ICP, and pipe new launches into your CRM before launch-day noise fades.

Competitor category monitoring. Run the Actor against your product's category feed, diff the link list against last week's run, and trigger a Slack alert when something new appears. One n8n workflow, no manual checking.

Newsletter feed for founder audiences. The title, tagline, link, author, and categories fields contain everything needed to draft a "Top 10 launches from yesterday" digest without visiting PH.

Cross-Actor lead enrichment. Pair this Actor with the YC Companies scraper or the ATS Tech Stack scraper: when a company you're tracking launches on PH, that's a high-intent outreach signal.

Trend analysis by category. Collect a week of launches, group by categories, and count — a rough but real signal of where product activity is concentrating.
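Two of the use cases above, diffing against a previous run and counting by category, need only a few lines on the consumer side. This is a rough sketch with hypothetical helper names (`new_launches`, `category_counts`) and made-up rows; folding case before counting is an assumption of mine, chosen because category capitalisation is not normalised by the Actor.

```python
from collections import Counter


def new_launches(current_rows, previous_links):
    """Return rows whose link did not appear in the previous run."""
    seen = set(previous_links)
    return [row for row in current_rows if row["link"] not in seen]


def category_counts(rows):
    """Count launches per category, folding case so 'SaaS' and 'saas' merge."""
    counts = Counter()
    for row in rows:
        # categories may be missing or None; a set avoids double-counting
        # a category that is repeated within one launch.
        for category in {c.casefold() for c in (row.get("categories") or [])}:
            counts[category] += 1
    return counts


# Made-up rows for illustration only.
rows = [
    {"link": "https://www.producthunt.com/posts/a", "categories": ["SaaS", "Marketing"]},
    {"link": "https://www.producthunt.com/posts/b", "categories": ["saas", "No-Code"]},
]

fresh = new_launches(rows, ["https://www.producthunt.com/posts/a"])
print([row["link"] for row in fresh])
print(category_counts(rows).most_common(1))
```

Keying the diff on link rather than name keeps it stable when a maker edits the title after launch.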

Pricing — exact numbers 💰

Pay-per-event: you pay a small start fee per run plus a per-row charge for each launch that lands in your dataset, and nothing else.

Event	Price	What it covers
Actor start	$0.005	One-off warm-up charge per run
Result emitted	$0.002	Per row written to the dataset

Cost examples:

Fetch	Cost
20 launches (single run)	$0.045
50 launches (single run)	$0.105
1,000 launches	$2.005
10,000 launches	$20.005

Apify's $5 free trial credit covers your first ~2,490 launches with no credit card. The Product Hunt GraphQL API gives you votes and maker profiles but requires OAuth registration and approval; this Actor is the no-auth path for the RSS-level data, priced per result.
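The cost table follows from one formula: cost = start fee × runs + per-result price × rows. A small sketch using the two prices listed above (the function names `run_cost` and `rows_covered` are hypothetical; Decimal is used to avoid float rounding on currency):

```python
from decimal import Decimal

START_FEE = Decimal("0.005")   # per run, from the pricing table
PER_RESULT = Decimal("0.002")  # per row written to the dataset


def run_cost(rows: int, runs: int = 1) -> Decimal:
    """Total cost for `rows` results spread over `runs` Actor starts."""
    return START_FEE * runs + PER_RESULT * rows


def rows_covered(budget: Decimal, runs: int = 1) -> int:
    """Whole rows a budget pays for after the start fees are deducted."""
    remaining = budget - START_FEE * runs
    return max(0, int(remaining // PER_RESULT))


for n in (20, 50, 1000, 10000):
    print(n, run_cost(n))

print(rows_covered(Decimal("5")))
```

Each extra run costs the equivalent of 2.5 rows in start fees, so how far a fixed budget stretches depends on how often the Actor is scheduled as well as on how many rows it returns.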

The technically interesting bit

Product Hunt's RSS titles follow the format Product Name — Tagline, and the feed uses the Dublin Core dc:creator element for the maker and category tags for topics, which standard parsers handle inconsistently. We use feedparser and try three separators (" — ", " - ", ": ") to extract name and tagline reliably. The content_html field, when enabled, comes from the entry's content, falling back to its summary; this is typically the full description the maker wrote at launch.
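The three-separator split can be sketched as follows. This is an approximation rather than the Actor's code: `split_title` is a hypothetical name, and trying the separators in the order listed and splitting on the first occurrence are my assumptions.

```python
SEPARATORS = (" — ", " - ", ": ")  # order as listed in the post


def split_title(title):
    """Return (name, tagline); both are None when no separator matches."""
    if not title:
        return None, None
    for sep in SEPARATORS:
        name, found, tagline = title.partition(sep)
        # Require text on both sides so "Name — " does not yield an empty tagline.
        if found and name.strip() and tagline.strip():
            return name.strip(), tagline.strip()
    return None, None


print(split_title("Linktree Pro — One link for everything you create"))
print(split_title("Just a title"))
```

Returning None for both parts, rather than guessing, leaves the raw title as the single source of truth for rows that do not follow the format.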

All fields are validated with Pydantic v2 before the row is pushed. Optional fields (name, tagline, author, content_html, categories, published) are nullable, with the string fields typed str | None, and are never silently defaulted to empty strings.

Limitations 🚧

RSS exposes a shallow window. The feed typically covers 20–50 of the most recent launches — roughly 1–2 days of activity. Launches older than that are not accessible via RSS; for full historical archives, the official Product Hunt API is the right tool.

No vote counts, no comment counts. These metrics live on the rendered HTML and in the GraphQL API, not in the RSS feed. This Actor reads only the feed.

No maker profile detail. The author field is the dc:creator value from RSS — usually the primary maker's name as a plain string, not a link or a structured object.

Category taxonomy is PH's. The categories array reflects whatever tags PH assigned at submission time. No normalisation is applied; expect occasional duplicates or inconsistent capitalisation.

maxResults ceiling. The input schema caps maxResults at 200, but the RSS feed itself rarely exposes more than 50 entries regardless of what you pass — the cap is a safeguard, not a guarantee of depth.

FAQ ❓

Is it legal to scrape Product Hunt's RSS feed?
The RSS feed is a public, unauthenticated endpoint Product Hunt publishes for syndication. This Actor reads only what the feed exposes — no login, no scraping of individual product pages, no personal data beyond the maker name PH itself includes. As always, check your own jurisdiction and intended use case before building on this data.

Does Product Hunt have an API I should use instead?
Yes — Product Hunt offers an official GraphQL API with votes, comments, and maker profiles. That API requires OAuth registration. This Actor is the no-auth, low-configuration path for the RSS-level data: name, tagline, maker, link, categories, published timestamp. Need votes or deep maker profiles? The official API is the right choice.

Can I export to CSV or push to Google Sheets?
Yes — export JSON, CSV, or Excel directly from the Apify Console after a run. To push to Google Sheets automatically, wire the Apify Webhook on ACTOR.RUN.SUCCEEDED into a Make or n8n scenario.

Why is name sometimes null?
Some PH launch titles don't follow the Name — Tagline format. When none of the three separators (" — ", " - ", ": ") appear in the title, name and tagline are both null and the raw title field still contains the full string.

Try it

The Actor is live at apify.com/DevilScrapes/producthunt-launches-scraper.

Free $5 trial credit, no credit card. Run it and you'll have today's PH launches in your dataset in under 30 seconds. Questions about a use case, a missing field, or an integration pattern? Leave a comment — I ship based on what people actually need.

External references:

Product Hunt RSS feed documentation — the public endpoint this Actor wraps
Apify Actor platform documentation — how Apify Actors work, storage, scheduling
feedparser library documentation — the RSS parsing library used under the hood

Built by Devil Scrapes — Apify Actors that do the dirty work so your dataset stays clean. 😈
