If you track startup launches, scout for tools, or do competitive intelligence, ProductHunt is one of the richest data sources out there. Thousands of products launch every month with upvotes, descriptions, maker info, and community reactions.
The problem? Getting that data programmatically is harder than it should be.
## Why Scrape ProductHunt?
A few common reasons people pull PH data:
- Competitive intel — monitor what's launching in your category
- VC/investor research — spot trending products and makers early
- Tool discovery — find the best tools for a specific niche
- Content research — track what's gaining traction for articles, newsletters, or social posts
- Market analysis — understand launch frequency, category trends, and seasonal patterns
## The API Problem
ProductHunt has an official API, but it comes with friction:
- OAuth tokens required — you need to register an app and handle token flows
- Rate limits — strict limits that make bulk collection painful
- Incomplete data — some fields available on the website aren't exposed via the API
- Maintenance burden — API versions change, tokens expire, scopes shift
For a one-off query, the API works. For ongoing monitoring or bulk collection, you'll spend more time fighting authentication than analyzing data.
## Scraper Options Compared
Here's what's available on Apify right now for ProductHunt scraping:
| Feature | ProductHunt Scraper (cryptosignals) | Competitor Actor |
|---|---|---|
| Today's launches | ✅ | ✅ |
| Search by keyword | ✅ | ❌ |
| Date-specific results | ✅ | ❌ |
| Product detail pages | ✅ | Partial |
| Apollo SSR parsing | ✅ | ❌ |
| Users | New | 777 |
| Reviews | — | 2 (5.0★) |
| Modes | 4 (today, search, date, product) | 1 |
The established actor has a user base, but it covers a single use case: today's launches. If you need search, historical dates, or detailed product pages, it doesn't handle those.
Full disclosure: I built the cryptosignals actor. I'm biased, but the feature comparison is accurate — you can verify on both actor pages.
## Deep Dive: ProductHunt Scraper (cryptosignals)
The actor has four modes:
### 1. Today's Launches

Pulls everything on the current front page — product name, tagline, description, vote count, topics, makers, links.

### 2. Search

Pass a keyword and get matching products. Useful for "find me all AI writing tools launched on PH."

### 3. Date-Specific

Pull launches from a specific date. Great for historical analysis — "what launched the same week as our competitor?"

### 4. Product Details

Pass a product URL or slug and get the full page data — description, media, maker profiles, related products.
## How It Works
Instead of hitting the REST API, the actor parses ProductHunt's Apollo SSR state — the server-rendered GraphQL cache embedded in the page HTML. This gives you richer data than the public API, including fields that aren't in the official endpoints.
In testing, it extracted 53 products in 4.7 seconds. That's the full front page with complete metadata per product.
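To make the Apollo SSR approach concrete, here's a minimal sketch of the general technique: find the server-rendered cache blob embedded in the page HTML, parse it as JSON, and filter the normalized records. This is not the actor's actual code — the script-tag marker (`id="apollo-state"`), the cache layout, and the field names are assumptions for illustration; inspect the live page source to confirm them.

```python
import json
import re

# Stand-in for a fetched ProductHunt page. The script-tag id and the
# cache shape here are assumptions, not the real page structure.
SAMPLE_HTML = """
<html><body>
<script type="application/json" id="apollo-state">
{"Post:1": {"__typename": "Post", "name": "ExampleApp",
            "tagline": "Do things faster", "votesCount": 412}}
</script>
</body></html>
"""

def extract_apollo_state(html: str) -> dict:
    """Pull the embedded Apollo cache JSON out of the page HTML."""
    match = re.search(
        r'<script[^>]*id="apollo-state"[^>]*>(.*?)</script>',
        html, re.DOTALL,
    )
    if not match:
        raise ValueError("Apollo state blob not found")
    return json.loads(match.group(1))

def list_posts(state: dict) -> list[dict]:
    """Filter the normalized cache down to Post records."""
    return [v for v in state.values()
            if isinstance(v, dict) and v.get("__typename") == "Post"]

state = extract_apollo_state(SAMPLE_HTML)
for post in list_posts(state):
    print(post["name"], post["votesCount"])
```

Because the cache is normalized GraphQL state rather than rendered HTML, this style of parsing survives cosmetic layout changes and exposes fields the markup never displays.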
## Code Example: Python
Here's how to use it with the Apify Python client:
```python
from apify_client import ApifyClient

client = ApifyClient("your-apify-token")

# Get today's launches
run = client.actor("cryptosignals/producthunt-scraper").call(
    run_input={
        "mode": "today",
        "maxProducts": 50
    }
)

items = list(client.dataset(run["defaultDatasetId"]).iterate_items())

for product in items[:5]:
    print(f"{product['name']} — {product['tagline']}")
    print(f"  Votes: {product.get('votesCount', 'N/A')}")
    print(f"  Topics: {', '.join(product.get('topics', []))}")
    print()
```
Search by keyword:
```python
run = client.actor("cryptosignals/producthunt-scraper").call(
    run_input={
        "mode": "search",
        "query": "developer tools",
        "maxProducts": 20
    }
)

results = list(client.dataset(run["defaultDatasetId"]).iterate_items())
print(f"Found {len(results)} products matching 'developer tools'")
```
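The remaining two modes follow the same call pattern — only the input changes. The exact field names below (`date`, `url`) are assumptions based on the mode descriptions above; verify them against the actor's input schema on its Apify page before running.

```python
# Historical launches for a specific day.
# Field names ("date", "url") are assumptions -- check the actor's
# input schema, since only the mode names are documented above.
date_input = {
    "mode": "date",
    "date": "2024-06-01",  # ISO date of the launch day to pull
    "maxProducts": 100,
}

# Full detail page for a single product (URL or slug accepted).
product_input = {
    "mode": "product",
    "url": "https://www.producthunt.com/posts/notion",
}

# Either dict is passed the same way as the earlier examples:
# run = client.actor("cryptosignals/producthunt-scraper").call(run_input=date_input)
```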
## Use Case: Daily Competitor Monitoring
One practical setup: schedule the actor to run daily and track new launches in your category.
- Create a scheduled run on Apify with `mode: "today"` — runs every morning
- Filter results by topic or keyword in a post-processing step
- Push to Slack/email using Apify's webhook integrations
- Store in a dataset for trend analysis over time
This way you get a daily feed of "what launched in my space" without manual checking. The Apify scheduler handles retries and failures, so you don't babysit a cron job.
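The filtering and notification steps can be sketched in a few lines. This is an illustrative example, not part of the actor: the watched topics and webhook URL are placeholders, and the item fields (`name`, `tagline`, `topics`) match the output shape used in the earlier examples.

```python
import json
import urllib.request

# Placeholder watchlist -- swap in the topics for your category.
WATCHED_TOPICS = {"Developer Tools", "Artificial Intelligence"}

def filter_launches(items: list[dict]) -> list[dict]:
    """Keep only products tagged with at least one watched topic."""
    return [p for p in items
            if WATCHED_TOPICS & set(p.get("topics", []))]

def notify_slack(webhook_url: str, products: list[dict]) -> None:
    """Post a one-line summary per matching product to a Slack
    incoming webhook (the URL comes from your Slack app config)."""
    text = "\n".join(f"{p['name']} -- {p['tagline']}" for p in products)
    body = json.dumps({"text": text}).encode()
    req = urllib.request.Request(
        webhook_url, data=body,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

# Dry run with sample data (no network call):
sample = [
    {"name": "CodeBot", "tagline": "AI pair programmer",
     "topics": ["Artificial Intelligence"]},
    {"name": "PlantApp", "tagline": "Water reminders",
     "topics": ["Home"]},
]
matches = filter_launches(sample)
print([p["name"] for p in matches])  # -> ['CodeBot']
```

On Apify this logic would typically live in a small follow-up actor or webhook handler triggered when the scheduled scrape finishes.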
## When Free curl Works (and When It Doesn't)
For simple, one-off checks you can curl ProductHunt directly:
```shell
curl -s 'https://www.producthunt.com' | grep -o '"name":"[^"]*"' | head -10
```
This breaks constantly: ProductHunt's HTML structure changes, JavaScript rendering hides most of the data, and you get rate-limited fast.
Use curl when: you need one quick check and don't care about reliability.
Use a scraper actor when: you need structured data, monitoring over time, multiple search modes, or you're pulling more than a handful of products. The actor handles rendering, parsing, pagination, and retries — you just get clean JSON.
## Conclusion
If you're doing anything beyond casual ProductHunt browsing, a dedicated scraper saves significant time. The API's auth overhead and limitations make it impractical for most data collection workflows.
I'd recommend the ProductHunt Scraper for anything involving search, date ranges, or scheduled monitoring — it covers use cases that the alternatives don't. For basic "grab today's front page," the established actor works too, though you'll miss the richer Apollo-parsed data.
Pick based on your use case. Both are on the Apify platform, so switching costs are minimal.