If you track startup launches, scout for tools, or do competitive intelligence, ProductHunt is one of the richest data sources out there. Thousands of products launch every month with upvotes, descriptions, maker info, and community reactions.
The problem? Getting that data programmatically is harder than it should be.
## Why Scrape ProductHunt?
A few common reasons people pull PH data:
- Competitive intel — monitor what's launching in your category
- VC/investor research — spot trending products and makers early
- Tool discovery — find the best tools for a specific niche
- Content research — track what's gaining traction for articles, newsletters, or social posts
- Market analysis — understand launch frequency, category trends, and seasonal patterns
## The API Problem
ProductHunt has an official API, but it comes with friction:
- OAuth tokens required — you need to register an app and handle token flows
- Rate limits — strict limits that make bulk collection painful
- Incomplete data — some fields available on the website aren't exposed via the API
- Maintenance burden — API versions change, tokens expire, scopes shift
For a one-off query, the API works. For ongoing monitoring or bulk collection, you'll spend more time fighting authentication than analyzing data.
## Scraper Options Compared
Here's what's available on Apify right now for ProductHunt scraping:
| Feature | ProductHunt Scraper (cryptosignals) | Competitor Actor |
|---|---|---|
| Today's launches | ✅ | ✅ |
| Search by keyword | ✅ | ❌ |
| Date-specific results | ✅ | ❌ |
| Product detail pages | ✅ | Partial |
| Apollo SSR parsing | ✅ | ❌ |
| Users | New | 777 |
| Reviews | — | 2 (5.0★) |
| Modes | 4 (today, search, date, product) | 1 |
The established actor has a user base, but it covers a single use case: today's launches. If you need search, historical dates, or detailed product pages, it doesn't handle those.
Full disclosure: I built the cryptosignals actor. I'm biased, but the feature comparison is accurate — you can verify on both actor pages.
## Deep Dive: ProductHunt Scraper (cryptosignals)
The actor has four modes:
### 1. Today's Launches

Pulls everything on the current front page — product name, tagline, description, vote count, topics, makers, links.

### 2. Search

Pass a keyword and get matching products. Useful for "find me all AI writing tools launched on PH."

### 3. Date-Specific

Pull launches from a specific date. Great for historical analysis — "what launched the same week as our competitor?"

### 4. Product Details

Pass a product URL or slug and get the full page data — description, media, maker profiles, related products.
## How It Works
Instead of hitting the REST API, the actor parses ProductHunt's Apollo SSR state — the server-rendered GraphQL cache embedded in the page HTML. This gives you richer data than the public API, including fields that aren't in the official endpoints.
In testing, it extracted 53 products in 4.7 seconds. That's the full front page with complete metadata per product.
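To make the Apollo SSR approach concrete, here's a minimal sketch of the general technique: find the server-rendered cache blob embedded in the page HTML, parse it as JSON, and filter the normalized records. This is not the actor's actual code — the script-tag marker (`id="apollo-state"`), the cache layout, and the field names are assumptions for illustration; inspect the live page source to confirm them.

```python
import json
import re

# Stand-in for a fetched ProductHunt page. The script-tag id and the
# cache shape here are assumptions, not the real page structure.
SAMPLE_HTML = """
<html><body>
<script type="application/json" id="apollo-state">
{"Post:1": {"__typename": "Post", "name": "ExampleApp",
            "tagline": "Do things faster", "votesCount": 412}}
</script>
</body></html>
"""

def extract_apollo_state(html: str) -> dict:
    """Pull the embedded Apollo cache JSON out of the page HTML."""
    match = re.search(
        r'<script[^>]*id="apollo-state"[^>]*>(.*?)</script>',
        html, re.DOTALL,
    )
    if not match:
        raise ValueError("Apollo state blob not found")
    return json.loads(match.group(1))

def list_posts(state: dict) -> list[dict]:
    """Filter the normalized cache down to Post records."""
    return [v for v in state.values()
            if isinstance(v, dict) and v.get("__typename") == "Post"]

state = extract_apollo_state(SAMPLE_HTML)
for post in list_posts(state):
    print(post["name"], post["votesCount"])
```

Because the cache is normalized GraphQL state rather than rendered HTML, this style of parsing survives cosmetic layout changes and exposes fields the markup never displays.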
## Code Example: Python
Here's how to use it with the Apify Python client:
```python
from apify_client import ApifyClient

client = ApifyClient("your-apify-token")

# Get today's launches
run = client.actor("cryptosignals/producthunt-scraper").call(
    run_input={
        "mode": "today",
        "maxProducts": 50
    }
)

items = list(client.dataset(run["defaultDatasetId"]).iterate_items())

for product in items[:5]:
    print(f"{product['name']} — {product['tagline']}")
    print(f"  Votes: {product.get('votesCount', 'N/A')}")
    print(f"  Topics: {', '.join(product.get('topics', []))}")
    print()
```
Search by keyword:
```python
run = client.actor("cryptosignals/producthunt-scraper").call(
    run_input={
        "mode": "search",
        "query": "developer tools",
        "maxProducts": 20
    }
)

results = list(client.dataset(run["defaultDatasetId"]).iterate_items())
print(f"Found {len(results)} products matching 'developer tools'")
```
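The remaining two modes follow the same call pattern — only the input changes. The exact field names below (`date`, `url`) are assumptions based on the mode descriptions above; verify them against the actor's input schema on its Apify page before running.

```python
# Historical launches for a specific day.
# Field names ("date", "url") are assumptions -- check the actor's
# input schema, since only the mode names are documented above.
date_input = {
    "mode": "date",
    "date": "2024-06-01",  # ISO date of the launch day to pull
    "maxProducts": 100,
}

# Full detail page for a single product (URL or slug accepted).
product_input = {
    "mode": "product",
    "url": "https://www.producthunt.com/posts/notion",
}

# Either dict is passed the same way as the earlier examples:
# run = client.actor("cryptosignals/producthunt-scraper").call(run_input=date_input)
```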
## Use Case: Daily Competitor Monitoring
One practical setup: schedule the actor to run daily and track new launches in your category.
- Create a scheduled run on Apify with `mode: "today"` — runs every morning
- Filter results by topic or keyword in a post-processing step
- Push to Slack/email using Apify's webhook integrations
- Store in a dataset for trend analysis over time
This way you get a daily feed of "what launched in my space" without manual checking. The Apify scheduler handles retries and failures, so you don't babysit a cron job.
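The filtering and notification steps can be sketched in a few lines. This is an illustrative example, not part of the actor: the watched topics and webhook URL are placeholders, and the item fields (`name`, `tagline`, `topics`) match the output shape used in the earlier examples.

```python
import json
import urllib.request

# Placeholder watchlist -- swap in the topics for your category.
WATCHED_TOPICS = {"Developer Tools", "Artificial Intelligence"}

def filter_launches(items: list[dict]) -> list[dict]:
    """Keep only products tagged with at least one watched topic."""
    return [p for p in items
            if WATCHED_TOPICS & set(p.get("topics", []))]

def notify_slack(webhook_url: str, products: list[dict]) -> None:
    """Post a one-line summary per matching product to a Slack
    incoming webhook (the URL comes from your Slack app config)."""
    text = "\n".join(f"{p['name']} -- {p['tagline']}" for p in products)
    body = json.dumps({"text": text}).encode()
    req = urllib.request.Request(
        webhook_url, data=body,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

# Dry run with sample data (no network call):
sample = [
    {"name": "CodeBot", "tagline": "AI pair programmer",
     "topics": ["Artificial Intelligence"]},
    {"name": "PlantApp", "tagline": "Water reminders",
     "topics": ["Home"]},
]
matches = filter_launches(sample)
print([p["name"] for p in matches])  # -> ['CodeBot']
```

On Apify this logic would typically live in a small follow-up actor or webhook handler triggered when the scheduled scrape finishes.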
## When Free curl Works (and When It Doesn't)
For simple, one-off checks you can curl ProductHunt directly:
```shell
curl -s 'https://www.producthunt.com' | grep -o '"name":"[^"]*"' | head -10
```
This breaks constantly: ProductHunt's HTML structure changes, JavaScript rendering hides most of the data, and you get rate-limited fast.
Use curl when: you need one quick check and don't care about reliability.
Use a scraper actor when: you need structured data, monitoring over time, multiple search modes, or you're pulling more than a handful of products. The actor handles rendering, parsing, pagination, and retries — you just get clean JSON.
## Conclusion
If you're doing anything beyond casual ProductHunt browsing, a dedicated scraper saves significant time. The API's auth overhead and limitations make it impractical for most data collection workflows.
I'd recommend the ProductHunt Scraper for anything involving search, date ranges, or scheduled monitoring — it covers use cases that the alternatives don't. For basic "grab today's front page," the established actor works too, though you'll miss the richer Apollo-parsed data.
Pick based on your use case. Both are on the Apify platform, so switching costs are minimal.