How I Built a Product Hunt Scraper That Tracks Launches in Real-Time
Every day, hundreds of new products launch on Product Hunt. Keeping track of them manually? That's a full-time job nobody wants.
So I built a Product Hunt Scraper that extracts every launch — names, taglines, upvotes, topics, makers, and media — into clean structured JSON. Let me show you how it works.
What It Does
The scraper pulls data from Product Hunt using multiple strategies:
- Atom Feed: Today's front-page launches with full metadata
- Topic Pages: Browse any topic (AI, SaaS, developer tools)
- Search: Find products matching any keyword
- Direct URLs: Scrape specific product pages for rich detail
What You Get
Each scraped product includes:
{
  "name": "Termi Protocol",
  "tagline": "Watch your AI coding agents build, live in 3D",
  "description": "Real-time visualisation of AI agent workflows",
  "url": "https://www.producthunt.com/products/termi-protocol",
  "upvotes": 0,
  "topics": ["artificial-intelligence", "developer-tools"],
  "makers": [],
  "media": [{"type": "thumbnail", "url": "https://..."}],
  "launchedAt": "2026-07-03T14:14:15-07:00",
  "pricingType": "free",
  "slug": "termi-protocol"
}
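If you consume these records in Python, one option is to load them into a small typed container. This is a minimal sketch, not part of the scraper itself: the Product class and parse_product helper are hypothetical names, and it assumes every record carries name, url, and slug while the other keys may be missing.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Product:
    """One scraped launch; attributes mirror a subset of the JSON keys."""
    name: str
    url: str
    slug: str
    tagline: str = ""
    upvotes: int = 0
    topics: List[str] = field(default_factory=list)
    launched_at: Optional[datetime] = None


def parse_product(raw: dict) -> Product:
    """Build a Product from one record, tolerating missing optional keys."""
    launched = raw.get("launchedAt")
    return Product(
        name=raw["name"],
        url=raw["url"],
        slug=raw["slug"],
        tagline=raw.get("tagline", ""),
        upvotes=int(raw.get("upvotes", 0)),
        topics=list(raw.get("topics", [])),
        # fromisoformat understands UTC offsets such as -07:00
        launched_at=datetime.fromisoformat(launched) if launched else None,
    )


# A trimmed copy of the sample record from this post
record = json.loads("""
{
  "name": "Termi Protocol",
  "tagline": "Watch your AI coding agents build, live in 3D",
  "url": "https://www.producthunt.com/products/termi-protocol",
  "upvotes": 0,
  "topics": ["artificial-intelligence", "developer-tools"],
  "launchedAt": "2026-07-03T14:14:15-07:00",
  "slug": "termi-protocol"
}
""")

product = parse_product(record)
print(product.name, product.launched_at.isoformat())
```

Parsing launchedAt into a timezone-aware datetime makes it easier to compare launches across days without string handling.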
Use Cases
- Lead Generation: Find new startups and their makers before competitors do
- Market Research: Track competitor launches and pricing models
- Trend Analysis: Monitor which topics are getting the most upvotes
- Investment Scouting: Discover high-upvote products early
- Newsletter Content: Auto-generate daily roundups of top launches
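To make the trend analysis case concrete, here is a rough sketch that sums upvotes per topic across a batch of scraped records. The upvotes_by_topic helper is a hypothetical name, and the three records are invented sample values used only to show the shape of the calculation.

```python
from collections import Counter
from typing import Dict, List


def upvotes_by_topic(products: List[Dict]) -> Counter:
    """Sum upvotes per topic; a product counts toward each of its topics."""
    totals: Counter = Counter()
    for product in products:
        for topic in product.get("topics", []):
            totals[topic] += product.get("upvotes", 0)
    return totals


# Invented sample records, using the same keys as the scraper output
sample = [
    {"name": "A", "upvotes": 120, "topics": ["artificial-intelligence", "saas"]},
    {"name": "B", "upvotes": 45, "topics": ["developer-tools"]},
    {"name": "C", "upvotes": 80, "topics": ["artificial-intelligence"]},
]

ranking = upvotes_by_topic(sample).most_common()
for topic, total in ranking:
    print(f"{topic}: {total}")
```

Because a product with several topics contributes to each of them, the totals are better read as a popularity signal per topic than as a count of unique votes.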
How to Use It
Option 1: Apify Store
The actor is deployed on Apify — just enter your parameters and run:
- Set scrapeFrontPage: true for today's launches
- Set topic: "artificial-intelligence, saas" for topic scraping
- Set searchQuery: "AI agents" for keyword search
- Set maxProducts: 50 to limit results
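If you start runs from a script rather than the Apify console, the same parameters can be assembled as a JSON input object. The sketch below only builds and validates that object. The build_run_input helper is a hypothetical name, and it assumes the actor treats omitted keys as unset, which this post does not confirm.

```python
import json
from typing import Iterable, Optional


def build_run_input(
    front_page: bool = False,
    topics: Iterable[str] = (),
    search_query: Optional[str] = None,
    max_products: int = 50,
) -> dict:
    """Assemble the actor input from the four parameters listed above."""
    if max_products < 1:
        raise ValueError("max_products must be at least 1")
    run_input = {"scrapeFrontPage": front_page, "maxProducts": max_products}
    topic_list = [t.strip() for t in topics if t.strip()]
    if topic_list:
        # The actor takes topics as one comma-separated string
        run_input["topic"] = ", ".join(topic_list)
    if search_query:
        run_input["searchQuery"] = search_query
    return run_input


run_input = build_run_input(
    front_page=True,
    topics=["artificial-intelligence", "saas"],
    search_query="AI agents",
)
print(json.dumps(run_input, indent=2))
```

The resulting dictionary can be pasted into the actor's JSON input view or passed to whichever Apify client you use.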
Option 2: RapidAPI
The same engine powers the Multi-Tool Content API on RapidAPI — integrate it directly into your apps with a simple REST call.
Option 3: Self-Host
Clone the GitHub repo and run it yourself:
pip install httpx beautifulsoup4 feedparser
python main.py
Under the Hood
The scraper uses a hybrid approach:
- Atom Feed Parsing: Product Hunt publishes a daily Atom feed at /feed with all launches. This gives us reliable, structured data without rendering JavaScript.
- HTML Slug Extraction: For topic and search pages, we parse the server-rendered HTML to extract product slugs from /products/ links.
- Page Enrichment: Stub products get enriched by fetching individual product pages and extracting OpenGraph meta tags, JSON-LD structured data, and embedded vote counts.
- Concurrent Fetching: All HTTP requests run concurrently with httpx.AsyncClient and asyncio.gather, making it fast even for 50+ products.
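Here is a simplified sketch of the slug extraction, page enrichment, and concurrent fetching ideas. It is not the actual scraper code: it uses only the standard library (a regex and html.parser in place of BeautifulSoup), reads OpenGraph tags only, and takes the fetch coroutine as a parameter so canned HTML can stand in for the network. All function names, the link pattern, and the sample HTML are assumptions for illustration. In a real run, fetch would wrap a GET on a shared httpx.AsyncClient.

```python
import asyncio
import re
from html.parser import HTMLParser

# Assumption: product links look like /products/<slug>, as in the sample URL
SLUG_RE = re.compile(r'href="(?:https://www\.producthunt\.com)?/products/([a-z0-9-]+)')


def extract_slugs(html):
    """Return unique product slugs in first-seen order."""
    return list(dict.fromkeys(SLUG_RE.findall(html)))


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> tags into a dict."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        if prop.startswith("og:") and attr.get("content"):
            self.tags[prop[3:]] = attr["content"]


def enrich(slug, html):
    """Turn a stub (just a slug) into a record using the page's og: tags."""
    parser = OpenGraphParser()
    parser.feed(html)
    return {
        "slug": slug,
        "name": parser.tags.get("title", slug),  # fall back to the slug
        "description": parser.tags.get("description", ""),
    }


async def enrich_all(slugs, fetch):
    """Fetch every product page concurrently, then enrich each stub."""
    pages = await asyncio.gather(*(fetch(slug) for slug in slugs))
    return [enrich(slug, page) for slug, page in zip(slugs, pages)]


# Canned HTML standing in for a topic page and two product pages
LISTING = (
    '<a href="/products/termi-protocol">Termi</a>'
    '<a href="/products/termi-protocol">Termi again</a>'
    '<a href="/products/other-tool">Other</a>'
)
PAGES = {
    "termi-protocol": (
        '<head><meta property="og:title" content="Termi Protocol">'
        '<meta property="og:description" content="Live 3D view"></head>'
    ),
    "other-tool": "<head></head>",
}


async def fake_fetch(slug):
    await asyncio.sleep(0)  # yield control, as a real request would
    return PAGES[slug]


slugs = extract_slugs(LISTING)
products = asyncio.run(enrich_all(slugs, fake_fetch))
print(products)
```

asyncio.gather returns results in the order the coroutines were passed in, which is why zipping slugs with pages is safe even when requests finish out of order.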
Why I Built This
I run a portfolio of content extraction tools — RSS Feed Aggregator, llms.txt Generator, RO Business Scraper, and Sitemap Content Extractor. Product Hunt scraping was the natural next step for lead generation workflows.
All tools are available as APIs on RapidAPI and the source code is on GitHub.
What's Next
I'm working on:
- HN/Reddit Sentiment Analyzer for AI-powered trend detection
- Property Listing Scraper for real estate data
- Press Release Monitor for PR tracking
Follow along or star the repo to stay updated!
Found this useful? Check out the Apify Store for more scraping tools, or the RapidAPI listing for API access.