Oaida Adrian

Posted on Jul 4 • Originally published at apify.com

How I Built a Product Hunt Scraper That Tracks Launches in Real-Time

#webdev #python #automation

Every day, hundreds of new products launch on Product Hunt. Keeping track of them manually? That's a full-time job nobody wants.

So I built a Product Hunt Scraper that extracts every launch — names, taglines, upvotes, topics, makers, and media — into clean structured JSON. Let me show you how it works.

What It Does

The scraper pulls data from Product Hunt using four strategies:

Atom Feed: Today's front-page launches with full metadata
Topic Pages: Browse any topic (AI, SaaS, developer tools)
Search: Find products matching any keyword
Direct URLs: Scrape specific product pages for rich detail

What You Get

Each scraped product includes:

{
  "name": "Termi Protocol",
  "tagline": "Watch your AI coding agents build, live in 3D",
  "description": "Real-time visualisation of AI agent workflows",
  "url": "https://www.producthunt.com/products/termi-protocol",
  "upvotes": 0,
  "topics": ["artificial-intelligence", "developer-tools"],
  "makers": [],
  "media": [{"type": "thumbnail", "url": "https://..."}],
  "launchedAt": "2026-07-03T14:14:15-07:00",
  "pricingType": "free",
  "slug": "termi-protocol"
}
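If you consume these records in Python, one option is to load them into a small dataclass. This is a minimal sketch that assumes the field names shown above; `Product` and `parse_product` are illustrative names, not part of the scraper itself.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Product:
    """One scraped launch; fields mirror a subset of the JSON keys above."""
    name: str
    tagline: str
    url: str
    slug: str
    upvotes: int = 0
    topics: List[str] = field(default_factory=list)
    launched_at: Optional[datetime] = None


def parse_product(raw: dict) -> Product:
    """Convert one raw JSON object into a Product (hypothetical helper)."""
    launched = raw.get("launchedAt")
    return Product(
        name=raw["name"],
        tagline=raw.get("tagline", ""),
        url=raw["url"],
        slug=raw["slug"],
        upvotes=int(raw.get("upvotes") or 0),
        topics=list(raw.get("topics") or []),
        # fromisoformat understands the "-07:00" offset used in the sample.
        launched_at=datetime.fromisoformat(launched) if launched else None,
    )


sample = json.loads("""
{
  "name": "Termi Protocol",
  "tagline": "Watch your AI coding agents build, live in 3D",
  "url": "https://www.producthunt.com/products/termi-protocol",
  "upvotes": 0,
  "topics": ["artificial-intelligence", "developer-tools"],
  "launchedAt": "2026-07-03T14:14:15-07:00",
  "slug": "termi-protocol"
}
""")

product = parse_product(sample)
print(product.name, product.topics, product.launched_at)
```

Parsing `launchedAt` into a timezone-aware `datetime` makes it easier to sort or filter launches by time later on.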

Use Cases

Lead Generation: Find new startups and their makers before competitors
Market Research: Track competitor launches and pricing models
Trend Analysis: Monitor which topics are getting the most upvotes
Investment Scouting: Discover high-upvote products early
Newsletter Content: Auto-generate daily roundups of top launches
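As a rough illustration of the trend-analysis use case, the sketch below sums upvotes per topic over a list of scraped records. The records are made up for the example and are not real Product Hunt data; `upvotes_by_topic` is a hypothetical helper, and a product with several topics counts toward each of them.

```python
from collections import Counter


def upvotes_by_topic(products: list) -> Counter:
    """Sum upvotes per topic; a product counts toward each of its topics."""
    totals = Counter()
    for product in products:
        for topic in product.get("topics", []):
            totals[topic] += product.get("upvotes", 0)
    return totals


# Made-up records for illustration only, not real Product Hunt data.
sample_products = [
    {"name": "Example A", "upvotes": 120, "topics": ["artificial-intelligence", "saas"]},
    {"name": "Example B", "upvotes": 45, "topics": ["developer-tools"]},
    {"name": "Example C", "upvotes": 80, "topics": ["artificial-intelligence"]},
]

# most_common() returns (topic, total) pairs, highest total first.
ranking = upvotes_by_topic(sample_products).most_common()
for topic, total in ranking:
    print(f"{topic}: {total}")
```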

How to Use It

Option 1: Apify Store

The actor is deployed on Apify — just enter your parameters and run:

Set scrapeFrontPage: true for today's launches
Set topic: "artificial-intelligence, saas" for topic scraping
Set searchQuery: "AI agents" for keyword search
Set maxProducts: 50 to limit results
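If you prefer to start runs from code, the same parameters can be assembled as a plain dictionary. This is a sketch only: `build_run_input` is a hypothetical helper, and whether the actor accepts these options combined in a single run is an assumption.

```python
import json


def build_run_input(front_page=False, topics=None, search_query=None, max_products=50):
    """Assemble actor input using the parameter names listed above."""
    run_input = {"scrapeFrontPage": front_page, "maxProducts": max_products}
    if topics:
        # The example above passes topics as one comma-separated string.
        run_input["topic"] = ", ".join(topics)
    if search_query:
        run_input["searchQuery"] = search_query
    return run_input


run_input = build_run_input(
    front_page=True,
    topics=["artificial-intelligence", "saas"],
    max_products=50,
)
print(json.dumps(run_input, indent=2))
```

With the official `apify-client` package, a dictionary like this can be passed as `run_input` to `ApifyClient(token).actor(actor_id).call(...)`. The actor ID is not given in this post, so it is left out here.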

Option 2: RapidAPI

The same engine powers the Multi-Tool Content API on RapidAPI — integrate it directly into your apps with a simple REST call.

Option 3: Self-Host

Clone the GitHub repo and run it yourself:

pip install httpx beautifulsoup4 feedparser
python main.py

Under the Hood

The scraper uses a hybrid approach:

Atom Feed Parsing: Product Hunt publishes an Atom feed at /feed with the day's front-page launches. This gives us reliable, structured data without rendering JavaScript.
HTML Slug Extraction: For topic and search pages, we parse the server-rendered HTML to extract product slugs from /products/ links.
Page Enrichment: Stub products get enriched by fetching individual product pages and extracting OpenGraph meta tags, JSON-LD structured data, and embedded vote counts.
Concurrent Fetching: All HTTP requests run concurrently with httpx.AsyncClient and asyncio.gather, making it fast even for 50+ products.
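The four steps above can be sketched roughly as follows. This is not the actual source: it swaps in the standard library (`xml.etree.ElementTree` and `re`) for feedparser and BeautifulSoup so it runs without dependencies, replaces the httpx request with an offline `fake_fetch` stand-in, and uses hand-written sample markup rather than real Product Hunt responses. All function names are hypothetical, and the `limit` semaphore is an added precaution not described above.

```python
import asyncio
import re
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}
SLUG_RE = re.compile(r'href="(?:https://www\.producthunt\.com)?/products/([a-z0-9-]+)')
OG_TITLE_RE = re.compile(r'<meta property="og:title" content="([^"]*)"')


def parse_atom(xml_text: str) -> list:
    """Step 1: turn Atom <entry> elements into stub product dicts."""
    root = ET.fromstring(xml_text)
    stubs = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link", ATOM_NS)
        stubs.append({
            "name": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "url": link.get("href") if link is not None else "",
        })
    return stubs


def extract_slugs(html: str) -> list:
    """Step 2: collect unique slugs from /products/ links, keeping page order."""
    return list(dict.fromkeys(SLUG_RE.findall(html)))


async def enrich_all(slugs, fetch_page, limit=10):
    """Steps 3 and 4: fetch product pages concurrently and read og:title."""
    semaphore = asyncio.Semaphore(limit)  # cap on in-flight requests (assumption)

    async def enrich(slug):
        async with semaphore:
            page = await fetch_page(slug)
        match = OG_TITLE_RE.search(page)
        return {"slug": slug, "ogTitle": match.group(1) if match else None}

    # gather returns results in the same order as the input slugs.
    return await asyncio.gather(*(enrich(slug) for slug in slugs))


async def fake_fetch(slug: str) -> str:
    """Offline stand-in for an httpx.AsyncClient GET request."""
    await asyncio.sleep(0.01)
    return f'<head><meta property="og:title" content="{slug} on Product Hunt"></head>'


# Hand-written sample markup, simplified for the sketch.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Termi Protocol</title>
    <link href="https://www.producthunt.com/products/termi-protocol"/>
  </entry>
</feed>"""
SAMPLE_HTML = (
    '<a href="/products/termi-protocol">A</a>'
    '<a href="/products/example-tool">B</a>'
    '<a href="/products/termi-protocol">A again</a>'
)

stubs = parse_atom(SAMPLE_FEED)
slugs = extract_slugs(SAMPLE_HTML)
enriched = asyncio.run(enrich_all(slugs, fake_fetch))
print(stubs, slugs, enriched, sep="\n")
```

Passing the fetch function in as a parameter keeps the concurrency logic testable offline; in a real run it would wrap a shared `httpx.AsyncClient`.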

Why I Built This

I run a portfolio of content extraction tools — RSS Feed Aggregator, llms.txt Generator, RO Business Scraper, and Sitemap Content Extractor. Product Hunt scraping was the natural next step for lead generation workflows.

All of these tools are available as APIs on RapidAPI, and the source code is on GitHub.

What's Next

I'm working on:

HN/Reddit Sentiment Analyzer for AI-powered trend detection
Property Listing Scraper for real estate data
Press Release Monitor for PR tracking

Follow along or star the repo to stay updated!

Found this useful? Check out the Apify Store for more scraping tools, or the RapidAPI listing for API access.
