Oaida Adrian

Posted on • Originally published at apify.com

How I Built a Product Hunt Scraper That Tracks Launches in Real-Time

Every day, hundreds of new products launch on Product Hunt. Keeping track of them manually? That's a full-time job nobody wants.

So I built a Product Hunt Scraper that extracts every launch — names, taglines, upvotes, topics, makers, and media — into clean structured JSON. Let me show you how it works.

What It Does

The scraper pulls data from Product Hunt using four strategies:

  • Atom Feed: Today's front-page launches with full metadata
  • Topic Pages: Browse any topic (AI, SaaS, developer tools)
  • Search: Find products matching any keyword
  • Direct URLs: Scrape specific product pages for rich detail

What You Get

Each scraped product includes:

{
  "name": "Termi Protocol",
  "tagline": "Watch your AI coding agents build, live in 3D",
  "description": "Real-time visualisation of AI agent workflows",
  "url": "https://www.producthunt.com/products/termi-protocol",
  "upvotes": 0,
  "topics": ["artificial-intelligence", "developer-tools"],
  "makers": [],
  "media": [{"type": "thumbnail", "url": "https://..."}],
  "launchedAt": "2026-07-03T14:14:15-07:00",
  "pricingType": "free",
  "slug": "termi-protocol"
}
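
If you consume these records from Python, a small typed wrapper can make them easier to work with. The sketch below is not part of the scraper itself: the Product class and parse_product helper are hypothetical names, and it assumes every record carries name, slug, and url while the remaining fields may be missing or empty.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Product:
    """Typed view of one scraped record (hypothetical helper, not the scraper's own class)."""
    name: str
    slug: str
    url: str
    tagline: str = ""
    upvotes: int = 0
    topics: List[str] = field(default_factory=list)
    launched_at: Optional[datetime] = None


def parse_product(record: dict) -> Product:
    """Convert one JSON record into a Product, tolerating missing optional fields."""
    launched = record.get("launchedAt")
    return Product(
        name=record["name"],
        slug=record["slug"],
        url=record["url"],
        tagline=record.get("tagline", ""),
        upvotes=int(record.get("upvotes") or 0),
        topics=list(record.get("topics") or []),
        # launchedAt is an ISO 8601 string with a UTC offset, e.g. "...-07:00".
        launched_at=datetime.fromisoformat(launched) if launched else None,
    )


# Trimmed copy of the example record shown above.
SAMPLE = {
    "name": "Termi Protocol",
    "tagline": "Watch your AI coding agents build, live in 3D",
    "url": "https://www.producthunt.com/products/termi-protocol",
    "upvotes": 0,
    "topics": ["artificial-intelligence", "developer-tools"],
    "launchedAt": "2026-07-03T14:14:15-07:00",
    "slug": "termi-protocol",
}

product = parse_product(SAMPLE)
# Normalise the launch time to UTC before comparing launches across time zones.
print(product.name, product.launched_at.astimezone(timezone.utc).isoformat())
```

Parsing launchedAt into a timezone-aware datetime keeps the original offset, so sorting or filtering by launch time behaves consistently regardless of where the script runs.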

Use Cases

  • Lead Generation: Find new startups and their makers before competitors
  • Market Research: Track competitor launches and pricing models
  • Trend Analysis: Monitor which topics are getting the most upvotes
  • Investment Scouting: Discover high-upvote products early
  • Newsletter Content: Auto-generate daily roundups of top launches

How to Use It

Option 1: Apify Store

The actor is deployed on Apify — just enter your parameters and run:

  • Set scrapeFrontPage: true for today's launches
  • Set topic: "artificial-intelligence, saas" for topic scraping
  • Set searchQuery: "AI agents" for keyword search
  • Set maxProducts: 50 to limit results
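
If you prefer to prepare the run input in code, the parameters above map onto a plain JSON object. This is a minimal sketch: build_input is a hypothetical helper, the default of 50 simply mirrors the example above, and it assumes the actor accepts an input where unused options are left out.

```python
import json


def build_input(front_page=False, topics=(), search_query=None, max_products=50):
    """Assemble an actor input dict from the four parameters listed above."""
    run_input = {"maxProducts": max_products}
    if front_page:
        run_input["scrapeFrontPage"] = True
    if topics:
        # The actor takes topics as one comma-separated string.
        run_input["topic"] = ", ".join(topics)
    if search_query:
        run_input["searchQuery"] = search_query
    return run_input


# Example: scrape two topics, capped at 50 products.
print(json.dumps(build_input(topics=["artificial-intelligence", "saas"]), indent=2))
```

The printed JSON can be pasted into the actor's input editor or passed along by whatever client you use to start the run.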

Option 2: RapidAPI

The same engine powers the Multi-Tool Content API on RapidAPI — integrate it directly into your apps with a simple REST call.

Option 3: Self-Host

Clone the GitHub repo and run it yourself:

pip install httpx beautifulsoup4 feedparser
python main.py

Under the Hood

The scraper uses a hybrid approach:

  1. Atom Feed Parsing: Product Hunt publishes an Atom feed at /feed listing the day's launches. Parsing it gives us reliable, structured data without rendering JavaScript.

  2. HTML Slug Extraction: For topic and search pages, we parse the server-rendered HTML to extract product slugs from /products/ links.

  3. Page Enrichment: Products discovered through topic and search pages start as stubs with little more than a slug. Each stub is enriched by fetching its product page and extracting OpenGraph meta tags, JSON-LD structured data, and embedded vote counts.

  4. Concurrent Fetching: All HTTP requests run concurrently with httpx.AsyncClient and asyncio.gather, making it fast even for 50+ products.
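
Steps 2 to 4 can be sketched roughly as follows. This is an illustration, not the actual source: it swaps BeautifulSoup and httpx for the standard library (re, html.parser, asyncio) so it runs offline, fake_fetch is a hypothetical stand-in for an httpx.AsyncClient request, and the link pattern and the mapping of og:title and og:description onto name and tagline are assumptions.

```python
import asyncio
import re
from html.parser import HTMLParser

# Step 2: pull product slugs out of /products/<slug> links (assumed link shape).
SLUG_RE = re.compile(r'href="/products/([a-z0-9-]+)')


def extract_slugs(html):
    """Return slugs in first-seen order, without duplicates."""
    return list(dict.fromkeys(SLUG_RE.findall(html)))


# Step 3: collect <meta property="og:..."> tags from a product page.
class OpenGraphParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        if prop.startswith("og:") and attr.get("content"):
            self.og[prop[3:]] = attr["content"]


def parse_open_graph(html):
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.og


# Step 4: enrich all stubs concurrently, capping the number of in-flight requests.
async def enrich_all(slugs, fetch, limit=10):
    semaphore = asyncio.Semaphore(limit)

    async def enrich(slug):
        async with semaphore:
            html = await fetch(f"/products/{slug}")
        og = parse_open_graph(html)
        return {"slug": slug, "name": og.get("title"), "tagline": og.get("description")}

    # gather preserves input order, so results line up with slugs.
    return await asyncio.gather(*(enrich(s) for s in slugs))


# Offline stand-in for the network call (hypothetical).
async def fake_fetch(path):
    slug = path.rsplit("/", 1)[-1]
    return (
        f'<head><meta property="og:title" content="{slug}">'
        '<meta property="og:description" content="Demo tagline"></head>'
    )


LISTING_HTML = (
    '<a href="/products/termi-protocol">A</a>'
    '<a href="/products/termi-protocol">A again</a>'
    '<a href="/products/demo-tool">B</a>'
)

if __name__ == "__main__":
    slugs = extract_slugs(LISTING_HTML)
    print(asyncio.run(enrich_all(slugs, fake_fetch)))
```

The semaphore is an addition of this sketch: asyncio.gather alone would fire every request at once, and a cap is a common way to stay polite to the target site when the product count grows.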

Why I Built This

I run a portfolio of content extraction tools — RSS Feed Aggregator, llms.txt Generator, RO Business Scraper, and Sitemap Content Extractor. Product Hunt scraping was the natural next step for lead generation workflows.

All tools are available as APIs on RapidAPI, and the source code is on GitHub.

What's Next

I'm working on:

  • HN/Reddit Sentiment Analyzer for AI-powered trend detection
  • Property Listing Scraper for real estate data
  • Press Release Monitor for PR tracking

Follow along or star the repo to stay updated!


Found this useful? Check out the Apify Store for more scraping tools, or the RapidAPI listing for API access.