I built three free scrapers for platforms that researchers and developers commonly need data from. All use pay-per-event pricing (free until March 21), and none require API keys.
If you've ever needed to pull data from Bluesky, Substack, or Hacker News, you know the drill: write a custom script, handle pagination, deal with rate limits, parse HTML. These three Apify Actors handle all of that out of the box.
1. Bluesky Scraper
Link: Bluesky Scraper on Apify Store
What it does: Scrapes posts, user profiles, and search results from Bluesky via the AT Protocol.
Why Bluesky: The AT Protocol is fully open — no authentication tokens needed for public data. With 30M+ users and growing, Bluesky is becoming a primary data source for social media researchers and trend analysts.
Example input:
```json
{
  "searchTerms": ["web scraping", "data extraction"],
  "maxPosts": 100,
  "includeReplies": false
}
```
This pulls up to 100 posts matching your search terms. You can also scrape specific user profiles or full thread conversations.
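If you're curious what the Actor is doing for you, here's a minimal sketch of the pagination it handles. The `app.bsky.feed.searchPosts` XRPC endpoint is real and public; the helper names and the stub fetcher below are my own, standing in for actual HTTP requests:

```python
from urllib.parse import urlencode

# Public AT Protocol XRPC endpoint for post search (no auth needed for public data).
SEARCH_URL = "https://public.api.bsky.app/xrpc/app.bsky.feed.searchPosts"

def build_search_url(term, limit=25, cursor=None):
    """Build a searchPosts request URL; `cursor` resumes a paginated result set."""
    params = {"q": term, "limit": limit}
    if cursor:
        params["cursor"] = cursor
    return f"{SEARCH_URL}?{urlencode(params)}"

def collect_posts(fetch_page, term, max_posts=100):
    """Drain paginated results until max_posts is reached or the cursor runs out.
    fetch_page(url) should return a dict like {"posts": [...], "cursor": ...}."""
    posts, cursor = [], None
    while len(posts) < max_posts:
        page = fetch_page(build_search_url(term, cursor=cursor))
        posts.extend(page.get("posts", []))
        cursor = page.get("cursor")
        if not cursor:  # no cursor means no more pages
            break
    return posts[:max_posts]

# Stub fetcher standing in for an HTTP GET, so the pagination logic is testable offline.
pages = {None: {"posts": [{"text": "post 1"}, {"text": "post 2"}], "cursor": "abc"},
         "abc": {"posts": [{"text": "post 3"}]}}

def fake_fetch(url):
    return pages["abc" if "cursor=abc" in url else None]

print(len(collect_posts(fake_fetch, "web scraping", max_posts=3)))  # 3
```

The Actor wraps this loop plus retries, rate-limit backoff, and output normalization, which is most of the boilerplate you'd otherwise write yourself.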
2. Substack Scraper
Link: Substack Scraper on Apify Store
What it does: Scrapes newsletter posts, author metadata, and publication details from any public Substack.
Why Substack: Substack exposes an unofficial JSON API for public content — no auth required. This makes it straightforward to collect article text, subscriber counts, and publication metadata at scale.
Example input:
```json
{
  "publicationUrls": [
    "https://platformer.news",
    "https://www.lennysnewsletter.com"
  ],
  "maxPostsPerPublication": 50
}
```
This scrapes the 50 most recent posts from each publication, including full article text, dates, likes, and author info.
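For a sense of what "unofficial JSON API" means here: public Substack archives can be paged through a `limit`/`offset` endpoint. A sketch of the URL construction, assuming the commonly observed `/api/v1/posts` path (unofficial, so it may change without notice; the function name is mine):

```python
def archive_urls(publication_url, max_posts=50, page_size=12):
    """Yield paginated archive-API URLs for one publication.
    Substack's unofficial JSON API pages posts with limit/offset query params."""
    base = publication_url.rstrip("/")
    for offset in range(0, max_posts, page_size):
        limit = min(page_size, max_posts - offset)  # trim the last page
        yield f"{base}/api/v1/posts?limit={limit}&offset={offset}"

urls = list(archive_urls("https://platformer.news", max_posts=30))
print(urls[0])  # https://platformer.news/api/v1/posts?limit=12&offset=0
```

Each URL returns a JSON array of post objects; the Actor fetches them, follows each post to get the full article body, and merges everything into one dataset.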
3. Hacker News Scraper
Link: Hacker News Scraper on Apify Store
What it does: Scrapes stories, comments, and user profiles from Hacker News.
Why HN: Hacker News has an official Firebase API with no rate limits and no authentication. The scraper wraps this into a structured output with filtering, sorting, and comment threading built in.
Example input:
```json
{
  "scrapeType": "search",
  "searchQuery": "LLM fine-tuning",
  "maxItems": 200,
  "includeComments": true
}
```
This searches HN for stories about LLM fine-tuning and includes the full comment trees — useful for sentiment analysis or finding expert opinions.
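The comment threading is the fiddly part if you build this yourself: the official Firebase API returns each item with a `kids` list of child IDs, so you have to recurse. A sketch of that walk, using a stub item store in place of HTTP GETs against the real `https://hacker-news.firebaseio.com/v0/item/<id>.json` endpoint (the function name and sample data are mine):

```python
def comment_tree(item_id, get_item, depth=0):
    """Recursively expand an item's `kids` into a flat list of
    (depth, text) tuples -- the threading the Actor returns as nested JSON."""
    item = get_item(item_id)
    out = []
    if item.get("type") == "comment":
        out.append((depth, item.get("text", "")))
    for kid in item.get("kids", []):  # child item IDs, in display order
        out.extend(comment_tree(kid, get_item, depth + 1))
    return out

# Stub item store standing in for /v0/item/<id>.json responses.
items = {
    1: {"id": 1, "type": "story", "title": "Story", "kids": [2, 4]},
    2: {"id": 2, "type": "comment", "text": "top-level", "kids": [3]},
    3: {"id": 3, "type": "comment", "text": "reply"},
    4: {"id": 4, "type": "comment", "text": "another top-level"},
}

thread = comment_tree(1, items.get)
print(thread)  # [(1, 'top-level'), (2, 'reply'), (1, 'another top-level')]
```

One API request per item means a 200-comment thread is 200+ requests; batching and caching those is exactly the kind of plumbing the Actor takes off your plate.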
Why Use These vs. Building Your Own?
| | DIY Script | Apify Actor |
|---|---|---|
| Setup time | Hours to days | Minutes |
| Pagination | You handle it | Built-in |
| Output format | Whatever you code | JSON, CSV, Excel, or direct to your DB |
| Scheduling | Cron jobs on your server | Built-in scheduler on Apify |
| Proxy rotation | You manage it | Handled automatically |
| Maintenance | You fix it when the site changes | Actor updates handle it |
If you need a one-off data pull, a DIY script works. If you need recurring scrapes, structured output, or you just don't want to spend a day writing pagination logic, these Actors save real time.
Try Them Out
All three are live on the Apify Store with free trials.
Each Actor runs on pay-per-event pricing. You get results as structured JSON, ready for analysis, storage, or piping into your data pipeline.
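"Ready for analysis" concretely means flat JSON records you can push into whatever format you like. A small example of converting a JSON result set to CSV with the standard library; the sample records here are hypothetical, not actual Actor output:

```python
import csv
import io
import json

# Hypothetical sample of what a few dataset items might look like.
results_json = json.dumps([
    {"author": "alice", "title": "Post A", "likes": 10},
    {"author": "bob", "title": "Post B", "likes": 3},
])

def to_csv(results_json):
    """Turn a JSON array of flat records into CSV text."""
    rows = json.loads(results_json)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=sorted(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

print(to_csv(results_json).splitlines()[0])  # author,likes,title
```

The Apify platform can also export datasets to CSV or Excel directly, so this step is only needed when you're processing results in your own code.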
If you have questions or feature requests, drop a comment or open an issue on the Actor page. Happy scraping.
Recommended Tools for Web Scraping
If you're building scrapers at scale, these tools can save you hours of dealing with proxies, CAPTCHAs, and rate limits:
ScraperAPI — Handles proxy rotation, browser rendering, and CAPTCHAs automatically. Great if you don't want to manage your own proxy infrastructure. Comes with 5,000 free API credits to get started.
ScrapeOps — A proxy aggregator that routes your requests through 20+ proxy providers and picks the best one for each target site. Useful when you need reliability across different domains.