I built three free scrapers for platforms that researchers and developers commonly need data from. All use pay-per-event pricing (free until March 21), and none require API keys.
If you've ever needed to pull data from Bluesky, Substack, or Hacker News, you know the drill: write a custom script, handle pagination, deal with rate limits, parse HTML. These three Apify Actors handle all of that out of the box.
1. Bluesky Scraper
Link: Bluesky Scraper on Apify Store
What it does: Scrapes posts, user profiles, and search results from Bluesky via the AT Protocol.
Why Bluesky: The AT Protocol is fully open — no authentication tokens needed for public data. With 30M+ users and growing, Bluesky is becoming a primary data source for social media researchers and trend analysts.
Example input:
```json
{
  "searchTerms": ["web scraping", "data extraction"],
  "maxPosts": 100,
  "includeReplies": false
}
```
This pulls up to 100 posts matching your search terms. You can also scrape specific user profiles or full thread conversations.
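If you're curious what the Actor is doing for you, here's a minimal sketch of the pagination it handles. The `app.bsky.feed.searchPosts` XRPC endpoint is real and public; the helper names and the stub fetcher below are my own, standing in for actual HTTP requests:

```python
from urllib.parse import urlencode

# Public AT Protocol XRPC endpoint for post search (no auth needed for public data).
SEARCH_URL = "https://public.api.bsky.app/xrpc/app.bsky.feed.searchPosts"

def build_search_url(term, limit=25, cursor=None):
    """Build a searchPosts request URL; `cursor` resumes a paginated result set."""
    params = {"q": term, "limit": limit}
    if cursor:
        params["cursor"] = cursor
    return f"{SEARCH_URL}?{urlencode(params)}"

def collect_posts(fetch_page, term, max_posts=100):
    """Drain paginated results until max_posts is reached or the cursor runs out.
    fetch_page(url) should return a dict like {"posts": [...], "cursor": ...}."""
    posts, cursor = [], None
    while len(posts) < max_posts:
        page = fetch_page(build_search_url(term, cursor=cursor))
        posts.extend(page.get("posts", []))
        cursor = page.get("cursor")
        if not cursor:  # no cursor means no more pages
            break
    return posts[:max_posts]

# Stub fetcher standing in for an HTTP GET, so the pagination logic is testable offline.
pages = {None: {"posts": [{"text": "post 1"}, {"text": "post 2"}], "cursor": "abc"},
         "abc": {"posts": [{"text": "post 3"}]}}

def fake_fetch(url):
    return pages["abc" if "cursor=abc" in url else None]

print(len(collect_posts(fake_fetch, "web scraping", max_posts=3)))  # 3
```

The Actor wraps this loop plus retries, rate-limit backoff, and output normalization, which is most of the boilerplate you'd otherwise write yourself.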
2. Substack Scraper
Link: Substack Scraper on Apify Store
What it does: Scrapes newsletter posts, author metadata, and publication details from any public Substack.
Why Substack: Substack exposes an unofficial JSON API for public content — no auth required. This makes it straightforward to collect article text, subscriber counts, and publication metadata at scale.
Example input:
```json
{
  "publicationUrls": [
    "https://platformer.news",
    "https://www.lennysnewsletter.com"
  ],
  "maxPostsPerPublication": 50
}
```
This scrapes the 50 most recent posts from each publication, including full article text, dates, likes, and author info.
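For a sense of what "unofficial JSON API" means here: public Substack archives can be paged through a `limit`/`offset` endpoint. A sketch of the URL construction, assuming the commonly observed `/api/v1/posts` path (unofficial, so it may change without notice; the function name is mine):

```python
def archive_urls(publication_url, max_posts=50, page_size=12):
    """Yield paginated archive-API URLs for one publication.
    Substack's unofficial JSON API pages posts with limit/offset query params."""
    base = publication_url.rstrip("/")
    for offset in range(0, max_posts, page_size):
        limit = min(page_size, max_posts - offset)  # trim the last page
        yield f"{base}/api/v1/posts?limit={limit}&offset={offset}"

urls = list(archive_urls("https://platformer.news", max_posts=30))
print(urls[0])  # https://platformer.news/api/v1/posts?limit=12&offset=0
```

Each URL returns a JSON array of post objects; the Actor fetches them, follows each post to get the full article body, and merges everything into one dataset.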
3. Hacker News Scraper
Link: Hacker News Scraper on Apify Store
What it does: Scrapes stories, comments, and user profiles from Hacker News.
Why HN: Hacker News has an official Firebase API with no rate limits and no authentication. The scraper wraps this into a structured output with filtering, sorting, and comment threading built in.
Example input:
```json
{
  "scrapeType": "search",
  "searchQuery": "LLM fine-tuning",
  "maxItems": 200,
  "includeComments": true
}
```
This searches HN for stories about LLM fine-tuning and includes the full comment trees — useful for sentiment analysis or finding expert opinions.
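The comment threading is the fiddly part if you build this yourself: the official Firebase API returns each item with a `kids` list of child IDs, so you have to recurse. A sketch of that walk, using a stub item store in place of HTTP GETs against the real `https://hacker-news.firebaseio.com/v0/item/<id>.json` endpoint (the function name and sample data are mine):

```python
def comment_tree(item_id, get_item, depth=0):
    """Recursively expand an item's `kids` into a flat list of
    (depth, text) tuples -- the threading the Actor returns as nested JSON."""
    item = get_item(item_id)
    out = []
    if item.get("type") == "comment":
        out.append((depth, item.get("text", "")))
    for kid in item.get("kids", []):  # child item IDs, in display order
        out.extend(comment_tree(kid, get_item, depth + 1))
    return out

# Stub item store standing in for /v0/item/<id>.json responses.
items = {
    1: {"id": 1, "type": "story", "title": "Story", "kids": [2, 4]},
    2: {"id": 2, "type": "comment", "text": "top-level", "kids": [3]},
    3: {"id": 3, "type": "comment", "text": "reply"},
    4: {"id": 4, "type": "comment", "text": "another top-level"},
}

thread = comment_tree(1, items.get)
print(thread)  # [(1, 'top-level'), (2, 'reply'), (1, 'another top-level')]
```

One API request per item means a 200-comment thread is 200+ requests; batching and caching those is exactly the kind of plumbing the Actor takes off your plate.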
Why Use These vs. Building Your Own?
| | DIY Script | Apify Actor |
|---|---|---|
| Setup time | Hours to days | Minutes |
| Pagination | You handle it | Built-in |
| Output format | Whatever you code | JSON, CSV, Excel, or direct to your DB |
| Scheduling | Cron jobs on your server | Built-in scheduler on Apify |
| Proxy rotation | You manage it | Handled automatically |
| Maintenance | You fix it when the site changes | Actor updates handle it |
If you need a one-off data pull, a DIY script works. If you need recurring scrapes, structured output, or you just don't want to spend a day writing pagination logic, these Actors save real time.
Try Them Out
All three are live on the Apify Store with free trials.
Each Actor runs on pay-per-event pricing. You get results as structured JSON, ready for analysis, storage, or piping into your data pipeline.
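"Ready for analysis" concretely means flat JSON records you can push into whatever format you like. A small example of converting a JSON result set to CSV with the standard library; the sample records here are hypothetical, not actual Actor output:

```python
import csv
import io
import json

# Hypothetical sample of what a few dataset items might look like.
results_json = json.dumps([
    {"author": "alice", "title": "Post A", "likes": 10},
    {"author": "bob", "title": "Post B", "likes": 3},
])

def to_csv(results_json):
    """Turn a JSON array of flat records into CSV text."""
    rows = json.loads(results_json)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=sorted(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

print(to_csv(results_json).splitlines()[0])  # author,likes,title
```

The Apify platform can also export datasets to CSV or Excel directly, so this step is only needed when you're processing results in your own code.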
If you have questions or feature requests, drop a comment or open an issue on the Actor page. Happy scraping.
Recommended Tools for Web Scraping
If you're building scrapers at scale, these tools can save you hours of dealing with proxies, CAPTCHAs, and rate limits:
ScraperAPI — Handles proxy rotation, browser rendering, and CAPTCHAs automatically. Great if you don't want to manage your own proxy infrastructure. Comes with 5,000 free API credits to get started.
ScrapeOps — A proxy aggregator that routes your requests through 20+ proxy providers and picks the best one for each target site. Useful when you need reliability across different domains.