I built three scrapers for platforms that researchers and developers commonly need data from. All three use pay-per-event pricing (free until March 21), and none requires API keys.
If you've ever needed to pull data from Bluesky, Substack, or Hacker News, you know the drill: write a custom script, handle pagination, deal with rate limits, parse HTML. These three Apify Actors handle all of that out of the box.
1. Bluesky Scraper
Link: Bluesky Scraper on Apify Store
What it does: Scrapes posts, user profiles, and search results from Bluesky via the AT Protocol.
Why Bluesky: The AT Protocol is fully open — no authentication tokens needed for public data. With 30M+ users and growing, Bluesky is becoming a primary data source for social media researchers and trend analysts.
Example input:
```json
{
  "searchTerms": ["web scraping", "data extraction"],
  "maxPosts": 100,
  "includeReplies": false
}
```
This pulls up to 100 posts matching your search terms. You can also scrape specific user profiles or full thread conversations.
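Once you have the results, a few lines of Python are enough to clean them up. Here's a minimal sketch that deduplicates posts and surfaces the most-liked ones; the field names (`uri`, `text`, `likeCount`) are assumptions based on typical AT Protocol post records, so check them against the Actor's actual output schema:

```python
# Sketch: post-process scraped Bluesky posts.
# Field names ("uri", "text", "likeCount") are assumed -- verify
# against the Actor's real output before relying on them.

def top_posts(posts, n=3):
    """Deduplicate posts by URI, then return the n most-liked."""
    seen, unique = set(), []
    for post in posts:
        if post["uri"] not in seen:
            seen.add(post["uri"])
            unique.append(post)
    return sorted(unique, key=lambda p: p.get("likeCount", 0), reverse=True)[:n]

# Sample records standing in for real Actor output:
sample = [
    {"uri": "at://did:plc:abc/app.bsky.feed.post/1", "text": "Scraping tips", "likeCount": 42},
    {"uri": "at://did:plc:abc/app.bsky.feed.post/2", "text": "Data extraction 101", "likeCount": 7},
    {"uri": "at://did:plc:abc/app.bsky.feed.post/1", "text": "Scraping tips", "likeCount": 42},
]

for post in top_posts(sample):
    print(post["likeCount"], post["text"])
```

Deduplicating by URI matters if you run overlapping search terms, since the same post can match more than one query.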
2. Substack Scraper
Link: Substack Scraper on Apify Store
What it does: Scrapes newsletter posts, author metadata, and publication details from any public Substack.
Why Substack: Substack exposes an unofficial JSON API for public content — no auth required. This makes it straightforward to collect article text, subscriber counts, and publication metadata at scale.
Example input:
```json
{
  "publicationUrls": [
    "https://platformer.news",
    "https://www.lennysnewsletter.com"
  ],
  "maxPostsPerPublication": 50
}
```
This scrapes the 50 most recent posts from each publication, including full article text, dates, likes, and author info.
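With the article text in hand, simple aggregate stats come almost for free. A quick sketch, grouping posts by publication and averaging word counts; the `publication` and `body_text` field names are my assumptions, not the Actor's documented schema:

```python
# Sketch: summarize scraped Substack posts per publication.
# "publication" and "body_text" are assumed field names.
from collections import defaultdict
from statistics import mean

def words_per_publication(posts):
    """Average word count of scraped posts, keyed by publication."""
    buckets = defaultdict(list)
    for post in posts:
        buckets[post["publication"]].append(len(post["body_text"].split()))
    return {pub: round(mean(counts)) for pub, counts in buckets.items()}

# Sample records standing in for real Actor output:
sample = [
    {"publication": "platformer.news", "body_text": "word " * 1200},
    {"publication": "platformer.news", "body_text": "word " * 800},
    {"publication": "lennysnewsletter.com", "body_text": "word " * 2000},
]
print(words_per_publication(sample))
```

The same grouping pattern works for likes, comment counts, or posting cadence.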
3. Hacker News Scraper
Link: Hacker News Scraper on Apify Store
What it does: Scrapes stories, comments, and user profiles from Hacker News.
Why HN: Hacker News has an official Firebase API with no rate limits and no authentication. The scraper wraps this into a structured output with filtering, sorting, and comment threading built in.
Example input:
```json
{
  "scrapeType": "search",
  "searchQuery": "LLM fine-tuning",
  "maxItems": 200,
  "includeComments": true
}
```
This searches HN for stories about LLM fine-tuning and includes the full comment trees — useful for sentiment analysis or finding expert opinions.
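If you end up with comments as a flat list rather than nested threads, rebuilding the tree is a short exercise. The `id`/`parent` fields below mirror the HN Firebase item schema; the flat-list shape is an assumption about how you might store the output:

```python
# Sketch: rebuild a comment tree from a flat list of comments.
# "id" and "parent" mirror HN's Firebase item schema.

def build_tree(comments, root_id):
    """Nest flat comments under their parents, starting from root_id."""
    children = {}
    for c in comments:
        children.setdefault(c["parent"], []).append(c)

    def attach(node_id):
        return [{**c, "replies": attach(c["id"])} for c in children.get(node_id, [])]

    return attach(root_id)

# Sample records: story id 1 with three comments, one nested.
flat = [
    {"id": 2, "parent": 1, "text": "Great write-up"},
    {"id": 3, "parent": 2, "text": "Agreed"},
    {"id": 4, "parent": 1, "text": "What base model?"},
]
tree = build_tree(flat, root_id=1)
print(len(tree), "top-level comments")
```

Walking the resulting tree depth-first gives you each thread in reading order, which is usually what sentiment-analysis pipelines want.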
Why Use These vs. Building Your Own?
| | DIY Script | Apify Actor |
|---|---|---|
| Setup time | Hours to days | Minutes |
| Pagination | You handle it | Built-in |
| Output format | Whatever you code | JSON, CSV, Excel, or direct to your DB |
| Scheduling | Cron jobs on your server | Built-in scheduler on Apify |
| Proxy rotation | You manage it | Handled automatically |
| Maintenance | You fix it when the site changes | Actor updates handle it |
If you need a one-off data pull, a DIY script works. If you need recurring scrapes, structured output, or you just don't want to spend a day writing pagination logic, these Actors save real time.
Try Them Out
All three are live on the Apify Store with free trials.
Each Actor runs on pay-per-event pricing. You get results as structured JSON, ready for analysis, storage, or piping into your data pipeline.
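The Actors export CSV directly, but if you're flattening records inside your own pipeline, the standard library is enough. A sketch with illustrative column names (not the Actors' documented fields):

```python
# Sketch: flatten a list of JSON records to CSV in-pipeline.
# Column names here are illustrative, not the Actors' schema.
import csv
import io

def items_to_csv(items, fields):
    """Serialize dicts to CSV, keeping only the requested columns."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(items)  # missing keys become empty cells
    return buf.getvalue()

items = [
    {"title": "Show HN: My scraper", "points": 120, "url": "https://example.com"},
    {"title": "LLM fine-tuning notes", "points": 87},
]
print(items_to_csv(items, ["title", "points"]))
```

`extrasaction="ignore"` lets you pass records straight through without pruning fields first.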
If you have questions or feature requests, drop a comment or open an issue on the Actor page. Happy scraping.