agenthustler

Posted on Mar 20

Best Bluesky Scrapers in 2026: Comparing Apify Actors for AT Protocol Data

#webscraping #bluesky #atprotocol #python

Bluesky crossed 30 million users in early 2026 and shows no signs of slowing down. Built on the AT Protocol — an open, decentralized framework — it's become the social network that developers actually want to build on.

Whether you're doing brand monitoring, academic research, competitor analysis, or building datasets for NLP, Bluesky's open architecture makes it one of the most scraper-friendly platforms out there. But which tool should you use?

I tested every Bluesky scraper on the Apify Store so you don't have to. Here's what I found.

Full disclosure: I built one of these actors (Bluesky Scraper). I'll be honest about where it wins and where competitors are stronger.

The Bluesky Scraper Landscape on Apify

There are 10+ Bluesky-related actors on the Apify Store right now. Here are the main ones worth considering:

Actor	Users	Runs	Focus	Price
Bluesky Posts Scraper (lexis-solutions)	239	5,811	Post search & scraping	Free tier
Bluesky Users Scraper (lexis-solutions)	84	12,008	User profiles & discovery	Free tier
Bluesky Profile Posts (piotrv1001)	65	584	Single profile posts	Free tier
Bluesky Posts Search (easyapi)	49	292	Keyword search	Free tier
Bluesky (canadesk)	47	3,783	General purpose	Free tier
All-In-One Bluesky (fatihtahta)	41	19,491	Multi-feature	$1.50/1K
BlueSky Feed Scraper (harvest)	38	527	Feed scraping	Free tier
Bluesky Scraper (cryptosignals)	7	—	All-in-one AT Protocol	Pay per event

Numbers pulled from the Apify Store API on March 20, 2026.

What Sets Each Apart

lexis-solutions (Market Leader)

The lexis-solutions duo — Posts Scraper + Users Scraper — dominates with a combined 323 users. They've been around the longest and benefit from first-mover advantage. The Posts Scraper is solid for keyword-based search. The downside: you need two separate actors for posts and users, and neither handles threads or follower networks natively.

fatihtahta (Volume King)

With 19,491 runs on 41 users, this actor gets hammered. The "All-In-One" label is accurate — it covers multiple scraping modes. The $1.50/1K pricing model means costs are predictable. Worth considering if you need high-volume extraction.
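To make "predictable" concrete, here's the arithmetic as a quick sketch. It assumes the $1.50 is charged per 1,000 results, which the listing doesn't spell out, so check the actor's pricing page before budgeting. `estimate_cost` is a made-up helper name.

```python
def estimate_cost(results: int, price_per_thousand: float = 1.50) -> float:
    """Estimated charge in USD, assuming billing is per 1,000 results."""
    return results / 1000 * price_per_thousand


# Print a small price table for a few hypothetical run sizes.
for n in (1_000, 50_000, 250_000):
    print(f"{n:>7,} results -> ${estimate_cost(n):,.2f}")
```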

piotrv1001 (Profile Specialist)

Focused exclusively on scraping posts from a single profile. Does one thing, does it well. Good choice if you only need to monitor specific accounts.

cryptosignals/bluesky-scraper (Mine — The New Contender)

I built this because I needed something that covered the full AT Protocol surface in one actor. Here's what v0.2 does:

Post search with date range filtering (startDate/endDate)
Profile scraping with full metadata
Thread extraction — follows reply chains to get full conversations
Follower/following network scraping
Feed scraping from custom feeds
Keyword monitoring with scheduling support
Full AT Protocol compliance — uses xrpc endpoints directly, no browser automation

The date filter is the feature I'm most proud of. Most scrapers return everything and leave you to filter client-side. Mine filters server-side while paging through results with the AT Protocol's cursor-based pagination, so you only pay for the data you actually want.
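For a sense of what server-side date filtering can look like against the public search endpoint, here's a minimal sketch. It is not the actor's source code. It assumes `app.bsky.feed.searchPosts` accepts `since` and `until` query parameters and that `until` is exclusive, which is my reading of the lexicon; `search_params` is a hypothetical helper.

```python
from datetime import date, timedelta


def search_params(query: str, start_date: str, end_date: str, limit: int = 25) -> dict:
    """Build searchPosts query parameters for an inclusive YYYY-MM-DD date range."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if end < start:
        raise ValueError("end_date must not be earlier than start_date")
    return {
        "q": query,
        "since": f"{start.isoformat()}T00:00:00Z",
        # Assumption: 'until' is exclusive, so step one day past end_date
        # to keep posts made on end_date itself.
        "until": f"{(end + timedelta(days=1)).isoformat()}T00:00:00Z",
        "limit": limit,
    }


params = search_params("artificial intelligence", "2026-03-01", "2026-03-20")
print(params)
```

With the range pushed into the request, posts outside the window never come back, so there is nothing to discard afterwards.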

It has 7 users — I'm the new kid. But the architecture is built to scale.

Code Example: Using the Bluesky Scraper with Python

from apify_client import ApifyClient

# Replace the placeholder with your own Apify API token.
client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("cryptosignals/bluesky-scraper").call(
    run_input={
        "action": "searchPosts",
        "searchQuery": "artificial intelligence",
        "startDate": "2026-03-01",
        "endDate": "2026-03-20",
        "maxResults": 100,
    }
)

# Read the results from the run's default dataset, one item at a time.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    # Show the author handle and the first 80 characters of the post.
    print(f"@{item['author']['handle']}: {item['text'][:80]}")
    print(f"  Likes: {item.get('likeCount', 0)} | Reposts: {item.get('repostCount', 0)}")
    print()

This pulls up to 100 posts matching "artificial intelligence" from March 1 through March 20, 2026. You get structured data (author metadata, engagement counts, timestamps, reply context), all in clean JSON.
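If you'd rather have a spreadsheet than JSON, the items flatten easily. This sketch uses only the four fields that appear in the example above (`author.handle`, `text`, `likeCount`, `repostCount`); `flatten` and `to_csv` are illustrative names, and the sample item is invented, not real output.

```python
import csv
import io

FIELDS = ["handle", "text", "likes", "reposts"]


def flatten(item: dict) -> dict:
    """Turn one nested dataset item into a flat row."""
    author = item.get("author") or {}
    return {
        "handle": author.get("handle", ""),
        # Collapse newlines so each post stays on one CSV line.
        "text": (item.get("text") or "").replace("\n", " "),
        "likes": item.get("likeCount", 0),
        "reposts": item.get("repostCount", 0),
    }


def to_csv(items: list[dict]) -> str:
    """Serialize items to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for item in items:
        writer.writerow(flatten(item))
    return buffer.getvalue()


# Invented sample item, shaped like the fields used in the example above.
sample = [{"author": {"handle": "example.bsky.social"}, "text": "Hello\nworld", "likeCount": 3}]
print(to_csv(sample))
```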

The Free Alternative: Direct AT Protocol API

You don't always need an actor. The AT Protocol exposes public data through XRPC endpoints:

# Search posts
curl 'https://public.api.bsky.app/xrpc/app.bsky.feed.searchPosts?q=python&limit=25'

# Get a user's profile
curl 'https://public.api.bsky.app/xrpc/app.bsky.actor.getProfile?actor=jay.bsky.team'

# Get a user's posts
curl 'https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed?actor=jay.bsky.team&limit=30'

No authentication required for public data. This is genuinely great for one-off queries, quick scripts, or learning the protocol.
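The same calls are easy to make from Python with nothing but the standard library. A minimal sketch: `xrpc_url` and `xrpc_get` are helper names I made up, and the response fields read in `print_profile` (`handle`, `followersCount`) are what I'd expect from `getProfile`, not something this post verifies.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://public.api.bsky.app/xrpc"


def xrpc_url(method: str, **params) -> str:
    """Build the URL for a public XRPC query, e.g. app.bsky.feed.searchPosts."""
    return f"{BASE_URL}/{method}?{urlencode(params)}"


def xrpc_get(method: str, **params) -> dict:
    """Perform the GET request and decode the JSON response body."""
    with urlopen(xrpc_url(method, **params), timeout=30) as response:
        return json.load(response)


def print_profile(handle: str) -> None:
    """Fetch one profile over the network and print a couple of fields."""
    profile = xrpc_get("app.bsky.actor.getProfile", actor=handle)
    print(profile.get("handle"), profile.get("followersCount"))


# Building URLs needs no network access; these match the curl commands above.
print(xrpc_url("app.bsky.feed.searchPosts", q="python", limit=25))
print(xrpc_url("app.bsky.feed.getAuthorFeed", actor="jay.bsky.team", limit=30))
```

Calling `print_profile("jay.bsky.team")` makes a live request, so it's left out of the top-level code.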

But it falls apart at scale. Here's why:

Rate limits — The public API throttles aggressively. You'll hit 429s within minutes of serious scraping.
Pagination — Cursor-based pagination means you can't parallelize easily. Getting 10,000 posts requires sequential requests.
No scheduling — You need your own cron infrastructure.
No storage — You're responsible for deduplication, storage, and export.
No monitoring — If your script breaks at 3 AM, nobody knows.
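To show what the first few items mean in practice, here's a rough sketch of the loop you end up writing yourself: sequential cursor pagination, exponential backoff on rate limits, and deduplication by post URI. The `fetch_page` callable, the `RateLimited` exception, and `paginate` are all hypothetical names. The only thing assumed about the API is that a page looks like `{"posts": [...], "cursor": "..."}` and that each post carries a `uri`.

```python
import time
from typing import Callable, Iterator, Optional


class RateLimited(Exception):
    """Raised by fetch_page when the server answers HTTP 429."""


def paginate(
    fetch_page: Callable[[Optional[str]], dict],
    max_items: int,
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield up to max_items unique posts, following cursors one page at a time."""
    seen = set()
    cursor = None
    retries = 0
    while len(seen) < max_items:
        try:
            page = fetch_page(cursor)
        except RateLimited:
            if retries >= max_retries:
                raise
            # Exponential backoff: 1s, 2s, 4s, ... before retrying the same cursor.
            sleep(base_delay * 2 ** retries)
            retries += 1
            continue
        retries = 0
        for post in page.get("posts", []):
            uri = post["uri"]
            if uri in seen:
                continue  # skip duplicates that show up across pages
            seen.add(uri)
            yield post
            if len(seen) >= max_items:
                return
        # The next request depends on this cursor, so pages cannot be fetched in parallel.
        cursor = page.get("cursor")
        if not cursor:
            return


def fake_fetch(cursor: Optional[str]) -> dict:
    """Stand-in for a real HTTP call: two canned pages with one overlapping post."""
    pages = {
        None: {"posts": [{"uri": "at://a/1"}, {"uri": "at://a/2"}], "cursor": "p2"},
        "p2": {"posts": [{"uri": "at://a/2"}, {"uri": "at://a/3"}]},
    }
    return pages[cursor]


print([post["uri"] for post in paginate(fake_fetch, max_items=10)])
```

Even this leaves out scheduling, persistent storage, and alerting, which is where the list above starts to bite.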

When to Use an Apify Actor Instead

Actors solve the infrastructure problem. You get:

Automatic retries and rate limit handling
Built-in storage with dataset export (JSON, CSV, Excel)
Scheduling — run every hour, daily, weekly
Webhooks — trigger downstream pipelines when data is ready
Proxy rotation (when needed)
Monitoring and alerts through the Apify dashboard

If you're scraping Bluesky once to answer a question, use curl. If you're building a monitoring pipeline, competitor tracker, or research dataset — use an actor.

My Recommendation

For search-heavy workflows where you need keyword monitoring with date filtering: cryptosignals/bluesky-scraper. The date range filtering alone saves significant compute costs on recurring scrapes.

For user discovery and profile analysis: lexis-solutions/bluesky-users-scraper. Established, well-tested, large user base.

For high-volume extraction with predictable pricing: fatihtahta/All-In-One-Bluesky-Scraper. 19K+ runs speak for themselves.

For simple profile monitoring: piotrv1001/bluesky-profile-posts-scraper. Lightweight and focused.

For quick one-off queries: Direct AT Protocol API. Free, no setup, instant results.

The AT Protocol's openness is what makes all of this possible. Unlike scraping Twitter/X (which has become increasingly hostile to developers), Bluesky's architecture wants you to build on it. The question isn't whether to scrape Bluesky — it's which tool fits your workflow.

What are you building with Bluesky data? Drop a comment — I'm curious what use cases people are finding.

DEV Community