Why Scrape Bluesky?
Bluesky has grown to over 30 million users, built on the open AT Protocol. Unlike X/Twitter, which has locked its API behind expensive tiers, Bluesky's public API is free to use: no API key, no OAuth, and rate limits generous enough for most projects.
This makes it a goldmine for:
- Social media monitoring — track brand mentions and sentiment
- Lead generation — find people discussing topics relevant to your business
- Market research — analyze conversations and trending topics
- Academic research — study public discourse on decentralized social media
In this tutorial, I'll show you how to scrape Bluesky posts and profiles at scale using an Apify actor I built.
The Easy Way: Use the Apify Actor
I published a free Bluesky Scraper on Apify that handles everything — pagination, data extraction, and export. Here's how to use it.
Step 1: Set Up Your Input
The actor accepts a simple JSON input:
{
  "searchQuery": "artificial intelligence",
  "handles": ["jay.bsky.team", "bsky.app"],
  "maxItems": 200,
  "scrapeType": "both"
}
Parameters:
- searchQuery: keyword to search posts or profiles for
- handles: specific Bluesky handles to scrape
- maxItems: maximum number of results (up to 10,000)
- scrapeType: posts, profiles, or both
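If you build the input in code rather than in the Console, a small check can catch typos before you spend a run on them. The sketch below is hypothetical: `build_run_input` is not part of the actor or of `apify-client`. It enforces the limits listed above plus one assumption of mine (that at least one of searchQuery or handles should be set); the actor's own input schema remains the authority.

```python
# Hypothetical helper: builds and sanity-checks an input dict for the
# Bluesky Scraper actor, based on the parameters documented above.
VALID_SCRAPE_TYPES = {"posts", "profiles", "both"}
MAX_ITEMS_LIMIT = 10_000


def build_run_input(search_query=None, handles=None, max_items=200, scrape_type="both"):
    """Return a run_input dict, raising ValueError on obviously bad values."""
    # Assumption: a run needs something to search for or someone to scrape.
    if not search_query and not handles:
        raise ValueError("provide searchQuery, handles, or both")
    if scrape_type not in VALID_SCRAPE_TYPES:
        raise ValueError(f"scrapeType must be one of {sorted(VALID_SCRAPE_TYPES)}")
    if not 1 <= max_items <= MAX_ITEMS_LIMIT:
        raise ValueError(f"maxItems must be between 1 and {MAX_ITEMS_LIMIT}")

    run_input = {"maxItems": max_items, "scrapeType": scrape_type}
    if search_query:
        run_input["searchQuery"] = search_query
    if handles:
        # Strip a leading "@" in case handles were copied from the Bluesky UI.
        run_input["handles"] = [h.lstrip("@") for h in handles]
    return run_input
```

You would then pass the returned dict as `run_input` to `client.actor('cryptosignals/bluesky-scraper').call(...)`.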
Step 2: Run It
You can run it from the Apify Console UI, or programmatically:
JavaScript:
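import { ApifyClient } from 'apify-client';
// Top-level await needs an ES module (a .mjs file or "type": "module" in package.json).
const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });
// call() starts the actor and waits for the run to finish.
const run = await client.actor('cryptosignals/bluesky-scraper').call({
  searchQuery: 'startup funding',
  maxItems: 500,
  scrapeType: 'posts'
});
// Results are stored in the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Got ${items.length} posts`);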
Python:
from apify_client import ApifyClient
client = ApifyClient('YOUR_APIFY_TOKEN')
# call() starts the actor run and blocks until it finishes.
run = client.actor('cryptosignals/bluesky-scraper').call(run_input={
    'searchQuery': 'startup funding',
    'maxItems': 500,
    'scrapeType': 'posts'
})
# Results are stored in the run's default dataset.
items = client.dataset(run['defaultDatasetId']).list_items().items
print(f'Got {len(items)} posts')
Step 3: Export Your Data
Every run outputs structured data you can export as JSON, CSV, or Excel directly from Apify.
Post data includes:
- Author handle, display name, and DID
- Full post text
- Engagement metrics: likes, reposts, replies, quotes, bookmarks
- Timestamps (created and indexed)
- Direct URL to the post
- Embedded images and external links
- Language detection
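In the raw API, a post is identified by an AT URI (its `uri` field, e.g. `at://did:plc:.../app.bsky.feed.post/<rkey>`), not by a web link. If you want the post URL in a DIY pipeline, one common approach is to rebuild it from the URI's record key using the link format bsky.app uses today. This is a sketch based on that URL convention, not a documented API guarantee, and it says nothing about how the actor itself builds its URLs; `post_url` is a hypothetical helper.

```python
def post_url(at_uri, handle=None):
    """Convert a post AT URI into a bsky.app web URL.

    Uses the author's handle if given, otherwise the DID from the URI
    (bsky.app profile links accept either).
    """
    prefix = "at://"
    if not at_uri.startswith(prefix):
        raise ValueError(f"not an AT URI: {at_uri!r}")
    # Expected shape after the prefix: <did>/<collection>/<rkey>
    parts = at_uri[len(prefix):].split("/")
    if len(parts) != 3 or parts[1] != "app.bsky.feed.post":
        raise ValueError(f"not a post URI: {at_uri!r}")
    did, _collection, rkey = parts
    return f"https://bsky.app/profile/{handle or did}/post/{rkey}"
```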
Profile data includes:
- Handle, display name, bio
- Follower/following/post counts
- Avatar and banner URLs
- Account creation date
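For a DIY pipeline, the same profile fields can be picked out of a raw `app.bsky.actor.getProfile` response (that endpoint appears later in this post). The source field names (`description`, `followersCount`, `followsCount`, `postsCount`, `avatar`, `banner`, `createdAt`) follow the Bluesky lexicon; the output keys (`bio`, `followers`, and so on) are hypothetical names chosen for illustration and are not the actor's actual output schema.

```python
def flatten_profile(profile):
    """Pick the fields listed above out of a raw getProfile response dict.

    Missing text/URL fields come back as None; missing counts come back as 0.
    """
    return {
        "handle": profile.get("handle"),
        "displayName": profile.get("displayName"),
        "bio": profile.get("description"),
        "followers": profile.get("followersCount", 0),
        "following": profile.get("followsCount", 0),
        "posts": profile.get("postsCount", 0),
        "avatar": profile.get("avatar"),
        "banner": profile.get("banner"),
        "createdAt": profile.get("createdAt"),
    }
```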
The DIY Way: AT Protocol API Basics
If you want to build your own scraper, here are the key endpoints. Bluesky's public API at https://public.api.bsky.app/xrpc serves these read-only endpoints without authentication:
Search Posts
curl "https://public.api.bsky.app/xrpc/app.bsky.feed.searchPosts?q=AI&limit=25"
Get User Profile
curl "https://public.api.bsky.app/xrpc/app.bsky.actor.getProfile?actor=jay.bsky.team"
Get Author Feed
curl "https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed?actor=bsky.app&limit=50"
Search Users
curl "https://public.api.bsky.app/xrpc/app.bsky.actor.searchActors?q=developer&limit=25"
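The curl calls above translate directly into code. Here is a minimal standard-library Python sketch; `xrpc_url` and `xrpc_get` are hypothetical helper names, and it assumes these endpoints keep accepting unauthenticated GET requests as described above.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://public.api.bsky.app/xrpc"


def xrpc_url(method, **params):
    """Build a URL like .../xrpc/app.bsky.feed.searchPosts?q=AI&limit=25."""
    # Drop None values so optional parameters (e.g. cursor) can be passed unconditionally.
    query = urllib.parse.urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}/{method}?{query}" if query else f"{BASE_URL}/{method}"


def xrpc_get(method, timeout=30, **params):
    """GET an XRPC query endpoint and return the decoded JSON body."""
    with urllib.request.urlopen(xrpc_url(method, **params), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `xrpc_get("app.bsky.actor.getProfile", actor="jay.bsky.team")` returns the profile as a dict, and `xrpc_get("app.bsky.feed.searchPosts", q="AI", limit=25)["posts"]` returns a list of posts.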
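All four endpoints return JSON, and the search and feed endpoints support cursor-based pagination: when a response includes a cursor field, pass it back as the cursor query parameter to fetch the next page. When the response has no cursor, there are no more pages.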
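A pagination loop can be written once and reused for every list endpoint. In this sketch, `paginate` is a hypothetical helper that takes any `fetch_page(cursor)` function, so it does not depend on a particular HTTP library. The item key differs per endpoint (`posts` for searchPosts, `feed` for getAuthorFeed, `actors` for searchActors), and the default half-second pause between pages is an arbitrary, conservative choice, not a documented limit.

```python
import time


def paginate(fetch_page, items_key, max_items=1000, delay=0.5):
    """Yield items across pages of a cursor-paginated endpoint.

    fetch_page(cursor) must return the decoded JSON for one page; cursor is
    None on the first request. Stops when a page has no cursor, a page is
    empty, or max_items items have been yielded.
    """
    cursor = None
    yielded = 0
    while yielded < max_items:
        page = fetch_page(cursor)
        items = page.get(items_key, [])
        for item in items:
            yield item
            yielded += 1
            if yielded >= max_items:
                return
        cursor = page.get("cursor")
        if not cursor or not items:
            return
        time.sleep(delay)  # small pause between pages to stay well under rate limits
```

In practice, `fetch_page` would GET getAuthorFeed (or a search endpoint) with `actor`, `limit`, and the current `cursor` as query parameters, leaving `cursor` out on the first call.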
Tips for Scraping at Scale
Respect rate limits. The public API is generous but not unlimited. Add small delays between requests if scraping thousands of items.
Use pagination cursors. Don't try to get everything in one request. Loop through pages using the cursor.
Handle errors gracefully. The API occasionally returns 429 (rate limit) or 502 (server error). Implement exponential backoff.
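Exponential backoff means doubling the wait after each failed attempt. The sketch below wraps any zero-argument `request` callable (for example, a function that calls `urllib.request.urlopen`) and retries on `HTTPError` status 429 and 502 as mentioned above, plus 503 and 504, which I am assuming are also transient. `with_backoff` and its defaults are illustrative, not values published by Bluesky; the `sleep` parameter exists only so the delays can be tested without waiting.

```python
import random
import time
import urllib.error

# Status codes treated as transient (429/502 from the tip above; 503/504 assumed).
RETRYABLE_STATUSES = {429, 502, 503, 504}


def with_backoff(request, max_retries=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call request() and return its result, retrying transient HTTP errors.

    The wait before retry n (counting from 0) is base_delay * 2**n, capped at
    max_delay, plus up to 50% random jitter so parallel workers don't retry in
    lockstep. Non-retryable errors, or running out of retries, re-raise.
    """
    for attempt in range(max_retries + 1):
        try:
            return request()
        except urllib.error.HTTPError as err:
            if err.code not in RETRYABLE_STATUSES or attempt == max_retries:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```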
Filter by language. Posts include a langs field; use it to focus on content in your target language.
Store data incrementally. Push results to your dataset as you go rather than collecting everything in memory.
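Both of these tips fit in a few lines. In raw API responses, a post's declared languages live in `post["record"]["langs"]` as BCP-47 tags such as `en` or `pt-BR`. The sketch below keeps posts whose primary language subtag matches and appends them to a local JSON Lines file as they arrive. The file is a stand-in for an Apify dataset in a DIY scraper, and both function names are hypothetical.

```python
import json


def matches_language(post, wanted=("en",)):
    """True if the post's record declares one of the wanted language codes.

    Compares only the primary subtag, so "en-US" matches "en". Posts with no
    langs field are treated as non-matching.
    """
    langs = post.get("record", {}).get("langs") or []
    return any(lang.split("-")[0].lower() in wanted for lang in langs)


def append_jsonl(path, posts, wanted=("en",)):
    """Append matching posts to a JSON Lines file (one object per line) and
    return how many were written, so nothing accumulates in memory."""
    written = 0
    with open(path, "a", encoding="utf-8") as fh:
        for post in posts:
            if matches_language(post, wanted):
                fh.write(json.dumps(post, ensure_ascii=False) + "\n")
                written += 1
    return written
```

Because the file is opened in append mode, you can call `append_jsonl` once per page inside a pagination loop.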
Wrapping Up
Bluesky's open AT Protocol makes it one of the easiest social networks to scrape legally and ethically. Whether you use the Apify actor for a no-code solution or build your own scraper with the public API, you can extract valuable social data in minutes.
If you found this useful, give the actor a star on Apify — it helps others discover it.
Have questions about scraping Bluesky or the AT Protocol? Drop a comment below.