If you've been scraping Twitter/X lately, you know the pain. Rate limits, $42K/month API tiers, lawsuit threats. Meanwhile, Bluesky quietly built the most scraper-friendly social network on the planet.
Here's the thing: Bluesky runs on the AT Protocol, and reading public data requires no authentication. No API keys. No OAuth dance. Far fewer terms-of-service landmines. Just HTTP requests to public endpoints.
I've been scraping Bluesky data for months now. Let me show you how it works.
The AT Protocol: Why It's Different
The AT Protocol (Authenticated Transfer Protocol) treats social data as a public utility. Every Bluesky user's posts, likes, and follows live in a repository hosted on a Personal Data Server (PDS). Most accounts sit on Bluesky's own bsky.social servers, but the protocol is federated, so anyone can run a PDS.
The public API lives at https://public.api.bsky.app. No auth headers needed for read operations. This isn't a loophole — it's by design.
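Every request in this post has the same shape: the base URL, then /xrpc/, then the method name, then query parameters. Here's a minimal sketch of that pattern. The helper name build_xrpc_url is my own, made up for illustration, and is not part of any Bluesky SDK.

```python
from urllib.parse import urlencode

BASE_URL = "https://public.api.bsky.app"  # public read-only host named above


def build_xrpc_url(method: str, **params) -> str:
    """Return the full URL for an XRPC query method such as app.bsky.actor.getProfile."""
    # Drop parameters left as None so optional arguments can be passed straight through.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{BASE_URL}/xrpc/{method}"
    return f"{url}?{query}" if query else url


print(build_xrpc_url("app.bsky.actor.getProfile", actor="jay.bsky.team"))
# prints https://public.api.bsky.app/xrpc/app.bsky.actor.getProfile?actor=jay.bsky.team
```

Nothing here touches the network, so it's a cheap way to sanity-check a URL before you send it.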
Fetching a User's Profile
Let's start simple. Grab any public profile:
Python:
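import httpx

handle = "jay.bsky.team"  # Bluesky CEO
url = "https://public.api.bsky.app/xrpc/app.bsky.actor.getProfile"

resp = httpx.get(url, params={"actor": handle}, timeout=10)
resp.raise_for_status()  # fail loudly on 4xx/5xx instead of on a missing key
profile = resp.json()

# displayName is optional, so fall back to the handle
print(f"Display name: {profile.get('displayName') or handle}")
print(f"Followers: {profile.get('followersCount', 0)}")
print(f"Posts: {profile.get('postsCount', 0)}")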
JavaScript:
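// Top-level await: run this as an ES module
const handle = "jay.bsky.team";
const url = `https://public.api.bsky.app/xrpc/app.bsky.actor.getProfile?actor=${encodeURIComponent(handle)}`;
const resp = await fetch(url);
if (!resp.ok) throw new Error(`getProfile failed: HTTP ${resp.status}`);
const profile = await resp.json();
// displayName is optional, so fall back to the handle
console.log(`Display name: ${profile.displayName ?? handle}`);
console.log(`Followers: ${profile.followersCount}`);
console.log(`Posts: ${profile.postsCount}`);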
That's it. No API key. No auth header. Just data.
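Not every account fills in every field, so it helps to read the response defensively. Here's a small sketch that pulls a few fields out of a getProfile response and tolerates missing ones. The function name summarize_profile and the sample dictionary are hypothetical. The field names are the ones the profile response uses.

```python
def summarize_profile(profile: dict) -> dict:
    """Pick a few fields out of a getProfile response, tolerating missing ones."""
    handle = profile.get("handle", "")
    return {
        "handle": handle,
        # displayName is optional, so fall back to the handle.
        "display_name": profile.get("displayName") or handle,
        "followers": profile.get("followersCount", 0),
        "following": profile.get("followsCount", 0),
        "posts": profile.get("postsCount", 0),
    }


# Hypothetical sample: a profile with no display name set.
sample = {"handle": "example.bsky.social", "followersCount": 12, "postsCount": 3}
print(summarize_profile(sample))
```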
Scraping Posts from a User's Feed
Want someone's recent posts? Use app.bsky.feed.getAuthorFeed:
import httpx

def get_posts(handle, limit=50):
    """Return up to `limit` recent feed items for a handle, following cursors."""
    url = "https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed"
    params = {"actor": handle, "limit": min(limit, 100)}  # 100 is the per-request cap
    all_posts = []
    while len(all_posts) < limit:
        resp = httpx.get(url, params=params, timeout=10)
        resp.raise_for_status()  # surface 4xx/5xx instead of parsing an error body
        data = resp.json()
        feed = data.get("feed", [])
        for item in feed:
            post = item["post"]  # the author's reposts show up here too
            all_posts.append({
                "text": post["record"].get("text", ""),
                "created_at": post["record"]["createdAt"],
                "likes": post.get("likeCount", 0),
                "reposts": post.get("repostCount", 0),
                "uri": post["uri"],
            })
        cursor = data.get("cursor")
        if not cursor or not feed:  # last page, or an empty page that would loop forever
            break
        params["cursor"] = cursor
    return all_posts[:limit]

posts = get_posts("jay.bsky.team", limit=200)
print(f"Got {len(posts)} posts")
The API paginates with cursors. Each request returns up to 100 items, and you follow the cursor field to get more. Standard stuff.
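The same cursor loop works for any paginated endpoint, so it can be pulled out into a reusable generator. This is a sketch, not the one true way to do it. paginate and fetch_page are names I made up, and the fake pages below stand in for real HTTP responses so the example runs offline.

```python
from typing import Callable, Iterator, Optional


def paginate(fetch_page: Callable[[Optional[str]], dict], items_key: str,
             max_items: int) -> Iterator[dict]:
    """Yield up to max_items items, following the cursor field between pages."""
    cursor = None
    yielded = 0
    while yielded < max_items:
        page = fetch_page(cursor)
        items = page.get(items_key, [])
        for item in items:
            yield item
            yielded += 1
            if yielded >= max_items:
                return
        cursor = page.get("cursor")
        # Stop when there is no cursor, or on an empty page that would loop forever.
        if not cursor or not items:
            return


# Hypothetical stand-in for a real HTTP call: three fake pages keyed by cursor.
FAKE_PAGES = {
    None: {"feed": [{"n": 1}, {"n": 2}], "cursor": "a"},
    "a": {"feed": [{"n": 3}, {"n": 4}], "cursor": "b"},
    "b": {"feed": [{"n": 5}]},
}

print([item["n"] for item in paginate(FAKE_PAGES.__getitem__, "feed", max_items=4)])
# prints [1, 2, 3, 4]
```

In real use, fetch_page would be a small function that makes the HTTP request with the given cursor and returns the decoded JSON.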
Searching Posts
This is where it gets interesting. Bluesky exposes full-text search on public posts:
import httpx

def search_posts(query, limit=100):
    """Return one page (at most 100) of posts matching the query."""
    url = "https://public.api.bsky.app/xrpc/app.bsky.feed.searchPosts"
    params = {"q": query, "limit": min(limit, 100)}
    resp = httpx.get(url, params=params, timeout=10)
    resp.raise_for_status()  # surface 4xx/5xx instead of returning an empty list
    data = resp.json()
    results = []
    for post in data.get("posts", []):
        results.append({
            "author": post["author"]["handle"],
            "text": post["record"].get("text", ""),
            "created_at": post["record"]["createdAt"],
            "likes": post.get("likeCount", 0),
        })
    return results

posts = search_posts("large language models")
for p in posts[:5]:
    print(f"@{p['author']}: {p['text'][:100]}...")
JavaScript version:
async function searchPosts(query, limit = 100) {
  // The endpoint returns at most 100 posts per request
  const params = new URLSearchParams({ q: query, limit: String(Math.min(limit, 100)) });
  const url = `https://public.api.bsky.app/xrpc/app.bsky.feed.searchPosts?${params}`;
  const resp = await fetch(url);
  if (!resp.ok) throw new Error(`searchPosts failed: HTTP ${resp.status}`);
  const data = await resp.json();
  // Guard against a missing posts array or record
  return (data.posts ?? []).map(post => ({
    author: post.author.handle,
    text: post.record?.text ?? "",
    createdAt: post.record?.createdAt ?? "",
    likes: post.likeCount ?? 0,
  }));
}

const results = await searchPosts("large language models");
results.slice(0, 5).forEach(p => console.log(`@${p.author}: ${p.text.slice(0, 100)}...`));
Think about what this means. You can monitor brand mentions, track trending topics, analyze sentiment — all without paying a cent for API access.
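As a taste of the monitoring idea, here's a rough sketch that counts mentions per day from the dictionaries search_posts returns. It assumes created_at is an ISO 8601 timestamp whose first ten characters are the date. The function name mentions_per_day and the sample rows are made up for illustration.

```python
from collections import Counter


def mentions_per_day(posts: list[dict]) -> dict[str, int]:
    """Count posts per calendar day using the date part of created_at."""
    # Assumes ISO 8601 timestamps, where the first 10 characters are YYYY-MM-DD.
    counts = Counter(p["created_at"][:10] for p in posts if p.get("created_at"))
    return dict(sorted(counts.items()))


# Hypothetical sample in the shape search_posts returns.
sample = [
    {"author": "a.example", "text": "one", "created_at": "2025-03-02T09:15:00.000Z", "likes": 1},
    {"author": "b.example", "text": "two", "created_at": "2025-03-01T18:40:00.000Z", "likes": 0},
    {"author": "c.example", "text": "three", "created_at": "2025-03-02T23:59:59.000Z", "likes": 4},
]
print(mentions_per_day(sample))
# prints {'2025-03-01': 1, '2025-03-02': 2}
```

Run a search on a schedule, feed the results through this, and you have the raw series for a brand-mention chart.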
Useful Endpoints Cheat Sheet
| Endpoint | What it does |
|---|---|
| app.bsky.actor.getProfile | User profile + stats |
| app.bsky.actor.searchActors | Search users by name |
| app.bsky.feed.getAuthorFeed | User's posts |
| app.bsky.feed.searchPosts | Full-text post search |
| app.bsky.feed.getPostThread | Post + replies |
| app.bsky.graph.getFollowers | User's followers |
| app.bsky.graph.getFollows | Who user follows |
All endpoints use the base URL https://public.api.bsky.app/xrpc/.
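One wrinkle: getPostThread doesn't take a handle. It takes the post's AT URI (the uri field collected in the feed example) in its uri parameter. As a rough sketch, assuming the usual at://authority/collection/rkey layout for record URIs, you can split one apart and turn it into a bsky.app link. The function names and the sample URI are invented for illustration.

```python
from typing import NamedTuple


class AtUri(NamedTuple):
    authority: str   # DID or handle of the repository owner
    collection: str  # record type, e.g. app.bsky.feed.post
    rkey: str        # record key within that collection


def parse_at_uri(uri: str) -> AtUri:
    """Split an at:// record URI into authority, collection, and record key."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected at://authority/collection/rkey, got {uri!r}")
    return AtUri(*parts)


def post_web_url(uri: str) -> str:
    """Build the bsky.app web link for a post's AT URI."""
    parsed = parse_at_uri(uri)
    if parsed.collection != "app.bsky.feed.post":
        raise ValueError(f"not a post record: {uri!r}")
    return f"https://bsky.app/profile/{parsed.authority}/post/{parsed.rkey}"


# Hypothetical URI, not a real post.
print(post_web_url("at://did:plc:example123/app.bsky.feed.post/3kabcxyz"))
# prints https://bsky.app/profile/did:plc:example123/post/3kabcxyz
```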
Rate Limits and Being a Good Citizen
Bluesky's public API is far more generous than X's, but it isn't unlimited: aggressive scrapers do get throttled, typically with HTTP 429 responses. Some practical advice:
- Add 200-500ms delays between requests. There's no rush.
- Use cursors properly. Don't re-fetch pages you already have.
- Cache aggressively. Profile data doesn't change every second.
- Set a User-Agent header. Be identifiable, not suspicious.
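Here's one way the delay, caching, and User-Agent advice could fit together, using only the standard library. Treat it as a sketch under my own assumptions: PoliteFetcher, http_get_json, the 0.3 second delay, the 5 minute cache lifetime, and the User-Agent string are all placeholders, not recommendations from Bluesky.

```python
import json
import time
import urllib.request

USER_AGENT = "my-bluesky-research-bot/0.1 (contact: you@example.com)"  # placeholder


def http_get_json(url: str) -> dict:
    """Fetch a URL with an identifying User-Agent and decode the JSON body."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


class PoliteFetcher:
    """Wrap a fetch function with a minimum delay between calls and a TTL cache."""

    def __init__(self, fetch=http_get_json, min_delay=0.3, cache_ttl=300.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.fetch, self.min_delay, self.cache_ttl = fetch, min_delay, cache_ttl
        self.clock, self.sleep = clock, sleep
        self._last_call = None
        self._cache = {}  # url -> (fetched_at, data)

    def get(self, url: str) -> dict:
        now = self.clock()
        cached = self._cache.get(url)
        if cached and now - cached[0] < self.cache_ttl:
            return cached[1]  # fresh enough: skip the network entirely
        if self._last_call is not None:
            wait = self.min_delay - (now - self._last_call)
            if wait > 0:
                self.sleep(wait)  # space requests at least min_delay apart
        data = self.fetch(url)
        self._last_call = self.clock()
        self._cache[url] = (self._last_call, data)
        return data
```

The clock, sleep, and fetch arguments are injectable mainly so the class can be tested without real waiting or real requests. In normal use you'd just call PoliteFetcher().get(url).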
If you need to scrape at scale — thousands of profiles or millions of posts — you'll want proper infrastructure. Proxy rotation, request queuing, and retry logic add up fast. Services like ScrapeOps can handle the proxy management piece so you can focus on the data.
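If you'd rather roll the retry piece yourself, the core of it is small. Below is a minimal sketch of exponential backoff, assuming the fetch function raises urllib.error.HTTPError on failure (as urllib.request.urlopen does). The name with_retries, the attempt count, and the delays are my own choices.

```python
import time
import urllib.error


def with_retries(fetch, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on HTTP 429 and 5xx with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            retryable = err.code == 429 or 500 <= err.code < 600
            if not retryable or attempt == max_attempts - 1:
                raise  # client errors and the final failure go to the caller
            sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...
```

A 404 or other client error is re-raised immediately, since retrying a request for something that doesn't exist only adds load.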
When You Just Want the Data
Writing scrapers is fun, but sometimes you just need the output. I built a Bluesky Posts Scraper on Apify that handles pagination, rate limiting, and exports to JSON/CSV/Excel. Useful if you want to skip the code and go straight to analysis.
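If you're sticking with your own scripts, the export step is only a few lines with the standard library. This sketch assumes post dictionaries shaped like the ones get_posts returns. The function names and column order are mine, and this is not how the Apify actor works internally.

```python
import csv
import io
import json

FIELDS = ["uri", "created_at", "likes", "reposts", "text"]


def posts_to_csv(posts: list[dict]) -> str:
    """Serialize posts to CSV text with a header row; unknown keys are ignored."""
    buf = io.StringIO(newline="")  # let the csv module control line endings
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(posts)
    return buf.getvalue()


def posts_to_json(posts: list[dict]) -> str:
    """Serialize posts to pretty-printed JSON, keeping emoji and accents readable."""
    return json.dumps(posts, ensure_ascii=False, indent=2)
```

To write a file, open it with newline="" and encoding="utf-8" and write the returned string. Post text often contains commas, quotes, and line breaks, which is why this leans on the csv module instead of joining strings by hand.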
What Can You Do With Bluesky Data?
A few ideas I've seen people build:
- Sentiment dashboards for crypto/stock tickers mentioned on Bluesky
- Influencer discovery — find accounts growing fastest in a niche
- Content research — what topics get the most engagement?
- Academic research — Bluesky's open protocol makes it ideal for social network studies
- Competitive monitoring — track what your competitors' audiences are saying
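To make the content research idea concrete, here's a toy sketch that ranks hashtags by total likes plus reposts. It assumes post dictionaries like the ones get_posts returns and finds hashtags with a simple regular expression on the text, which is cruder than what a real pipeline would use. The function name and sample posts are hypothetical.

```python
import re
from collections import defaultdict

HASHTAG = re.compile(r"#(\w+)")


def engagement_by_hashtag(posts: list[dict]) -> list[tuple[str, int]]:
    """Rank hashtags by summed likes + reposts of the posts that mention them."""
    totals = defaultdict(int)
    for post in posts:
        score = post.get("likes", 0) + post.get("reposts", 0)
        # Use a set so a tag repeated within one post is only counted once.
        for tag in {t.lower() for t in HASHTAG.findall(post.get("text", ""))}:
            totals[tag] += score
    # Highest engagement first; ties broken alphabetically.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical sample posts.
sample_posts = [
    {"text": "Loving #Python and #python", "likes": 3, "reposts": 1},
    {"text": "#rust is neat", "likes": 10},
    {"text": "no tags here", "likes": 99},
]
print(engagement_by_hashtag(sample_posts))
# prints [('rust', 10), ('python', 4)]
```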
The Bottom Line
Bluesky is what Twitter's API used to be: open, accessible, and developer-friendly. The AT Protocol makes public data genuinely public, not locked behind a paywall.
If you're building anything that needs social media data in 2026, Bluesky should be your first stop. The data is there, the API is clean, and nobody's going to charge you $42,000 a month to access it.
Start with the code examples above. You'll have data flowing in 5 minutes.