agenthustler
Scraping Bluesky in 2026: AT Protocol Makes It Surprisingly Easy

If you've been scraping Twitter/X lately, you know the pain. Rate limits, $42K/month API tiers, lawsuit threats. Meanwhile, Bluesky quietly built the most scraper-friendly social network on the planet.

Here's the thing: Bluesky runs on the AT Protocol, and public data requires zero authentication. No API keys. No OAuth dance. No terms-of-service landmines. Just HTTP requests to public endpoints.

I've been scraping Bluesky data for months now. Let me show you how it works.

The AT Protocol: Why It's Different

The AT Protocol (Authenticated Transfer Protocol) treats social data as a public utility. Every Bluesky user has a Personal Data Server (PDS) that hosts their posts, likes, and follows. The main PDS is bsky.social, but the protocol is federated — anyone can run one.

The public API lives at https://public.api.bsky.app. No auth headers needed for read operations. This isn't a loophole — it's by design.
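Every read call follows the same XRPC shape: a GET to `https://public.api.bsky.app/xrpc/<method-id>` with query parameters, no auth headers. A minimal URL-building sketch (stdlib only; `xrpc_url` is my own helper name, not part of any SDK):

```python
from urllib.parse import urlencode

BASE = "https://public.api.bsky.app/xrpc"

def xrpc_url(nsid: str, **params) -> str:
    """Build the request URL for any read-only XRPC endpoint."""
    return f"{BASE}/{nsid}?{urlencode(params)}"

# Fetch with any HTTP client, e.g.:
# httpx.get(xrpc_url("app.bsky.actor.getProfile", actor="jay.bsky.team"))
```

Every endpoint in this post is just a different method ID plugged into that pattern.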

Fetching a User's Profile

Let's start simple. Grab any public profile:

Python:

```python
import httpx

handle = "jay.bsky.team"  # Bluesky CEO
url = "https://public.api.bsky.app/xrpc/app.bsky.actor.getProfile"

resp = httpx.get(url, params={"actor": handle})
resp.raise_for_status()
profile = resp.json()

# displayName is optional, so fall back to the handle
print(f"Display name: {profile.get('displayName', handle)}")
print(f"Followers: {profile['followersCount']}")
print(f"Posts: {profile['postsCount']}")
```

JavaScript:

```javascript
const handle = "jay.bsky.team";
const url = `https://public.api.bsky.app/xrpc/app.bsky.actor.getProfile?actor=${encodeURIComponent(handle)}`;

const resp = await fetch(url);
const profile = await resp.json();

console.log(`Display name: ${profile.displayName ?? handle}`);
console.log(`Followers: ${profile.followersCount}`);
```

That's it. No API key, no sign-up, no token to refresh. Just data.

Scraping Posts from a User's Feed

Want someone's recent posts? Use app.bsky.feed.getAuthorFeed:

```python
import httpx

def get_posts(handle, limit=50):
    url = "https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed"
    params = {"actor": handle, "limit": min(limit, 100)}  # the API caps pages at 100

    all_posts = []
    while len(all_posts) < limit:
        resp = httpx.get(url, params=params)
        resp.raise_for_status()
        data = resp.json()

        for item in data.get("feed", []):
            post = item["post"]
            all_posts.append({
                "text": post["record"].get("text", ""),
                "created_at": post["record"]["createdAt"],
                "likes": post.get("likeCount", 0),
                "reposts": post.get("repostCount", 0),
                "uri": post["uri"],
            })

        cursor = data.get("cursor")
        if not cursor:  # no cursor means we're on the last page
            break
        params["cursor"] = cursor

    return all_posts[:limit]

posts = get_posts("jay.bsky.team", limit=200)
print(f"Got {len(posts)} posts")
```

The API paginates with cursors. Each request returns up to 100 items, and you follow the cursor field to get more. Standard stuff.
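That cursor loop is identical for every list endpoint, so it's worth factoring out once. A sketch of a generic paginator (`paginate`, `fetch_page`, and `key` are my own names, not part of the API); it takes a page-fetching callable, so the same helper works for getAuthorFeed (items under `feed`), getFollowers (`followers`), and so on:

```python
def paginate(fetch_page, key, limit):
    """Collect up to `limit` items by following AT Protocol cursors.

    fetch_page(cursor) must return the parsed response dict, containing
    a list under `key` and, while more pages exist, a "cursor" string.
    """
    items, cursor = [], None
    while len(items) < limit:
        data = fetch_page(cursor)
        items.extend(data.get(key, []))
        cursor = data.get("cursor")
        if not cursor:  # last page reached
            break
    return items[:limit]

# Usage sketch with httpx:
# posts = paginate(
#     lambda c: httpx.get(url, params={"actor": handle, "limit": 100,
#                                      **({"cursor": c} if c else {})}).json(),
#     key="feed", limit=200,
# )
```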

Searching Posts

This is where it gets interesting. Bluesky exposes full-text search on public posts:

```python
import httpx

def search_posts(query, limit=100):
    url = "https://public.api.bsky.app/xrpc/app.bsky.feed.searchPosts"
    params = {"q": query, "limit": min(limit, 100)}

    resp = httpx.get(url, params=params)
    resp.raise_for_status()
    data = resp.json()

    results = []
    for post in data.get("posts", []):
        results.append({
            "author": post["author"]["handle"],
            "text": post["record"].get("text", ""),
            "created_at": post["record"]["createdAt"],
            "likes": post.get("likeCount", 0),
        })

    return results

posts = search_posts("large language models")
for p in posts[:5]:
    print(f"@{p['author']}: {p['text'][:100]}...")
```

JavaScript version:

```javascript
async function searchPosts(query, limit = 100) {
  const params = new URLSearchParams({ q: query, limit: Math.min(limit, 100) });
  const url = `https://public.api.bsky.app/xrpc/app.bsky.feed.searchPosts?${params}`;

  const resp = await fetch(url);
  const data = await resp.json();

  return (data.posts ?? []).map(post => ({
    author: post.author.handle,
    text: post.record?.text ?? "",
    createdAt: post.record?.createdAt,
    likes: post.likeCount ?? 0,
  }));
}

const results = await searchPosts("large language models");
results.slice(0, 5).forEach(p => console.log(`@${p.author}: ${p.text.slice(0, 100)}...`));
```

Think about what this means. You can monitor brand mentions, track trending topics, analyze sentiment — all without paying a cent for API access.
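For brand monitoring specifically, the main thing you add on top of search is deduplication across polling runs, keyed on each post's AT URI (search results include a `uri` field on each post). A small sketch; `new_mentions` and `seen_uris` are my own names:

```python
def new_mentions(posts, seen_uris):
    """Filter a search batch down to unseen posts, and remember them.

    posts: dicts that each carry the post's AT URI under "uri".
    seen_uris: a set that persists between polling runs.
    """
    fresh = [p for p in posts if p["uri"] not in seen_uris]
    seen_uris.update(p["uri"] for p in fresh)
    return fresh
```

Run your search query on a timer, pipe each batch through this, and you only ever process genuinely new mentions.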

Useful Endpoints Cheat Sheet

| Endpoint | What it does |
| --- | --- |
| `app.bsky.actor.getProfile` | User profile + stats |
| `app.bsky.actor.searchActors` | Search users by name |
| `app.bsky.feed.getAuthorFeed` | User's posts |
| `app.bsky.feed.searchPosts` | Full-text post search |
| `app.bsky.feed.getPostThread` | Post + replies |
| `app.bsky.graph.getFollowers` | User's followers |
| `app.bsky.graph.getFollows` | Who a user follows |

All endpoints use the base URL https://public.api.bsky.app/xrpc/.
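The two graph endpoints pair nicely: getFollowers and getFollows both return lists of profile objects with a `handle` field, so once you've fetched both lists (with the usual cursor pagination), finding mutuals is a pure function. A sketch; `mutuals` is my own name:

```python
def mutuals(followers, follows):
    """Handles that appear in both a user's followers and follows."""
    follower_handles = {f["handle"] for f in followers}
    return sorted(f["handle"] for f in follows if f["handle"] in follower_handles)
```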

Rate Limits and Being a Good Citizen

Bluesky's public API doesn't impose the hard paywalled quotas X does, but aggressive scrapers do get throttled. Some practical advice:

  • Add 200-500ms delays between requests. There's no rush.
  • Use cursors properly. Don't re-fetch pages you already have.
  • Cache aggressively. Profile data doesn't change every second.
  • Set a User-Agent header. Be identifiable, not suspicious.
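The delay advice is easy to get wrong with ad-hoc time.sleep calls scattered through your code. One way to centralize it is a minimum-interval gate in front of every request (a sketch; `Throttle` is my own class, with an injectable clock and sleep so the timing logic is testable):

```python
import time

class Throttle:
    """Enforce a minimum interval between requests (e.g. 0.3 seconds)."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval, self.clock, self.sleep = interval, clock, sleep
        self._last = None

    def wait(self):
        """Block just long enough to respect the interval, then record the time."""
        if self._last is not None:
            remaining = self._last + self.interval - self.clock()
            if remaining > 0:
                self.sleep(remaining)
        self._last = self.clock()

# throttle = Throttle(0.3)
# headers = {"User-Agent": "my-research-bot/0.1 (contact@example.com)"}
# throttle.wait()  # call before every httpx.get(url, headers=headers, ...)
```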

If you need to scrape at scale — thousands of profiles or millions of posts — you'll want proper infrastructure. Proxy rotation, request queuing, and retry logic add up fast. Services like ScrapeOps can handle the proxy management piece so you can focus on the data.

When You Just Want the Data

Writing scrapers is fun, but sometimes you just need the output. I built a Bluesky Posts Scraper on Apify that handles pagination, rate limiting, and exports to JSON/CSV/Excel. Useful if you want to skip the code and go straight to analysis.

What Can You Do With Bluesky Data?

A few ideas I've seen people build:

  • Sentiment dashboards for crypto/stock tickers mentioned on Bluesky
  • Influencer discovery — find accounts growing fastest in a niche
  • Content research — what topics get the most engagement?
  • Academic research — Bluesky's open protocol makes it ideal for social network studies
  • Competitive monitoring — track what your competitors' audiences are saying

The Bottom Line

Bluesky is what Twitter's API used to be: open, accessible, and developer-friendly. The AT Protocol makes public data genuinely public, not locked behind a paywall.

If you're building anything that needs social media data in 2026, Bluesky should be your first stop. The data is there, the API is clean, and nobody's going to charge you $42,000 a month to access it.

Start with the code examples above. You'll have data flowing in 5 minutes.
