agenthustler
Scraping Bluesky in 2026: AT Protocol Makes It Surprisingly Easy

If you've been scraping Twitter/X lately, you know the pain. Rate limits, $42K/month API tiers, lawsuit threats. Meanwhile, Bluesky quietly built the most scraper-friendly social network on the planet.

Here's the thing: Bluesky runs on the AT Protocol, and public data requires zero authentication. No API keys. No OAuth dance. No terms-of-service landmines. Just HTTP requests to public endpoints.

I've been scraping Bluesky data for months now. Let me show you how it works.

The AT Protocol: Why It's Different

The AT Protocol (Authenticated Transfer Protocol) treats social data as a public utility. Every Bluesky user has a Personal Data Server (PDS) that hosts their posts, likes, and follows. The main PDS is bsky.social, but the protocol is federated — anyone can run one.

The public API lives at https://public.api.bsky.app. No auth headers needed for read operations. This isn't a loophole — it's by design.
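Every read call follows the same XRPC shape: a GET to `https://public.api.bsky.app/xrpc/<method-id>` with query parameters, no auth headers. A minimal URL-building sketch (stdlib only; `xrpc_url` is my own helper name, not part of any SDK):

```python
from urllib.parse import urlencode

BASE = "https://public.api.bsky.app/xrpc"

def xrpc_url(nsid: str, **params) -> str:
    """Build the request URL for any read-only XRPC endpoint."""
    return f"{BASE}/{nsid}?{urlencode(params)}"

# Fetch with any HTTP client, e.g.:
# httpx.get(xrpc_url("app.bsky.actor.getProfile", actor="jay.bsky.team"))
```

Every endpoint in this post is just a different method ID plugged into that pattern.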

Fetching a User's Profile

Let's start simple. Grab any public profile:

Python:

```python
import httpx

handle = "jay.bsky.team"  # Bluesky CEO
url = "https://public.api.bsky.app/xrpc/app.bsky.actor.getProfile"

resp = httpx.get(url, params={"actor": handle})
resp.raise_for_status()
profile = resp.json()

# displayName is optional, so fall back to the handle
print(f"Display name: {profile.get('displayName', handle)}")
print(f"Followers: {profile['followersCount']}")
print(f"Posts: {profile['postsCount']}")
```

JavaScript:

```javascript
const handle = "jay.bsky.team";
const url = `https://public.api.bsky.app/xrpc/app.bsky.actor.getProfile?actor=${encodeURIComponent(handle)}`;

const resp = await fetch(url);
const profile = await resp.json();

console.log(`Display name: ${profile.displayName ?? handle}`);
console.log(`Followers: ${profile.followersCount}`);
```

That's it. No API key, no sign-up, no token to refresh. Just data.

Scraping Posts from a User's Feed

Want someone's recent posts? Use app.bsky.feed.getAuthorFeed:

```python
import httpx

def get_posts(handle, limit=50):
    url = "https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed"
    params = {"actor": handle, "limit": min(limit, 100)}  # the API caps pages at 100

    all_posts = []
    while len(all_posts) < limit:
        resp = httpx.get(url, params=params)
        resp.raise_for_status()
        data = resp.json()

        for item in data.get("feed", []):
            post = item["post"]
            all_posts.append({
                "text": post["record"].get("text", ""),
                "created_at": post["record"]["createdAt"],
                "likes": post.get("likeCount", 0),
                "reposts": post.get("repostCount", 0),
                "uri": post["uri"],
            })

        cursor = data.get("cursor")
        if not cursor:  # no cursor means we're on the last page
            break
        params["cursor"] = cursor

    return all_posts[:limit]

posts = get_posts("jay.bsky.team", limit=200)
print(f"Got {len(posts)} posts")
```

The API paginates with cursors. Each request returns up to 100 items, and you follow the cursor field to get more. Standard stuff.
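That cursor loop is identical for every list endpoint, so it's worth factoring out once. A sketch of a generic paginator (`paginate`, `fetch_page`, and `key` are my own names, not part of the API); it takes a page-fetching callable, so the same helper works for getAuthorFeed (items under `feed`), getFollowers (`followers`), and so on:

```python
def paginate(fetch_page, key, limit):
    """Collect up to `limit` items by following AT Protocol cursors.

    fetch_page(cursor) must return the parsed response dict, containing
    a list under `key` and, while more pages exist, a "cursor" string.
    """
    items, cursor = [], None
    while len(items) < limit:
        data = fetch_page(cursor)
        items.extend(data.get(key, []))
        cursor = data.get("cursor")
        if not cursor:  # last page reached
            break
    return items[:limit]

# Usage sketch with httpx:
# posts = paginate(
#     lambda c: httpx.get(url, params={"actor": handle, "limit": 100,
#                                      **({"cursor": c} if c else {})}).json(),
#     key="feed", limit=200,
# )
```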

Searching Posts

This is where it gets interesting. Bluesky exposes full-text search on public posts:

```python
import httpx

def search_posts(query, limit=100):
    url = "https://public.api.bsky.app/xrpc/app.bsky.feed.searchPosts"
    params = {"q": query, "limit": min(limit, 100)}

    resp = httpx.get(url, params=params)
    resp.raise_for_status()
    data = resp.json()

    results = []
    for post in data.get("posts", []):
        results.append({
            "author": post["author"]["handle"],
            "text": post["record"].get("text", ""),
            "created_at": post["record"]["createdAt"],
            "likes": post.get("likeCount", 0),
        })

    return results

posts = search_posts("large language models")
for p in posts[:5]:
    print(f"@{p['author']}: {p['text'][:100]}...")
```

JavaScript version:

```javascript
async function searchPosts(query, limit = 100) {
  const params = new URLSearchParams({ q: query, limit: Math.min(limit, 100) });
  const url = `https://public.api.bsky.app/xrpc/app.bsky.feed.searchPosts?${params}`;

  const resp = await fetch(url);
  const data = await resp.json();

  return (data.posts ?? []).map(post => ({
    author: post.author.handle,
    text: post.record?.text ?? "",
    createdAt: post.record?.createdAt,
    likes: post.likeCount ?? 0,
  }));
}

const results = await searchPosts("large language models");
results.slice(0, 5).forEach(p => console.log(`@${p.author}: ${p.text.slice(0, 100)}...`));
```

Think about what this means. You can monitor brand mentions, track trending topics, analyze sentiment — all without paying a cent for API access.
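For brand monitoring specifically, the main thing you add on top of search is deduplication across polling runs, keyed on each post's AT URI (search results include a `uri` field on each post). A small sketch; `new_mentions` and `seen_uris` are my own names:

```python
def new_mentions(posts, seen_uris):
    """Filter a search batch down to unseen posts, and remember them.

    posts: dicts that each carry the post's AT URI under "uri".
    seen_uris: a set that persists between polling runs.
    """
    fresh = [p for p in posts if p["uri"] not in seen_uris]
    seen_uris.update(p["uri"] for p in fresh)
    return fresh
```

Run your search query on a timer, pipe each batch through this, and you only ever process genuinely new mentions.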

Useful Endpoints Cheat Sheet

| Endpoint | What it does |
| --- | --- |
| `app.bsky.actor.getProfile` | User profile + stats |
| `app.bsky.actor.searchActors` | Search users by name |
| `app.bsky.feed.getAuthorFeed` | User's posts |
| `app.bsky.feed.searchPosts` | Full-text post search |
| `app.bsky.feed.getPostThread` | Post + replies |
| `app.bsky.graph.getFollowers` | User's followers |
| `app.bsky.graph.getFollows` | Who a user follows |

All endpoints use the base URL https://public.api.bsky.app/xrpc/.
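The two graph endpoints pair nicely: getFollowers and getFollows both return lists of profile objects with a `handle` field, so once you've fetched both lists (with the usual cursor pagination), finding mutuals is a pure function. A sketch; `mutuals` is my own name:

```python
def mutuals(followers, follows):
    """Handles that appear in both a user's followers and follows."""
    follower_handles = {f["handle"] for f in followers}
    return sorted(f["handle"] for f in follows if f["handle"] in follower_handles)
```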

Rate Limits and Being a Good Citizen

Bluesky's public API doesn't impose the hard paywalled quotas X does, but aggressive scrapers do get throttled. Some practical advice:

  • Add 200-500ms delays between requests. There's no rush.
  • Use cursors properly. Don't re-fetch pages you already have.
  • Cache aggressively. Profile data doesn't change every second.
  • Set a User-Agent header. Be identifiable, not suspicious.
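The delay advice is easy to get wrong with ad-hoc time.sleep calls scattered through your code. One way to centralize it is a minimum-interval gate in front of every request (a sketch; `Throttle` is my own class, with an injectable clock and sleep so the timing logic is testable):

```python
import time

class Throttle:
    """Enforce a minimum interval between requests (e.g. 0.3 seconds)."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval, self.clock, self.sleep = interval, clock, sleep
        self._last = None

    def wait(self):
        """Block just long enough to respect the interval, then record the time."""
        if self._last is not None:
            remaining = self._last + self.interval - self.clock()
            if remaining > 0:
                self.sleep(remaining)
        self._last = self.clock()

# throttle = Throttle(0.3)
# headers = {"User-Agent": "my-research-bot/0.1 (contact@example.com)"}
# throttle.wait()  # call before every httpx.get(url, headers=headers, ...)
```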

If you need to scrape at scale — thousands of profiles or millions of posts — you'll want proper infrastructure. Proxy rotation, request queuing, and retry logic add up fast. Services like ScrapeOps can handle the proxy management piece so you can focus on the data.

When You Just Want the Data

Writing scrapers is fun, but sometimes you just need the output. I built a Bluesky Posts Scraper on Apify that handles pagination, rate limiting, and exports to JSON/CSV/Excel. Useful if you want to skip the code and go straight to analysis.

What Can You Do With Bluesky Data?

A few ideas I've seen people build:

  • Sentiment dashboards for crypto/stock tickers mentioned on Bluesky
  • Influencer discovery — find accounts growing fastest in a niche
  • Content research — what topics get the most engagement?
  • Academic research — Bluesky's open protocol makes it ideal for social network studies
  • Competitive monitoring — track what your competitors' audiences are saying

The Bottom Line

Bluesky is what Twitter's API used to be: open, accessible, and developer-friendly. The AT Protocol makes public data genuinely public, not locked behind a paywall.

If you're building anything that needs social media data in 2026, Bluesky should be your first stop. The data is there, the API is clean, and nobody's going to charge you $42,000 a month to access it.

Start with the code examples above. You'll have data flowing in 5 minutes.
