How to Scrape Bluesky (Posts, Profiles & Followers) — Python + No-Code


Everyone's building Instagram and TikTok scrapers — and competing with hundreds of
them. Meanwhile Bluesky, the fastest-growing open social network (30M+ users and
climbing), is wide open: its data is genuinely easy to get, and almost nobody is
serving it yet. If you want fresh social data for monitoring, research or lead-gen,
Bluesky is the most underrated source on the internet right now.

The reason it's easy: Bluesky runs on the AT Protocol, and most of its read
endpoints are public — no login, no API key, no scraping tricks.

Why scrape Bluesky?

Open by design — public posts, profiles and follower graphs come straight from a clean JSON API. No headless browser, no residential proxy.
Migration goldmine — accounts and audiences are moving from X/Twitter to Bluesky; tracking that shift is valuable to brands and researchers.
Low competition — the leading Bluesky actor on Apify has only a few hundred users, versus hundreds of thousands for TikTok. The demand is rising and the supply is thin.

A user's posts (Python, no auth)

Every public endpoint lives under https://public.api.bsky.app/xrpc/. To pull a
user's recent posts:

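import httpx

handle = "bsky.app"   # or any handle / DID
url = "https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed"
r = httpx.get(url, params={"actor": handle, "limit": 50}, timeout=30)
r.raise_for_status()  # surface HTTP errors here instead of a KeyError below
for item in r.json()["feed"]:
    post = item["post"]
    # Counts may be missing from a post, so default to 0 rather than raising
    print(post["record"]["text"], "—", post.get("likeCount", 0), "likes")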

That's it — no token. Profiles (app.bsky.actor.getProfile), followers
(app.bsky.graph.getFollowers) and follows (app.bsky.graph.getFollows) work the
same way.
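As a rough sketch of what "work the same way" means, the helpers below call getProfile and the two graph endpoints through one small function. The names http_fetch, profile_summary and graph_handles, the fetch parameter, and the summary keys are illustrative choices, not part of Bluesky's API. The response fields read here (followersCount, followsCount, postsCount, followers, follows) are the ones these endpoints are understood to return.

```python
BASE = "https://public.api.bsky.app/xrpc/"


def http_fetch(method: str, params: dict) -> dict:
    """Default fetcher: one unauthenticated GET against the public endpoint."""
    import httpx  # imported lazily so the helpers below accept any fetcher

    r = httpx.get(BASE + method, params=params, timeout=30)
    r.raise_for_status()
    return r.json()


def profile_summary(actor: str, fetch=http_fetch) -> dict:
    """Return a few headline numbers for a handle or DID."""
    p = fetch("app.bsky.actor.getProfile", {"actor": actor})
    return {
        "did": p["did"],
        "handle": p["handle"],
        "followers": p.get("followersCount", 0),
        "follows": p.get("followsCount", 0),
        "posts": p.get("postsCount", 0),
    }


def graph_handles(actor: str, kind: str = "followers", fetch=http_fetch, limit: int = 50) -> list:
    """Return the handles on the first page of followers or follows."""
    if kind not in ("followers", "follows"):
        raise ValueError("kind must be 'followers' or 'follows'")
    method = "app.bsky.graph.getFollowers" if kind == "followers" else "app.bsky.graph.getFollows"
    data = fetch(method, {"actor": actor, "limit": limit})
    # Each endpoint returns its list under a key matching its name
    return [account["handle"] for account in data.get(kind, [])]
```

Calling profile_summary("bsky.app") or graph_handles("bsky.app", "follows") would hit the live API. Passing your own fetch function makes the helpers easy to test offline.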

Keyword / hashtag search

Search is the one endpoint Bluesky gates more tightly. The cleanest approach is to
authenticate with a free app password (Settings → App Passwords — never your main
password), get a token, then call searchPosts:

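import httpx

# Exchange the app password for a short-lived access token
s = httpx.post("https://bsky.social/xrpc/com.atproto.server.createSession",
               json={"identifier": "you.bsky.social", "password": "xxxx-xxxx-xxxx-xxxx"}, timeout=30)
s.raise_for_status()
headers = {"Authorization": f"Bearer {s.json()['accessJwt']}"}
r = httpx.get("https://bsky.social/xrpc/app.bsky.feed.searchPosts",
              params={"q": "#bitcoin", "sort": "latest", "limit": 50}, headers=headers, timeout=30)
r.raise_for_status()
for post in r.json()["posts"]:  # matches come back under the "posts" key
    print(post["author"]["handle"], "-", post["record"]["text"])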

The catch: pagination, parsing & scale

One call is easy. A useful dataset means paginating with cursors, flattening the
nested post objects into clean rows (text, likes, reposts, replies, images, links,
author), handling reposts vs. originals, and politely backing off on rate limits —
across many accounts or search terms. That's the part worth automating.
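The cursor loop and the polite backoff can be sketched as follows. This is a minimal illustration under stated assumptions, not a production client: RateLimited, with_backoff, paginate and author_feed_fetcher are hypothetical names, and treating HTTP 429 as the rate-limit signal with doubling delays is one common convention rather than anything the article specifies.

```python
import time
from typing import Callable, Iterator, Optional

FEED_URL = "https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed"


class RateLimited(Exception):
    """Hypothetical signal a fetcher raises when the server answers HTTP 429."""


def with_backoff(call: Callable[[], dict], retries: int = 5,
                 base_delay: float = 1.0, sleep=time.sleep) -> dict:
    """Run call(), sleeping 1s, 2s, 4s, ... after each rate-limit error."""
    for attempt in range(retries):
        try:
            return call()
        except RateLimited:
            sleep(base_delay * 2 ** attempt)
    return call()  # last attempt: let any error propagate to the caller


def paginate(fetch_page: Callable[[Optional[str]], dict], items_key: str,
             max_items: int, sleep=time.sleep) -> Iterator[dict]:
    """Yield up to max_items items, following the response's cursor field."""
    cursor = None
    yielded = 0
    while yielded < max_items:
        page = with_backoff(lambda: fetch_page(cursor), sleep=sleep)
        items = page.get(items_key, [])
        for item in items:
            yield item
            yielded += 1
            if yielded >= max_items:
                return
        cursor = page.get("cursor")
        if not items or not cursor:
            return  # no more pages


def author_feed_fetcher(actor: str) -> Callable[[Optional[str]], dict]:
    """Build a fetch_page callable for getAuthorFeed (performs real HTTP calls)."""
    import httpx  # imported lazily so the helpers above carry no dependency

    def fetch_page(cursor: Optional[str]) -> dict:
        params = {"actor": actor, "limit": 100}
        if cursor:
            params["cursor"] = cursor
        r = httpx.get(FEED_URL, params=params, timeout=30)
        if r.status_code == 429:
            raise RateLimited()
        r.raise_for_status()
        return r.json()

    return fetch_page
```

For example, list(paginate(author_feed_fetcher("bsky.app"), "feed", 500)) would collect up to 500 feed items across as many pages as needed. The same paginate function should work for any endpoint that returns a cursor, given a matching fetch_page and items_key.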

The no-code option

The Bluesky Scraper on Apify does all of it: pick a mode (posts, profile, followers,
follows, or keyword search), give it handles or search terms, and click Run.

{
  "mode": "posts",
  "handles": ["bsky.app", "nytimes.com"],
  "maxItems": 500
}

Output is one clean row per post — text, created date, language, like/repost/reply/
quote counts, images, external link, and author info — ready for a spreadsheet, a
database, or an LLM. No proxy needed, so it's fast and cheap.
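For readers taking the Python route, a comparable one-row-per-post shape could be built from a getAuthorFeed item roughly like this. The column names and the flatten and rows_to_csv helpers are assumptions made for illustration; the scraper's actual output schema is not shown here and may differ.

```python
import csv
import io


def flatten(item: dict) -> dict:
    """Turn one nested feed item into a flat row of plain values."""
    post = item["post"]
    record = post.get("record") or {}
    embed = post.get("embed") or {}
    reason = item.get("reason") or {}
    return {
        "uri": post.get("uri", ""),
        "author": (post.get("author") or {}).get("handle", ""),
        "text": record.get("text", ""),
        "created_at": record.get("createdAt", ""),
        "language": (record.get("langs") or [""])[0],
        "likes": post.get("likeCount", 0),
        "reposts": post.get("repostCount", 0),
        "replies": post.get("replyCount", 0),
        "quotes": post.get("quoteCount", 0),
        # Image embeds carry a list of images; join the URLs into one cell
        "images": " ".join(img.get("fullsize", "") for img in embed.get("images", [])),
        # External-link embeds carry the target under external.uri
        "link": (embed.get("external") or {}).get("uri", ""),
        # A repost is marked by a "reason" object on the feed item
        "is_repost": reason.get("$type") == "app.bsky.feed.defs#reasonRepost",
    }


def rows_to_csv(rows: list) -> str:
    """Serialize flat rows to CSV text, with a header taken from the first row."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Writing the returned text to a .csv file gives something a spreadsheet can open directly. The is_repost column is what lets you separate reposts from originals later.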

Common use cases

Social listening — track a brand, product or topic across the fastest-growing network.
Influencer & audience research — profile stats and follower graphs.
Trend & sentiment analysis — pipe keyword streams into an LLM.
Lead generation — find accounts posting about a problem you solve.

FAQ

Do I need a Bluesky account? No for posts, profiles and followers. Keyword search
works best with a free app password.

What's a DID? A Bluesky account's permanent ID (did:plc:...). Handles can change,
the DID does not. You can pass either one wherever an endpoint takes an actor.
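If you want to store the permanent ID rather than a handle, a sketch like the one below could normalize both forms to a DID. It assumes the com.atproto.identity.resolveHandle method is reachable on the same public host used earlier; is_did, to_did and http_fetch are hypothetical helper names.

```python
BASE = "https://public.api.bsky.app/xrpc/"


def http_fetch(method: str, params: dict) -> dict:
    """Default fetcher: one unauthenticated GET (host is an assumption)."""
    import httpx  # imported lazily so to_did can be used with any fetcher

    r = httpx.get(BASE + method, params=params, timeout=30)
    r.raise_for_status()
    return r.json()


def is_did(actor: str) -> bool:
    """DIDs start with the 'did:' scheme, e.g. did:plc:... or did:web:..."""
    return actor.startswith("did:")


def to_did(actor: str, fetch=http_fetch) -> str:
    """Return actor unchanged if it is already a DID, else resolve the handle."""
    if is_did(actor):
        return actor
    # Accept handles written with a leading "@"
    handle = actor.lstrip("@")
    return fetch("com.atproto.identity.resolveHandle", {"handle": handle})["did"]
```

Keying your rows on the DID keeps an account's history together even if it later changes its handle.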

Is it legal? You're reading publicly available data via Bluesky's own public API.
Use it responsibly and within Bluesky's terms.

Building something with social data? The Bluesky Scraper handles the API so you can focus on the product.

DEV Community