DEV Community

Devil Scrapes

Bluesky Starter Pack Scraper: export any community list for $2.05/1K

Quick answer: A Bluesky starter pack is a curated community list published on the AT Protocol. There is no official export button or bulk API endpoint for reading pack membership programmatically. A Bluesky starter pack scraper queries the public AT Protocol AppView (https://public.api.bsky.app/xrpc/) and returns every member's profile — handle, DID, follower count, post count — as clean, typed JSON. The Apify Actor below does it for $0.002 per member row (~$2.05 per 1,000 members), handling pagination, retries, and URL normalization for you.

Starter packs drove a reported 43% of all follows during Bluesky's 2024 growth surge (EurekAlert, 2024). That number matters: when someone publishes "ML Researchers on Bluesky" or "London Founders" as a starter pack, they are not just recommending accounts — they are shaping the social graph of an entire professional niche. For researchers, growth marketers, and social-graph analysts, those packs are some of the most signal-dense audience lists on any platform.

The problem: the Bluesky web UI shows a pack's members one scroll at a time. There is no CSV download, no endpoint labeled "give me all members", and no obvious way to turn a pack into a spreadsheet your CRM can ingest. Here is what it actually takes to get that data programmatically.

What is a Bluesky Starter Pack? 🔎

A Bluesky Starter Pack is a named, curator-published list of accounts, stored as a record in the AT Protocol — the open, federated protocol underpinning Bluesky. Any user can create one, and the platform surfaces popular packs on the onboarding screen and in discovery feeds. Each pack carries a title and optional description, a list of member DIDs (the protocol's decentralized account identifiers), the creator's handle (e.g. pfrazee.com), and a stable AT URI (at://did:plc:.../app.bsky.graph.starterpack/...) that identifies it across the federated network.

When a new user follows an entire pack, every member gains a follower at once. That viral follow-through is why the "43% of follows" stat exists — packs are follow-amplification engines, and the membership list is the core asset.

Does Bluesky have an API for starter packs?

Yes, but it does not expose keyword search or bulk export. The AT Protocol AppView offers three relevant lexicon methods:

  • app.bsky.graph.getStarterPack — fetch one pack by AT URI
  • app.bsky.graph.getActorStarterPacks — list all packs published by a creator handle or DID
  • app.bsky.graph.getList — retrieve member profiles from the list embedded in a pack

What you cannot do: search packs by keyword. The lexicon defines app.bsky.graph.searchStarterPacks but the public AppView returns XRPCNotSupported (HTTP 404). Discovery by plain-text query is not available on the public tier — to find packs in a topic area, identify a prominent curator and enumerate their packs via creatorHandle.
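As a rough sketch of what calling these methods involves, the snippet below builds the GET URLs against the public AppView using only the standard library. The query parameter names (starterPack, actor, list, limit) follow our reading of the lexicons and should be treated as assumptions to check against the current API reference; xrpc_url and xrpc_get are hypothetical helper names, not part of any SDK.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://public.api.bsky.app/xrpc/"


def xrpc_url(method: str, **params: str) -> str:
    """Build the GET URL for an XRPC query method on the public AppView."""
    return f"{BASE_URL}{method}?{urlencode(params)}"


def xrpc_get(method: str, **params: str) -> dict:
    """Send the request and decode the JSON body. No session token is sent."""
    with urlopen(xrpc_url(method, **params), timeout=30) as resp:
        return json.load(resp)


# URLs for the three methods (parameter names are assumptions, see above).
pack_url = xrpc_url(
    "app.bsky.graph.getStarterPack",
    starterPack="at://did:plc:abc123/app.bsky.graph.starterpack/xyz789",
)
creator_url = xrpc_url("app.bsky.graph.getActorStarterPacks", actor="pfrazee.com")
list_url = xrpc_url(
    "app.bsky.graph.getList",
    list="at://did:plc:abc123/app.bsky.graph.list/xyz789",
    limit="100",
)
print(creator_url)
```

Calling xrpc_get with the same arguments performs the actual request; the placeholder URIs above will not resolve to a real pack.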

What the data looks like

Every member row comes back flat and typed — pack metadata is denormalized onto each row so a single CSV export is self-contained. The output shape from src/models.py:

{
  "pack_uri": "at://did:plc:abc123/app.bsky.graph.starterpack/xyz789",
  "pack_name": "AI Researchers on Bluesky",
  "pack_description": "Curated list of ML/AI researchers who migrated from Twitter.",
  "pack_creator_handle": "alice.bsky.social",
  "member_did": "did:plc:def456",
  "member_handle": "bob.bsky.social",
  "member_display_name": "Bob Smith",
  "member_followers_count": 1204,
  "member_following_count": 380,
  "member_posts_count": 841,
  "member_indexed_at": "2024-11-14T09:22:01.000Z",
  "scraped_at": "2026-05-16T12:00:00.000Z"
}

Twelve fields per row. pack_uri, pack_name, pack_creator_handle, member_did, member_handle, and scraped_at are always present. The count fields, member_display_name, and member_indexed_at are nullable: if the API omits them for a profile, the Actor emits the row with those fields set to null rather than dropping it. pack_description mirrors the pack's optional description, so it can be empty too.
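To make the required-versus-nullable split concrete, here is a minimal standard-library stand-in for that check. It is not the Actor's Pydantic model: validate_row is a hypothetical helper, and treating pack_description as nullable is an assumption based on the description being optional on the pack.

```python
REQUIRED = (
    "pack_uri", "pack_name", "pack_creator_handle",
    "member_did", "member_handle", "scraped_at",
)
# pack_description is assumed nullable because a pack's description is optional.
NULLABLE = (
    "pack_description", "member_display_name", "member_followers_count",
    "member_following_count", "member_posts_count", "member_indexed_at",
)


def validate_row(raw: dict) -> dict:
    """Return a 12-field row, or raise if a required field is missing or null."""
    missing = [key for key in REQUIRED if raw.get(key) is None]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    row = {key: raw[key] for key in REQUIRED}
    # Nullable fields are kept as None instead of dropping the row.
    row.update({key: raw.get(key) for key in NULLABLE})
    return row


sparse = validate_row({
    "pack_uri": "at://did:plc:abc123/app.bsky.graph.starterpack/xyz789",
    "pack_name": "AI Researchers on Bluesky",
    "pack_creator_handle": "alice.bsky.social",
    "member_did": "did:plc:def456",
    "member_handle": "bob.bsky.social",
    "scraped_at": "2026-05-16T12:00:00.000Z",
})
print(sparse["member_followers_count"])
```

When filtering downstream (for example on member_followers_count), remember that the nullable counts may be None and need an explicit guard.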

The naive approach (and why it falls apart)

The AT Protocol is designed for open access, which is genuinely unusual. The https://public.api.bsky.app/xrpc/ base URL needs no session token. A first response is minutes away. The complexity lives elsewhere.

Cursor pagination across chained endpoints. Getting all members is not a single call. You call getStarterPack to learn the embedded list AT URI, then getList with that URI, which returns a page of members plus a cursor. You loop until the response omits the cursor — for large packs, a dozen sequential calls that all need to succeed and reassemble in order. We thread that loop cleanly, applying the maxMembersPerPack cap as a client-side guard to bound run cost.
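The cursor loop can be sketched as a generator. This is an illustration under stated assumptions rather than the Actor's code: fetch_page is a hypothetical callable that takes a cursor (None for the first page) and returns one getList response, assumed to carry an "items" list and an optional "cursor" key.

```python
from typing import Callable, Iterator, Optional


def iter_members(
    fetch_page: Callable[[Optional[str]], dict],
    max_members: int,
) -> Iterator[dict]:
    """Yield list items page by page until the cursor runs out or the cap is hit."""
    cursor: Optional[str] = None
    emitted = 0
    while emitted < max_members:
        page = fetch_page(cursor)
        for item in page.get("items", []):
            if emitted >= max_members:
                return  # client-side cap reached mid-page
            yield item
            emitted += 1
        cursor = page.get("cursor")
        if not cursor:
            return  # the response omitted the cursor: last page


# Offline demonstration with canned pages instead of real HTTP calls.
canned = {
    None: {"items": [{"n": 1}, {"n": 2}], "cursor": "page2"},
    "page2": {"items": [{"n": 3}]},
}
members = list(iter_members(lambda cursor: canned[cursor], max_members=500))
print(len(members))
```

Because the cap is checked before each fetch, a run that has already reached max_members never requests another page.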

Nested record structure. The starterPack view object puts uri and creator at the top level, but name and description live inside a nested record sub-object. A naive parser reading pack["name"] returns None every time; the correct path is pack["record"]["name"]. We pin the parser against this shape and validate output with Pydantic before it reaches your dataset.
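A short sketch of reading that shape, assuming the view object is laid out as described above (uri and creator at the top level, name and description under record). SAMPLE_VIEW is made-up illustrative data and parse_pack is a hypothetical helper, not the Actor's parser.

```python
SAMPLE_VIEW = {
    "uri": "at://did:plc:abc123/app.bsky.graph.starterpack/xyz789",
    "creator": {"did": "did:plc:abc123", "handle": "alice.bsky.social"},
    "record": {
        "name": "AI Researchers on Bluesky",
        "description": "Curated list of ML/AI researchers who migrated from Twitter.",
    },
}


def parse_pack(view: dict) -> dict:
    """Flatten a starterPack view into the pack_* columns of a row."""
    record = view.get("record") or {}
    creator = view.get("creator") or {}
    return {
        "pack_uri": view["uri"],
        "pack_name": record["name"],  # KeyError here means the shape drifted
        "pack_description": record.get("description"),  # optional on the pack
        "pack_creator_handle": creator["handle"],
    }


parsed = parse_pack(SAMPLE_VIEW)
print(SAMPLE_VIEW.get("name"))  # the naive top-level lookup finds nothing
print(parsed["pack_name"])
```

Indexing the required keys directly, rather than with .get, makes a changed response shape fail immediately instead of producing rows with silent gaps.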

Retry discipline on 429/503. The AT Protocol does not publish rate limits, but it enforces them. On 429/503 we retry with exponential backoff — base 2 seconds, doubling, capped at 30 seconds, up to 5 attempts — honouring Retry-After when present. We rotate the curl-cffi browser fingerprint (Chrome / Firefox / Safari TLS profiles) so the handshake looks like a real browser, not a Python script.
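The backoff schedule described above works out as follows. This sketch only computes delays; it assumes Retry-After arrives as a number of seconds (the HTTP-date form falls back to the computed delay) and does not cap a server-supplied value. backoff_delay is a hypothetical helper name.

```python
from typing import Optional

BASE_DELAY = 2.0    # seconds before the first retry
MAX_DELAY = 30.0    # cap on the computed delay
MAX_ATTEMPTS = 5


def backoff_delay(attempt: int, retry_after: Optional[str] = None) -> float:
    """Seconds to wait before retry number `attempt` (1-based)."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # not a plain number of seconds; use the computed delay
    return min(BASE_DELAY * 2 ** (attempt - 1), MAX_DELAY)


schedule = [backoff_delay(n) for n in range(1, MAX_ATTEMPTS + 1)]
print(schedule)  # [2.0, 4.0, 8.0, 16.0, 30.0]
```

The fifth value would be 32 seconds without the cap. In a real client, the caller would sleep for the returned delay after each 429 or 503 and give up once the attempts are exhausted.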

Pydantic-validated rows or nothing. Every row passes through ResultRow.model_validate(...) before Actor.push_data(...) writes it. If the API contract drifts and a required field disappears, the Actor fails loudly with a clear status message rather than emitting garbage. No rows emitted means no per-row charge.

None of this is insurmountable to build yourself. All of it is what you skip by using the Actor.

The Actor

The packaged result is on the Apify Store: Bluesky Starter Pack Scraper.

Two modes — single pack by URI or URL, or bulk export of every pack owned by a creator handle.

Run it via the Apify API client for Python (the apify-client package):

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Single-pack mode — paste a bsky.app URL directly; the Actor normalizes it
run = client.actor("DevilScrapes/bluesky-starter-pack").call(
    run_input={
        "starterPackUri": "https://bsky.app/starter-pack/pfrazee.com/3l2stmy4ote2b",
        "maxMembersPerPack": 500,
    }
)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["member_handle"], item["member_followers_count"])

Or in creator mode — export every pack a curator has published:

run = client.actor("DevilScrapes/bluesky-starter-pack").call(
    run_input={
        "creatorHandle": "pfrazee.com",
        "maxPacks": 10,
        "maxMembersPerPack": 200,
    }
)

The starterPackUri field accepts both AT URIs and Bluesky web URLs; the Actor normalizes web URLs to AT URI form before the first network call — no manual conversion required.

Use cases

Five concrete scenarios where this data earns its keep:

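Academic community-topology research. Export a pack, load it into NetworkX, and run community detection. The member_followers_count and member_following_count fields give you node attributes (for sizing or weighting nodes) without a second API call; follow edges between members are not part of the export and have to come from elsewhere.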

B2B growth marketing. Find packs in your industry (e.g. "ML Researchers on Bluesky", "London Founders") via their curator handles, export membership, filter by member_followers_count >= 500, and you have a list of niche-verified, high-signal accounts — warm context for relevant replies, not cold outreach against scraped emails.

Competitive intelligence. Track which accounts influential curators are recommending. A startup appearing in three separate "AI tools" packs in one week is a signal worth noticing. Schedule a weekly run and diff the membership list.
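The weekly diff can be as simple as comparing two exported snapshots. The sketch below keys on member_did rather than member_handle, since handles can change while DIDs stay fixed; diff_members and the sample rows are hypothetical.

```python
def diff_members(previous: list, current: list) -> dict:
    """Compare two snapshots of the same pack, keyed on member_did."""
    before = {row["member_did"]: row["member_handle"] for row in previous}
    after = {row["member_did"]: row["member_handle"] for row in current}
    return {
        "added": sorted(after[d] for d in after.keys() - before.keys()),
        "removed": sorted(before[d] for d in before.keys() - after.keys()),
        # Same account, new handle: reported separately instead of add + remove.
        "renamed": sorted(
            (before[d], after[d])
            for d in before.keys() & after.keys()
            if before[d] != after[d]
        ),
    }


last_week = [
    {"member_did": "did:plc:1", "member_handle": "bob.bsky.social"},
    {"member_did": "did:plc:2", "member_handle": "carol.bsky.social"},
]
this_week = [
    {"member_did": "did:plc:1", "member_handle": "bob.example.com"},
    {"member_did": "did:plc:3", "member_handle": "dave.bsky.social"},
]
changes = diff_members(last_week, this_week)
print(changes)
```

Each snapshot here is just the list of rows from one run's dataset, filtered to a single pack_uri.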

Social graph analysis. Compare follower distributions and post activity across a curated peer group. Export "AI Researchers" and "AI Founders" packs and measure whether the founder community posts more or follows more — one flat CSV, no joins.

OSINT and journalism. Document community formation around an event. A pack published during breaking news captures a snapshot of who organized around it — a record the AT Protocol's mutable graph will not preserve indefinitely.

Pricing — exact numbers 💰

Pay-per-event. You pay for rows written to the dataset, not for idle compute or pages fetched. No data, no charge (beyond the small start fee).

Event                        Price
Actor start (once per run)   $0.05
Member row emitted           $0.002

Members scraped                Total cost
100 members                    $0.25
500 members                    $1.05
1,000 members                  $2.05
5,000 members (max per pack)   $10.05
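The totals follow directly from the two event prices: one start fee per run plus the per-row price times the number of rows. A small sketch that reproduces the figures, using Decimal to avoid floating-point rounding (run_cost is a hypothetical helper):

```python
from decimal import Decimal

START_FEE = Decimal("0.05")   # charged once per run
ROW_PRICE = Decimal("0.002")  # charged per member row emitted


def run_cost(member_rows: int) -> Decimal:
    """Total cost in USD of a single run that emits `member_rows` rows."""
    if member_rows < 0:
        raise ValueError("member_rows must be non-negative")
    return START_FEE + ROW_PRICE * member_rows


for rows in (100, 500, 1000, 5000):
    print(f"{rows:>5} members  ${run_cost(rows):.2f}")
```

Splitting the same export across several runs adds one start fee per run, so a single larger run is slightly cheaper than many small ones.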

Apify's $5 free trial credit covers roughly 2,475 member rows in a single run (($5.00 - $0.05) / $0.002), enough to export several typical starter packs, with no credit card required.

The technically interesting part

The Actor's URL-normalization step is worth understanding if you build anything else on the AT Protocol.

A Bluesky web URL looks like https://bsky.app/starter-pack/alice.bsky.social/abc123rkey. The corresponding AT URI is at://alice.bsky.social/app.bsky.graph.starterpack/abc123rkey. The Actor's ActorInput model handles this in a @field_validator that fires before any network call: strip the https://bsky.app/starter-pack/ prefix, split on / for handle and rkey, reassemble the AT URI. Pass a malformed URL (missing rkey, extra segments) and Pydantic raises a ValidationError before the run bills you for the start event.
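The strip, split, and reassemble steps can be sketched as a plain function. This is an illustration rather than the Actor's validator: normalize_starter_pack_uri is a hypothetical name, it raises ValueError where the real model would raise a Pydantic ValidationError, and it does not handle query strings or trailing slashes.

```python
WEB_PREFIX = "https://bsky.app/starter-pack/"
COLLECTION = "app.bsky.graph.starterpack"


def normalize_starter_pack_uri(value: str) -> str:
    """Turn a bsky.app starter-pack URL into an AT URI; pass AT URIs through."""
    value = value.strip()
    if value.startswith("at://"):
        return value
    if not value.startswith(WEB_PREFIX):
        raise ValueError(f"not a starter pack URL or AT URI: {value!r}")
    parts = value[len(WEB_PREFIX):].split("/")
    # Exactly two non-empty segments are expected: handle and rkey.
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected <handle>/<rkey> after the prefix: {value!r}")
    handle, rkey = parts
    return f"at://{handle}/{COLLECTION}/{rkey}"


print(normalize_starter_pack_uri(
    "https://bsky.app/starter-pack/alice.bsky.social/abc123rkey"
))
```

Running the check before any network call means a typo in the input fails immediately, without a wasted request.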

The broader point: AT Protocol tools must be careful about this translation layer. The web UI uses handles; the protocol uses DIDs. Denormalizing pack_creator_handle (human-readable) onto every row while storing member_did (machine-canonical) is deliberate — the handle can change; the DID cannot.

Limitations (the honest list)

  • No keyword search across packs. Discovery is by creator handle, not by topic keyword. The app.bsky.graph.searchStarterPacks lexicon method returns XRPCNotSupported on the public AppView.
  • Public packs only. Private or invite-only packs are not visible to the unauthenticated API.
  • Current membership only. The AT Protocol does not expose member join or leave history. This Actor captures a snapshot, not a timeline.
  • No post content. Member posts are out of scope. Use the companion Bluesky Feed Posts Actor for post data.
  • Member cap per pack. maxMembersPerPack defaults to 500 (max 5,000). At 5,000 members the per-run cost is $10.05.
  • 7-day dataset retention on Apify's free plan. Export your results or use a named dataset if you need longer retention.

FAQ

Is scraping Bluesky starter packs compliant with their Terms of Service?
Yes. The AT Protocol is an open protocol designed for interoperability and data portability. The public AppView (public.api.bsky.app) is the official mechanism for unauthenticated access. Bluesky has publicly proposed a scraping standard for AI training datasets (Slashdot, 2025). This Actor reads only what the public API exposes.

Is there an official Bluesky API I could use instead?
The three lexicon methods the Actor uses (getStarterPack, getActorStarterPacks, getList) are the official public API. The Actor is a convenience layer that saves you the pagination loop, URL normalization, retry handling, and Pydantic validation; it is not a workaround for a locked endpoint.

Can I export the data to Google Sheets or a data warehouse?
Yes. Export CSV, JSON, or Excel from the Apify Console, or webhook the dataset on ACTOR.RUN.SUCCEEDED into Make, Zapier, or n8n. The flat row schema loads into any spreadsheet or SQL table without a join.

Does this export a user's full follower list, or just pack members?
Pack members only. The Actor scopes to the membership of a specific pack (or all packs by a creator). It does not enumerate the general follower graph of any account.

Try it

The Actor is on the Apify Store: apify.com/DevilScrapes/bluesky-starter-pack.

Free $5 trial credit, no credit card. Paste a bsky.app URL, get a flat JSON dataset of every member in under a minute. Researching a niche, mapping a community, or building an outreach workflow? Start there.

If there is a field or an AT Protocol endpoint you wish this Actor covered, drop it in the comments.


Built by Devil Scrapes — Apify Actors for the data-hungry. Pay-per-event, honest pricing, typed rows. 😈
