FairPrice

Posted on • Originally published at apify.com

How to Scrape Bluesky Posts and Profiles at Scale (No API Key Needed)

Why Scrape Bluesky?

Bluesky has grown to over 30 million users, built on the open AT Protocol. Unlike X/Twitter, which has locked its API behind expensive paid tiers, Bluesky's public API is free to use — no API key, no OAuth, and generous rate limits.

This makes it a goldmine for:

  • Social media monitoring — track brand mentions and sentiment
  • Lead generation — find people discussing topics relevant to your business
  • Market research — analyze conversations and trending topics
  • Academic research — study public discourse on decentralized social media

In this tutorial, I'll show you how to scrape Bluesky posts and profiles at scale using an Apify actor I built.

The Easy Way: Use the Apify Actor

I published a free Bluesky Scraper on Apify that handles everything — pagination, data extraction, and export. Here's how to use it.

Step 1: Set Up Your Input

The actor accepts a simple JSON input:

{
    "searchQuery": "artificial intelligence",
    "handles": ["jay.bsky.team", "bsky.app"],
    "maxItems": 200,
    "scrapeType": "both"
}

Parameters:

  • searchQuery — keyword to search posts or profiles for
  • handles — specific Bluesky handles to scrape
  • maxItems — maximum results (up to 10,000)
  • scrapeType — posts, profiles, or both

Step 2: Run It

You can run it from the Apify Console UI, or programmatically:

JavaScript:

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });
const run = await client.actor('cryptosignals/bluesky-scraper').call({
    searchQuery: 'startup funding',
    maxItems: 500,
    scrapeType: 'posts'
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Got ${items.length} posts`);

Python:

from apify_client import ApifyClient

client = ApifyClient('YOUR_APIFY_TOKEN')
run = client.actor('cryptosignals/bluesky-scraper').call(run_input={
    'searchQuery': 'startup funding',
    'maxItems': 500,
    'scrapeType': 'posts'
})

items = client.dataset(run['defaultDatasetId']).list_items().items
print(f'Got {len(items)} posts')

Step 3: Export Your Data

Every run outputs structured data you can export as JSON, CSV, or Excel directly from Apify.

Post data includes:

  • Author handle, display name, and DID
  • Full post text
  • Engagement metrics: likes, reposts, replies, quotes, bookmarks
  • Timestamps (created and indexed)
  • Direct URL to the post
  • Embedded images and external links
  • Language detection

Profile data includes:

  • Handle, display name, bio
  • Follower/following/post counts
  • Avatar and banner URLs
  • Account creation date
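If you'd rather pull the export programmatically than click through the Console, Apify's dataset items endpoint accepts a format parameter (json, csv, or xlsx). Here's a minimal sketch — the dataset ID is a placeholder you'd substitute from your run:

```python
import urllib.request

def export_url(dataset_id: str, fmt: str = "csv") -> str:
    """Build the Apify dataset export URL for a given format."""
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?format={fmt}&clean=true"

# Usage (substitute the defaultDatasetId from your run):
# data = urllib.request.urlopen(export_url("YOUR_DATASET_ID")).read()
# open("posts.csv", "wb").write(data)
```

The clean=true flag strips Apify's internal metadata fields so the CSV contains only your scraped columns.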

The DIY Way: AT Protocol API Basics

If you want to build your own scraper, here are the key endpoints. The AT Protocol public API at https://public.api.bsky.app/xrpc requires no authentication:

Search Posts

curl "https://public.api.bsky.app/xrpc/app.bsky.feed.searchPosts?q=AI&limit=25"

Get User Profile

curl "https://public.api.bsky.app/xrpc/app.bsky.actor.getProfile?actor=jay.bsky.team"

Get Author Feed

curl "https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed?actor=bsky.app&limit=50"

Search Users

curl "https://public.api.bsky.app/xrpc/app.bsky.actor.searchActors?q=developer&limit=25"

All endpoints return JSON and support cursor-based pagination. The cursor field in the response tells you how to fetch the next page.

Tips for Scraping at Scale

  1. Respect rate limits. The public API is generous but not unlimited. Add small delays between requests if scraping thousands of items.

  2. Use pagination cursors. Don't try to get everything in one request. Loop through pages using the cursor.

  3. Handle errors gracefully. The API occasionally returns 429 (rate limit) or 502 (server error). Implement exponential backoff.

  4. Filter by language. Posts include a langs field — use it to focus on content in your target language.

  5. Store data incrementally. Push results to your dataset as you go rather than collecting everything in memory.
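Tips 1 and 3 can be combined into one small retry helper. This is a sketch, not part of the actor — the doubling delay schedule is just one reasonable choice:

```python
import time
import urllib.error
import urllib.request

def backoff_schedule(base_delay: float, max_retries: int) -> list:
    """Doubling delays: base, 2*base, 4*base, ..."""
    return [base_delay * (2 ** i) for i in range(max_retries)]

def fetch_with_backoff(url: str, max_retries: int = 5, base_delay: float = 1.0) -> bytes:
    """GET a URL, sleeping through the backoff schedule on 429/5xx errors."""
    for attempt, delay in enumerate(backoff_schedule(base_delay, max_retries)):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code in (429, 500, 502, 503) and attempt < max_retries - 1:
                time.sleep(delay)  # wait, then retry
            else:
                raise
```

Exponential backoff means a brief 429 costs you a second or two, while a sustained outage fails fast instead of hammering the server.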

Wrapping Up

Bluesky's open AT Protocol makes it one of the easiest social networks to scrape legally and ethically. Whether you use the Apify actor for a no-code solution or build your own scraper with the public API, you can extract valuable social data in minutes.

If you found this useful, give the actor a star on Apify — it helps others discover it.


Have questions about scraping Bluesky or the AT Protocol? Drop a comment below.
